Ongaku

Overview

Ongaku is a method for creating playlists programmatically, using the content of the song alone. It uses gammatone cepstral analysis to create a unique matrix to represent each song. A gammatone cepstrum is similar to the more common spectra used for audio analysis, but instead of stopping at the Fourier transform, we take the inverse Fourier transform of the log spectrum, and then apply a transformation according to the gammatone function. This function was designed to mimic the signals sent to the brain through the cochlear nerve, the nerve which connects the ear to the brain.
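For intuition, a plain (non-gammatone) cepstrum can be sketched in a few lines of numpy. This is illustrative only, not Ongaku's actual implementation; the gammatone version additionally warps the spectrum with a gammatone filterbank before the inverse transform:

```python
import numpy as np

def cepstrum(signal):
    """Illustrative cepstrum: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    log_spectrum = np.log(spectrum + 1e-12)  # epsilon avoids log(0)
    return np.fft.irfft(log_spectrum)

# A 1-second, 440 Hz tone sampled at 8 kHz
t = np.linspace(0, 1, 8000, endpoint=False)
ceps = cepstrum(np.sin(2 * np.pi * 440 * t))
```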

The gammatone cepstra are then compiled into a corpus, which is used for manifold learning (using sklearn). This creates a metric space for the songs in the library. The manifold learning process needs to be tuned to produce good playlists, but playlist quality is difficult to define mathematically. I’ve had good results with n_components = 45, but your mileage may vary. I’ve also defined a few rudimentary metrics which can be optimized over. We can then draw shapes in this metric space to define playlists.

The code will be documented in full at readthedocs.

Analysis

The analysis module is full of pre-processing methods to turn a song or song library into a gammatone cepstrum or gammatone corpus. It also has tools for constructing a Fourier spectrum corpus, but the main usage is intended for gammatone cepstrum corpora. Corpus production has been parallelized in the preprocess function, but the default is pool_size = 2, because each gammatone analysis takes about 5 GB of RAM to complete. If I get around to building a gammatone function that isn’t a MATLAB port, this may change, but for now, only increase pool_size if you know your machine can handle it. preprocess, as the main workhorse, will put a corpus.pkl in your working directory, which is needed for the learning and construction stages.
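The parallelization pattern is the standard pool-of-workers one. A minimal sketch, with a stand-in worker instead of the real gammatone analysis (and a thread pool standing in for the process pool, to keep the sketch lightweight):

```python
import pickle
from pathlib import Path
# preprocess uses a process pool; multiprocessing.dummy has the same
# .map interface but uses threads, which is enough for a sketch.
from multiprocessing.dummy import Pool

def fake_gt_and_store(song_loc, locale='cepstra'):
    """Stand-in worker: the real gt_and_store computes the gammatone
    cepstrum here. Like it, this pickles its result into a folder."""
    Path(locale).mkdir(exist_ok=True)
    result = {'song': song_loc, 'cepstrum': None}  # placeholder payload
    out = Path(locale) / (Path(song_loc).stem + '.pkl')
    with open(out, 'wb') as f:
        pickle.dump(result, f)
    return song_loc

songs = ['a.flac', 'b.flac', 'c.flac']
# pool_size=2 mirrors the conservative default: each real worker
# needs roughly 5 GB of RAM, so don't raise this casually.
with Pool(2) as pool:
    done = pool.map(fake_gt_and_store, songs)
```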

Learning

The learning module implements a couple of manifold learning techniques and is fully compatible with sklearn, so it should interact well with any other pipelines. The gammatone cepstra can be compiled into a corpus and then prepared for manifold learning with the cropped_corpus function, followed by the flattened_corpus function, whose output is safe to use for general sklearn operations. The learning module also contains the generate_m3u function, which takes a list of tags and generates a .m3u file pointing to the specified songs on your computer; .m3u is a playlist format compatible with all major music players.
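The shape of that crop-then-flatten pipeline, in spirit (a hypothetical toy corpus, not Ongaku's actual functions):

```python
import numpy as np

# A toy corpus: tag -> 2D cepstrum (quefrency bins x time bins)
corpus = {
    'artist - song_a': np.random.rand(60, 120),
    'artist - song_b': np.random.rand(60, 95),
}

def crop_middle(cepstrum, tar_len=90):
    """Keep the middle tar_len time bins (cf. cropped_corpus)."""
    n = cepstrum.shape[1]
    start = max((n - tar_len) // 2, 0)
    return cepstrum[:, start:start + tar_len]

# Crop, then flatten each 2D cepstrum into one row vector, then stack
# into the (n_songs, n_features) matrix that sklearn expects.
cropped = {tag: crop_middle(c) for tag, c in corpus.items()}
X = np.vstack([c.flatten() for c in cropped.values()])
```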

Metrics

The metrics module contains a number of naïve metrics to measure the “compactness” and “disjointness” of the resultant manifold. While useful for comparing manifold generation methods to each other, they don’t give a great idea of how to tune metaparameters, since they’re really just measures of how strongly the curse of dimensionality is affecting your data.

Playlists

The playlists module contains some algorithms for generating playlists from input songs and the manifold data frame. They mostly involve drawing geometric shapes on the manifold and sorting the songs within those n-volumes by distance from some object. For example, the distance playlist draws an n-circle around the input song, with radius equal to the distance between the input song and the m-th closest song.
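The distance-playlist idea reduces to a nearest-neighbor query in the embedded space. A self-contained sketch over a hypothetical toy manifold (not the module's actual code):

```python
import numpy as np

# Toy manifold: each song is a point in the learned metric space
manifold = {
    'seed': np.array([0.0, 0.0]),
    'near': np.array([0.1, 0.0]),
    'mid':  np.array([0.0, 0.5]),
    'far':  np.array([3.0, 4.0]),
}

def distance_playlist(tag, manifold, m=2):
    """Return the seed song plus its m nearest neighbors, sorted by
    distance: everything inside the ball whose radius is the distance
    to the m-th closest song."""
    seed = manifold[tag]
    dists = {t: np.linalg.norm(v - seed) for t, v in manifold.items()}
    return sorted(dists, key=dists.get)[:m + 1]

playlist = distance_playlist('seed', manifold, m=2)
```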

Analysis

analysis.corpus_tag_generator(song_loc)

This looks at a file’s metadata and turns it into a corpus tag for later usage.

Parameters: song_loc – str the file location
Returns: str the tag as used by the learning parts of the system.
analysis.create_location_dictionary(lib, tags=None)

Takes everything in the library and adds it to a location dictionary stored in locations.pkl.

Parameters:
  • lib – list of song locations
  • tags – list or NoneType; if you already have the tags, you don’t need to generate them again.
Returns:

NoneType

analysis.gt_and_store(song_loc, locale='cepstra\\')

This calculates the gammatone cepstrum, pickles it, and drops it in a designated folder. Default is a folder called cepstra. This acts like a worker function, so it doesn’t return anything.

Parameters:
  • song_loc – str filepath
  • locale – str folder to drop cepstra in
Returns:

NoneType

analysis.library_addition(library, target, locale='D:\\What.cd\\', filetype='.flac')

A utility for recursively adding files of type filetype to a list. Stores them as a list of their full location.

Parameters:library – list the library you’re adding to.

Note that it edits library in-place so don’t drop an unnamed object in there.

Parameters:
  • target – str the top of the tree. do ‘’ if you want this to be your locale.
  • locale – str the location for the system to start from. Should be the location of your target.

Default is my music library’s location.

Parameters:filetype – str The file extension you want this to work on.
Returns:NoneType
analysis.library_from_regex(target_regex, library_locale='D:\\What.cd\\')

Takes in a regex and a pointer to your music library, and compiles a list of song locations from it.

Parameters:
  • target_regex – re.Pattern
  • library_locale – str
Returns:

list

analysis.make_spect(filepath, method='fourier', height=60, interval=1, verbose=False, max_len=1080)

Turns a file containing sound data into a matrix for processing. Two methods are supported: Fourier spectrum analysis, which returns a spectrogram, and gammatone, which returns a gammatone quefrency cepstrum. Gammatones take much longer, but are ostensibly better for feature analysis, and are smaller. Spectrograms are big but don’t take much time to create.

Parameters:
  • filepath (str) – path to file
  • method (str) – ‘fourier’ or ‘gamma’
  • height (int) – for gammatones, how many quefrency bins should be used. Default 60.
  • interval (int) – for gammatones, the width in seconds of the time bins. Default 1.
  • verbose (bool) – toggles showing a plot of the returned ‘gram.
  • max_len (num) – the maximum length in seconds of a song to convert. Important for memory management. Default 1080 seconds.
Returns:

np.array

a matrix representing (in decibels) the completed analysis.
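The ‘fourier’ path amounts to a short-time Fourier transform converted to decibels. A bare-bones sketch of that idea (not the actual implementation):

```python
import numpy as np

def toy_spectrogram(signal, win=256):
    """Chop the signal into non-overlapping windows, FFT each one,
    and convert magnitude to decibels. Rows are frequency bins,
    columns are time bins."""
    n_frames = len(signal) // win
    frames = signal[:n_frames * win].reshape(n_frames, win)
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(mags + 1e-12).T  # epsilon avoids log(0)

t = np.linspace(0, 1, 8000, endpoint=False)
gram = toy_spectrogram(np.sin(2 * np.pi * 440 * t))
```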

analysis.preprocess(target_regex, library_locale='D:\\What.cd\\', pool_size=2)

This runs `gt_and_store()` on every file in a folder that matches target_regex. Some notes about running this on a personal computer: if you have more than 16 GB of RAM, you should be fine. If you have 16 GB or less, be prepared for the spin-up to lag your computer. It should stabilize after a while, once the processes get out of sync. It also creates a dictionary that relates corpus tags to their file locations.

Parameters:
  • target_regex – re.Pattern a regex of the things you want. Might be long and full of pipes.
  • library_locale – str the location of your music library.
Returns:

a list of successes and failures, in case something went wrong with a song.

analysis.song_name_gen(fname: str)

A utility function which takes a file name and strips out all the stuff that’s probably not the song title.

Parameters: fname – str filename to work on
Returns: str a cleaned up title.

Learning

learning.create_tag_dict(lib, loc='/home/docs/checkouts/readthedocs.org/user_builds/ongaku/checkouts/latest/locations.pkl')

Makes a dictionary that relates each tag to its associated filename.

Parameters:
  • lib – list contains all of the filenames.
  • loc – str file to dump the tag dictionary in if you want to avoid doing this more than once.
Returns:

learning.cropped_corpus(corp, tar_len=90, pad_shorts=False)

Takes in a corpus and crops out the middle tar_len seconds. Default is a minute and a half. If pad_shorts is True, then it’ll pad the shorter songs with -inf.

Parameters:
  • corp – dict
  • tar_len – int MUST BE EVEN
  • pad_shorts – bool
Returns:

dict
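The pad_shorts behavior amounts to something like this (illustrative sketch, not the module's code):

```python
import numpy as np

def pad_to_length(cepstrum, tar_len):
    """Right-pad a (bins x time) cepstrum with -inf columns up to
    tar_len time bins (cf. pad_shorts=True and padded_corpus)."""
    short = tar_len - cepstrum.shape[1]
    if short <= 0:
        return cepstrum[:, :tar_len]
    pad = np.full((cepstrum.shape[0], short), -np.inf)
    return np.hstack([cepstrum, pad])

padded = pad_to_length(np.zeros((60, 50)), tar_len=90)
```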

learning.flattened_corpus(corp)

Takes in a corpus and flattens it so that each 2D gammatone cepstrum becomes a single vector representation.

Parameters: corp – dict
Returns: dict
learning.generate_m3u(tags, title, reference, locale='playlists\\')

Takes a list of corpus tags and turns it into a playlist (.m3u).

Parameters:
  • tags – list of tags
  • title – str name of the playlist
  • reference – dict tag dictionary, default just runs load_tag_dict()
  • locale – str place to dump your playlist
Returns:

learning.load_corpus(loc='cepstra\\', precompiled=False)

Generates a corpus for machine learning from your preprocessed cepstra. Location should be the same folder you used for the analysis.py run. Returns a dict with keys being the ‘song code’ as made by the analysis.corpus_tag_generator function.

Parameters:
  • loc – str directory where the spectra are.
  • precompiled – bool toggles whether the corpus is loaded from a pickle file or from the cepstra folder
Returns:

dict

learning.load_tag_dict(loc='/home/docs/checkouts/readthedocs.org/user_builds/ongaku/checkouts/latest/locations.pkl')

Loads the tag dictionary, and returns it.

Parameters: loc – str location of pkl
Returns:
learning.make_manifold(processed_corp, pipeline=Pipeline(steps=[('reduce_dims', PCA()), ('embedding', Isomap(n_components=45))]))

Uses sklearn to construct a manifold data frame. You can use whatever pipeline you like, but the default is PCA into Isomap with 45 components; I’ve had good success with this value.

Parameters:
  • processed_corp – dict
  • pipeline – sklearn.pipeline.Pipeline
Returns:

pd.DataFrame
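Any sklearn pipeline with the same fit/transform interface should slot in. A runnable sketch on toy stand-in data (the dimensions here are shrunk so it runs quickly; the real default uses Isomap with n_components=45):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Toy stand-in for a flattened corpus: 20 "songs", 50 features each.
X = np.random.rand(20, 50)

# Same structure as the default pipeline, with small component counts
# so it works on toy data.
pipe = Pipeline([
    ('reduce_dims', PCA(n_components=10)),
    ('embedding', Isomap(n_components=2, n_neighbors=5)),
])
embedded = pipe.fit_transform(X)
```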

learning.padded_corpus(corp)

Takes in a corpus and pads it out to the length of the longest song, using -inf padding.

Parameters: corp – dict corpus
Returns: dict padded corpus

Metrics

metrics.album_cohesion_score(xformd_songlist, corp)

Calculates an average of the pairwise distance metric across albums. For use in an sklearn scoring system.

Parameters:
  • xformd_songlist – pd.DataFrame input
  • corp – dict original corpus
Returns:

float

metrics.album_metric(mdf)

Groups a manifold data frame by album and then calculates the average pairwise distance per album, divided by log dimensionality.

Parameters: mdf – pd.DataFrame
Returns: dict
metrics.album_xdsd(mdf)

Returns the cross-dimensional standard deviation for each album.

Parameters: mdf – pd.DataFrame
Returns: dict
metrics.album_xdsd_score(xformd_songlist, corp)

Calculates an average of the cross-dimensional standard deviation metric across albums. For use in an sklearn scoring system.

Parameters:
  • xformd_songlist – pd.DataFrame input
  • corp – dict original corpus
Returns:

float

metrics.artist_cohesion_score(xformd_songlist, corp)

Calculates an average of the pairwise distance metric across artists. For use in an sklearn scoring system.

Parameters:
  • xformd_songlist – pd.DataFrame input
  • corp – dict original corpus
Returns:

float

metrics.artist_metric(mdf)

Groups the corpus by artist and then calculates the average pairwise distance per artist, divided by log dimensionality.

Parameters: mdf – pd.DataFrame
Returns: dict
metrics.artist_xdsd(mdf)

Calculates the cross-dimensional standard deviation by artist.

Parameters: mdf – pd.DataFrame
Returns: dict
metrics.artist_xdsd_score(xformd_songlist, corp)

Calculates an average of the cross-dimensional standard deviation metric across artists. For use in an sklearn scoring system.

Parameters:
  • xformd_songlist – pd.DataFrame input
  • corp – dict original corpus
Returns:

float

metrics.avg_album_metric(mdf)

Returns the mean of the album metrics.

Parameters: mdf – pd.DataFrame
Returns: float
metrics.avg_album_xdsd(mdf)

Calculates an average of the cross-dimensional standard deviation across albums.

Parameters: mdf – pd.DataFrame
Returns: float
metrics.avg_artist_metric(mdf)

Returns the mean of the artist metrics.

Parameters: mdf – pd.DataFrame
Returns: float
metrics.avg_artist_xdsd(mdf)

Calculates an average of the cross-dimensional standard deviation across artists.

Parameters: mdf – pd.DataFrame
Returns: float
metrics.corpus_cohesion(mdf)

This is a metric for the average pairwise distance across the whole corpus, divided by the log dimensionality.

Parameters: mdf – pd.DataFrame
Returns: float
metrics.corpus_xdsd(mdf)

I call this the xd_sd: a cross-dimensional standard deviation, essentially the average of the standard deviations in each dimension.

Parameters: mdf – pd.DataFrame of songs defining manifold location
Returns: float the xd_sd
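In numpy terms, the two corpus-level quantities reduce to roughly the following (an illustrative reimplementation on random stand-in data, not the module's code):

```python
import numpy as np
from itertools import combinations

X = np.random.rand(10, 45)  # 10 songs embedded in 45 dimensions

# corpus_cohesion: mean pairwise distance, divided by log dimensionality
pairs = [np.linalg.norm(a - b) for a, b in combinations(X, 2)]
cohesion = np.mean(pairs) / np.log(X.shape[1])

# corpus_xdsd: the average of the per-dimension standard deviations
xdsd = X.std(axis=0).mean()
```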
metrics.corpus_xdsd_score(xformd_songlist, corp)

Calculates an average of the cross-dimensional standard deviation metric across the whole corpus. For use in an sklearn scoring system. :param xformd_songlist: pd.DataFrame input :param corp: dict original corpus :return: float

metrics.group_albums(mdf)

Groups a manifold dataframe by album.

Parameters: mdf – pd.DataFrame
Returns: dict
metrics.group_artists(mdf)

Groups a manifold dataframe by artist.

Parameters: mdf – pd.DataFrame
Returns: dict

Playlists

playlists.abs_dist_playlist(tag, manifold_df, length=5, metrics=False)

Takes in a song tag and the manifold data frame, and creates a playlist of the input song and its length nearest neighbors, in order of distance from the input song.

Parameters:
  • tag (str) – The starting song
  • manifold_df (pd.DataFrame) – manifold data frame
  • length (int) – desired length of playlist
  • metrics (bool) – Toggles printing the playlist to the console.
Returns:

playlists.generate_m3u(tags, title, reference, locale='playlists\\')

Takes in a list of tags and a reference dictionary, and generates a playlist file, which gets dumped in the locale.

Parameters:
  • tags (list) – The songs you want to target.
  • title (str) – the name of the created playlist file.
  • reference (dict) – a dictionary that relates tags to locations
  • locale (str) – the folder to dump playlists in.
Returns:

NoneType
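An .m3u file is essentially just a text file of song locations, one per line. A minimal hypothetical version of generate_m3u (not the actual implementation; the reference dict and paths are made up):

```python
from pathlib import Path

def write_m3u(tags, title, reference, locale='playlists'):
    """Resolve each tag to a file location via the reference dict and
    write them, one per line, into <locale>/<title>.m3u."""
    Path(locale).mkdir(exist_ok=True)
    out = Path(locale) / (title + '.m3u')
    # '#EXTM3U' is the conventional (optional) header line
    lines = ['#EXTM3U'] + [reference[tag] for tag in tags]
    out.write_text('\n'.join(lines) + '\n', encoding='utf-8')
    return out

reference = {'artist - song_a': '/music/a.flac',
             'artist - song_b': '/music/b.flac'}
path = write_m3u(['artist - song_a', 'artist - song_b'], 'demo', reference)
```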