@hellska
Last active August 20, 2016 08:04
GSoC 2016 - Dataset Creation Toolkit
The Dataset Creation Toolkit GSoC Project
This summer I worked for the Metabrainz Foundation to add new functionality to the Acousticbrainz server and client. The project developed during the Google Summer of Code period is still a work in progress, but part of it is already published in the official repository.
The basic idea of the project is described in the proposal published on the GSoC website and on the Metabrainz community forum:
https://community.metabrainz.org/t/gsoc-2016-acousticbrainz-dataset-creation-toolkit/10583
The main code is contained in Pull Request 189, which is waiting to be merged into the master branch of the Acousticbrainz server. This code allows the Acousticbrainz client to submit datasets that contain samples without an MBID and to ask Messybrainz to generate a Messybrainz ID (msid) for them, so that the server keeps track of the fact that the recording is not associated with any MBID. The details of the PR and the related comments are available at this link:
https://github.com/metabrainz/acousticbrainz-server/pull/189
The link with all the commits:
https://github.com/hellska/acousticbrainz-server/commits/dataset_creation_toolkit?author=hellska
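The core decision in this PR is whether a dataset item can be submitted with an MBID or needs a Messybrainz ID generated for it. The following is a minimal sketch of that split, not the code in PR 189: the `DatasetItem` class and the `has_valid_mbid` and `split_by_identifier` helpers are hypothetical names. The sketch also assumes that an item "has an MBID" when it carries a syntactically valid UUID. It does not contact Messybrainz. It only sorts items into the two groups.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DatasetItem:
    """One recording in a dataset (hypothetical structure, for illustration only)."""
    path: str
    mbid: Optional[str] = None


def has_valid_mbid(item: DatasetItem) -> bool:
    """Return True if the item carries a syntactically valid UUID as its MBID."""
    if not item.mbid:
        return False
    try:
        uuid.UUID(item.mbid)  # raises ValueError on malformed strings
    except ValueError:
        return False
    return True


def split_by_identifier(
    items: List[DatasetItem],
) -> Tuple[List[DatasetItem], List[DatasetItem]]:
    """Separate items submittable with an MBID from those that would need
    a Messybrainz ID (msid) generated for them."""
    with_mbid = [i for i in items if has_valid_mbid(i)]
    needs_msid = [i for i in items if not has_valid_mbid(i)]
    return with_mbid, needs_msid


items = [
    DatasetItem("a.mp3", "c0d1c3f1-0c3e-4b1a-9b4e-2f0a6f8d7c11"),
    DatasetItem("b.mp3"),             # no MBID at all
    DatasetItem("c.mp3", "unknown"),  # not a UUID
]
with_mbid, needs_msid = split_by_identifier(items)
print([i.path for i in with_mbid], [i.path for i in needs_msid])
```

Only the second group would go through the Messybrainz lookup. The first group can be submitted as before.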
During the project I had to make a major schema change so that different kinds of UUID are accepted. This change adds a field to the lowlevel table that marks the gid type. At the moment only two gid types are accepted:
1. Musicbrainz IDs (mbid)
2. Messybrainz IDs (msid)
The field, named gid_type, is created as an enum so that more gid types can be accepted in the future. The code for this task is already published in the main repository, and the details can be found here:
https://github.com/metabrainz/acousticbrainz-server/pull/194
The link with all the commits:
https://github.com/hellska/acousticbrainz-server/commits/submission_type?author=hellska
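Modelling gid_type as an enum keeps the set of accepted identifiers closed but easy to extend. The following is a minimal application-side sketch of the idea, not the server's actual code. The `GidType` class and the `parse_gid_type` helper are hypothetical names, and the stored labels `"mbid"` and `"msid"` are an assumption based on the two types listed above.

```python
from enum import Enum


class GidType(Enum):
    """Mirror of the gid_type enum on the lowlevel table (assumed labels)."""
    MBID = "mbid"  # MusicBrainz ID
    MSID = "msid"  # MessyBrainz ID


def parse_gid_type(value: str) -> GidType:
    """Map a submitted or stored label to a GidType, rejecting unknown types."""
    try:
        return GidType(value.strip().lower())
    except ValueError:
        allowed = ", ".join(t.value for t in GidType)
        raise ValueError(
            f"unsupported gid_type {value!r}; expected one of: {allowed}"
        ) from None


print(parse_gid_type("MBID"), parse_gid_type(" msid "))
```

Supporting a new identifier type would mean adding one member here and one value to the database enum. In PostgreSQL, an existing enum type can be extended with `ALTER TYPE ... ADD VALUE`.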
I also submitted a very simple fix to the Vagrant VM installation process in another pull request. The fix was necessary to use the new version of the waf tool when installing two relevant libraries in the Vagrant VM, Essentia and Gaia. The code is contained in this pull request:
https://github.com/metabrainz/acousticbrainz-server/pull/191
This link shows the commit for this specific task:
https://github.com/hellska/acousticbrainz-server/commits/fix_hl_extractor_install?author=hellska
The client side has also been modified to allow the submission of datasets whose items have no MBID. All the client code is collected in another pull request, which can be seen in detail at the following link:
https://github.com/MTG/acousticbrainz-client/pull/42
The link with all the commits:
https://github.com/hellska/acousticbrainz-client/commits/dataset_creation_toolkit_client?author=hellska
This is all the code written during the GSoC period; the project is still a work in progress.
:Dan