@shivam-tripathi
Last active March 25, 2018 05:33

Timeline:

Community Bonding (April 24 – May 13)

In this period I aim to familiarize myself further with the MetaBrainz community and its projects. I will talk with BookBrainz users and MetaBrainz community members to learn what they would like to see improved or added in my GSoC project. Additionally, I will write some documentation and FAQs for the project and edit entries in the BookBrainz and MusicBrainz data. I will discuss and confirm the implementation details with the mentor, and then adjust the proposed plan accordingly, adding, removing, or improving items as required.

Phase 1 (May 14 – June 11)

Week 1:

Start working on Milestone 1. Write SQL queries reflecting schema changes in bookbrainz-sql and write relevant functions to access the newly created import object in bookbrainz-data.

Week 2:

Start working on Milestone 2. Prune the dumps to a manageable size while maintaining diversity. Draw up a basic layout of the data dumps and propose a generic data object, to be built from the dumps and used later by the import script.

Week 3:

Write scripts to pull data out of the openlibrary.org and Library of Congress dumps and create the previously decided data object.

Week 4:

Write scripts to feed the generated data object to the database. Develop the import endpoint in bookbrainz-site.

Phase 2 (June 12 – July 9)

Week 1:

Buffer week. Catch up if the data import is lagging behind, document code, fix bugs, write tests and clean up the code. Get reviews from the mentor and make the relevant changes. Unit tests will be written using mocha.js and chai.js.

Week 2:

Start working on Milestone 3. Update the Elasticsearch indexing so that imports appear in the search results. Start working on the review page.

Week 3:

Finish off the review page. Finish the imported entity page.

Week 4:

Start working on the ‘edit and approve’ page for imported entities. This will include warning the user if the name of the imported entity already exists.

Phase 3 (July 10 – August 6)

Week 1:

Buffer week. Catch up if Milestone 3 is lagging behind, fix bugs, document code, write tests and clean up the code. Get reviews from the mentor and make the relevant changes. Unit tests will be written using chai.js and mocha.js, which are already used in bookbrainz-site.

Week 2:

Start working on Milestone 4. Set up and configure the project (transpiler, linting, bundler etc.). Write the UI rendering modules.

Week 3:

Start working on userscripts for goodreads.com and, if time permits, for bookdepository.com and amazon.com.

Week 4:

Clean up the code and write documentation. Discuss relevant changes with the mentor before the final submission of the work.

Post GSoC

I will continue to contribute to the BookBrainz project, and also try my hand at other *Brainz projects. I wish to learn all about “the art of data hacking” from the MetaBrainz projects.

Alternate ideas

Adding relationships in the imports

The present infrastructure for adding relationships between entities cannot be used with imports, as it relies on BBIDs to identify entities. Also, when imports are upgraded to entities, we need some way to manage relationships between an upgraded import and an import that is yet to be upgraded. Another design question is whether we allow relationships between an entity that already exists in the database and an imported object and, if so, how we manage them.

One possible solution would be to use UUIDs for imports as well, and to add column(s) to the relationship and edition tables to specify whether a UUID refers to an entity or an import. That way, if an import already exists as an entity, we can add its UUID along with a column value stating that it is an entity-type UUID. If an import is upgraded, all the relationships containing the import-type UUID will be found and updated with the new entity-type UUID. This can be better understood by the diagram:

[Image: proposed]

Figure: Proposed relationship management between entity and imports
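
The upgrade step described above can be sketched roughly as follows. This is a minimal in-memory illustration in TypeScript, not the actual schema or the bookbrainz-data API: the type names, field names, the `upgradeImport` function and the UUIDs are all assumptions made for the example. It only shows the idea of tagging each UUID with its kind and rewriting import-type references once an import becomes an entity.

```typescript
// Illustrative sketch only. All names here (RefKind, Ref, Relationship,
// upgradeImport) are hypothetical and do not exist in bookbrainz-data.

// Marks whether a UUID points at an accepted entity or a pending import.
type RefKind = 'entity' | 'import';

interface Ref {
  id: string; // UUID
  kind: RefKind;
}

interface Relationship {
  type: string; // placeholder label, e.g. 'wrote'
  source: Ref;
  target: Ref;
}

// Returns a copy of the relationships in which every reference to the
// upgraded import now points at the new entity UUID. The input is not mutated.
function upgradeImport(
  relationships: Relationship[],
  importId: string,
  entityId: string
): Relationship[] {
  const swap = (ref: Ref): Ref =>
    ref.kind === 'import' && ref.id === importId
      ? { id: entityId, kind: 'entity' }
      : ref;
  return relationships.map((rel) => ({
    ...rel,
    source: swap(rel.source),
    target: swap(rel.target),
  }));
}

// Example with made-up UUIDs: an existing author entity related to an import.
const authorEntityId = '11111111-1111-4111-8111-111111111111';
const pendingImportId = '22222222-2222-4222-8222-222222222222';
const newEntityId = '33333333-3333-4333-8333-333333333333';

const existing: Relationship[] = [
  {
    type: 'wrote',
    source: { id: authorEntityId, kind: 'entity' },
    target: { id: pendingImportId, kind: 'import' },
  },
];

const upgraded = upgradeImport(existing, pendingImportId, newEntityId);
```

In a real implementation this rewrite would presumably be a single SQL update on the relationship table, run in the same transaction that creates the new entity, so that no relationship is left pointing at a removed import.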

In this proposal, I am not including relationships between imports. This is in line with the idea that the import object is only a temporary place for the data to reside; the proper place for the data is the entity object, where it can have all its relationships and revisions.


Detailed information about yourself

I am an undergraduate student of Computer Science and Engineering at the Indian Institute of Information Technology, Una. I learned about GSoC from a friend.

Tell us about the computer(s) you have available for working on your SoC project!

I own an Apple MacBook Air (Core i5, 5th gen, 8 GB RAM, 128 GB SSD), currently running macOS High Sierra 10.13.1. To handle the data dumps, I have ordered a Seagate 2 TB wired external hard disk drive. It will arrive before the program begins.

When did you first start programming?

I started writing code in Java in secondary school. I picked up some Python and C++ in my freshman year. Recently I have started working in JavaScript.

What type of music do you listen to? (Please list a series of MBIDs as examples.) If applying for a BookBrainz project: what type of books do you read? (Please list a series of BBIDs as examples. (And feel free to also list music you listen to!))

My choice of music mostly depends on my mood. However, I listen to a lot of pop (especially 80s/90s), house (mostly progressive and deep), sometimes hip-hop and a little rock here and there. I love to read. Some of my favourite books are George Bernard Shaw’s Pygmalion, George Orwell’s 1984 and Charles Dickens’s A Tale of Two Cities. I also follow comics, American Vampire being one of my favourites.

What aspects of the project you’re applying for (e.g., MusicBrainz, AcousticBrainz, etc.) interest you the most?

As I mentioned, I love to read, so BookBrainz is the obvious choice. On top of that, I am a staunch believer in open data for inclusive education.

Have you ever used MusicBrainz to tag your files?

I have tried Picard.

Have you contributed to other Open Source projects? If so, which projects and can we see some of your code?

I have contributed to GNOME Photos and OpenDataKit in the past.

What sorts of programming projects have you done on your own time?

I have written a multi-threaded, event-driven server using libevent, with a self-designed application-layer protocol to communicate and transfer files. I have worked on a joint research program undertaken by IIT Kanpur and Nissan-Renault to develop a 360-degree vision driver assistance tool. It was written in C++. My role was to integrate and display the data (IP camera feed and other relevant data) in the existing system, so that it could later be used for different forms of analysis. I recently wrote a very simple cryptocurrency in Node.js using the Express.js framework, with a simple HTTP API.

How much time do you have available, and how would you plan to use it?

I plan on putting in at least 40 hours a week on this project.

Do you plan to have a job or study during the summer in conjunction with Summer of Code?

I have no other obligations during the period.
