@kattak
Last active May 3, 2022 07:18

###Big Questions for search optimization

  • How many queries?
  • Can get (and parse) full lyrics?
  • Can store/serialize lyrics?
  • How to store once acquired?
  • Restricted list
  • Possible queries and benchmarks from existing APIs

Crawler: search and ranking are already done by the time you run a search. To keep search fast, the search layer just returns the next 10 results from a precomputed list, starting at position 1.
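
A minimal sketch of that "next 10 results" idea, assuming the ranked list already exists. `RANKED_RESULTS`, `PAGE_SIZE`, and `results_page` are hypothetical names for illustration, not part of any gem mentioned in these notes.

```ruby
# Hypothetical precomputed ranking: ids already sorted by relevance.
RANKED_RESULTS = (1..25).map { |i| "song-#{i}" }.freeze

PAGE_SIZE = 10

# Return one page of results. Pages start at 1; a page past the end is empty.
def results_page(ranked, page, page_size = PAGE_SIZE)
  raise ArgumentError, 'page must be >= 1' if page < 1

  offset = (page - 1) * page_size
  ranked[offset, page_size] || []
end

puts results_page(RANKED_RESULTS, 1).inspect
puts results_page(RANKED_RESULTS, 3).inspect
```

Serving a query is then only an array slice; all the expensive work happened earlier.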

Issue: are we allowed to cache?

But we can do the work of crawling song lyrics beforehand, hmm.

#Crawling

Four core activities in a search engine: crawling, indexing, ranking, and query serving.

Crawler/Spider

ex: https://github.com/postmodern/spidr

##Indexing

An index is simply a collection of lists of results for keywords and N-grams (phrases). It is all about packing this collection efficiently and making lookup and insertion of new entries fast, keeping in mind that the two trade off against each other.
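
A toy inverted index along these lines, keyed on single words and two-word phrases (bigrams). This is only a sketch: the sample text is invented, the helper names are hypothetical, and nothing here addresses efficient packing.

```ruby
require 'set'

# Split text into lowercase words plus two-word phrases (bigrams).
def index_terms(text)
  words = text.downcase.scan(/[a-z0-9']+/)
  bigrams = words.each_cons(2).map { |pair| pair.join(' ') }
  words + bigrams
end

# Map each term to the set of song ids whose text contains it.
def build_index(lyrics_by_id)
  index = Hash.new { |hash, term| hash[term] = Set.new }
  lyrics_by_id.each do |song_id, lyrics|
    index_terms(lyrics).each { |term| index[term] << song_id }
  end
  index
end

# Look up a keyword or phrase; unknown terms give an empty list.
def lookup(index, term)
  index.fetch(term.downcase, Set.new).to_a.sort
end

# Invented placeholder text, not real lyrics.
SAMPLE_LYRICS = {
  1 => 'blue river runs slow',
  2 => 'slow train over blue river',
  3 => 'river of stars'
}.freeze

LYRICS_INDEX = build_index(SAMPLE_LYRICS)
puts lookup(LYRICS_INDEX, 'blue river').inspect
puts lookup(LYRICS_INDEX, 'river').inspect
```

Lookup is a single hash access, while every new song costs one insertion per word and per bigram, which is the trade-off the note mentions.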

##Ranking

Elasticsearch

  • Conventional SQL database management systems aren't really designed for full-text searches, and they certainly don't perform well against loosely structured raw data that resides outside the database. On the same hardware, queries that take more than 10 seconds using SQL can reportedly return results in under 10 milliseconds in Elasticsearch.

https://github.com/elastic/elasticsearch-ruby

--

##Query Serving

-Spotify gems - Ruby
-Ranking factor: likes, clicks, Spotify plays
-Easiest? likes
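
A rough sketch of ranking by these factors, starting with likes only. The `Song` struct, its field names, and the weights are all assumptions for illustration; no Spotify gem is called here.

```ruby
# Hypothetical song record; field names are assumptions.
Song = Struct.new(:title, :likes, :clicks, :plays, keyword_init: true)

# Likes-only weighting, the "easiest" option; the numbers are placeholders.
LIKES_ONLY = { likes: 1.0, clicks: 0.0, plays: 0.0 }.freeze

# Weighted sum of the ranking factors for one song.
def score(song, weights = LIKES_ONLY)
  weights.sum { |field, weight| song[field] * weight }
end

# Highest score first.
def rank(songs, weights = LIKES_ONLY)
  songs.sort_by { |song| -score(song, weights) }
end

songs = [
  Song.new(title: 'A', likes: 3, clicks: 50, plays: 10),
  Song.new(title: 'B', likes: 9, clicks: 5, plays: 2),
  Song.new(title: 'C', likes: 6, clicks: 20, plays: 40)
]

puts rank(songs).map(&:title).inspect
```

Clicks or plays can be mixed in later by passing a different weights hash to `rank`.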

####How to deal with JSON?

Optimized JSON: https://github.com/ohler55/oj

Either use an additional gem or cherry-pick methods. Only portions of the JSON are of interest (title / pre-existing song in Spotify playlist).
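
A sketch of keeping only the interesting portions, using Ruby's standard `json` library rather than oj. The payload shape and the field names (`tracks`, `title`, `in_playlist`) are made up for illustration and do not come from any real API response.

```ruby
require 'json'

# Hypothetical API payload; the shape and field names are assumptions.
RAW_PAYLOAD = <<~PAYLOAD
  {"tracks": [
    {"title": "Track A", "artist": "Artist X", "duration_ms": 201000, "in_playlist": true},
    {"title": "Track B", "artist": "Artist Y", "duration_ms": 180000, "in_playlist": false}
  ]}
PAYLOAD

WANTED_FIELDS = %w[title in_playlist].freeze

# Parse the payload and keep only the wanted fields of each track.
def pick_fields(json_text, wanted = WANTED_FIELDS)
  JSON.parse(json_text).fetch('tracks', []).map { |track| track.slice(*wanted) }
end

puts pick_fields(RAW_PAYLOAD).inspect
```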

####How are API data callbacks stored vs.

####Easiest

  • Lyrics API that already allows searching by keyword

To check if fastest:

  • API benchmarking
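
One possible way to benchmark candidates with Ruby's standard `benchmark` library. The two lambdas are stand-ins for real API calls (no network request is made), and `average_seconds` and `fastest` are hypothetical helper names.

```ruby
require 'benchmark'

# Average wall-clock seconds per run of the given block.
def average_seconds(runs = 5)
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

# Stand-ins for real API clients; swap in actual calls to compare APIs.
CANDIDATES = {
  'fake_api_fast' => -> { 1_000.times.sum },
  'fake_api_slow' => -> { sleep(0.01) }
}.freeze

# Time every candidate and return the fastest name plus all timings.
def fastest(candidates, runs = 5)
  timings = candidates.transform_values do |call|
    average_seconds(runs) { call.call }
  end
  [timings.min_by { |_name, seconds| seconds }.first, timings]
end

winner, timings = fastest(CANDIDATES)
puts "fastest: #{winner}"
timings.each { |name, seconds| puts format('%-14s %.5fs', name, seconds) }
```

With real APIs the timings will vary with network conditions, so several runs at different times of day give a fairer picture.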

###Music APIs

Lyrics: https://github.com/rhnvrm/lyric-api

MusicGraph!

LastFM: http://www.last.fm/zh/api

cannot get lyrics

AZLyrics

##Music Lyrics APIs

  • SongMeanings API: SongMeanings is a song lyric and meaning platform that lets users follow music artists and search for song lyrics and song meanings. The SongMeanings API by Echonest allows developers to access and integrate the functionality of SongMeanings with other applications. The main API method retrieves song meanings by song ID number, so song ID numbers are needed. https://www.programmableweb.com/api/songmeanings

  • Lyricsfly API

-Lololyrics

-LyricsWiki API: https://www.programmableweb.com/news/lyricwiki-api-get-song-lyrics-code/2008/03/31

-GENIUS API: https://genius.com/developers Millions of annotations, millions of songs, every page on the internet — build the next great app with Genius and the Genius API.

https://github.com/timrogers/genius/blob/1bff3a47762bdee9fefcf0d7f6e194241f5ec06a/spec/genius/song_spec.rb

Official Ruby gem, can search annotations but not lyrics? unclear

-Ruby Echonest: https://github.com/youpy/ruby-echonest

Example call: analysis = echonest.track.analysis(filename)

-Musixmatch: https://developer.musixmatch.com/plans (have plans for students)

Response includes: "lyrics_copyright":"Lyrics powered by www.musiXmatch.com. This Lyrics is NOT for Commercial use and only 30% of the lyrics are returned."

  • Limited to 2000 API calls daily
  • Only 30% of the lyrics are returned
  • matcher.lyrics.get: https://playground.musixmatch.com/#!/Lyrics/get_matcher_lyrics_get

Ruby gem to retrieve lyrics: scrapes the rapgenius website or uses the Genius API?

-https://github.com/leishman/lyricgenius/blob/master/lib/lyricgenius.rb
-https://github.com/kenshiro-o/RapGenius-JS

###Possible optimizations

Problem: how to generate a list of songs to search lyrics for?

  • Narrow by other query (artist, trackname includes ___)
  • Less than 2000
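
A sketch of both ideas together: narrow the list with another query, then cap it at the 2000 daily calls noted earlier. The `Track` struct and the `candidates` helper are hypothetical, and the sample library is invented.

```ruby
# Daily call budget taken from the 2000-call limit noted above.
DAILY_CALL_LIMIT = 2000

# Hypothetical track record; field names are assumptions.
Track = Struct.new(:artist, :name, keyword_init: true)

# Narrow by artist and/or a substring of the track name, then cap the list
# so one lyrics lookup per track stays within the daily budget.
def candidates(tracks, artist: nil, name_includes: nil, limit: DAILY_CALL_LIMIT)
  selected = tracks.select do |track|
    (artist.nil? || track.artist.casecmp?(artist)) &&
      (name_includes.nil? || track.name.downcase.include?(name_includes.downcase))
  end
  selected.first(limit)
end

library = [
  Track.new(artist: 'Artist X', name: 'Blue River'),
  Track.new(artist: 'Artist X', name: 'Night Drive'),
  Track.new(artist: 'Artist Y', name: 'River Song')
]

puts candidates(library, name_includes: 'river').map(&:name).inspect
```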

Problem: need album/track/artist to search lyrics

  • For a person's spotify library, generate searchable lyrics?

##Serialization

Serialization is the process of converting an object into a stream of bytes in order to store the object or transmit it to memory, a database, or a file.

-shards: containers
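
A minimal round-trip sketch using JSON as the byte format. `LyricsRecord` and the two helpers are hypothetical names, and the record contents are placeholders; Marshal or a database column would be other options.

```ruby
require 'json'

# Hypothetical record for one song's lyrics; field names are assumptions.
LyricsRecord = Struct.new(:artist, :title, :lyrics, keyword_init: true)

# Object -> JSON string, ready to write to a file or a database column.
def serialize(record)
  JSON.generate(record.to_h)
end

# JSON string -> object again.
def deserialize(text)
  LyricsRecord.new(**JSON.parse(text, symbolize_names: true))
end

record = LyricsRecord.new(artist: 'Artist X', title: 'Blue River', lyrics: 'placeholder text')
stored = serialize(record)
puts stored
puts deserialize(stored) == record
```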

For processing JSON:

  • JSON parse >

###Etc.

-mechanize: Ruby gem for automating interaction with websites (useful for scraping)

###Resources

-Site to search APIs: https://www.programmableweb.com
-Alchemy: sentiment analysis
-GloVe (Global Vectors for Word Representation): https://github.com/vesselinv/glove
-GloVe word embeddings model for word similarity, TensorFlow Serving for scoring song relevance.
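
Word similarity with embeddings usually comes down to cosine similarity between vectors. A small sketch with made-up 3-dimensional vectors: these are not real GloVe values, the helper names are hypothetical, and the glove gem is not used.

```ruby
# Cosine similarity between two equal-length, non-zero numeric vectors.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

# Toy vectors invented for illustration; real embeddings have far more dimensions.
TOY_VECTORS = {
  'love' => [0.9, 0.1, 0.0],
  'heart' => [0.8, 0.2, 0.1],
  'truck' => [0.0, 0.1, 0.9]
}.freeze

# The other word whose vector points in the most similar direction.
def most_similar(word, vectors = TOY_VECTORS)
  target = vectors.fetch(word)
  vectors.reject { |other, _vector| other == word }
         .max_by { |_other, vector| cosine_similarity(target, vector) }
         .first
end

puts most_similar('love')
```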
