Vicki Boykis (veekaybee)
Goals: Add links that give reasonable, good explanations of how stuff works. No hype and, if possible, no vendor content. Practical first-hand accounts and experience preferred (super rare at this point).
@inproceedings{DBLP:conf/recsys/WanM18,
  author    = {Mengting Wan and
               Julian J. McAuley},
  title     = {Item Recommendation on Monotonic Behavior Chains},
  booktitle = {Proceedings of the 12th {ACM} Conference on Recommender Systems, RecSys 2018},
  publisher = {{ACM}},
  year      = {2018}
}
This book is all about patterns for doing ML. It's broken up into several key parts, building and serving. Both of these are intertwined, so it makes sense to read through the whole thing; there are many good pieces of advice from seasoned professionals. The parts you can safely ignore are anywhere they specifically use GCP. The other issue with the book is that it's very heavily focused on deep learning cases, and not all modeling problems require deep learning. Regardless, let's dive in. I've included the stuff that was relevant to me in the notes.
Most Interesting Bullets:
Machine learning models are not deterministic, so there are a number of ways we deal with this when building software: setting random seeds in models during training, allowing for stateless functions, freezing layers, checkpointing, and generally making sure that flows are as reproducible as possible (see the sketch below).
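As a concrete illustration, here is a minimal sketch of what pinning down randomness might look like in a PyTorch project; the framework choice and the `set_seed` name are mine, not from the book:

```python
import os
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Pin down the common sources of randomness so training runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # cuDNN autotunes kernels non-deterministically; trade a bit of speed for determinism.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    os.environ["PYTHONHASHSEED"] = str(seed)
```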
An interview problem that I've gotten fairly often is, "Given a stream of elements, how do you get the median, or average, or sum of the elements in the stream?"
I've thought about this problem a lot and my naive implementation was to put the elements in a hashmap (dictionary) and then pass over the hashmap with whatever other function you need.
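For reference, here is a minimal sketch (names like `StreamingStats` are illustrative, not from any library) of the usual streaming answer: sum and mean only need a counter and an accumulator, and an exact median can be maintained with two heaps instead of holding everything in a hashmap.

```python
import heapq


class StreamingStats:
    """Running sum, mean, and median over a stream of numbers."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self._lower = []  # max-heap of the lower half (values negated)
        self._upper = []  # min-heap of the upper half

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        # Push onto the lower half, then rebalance so len(lower) is
        # equal to, or one more than, len(upper).
        heapq.heappush(self._lower, -x)
        heapq.heappush(self._upper, -heapq.heappop(self._lower))
        if len(self._upper) > len(self._lower):
            heapq.heappush(self._lower, -heapq.heappop(self._upper))

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def median(self) -> float:
        if len(self._lower) > len(self._upper):
            return -self._lower[0]
        return (-self._lower[0] + self._upper[0]) / 2


stats = StreamingStats()
for x in [5, 1, 9, 3, 7]:
    stats.add(x)
print(stats.total, stats.mean, stats.median)  # 25.0 5.0 5
```

Each insert is O(1) for the sum and mean and O(log n) for the median; an exact median still needs memory proportional to the stream, which is why approximate sketches get used when that matters.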
This episode of Recsperts was transcribed with Whisper from OpenAI, an open-source neural net trained on roughly 680,000 hours of multilingual audio. The model uses an encoder-decoder architecture: audio is split into 30-second chunks, converted to log-Mel spectrograms, and passed through an encoder, while a decoder is trained to predict the matching caption text. The model supports transcription, timestamp-aligned transcription, and multilingual translation.
The transcription process outputs a single text string, so it's up to the end-user to parse out individual speakers, or run the output through a second model to separate speakers.
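For context, this is roughly what the openai-whisper Python API looks like; the model size and filename here are placeholders, not taken from the episode:

```python
import whisper

# Load one of the released checkpoints; "base" is small enough for a laptop.
model = whisper.load_model("base")

# transcribe() returns a dict with the full text plus timestamped segments.
result = model.transcribe("episode.mp3")

print(result["text"])                # the single string mentioned above
for segment in result["segments"]:   # timestamp-aligned pieces
    print(segment["start"], segment["end"], segment["text"])
```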