@JoeMalt
Created October 25, 2016 14:21
Facebook Tech Talk @ Cambridge CL, 25 Oct 2016
Facebook talk 2016-10-25, LT1, Computer Laboratory
Main talk: Daniel Bernhardt, Software Engineer, FB London since June 13
- danielbe@fb.com
- facebook.com/alone.on.a.hill
Scale:
- One person reading 50 posts a day -> about 18,000 posts a year -> easy search problem
- However, FB in total has 2 trillion posts: need to find content in all of them
- What if someone searches for "rain", looking for a post with the word "drizzle" - NLP problem!
Search results different for every user
- Privacy requirements: what is each user allowed to see?
- Ranking different: what's relevant for each user? Demographics, profile, etc.
- Doing this at scale is hard
How do we solve this?
- Unicorn search backend
-- Paper: Curtiss, Becker et al., "Unicorn: A System for Searching the Social Graph"
-- Frontend processing -> Top aggregator ('understands' query, corrects spelling, etc) -> Fan out query to three racks of machines (700+ TB data in memory) -> Index servers on rack machines
-- Indices of posts, users, events, pages (all types of entity)
- Entity linking
-- "cambridge drizle" becomes (location: cambridge OR text: cambridge) AND (text: drizle OR text: drizzle) [simplified]
-- Uses STRONG OR to make sure final results meet conditions, e.g. at least 20% of posts matching (location: cambridge) or 80% matching (text: drizzle)
-- May be many entities with the same name: does "cambridge" mean the city or the university?
-- Need to use context: students may mean university, others may mean the city
- Query understanding
-- Does "pictures of cat" mean cats or people called Catherine?
-- Can train this on what people search for immediately afterwards (e.g. if they search "cat" then "catherine", chances are they meant "catherine" first time round)
-- What kind of entity are we looking for: people, pages, events? Could query all the indices but that would be a waste of resources, better to prune early
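The "cambridge drizle" rewrite under entity linking can be sketched in a few lines of Python. This is only an illustration of the idea in these notes, not Unicorn's actual query language or API: the Term/Or/And classes, the KNOWN_LOCATIONS and SPELLING_FIXES tables and the rewrite/render/matches functions are all made up for the example, and STRONG OR's percentage quotas are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    field: str   # e.g. "text" or "location"
    value: str


@dataclass(frozen=True)
class Or:
    operands: tuple


@dataclass(frozen=True)
class And:
    operands: tuple


# Hypothetical lookup tables standing in for real entity and spelling models.
KNOWN_LOCATIONS = {"cambridge"}
SPELLING_FIXES = {"drizle": "drizzle"}


def rewrite(query: str) -> And:
    """Turn each word into an OR of its possible readings, ANDed together."""
    clauses = []
    for word in query.lower().split():
        alternatives = [Term("text", word)]
        if word in KNOWN_LOCATIONS:
            # The word may also name a place, so add an entity reading.
            alternatives.insert(0, Term("location", word))
        if word in SPELLING_FIXES:
            # Keep the typed form and add the corrected one.
            alternatives.append(Term("text", SPELLING_FIXES[word]))
        clauses.append(Or(tuple(alternatives)))
    return And(tuple(clauses))


def render(node) -> str:
    """Print the tree in the notation used in the notes."""
    if isinstance(node, Term):
        return f"{node.field}: {node.value}"
    if isinstance(node, Or):
        return "(" + " OR ".join(render(n) for n in node.operands) + ")"
    return " AND ".join(render(n) for n in node.operands)


def matches(node, doc: dict) -> bool:
    """Check a document (field name -> set of values) against the tree."""
    if isinstance(node, Term):
        return node.value in doc.get(node.field, set())
    if isinstance(node, Or):
        return any(matches(n, doc) for n in node.operands)
    return all(matches(n, doc) for n in node.operands)


tree = rewrite("cambridge drizle")
print(render(tree))
```

A post tagged with location "cambridge" whose text contains "drizzle" matches the tree even though the user typed "drizle"; a post that only mentions "cambridge" in its text does not, because the second clause is unmet.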
Spelling correction
- Language model: how common is the word globally? -> How likely is it that this word was intended?
- Error model: how likely is this mistake? Consider keyboard layout, phonetic similarity
- Personalisation: how common is the word for the searcher, their friends, recently seen posts
- Example query: "ike moutain top"
-- N-gram language models: P(tom|mountain) vs P(top|mountain)
-- Co-occurrence of words
-- Semantic embeddings
--- Embed words into n-dimensional vector space, find that "mountain", "hike", "bike" end up close together: more likely than "ikea mountain"!
--- Further reading: "Natural Language Processing (Almost) From Scratch"
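A rough sketch of how the language model and error model could combine on "ike moutain top". Everything here is an assumption made for illustration: the word counts are invented, the error model is a flat penalty per edit (no keyboard layout or phonetics), and personalisation, co-occurrence and embeddings are left out. It is a toy version of the general approach, not Facebook's implementation.

```python
import itertools
import math

# Toy counts, invented for illustration; a real system would estimate them from logs.
UNIGRAMS = {"hike": 50, "bike": 60, "ikea": 40, "mountain": 80, "top": 70, "tom": 30}
BIGRAMS = {
    ("hike", "mountain"): 20,
    ("bike", "mountain"): 12,
    ("mountain", "top"): 25,
    ("mountain", "tom"): 1,
}
TOTAL = sum(UNIGRAMS.values())
VOCAB_SIZE = len(UNIGRAMS)
ERROR_PENALTY = 2.0  # log-probability cost per edit; an arbitrary illustrative value


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, used here as a crude error model."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def log_lm(prev, word: str) -> float:
    """Add-one smoothed log P(word) for the first word, else log P(word | prev)."""
    if prev is None:
        return math.log((UNIGRAMS.get(word, 0) + 1) / (TOTAL + VOCAB_SIZE))
    return math.log(
        (BIGRAMS.get((prev, word), 0) + 1) / (UNIGRAMS.get(prev, 0) + VOCAB_SIZE)
    )


def candidates(typed: str, max_edits: int = 1) -> list:
    """Vocabulary words within max_edits of the typed word (or the word itself)."""
    found = [w for w in UNIGRAMS if edit_distance(typed, w) <= max_edits]
    return found or [typed]


def correct(query: str) -> list:
    """Pick the candidate sequence maximising language model + error model score."""
    typed = query.lower().split()
    best, best_score = None, -math.inf
    for guess in itertools.product(*(candidates(t) for t in typed)):
        score, prev = 0.0, None
        for t, w in zip(typed, guess):
            score += log_lm(prev, w) - ERROR_PENALTY * edit_distance(t, w)
            prev = w
        if score > best_score:
            best, best_score = list(guess), score
    return best


print(correct("ike moutain top"))
```

"hike", "bike" and "ikea" are all one edit away from "ike", so the error model cannot separate them; the bigram counts decide, and with these invented counts "hike mountain" beats "ikea mountain". "top" beats "tom" both because it needs no edit and because P(top|mountain) is higher.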
Career Opportunities
Victoria Clarke, University Recruiting Manager, Facebook London
Bootcamp approach: every new hire (new grad or experienced professional) goes through bootcamp
- 3 weeks in UK, 3 weeks in US
- Spend time working on lots of projects: can then choose team and project that interests you
Hackathons and hack-a-months
- Take a month out to work on a different project in a different team
Internships
- 12 weeks
- Assigned an intern manager to work 1:1 with you
- Areas
-- Soft. Eng.
-- Production Eng.
-- Product Design
-- Data Science / Eng.
-- Connectivity Labs
-- UX Research
Main hubs: London, Tel Aviv and the US
Next steps:
- Apply Online
- facebook.com/careers
- By 4 November 2016
Application process:
- CV review
- Initial phone / Skype interviews: live coding exercise
Hack the Future:
- 9th Nov, 6pm, London
- Transport reimbursed
- hackthefuture-lon.splashthat.com
- Facebook CTO talking about the biggest tech breakthroughs we'll see in the next decade