Created October 25, 2016 14:21
Facebook Tech Talk @ Cambridge CL, 25 Oct 2016
Facebook talk 2016-10-25, LT1, Computer Laboratory
Main talk: Daniel Bernhardt, Software Engineer, FB London since June 13
- One person reading 50 posts a day -> ~18,000 posts a year -> easy search problem
- However, FB in total has 2 trillion posts: need to find content in all of them
- What if someone searches for "rain", looking for a post with the word "drizzle" - NLP problem!
Search results differ for every user
- Privacy requirements: what is each user allowed to see?
- Ranking differs: what's relevant to each user? Demographics, profile, etc.
- Doing this at scale is hard
How do we solve this?
- Unicorn search backend
-- Paper: Curtiss, Becker et al., "Unicorn: A System for Searching the Social Graph"
-- Frontend processing -> Top aggregator ('understands' query, corrects spelling, etc.) -> Fan out query to three racks of machines (700+ TB of data in memory) -> Index servers on rack machines
-- Indices of posts, users, events, pages (all types of entity)
- Entity linking
-- "cambridge drizle" becomes (location: cambridge OR text: cambridge) AND (text: drizle OR text: drizzle) [simplified]
-- Uses STRONG OR to make sure final results meet conditions, e.g. at least 20% of posts matching (location: cambridge) or 80% matching (text: drizzle)
-- May be many entities with the same name: does "cambridge" mean the city or the university?
-- Need to use context: students may mean university, others may mean the city
- Query understanding
-- Does "pictures of cat" mean cats or people called Catherine?
-- Can train this on what people search for immediately afterwards (e.g. if they search "cat" then "catherine", chances are they meant "catherine" first time round)
-- What kind of entity are we looking for: people, pages, events? Could query all the indices, but that would waste resources: better to prune early
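The query path in the Unicorn notes (the top aggregator fans the query out to index servers, then merges their answers) is a scatter-gather pattern. The Python below is a minimal single-process sketch of that pattern, not Unicorn's implementation: each shard is assumed to be a plain in-memory dict mapping a term to (doc_id, score) postings, threads stand in for RPCs to rack machines, and the hypothetical helpers `search_shard` and `top_aggregator` simply keep the global top-k by score.

```python
from __future__ import annotations

import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard: dict, term: str, k: int) -> list[tuple[float, int]]:
    """Index-server side: this shard's top-k (score, doc_id) hits for one term."""
    hits = [(score, doc_id) for doc_id, score in shard.get(term, [])]
    return heapq.nlargest(k, hits)


def top_aggregator(shards: list[dict], term: str, k: int) -> list[int]:
    """Fan the query out to every shard in parallel, then merge the partial top-k lists."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term, k), shards))
    merged = [hit for partial in partials for hit in partial]
    return [doc_id for _, doc_id in heapq.nlargest(k, merged)]


# Hypothetical shards: term -> [(doc_id, score), ...]
shards = [
    {"drizzle": [(1, 0.9), (2, 0.4)]},
    {"drizzle": [(3, 0.7)], "rain": [(4, 0.8)]},
    {"drizzle": [(5, 0.95)]},
]

print(top_aggregator(shards, "drizzle", k=3))  # [5, 1, 3]
```

Each shard returns only its own top k, so the aggregator merges at most k hits per shard instead of every matching posting, and the merged top k is still exact because any global top-k hit must be in its own shard's top k.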
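The "cambridge drizle" rewrite from the entity-linking notes can be mimicked with a tiny inverted index. This is a simplified sketch under stated assumptions: the posts, the `LINKABLE_FIELDS` table (which fields a token might refer to) and the `ALTERNATIVES` table (spelling variants to OR in) are hypothetical stand-ins for the real entity linker and spelling corrector, and STRONG OR quotas are not modelled. Each query token becomes an OR over (field, token) terms, and the clauses are ANDed together.

```python
from collections import defaultdict

# Hypothetical posts: post id -> indexed fields.
POSTS = {
    1: {"location": "cambridge", "text": "light drizzle this morning"},
    2: {"location": "london", "text": "cambridge drizzle again"},
    3: {"location": "cambridge", "text": "sunny afternoon"},
    4: {"location": "oxford", "text": "heavy rain"},
}

# Hypothetical entity-linking table: fields a token may refer to (default: free text).
LINKABLE_FIELDS = {"cambridge": ["location", "text"]}
# Hypothetical spelling alternatives to OR in (default: the token as typed).
ALTERNATIVES = {"drizle": ["drizle", "drizzle"]}


def build_index(posts):
    """Inverted index: (field, token) -> set of post ids."""
    index = defaultdict(set)
    for pid, fields in posts.items():
        for field, value in fields.items():
            for tok in value.split():
                index[(field, tok)].add(pid)
    return index


def rewrite(query):
    """One OR-clause per query token, each a list of (field, token) terms."""
    clauses = []
    for tok in query.split():
        fields = LINKABLE_FIELDS.get(tok, ["text"])
        spellings = ALTERNATIVES.get(tok, [tok])
        clauses.append([(f, s) for f in fields for s in spellings])
    return clauses


def evaluate(clauses, index):
    """OR within each clause, AND across clauses."""
    result = None
    for clause in clauses:
        matches = set().union(*(index.get(term, set()) for term in clause))
        result = matches if result is None else result & matches
    return result or set()


index = build_index(POSTS)
print(sorted(evaluate(rewrite("cambridge drizle"), index)))  # [1, 2]
```

Post 1 is found through `location: cambridge` and post 2 through `text: cambridge`; both satisfy the second clause only via the corrected spelling `drizzle`, and post 3 is dropped because it mentions neither spelling.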
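The "cat" then "catherine" signal from the query-understanding notes can be estimated from search logs by counting immediate reformulations. A minimal sketch, assuming each session is just an ordered list of query strings (the sessions below are made up): the hypothetical `p_reformulated_to` returns the fraction of times a query was issued that were immediately followed by the target query, a rough proxy for how often users who typed the first query really meant the second.

```python
from collections import Counter, defaultdict


def reformulation_stats(sessions):
    """Count how often each query is issued and what is searched right after it."""
    issued = Counter()
    follows = defaultdict(Counter)
    for session in sessions:
        issued.update(session)
        for first, second in zip(session, session[1:]):
            if first != second:  # ignore exact repeats of the same query
                follows[first][second] += 1
    return issued, follows


def p_reformulated_to(issued, follows, query, target):
    """Fraction of issues of `query` immediately followed by `target`."""
    if not issued[query]:
        return 0.0
    return follows[query][target] / issued[query]


# Hypothetical search sessions (ordered queries from one user visit).
sessions = [
    ["pictures of cat", "pictures of catherine"],
    ["pictures of cat", "pictures of catherine"],
    ["pictures of cat", "cute kittens"],
    ["pictures of cat"],
]

issued, follows = reformulation_stats(sessions)
print(p_reformulated_to(issued, follows, "pictures of cat", "pictures of catherine"))  # 0.5
```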
Spelling correction
- Language model: how common is the word globally? -> How likely is it that this word was intended?
- Error model: how likely is this mistake? Consider keyboard layout, phonetic similarity
- Personalisation: how common is the word for the searcher, their friends and their recently seen posts?
- Example query: "ike moutain top"
-- N-gram language models: P(tom|mountain) vs P(top|mountain)
-- Co-occurrence of words
-- Semantic embeddings
--- Embed words into an n-dimensional vector space: "mountain", "hike" and "bike" end up close together, so "hike mountain" is more likely than "ikea mountain"!
--- Further reading: "Natural Language Processing (Almost) From Scratch"
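The three spelling-correction signals combine naturally as a noisy-channel score in log space: language model + error model + personalisation. The sketch below is illustrative only: the bigram counts are hypothetical, the error model charges a flat `EDIT_PROB` per Levenshtein edit (so it ignores keyboard layout and phonetic similarity), and the 0.5 personalisation weight is an arbitrary assumption. Because "ike" is the first word of the example query, candidates are scored against the word that follows them rather than the one before.

```python
import math

# Hypothetical global bigram counts: (word, word that follows it) -> count.
BIGRAMS = {
    ("hike", "mountain"): 50,
    ("bike", "mountain"): 30,
    ("like", "mountain"): 5,
    ("ikea", "mountain"): 1,
    ("mountain", "top"): 40,
}
VOCAB = {word for pair in BIGRAMS for word in pair}
EDIT_PROB = 0.01        # assumed chance of each single-character slip
PERSONAL_WEIGHT = 0.5   # assumed weight of the personalisation signal


def edit_distance(a, b):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def score(candidate, typed, next_word, personal):
    """Log-space noisy-channel score: language model + error model + personalisation."""
    # Add-one smoothed bigram count; the normalising constant is the same for
    # every candidate, so it is dropped.
    lm = math.log(BIGRAMS.get((candidate, next_word), 0) + 1)
    err = edit_distance(typed, candidate) * math.log(EDIT_PROB)  # log(EDIT_PROB ** d)
    boost = PERSONAL_WEIGHT * math.log(1 + personal.get(candidate, 0))
    return lm + err + boost


def correct(typed, next_word, personal=None, max_edits=2):
    """Pick the in-vocabulary word within `max_edits` edits that scores highest."""
    personal = personal or {}
    candidates = [w for w in VOCAB if edit_distance(typed, w) <= max_edits]
    if not candidates:
        return typed
    return max(candidates, key=lambda c: score(c, typed, next_word, personal))


print(correct("ike", "mountain"))                         # hike
print(correct("ike", "mountain", personal={"bike": 40}))  # bike
print(correct("moutain", "top"))                          # mountain
```

With no personal history, "ike" before "mountain" becomes "hike"; a searcher whose own activity mentions "bike" often can tip the same query towards "bike".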
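"Close together" in an embedding space is commonly measured with cosine similarity. The vectors below are hand-picked three-dimensional toys, not learned embeddings (real ones are learned from large corpora and usually have tens to hundreds of dimensions); they only show how such vectors could rank candidate corrections against a context word. `EMB` and `rank_by_context` are hypothetical names.

```python
import math

# Hand-picked 3-d toy vectors, NOT learned embeddings; chosen only to illustrate
# how closeness in an embedding space is measured.
EMB = {
    "mountain": [0.9, 0.1, 0.0],
    "hike": [0.8, 0.3, 0.1],
    "bike": [0.7, 0.4, 0.1],
    "ikea": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def rank_by_context(candidates, context_word):
    """Order candidate words by similarity to a context word, most similar first."""
    return sorted(candidates, key=lambda w: cosine(EMB[w], EMB[context_word]), reverse=True)


print(rank_by_context(["ikea", "hike", "bike"], "mountain"))  # ['hike', 'bike', 'ikea']
```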
Career Opportunities
Victoria Clarke, University Recruiting Manager, Facebook London
Bootcamp approach: every new hire (new grad or experienced professional) goes through Bootcamp
- 3 weeks in UK, 3 weeks in US
- Spend time working on lots of projects, then choose the team and project that interest you
Hackathons and hack-a-months
- Take a month out to work on a different project in a different team
Internships
- 12 weeks
- Assigned an intern manager to work 1:1 with you
- Areas
-- Soft. Eng.
-- Production Eng.
-- Product Design
-- Data Science / Eng.
-- Connectivity Labs
-- UX Research
Main hubs: London, Tel Aviv and the US
Next steps:
- Apply Online
- By 4 November 2016
Application process:
- CV review
- Initial phone / Skype interviews: live coding exercise
Hack the Future:
- 9th Nov, 6pm, London
- Transport reimbursed
- Facebook CTO talking about the biggest tech breakthroughs we'll see in the next decade
