JoeMalt/facebook_tech_talk_20161025

## facebook_tech_talk_20161025
Facebook talk 2016-10-25, LT1, Computer Laboratory

Main talk: Daniel Bernhardt, Software Engineer, FB London since June 13
- danielbe@fb.com
- facebook.com/alone.on.a.hill

Scale:
- One person reading 50 posts a day -> roughly 18,000 posts a year -> easy search problem
- However, FB in total has 2 trillion posts: need to find content in all of them
- What if someone searches for "rain", looking for a post with the word "drizzle" - NLP problem!

Search results different for every user
- Privacy requirements: what is each user allowed to see?
- Ranking different: what's relevant for each user? Demographics, profile, etc.
- Doing this at scale is hard
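
To make the per-user point concrete, here is a minimal sketch of filtering by visibility before ranking. The post data, the `FRIENDS` table, the `audience` field and the "friends first" ranking rule are all assumptions made up for illustration; the talk did not describe the actual privacy or ranking logic.

```python
# Hypothetical posts; "audience" is either "public" or "friends".
POSTS = [
    {"id": 1, "author": "alice", "audience": "public", "text": "drizzle in cambridge"},
    {"id": 2, "author": "bob", "audience": "friends", "text": "drizzle again"},
    {"id": 3, "author": "carol", "audience": "friends", "text": "more drizzle"},
]
# Hypothetical friend lists, keyed by viewer.
FRIENDS = {"dave": {"bob"}, "erin": {"carol"}}

def can_see(viewer, post):
    """Privacy check: public posts, your own posts, or posts by your friends."""
    return (
        post["audience"] == "public"
        or post["author"] == viewer
        or post["author"] in FRIENDS.get(viewer, set())
    )

def search(viewer, term):
    """Return ids of matching posts the viewer may see, friends' posts first."""
    friends = FRIENDS.get(viewer, set())
    visible = [p for p in POSTS if term in p["text"].split() and can_see(viewer, p)]
    # Toy personalised ranking: friends' posts before others, then by id.
    ranked = sorted(visible, key=lambda p: (p["author"] not in friends, p["id"]))
    return [p["id"] for p in ranked]
```

The same query gives a different result list for each viewer: `search("dave", "drizzle")` and `search("erin", "drizzle")` share only the public post.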

How do we solve this?
- Unicorn search backend
-- Paper: Curtiss, Becker et al., "Unicorn: A System for Searching the Social Graph"
-- Frontend processing -> Top aggregator ('understands' query, corrects spelling, etc) -> Fan out query to three racks of machines (700+ TB data in memory) -> Index servers on rack machines
-- Indices of posts, users, events, pages (all types of entity)
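
The fan-out step can be sketched roughly as follows. This is a toy illustration under stated assumptions, not Unicorn's implementation: `SHARDS`, `search_shard` and `top_aggregator` are hypothetical names, each shard is a plain dictionary, and threads stand in for network calls to index servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory shards: each maps a term to (score, doc_id) hits.
SHARDS = [
    {"drizzle": [(0.9, "post:1"), (0.4, "post:7")]},
    {"drizzle": [(0.7, "post:12")], "rain": [(0.8, "post:3")]},
    {"rain": [(0.6, "post:20")]},
]

def search_shard(shard, term):
    """Index server: look the term up in this shard's local index."""
    return shard.get(term, [])

def top_aggregator(term, k=3):
    """Fan the query out to every shard in parallel and merge the top k hits."""
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        partials = pool.map(lambda shard: search_shard(shard, term), SHARDS)
        hits = [hit for partial in partials for hit in partial]
    # Highest score first; ties fall back to comparing doc ids.
    return heapq.nlargest(k, hits)

results = top_aggregator("drizzle")
```

Each shard only ever sees its own slice of the index, so the aggregator is the single place where results from all shards are compared and cut down to the top `k`.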

- Entity linking
-- "cambridge drizle" becomes (location: cambridge OR text: cambridge) AND (text: drizle OR text: drizzle) [simplified]
-- Uses the strong-or operator to make sure the final results meet set proportions, e.g. at least 20% of posts matching (location: cambridge) or 80% matching (text: drizzle)
-- May be many entities with the same name: does "cambridge" mean the city or the university?
-- Need to use context: students may mean university, others may mean the city
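
The query rewrite above can be sketched as building an AND of OR-groups. The lookup tables `KNOWN_LOCATIONS` and `SPELLING_VARIANTS` are hypothetical stand-ins for whatever the real system uses, and this sketch uses a plain OR: it does not implement the strong-or proportions or any disambiguation between entities with the same name.

```python
# Hypothetical lookup tables; a real system would derive these from its indices.
KNOWN_LOCATIONS = {"cambridge"}
SPELLING_VARIANTS = {"drizle": ["drizzle"]}

def rewrite(query):
    """Turn a raw query into an AND of OR-groups of (field, value) terms."""
    groups = []
    for token in query.lower().split():
        group = [("text", token)]
        if token in KNOWN_LOCATIONS:
            # The token may also name a location entity.
            group.insert(0, ("location", token))
        group += [("text", alt) for alt in SPELLING_VARIANTS.get(token, [])]
        groups.append(group)
    return groups

def render(groups):
    """Print the query tree in the notation used in these notes."""
    return " AND ".join(
        "(" + " OR ".join(f"{field}: {value}" for field, value in group) + ")"
        for group in groups
    )

def matches(post, groups):
    """A post matches if every OR-group has at least one satisfied term."""
    def term_ok(field, value):
        if field == "location":
            return post.get("location") == value
        return value in post.get("text", "").lower().split()
    return all(any(term_ok(f, v) for f, v in group) for group in groups)

query = rewrite("cambridge drizle")
```

`render(query)` reproduces the simplified expression from the notes, and `matches` shows that a post tagged with the location but containing only the correctly spelled word still satisfies it.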

- Query understanding
-- Does "pictures of cat" mean cats or people called Catherine?
-- Can train this on what people search for immediately afterwards (e.g. if they search "cat" then "catherine", chances are they meant "catherine" first time round)
-- What kind of entity are we looking for: people, pages, events? Could query all the indices but that would be a waste of resources, better to prune early
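
One way to picture the "train on what people search for next" idea is a simple count of follow-up queries. The session log below is invented, and `build_refinement_model` and `likely_intent` are hypothetical helpers; the talk did not say what model is actually trained on this signal.

```python
from collections import Counter, defaultdict

# Hypothetical session log: (query, the query issued immediately afterwards).
SESSIONS = [
    ("cat", "catherine"),
    ("cat", "catherine"),
    ("cat", "cat pictures"),
    ("cam", "cambridge"),
]

def build_refinement_model(sessions):
    """Count how often each query is followed by each refinement."""
    counts = defaultdict(Counter)
    for first, follow_up in sessions:
        counts[first][follow_up] += 1
    return counts

def likely_intent(model, query):
    """Return the most common follow-up and its estimated probability."""
    follow_ups = model.get(query)
    if not follow_ups:
        return None
    refined, n = follow_ups.most_common(1)[0]
    return refined, n / sum(follow_ups.values())

model = build_refinement_model(SESSIONS)
intent, probability = likely_intent(model, "cat")
```

With this toy log, "cat" is followed by "catherine" in two of three sessions, so the estimate favours the person interpretation.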

Spelling correction
- Language model: how common is the word globally? -> How likely is it that this word was intended?
- Error model: how likely is this mistake? Consider keyboard layout, phonetic similarity
- Personalisation: how common is the word for the searcher, their friends, recently seen posts
- Example query: "ike moutain top"
-- Ngram language models: P(tom|mountain) vs P(top|mountain)
-- Co-occurrence of words
-- Semantic embeddings
--- Embed words into n-dimensional vector space, find that "mountain", "hike", "bike" end up close together: more likely than "ikea mountain"!
--- Further reading: "Natural Language Processing (Almost) From Scratch"
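
The language model and error model can be combined in a noisy-channel style score, roughly P(intended words) × P(typed | intended). The sketch below is a toy under stated assumptions: the counts in `UNIGRAMS` and `BIGRAMS` are invented, the error model is a flat penalty per edit (no keyboard layout or phonetics), and personalisation and embeddings are left out.

```python
from itertools import product
from math import prod

# Hypothetical counts standing in for a large corpus.
UNIGRAMS = {"hike": 50, "bike": 60, "ikea": 80, "mountain": 100, "top": 90, "tom": 70}
BIGRAMS = {("hike", "mountain"): 20, ("bike", "mountain"): 15,
           ("mountain", "top"): 30, ("mountain", "tom"): 1}

def edit_distance(a, b):
    """Plain Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def candidates(word, max_dist=1):
    """Vocabulary words within max_dist edits of the typed word."""
    return [w for w in UNIGRAMS if edit_distance(word, w) <= max_dist]

def error_prob(typed, intended):
    """Toy error model: each edit costs a factor of 0.1."""
    return 0.1 ** edit_distance(typed, intended)

def unigram_prob(word):
    """P(word) with add-one smoothing over the toy vocabulary."""
    return (UNIGRAMS.get(word, 0) + 1) / (sum(UNIGRAMS.values()) + len(UNIGRAMS))

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing over the toy vocabulary."""
    return (BIGRAMS.get((prev, word), 0) + 1) / (UNIGRAMS.get(prev, 0) + len(UNIGRAMS))

def score(typed, intended):
    """Language model probability of the sequence times the error model."""
    p = unigram_prob(intended[0])
    for prev, word in zip(intended, intended[1:]):
        p *= bigram_prob(prev, word)
    return p * prod(error_prob(t, w) for t, w in zip(typed, intended))

def correct(query):
    """Pick the best-scoring combination of per-word candidates."""
    typed = query.split()
    options = [candidates(t) or [t] for t in typed]
    best = max(product(*options), key=lambda seq: score(typed, seq))
    return " ".join(best)
```

Scoring whole sequences rather than single words is what lets the context word "mountain" pull "ike" towards "hike" even though "ikea" is the more common word in the toy counts. Brute-forcing every combination only works for short queries; a longer query would need something like beam search.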


Career Opportunities
Victoria Clarke, University Recruiting Manager, Facebook London

Bootcamp approach: every new hire (new grad or experienced professional) goes through bootcamp
- 3 weeks in UK, 3 weeks in US
- Spend time working on lots of projects: can then choose team and project that interests you

Hackathons and hack-a-months
- Take a month out to work on a different project in a different team

Internships
- 12 weeks
- Assigned an intern manager to work 1:1 with you

- Areas
-- Soft. Eng.
-- Production Eng.
-- Product Design
-- Data Science / Eng.
-- Connectivity Labs
-- UX Research

Main hubs: London, Tel Aviv and the US

Next steps:
- Apply online
- facebook.com/careers
- By 4 November 2016

Application process:
- CV review
- Initial phone / Skype interviews: live coding exercise

Hack the Future:
- 9th Nov, 6pm, London
- Transport reimbursed
- hackthefuture-lon.splashthat.com
- Facebook CTO talking about the biggest tech breakthroughs we'll see in the next decade