@mjamesruggiero
Created November 19, 2013 21:39
MLConf 2013 notes (1)

large scale deep learning

Quoc V. Le - Google

parallel neural networks at Google scale

  • machine learning requires domain knowledge from human experts
  • we want to move beyond hiring domain experts; it would be good to have machines create features rather than human experts

deep learning:

  • great performance on many problems
  • works well with a lot of data
  • requires less domain knowledge

applying non-linearity (like a sigmoid) in successive iterations to build complex neural networks

The network can "learn" a lot of complex functions, independent of domain knowledge.

pixels -> edge detectors -> face detectors
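The layer-stacking idea above can be sketched in a few lines: each layer is a linear map followed by a sigmoid, applied in succession. This is a toy forward pass with made-up layer sizes and random weights, not Google's actual code:

```python
import numpy as np

def sigmoid(z):
    # elementwise logistic non-linearity
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Apply a linear map plus sigmoid at each layer in succession."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# hypothetical tiny network: 4 inputs -> 3 hidden units -> 2 outputs
rng = np.random.default_rng(0)
weights = [rng.standard_normal((3, 4)), rng.standard_normal((2, 3))]
biases = [np.zeros(3), np.zeros(2)]
out = forward(rng.standard_normal(4), weights, biases)
print(out.shape)  # (2,)
```

Each successive layer composes another non-linearity on top of the last, which is what lets the network represent complex functions like edge and face detectors.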

Google's DistBelief

  • trains deep learning on many machines (10K or more)
  • forward pass to compute the output, backward pass to compute the gradient
  • model parameters are partitioned
  • can use up to 1000 cores.
  • "1000 cores is still really small" so they partition the data and apply the functions to separate nodes and then send answers back to a "parameter server"
  • the problem with this model: the server needs to wait for all answers to compute. so they relax the constraint and allow for asynch computation
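The asynchronous parameter-server pattern can be sketched with threads: workers pull the current parameters, compute a gradient on their own data shard, and push it back as soon as it is ready, with no barrier waiting for the other workers. This is a toy single-machine sketch with a made-up quadratic objective, nothing like DistBelief's real implementation:

```python
import threading
import numpy as np

class ParameterServer:
    """Holds a shared parameter vector; workers push gradients asynchronously."""
    def __init__(self, dim, lr=0.1):
        self.params = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def pull(self):
        with self.lock:
            return self.params.copy()

    def push(self, grad):
        # applied as soon as it arrives -- no waiting for the other workers
        with self.lock:
            self.params -= self.lr * grad

def worker(ps, data_shard, steps=100):
    # each worker sees only its own shard of the data
    for _ in range(steps):
        theta = ps.pull()
        # gradient of mean squared distance to the shard mean (toy objective)
        grad = theta - data_shard.mean(axis=0)
        ps.push(grad)

rng = np.random.default_rng(1)
ps = ParameterServer(dim=3)
shards = [rng.normal(loc=1.0, size=(50, 3)) for _ in range(4)]
threads = [threading.Thread(target=worker, args=(ps, s)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(ps.params)  # settles near the overall data mean, roughly 1.0 per dim
```

Because pushes interleave, some gradients are computed against slightly stale parameters; the talk's point is that training tolerates this staleness, which is what makes the relaxed (non-blocking) scheme viable at scale.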

Uses

voice search, photo search, and text understanding

Voice search: your speech is sent to a deep neural network that

  • extracts a speech frame
  • classifies the phonemes
  • then puts the phonemes together to recognize your speech

This is done entirely with parallelized networks.
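The frame-by-frame pipeline above can be sketched as: score each speech frame against a phoneme inventory, then collapse consecutive repeats into a phoneme sequence. The phoneme set, feature size, and weight matrix here are all made up for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical phoneme inventory and an (untrained, random) weight matrix
PHONEMES = ["h", "eh", "l", "ow"]
rng = np.random.default_rng(2)
W = rng.standard_normal((len(PHONEMES), 13))  # 13 features per speech frame

def classify_frame(frame):
    """Score one speech frame against each phoneme, pick the most likely."""
    probs = softmax(W @ frame)
    return PHONEMES[int(np.argmax(probs))]

def recognize(frames):
    """Classify every frame, then collapse consecutive repeats."""
    labels = [classify_frame(f) for f in frames]
    return [p for i, p in enumerate(labels) if i == 0 or p != labels[i - 1]]

frames = rng.standard_normal((10, 13))
print(recognize(frames))
```

In the real system the classifier is a deep network rather than one linear layer, and the phoneme sequence is decoded against a language model rather than naively collapsed.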

Text understanding: useful but very difficult

  • programmatically understanding the meaning of words in context (complete with metaphors and idioms)
  • you can map each word to a 100-dimension space.
  • translation can be mapped geometrically by matching words that occupy nearby positions in the two embedding spaces
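The geometric-matching idea can be sketched as nearest-neighbor lookup between two embedding spaces. Here the vocabularies are tiny, and the Spanish vectors are fabricated as noisy copies of the English ones, standing in for two independently learned spaces that have been aligned by a linear map:

```python
import numpy as np

rng = np.random.default_rng(3)
DIM = 100  # the talk mentions ~100-dimensional word vectors

# hypothetical aligned embedding tables
en_vecs = {w: rng.standard_normal(DIM) for w in ["one", "two", "three"]}
es_words = ["uno", "dos", "tres"]
es_vecs = {es: en_vecs[en] + 0.01 * rng.standard_normal(DIM)
           for es, en in zip(es_words, ["one", "two", "three"])}

def translate(word):
    """Translate by finding the nearest vector in the target space."""
    v = en_vecs[word]
    return min(es_vecs, key=lambda w: np.linalg.norm(es_vecs[w] - v))

print(translate("two"))  # 'dos'
```

The whole trick is that once the two spaces are aligned, translation reduces to a geometric nearest-neighbor search rather than a dictionary lookup.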
