% thoughts on deep, 2016-09-01
% @chrishwiggins
(apologies this ended up being long; most of the ideas are in one graphic,
so feel free to just click on the link
( https://sketch.io/render/sk-e40f367014c9440fef81de46271b4395.jpeg )
and you'll get the main ideas in about 1-2 seconds, and can save
the text for some time when you're stuck in an elevator)
I was thinking about how deep learning as a capability relates to a company's challenges.
# AI vs ML; deep vs shallow
My current thinking is in this Venn diagram [0]:
https://sketch.io/render/sk-e40f367014c9440fef81de46271b4395.jpeg
(threw it together in sketch.io this morning)
1) AI vs ML (left-right):
the first thing to explain is the left-right divide in this cartoon.
artificial intelligence means building machines which emulate natural intelligence
(usually human intelligence). this could mean playing chess or
any of the other things people do that show "intelligence". The fact that it's in quotes and not well
defined is why there is a whole branch of philosophy about intelligence, but we all
have work to do, so let's just put it in quotes.
machine learning is "the study of algorithms whose performance improves
when presented with more data" [1]. The key thing there is *data*: you don't
need data-driven approaches to make a computer act like a human; you
could just use rules or "heuristics". That was explicitly the thinking at the
world's first AI conference in 1956 [2] and dominated the field for decades. It failed,
and that failure led to the first AI winter in the 70s-80s [3].
(Examples of AI approaches that are not ML include
systems in which many, many rules are encoded, i.e., programmed
in one rule at a time, e.g., "expert systems" from the 1980s [4]
or chat programs like ELIZA (1964) [5])
The revolution of machine learning was the realization that we could
"program with data": instead of writing the rules by hand, we write programs
that learn the rules from data.
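to make "program with data" concrete, here's a minimal sketch (python + scikit-learn;
the tiny "spam" dataset is made up) contrasting a rule someone typed in
with a rule learned from labeled examples:
```python
# toy contrast: a hand-coded rule vs. a rule learned from data
# (assumes scikit-learn is installed; the "spam" examples below are made up)
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["win free money now", "free prize inside",
         "lunch at noon?", "see you at the meeting"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

# AI without ML: a rule a human typed in
def rule_based_is_spam(text):
    return "free" in text.lower()

# ML: learn the rule from labeled examples instead
learned = make_pipeline(CountVectorizer(), LogisticRegression())
learned.fit(texts, labels)

print(rule_based_is_spam("claim your free prize"))    # True, by fiat
print(learned.predict(["claim your free prize"])[0])  # likely 1, learned from the data
```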
There are AI problems that are not ML (rule-based systems).
There are ML problems that are not AI (e.g., diagnosing cancer directly
from abundant sequence data).
There are problems that are both (most of the hot press in ML these days
concerns AI tasks like captioning images or playing video games).
2) Deep vs shallow ML (up-down)
"up" is ML the way we've done for centuries: we specify what features
we think matter, then learn from data how they matter.
"down" is the deep way in which you just feed in "raw" data, e.g., buckets of
JSON describing every event the user performed, or raw pixel data for images.
Neither is "better":
- "shallow" requires work in feature engineering, but gives you insight + interpretability.
- "deep" requires code, data, and hardware (see below), but gives you performance
  even for problems where the features are unclear.
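a minimal sketch of that up-down divide (python + scikit-learn; the two hand-picked
features and the tiny network are illustrative, not tuned): the same task, done
shallow and then deep-ish:
```python
# "shallow" vs "deep"-ish on the same task (scikit-learn's built-in 8x8 digits)
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_raw, y = digits.data, digits.target  # raw pixels: 64 values per image
X_train, X_test, y_train, y_test = train_test_split(X_raw, y, random_state=0)

# shallow: we decide which features matter, then learn how they matter
def hand_features(X):
    imgs = X.reshape(-1, 8, 8)
    return np.column_stack([
        imgs.sum(axis=(1, 2)),            # total ink
        imgs[:, :4, :].sum(axis=(1, 2)),  # ink in the top half
        imgs[:, 4:, :].sum(axis=(1, 2)),  # ink in the bottom half
    ])

shallow = LogisticRegression(max_iter=1000).fit(hand_features(X_train), y_train)
print("shallow:", shallow.score(hand_features(X_test), y_test))

# deep(ish): feed in the raw pixels and let the network find its own features
deep = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)
deep.fit(X_train, y_train)
print("deep-ish:", deep.score(X_test, y_test))
```
the shallow model is easy to interpret (you can name its three features); the
raw-pixel model usually scores much higher, but you can't easily say why.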
3) the divide
three things make deep work:
- more data (e.g., google scale)
- hardware (including GPUs)
- better algorithms (i.e., deep neural networks).
e.g., TensorFlow gives us the third of these things, but not the first two.
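a minimal sketch of just that third piece (python, Keras API on TensorFlow;
layer sizes and the random stand-in data are illustrative, not from the graphic):
```python
# the "better algorithms" piece only: a small deep network via the Keras API
# (TensorFlow supplies the model and training loop, not the data or the GPUs)
import numpy as np
import tensorflow as tf

# stand-in data so the snippet runs; in practice this is where "more data" comes in
x = np.random.rand(1000, 64).astype("float32")
y = np.random.randint(0, 10, size=1000)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x, y, epochs=3, batch_size=32, verbose=0)
```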
# references
[0] I tweeted it to see if The Internet had better ideas. We'll see
https://twitter.com/chrishwiggins/status/771333942607802368
[1] here i'm translating a famous definition into English
https://en.wikipedia.org/wiki/Machine_learning#Overview
[2] https://en.wikipedia.org/wiki/Dartmouth_Conferences
[3] https://en.wikipedia.org/wiki/AI_winter
[4] https://en.wikipedia.org/wiki/Expert_system
[5] https://en.wikipedia.org/wiki/ELIZA
# acknowledgements
thanks to @nlpsnarkbot for helpful comments