Okay, so welcome to Practical Deep Learning for Coders, lesson one. It's kind of lesson two, because there's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? So if you haven't got a GPU running yet, then go back and do that; make sure that you can access a Jupyter notebook, and then you're ready to start the real lesson one. If you're ready, you will be able to see something like this, and in particular, hopefully you have gone to the notebook tutorial. It's at the top, the one with the zero zero. As this grows you'll see more and more files, but we'll keep the notebook tutorial at the top. And you will have used your Jupyter notebook to add one and one together, getting the expected result, and hopefully you've learned the four basic keyboard shortcuts.

So the basic idea is that your Jupyter notebook has prose in it, it can have pictures, it can have charts, and, most importantly, it can have code in it. The code is in Python. How many people have used Python before? So, nearly all of you; that's great. If you haven't used Python, that's totally okay; it's a pretty easy language to pick up. But if you haven't used Python, this will feel a little bit more intimidating, because the code that you're seeing will be unfamiliar to you.
Yes, Rachel? Oh, yeah, no, because I'm trying to keep them separate. Okay. So, as I say, there are things like this that are for the people in the room in person, and this is one of those bits: it's really just for the in-person audience, not for you. I think this will be the only time like this in the lesson where we've assumed they've got this set up already. All right, so if you're in the room watching live, you can go back after this and make sure that you can get this running using the information on course-v3.fast.ai.
Okay. So, a Jupyter notebook is a really interesting device for a data scientist, because it kind of lets you run interactive experiments: it lets us give you not just a static piece of information, but something that you can actually interactively experiment with. So let me explain how we think it works well to use these notebooks and this material; this is based on the last three years of experience we've had with the students who have gone through this course. First of all, it works pretty well just to watch a lesson end to end. Don't try to follow along, because it's not really designed to go at a speed where you can follow along; it's designed to be something where you just take in the information and get a general sense of all of the pieces and how they fit together. Then you can go back and go through it more slowly, pausing the video and trying things out, making sure that you can do the things that I'm doing, and that you can try to extend them to do things in your own way. So don't worry if things are zipping along faster than you can do them; that's normal. And also, don't try to stop and understand everything the first time. If you do understand everything the first time, good for you, but most people don't, particularly as the lessons go on; they get faster and they get more difficult.
Okay, so at this point we've got our notebooks going, and we're ready to start doing deep learning. And the main thing that hopefully you're going to agree with by the end of this is that you can do deep learning regardless of who you are. And I don't just mean "do"; I mean do at a world-class practitioner level. Your main place to be looking for things is course-v3.fast.ai, where you can find out how to get a GPU and other information, and you can also access our forums. On our forums you'll find things like how to build a deep learning box yourself; that's something that you can do later on, once you've got going.
Who am I? So, why should you listen to me? Well, maybe you shouldn't, but I'll try to justify why you should. I've been doing stuff with machine learning for over 25 years. I started out in management consulting, where I was, I think, McKinsey & Company's first analytical specialist, then went into general consulting, ran a number of startups for a long time, and eventually became the president of Kaggle. But actually the thing I'm probably most proud of in my life is that I got to be the number one ranked contestant in Kaggle competitions globally, so I think that's a good demonstration that I can actually train a predictive model that predicts things, a pretty important aspect of data science. I then founded a company called Enlitic, which was the first kind of medical deep learning company. Nowadays I'm on the faculty at the University of San Francisco and also co-founder, with Rachel, of fast.ai.
I've used machine learning throughout that time, and I guess, although I am at USF, I'm not really an academic type; I'm much more interested in using this tool to do useful things. Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through creating software to make deep learning easier to use at a very high level; through education, such as the thing you are watching now; through research, which is where we spend a very large amount of our time, figuring out how you can make deep learning easier to use at a very high level, which ends up, as you'll see, in the software and the education; and by helping to build a community, mainly through the forums, so that practitioners can find each other and work together. So that's what we're doing.
This course, Practical Deep Learning for Coders, is kind of the starting point in this journey. It contains seven lessons, each one about two hours long. We're then expecting you to do about eight to ten hours of homework during the week, so it'll end up being something around 70 or 80 hours of work. I will say there's a lot of variation in how much people put into this: I know people who work full time on fast.ai; some folks who do the two parts can spend a whole year doing it really intensively; and I know some folks who watch the videos at double speed, never do any homework, and come out at the end with, you know, a general sense of what's going on. So there are lots of different ways you can do this. But if you follow along with this roughly ten-hours-a-week approach for the seven weeks, by the end you will be able to build an image classification model on pictures that you choose that will work at a world-class level; you'll be able to classify text, again using whatever datasets you're interested in; you'll be able to make predictions for commercial applications like sales; and you'll be able to build recommendation systems such as the one used by Netflix. These are not toy examples of any of these, but actually things that can come top ten in Kaggle competitions, that beat everything that's in the academic community. Very, very high-level versions of these things.
That might surprise you, given that the prerequisite here is literally one year of coding and high school math, but we have thousands of students now who have done this and shown it to be true. You will probably hear a lot of naysayers, fewer now than a couple of years ago when we started, but still a lot of naysayers, telling you that you can't do it, or that you shouldn't be doing it, or that deep learning's got all these problems. It's not perfect, but these are all things that people claim about deep learning which are either pointless or untrue. It's not a black box: as you'll see, it's really great for interpreting what's going on. It does not need much data for most practical applications. You certainly don't need a PhD; Rachel has one, so having a PhD doesn't actually stop you from doing deep learning either, but I certainly don't: I have a philosophy degree and nothing else. It can be used very widely for lots of different applications, not just for vision, which is where it's most well known. You don't need lots of hardware; that thirty-six-cents-an-hour server is more than enough to get world-class results for most problems. It's true that maybe this is not going to help you build a sentient brain, but that's not our focus. So, for all the people who say deep learning is not interesting because it's not really AI: that's not really a conversation I'm interested in. We're focused on solving interesting real-world problems.
What are you going to be able to do by the end of lesson one? Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. This is an example of something he did: he downloaded 30 images of people playing cricket and people playing baseball, ran the kind of code you'll see today, and built a nearly perfect classifier of which is which. So this is the kind of stuff you can build, with fun hobby examples like this, or, as we'll see, you can try stuff in the workplace that could be of direct commercial value. So this is the kind of thing we're going to get to by the end of lesson one.
We're going to start by looking at code, which is very different from many academic courses. For those of you with an engineering, math, or computer science background, this is very different from the approach you're used to, where you start with lots and lots of theory, and eventually you get to a postgraduate degree and you're finally at the point where you can build something useful. We're going to learn to build the useful thing today. Now, that means that at the end of today you won't know all of the theory. There will be lots of aspects of what we do where you don't know why or how it works. That's okay: you will learn why and how it works over the next seven weeks. But for now, we've found that what works really well is to actually get your hands dirty coding, not focusing on theory, because there's still a lot of artisanship in deep learning. Unfortunately, it's still a situation where people who are good practitioners have a really good feel for how to work with code and how to work with data, and you can only get that through experience. So the best way to get that feel for how to build good models is to create lots of models through lots of coding, and study them carefully, and a Jupyter notebook provides a really great way to study them. So let's try getting started.
To get started, you will open your Jupyter notebook and click on lesson 1, and it will pop open looking something like this. You can run a cell in a Jupyter notebook by clicking on it and pressing Run, but if you do so, everybody will know that you're not a real deep learning practitioner, because real deep learning practitioners know the keyboard shortcuts, and the keyboard shortcut is Shift+Enter. Given how often you have to run a cell, don't go all the way up there finding and clicking the button; just press Shift+Enter. Use the up and down arrows to move around and pick something to run, then Shift+Enter to run it. So we're going to go through this quickly, and then later on we'll go back over it more carefully. Here's the quick version, to get a sense of what's going on.
So here we are in lesson 1, and these three lines are what we start every notebook with. Things starting with a percent sign are special directives to the Jupyter notebook itself; they're not Python code. They're called "magics", which is kind of a cool name. The details of these three directives aren't very important, but basically they say: hey, if somebody changes the underlying library code while I'm running this, please reload it automatically; and if somebody asks to plot something, then please plot it here in this Jupyter notebook. So just put those three lines at the top of everything. The next two lines load up the fastai library.
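For reference, if I recall the notebook correctly, those three magic lines look like this:

```python
# reload changed library code automatically while the notebook is running
%reload_ext autoreload
%autoreload 2
# show plots inline, here in the notebook
%matplotlib inline
```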
What is the fastai library? It's a little bit confusing: fastai, with no dot, is the name of our software, and fast.ai, with the dot, is the name of our organization. So if you go to docs.fast.ai, that's the fastai library. We'll learn more about it in a moment, but for now just realize that everything we're going to do will use basically either fastai, or the thing fastai sits on top of, which is PyTorch. PyTorch is one of the most popular libraries for deep learning in the world. It's a bit newer than TensorFlow, so in a lot of ways it's more modern than TensorFlow; it's extremely fast growing and extremely popular. We use it because we used to use TensorFlow a couple of years ago, and we found we can just do a lot more, a lot more quickly, with PyTorch. Then we have this software that sits on top of PyTorch and lets you do far, far more things far more easily than you can with PyTorch alone.
So it's a good combination. We'll be talking more about it, but for now, just know that you can use fastai by doing two things: import * from fastai, and then import * from fastai.something, where "something" is the application you want. Currently, fastai supports four applications: computer vision, natural language text, tabular data, and collaborative filtering, and we're going to see lots of examples of all of those during the seven weeks.
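For the computer vision work we're about to do, those two import lines look like this:

```python
from fastai import *          # the core fastai library
from fastai.vision import *   # the application: computer vision
```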
So we're going to be doing some computer vision. At this point, if you are a Python software engineer, you are probably feeling sick, because you saw me do import *, which is something you've all been told to never, ever do. And there are very good reasons not to use import * in standard production code with most libraries. But for those of you who have used something like MATLAB, it's kind of the opposite: everything's there for you all the time; a lot of the time you don't even have to import things. It's kind of funny: we've got these two extremes of how to write code. You've got the scientific programming community that has one way, and the software engineering community that has the other. Both have really good reasons for doing things, and with the fastai library we actually support both approaches. In a Jupyter notebook, where you want to quickly and interactively try stuff out, you don't want to be constantly going back up to the top, importing more stuff, and trying to figure out where things are; you want to be able to use lots of tab completion and be very experimental. So import * is great. Then, when you're building stuff in production, you can do the normal PEP 8 style, proper software engineering practices. So don't worry when you see me doing stuff which at your workplace would be frowned upon: this is a different style of coding. It's not that there are no rules in data science programming; it's that the rules are different. When you're training models, the most important thing is to be able to interactively experiment quickly. So you'll see we use a lot of processes and styles that are very different from what you're used to, but they're there for a reason, and you'll learn about them over time. You can choose to use a similar approach or not; it's entirely up to you. The other thing to mention is that the fastai library is designed in a very interesting, modular way, and you'll find over time that when you do use import *, there's far less clobbering of things than you might expect. It's all explicitly designed to allow you to pull in things and use them quickly, without having problems.
Okay, so we're going to look at some data. There are two main places we'll be getting data from for the course. One is academic datasets. Academic datasets are really important and really interesting: they're things where academics spend a lot of time curating and gathering a dataset so that they can show how well different kinds of approaches work with that data. The idea is that they try to design datasets that are challenging in some way and require some kind of breakthrough to do well. We're going to start with an academic dataset called the Pet dataset. The other kind of dataset we'll be using during the course is datasets from the Kaggle competitions platform. Both academic datasets and Kaggle datasets are interesting for us, particularly because they provide strong baselines; that is to say, you want to know if you're doing a good job. With Kaggle datasets that have come from a competition, you can actually submit your results to Kaggle and see how well you would have done in that competition, and if you can get in about the top 10%, I'd say you're doing pretty well. For academic datasets, academics write down in papers what the state of the art is, so: how well were they able to do with their models on that dataset? So this is what we're going to do: we're going to try to create models that get right up towards the top of Kaggle competitions, preferably actually in the top ten, not just the top 10%, or that meet or exceed academic state-of-the-art published results. When you use an academic dataset, it's important to cite it; you'll see here there's a link to the paper that it's from. You definitely don't need to read that paper right now, but if you're interested in learning more about it, and why and how it was created, all the details are there.
In this case, this is a pretty difficult challenge. The Pet dataset is going to ask us to distinguish between 37 different categories of dog breed and cat breed, and that's really hard. In fact, every course before this one used a different dataset, one where you just have to decide: is this a dog, or is it a cat? So you've got a 50-50 chance right away, and dogs and cats look really different, whereas lots of dog breeds and cat breeds look pretty much the same. So why have we changed the dataset? We've got to the point now where deep learning is so fast and so easy that the dogs-versus-cats problem, which a few years ago was considered extremely difficult (80% accuracy was state of the art), is now too easy. Our models were basically getting everything right all the time, without any tuning, so there weren't really a lot of opportunities for me to show you how to do more sophisticated stuff. We've picked a harder problem this year. This kind of thing, where you have to distinguish between similar categories, is called, in the academic context, fine-grained classification. So we're going to do a fine-grained classification task: figuring out the particular kind of pet.
The first thing we have to do is download and extract the data we want. We're going to use this function called untar_data, which will download it automatically and untar it automatically. AWS has been kind enough to give us lots of space and bandwidth for these datasets, so they download super quickly for you. The first question, then, would be: how do I know what untar_data does? You could just type help, and you will find out what module it came from (since we did import *, we don't necessarily know that), what it does, and, something you might not have seen before even if you're an experienced programmer, what exactly you pass to it. You're probably used to seeing the names: url, file name, destination. But you might not be used to seeing these other bits: these bits are types. If you've used a typed programming language, you'll be used to seeing them, but Python programmers are less used to them. If you think about it, though, you don't actually know how to use a function unless you know what type each thing is that you're providing it, so we make sure we give you that type information directly here in the help. In this case, the url is a string; the file name is either (Union means "either") a Path or a string, and defaults to nothing; and the destination is either a Path or a string and defaults to nothing too. We'll learn more shortly about how to get more documentation about the details, but for now we can see we don't have to pass in a file name or a destination: it'll figure them out for us from the URL. And for all the datasets we'll be using in the course, we already have constants defined in this URLs module (a class, actually), so you can see where it's going to grab the data from. It's going to download that to some convenient path, untar it for us, and then return the value of the path.
Then, in a Jupyter notebook, it's kind of handy: you can just write a variable on its own, and a semicolon is just a statement marker in Python, so writing the assignment and then the bare variable name on one line is the same as writing them on separate lines, and the notebook prints the value. You could also say print, but again, we're trying to do everything fast and interactively, so just write the name, and here is the path where it's put the data. Next time you run this, since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. Everything's designed to be pretty automatic, pretty easy.
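Putting that together, the cell looks roughly like this; the signature in the comment paraphrases the help output described above, and URLs.PETS is the predefined constant for this dataset:

```python
help(untar_data)
# roughly: untar_data(url: str, fname: Union[Path, str] = None,
#                     dest: Union[Path, str] = None) -> Path
# only the URL is required; fname and dest are figured out for you

path = untar_data(URLs.PETS)   # downloads and untars on the first run only
path                           # a bare name at the end of a cell displays its value
```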
There are some things in Python that are less convenient for interactive use than they should be. For example, when you do have a path object, seeing what's in it actually takes a lot more typing than I would like. So sometimes we add functionality to existing Python stuff; one of the things we do is add an ls method to paths. So if you call ls on the path, here is what's inside it: that's what we just downloaded. When you try this yourself, you'll wait a couple of minutes for it to download and unzip, and then you can see what's in there. If you're an experienced Python programmer, you may not be familiar with this approach of using a slash like this. This is a really convenient feature that's part of Python 3; it's functionality from something called pathlib. These are path objects, and path objects are much better to use than strings: they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac, it's always going to work exactly the same way. So here's a path to the images in that dataset.
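In code, that looks something like this (the two folder names are the ones the Pet dataset ships with):

```python
path.ls()   # the ls method fastai adds to Path objects
# shows what we just downloaded: the 'annotations' and 'images' folders

path_anno = path/'annotations'   # pathlib's / operator builds sub-paths
path_img  = path/'images'        # identical on Windows, Linux, and Mac
```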
All right, so if you're starting with a brand-new dataset and trying to do some deep learning on it, what do you do? Well, the first thing you'd want to do is probably see what's in there. We found that these are the directories in there; so what's in images? There are a lot of functions in fastai for you; there's one called get_image_files that will just grab an array of all of the image files, based on extension, in a path. And here you can see we've got lots of different files. This is a pretty common way for computer vision datasets to get passed around: just one folder with a whole bunch of files in it. So the interesting bit, then, is: how do we get the labels? In machine learning, the labels refer to the thing we're trying to predict, and if we just eyeball this, we can immediately see that the labels are actually part of the file name. You see that, right? It's kind of like path/label_number.extension. So we need to somehow get a list of those label bits of each file name, and that will give us our labels. Because that's all you need to build a deep learning model: you need pictures, files containing the images, and you need labels.
In fastai, this is made really easy. There's an object called ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. There are basically some factory methods which try to make it really easy for you to create that data bunch. We'll talk more about this shortly, but it holds a training set and a validation set, with images and labels, for you. In this case, we can see we need to extract the labels from the names, so we're going to use from_name_re. For those of you who use Python, you know re is the module in Python that does regular expressions, things that are really useful for extracting text. I just went ahead and created the regular expression that will extract the label from this text. For those of you not familiar with regular expressions, it would be really useful to spend some time figuring out how and why that particular regular expression is going to extract the label from this text.
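Putting those two steps together, here's a sketch (the example file name is illustrative; the pattern is the one from the lesson notebook, as best I remember it):

```python
fnames = get_image_files(path_img)   # all the image files in the folder
fnames[:3]                           # peek at a few

# each name looks like images/great_pyrenees_173.jpg, so the label is
# everything between the last '/' and the trailing _number.jpg:
pat = r'/([^/]+)_\d+.jpg$'
```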
So, with this factory method, we can basically say: okay, I've got this path containing images; this is a list of file names (remember, I got them back there); and this is the regular expression pattern that is going to be used to extract the label from the file name. We'll talk about transforms later, and then you also need to say what size images you want to work with. That might seem weird: why do I need to say what size images I want to work with? The images have a size; we can see what size the images are. Honestly, this is a shortcoming of current deep learning technology, which is that a GPU has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast, and if the images are different shapes and sizes, it can't do that. So we actually have to make all of the images the same shape and size. In part one of the course, we're always going to be making images square; in part two, we'll learn how to use rectangles as well. It turns out to be surprisingly nuanced, but pretty much everybody, in pretty much all computer vision modeling, uses this approach of squares, and 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use. So if you just use size=224, you're probably going to get pretty good results most of the time, and this is one of the little bits of artisanship I want to teach you folks: what generally just works. If you just use size=224, that'll generally just work for most things, most of the time. So this factory method returns a DataBunch object.
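Assembled, the call looks roughly like this (this matches the lesson notebook as far as I remember it; ds_tfms and the normalize call are both explained just below):

```python
data = ImageDataBunch.from_name_re(
    path_img, fnames, pat,      # where the images live, the file list, the label regex
    ds_tfms=get_transforms(),   # the transforms, discussed later
    size=224)                   # make everything 224 by 224
data.normalize(imagenet_stats)  # the normalization step discussed below
```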
In fastai, everything you model with is going to be a DataBunch object. We're going to learn all about them: what's in them, how we look at them, and so forth. Basically, a DataBunch object contains two or three datasets. It contains your training data; it'll contain your validation data (we'll learn about this shortly); and, optionally, it contains your test data. And for each of those, it contains your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth, and that all sits there in this one place. Something we'll learn more about in a little bit is normalization, but generally, in nearly all machine learning tasks, you have to make all of your data about the same "size"; specifically, about the same mean and about the same standard deviation. So there's a normalize function that we can use to normalize our data bunch in that way.

Rachel, could you come over and ask the question? Thanks. "What does the function do when the image size is not 224?" Great, so this is something we're going to learn about shortly: basically, this thing called transforms is used to do a number of things, and one of the things it does is to resize things to size 224.
Let's take a look at a few pictures. Here are a few pictures of things from my data bunch. You can see that data.show_batch can be used to show me some of the contents of my data bunch; this is going to be a three-by-three grid.
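The call, with arguments roughly as in the notebook:

```python
data.show_batch(rows=3, figsize=(7, 6))   # a 3x3 grid of images with their labels
```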
You can see that roughly what's happened is that they all seem to have been zoomed and cropped in a reasonably nice way. Basically, what it'll do by default is something called center cropping, which means it'll grab the middle bit, and it will also resize it. We'll talk more about the details of this, because it turns out to actually be quite important, but basically a combination of cropping and resizing is used. Something else we'll learn about is that we also use this to do something called data augmentation, so there's actually some randomization in how much it crops, and where, and things like that. But that's the basic idea: some cropping and some resizing. Often we also do some padding, so there are all kinds of different approaches, and it depends on the data augmentation, which we're going to learn about shortly.
And what does it mean to normalize the images? Normalizing the images is something we're going to learn more about later in the course, but in short, it means that the pixel values (we'll learn more about pixel values too) start out from nought to 255, and some pixel values, I should say some channels, because there's red, green, and blue, might tend to be really bright, and some might tend to be not bright at all; and some might vary a lot, and some might not vary very much at all. It really helps train a deep learning model if each one of those red, green, and blue channels has a mean of zero and a standard deviation of one. We'll learn more about that; if you haven't studied or don't remember means and standard deviations, we'll get back to some of that later, but that's the basic idea of what normalization does. If your data is not normalized (and again, we'll learn much more about the details), it can be quite difficult for your model to train well. So if you do have trouble training a model, one thing to check is that you've normalized it.
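Here's a minimal sketch of what per-channel normalization means, using a fake image; in the notebook itself you just call data.normalize(imagenet_stats) and fastai handles it for you:

```python
import torch

x = torch.rand(3, 224, 224) * 255            # fake image: 3 channels of 0-255 values
mean = x.view(3, -1).mean(1)[:, None, None]  # the mean of each channel
std  = x.view(3, -1).std(1)[:, None, None]   # the standard deviation of each channel
x_norm = (x - mean) / std                    # each channel now has mean 0, std 1
```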
"Wouldn't the GPU be more efficient with a size like 256, which is a power of two?" We're going to get into that shortly, but the brief answer is that the models are designed so that the final layer is of size seven by seven, so we actually want something where, if you multiply seven by two a bunch of times, you end up with a good size (224 is 7 times 2 to the 5th). All of these details we are going to get to, but the key thing is that I wanted to get you training a model as quickly as possible. Now, one of the most important things about being a really good practitioner is being able to look at your data, so it's really important to remember to run show_batch and take a look. It's surprising how often, when you actually look at the dataset you've been given, you realize it's got weird black borders on it, or some of the things have text covering up part of them, or some of them are rotated in odd ways. So make sure you take a look.
And then the other thing we want to do is not just look at the pictures, but also look at the labels. All of the possible label names are called your classes. With the data bunch, you can print out data.classes, and here they are: those are all the possible labels that we found by using that regular expression on the file names. We learned earlier on, in the prose I wrote at the top, that there are 37 possible categories, and, just checking the length of data.classes, it is indeed 37. A data bunch will always have a property called c. The technical details of that property we'll get to later, but for now you can think of it as being the number of classes. For things like regression problems and multi-label classification and so on, that's not exactly accurate, but it will do for now. It's important to know that data.c is a really important piece of information: for classification problems, at least, it is the number of classes.
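In the notebook, that's just:

```python
print(data.classes)          # the 37 labels the regex pulled out of the file names
len(data.classes), data.c    # both 37 here: for classification, c is the number of classes
```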
Believe it or not, we're now ready to train a model. A model is trained in fastai using something called a learner. Just as a DataBunch is a general fastai concept for your data, with subclasses for particular applications like ImageDataBunch, a Learner is a general concept for things that can learn to fit a model, and from that there are various subclasses to make things easier; in particular, there's one that will create a convolutional neural network for you. We'll be learning a lot about that over the next few lessons, but for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: what's your data? Not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your model, or what's your architecture? As we'll learn, there are lots of different ways of constructing a convolutional neural network, but for now the most important thing for you to know is that there's a particular kind of model called a ResNet which works extremely well nearly all the time. So, for a while at least, you really only need to choose between two things, which is: what size ResNet do you want? That's basically: how big is it? We'll learn all about the details of what that means, but there's one called ResNet-34 and one called ResNet-50, and when we're getting started with something we'll pick the small one, because it'll train faster. That's as much as you need to know to be a pretty good practitioner about architectures for now: there are two variants of one architecture that work pretty well, ResNet-34 and ResNet-50; start with the smaller one and see if it's good enough. So that is all the information we need to create a convolutional neural network learner. There's one other thing I'm going to give it, though, which is a list of metrics. Metrics are literally just things that get printed out as it's training, so I'm saying: I would like you to print out the error rate, please.
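In the version of the library this lesson uses, that's one line, roughly like this (later fastai releases spell the same function cnn_learner):

```python
learn = create_cnn(data, models.resnet34, metrics=error_rate)
# the data bunch, the architecture, and what to print during training
```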
Now, you can see that the first time I ran this on a newly installed box, it downloaded something. What's it downloading? It's downloading the ResNet-34 pre-trained weights. What this means is that this particular model has already been trained for a particular task, and that particular task is that it was trained by looking at about one and a half million pictures of all kinds of different things, a thousand different categories of things, using an image dataset called ImageNet. So we can download those pre-trained weights, so that we don't start with a model that knows nothing about anything; we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet; there were certainly some kinds of dog and certainly some kinds of cat, though. So this pre-trained model already knows a bit about what pets look like, and it certainly knows quite a lot about what animals look like and what photos look like. So the idea is that we don't start with a model that knows nothing at all; we start by downloading a model that already knows something about recognizing images. It downloads that pre-trained model for us automatically the first time we use it, and from then on it won't need to download it again; it'll just use the one we've got.
This is really important. We're going to learn a lot about this; it's kind of the focus of the whole course. It's called transfer learning: how to take a model that already knows how to do something pretty well and make it so that it can do your thing really well. You take a pre-trained model, and then you fit it so that, instead of predicting the thousand categories of ImageNet with ImageNet data, it predicts the 37 categories of pets using your pet data. It turns out that by doing this you can train models in one-hundredth or less of the time of regular model training, with one-hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. Remember, I showed you the slide of Nikhil's lesson-one project from last year: he used 30 images, and there aren't cricket and baseball images in ImageNet, but it just turns out that ImageNet models are already so good at recognizing things in the world that just 30 examples of people playing baseball and cricket were enough to build a nearly perfect classifier.
Now, you would naturally, potentially, be saying: well, wait a minute, how do you know that it can actually recognize pictures of people playing cricket versus baseball in general? Maybe it just learned to recognize those 30. Maybe it's just cheating. That's called overfitting, and we'll be talking a lot about it during this course. Overfitting is where you don't learn to recognize pictures of, say, cricket versus baseball, but just these particular cricketers in these particular photos and these particular baseball players in these particular photos. We have to make sure that we don't overfit, and the way we do that is by using something called a validation set. A validation set is a set of images that your model does not get to look at, and these metrics (in this case, the error rate) get printed out automatically using the validation set: a set of images our model never got to see. When we created our data bunch, it automatically created a validation set for us. We'll learn lots of ways of creating and using validation sets, but because we're trying to bake in all of the best practices, we actually make it nearly impossible for you not to use a validation set, because if you're not using a validation set, you don't know if you're overfitting. So we always print out the metrics on a validation set; we always hold it out; we always make sure that the model doesn't touch it. That's all done for you, and it's all built into this data bunch object.
So now that we have a learner, we can fit it. You can just use a method called fit, but in practice you should nearly always use a method called fit_one_cycle. We'll learn more about this during the course, but in short, one-cycle learning comes from a paper that was released, I'm trying to think, a few months ago, less than a year ago, and it turned out to be dramatically better, both more accurate and faster, than any previous approach. Again, I don't want to teach you how to do 2017 deep learning, right? In 2018, the best way to fit models is to use something called one cycle. We'll learn all about it, but for now, just know you should probably call fit_one_cycle. If you forget how to type it, you can start typing a few letters and hit Tab, and you'll get a list of potential options. And then, if you forget what to pass it, you can press Shift+Tab, and it will show you exactly what to pass it, so you don't actually have to type help. Again, it's kind of nice that we have all the types here, because we can see that the cycle length (we'll learn more about what that is shortly) is an integer, and that the max learning rate could be a float, or a collection, or whatever, and so forth, and you can see which arguments default to which values.
For now, just know that this number, 4, basically decides how many times we go through the entire dataset, how many times we show the dataset to the model so that it can learn from it. Each time it sees a picture, it's going to get a little bit better, but it's going to take time, and it means it could overfit: if it sees the same picture too many times, it'll just learn to recognize that picture, not pets in general. We'll learn all about how to tune this number during the next couple of lessons, but starting out with 4 is a pretty good start, just to see how it goes, and you can actually see that after four epochs, or four cycles, we got an error rate of 6%.
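So the whole training step is one line:

```python
learn.fit_one_cycle(4)   # 4 epochs: show the model the full dataset four times
```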
So a natural question is: how long did that take? It took a minute and 56 seconds. We're paying, you know, sixty-odd cents an hour (we actually pay for the whole time the machine is on and running, but that was two minutes of compute time), and we've got an error rate of 6%. So 94% of the time we correctly picked the exact right one of those 37 dog and cat breeds, which feels pretty good to me.
But to get a sense of how good it is, maybe we should go back and look at the paper. Remember, I said the nice thing about using academic papers or Kaggle datasets is that we can compare our solution to whatever the best people on Kaggle did, or whatever the academics did. This particular dataset of pet breeds is from 2012, and if I scroll through the paper, you'll generally find in any academic paper there'll be a section called "Experiments" about two-thirds of the way through, and if you find the section on experiments, then you can find the section on accuracy. They've got lots of different models, and their models, as you can read in the paper, are really quite pet-specific: they learn something about how pet heads look, how pet bodies look, and how pet images in general look, and they combine them all together, and once they used all of this complex code and math they got an accuracy of 59%. So, in 2012, this highly pet-specific analysis got an accuracy of 59%, and these were the top researchers from Oxford University. Today, in 2018, with basically, if you go back and look at how much code we just wrote, about three lines of code (the other stuff is just printing things out to see what we're doing), we got 94%, so 6% error. That gives you a sense of how far we've come with deep learning, and particularly with PyTorch and fastai, and how easy things are.
So, before we take a break, I just want to check to see if we've got any questions. Just remember, if you're in the audience and you see a question that you want asked, please click the little heart next to it so that Rachel knows you want to hear about it. And if there's something with six likes and Rachel didn't notice it, which is quite possible, just quote it in a reply and say: hey Rachel, this one's got six likes. Okay, so we're going to take an eight-minute break, and we'll come back at five past eight.
So, where we got to was: we just trained a model. We don't exactly know what that involved or how it happened, but we do know that, with three or four lines of code, we built something which smashed the accuracy of the state of the art of 2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds, but we don't really know why it works; that's okay. In terms of getting the most out of this course, we very, very regularly hear, after the course is finished, the same basic feedback. This is literally copied and pasted from the forum: "I fell into the habit of watching the lectures too much and googling too much about concepts, without running the code. At first I thought I should just read it and then research the theory." We keep hearing people say: my number one regret is that I spent 70 hours doing that, and at the very end I started running the code, and it turned out I learned a lot more.
So please run the code. Really, run the code. "I should have spent the majority of my time on the actual code in the notebooks: running it, seeing what goes in, and seeing what comes out." Your most important skills to practice are learning (and we're going to show you how to do this in a lot more detail) to understand what goes in and what comes out. We've already seen an example of looking at what goes in, which is data.show_batch, and that's going to show you examples of labels and images. Next we'll see how to look at what came out, and that's the most important thing to study.
As I said, the reason we've been able to do this so quickly is heavily because of the fastai library. The fastai library is pretty new, but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud providers either support it or are about to support it; a lot of researchers are starting to use it; it's making a lot of things a lot easier, but it's also making new things possible. So really understanding the fastai software is something that's going to take you a long way, and the best way to really understand the fastai software well is by using the fastai documentation, which we'll be learning more about shortly.
So how does it compare? There's really only one other major piece of software like fastai, that is, something that tries to make deep learning easy to use, and that's Keras. Keras is a really terrific piece of software; we actually used it for the previous courses, until we switched to fastai. It runs on top of TensorFlow, and it was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. If you look, for example, at last year's course exercise, which was dogs versus cats, fastai lets you get much more accurate (less than half the error) on a validation set; training time is less than half the time; and the lines of code are about a sixth of the lines of code. The lines of code are more important than you might realize, because those 31 lines of Keras code involve you making a lot of decisions, setting lots of parameters, and doing lots of configuration. That's all stuff where you have to know how to set those things to get best-practice results, whereas with these five lines of code, any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you.
So hopefully you'll find this is a really useful library, not just for learning deep learning, but for taking it a very long way. How far can you take it? Well, as you'll see, all of the research that we do at fast.ai uses the library. An example of the research we did, which was recently featured in Wired, describes a new breakthrough in natural language processing, which people are calling the ImageNet moment: we broke a new state-of-the-art result in text classification, which OpenAI then built on top of, with more compute, more data, and different tasks, to take it even further. This is an example of something that we've done in the last six months, in conjunction with my colleague Sebastian Ruder; an example of something that's been built in the fastai library. And you're going to learn how to use this brand-new model in three lessons' time, and you're actually going to get this exact result from this exact paper yourself.
Another example: one of our alumni, Hamel Husain, whom you'll come across plenty on the forum because he's a great guy and very active, built a new system for natural language semantic code search. You can find it on GitHub, where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, it was built with the fastai library, using the techniques you'll be learning in the next seven weeks. Is it in production? Well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess.
The best place to learn about these things and get involved in them is on the forums, where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about research papers, applications, and so on and so forth. So even though today we're focusing on a small number of lines of code that do one particular thing, which is image classification, and we're not learning much math or theory, over these seven weeks, and then in part two, another seven weeks, we're going to go deeper and deeper. So where can that take you?
I want to give you some examples. There's Sarah Hooker. She did our first course a couple of years ago. Her background was economics; she didn't have a background in coding, math, or computer science; I think she started learning to code two years before she took our course. She started a nonprofit called Delta Analytics, and they helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforests and used them to listen for chainsaw noises, and then they used deep learning to figure out when a chainsaw was being used, and they had a system set up to alert rangers to go out and stop illegal deforestation in the rainforests. That was something she was doing while she was in the course, as part of her class projects. What's she doing now? She is now a Google Brain researcher, which is, I guess, one of the top places, if not the top place, to do deep learning. She's been publishing papers, and now she is going to Africa to set up Google Brain's first deep learning research center in Africa. Now, I'll say she worked her ass off; she really, really invested in this course, not just doing all of the assignments but also going out and reading the Goodfellow book and doing lots of other things. But it really shows how somebody with no computer science or math background at all can now be one of the world's top deep learning researchers and doing very valuable work.
Another example, from our most recent course: Christine Payne. She is now at OpenAI, and you can find her post and actually listen to music samples of it: she built something to automatically create chamber music compositions, which you can play and listen to online. So, is her background math and computer science? Actually, that's her there: a classical pianist. Now, I will say she is not your average classical pianist. She's a classical pianist who also has a master's in medical research from Stanford, studied neuroscience, was a high-performance computing expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, you know, a very annoying person who does everything she does well. But I think it's really cool to see how a domain expert, in this case in the domain of playing piano, can go through the fast.ai course and come out the other end.
I guess, of the top research institutes, Google Brain and OpenAI would be two of them, probably along with DeepMind. Interestingly, one of our other students, an alumnus of the course, recently interviewed her for a blog post series he's doing on top AI researchers, and she said one of the most important pieces of advice she got was from me, and the piece of advice was: pick one project, do it really well, make it fantastic. So that was the piece of advice she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course.
Having said that, I don't really want you to go to OpenAI or Google Brain. What I really want you to do is go back to your workplace or your passion project and apply these skills there. Let me give you an example. MIT released a deep learning course, and they highlighted, in their announcement for this deep learning course, this medical imaging example. And one of our students, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not what this would look like on a chest film; this is what it should look like; and, as a deep learning practitioner, this is how I know that this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning to assess MIT's model, from just two images, very accurately. And this is actually what I want most of you to be doing: take your domain expertise and combine it with the practical deep learning aspects that you'll learn in this course, and bring them together, like Alex is doing here. A lot of radiologists have actually gone through this course now and have built journal clubs and American College of Radiology practice groups; there's a data science institute at the ACR now, and so forth; and Alex is one of the people providing a lot of leadership in this area. I would love you to do the same kind of thing Alex is doing, which is to really bring deep-learning-driven leadership into your industry, or into your social impact project, whatever it is that you're trying to do.
Another great example was Melissa Fabros, an English literature PhD who had studied things like gendered language in English literature. Her previous job had taught her to code, I think, and then she came to the fast.ai course, and she helped Kiva, a microlending social impact organization, to build a system that can recognize faces. Why is that necessary? Well, we're going to be talking a lot about this, but because most AI researchers are white men, most computer vision software can only effectively recognize white male faces. In fact, I think the IBM system was something like 99.8% accurate on white men's faces, versus 60%, 65% accurate on dark-skinned women's faces. So it's something like 30 or 40 times worse for black women than for white men, and this is really important, because for Kiva, black women are perhaps the most common user base for their microlending platform. So Melissa, after taking our course, and again working her ass off and being super intensive in her study and her work, won this one-million-dollar AI challenge for her work for Kiva.
Karthik did our course and realized that the thing he wanted to do wasn't at his company; it was something else, which was to help blind people understand the world around them. So he started a new startup. You can find it now; it's called Envision. You can download the app and point your phone at things, and it will tell you what it sees. I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me that this is a super useful thing for visually impaired users. And that's the level you can get to, with the content you're going to get over these seven weeks, and with this software, which can get you right to the cutting edge in areas you might find surprising.
For example, I helped a team of some of our students and some collaborators actually break the world record for training ImageNet. Remember, I mentioned the ImageNet dataset; lots of people want to train on the ImageNet dataset. We smashed the world record for how quickly you can train it, using standard AWS cloud infrastructure, at a cost of about $40 of compute to train the model, using, again, the fastai library and the techniques we learn in this course. So it can really take you a long way. So don't be put off by what might seem pretty simple at first; we're going to get deeper and deeper.
You can also use it for other kinds of passion projects. Helena Sarin: you should definitely check out her Twitter account. This art is basically a new style of art that she's developed, which combines her painting and drawing with generative adversarial models to create these extraordinary results, and I think it's super cool. She's not a professional artist; she is a professional software developer, but she just keeps on producing these beautiful results. When she started, her art had not really been shown or discussed anywhere; now there have recently been some quite high-profile articles describing how she is creating a new form of art, and again, this came out of the fast.ai course, where she developed these skills. Equally important, Brad Kenstler figured out how to make a picture of Kanye out of pictures of Patrick Stewart's head, also something you will learn to do, if you wish to. This particular type of what's called style transfer was a really interesting tweak that allowed him to do some things that hadn't quite been done before, and this particular picture helped him get a job as a deep learning specialist at AWS.
Another interesting example: another alumnus worked at Splunk as a software engineer, and he designed an algorithm after, like, lesson three, which basically turned out, at Splunk, to be fantastically good at identifying fraud; we'll talk more about that shortly. And if you've seen Silicon Valley, the HBO series, the "hot dog, not hot dog" app: that's actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project. So there's a lot of cool stuff that you can do. And yes, it was Emmy nominated; I think we only have one Emmy-nominated fast.ai alumnus at this stage, so please help change that.
All right, the other thing to know is that forum threads can kind of turn into these really cool things. Francisco, who's actually here in the audience, was a really boring McKinsey consultant, like me; Francisco and I both have this shameful past, that we were McKinsey consultants, but we left and we're okay now. He started this thread, saying: this stuff we've just been learning about, building NLP in different languages, let's try to do lots of different languages. We started this thing called the language model zoo, and out of that, there's now been an academic competition won in Polish, which led to an academic paper; Thai state of the art; German state of the art; basically, students have been coming up with new state-of-the-art results across lots of different languages, and this is all being done entirely by students working together through the forum.
So please get on the forum, but don't be intimidated. Remember, everybody you see on the forum, or at least the vast majority of the people posting, post all the damn time, right? They've been doing this a lot, and they do it a lot of the time, so at first it can feel intimidating, because it can feel like you're the only new person there. But you're not. All of you people in the audience, everybody who's watching, everybody who's listening: you're all new people. So when you just get out there and say, okay, all you people getting these state-of-the-art results in German language modeling, I can't start my server, I tried to click on the notebook and I got an error, what do I do? People will help you. Just make sure you provide all the information: I'm using Paperspace, this was the particular instance I tried to use, here's a screenshot of my error. People will help you. Or, if you've got something to add: if people were talking about crop yield analysis, and you're a farmer, and you think, oh, I've got something to add, then please mention it, even if you're not sure it's exactly relevant. It's fine. Just get involved, because remember, everybody else on the forum started out also intimidated. We all start out not knowing things. So just get out there and try it.
So let's get back and do some more coding. Yes, Rachel? We have a question about why you're using ResNet as opposed to Inception. So the question is about this architecture. There are lots of architectures to choose from, and it would be fair to say there isn't one best one, but if you look at things like the Stanford DAWNBench benchmark of ImageNet classification, you'll see that first place, second place, third place and fourth place (fast.ai with Jeremy Howard, a team of our students and collaborators, the Department of Defense innovation team, Google) all used ResNet: ResNet, ResNet, ResNet, ResNet. It's good enough, okay?
There are other architectures. The main reason you might want a different architecture is if you want to do edge computing, that is, if you want to create a model that's going to sit on somebody's mobile phone. Having said that, even then, most of the time I reckon the best way to get a model onto somebody's mobile phone is to run it on your server and then have your mobile phone app talk to it; it really makes life a lot easier, and you get a lot more flexibility. But if you really do need to run something on a low-powered device, then there are some special architectures for that. The particular question was about Inception, which is another architecture; it tends to be pretty memory intensive, and it's also not terribly resilient. One of the things we try to show you is stuff that just tends to always work, even if you don't quite tune everything perfectly, and ResNet tends to work pretty well across a wide range of different details and choices that you might make. So I think it's a pretty good default.
So we've got this trained model, and what's actually happened, as we'll learn, is that it's basically created a set of weights. If you've ever done anything like a linear regression or logistic regression, you'll be familiar with coefficients: we basically found some coefficients and parameters that work pretty well, and it took us a minute and 56 seconds. So if we want to do some more playing around and come back later, we probably should save those weights; that way we save that minute and 56 seconds. You can just call learn.save and give it a name. It's going to put the weights in a models subdirectory in the same place the data came from, so if you save different models, or different data bunches from different datasets, they'll all be kept separate; don't worry about it.
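For reference, that call looks like this in fastai v1 (the library version this course uses); 'stage-1' is the name the lesson refers back to later:

```python
learn.save('stage-1')  # writes models/stage-1.pth alongside the data
```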
All right. We've talked about how the two most important things are what goes into your model and what comes out. We've seen one way of seeing what goes in; now let's see what comes out, because this is the other thing you need to get really good at. To see what comes out, we can use this class called ClassificationInterpretation, and we're going to use this factory method from_learner. We pass in a learn object. Remember, a learn object knows two things: what's your data, and what is your model. It's now not just an architecture; it's actually a trained model, and that's all the information we need to interpret it. So we pass in the learner, and we now have a ClassificationInterpretation object.
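As it appears in the lesson notebook, assuming learn is the learner we just trained:

```python
interp = ClassificationInterpretation.from_learner(learn)
```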
One of the things we can do, and perhaps the most useful thing, is called plot_top_losses. We're going to be learning a lot about this idea of loss functions shortly, but in short, a loss function is something that tells you how good your prediction was. Specifically, if you predicted one class with great confidence, you said, I am very, very sure that this is a Birman, but actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer. So that's what it basically means to have a high loss. By plotting the top losses, we are going to find out the things we were most wrong about, or most confident about yet got wrong.
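The call from the notebook; the 9 and the figsize are just display choices:

```python
interp.plot_top_losses(9, figsize=(15, 11))
```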
You can see here it prints out things on top of each image: a predicted breed name like german_shorthaired, and numbers like 7.04 and 0.92. But what do they mean? Perhaps we should look at the documentation.
We've already seen help, and help just prints out a quick little summary. But if you want to really see how to do something, use doc. doc tells you the same information as help, but it has this very important thing, which is 'Show in docs'. When you click on 'Show in docs', it pops up the documentation for that method or class or function or whatever. It starts out by showing the same information about what parameters it takes, along with the docstring, but then it tells you more. In this case, it tells me that the title of each image shows the prediction, the actual, the loss, and the probability that was predicted. And you can see there's actually some code you can run; the documentation always has working code. In this case it was trying things with handwritten digits: the first one was predicted to be a seven, it was actually a three, the loss is 5.44, and the probability of the actual class was 0.07. So we did not have a high probability associated with the actual class. I can see why it thought this was a seven; nonetheless, it was wrong.
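Calling it looks like this; any fastai function or class works as the argument:

```python
doc(interp.plot_top_losses)  # shows the signature and docstring, plus a "Show in docs" link
```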
So this is the documentation, and it's your friend when you're trying to figure out how to use these things. The other thing I'll mention is that if you're a somewhat experienced Python programmer, you'll find the source code of fastai really easy to read. We try to write everything in much less than half a screen of code, generally four or five lines of code.
If you click 'source', you can jump straight to the source code. So here is plot_top_losses, and this is also a great way to find out how to use the fastai library, because nearly every line of code here is calling stuff in the fastai library. So don't be afraid to look at the source code. I've got another really cool trick about the documentation that you're going to see a little bit later.
Okay, so that's how we can look at the top losses, and this is perhaps the most important image classification interpretation tool that we have, because it lets us see what we are getting wrong. Quite often, like in this case, if you're a dog and cat expert, you'll realize that the things it's getting wrong are breeds that are actually very difficult to tell apart, and you'd be able to look at these and say, oh, I can see why they got this one wrong. So this is a really useful tool. Another useful tool is something called a confusion matrix, which basically shows you, for every actual type of dog or cat, how many times it was predicted to be each type. Unfortunately, in this case, because the model is so accurate, the diagonal basically shows that it's pretty much right all the time, and for the slightly darker cells, like a five here, it's really hard to read exactly what the combination is.
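The call, with a larger figure because 37 classes make for a dense plot (the figsize and dpi are just display choices):

```python
interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)
```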
So if you've got lots of classes, I suggest you don't use the confusion matrix; instead, and this is my favourite named function in fastai, I'm very proud of this, you can call most_confused. most_confused will simply grab out of the confusion matrix the particular combinations of predicted and actual that it got wrong most often. So in this case, Staffordshire bull terrier was what it should have predicted, and instead it predicted American pit bull terrier, and so forth; it should have predicted Ragdoll and actually predicted Birman, and that happened four times; this particular combination happened six times. This is again a very useful thing, because you can look at it and ask: with my domain expertise, does it make sense that the model would be confused about that? So these are some of the kinds of tools you can use to look at the output.
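In code:

```python
interp.most_confused(min_val=2)  # (actual, predicted, count) combinations, most frequent first
```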
Now let's make our model better. How do we make it better? We can make it better using fine-tuning. So far we fitted four epochs, and it ran pretty quickly. The reason it ran pretty quickly is that there was a little trick we used. These deep learning models, these convolutional networks, have layers. We'll learn a lot about exactly what layers are, but for now, just know that the model goes through a sequence of computations. What we did was add a few extra layers to the end, and we only trained those; we basically left most of the model exactly as it was. That's really fast, and if we're trying to build a model of something that's similar to what the original pretrained model was trained on, in this case similar to the ImageNet data, that works pretty well. But what we really want to do is actually go back and train the whole model. This is why we pretty much always use this two-stage process. By default, when we call fit or fit_one_cycle on a convnet learner, it'll just fine-tune these few extra layers added to the end, and it'll run very fast and basically never overfit. But to really get it good, you have to call unfreeze. unfreeze is the thing that says, please train the whole model. Then I can call fit_one_cycle again, and... oh, the error got much worse.
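Stage two in code:

```python
learn.unfreeze()        # make the whole model trainable, not just the added head
learn.fit_one_cycle(1)  # with a single default learning rate, this makes the error worse, as shown
```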
Why? In order to understand why, we're actually going to have to learn more about exactly what's going on behind the scenes. So let's start out by trying to get an intuitive understanding of what's going on behind the scenes, and again, we're going to do it by looking at pictures. We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler, who nowadays is CEO of Clarifai, a very successful computer vision startup, and his PhD supervisor, Rob Fergus. They created a paper showing how you can visualize the layers of a convolutional neural network. We'll learn mathematically about what the layers are shortly, but the
basic idea is that your red, green and blue pixel values, which are numbers from nought to 255, go into a simple computation, the first layer, and something comes out of that. The result of that goes into a second layer, that goes into a third layer, and so forth, and there can be up to a thousand layers in a neural network. ResNet-34 has 34 layers; ResNet-50 has 50 layers. But let's look at layer one. There's this very simple computation; it's a convolution, if you know what they are, and we'll learn more about them shortly. What comes out of this first layer? Well, we can actually visualize these specific coefficients, the specific parameters, by
drawing them as a picture. There are actually a few dozen of them in the first layer, so we won't draw all of them; let's just look at nine at random. Here are nine examples of the actual coefficients from the first layer. These operate on groups of pixels that are next to each other, and this first one basically finds groups of pixels that have a little diagonal line in this direction; this one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction; this one finds gradients that go from pink to green in this direction; and so forth. Those are very, very simple little filters. That's layer one of an ImageNet-pretrained convolutional neural net. Layer two takes the results
of those filters and does a second layer of computation, which allows it to create more complex features. Here are nine examples of a way of visualizing the second-layer features, and you can see it's basically learned to create something that looks for corners, top-left corners; this one has learned to find right-hand curves; this one has learned to find little circles. So, and this is maybe the easiest way to see it, in layer one we had things that could find just one line, and in layer two we can find things that have two lines joined up, or one line repeated. If you then look over here, these nine show you nine examples of actual bits of actual photos that activated this filter a lot. In other words, this little math function here was good at finding window corners and stuff like that, and this little curly one was very good at finding bits of photos that had circles in them. So this is the kind of stuff you've got to get a really good intuitive understanding of: the start of your neural net is going to find very simple gradients and lines, the second layer can find very simple shapes, and the third layer can find combinations of those. So now we can find repeating
patterns of two-dimensional objects, or we can find things that join together, or we can find... well, what are these things? Let's find out what this is: let's go and have a look at some bits of pictures that activated this one highly. Oh, mainly they're bits of text, although sometimes windows; so it seems to be able to find repeated horizontal patterns. This one here seems to find edges of fluffy or flowery things, and this one here is finding geometric patterns. So layer three was able to take all the stuff in layer two and combine it together, and layer four can take all the stuff from layer three and combine it together. By layer four, we've got something that can find dog faces, and let's see what else we've got here... yeah, here we have bird legs. So you kind of get the idea: by layer five, we've got something that can find the eyeballs of birds and lizards, or the faces of particular breeds of dogs, and so forth. So you can see how, by the time you get to layer 34, you can find specific dog breeds and cat breeds. This is kind of how it works.
So when we first fine-tuned that pretrained model, we kept all of these layers that you've seen so far, and we just trained a few more layers on top of all of those sophisticated features that were already being created. Now, when we're fine-tuning, we're going back and saying: let's change all of these. We'll start with them where they are, but let's see if we can make them better. Now, it seems very unlikely that we can make the layer-one features better. Is it likely that the definition of a diagonal line is going to be different when we look at dog and cat breeds versus the ImageNet data this was originally trained on? No; so we don't really want to change layer one very much, if at all. Whereas the last layers, this thing of, like, types of dog face: it seems very likely that we do want to change that. So you want this intuitive understanding that the different layers of a neural network represent different levels of semantic complexity. This is why our attempt to fine-tune this model didn't work: by default, it trains all the layers at the same speed, which is to say it will update the things representing diagonal lines and gradients just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. So we have to change that.
To change it, we first of all need to go back to where we were before, because we just broke this model; it's much worse than it started out. So if we just call learn.load, that brings back the model that we saved earlier; remember, we saved it as stage-1. So let's go ahead and load that back up, so our model is now back to where it was before we broke it.
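That is:

```python
learn.load('stage-1')  # restore the weights we saved before unfreezing
```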
Now let's run the learning rate finder. We'll learn about what that is next week, but for now, just know that this is the thing that figures out the fastest rate at which I can train this neural network without making it zip off the rails and get blown apart. We can call learn.lr_find, and then we can call learn.recorder.plot, and that will plot the result of our LR finder. What this basically shows you is this parameter, which we're going to learn all about, called the learning rate. The learning rate basically says how quickly I am updating the parameters in my model. The bottom axis shows what happens as I increase the learning rate, and this axis shows the loss. You can see that once the learning rate gets past 1e-4, my loss gets worse.
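The two calls:

```python
learn.lr_find()        # runs a mock training pass while sweeping the learning rate
learn.recorder.plot()  # plots loss against learning rate
```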
It so happens, and in fact I can check this if I press shift-tab here, that my learning rate defaults to 0.003, so my default learning rate is about here, and you can see why my loss got worse: now that we've fine-tuned things, we can't use such a high learning rate. So, based on the learning rate finder, I tried to pick something well before it started getting worse, and I decided to pick 1e-6. But there's no point training all the layers at that rate, because we know the later layers worked just fine before, when we were training much more quickly, at the default, which as a reminder was 0.003.
So what we can actually do is pass a range of learning rates, and we do it like this: you use this keyword, which in Python you may have come across, called slice. It can take a start value and a stop value, and basically what this says is: train the very first layers at a learning rate of 1e-6, train the very last layers at a rate of 1e-4, and distribute all the other layers across that range, equally spread between those two values. We're going to see that in a lot more detail, but for now, a good rule of thumb is: after you unfreeze, pass a max learning rate parameter, pass it a slice, and make the second part of that slice about ten times smaller than your first stage. Our first stage defaulted to about 1e-3, so let's use about 1e-4 there. The first part should be a value from your learning rate finder that is well before things started getting worse; you can see things starting to get worse maybe about here, so I picked 1e-6, something at least ten times smaller than that.
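Put together, as in the notebook:

```python
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # earliest layers at 1e-6, last layers at 1e-4
```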
If I do that, I get 0.057880. I don't quite remember what we got before, but it's a bit better: we've gone down from a 6.1% error to a 5.7% error, so that's about a ten percent relative improvement, with another 58 seconds of training. I would say, for most people most of the time, these two stages are enough to get a pretty much world-class model. You won't win a Kaggle competition, particularly because a lot of fast.ai alumni are now competing on Kaggle and this is the first thing that they do, but in practice you'll get something that's about as good as what the vast majority of practitioners can do. We can improve it
by using more layers, and we'll do this next week, by basically using a ResNet-50 instead of a ResNet-34. You can try running this during the week if you want to; you'll see it's exactly the same as before, but using resnet50 instead of resnet34. What you'll find is that if you try to do this, it's very likely you will get an error, and the error will be that your GPU ran out of memory. The reason for that is that ResNet-50 is bigger than ResNet-34, so it has more parameters, and therefore it uses more of your graphics card memory, which is totally separate from your normal computer RAM; this is GPU RAM. If you're using the kind of default Salamander, AWS, and so forth suggestions, then you'll have 16 gig of GPU memory; the card I use most of the time has 11 gig of GPU memory; the cheaper ones have 8 gig. That's kind of the main range you tend to get; if yours has less than 8 gig of GPU memory, it's going to be frustrating for you. Anyway, you'll be somewhere around there, and it's very likely that when you try to run this, you'll get an out-of-memory error. That's because it's just trying to do too much, too many parameter updates, for the amount of RAM you have, and it's easily fixed: this ImageDataBunch constructor has a parameter at the end, bs, for batch size, which basically says how many images you train at one time. If you run out of memory, just make it smaller. This worked for me on an 11 gig card; it probably won't work for you if you've got an 8 gig card, so if that happens, just make it 32. It's fine to use a smaller batch size; it just might take a little bit longer. If you've got a big card, like a 16 gig one, you might be able to get away with 64. So that's just one number you'll need to try during the week.
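Here's a sketch of that change. It assumes the same path_img, fnames and pat as earlier in the notebook, and the image size, batch size and epoch count are illustrative choices rather than prescriptions:

```python
# Same pipeline as before, but a bigger architecture and a smaller batch size.
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=299,
                                   bs=48  # illustrative; drop to 32 or lower if you run out of GPU memory
                                  ).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=error_rate)  # renamed cnn_learner in later fastai v1 releases
learn.fit_one_cycle(8)
```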
And again, we fitted it for a while, and we got down to a 4.4% error rate, so this is pretty extraordinary.
You know, I was pretty surprised, because when we first did cats versus dogs in the first version of this course, we were getting somewhere around a 3% error, and that was for something where you've got a fifty percent chance of being right and the two things look totally different. So the fact that we can get a 4.4% error for such a fine-grained distinction between breeds is quite extraordinary. In this case, I unfroze it and fitted a little bit more, and it went from 4.4% to 4.35%, a tiny improvement; basically, ResNet-50 is already a pretty good model. It's interesting, because you can again call most_confused here and see the kinds of things that it's getting wrong. Depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things. Quite often, I find that Ragdoll and Birman are things that it gets confused about, and I had actually never heard of either of those things, so I looked them up on the internet, and I found a page on a cat site called 'Is this a Birman or a Ragdoll?', with a long thread of people arguing intensely about which it is. So I feel fine that my computer had problems. I found something similar, I think it was with pit bull versus Staffordshire bull terrier: apparently the main difference is the particular kennel club guidelines as to how they are assessed, though some people think that one of them might have a slightly redder nose. So this is the kind of stuff where, even if you're not a domain expert, it helps you become one, because I now know more about which kinds of pet breeds are hard to identify than I used to. So model interpretation works both ways. What I
want you to do this week is run this notebook and make sure you can get through it. But then what I really want you to do is get your own image dataset. Francisco, who I mentioned earlier (he started the language model thread, and he's now helping to TA the course), is actually putting together a guide that will show you how to download data from Google Images, so you can create your own dataset to play with. But before that, I want to show you how to create labels in lots of different ways, because your dataset, wherever you get it from, won't necessarily come in that kind of regex-friendly format; it could come in lots of different formats. To show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn numbers, and I'm using it just because I want to show you different ways of creating these datasets. The MNIST sample basically looks like this: if I go path.ls(), you can see it's got a training set and a validation set already, so the people that put together this dataset have already decided what they want you to use as a validation set. If you go (path/'train').ls(), you'll see there's a folder called 3 and a folder called 7.
Now, this is a really, really common way to give people labels. It basically says: everything that's a three, I'll put in a folder called 3; everything that's a seven, I'll put in a folder called 7. This is often called an ImageNet-style dataset; it's how ImageNet itself is distributed. So if you have something in this format, where the labels are just whatever the folders are called, you can say from_folder, and that will create an ImageDataBunch for you. As you can see, it's created the labels, 3 and 7, just by using the folder names.
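As in the lesson notebook (size=26 is that notebook's choice):

```python
tfms = get_transforms(do_flip=False)  # don't flip digits; a mirrored 3 isn't a 3
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
```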
And as you can see, we can train that to 99.5% accuracy, and so on. Another possibility, and for this MNIST sample I've got both: it might come with a CSV file that looks something like this, giving the label for each file name. In this case the labels aren't three or seven; they're 0 or 1, basically 'is it a 7 or not'. So that's another possibility. If this is how your labels are, you can use from_csv, and if the file is called labels.csv, you don't even have to pass in a file name; if it's called anything else, you can pass it in via the csv_labels parameter. So that's how you can use a CSV. There it is: this is now 'is it a 7 or not', and then you can call data.classes to see the classes.
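In code (the size is again just that notebook's choice):

```python
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)  # reads labels.csv by default
data.classes  # -> [0, 1]
```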
Another possibility, as we've seen, is that you've got paths that look like this; in this case it's the same thing, where I can grab the label using a regular expression, and here's the regular expression. We've already seen that approach, and again you can see that our classes have been found. But what if it's something that's in the file name or path, and it's not amenable to a regular expression because it's more complex? You can create an arbitrary function that extracts a label from the file name or path, and in that case you would say from_name_func.
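Both variants, roughly as they appear in the lesson notebook; fn_paths is the list of file paths built from the CSV just above:

```python
fn_paths = [path/name for name in df['name']]  # df is the labels.csv DataFrame

# label extracted from the path with a regular expression
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)

# label computed by an arbitrary function of the path
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
                                     label_func=lambda x: '3' if '/3/' in str(x) else '7')
```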
Another possibility is that you need something even more flexible, and you're going to write some code to create an array of labels. In that case, you can just use from_lists: here I've created an array of labels, and then I pass in that array. So you can see there are lots of different ways of creating labels.
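For example:

```python
# any code that produces a list of labels will do
labels = ['3' if '/3/' in str(x) else '7' for x in fn_paths]
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
```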
So during the week, try this out. Now, you might be wondering, how would you know to do all these things? Where are you going to find this kind of information? How could you possibly know to do all this stuff? So I'll show you something incredibly cool. Let's grab this function, and remember, to get documentation we type doc, and here is the documentation for the function; I can click 'Show in docs' and it pops up the documentation. So here's the thing: every
single line of code I just showed you, I took this morning and copied and pasted from the documentation; you can see here the exact code that I just used. So the documentation for fastai doesn't just tell you what to do, but shows you, step by step, how to do it. And here is perhaps the coolest bit: if you go to the fastai/fastai_docs repository and click on docs_src, it turns out that all of our documentation is actually just Jupyter notebooks. In this case I was looking at vision.data, so here is the vision.data notebook. You can download this repo, you can git clone it, and if you run it, you can actually run every single line of the documentation yourself. So all of our docs are also code, and to me this is kind of the ultimate example of experimenting: you can now experiment with everything. You'll see on GitHub that it doesn't quite render properly, because GitHub doesn't quite know how to render notebooks, but if you git clone this and open it up in Jupyter, you can see it. So nearly everything in the documentation has actual working examples in it, with actual datasets that are already sitting there in the repo for you, and you can try every single function: try seeing what goes in, and try seeing what comes out. There's a question:
'Will the library use multi-GPU in parallel by default?' The library will use multiple CPUs by default, but just one GPU by default. If you want to use multiple GPUs in parallel, it's easy to do and you'll find instructions on the forum, but most people won't be needing that. 'And the second question is whether the library can handle 3D data, such as MRI.' Yes, it can, and there is actually a forum thread about that already, although it's not as developed as 2D yet; maybe by the time the MOOC is out, it will be.
Before I wrap up, I'll just show you an example of the kind of interesting stuff that you can do by doing this kind of exercise. Remember, earlier I mentioned that one of our alumni, who works at Splunk, a NASDAQ-listed, big, successful company, created this new fraud-detection software. This is actually how he created it, as part of a fast.ai part one class project: he took the telemetry of users who had Splunk analytics installed, watched their mouse movements, and created pictures of the mouse movements. He converted speed into colour, and right and left clicks into splotches. He then took the exact code that we just saw (with an earlier version of the software), trained a CNN in exactly the way we saw, and used that to train his fraud model. So he basically took something that is not obviously a picture and turned it into a picture, and got fantastically good results for a piece of fraud analysis software. So it pays to think creatively. If you want to study sounds, for example, a lot of people who study sounds do it by creating a spectrogram image and then sticking that into a convnet, as in the sketch below. So there's a lot of cool stuff you can do with this.
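This sketch isn't from the lesson; it's a minimal illustration of the spectrogram idea using only numpy and matplotlib, with a synthetic signal standing in for real audio:

```python
import numpy as np
import matplotlib.pyplot as plt

sr = 16000                                         # sample rate in Hz (illustrative)
t = np.linspace(0, 1, sr)
signal = np.sin(2 * np.pi * (200 + 400 * t) * t)   # a synthetic rising tone

plt.specgram(signal, Fs=sr)  # short-time Fourier transform, drawn as a 2-D heatmap
plt.axis('off')
plt.savefig('spectrogram.png', bbox_inches='tight', pad_inches=0)  # now it's just an image a CNN can classify
```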
So during the week: get your GPU going, try to use your first notebook, make sure that you can work through lesson one, and then see if you can repeat the process on your own dataset. Get on the forum and tell us any little successes you had, like, 'I spent three days trying to get my GPU running and I finally did it.' For any constraints you hit, try working on them for an hour or two, but if you get stuck, please ask. And if you're able to successfully build a model with a new dataset, let us know. I will see you next week.