Okay, so welcome to Practical Deep Learning for Coders, lesson one. It's kind of lesson two, because there's a lesson zero, and lesson zero is: why do you need a GPU, and how do you get it set up? So if you haven't got a GPU running yet, then go back and do that; make sure that you can access a Jupyter notebook, and then you're ready to start the real lesson one. If you're ready, you will be able to see something like this, and in particular, hopefully you have gone to the notebook tutorial. It's at the top, the one with the zero zero. As this grows you'll see more and more files, but we'll keep the notebook tutorial at the top. And you will have used your Jupyter notebook to add one and one together, getting the expected result, and hopefully you've learned the four basic keyboard shortcuts.

So the basic idea is that your Jupyter notebook has prose in it, it can have pictures, it can have charts, and, most importantly, it can have code in it. The code is in Python. How many people have used Python before? So, nearly all of you; that's great. If you haven't used Python, that's totally okay; it's a pretty easy language to pick up. But if you haven't used Python, this will feel a little bit more intimidating, because the code that you're seeing will be unfamiliar to you.
Yes, Rachel? Oh, yeah, no, because I'm trying to keep them separate. Okay. So, as I say, there are things like this that are for the people in the room in person, and this is one of those bits: it's really just for the in-person audience, not for you. I think this will be the only time like this in the lesson where we've assumed they've got this set up already. All right, so if you're in the room watching live, you can go back after this and make sure that you can get this running using the information on course-v3.fast.ai.
Okay. So, a Jupyter notebook is a really interesting device for a data scientist, because it kind of lets you run interactive experiments: it lets us give you not just a static piece of information, but something that you can actually interactively experiment with. So let me explain how we think it works well to use these notebooks and this material; this is based on the last three years of experience we've had with the students who have gone through this course. First of all, it works pretty well just to watch a lesson end to end. Don't try to follow along, because it's not really designed to go at a speed where you can follow along; it's designed to be something where you just take in the information and get a general sense of all of the pieces and how they fit together. Then you can go back and go through it more slowly, pausing the video and trying things out, making sure that you can do the things that I'm doing, and that you can try to extend them to do things in your own way. So don't worry if things are zipping along faster than you can do them; that's normal. And also, don't try to stop and understand everything the first time. If you do understand everything the first time, good for you, but most people don't, particularly as the lessons go on; they get faster and they get more difficult.
Okay, so at this point we've got our notebooks going, and we're ready to start doing deep learning. And the main thing that hopefully you're going to agree with by the end of this is that you can do deep learning regardless of who you are. And I don't just mean "do"; I mean do at a world-class practitioner level. Your main place to be looking for things is course-v3.fast.ai, where you can find out how to get a GPU and other information, and you can also access our forums. On our forums you'll find things like how to build a deep learning box yourself; that's something that you can do later on, once you've got going.
Who am I? So, why should you listen to me? Well, maybe you shouldn't, but I'll try to justify why you should. I've been doing stuff with machine learning for over 25 years. I started out in management consulting, where I was, I think, McKinsey & Company's first analytical specialist, then went into general consulting, ran a number of startups for a long time, and eventually became the president of Kaggle. But actually the thing I'm probably most proud of in my life is that I got to be the number one ranked contestant in Kaggle competitions globally, so I think that's a good demonstration that I can actually train a predictive model that predicts things, a pretty important aspect of data science. I then founded a company called Enlitic, which was the first kind of medical deep learning company. Nowadays I'm on the faculty at the University of San Francisco and also co-founder, with Rachel, of fast.ai.
I've used machine learning throughout that time, and I guess, although I am at USF, I'm not really an academic type; I'm much more interested in using this tool to do useful things. Specifically, through fast.ai we are trying to help people use deep learning to do useful things: through creating software to make deep learning easier to use at a very high level; through education, such as the thing you are watching now; through research, which is where we spend a very large amount of our time, figuring out how you can make deep learning easier to use at a very high level, which ends up, as you'll see, in the software and the education; and by helping to build a community, mainly through the forums, so that practitioners can find each other and work together. So that's what we're doing.
This course, Practical Deep Learning for Coders, is kind of the starting point in this journey. It contains seven lessons, each one about two hours long. We're then expecting you to do about eight to ten hours of homework during the week, so it'll end up being something around 70 or 80 hours of work. I will say there's a lot of variation in how much people put into this: I know people who work full time on fast.ai; some folks who do the two parts can spend a whole year doing it really intensively; and I know some folks who watch the videos at double speed, never do any homework, and come out at the end with, you know, a general sense of what's going on. So there are lots of different ways you can do this. But if you follow along with this roughly ten-hours-a-week approach for the seven weeks, by the end you will be able to build an image classification model on pictures that you choose that will work at a world-class level; you'll be able to classify text, again using whatever datasets you're interested in; you'll be able to make predictions for commercial applications like sales; and you'll be able to build recommendation systems such as the one used by Netflix. These are not toy examples of any of these, but actually things that can come top ten in Kaggle competitions, that beat everything that's in the academic community. Very, very high-level versions of these things.
That might surprise you, given that the prerequisite here is literally one year of coding and high school math, but we have thousands of students now who have done this and shown it to be true. You will probably hear a lot of naysayers, fewer now than a couple of years ago when we started, but still a lot of naysayers, telling you that you can't do it, or that you shouldn't be doing it, or that deep learning's got all these problems. It's not perfect, but these are all things that people claim about deep learning which are either pointless or untrue. It's not a black box: as you'll see, it's really great for interpreting what's going on. It does not need much data for most practical applications. You certainly don't need a PhD; Rachel has one, so having a PhD doesn't actually stop you from doing deep learning either, but I certainly don't: I have a philosophy degree and nothing else. It can be used very widely for lots of different applications, not just for vision, which is where it's most well known. You don't need lots of hardware; that thirty-six-cents-an-hour server is more than enough to get world-class results for most problems. It's true that maybe this is not going to help you build a sentient brain, but that's not our focus. So, for all the people who say deep learning is not interesting because it's not really AI: that's not really a conversation I'm interested in. We're focused on solving interesting real-world problems.
What are you going to be able to do by the end of lesson one? Well, this was an example from Nikhil, who's actually in the audience now, because he was in last year's course as well. This is an example of something he did: he downloaded 30 images of people playing cricket and people playing baseball, ran the kind of code you'll see today, and built a nearly perfect classifier of which is which. So this is the kind of stuff you can build, with fun hobby examples like this, or, as we'll see, you can try stuff in the workplace that could be of direct commercial value. So this is the kind of thing we're going to get to by the end of lesson one.
We're going to start by looking at code, which is very different from many academic courses. For those of you with an engineering, math, or computer science background, this is very different from the approach you're used to, where you start with lots and lots of theory, and eventually you get to a postgraduate degree and you're finally at the point where you can build something useful. We're going to learn to build the useful thing today. Now, that means that at the end of today you won't know all of the theory. There will be lots of aspects of what we do where you don't know why or how it works. That's okay: you will learn why and how it works over the next seven weeks. But for now, we've found that what works really well is to actually get your hands dirty coding, not focusing on theory, because there's still a lot of artisanship in deep learning. Unfortunately, it's still a situation where people who are good practitioners have a really good feel for how to work with code and how to work with data, and you can only get that through experience. So the best way to get that feel for how to build good models is to create lots of models through lots of coding, and study them carefully, and a Jupyter notebook provides a really great way to study them. So let's try getting started.
To get started, you will open your Jupyter notebook and click on lesson 1, and it will pop open looking something like this. You can run a cell in a Jupyter notebook by clicking on it and pressing Run, but if you do so, everybody will know that you're not a real deep learning practitioner, because real deep learning practitioners know the keyboard shortcuts, and the keyboard shortcut is Shift+Enter. Given how often you have to run a cell, don't go all the way up there finding and clicking the button; just press Shift+Enter. Use the up and down arrows to move around and pick something to run, then Shift+Enter to run it. So we're going to go through this quickly, and then later on we'll go back over it more carefully. Here's the quick version, to get a sense of what's going on.
So here we are in lesson 1, and these three lines are what we start every notebook with. Things starting with a percent sign are special directives to the Jupyter notebook itself; they're not Python code. They're called "magics", which is kind of a cool name. The details of these three directives aren't very important, but basically they say: hey, if somebody changes the underlying library code while I'm running this, please reload it automatically; and if somebody asks to plot something, then please plot it here in this Jupyter notebook. So just put those three lines at the top of everything. The next two lines load up the fastai library.
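For reference, if I recall the notebook correctly, those three magic lines look like this:

```python
# reload changed library code automatically while the notebook is running
%reload_ext autoreload
%autoreload 2
# show plots inline, here in the notebook
%matplotlib inline
```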
What is the fastai library? It's a little bit confusing: fastai, with no dot, is the name of our software, and fast.ai, with the dot, is the name of our organization. So if you go to docs.fast.ai, that's the fastai library. We'll learn more about it in a moment, but for now just realize that everything we're going to do will use basically either fastai, or the thing fastai sits on top of, which is PyTorch. PyTorch is one of the most popular libraries for deep learning in the world. It's a bit newer than TensorFlow, so in a lot of ways it's more modern than TensorFlow; it's extremely fast growing and extremely popular. We use it because we used to use TensorFlow a couple of years ago, and we found we can just do a lot more, a lot more quickly, with PyTorch. Then we have this software that sits on top of PyTorch and lets you do far, far more things far more easily than you can with PyTorch alone.
So it's a good combination. We'll be talking more about it, but for now, just know that you can use fastai by doing two things: import * from fastai, and then import * from fastai.something, where "something" is the application you want. Currently, fastai supports four applications: computer vision, natural language text, tabular data, and collaborative filtering, and we're going to see lots of examples of all of those during the seven weeks.
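For the computer vision work we're about to do, those two import lines look like this:

```python
from fastai import *          # the core fastai library
from fastai.vision import *   # the application: computer vision
```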
So we're going to be doing some computer vision. At this point, if you are a Python software engineer, you are probably feeling sick, because you saw me do import *, which is something you've all been told to never, ever do. And there are very good reasons not to use import * in standard production code with most libraries. But for those of you who have used something like MATLAB, it's kind of the opposite: everything's there for you all the time; a lot of the time you don't even have to import things. It's kind of funny: we've got these two extremes of how to write code. You've got the scientific programming community that has one way, and the software engineering community that has the other. Both have really good reasons for doing things, and with the fastai library we actually support both approaches. In a Jupyter notebook, where you want to quickly and interactively try stuff out, you don't want to be constantly going back up to the top, importing more stuff, and trying to figure out where things are; you want to be able to use lots of tab completion and be very experimental. So import * is great. Then, when you're building stuff in production, you can do the normal PEP 8 style, proper software engineering practices. So don't worry when you see me doing stuff which at your workplace would be frowned upon: this is a different style of coding. It's not that there are no rules in data science programming; it's that the rules are different. When you're training models, the most important thing is to be able to interactively experiment quickly. So you'll see we use a lot of processes and styles that are very different from what you're used to, but they're there for a reason, and you'll learn about them over time. You can choose to use a similar approach or not; it's entirely up to you. The other thing to mention is that the fastai library is designed in a very interesting, modular way, and you'll find over time that when you do use import *, there's far less clobbering of things than you might expect. It's all explicitly designed to allow you to pull in things and use them quickly, without having problems.
Okay, so we're going to look at some data. There are two main places we'll be getting data from for the course. One is academic datasets. Academic datasets are really important and really interesting: they're things where academics spend a lot of time curating and gathering a dataset so that they can show how well different kinds of approaches work with that data. The idea is that they try to design datasets that are challenging in some way and require some kind of breakthrough to do well. We're going to start with an academic dataset called the Pet dataset. The other kind of dataset we'll be using during the course is datasets from the Kaggle competitions platform. Both academic datasets and Kaggle datasets are interesting for us, particularly because they provide strong baselines; that is to say, you want to know if you're doing a good job. With Kaggle datasets that have come from a competition, you can actually submit your results to Kaggle and see how well you would have done in that competition, and if you can get in about the top 10%, I'd say you're doing pretty well. For academic datasets, academics write down in papers what the state of the art is, so: how well were they able to do with their models on that dataset? So this is what we're going to do: we're going to try to create models that get right up towards the top of Kaggle competitions, preferably actually in the top ten, not just the top 10%, or that meet or exceed academic state-of-the-art published results. When you use an academic dataset, it's important to cite it; you'll see here there's a link to the paper that it's from. You definitely don't need to read that paper right now, but if you're interested in learning more about it, and why and how it was created, all the details are there.
In this case, this is a pretty difficult challenge. The Pet dataset is going to ask us to distinguish between 37 different categories of dog breed and cat breed, and that's really hard. In fact, every course before this one used a different dataset, one where you just have to decide: is this a dog, or is it a cat? So you've got a 50-50 chance right away, and dogs and cats look really different, whereas lots of dog breeds and cat breeds look pretty much the same. So why have we changed the dataset? We've got to the point now where deep learning is so fast and so easy that the dogs-versus-cats problem, which a few years ago was considered extremely difficult (80% accuracy was state of the art), is now too easy. Our models were basically getting everything right all the time, without any tuning, so there weren't really a lot of opportunities for me to show you how to do more sophisticated stuff. We've picked a harder problem this year. This kind of thing, where you have to distinguish between similar categories, is called, in the academic context, fine-grained classification. So we're going to do a fine-grained classification task: figuring out the particular kind of pet.
The first thing we have to do is download and extract the data we want. We're going to use this function called untar_data, which will download it automatically and untar it automatically. AWS has been kind enough to give us lots of space and bandwidth for these datasets, so they download super quickly for you. The first question, then, would be: how do I know what untar_data does? You could just type help, and you will find out what module it came from (since we did import *, we don't necessarily know that), what it does, and, something you might not have seen before even if you're an experienced programmer, what exactly you pass to it. You're probably used to seeing the names: url, file name, destination. But you might not be used to seeing these other bits: these bits are types. If you've used a typed programming language, you'll be used to seeing them, but Python programmers are less used to them. If you think about it, though, you don't actually know how to use a function unless you know what type each thing is that you're providing it, so we make sure we give you that type information directly here in the help. In this case, the url is a string; the file name is either (Union means "either") a Path or a string, and defaults to nothing; and the destination is either a Path or a string and defaults to nothing too. We'll learn more shortly about how to get more documentation about the details, but for now we can see we don't have to pass in a file name or a destination: it'll figure them out for us from the URL. And for all the datasets we'll be using in the course, we already have constants defined in this URLs module (a class, actually), so you can see where it's going to grab the data from. It's going to download that to some convenient path, untar it for us, and then return the value of the path.
Then, in a Jupyter notebook, it's kind of handy: you can just write a variable on its own, and a semicolon is just a statement marker in Python, so writing the assignment and then the bare variable name on one line is the same as writing them on separate lines, and the notebook prints the value. You could also say print, but again, we're trying to do everything fast and interactively, so just write the name, and here is the path where it's put the data. Next time you run this, since you've already downloaded it, it won't download it again; since you've already untarred it, it won't untar it again. Everything's designed to be pretty automatic, pretty easy.
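Putting that together, the cell looks roughly like this; the signature in the comment paraphrases the help output described above, and URLs.PETS is the predefined constant for this dataset:

```python
help(untar_data)
# roughly: untar_data(url: str, fname: Union[Path, str] = None,
#                     dest: Union[Path, str] = None) -> Path
# only the URL is required; fname and dest are figured out for you

path = untar_data(URLs.PETS)   # downloads and untars on the first run only
path                           # a bare name at the end of a cell displays its value
```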
There are some things in Python that are less convenient for interactive use than they should be. For example, when you do have a path object, seeing what's in it actually takes a lot more typing than I would like. So sometimes we add functionality to existing Python stuff; one of the things we do is add an ls method to paths. So if you call ls on the path, here is what's inside it: that's what we just downloaded. When you try this yourself, you'll wait a couple of minutes for it to download and unzip, and then you can see what's in there. If you're an experienced Python programmer, you may not be familiar with this approach of using a slash like this. This is a really convenient feature that's part of Python 3; it's functionality from something called pathlib. These are path objects, and path objects are much better to use than strings: they let you basically create sub-paths like this, and it doesn't matter if you're on Windows, Linux, or Mac, it's always going to work exactly the same way. So here's a path to the images in that dataset.
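In code, that looks something like this (the two folder names are the ones the Pet dataset ships with):

```python
path.ls()   # the ls method fastai adds to Path objects
# shows what we just downloaded: the 'annotations' and 'images' folders

path_anno = path/'annotations'   # pathlib's / operator builds sub-paths
path_img  = path/'images'        # identical on Windows, Linux, and Mac
```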
All right, so if you're starting with a brand-new dataset and trying to do some deep learning on it, what do you do? Well, the first thing you'd want to do is probably see what's in there. We found that these are the directories in there; so what's in images? There are a lot of functions in fastai for you; there's one called get_image_files that will just grab an array of all of the image files, based on extension, in a path. And here you can see we've got lots of different files. This is a pretty common way for computer vision datasets to get passed around: just one folder with a whole bunch of files in it. So the interesting bit, then, is: how do we get the labels? In machine learning, the labels refer to the thing we're trying to predict, and if we just eyeball this, we can immediately see that the labels are actually part of the file name. You see that, right? It's kind of like path/label_number.extension. So we need to somehow get a list of those label bits of each file name, and that will give us our labels. Because that's all you need to build a deep learning model: you need pictures, files containing the images, and you need labels.
In fastai, this is made really easy. There's an object called ImageDataBunch, and an ImageDataBunch represents all of the data you need to build a model. There are basically some factory methods which try to make it really easy for you to create that data bunch. We'll talk more about this shortly, but it holds a training set and a validation set, with images and labels, for you. In this case, we can see we need to extract the labels from the names, so we're going to use from_name_re. For those of you who use Python, you know re is the module in Python that does regular expressions, things that are really useful for extracting text. I just went ahead and created the regular expression that will extract the label from this text. For those of you not familiar with regular expressions, it would be really useful to spend some time figuring out how and why that particular regular expression is going to extract the label from this text.
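Putting those two steps together, here's a sketch (the example file name is illustrative; the pattern is the one from the lesson notebook, as best I remember it):

```python
fnames = get_image_files(path_img)   # all the image files in the folder
fnames[:3]                           # peek at a few

# each name looks like images/great_pyrenees_173.jpg, so the label is
# everything between the last '/' and the trailing _number.jpg:
pat = r'/([^/]+)_\d+.jpg$'
```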
So, with this factory method, we can basically say: okay, I've got this path containing images; this is a list of file names (remember, I got them back there); and this is the regular expression pattern that is going to be used to extract the label from the file name. We'll talk about transforms later, and then you also need to say what size images you want to work with. That might seem weird: why do I need to say what size images I want to work with? The images have a size; we can see what size the images are. Honestly, this is a shortcoming of current deep learning technology, which is that a GPU has to apply the exact same instruction to a whole bunch of things at the same time in order to be fast, and if the images are different shapes and sizes, it can't do that. So we actually have to make all of the images the same shape and size. In part one of the course, we're always going to be making images square; in part two, we'll learn how to use rectangles as well. It turns out to be surprisingly nuanced, but pretty much everybody, in pretty much all computer vision modeling, uses this approach of squares, and 224 by 224, for reasons we'll learn about, is an extremely common size that most models tend to use. So if you just use size=224, you're probably going to get pretty good results most of the time, and this is one of the little bits of artisanship I want to teach you folks: what generally just works. If you just use size=224, that'll generally just work for most things, most of the time. So this factory method returns a DataBunch object.
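Assembled, the call looks roughly like this (this matches the lesson notebook as far as I remember it; ds_tfms and the normalize call are both explained just below):

```python
data = ImageDataBunch.from_name_re(
    path_img, fnames, pat,      # where the images live, the file list, the label regex
    ds_tfms=get_transforms(),   # the transforms, discussed later
    size=224)                   # make everything 224 by 224
data.normalize(imagenet_stats)  # the normalization step discussed below
```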
In fastai, everything you model with is going to be a DataBunch object. We're going to learn all about them: what's in them, how we look at them, and so forth. Basically, a DataBunch object contains two or three datasets. It contains your training data; it'll contain your validation data (we'll learn about this shortly); and, optionally, it contains your test data. And for each of those, it contains your images and your labels, or your texts and your labels, or your tabular data and your labels, and so forth, and that all sits there in this one place. Something we'll learn more about in a little bit is normalization, but generally, in nearly all machine learning tasks, you have to make all of your data about the same "size"; specifically, about the same mean and about the same standard deviation. So there's a normalize function that we can use to normalize our data bunch in that way.

Rachel, could you come over and ask the question? Thanks. "What does the function do when the image size is not 224?" Great, so this is something we're going to learn about shortly: basically, this thing called transforms is used to do a number of things, and one of the things it does is to resize things to size 224.
Let's take a look at a few pictures. Here are a few pictures of things from my data bunch. You can see that data.show_batch can be used to show me some of the contents of my data bunch; this is going to be a three-by-three grid.
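The call, with arguments roughly as in the notebook:

```python
data.show_batch(rows=3, figsize=(7, 6))   # a 3x3 grid of images with their labels
```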
You can see that roughly what's happened is that they all seem to have been zoomed and cropped in a reasonably nice way. Basically, what it'll do by default is something called center cropping, which means it'll grab the middle bit, and it will also resize it. We'll talk more about the details of this, because it turns out to actually be quite important, but basically a combination of cropping and resizing is used. Something else we'll learn about is that we also use this to do something called data augmentation, so there's actually some randomization in how much it crops, and where, and things like that. But that's the basic idea: some cropping and some resizing. Often we also do some padding, so there are all kinds of different approaches, and it depends on the data augmentation, which we're going to learn about shortly.
And what does it mean to normalize the images? Normalizing the images is something we're going to learn more about later in the course, but in short, it means that the pixel values (we'll learn more about pixel values too) start out from nought to 255, and some pixel values, I should say some channels, because there's red, green, and blue, might tend to be really bright, and some might tend to be not bright at all; and some might vary a lot, and some might not vary very much at all. It really helps train a deep learning model if each one of those red, green, and blue channels has a mean of zero and a standard deviation of one. We'll learn more about that; if you haven't studied or don't remember means and standard deviations, we'll get back to some of that later, but that's the basic idea of what normalization does. If your data is not normalized (and again, we'll learn much more about the details), it can be quite difficult for your model to train well. So if you do have trouble training a model, one thing to check is that you've normalized it.
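Here's a minimal sketch of what per-channel normalization means, using a fake image; in the notebook itself you just call data.normalize(imagenet_stats) and fastai handles it for you:

```python
import torch

x = torch.rand(3, 224, 224) * 255            # fake image: 3 channels of 0-255 values
mean = x.view(3, -1).mean(1)[:, None, None]  # the mean of each channel
std  = x.view(3, -1).std(1)[:, None, None]   # the standard deviation of each channel
x_norm = (x - mean) / std                    # each channel now has mean 0, std 1
```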
"Wouldn't the GPU be more efficient with a size like 256, which is a power of two?" We're going to get into that shortly, but the brief answer is that the models are designed so that the final layer is of size seven by seven, so we actually want something where, if you multiply seven by two a bunch of times, you end up with a good size (224 is 7 times 2 to the 5th). All of these details we are going to get to, but the key thing is that I wanted to get you training a model as quickly as possible. Now, one of the most important things about being a really good practitioner is being able to look at your data, so it's really important to remember to run show_batch and take a look. It's surprising how often, when you actually look at the dataset you've been given, you realize it's got weird black borders on it, or some of the things have text covering up part of them, or some of them are rotated in odd ways. So make sure you take a look.
And then the other thing we want to do is not just look at the pictures, but also look at the labels. All of the possible label names are called your classes. With the data bunch, you can print out data.classes, and here they are: those are all the possible labels that we found by using that regular expression on the file names. We learned earlier on, in the prose I wrote at the top, that there are 37 possible categories, and, just checking the length of data.classes, it is indeed 37. A data bunch will always have a property called c. The technical details of that property we'll get to later, but for now you can think of it as being the number of classes. For things like regression problems and multi-label classification and so on, that's not exactly accurate, but it will do for now. It's important to know that data.c is a really important piece of information: for classification problems, at least, it is the number of classes.
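In the notebook, that's just:

```python
print(data.classes)          # the 37 labels the regex pulled out of the file names
len(data.classes), data.c    # both 37 here: for classification, c is the number of classes
```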
Believe it or not, we're now ready to train a model. A model is trained in fastai using something called a learner. Just as a DataBunch is a general fastai concept for your data, with subclasses for particular applications like ImageDataBunch, a Learner is a general concept for things that can learn to fit a model, and from that there are various subclasses to make things easier; in particular, there's one that will create a convolutional neural network for you. We'll be learning a lot about that over the next few lessons, but for now, just know that to create a learner for a convolutional neural network, you just have to tell it two things. The first is: what's your data? Not surprisingly, it takes a data bunch. And the second thing you need to tell it is: what's your model, or what's your architecture? As we'll learn, there are lots of different ways of constructing a convolutional neural network, but for now the most important thing for you to know is that there's a particular kind of model called a ResNet which works extremely well nearly all the time. So, for a while at least, you really only need to choose between two things, which is: what size ResNet do you want? That's basically: how big is it? We'll learn all about the details of what that means, but there's one called ResNet-34 and one called ResNet-50, and when we're getting started with something we'll pick the small one, because it'll train faster. That's as much as you need to know to be a pretty good practitioner about architectures for now: there are two variants of one architecture that work pretty well, ResNet-34 and ResNet-50; start with the smaller one and see if it's good enough. So that is all the information we need to create a convolutional neural network learner. There's one other thing I'm going to give it, though, which is a list of metrics. Metrics are literally just things that get printed out as it's training, so I'm saying: I would like you to print out the error rate, please.
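In the version of the library this lesson uses, that's one line, roughly like this (later fastai releases spell the same function cnn_learner):

```python
learn = create_cnn(data, models.resnet34, metrics=error_rate)
# the data bunch, the architecture, and what to print during training
```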
Now, you can see that the first time I ran this on a newly installed box, it downloaded something. What's it downloading? It's downloading the ResNet-34 pre-trained weights. What this means is that this particular model has already been trained for a particular task, and that particular task is that it was trained by looking at about one and a half million pictures of all kinds of different things, a thousand different categories of things, using an image dataset called ImageNet. So we can download those pre-trained weights, so that we don't start with a model that knows nothing about anything; we actually start with a model that knows how to recognize the thousand categories of things in ImageNet. Now, I'm not sure, but I don't think all of these 37 categories of pet are in ImageNet; there were certainly some kinds of dog and certainly some kinds of cat, though. So this pre-trained model already knows a bit about what pets look like, and it certainly knows quite a lot about what animals look like and what photos look like. So the idea is that we don't start with a model that knows nothing at all; we start by downloading a model that already knows something about recognizing images. It downloads that pre-trained model for us automatically the first time we use it, and from then on it won't need to download it again; it'll just use the one we've got.
This is really important. We're going to learn a lot about this; it's kind of the focus of the whole course. It's called transfer learning: how to take a model that already knows how to do something pretty well and make it so that it can do your thing really well. You take a pre-trained model, and then you fit it so that, instead of predicting the thousand categories of ImageNet with ImageNet data, it predicts the 37 categories of pets using your pet data. It turns out that by doing this you can train models in one-hundredth or less of the time of regular model training, with one-hundredth or less of the data of regular model training; in fact, potentially many thousands of times less. Remember, I showed you the slide of Nikhil's lesson-one project from last year: he used 30 images, and there aren't cricket and baseball images in ImageNet, but it just turns out that ImageNet models are already so good at recognizing things in the world that just 30 examples of people playing baseball and cricket were enough to build a nearly perfect classifier.
Now, you would naturally, potentially, be saying: well, wait a minute, how do you know that it can actually recognize pictures of people playing cricket versus baseball in general? Maybe it just learned to recognize those 30. Maybe it's just cheating. That's called overfitting, and we'll be talking a lot about it during this course. Overfitting is where you don't learn to recognize pictures of, say, cricket versus baseball, but just these particular cricketers in these particular photos and these particular baseball players in these particular photos. We have to make sure that we don't overfit, and the way we do that is by using something called a validation set. A validation set is a set of images that your model does not get to look at, and these metrics (in this case, the error rate) get printed out automatically using the validation set: a set of images our model never got to see. When we created our data bunch, it automatically created a validation set for us. We'll learn lots of ways of creating and using validation sets, but because we're trying to bake in all of the best practices, we actually make it nearly impossible for you not to use a validation set, because if you're not using a validation set, you don't know if you're overfitting. So we always print out the metrics on a validation set; we always hold it out; we always make sure that the model doesn't touch it. That's all done for you, and it's all built into this data bunch object.
So now that we have a learner, we can fit it. You can just use a method called fit, but in practice you should nearly always use a method called fit_one_cycle. We'll learn more about this during the course, but in short, one-cycle learning comes from a paper that was released, I'm trying to think, a few months ago, less than a year ago, and it turned out to be dramatically better, both more accurate and faster, than any previous approach. Again, I don't want to teach you how to do 2017 deep learning, right? In 2018, the best way to fit models is to use something called one cycle. We'll learn all about it, but for now, just know you should probably call fit_one_cycle. If you forget how to type it, you can start typing a few letters and hit Tab, and you'll get a list of potential options. And then, if you forget what to pass it, you can press Shift+Tab, and it will show you exactly what to pass it, so you don't actually have to type help. Again, it's kind of nice that we have all the types here, because we can see that the cycle length (we'll learn more about what that is shortly) is an integer, and that the max learning rate could be a float, or a collection, or whatever, and so forth, and you can see which arguments default to which values.
For now, just know that this number, 4, basically decides how many times we go through the entire dataset, how many times we show the dataset to the model so that it can learn from it. Each time it sees a picture, it's going to get a little bit better, but it's going to take time, and it means it could overfit: if it sees the same picture too many times, it'll just learn to recognize that picture, not pets in general. We'll learn all about how to tune this number during the next couple of lessons, but starting out with 4 is a pretty good start, just to see how it goes, and you can actually see that after four epochs, or four cycles, we got an error rate of 6%.
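So the whole training step is one line:

```python
learn.fit_one_cycle(4)   # 4 epochs: show the model the full dataset four times
```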
So a natural question is: how long did that take? It took a minute and 56 seconds. We're paying, you know, sixty-odd cents an hour (we actually pay for the whole time the machine is on and running, but that was two minutes of compute time), and we've got an error rate of 6%. So 94% of the time we correctly picked the exact right one of those 37 dog and cat breeds, which feels pretty good to me.
But to get a sense of how good it is, maybe we should go back and look at the paper. Remember, I said the nice thing about using academic papers or Kaggle datasets is that we can compare our solution to whatever the best people on Kaggle did, or whatever the academics did. This particular dataset of pet breeds is from 2012, and if I scroll through the paper, you'll generally find in any academic paper there'll be a section called "Experiments" about two-thirds of the way through, and if you find the section on experiments, then you can find the section on accuracy. They've got lots of different models, and their models, as you can read in the paper, are really quite pet-specific: they learn something about how pet heads look, how pet bodies look, and how pet images in general look, and they combine them all together, and once they used all of this complex code and math they got an accuracy of 59%. So, in 2012, this highly pet-specific analysis got an accuracy of 59%, and these were the top researchers from Oxford University. Today, in 2018, with basically, if you go back and look at how much code we just wrote, about three lines of code (the other stuff is just printing things out to see what we're doing), we got 94%, so 6% error. That gives you a sense of how far we've come with deep learning, and particularly with PyTorch and fastai, and how easy things are.
So, before we take a break, I just want to check to see if we've got any questions. Just remember, if you're in the audience and you see a question that you want asked, please click the little heart next to it so that Rachel knows you want to hear about it. And if there's something with six likes and Rachel didn't notice it, which is quite possible, just quote it in a reply and say: hey Rachel, this one's got six likes. Okay, so we're going to take an eight-minute break, and we'll come back at five past eight.
So, where we got to was: we just trained a model. We don't exactly know what that involved or how it happened, but we do know that, with three or four lines of code, we built something which smashed the accuracy of the state of the art of 2012. 6% error certainly sounds pretty impressive for something that can recognize different dog breeds and cat breeds, but we don't really know why it works; that's okay. In terms of getting the most out of this course, we very, very regularly hear, after the course is finished, the same basic feedback. This is literally copied and pasted from the forum: "I fell into the habit of watching the lectures too much and googling too much about concepts, without running the code. At first I thought I should just read it and then research the theory." We keep hearing people say: my number one regret is that I spent 70 hours doing that, and at the very end I started running the code, and it turned out I learned a lot more.
So please run the code. Really, run the code. "I should have spent the majority of my time on the actual code in the notebooks: running it, seeing what goes in, and seeing what comes out." Your most important skills to practice are learning (and we're going to show you how to do this in a lot more detail) to understand what goes in and what comes out. We've already seen an example of looking at what goes in, which is data.show_batch, and that's going to show you examples of labels and images. Next we'll see how to look at what came out, and that's the most important thing to study.
As I said, the reason we've been able to do this so quickly is heavily because of the fastai library. The fastai library is pretty new, but it's already getting an extraordinary amount of traction. As you've seen, all of the major cloud providers either support it or are about to support it; a lot of researchers are starting to use it; it's making a lot of things a lot easier, but it's also making new things possible. So really understanding the fastai software is something that's going to take you a long way, and the best way to really understand the fastai software well is by using the fastai documentation, which we'll be learning more about shortly.
So how does it compare? There's really only one other major piece of software like fastai, that is, something that tries to make deep learning easy to use, and that's Keras. Keras is a really terrific piece of software; we actually used it for the previous courses, until we switched to fastai. It runs on top of TensorFlow, and it was kind of the gold standard for making deep learning easy to use before, but life is much easier with fastai. If you look, for example, at last year's course exercise, which was dogs versus cats, fastai lets you get much more accurate (less than half the error) on a validation set; training time is less than half the time; and the lines of code are about a sixth of the lines of code. The lines of code are more important than you might realize, because those 31 lines of Keras code involve you making a lot of decisions, setting lots of parameters, and doing lots of configuration. That's all stuff where you have to know how to set those things to get best-practice results, whereas with these five lines of code, any time we know what to do for you, we do it for you; any time we can pick a good default, we pick it for you.
So hopefully you'll find this is a really useful library, not just for learning deep learning, but for taking it a very long way. How far can you take it? Well, as you'll see, all of the research that we do at fast.ai uses the library. An example of the research we did, which was recently featured in Wired, describes a new breakthrough in natural language processing, which people are calling the ImageNet moment: we broke a new state-of-the-art result in text classification, which OpenAI then built on top of, with more compute, more data, and different tasks, to take it even further. This is an example of something that we've done in the last six months, in conjunction with my colleague Sebastian Ruder; an example of something that's been built in the fastai library. And you're going to learn how to use this brand-new model in three lessons' time, and you're actually going to get this exact result from this exact paper yourself.
Another example: one of our alumni, Hamel Husain, whom you'll come across plenty on the forum because he's a great guy and very active, built a new system for natural language semantic code search. You can find it on GitHub, where you can actually type in English sentences and find snippets of code that do the thing you asked for. And again, it was built with the fastai library, using the techniques you'll be learning in the next seven weeks. Is it in production? Well, I think at this stage it's part of their experiments platform, so it's kind of pre-production, I guess.
The best place to learn about these things and get involved in them is on the forums, where, as well as categories for each part of the course, there's also a general category for deep learning, where people talk about research papers, applications, and so on and so forth. So even though today we're focusing on a small number of lines of code that do one particular thing, which is image classification, and we're not learning much math or theory, over these seven weeks, and then in part two, another seven weeks, we're going to go deeper and deeper. So where can that take you?
I want to give you some examples. There's Sarah Hooker. She did our first course a couple of years ago. Her background was economics; she didn't have a background in coding, math, or computer science; I think she started learning to code two years before she took our course. She started a nonprofit called Delta Analytics, and they helped build this amazing system where they attached old mobile phones to trees in the Kenyan rainforests and used them to listen for chainsaw noises, and then they used deep learning to figure out when a chainsaw was being used, and they had a system set up to alert rangers to go out and stop illegal deforestation in the rainforests. That was something she was doing while she was in the course, as part of her class projects. What's she doing now? She is now a Google Brain researcher, which is, I guess, one of the top places, if not the top place, to do deep learning. She's been publishing papers, and now she is going to Africa to set up Google Brain's first deep learning research center in Africa. Now, I'll say she worked her ass off; she really, really invested in this course, not just doing all of the assignments but also going out and reading the Goodfellow book and doing lots of other things. But it really shows how somebody with no computer science or math background at all can now be one of the world's top deep learning researchers and doing very valuable work.
Another example, from our most recent course: Christine Payne. She is now at OpenAI, and you can find her post and actually listen to music samples of it: she built something to automatically create chamber music compositions, which you can play and listen to online. So, is her background math and computer science? Actually, that's her there: a classical pianist. Now, I will say she is not your average classical pianist. She's a classical pianist who also has a master's in medical research from Stanford, studied neuroscience, was a high-performance computing expert at D. E. Shaw, and was valedictorian at Princeton. Anyway, you know, a very annoying person who does everything she does well. But I think it's really cool to see how a domain expert, in this case in the domain of playing piano, can go through the fast.ai course and come out the other end.
I guess, of the top research institutes, Google Brain and OpenAI would be two of them, probably along with DeepMind. Interestingly, one of our other students, an alumnus of the course, recently interviewed her for a blog post series he's doing on top AI researchers, and she said one of the most important pieces of advice she got was from me, and the piece of advice was: pick one project, do it really well, make it fantastic. So that was the piece of advice she found the most useful, and we're going to be talking a lot about you doing projects and making them fantastic during this course.
Having said that, I don't really want you to go to OpenAI or Google Brain. What I really want you to do is go back to your workplace or your passion project and apply these skills there. Let me give you an example. MIT released a deep learning course, and they highlighted, in their announcement for this deep learning course, this medical imaging example. And one of our students, Alex, who is a radiologist, said: you guys just showed a model overfitting. I can tell, because I'm a radiologist, and this is not what this would look like on a chest film; this is what it should look like; and, as a deep learning practitioner, this is how I know that this is what happened in your model. So Alex is combining his knowledge of radiology and his knowledge of deep learning to assess MIT's model, from just two images, very accurately. And this is actually what I want most of you to be doing: take your domain expertise and combine it with the practical deep learning aspects that you'll learn in this course, and bring them together, like Alex is doing here. A lot of radiologists have actually gone through this course now and have built journal clubs and American College of Radiology practice groups; there's a data science institute at the ACR now, and so forth; and Alex is one of the people providing a lot of leadership in this area. I would love you to do the same kind of thing Alex is doing, which is to really bring deep-learning-driven leadership into your industry, or into your social impact project, whatever it is that you're trying to do.
Another great example was Melissa Fabros, an English literature PhD who had studied things like gendered language in English literature. Her previous job had taught her to code, I think, and then she came to the fast.ai course, and she helped Kiva, a microlending social impact organization, to build a system that can recognize faces. Why is that necessary? Well, we're going to be talking a lot about this, but because most AI researchers are white men, most computer vision software can only effectively recognize white male faces. In fact, I think the IBM system was something like 99.8% accurate on white men's faces, versus 60%, 65% accurate on dark-skinned women's faces. So it's something like 30 or 40 times worse for black women than for white men, and this is really important, because for Kiva, black women are perhaps the most common user base for their microlending platform. So Melissa, after taking our course, and again working her ass off and being super intensive in her study and her work, won this one-million-dollar AI challenge for her work for Kiva.
Karthik did our course and realized that the thing he wanted to do wasn't at his company; it was something else, which was to help blind people understand the world around them. So he started a new startup. You can find it now; it's called Envision. You can download the app and point your phone at things, and it will tell you what it sees. I actually talked to a blind lady about these kinds of apps the other day, and she confirmed to me that this is a super useful thing for visually impaired users. And that's the level you can get to, with the content you're going to get over these seven weeks, and with this software, which can get you right to the cutting edge in areas you might find surprising.
For example, I helped a team of some of our students and some collaborators actually break the world record for training ImageNet. Remember, I mentioned the ImageNet dataset; lots of people want to train on the ImageNet dataset. We smashed the world record for how quickly you can train it, using standard AWS cloud infrastructure, at a cost of about $40 of compute to train the model, using, again, the fastai library and the techniques we learn in this course. So it can really take you a long way. So don't be put off by what might seem pretty simple at first; we're going to get deeper and deeper.
You can also use it for other kinds of passion projects. Helena Sarin: you should definitely check out her Twitter account. This art is basically a new style of art that she's developed, which combines her painting and drawing with generative adversarial models to create these extraordinary results, and I think it's super cool. She's not a professional artist; she is a professional software developer, but she just keeps on producing these beautiful results. When she started, her art had not really been shown or discussed anywhere; now there have recently been some quite high-profile articles describing how she is creating a new form of art, and again, this came out of the fast.ai course, where she developed these skills. Equally important, Brad Kenstler figured out how to make a picture of Kanye out of pictures of Patrick Stewart's head, also something you will learn to do, if you wish to. This particular type of what's called style transfer was a really interesting tweak that allowed him to do some things that hadn't quite been done before, and this particular picture helped him get a job as a deep learning specialist at AWS.
Another interesting example: another alumnus worked at Splunk as a software engineer, and he designed an algorithm after, like, lesson three, which basically turned out, at Splunk, to be fantastically good at identifying fraud; we'll talk more about that shortly. And if you've seen Silicon Valley, the HBO series, the "hot dog, not hot dog" app: that's actually a real app you can download, and it was actually built by Tim Anglade as a fast.ai student project. So there's a lot of cool stuff that you can do. And yes, it was Emmy nominated; I think we only have one Emmy-nominated fast.ai alumnus at this stage, so please help change that.
All right, the other thing to know is that forum threads can kind of turn into these really cool things. Francisco, who's actually here in the audience, was a really boring McKinsey consultant, like me; Francisco and I both have this shameful past, that we were McKinsey consultants, but we left and we're okay now. He started this thread, saying: this stuff we've just been learning about, building NLP in different languages, let's try to do lots of different languages. We started this thing called the language model zoo, and out of that, there's now been an academic competition won in Polish, which led to an academic paper; Thai state of the art; German state of the art; basically, students have been coming up with new state-of-the-art results across lots of different languages, and this is all being done entirely by students working together through the forum.
So please get on the forum, but don't be intimidated. Remember, everybody you see on the forum, or at least the vast majority of the people posting, post all the damn time, right? They've been doing this a lot, and they do it a lot of the time, so at first it can feel intimidating, because it can feel like you're the only new person there. But you're not. All of you people in the audience, everybody who's watching, everybody who's listening: you're all new people. So when you just get out there and say, okay, all you people getting these state-of-the-art results in German language modeling, I can't start my server, I tried to click on the notebook and I got an error, what do I do? People will help you. Just make sure you provide all the information: I'm using Paperspace, this was the particular instance I tried to use, here's a screenshot of my error. People will help you. Or, if you've got something to add: if people were talking about crop yield analysis, and you're a farmer, and you think, oh, I've got something to add, then please mention it, even if you're not sure it's exactly relevant. It's fine. Just get involved, because remember, everybody else on the forum started out also intimidated. We all start out not knowing things. So just get out there and try it.
So let's get back and do some more coding. Yes, Rachel? We have a question about why you're using ResNet as opposed to Inception. So the question is about this architecture. There are lots of architectures to choose from, and it would be fair to say there isn't one best one, but if you look at things like the Stanford DAWNBench benchmark of ImageNet classification, you'll see that first place, second place, third place and fourth place (fast.ai with Jeremy Howard, a team of our students and collaborators, the Department of Defense innovation team, Google) all used ResNet: ResNet, ResNet, ResNet, ResNet. It's good enough, okay?
There are other architectures. The main reason you might want a different architecture is if you want to do edge computing, that is, if you want to create a model that's going to sit on somebody's mobile phone. Having said that, even then, most of the time I reckon the best way to get a model onto somebody's mobile phone is to run it on your server and then have your mobile phone app talk to it; it really makes life a lot easier, and you get a lot more flexibility. But if you really do need to run something on a low-powered device, then there are some special architectures for that. The particular question was about Inception, which is another architecture; it tends to be pretty memory intensive, and it's also not terribly resilient. One of the things we try to show you is stuff that just tends to always work, even if you don't quite tune everything perfectly, and ResNet tends to work pretty well across a wide range of different details and choices that you might make. So I think it's a pretty good default.
So we've got this trained model, and what's actually happened, as we'll learn, is that it's basically created a set of weights. If you've ever done anything like a linear regression or logistic regression, you'll be familiar with coefficients: we basically found some coefficients and parameters that work pretty well, and it took us a minute and 56 seconds. So if we want to do some more playing around and come back later, we probably should save those weights; that way we save that minute and 56 seconds. You can just call learn.save and give it a name. It's going to put the weights in a models subdirectory in the same place the data came from, so if you save different models, or different data bunches from different datasets, they'll all be kept separate; don't worry about it.
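For reference, that call looks like this in fastai v1 (the library version this course uses); 'stage-1' is the name the lesson refers back to later:

```python
learn.save('stage-1')  # writes models/stage-1.pth alongside the data
```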
All right. We've talked about how the two most important things are what goes into your model and what comes out. We've seen one way of seeing what goes in; now let's see what comes out, because this is the other thing you need to get really good at. To see what comes out, we can use this class called ClassificationInterpretation, and we're going to use this factory method from_learner. We pass in a learn object. Remember, a learn object knows two things: what's your data, and what is your model. It's now not just an architecture; it's actually a trained model, and that's all the information we need to interpret it. So we pass in the learner, and we now have a ClassificationInterpretation object.
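As it appears in the lesson notebook, assuming learn is the learner we just trained:

```python
interp = ClassificationInterpretation.from_learner(learn)
```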
One of the things we can do, and perhaps the most useful thing, is called plot_top_losses. We're going to be learning a lot about this idea of loss functions shortly, but in short, a loss function is something that tells you how good your prediction was. Specifically, if you predicted one class with great confidence, you said, I am very, very sure that this is a Birman, but actually you were wrong, then that's going to have a high loss, because you were very confident about the wrong answer. So that's what it basically means to have a high loss. By plotting the top losses, we are going to find out the things we were most wrong about, or most confident about yet got wrong.
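The call from the notebook; the 9 and the figsize are just display choices:

```python
interp.plot_top_losses(9, figsize=(15, 11))
```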
You can see here it prints out things on top of each image: a predicted breed name like german_shorthaired, and numbers like 7.04 and 0.92. But what do they mean? Perhaps we should look at the documentation.
We've already seen help, and help just prints out a quick little summary. But if you want to really see how to do something, use doc. doc tells you the same information as help, but it has this very important thing, which is 'Show in docs'. When you click on 'Show in docs', it pops up the documentation for that method or class or function or whatever. It starts out by showing the same information about what parameters it takes, along with the docstring, but then it tells you more. In this case, it tells me that the title of each image shows the prediction, the actual, the loss, and the probability that was predicted. And you can see there's actually some code you can run; the documentation always has working code. In this case it was trying things with handwritten digits: the first one was predicted to be a seven, it was actually a three, the loss is 5.44, and the probability of the actual class was 0.07. So we did not have a high probability associated with the actual class. I can see why it thought this was a seven; nonetheless, it was wrong.
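Calling it looks like this; any fastai function or class works as the argument:

```python
doc(interp.plot_top_losses)  # shows the signature and docstring, plus a "Show in docs" link
```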
So this is the documentation, and it's your friend when you're trying to figure out how to use these things. The other thing I'll mention is that if you're a somewhat experienced Python programmer, you'll find the source code of fastai really easy to read. We try to write everything in much less than half a screen of code, generally four or five lines of code.
If you click 'source', you can jump straight to the source code. So here is plot_top_losses, and this is also a great way to find out how to use the fastai library, because nearly every line of code here is calling stuff in the fastai library. So don't be afraid to look at the source code. I've got another really cool trick about the documentation that you're going to see a little bit later.
Okay, so that's how we can look at the top losses, and this is perhaps the most important image classification interpretation tool that we have, because it lets us see what we are getting wrong. Quite often, like in this case, if you're a dog and cat expert, you'll realize that the things it's getting wrong are breeds that are actually very difficult to tell apart, and you'd be able to look at these and say, oh, I can see why they got this one wrong. So this is a really useful tool. Another useful tool is something called a confusion matrix, which basically shows you, for every actual type of dog or cat, how many times it was predicted to be each type. Unfortunately, in this case, because the model is so accurate, the diagonal basically shows that it's pretty much right all the time, and for the slightly darker cells, like a five here, it's really hard to read exactly what the combination is.
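The call, with a larger figure because 37 classes make for a dense plot (the figsize and dpi are just display choices):

```python
interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)
```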
So if you've got lots of classes, I suggest you don't use the confusion matrix; instead, and this is my favourite named function in fastai, I'm very proud of this, you can call most_confused. most_confused will simply grab out of the confusion matrix the particular combinations of predicted and actual that it got wrong most often. So in this case, Staffordshire bull terrier was what it should have predicted, and instead it predicted American pit bull terrier, and so forth; it should have predicted Ragdoll and actually predicted Birman, and that happened four times; this particular combination happened six times. This is again a very useful thing, because you can look at it and ask: with my domain expertise, does it make sense that the model would be confused about that? So these are some of the kinds of tools you can use to look at the output.
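In code:

```python
interp.most_confused(min_val=2)  # (actual, predicted, count) combinations, most frequent first
```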
Now let's make our model better. How do we make it better? We can make it better using fine-tuning. So far we fitted four epochs, and it ran pretty quickly. The reason it ran pretty quickly is that there was a little trick we used. These deep learning models, these convolutional networks, have layers. We'll learn a lot about exactly what layers are, but for now, just know that the model goes through a sequence of computations. What we did was add a few extra layers to the end, and we only trained those; we basically left most of the model exactly as it was. That's really fast, and if we're trying to build a model of something that's similar to what the original pretrained model was trained on, in this case similar to the ImageNet data, that works pretty well. But what we really want to do is actually go back and train the whole model. This is why we pretty much always use this two-stage process. By default, when we call fit or fit_one_cycle on a convnet learner, it'll just fine-tune these few extra layers added to the end, and it'll run very fast and basically never overfit. But to really get it good, you have to call unfreeze. unfreeze is the thing that says, please train the whole model. Then I can call fit_one_cycle again, and... oh, the error got much worse.
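Stage two in code:

```python
learn.unfreeze()        # make the whole model trainable, not just the added head
learn.fit_one_cycle(1)  # with a single default learning rate, this makes the error worse, as shown
```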
Why? In order to understand why, we're actually going to have to learn more about exactly what's going on behind the scenes. So let's start out by trying to get an intuitive understanding of what's going on behind the scenes, and again, we're going to do it by looking at pictures. We're going to start with this picture. These pictures come from a fantastic paper by Matt Zeiler, who nowadays is CEO of Clarifai, a very successful computer vision startup, and his PhD supervisor, Rob Fergus. They created a paper showing how you can visualize the layers of a convolutional neural network. We'll learn mathematically about what the layers are shortly, but the
basic idea is that your red, green and blue pixel values, which are numbers from nought to 255, go into a simple computation, the first layer, and something comes out of that. The result of that goes into a second layer, that goes into a third layer, and so forth, and there can be up to a thousand layers in a neural network. ResNet-34 has 34 layers; ResNet-50 has 50 layers. But let's look at layer one. There's this very simple computation; it's a convolution, if you know what they are, and we'll learn more about them shortly. What comes out of this first layer? Well, we can actually visualize these specific coefficients, the specific parameters, by
drawing them as a picture. There are actually a few dozen of them in the first layer, so we won't draw all of them; let's just look at nine at random. Here are nine examples of the actual coefficients from the first layer. These operate on groups of pixels that are next to each other, and this first one basically finds groups of pixels that have a little diagonal line in this direction; this one finds diagonal lines in the other direction; this one finds gradients that go from yellow to blue in this direction; this one finds gradients that go from pink to green in this direction; and so forth. Those are very, very simple little filters. That's layer one of an ImageNet-pretrained convolutional neural net. Layer two takes the results
of those filters and does a second layer of computation, which allows it to create more complex features. Here are nine examples of a way of visualizing the second-layer features, and you can see it's basically learned to create something that looks for corners, top-left corners; this one has learned to find right-hand curves; this one has learned to find little circles. So, and this is maybe the easiest way to see it, in layer one we had things that could find just one line, and in layer two we can find things that have two lines joined up, or one line repeated. If you then look over here, these nine show you nine examples of actual bits of actual photos that activated this filter a lot. In other words, this little math function here was good at finding window corners and stuff like that, and this little curly one was very good at finding bits of photos that had circles in them. So this is the kind of stuff you've got to get a really good intuitive understanding of: the start of your neural net is going to find very simple gradients and lines, the second layer can find very simple shapes, and the third layer can find combinations of those. So now we can find repeating
patterns of two-dimensional objects, or we can find things that join together, or we can find... well, what are these things? Let's find out what this is: let's go and have a look at some bits of pictures that activated this one highly. Oh, mainly they're bits of text, although sometimes windows; so it seems to be able to find repeated horizontal patterns. This one here seems to find edges of fluffy or flowery things, and this one here is finding geometric patterns. So layer three was able to take all the stuff in layer two and combine it together, and layer four can take all the stuff from layer three and combine it together. By layer four, we've got something that can find dog faces, and let's see what else we've got here... yeah, here we have bird legs. So you kind of get the idea: by layer five, we've got something that can find the eyeballs of birds and lizards, or the faces of particular breeds of dogs, and so forth. So you can see how, by the time you get to layer 34, you can find specific dog breeds and cat breeds. This is kind of how it works.
So when we first fine-tuned that pretrained model, we kept all of these layers that you've seen so far, and we just trained a few more layers on top of all of those sophisticated features that were already being created. Now, when we're fine-tuning, we're going back and saying: let's change all of these. We'll start with them where they are, but let's see if we can make them better. Now, it seems very unlikely that we can make the layer-one features better. Is it likely that the definition of a diagonal line is going to be different when we look at dog and cat breeds versus the ImageNet data this was originally trained on? No; so we don't really want to change layer one very much, if at all. Whereas the last layers, this thing of, like, types of dog face: it seems very likely that we do want to change that. So you want this intuitive understanding that the different layers of a neural network represent different levels of semantic complexity. This is why our attempt to fine-tune this model didn't work: by default, it trains all the layers at the same speed, which is to say it will update the things representing diagonal lines and gradients just as much as it tries to update the things that represent the exact specifics of what an eyeball looks like. So we have to change that.
To change it, we first of all need to go back to where we were before, because we just broke this model; it's much worse than it started out. So if we just call learn.load, that brings back the model that we saved earlier; remember, we saved it as stage-1. So let's go ahead and load that back up, so our model is now back to where it was before we broke it.
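That is:

```python
learn.load('stage-1')  # restore the weights we saved before unfreezing
```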
Now let's run the learning rate finder. We'll learn about what that is next week, but for now, just know that this is the thing that figures out the fastest rate at which I can train this neural network without making it zip off the rails and get blown apart. We can call learn.lr_find, and then we can call learn.recorder.plot, and that will plot the result of our LR finder. What this basically shows you is this parameter, which we're going to learn all about, called the learning rate. The learning rate basically says how quickly I am updating the parameters in my model. The bottom axis shows what happens as I increase the learning rate, and this axis shows the loss. You can see that once the learning rate gets past 1e-4, my loss gets worse.
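The two calls:

```python
learn.lr_find()        # runs a mock training pass while sweeping the learning rate
learn.recorder.plot()  # plots loss against learning rate
```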
It so happens, and in fact I can check this if I press shift-tab here, that my learning rate defaults to 0.003, so my default learning rate is about here, and you can see why my loss got worse: now that we've fine-tuned things, we can't use such a high learning rate. So, based on the learning rate finder, I tried to pick something well before it started getting worse, and I decided to pick 1e-6. But there's no point training all the layers at that rate, because we know the later layers worked just fine before, when we were training much more quickly, at the default, which as a reminder was 0.003.
So what we can actually do is pass a range of learning rates, and we do it like this: you use this keyword, which in Python you may have come across, called slice. It can take a start value and a stop value, and basically what this says is: train the very first layers at a learning rate of 1e-6, train the very last layers at a rate of 1e-4, and distribute all the other layers across that range, equally spread between those two values. We're going to see that in a lot more detail, but for now, a good rule of thumb is: after you unfreeze, pass a max learning rate parameter, pass it a slice, and make the second part of that slice about ten times smaller than your first stage. Our first stage defaulted to about 1e-3, so let's use about 1e-4 there. The first part should be a value from your learning rate finder that is well before things started getting worse; you can see things starting to get worse maybe about here, so I picked 1e-6, something at least ten times smaller than that.
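Put together, as in the notebook:

```python
learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))  # earliest layers at 1e-6, last layers at 1e-4
```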
If I do that, I get 0.057880. I don't quite remember what we got before, but it's a bit better: we've gone down from a 6.1% error to a 5.7% error, so that's about a ten percent relative improvement, with another 58 seconds of training. I would say, for most people most of the time, these two stages are enough to get a pretty much world-class model. You won't win a Kaggle competition, particularly because a lot of fast.ai alumni are now competing on Kaggle and this is the first thing that they do, but in practice you'll get something that's about as good as what the vast majority of practitioners can do. We can improve it
by using more layers, and we'll do this next week, by basically using a ResNet-50 instead of a ResNet-34. You can try running this during the week if you want to; you'll see it's exactly the same as before, but using resnet50 instead of resnet34. What you'll find is that if you try to do this, it's very likely you will get an error, and the error will be that your GPU ran out of memory. The reason for that is that ResNet-50 is bigger than ResNet-34, so it has more parameters, and therefore it uses more of your graphics card memory, which is totally separate from your normal computer RAM; this is GPU RAM. If you're using the kind of default Salamander, AWS, and so forth suggestions, then you'll have 16 gig of GPU memory; the card I use most of the time has 11 gig of GPU memory; the cheaper ones have 8 gig. That's kind of the main range you tend to get; if yours has less than 8 gig of GPU memory, it's going to be frustrating for you. Anyway, you'll be somewhere around there, and it's very likely that when you try to run this, you'll get an out-of-memory error. That's because it's just trying to do too much, too many parameter updates, for the amount of RAM you have, and it's easily fixed: this ImageDataBunch constructor has a parameter at the end, bs, for batch size, which basically says how many images you train at one time. If you run out of memory, just make it smaller. This worked for me on an 11 gig card; it probably won't work for you if you've got an 8 gig card, so if that happens, just make it 32. It's fine to use a smaller batch size; it just might take a little bit longer. If you've got a big card, like a 16 gig one, you might be able to get away with 64. So that's just one number you'll need to try during the week.
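Here's a sketch of that change. It assumes the same path_img, fnames and pat as earlier in the notebook, and the image size, batch size and epoch count are illustrative choices rather than prescriptions:

```python
# Same pipeline as before, but a bigger architecture and a smaller batch size.
data = ImageDataBunch.from_name_re(path_img, fnames, pat,
                                   ds_tfms=get_transforms(), size=299,
                                   bs=48  # illustrative; drop to 32 or lower if you run out of GPU memory
                                  ).normalize(imagenet_stats)
learn = create_cnn(data, models.resnet50, metrics=error_rate)  # renamed cnn_learner in later fastai v1 releases
learn.fit_one_cycle(8)
```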
And again, we fitted it for a while, and we got down to a 4.4% error rate, so this is pretty extraordinary.
You know, I was pretty surprised, because when we first did cats versus dogs in the first version of this course, we were getting somewhere around a 3% error, and that was for something where you've got a fifty percent chance of being right and the two things look totally different. So the fact that we can get a 4.4% error for such a fine-grained distinction between breeds is quite extraordinary. In this case, I unfroze it and fitted a little bit more, and it went from 4.4% to 4.35%, a tiny improvement; basically, ResNet-50 is already a pretty good model. It's interesting, because you can again call most_confused here and see the kinds of things that it's getting wrong. Depending on when you run it, you're going to get slightly different numbers, but you'll get roughly the same kinds of things. Quite often, I find that Ragdoll and Birman are things that it gets confused about, and I had actually never heard of either of those things, so I looked them up on the internet, and I found a page on a cat site called 'Is this a Birman or a Ragdoll?', with a long thread of people arguing intensely about which it is. So I feel fine that my computer had problems. I found something similar, I think it was with pit bull versus Staffordshire bull terrier: apparently the main difference is the particular kennel club guidelines as to how they are assessed, though some people think that one of them might have a slightly redder nose. So this is the kind of stuff where, even if you're not a domain expert, it helps you become one, because I now know more about which kinds of pet breeds are hard to identify than I used to. So model interpretation works both ways. What I
want you to do this week is run this notebook and make sure you can get through it. But then what I really want you to do is get your own image dataset. Francisco, who I mentioned earlier (he started the language model thread, and he's now helping to TA the course), is actually putting together a guide that will show you how to download data from Google Images, so you can create your own dataset to play with. But before that, I want to show you how to create labels in lots of different ways, because your dataset, wherever you get it from, won't necessarily come in that kind of regex-friendly format; it could come in lots of different formats. To show you how to do this, I'm going to use the MNIST sample. MNIST is pictures of hand-drawn numbers, and I'm using it just because I want to show you different ways of creating these datasets. The MNIST sample basically looks like this: if I go path.ls(), you can see it's got a training set and a validation set already, so the people that put together this dataset have already decided what they want you to use as a validation set. If you go (path/'train').ls(), you'll see there's a folder called 3 and a folder called 7.
Now, this is a really, really common way to give people labels. It basically says: everything that's a three, I'll put in a folder called 3; everything that's a seven, I'll put in a folder called 7. This is often called an ImageNet-style dataset; it's how ImageNet itself is distributed. So if you have something in this format, where the labels are just whatever the folders are called, you can say from_folder, and that will create an ImageDataBunch for you. As you can see, it's created the labels, 3 and 7, just by using the folder names.
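As in the lesson notebook (size=26 is that notebook's choice):

```python
tfms = get_transforms(do_flip=False)  # don't flip digits; a mirrored 3 isn't a 3
data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)
```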
And as you can see, we can train that to 99.5% accuracy, and so on. Another possibility, and for this MNIST sample I've got both: it might come with a CSV file that looks something like this, giving the label for each file name. In this case the labels aren't three or seven; they're 0 or 1, basically 'is it a 7 or not'. So that's another possibility. If this is how your labels are, you can use from_csv, and if the file is called labels.csv, you don't even have to pass in a file name; if it's called anything else, you can pass it in via the csv_labels parameter. So that's how you can use a CSV. There it is: this is now 'is it a 7 or not', and then you can call data.classes to see the classes.
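In code (the size is again just that notebook's choice):

```python
data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)  # reads labels.csv by default
data.classes  # -> [0, 1]
```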
Another possibility, as we've seen, is that you've got paths that look like this; in this case it's the same thing, where I can grab the label using a regular expression, and here's the regular expression. We've already seen that approach, and again you can see that our classes have been found. But what if it's something that's in the file name or path, and it's not amenable to a regular expression because it's more complex? You can create an arbitrary function that extracts a label from the file name or path, and in that case you would say from_name_func.
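Both variants, roughly as they appear in the lesson notebook; fn_paths is the list of file paths built from the CSV just above:

```python
fn_paths = [path/name for name in df['name']]  # df is the labels.csv DataFrame

# label extracted from the path with a regular expression
pat = r"/(\d)/\d+\.png$"
data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)

# label computed by an arbitrary function of the path
data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,
                                     label_func=lambda x: '3' if '/3/' in str(x) else '7')
```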
Another possibility is that you need something even more flexible, and you're going to write some code to create an array of labels. In that case, you can just use from_lists: here I've created an array of labels, and then I pass in that array. So you can see there are lots of different ways of creating labels.
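For example:

```python
# any code that produces a list of labels will do
labels = ['3' if '/3/' in str(x) else '7' for x in fn_paths]
data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)
```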
So during the week, try this out. Now, you might be wondering, how would you know to do all these things? Where are you going to find this kind of information? How could you possibly know to do all this stuff? So I'll show you something incredibly cool. Let's grab this function, and remember, to get documentation we type doc, and here is the documentation for the function; I can click 'Show in docs' and it pops up the documentation. So here's the thing: every
single line of code I just showed you, I took this morning and copied and pasted from the documentation; you can see here the exact code that I just used. So the documentation for fastai doesn't just tell you what to do, but shows you, step by step, how to do it. And here is perhaps the coolest bit: if you go to the fastai/fastai_docs repository and click on docs_src, it turns out that all of our documentation is actually just Jupyter notebooks. In this case I was looking at vision.data, so here is the vision.data notebook. You can download this repo, you can git clone it, and if you run it, you can actually run every single line of the documentation yourself. So all of our docs are also code, and to me this is kind of the ultimate example of experimenting: you can now experiment with everything. You'll see on GitHub that it doesn't quite render properly, because GitHub doesn't quite know how to render notebooks, but if you git clone this and open it up in Jupyter, you can see it. So nearly everything in the documentation has actual working examples in it, with actual datasets that are already sitting there in the repo for you, and you can try every single function: try seeing what goes in, and try seeing what comes out. There's a question:
'Will the library use multi-GPU in parallel by default?' The library will use multiple CPUs by default, but just one GPU by default. If you want to use multiple GPUs in parallel, it's easy to do and you'll find instructions on the forum, but most people won't be needing that. 'And the second question is whether the library can handle 3D data, such as MRI.' Yes, it can, and there is actually a forum thread about that already, although it's not as developed as 2D yet; maybe by the time the MOOC is out, it will be.
Before I wrap up, I'll just show you an example of the kind of interesting stuff that you can do by doing this kind of exercise. Remember, earlier I mentioned that one of our alumni, who works at Splunk, a NASDAQ-listed, big, successful company, created this new fraud-detection software. This is actually how he created it, as part of a fast.ai part one class project: he took the telemetry of users who had Splunk analytics installed, watched their mouse movements, and created pictures of the mouse movements. He converted speed into colour, and right and left clicks into splotches. He then took the exact code that we just saw (with an earlier version of the software), trained a CNN in exactly the way we saw, and used that to train his fraud model. So he basically took something that is not obviously a picture and turned it into a picture, and got fantastically good results for a piece of fraud analysis software. So it pays to think creatively. If you want to study sounds, for example, a lot of people who study sounds do it by creating a spectrogram image and then sticking that into a convnet, as in the sketch below. So there's a lot of cool stuff you can do with this.
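This sketch isn't from the lesson; it's a minimal illustration of the spectrogram idea using only numpy and matplotlib, with a synthetic signal standing in for real audio:

```python
import numpy as np
import matplotlib.pyplot as plt

sr = 16000                                         # sample rate in Hz (illustrative)
t = np.linspace(0, 1, sr)
signal = np.sin(2 * np.pi * (200 + 400 * t) * t)   # a synthetic rising tone

plt.specgram(signal, Fs=sr)  # short-time Fourier transform, drawn as a 2-D heatmap
plt.axis('off')
plt.savefig('spectrogram.png', bbox_inches='tight', pad_inches=0)  # now it's just an image a CNN can classify
```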
So during the week: get your GPU going, try to use your first notebook, make sure that you can work through lesson one, and then see if you can repeat the process on your own dataset. Get on the forum and tell us any little successes you had, like, 'I spent three days trying to get my GPU running and I finally did it.' For any constraints you hit, try working on them for an hour or two, but if you get stuck, please ask. And if you're able to successfully build a model with a new dataset, let us know. I will see you next week.