Deep learning applications that I want to build
LeafNet: an app that recognizes a tree from a photo
I don't know much about trees. Experts can pinpoint a tree's exact name just by looking at its leaves, but I can't, so I want an application that recognizes a tree from a photo of its leaf.
You could use it for fun, for learning about trees, or for research. If you are ever in a forest full of trees you don't recognize, use LeafNet. It would also tell you whether the leaves you find are edible or poisonous.
An application for instantaneous class attendance checking.
Normally, teachers call the students' names one by one to check attendance. That is slow and wastes lecture time. I want to build an app where teachers just use their smartphone to take a group photo of the students in the classroom. Almost instantly, the app lists the names of all students in the photo, ranked by how confident it is about each face. The data would then be transferred to and stored on the university's official web server.
We could extend it to understand facial expressions so that we can evaluate students' understanding of the material at any given point in time. The possibilities are endless.
No more calling names! "Somchai; Present! Somying; Present! Chanchana; ... Chanchana; So Chanchana is absent! ..." All of these phrases need to be gone.
An application for preventing crime, especially car theft
CCTVs are great for reviewing a crime after it has ALREADY happened. I want to make CCTVs smarter by making them detect crime earlier. How early? As soon as a camera sees a suspicious scene unfolding, it should alert the security guard to go check. If we make it smart enough, it would feel like having a sharp-eyed person who never gets tired watching the screen for you at all times. That's a super high quality security guard right there!
How to make the app
The simple solution (I think) is to use deep learning to recognize cars and their owners' faces. Then we take a frame from the video and detect the cars and people in it. A simple heuristic is to check whether the person standing right next to a recently detected car is its owner. If the car moves but the person driving it is not the owner, ring the alarm! We could do many things to improve it, like classifying whether or not the person looks suspicious. But you get the idea.
We then do this for every CCTV frame, because inference with deep convolutional neural networks is fast. Depending on the model and the hardware, you could do this tens or even a hundred times per second. The benefit of this application is through the roof!
Install CCTVs and feed their video in real time into the deep learning model running on a low-end computer. Wait for the computer to yell BEWARE THIEF and you are set!
Now you can sleep well, because fewer thieves will dare to steal your car during the night. This applies to all types of vehicles, including motorcycles and buses. You could also detect helmet thieves, because helmets are expensive nowadays too. Again, the possibilities are endless. These are just small examples of use cases.
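Here is a minimal sketch of the owner-check heuristic described above. It assumes the deep learning models have already turned a frame into detections; the `Detection` class, the `OWNERS` registry and the `should_alarm` function are all hypothetical names I made up for illustration, not part of any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    """One car seen in a frame (hypothetical output of the recognition models)."""
    car_id: str                    # which registered car the classifier thinks this is
    car_moving: bool               # whether the car moved since the previous frame
    nearest_person: Optional[str]  # identity of the closest recognized person, or None


# Hypothetical registry mapping each known car to the people allowed to drive it.
OWNERS = {"car-001": {"somchai"}, "car-002": {"somying"}}


def should_alarm(det: Detection, owners=OWNERS) -> bool:
    """Return True when a registered car moves and the person with it is not an owner."""
    if not det.car_moving:
        return False  # a parked car is not suspicious under this heuristic
    allowed = owners.get(det.car_id)
    if allowed is None:
        return False  # unknown car: there is no owner to compare against
    # An unrecognized person (None) also counts as "not the owner".
    return det.nearest_person not in allowed
```

Calling `should_alarm` once per detection per frame is all the "yell BEWARE THIEF" logic needs; everything hard lives in the models that produce the detections.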
Algorithms and strategies I have discovered
Injecting Domain Knowledge into Neural Networks
Injecting domain knowledge makes deep learning more tractable for ordinary people who do not own a huge amount of data like Google or Facebook do. The model would generalize faster if you teach it the concepts you already know instead of letting it learn everything from scratch. ConvNets and recurrent nets are good examples of modifications to the good old plain feed-forward network that make learning from grid data and sequences much more feasible. The same applies to domain knowledge: if a network has been taught good domain knowledge, it will take less time to learn and will generalize better with less training data.
The first revision of "Nice-Student Network"
The first simple version I can think of is to concatenate hand-crafted features to the raw input features.
For example, say you have raw data like pixel values and you want to classify which digit a picture shows. (A good example of this is the MNIST dataset.)
Then you could pick a domain knowledge feature, like how many holes there are in the picture, to guide your network (8 has two holes, 6 has one, 1 has none, etc.). So you have hundreds of pixel values plus a single "holes" feature. Your input vector's shape should be something like [N+1], for example [784+1] for a 28x28 MNIST image.
Once the network has been fed a crucial feature like that, it will try to use it for inference. Because the feature is so useful, the network can use it to eliminate many classes that don't match the given number of holes.
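To make the first revision concrete, here is a rough sketch of computing a "holes" feature and appending it to the flattened pixels. The hole counter is my own assumption about how such a feature could be computed: it counts background regions that are fully enclosed by ink, using a plain flood fill. The names `count_holes` and `with_holes_feature` and the `threshold` parameter are hypothetical.

```python
from collections import deque


def count_holes(image, threshold=0.5):
    """Count background regions fully enclosed by ink in a 2D grid of intensities."""
    h, w = len(image), len(image[0])
    # Pad with a one-pixel background border so everything outside the digit
    # is guaranteed to form a single connected region.
    ink = [[False] * (w + 2)]
    for row in image:
        ink.append([False] + [p > threshold for p in row] + [False])
    ink.append([False] * (w + 2))

    seen = [[False] * (w + 2) for _ in range(h + 2)]
    regions = 0
    for r in range(h + 2):
        for c in range(w + 2):
            if ink[r][c] or seen[r][c]:
                continue
            # Found a new background region: flood fill it (4-connected).
            regions += 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h + 2 and 0 <= nx < w + 2
                            and not ink[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return regions - 1  # subtract the outer background; the rest are holes


def with_holes_feature(image):
    """Flatten the pixels and append the hole count: a vector of length N+1."""
    flat = [float(p) for row in image for p in row]
    return flat + [float(count_holes(image))]
```

On real handwritten digits the count will be noisy (a sloppy 6 may not close its loop), so treat it as a hint for the network rather than ground truth.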
The second revision of "Nice-Student Network"
The first revision's flaws
The network relies on the hand-crafted features you feed it. It doesn't need to learn; memorizing is enough! The problem with a brain that doesn't learn? It just sucks when no domain knowledge is fed in. If you neglect to put the "holes" feature into the input layer, the network will struggle to classify anything correctly!
Instead of feeding in the domain knowledge as the input, I would force the network to learn from the raw input data to create that hand-crafted feature inside some hidden units!
Put simply, I would encode the domain knowledge inside hidden layers instead of the input layer.
How to make the network learn to encode the domain knowledge
Select a handful of hidden units, then force them to output the hand-crafted features by defining some kind of loss on them (some would call it a 'cost function') and minimizing it! Simple, isn't it?
Ideally, you minimize this loss together with the actual label loss using gradient descent, so the network has to learn the hand-crafted features and the label at the same time.
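The combined objective could look something like the sketch below. This is only my guess at one reasonable form: a softmax cross-entropy on the label plus a weighted squared error that pushes chosen hidden units toward the hand-crafted features. The tiny tanh network, the function names, and the `aux_units` and `aux_weight` parameters are all assumptions for illustration. Because tanh outputs lie in [-1, 1], the feature targets are assumed to be scaled into that range first (for example holes / 2).

```python
import math


def forward(x, W1, W2):
    """Tiny network: a tanh hidden layer followed by linear class scores."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logits = [sum(w * hi for w, hi in zip(row, hidden)) for row in W2]
    return hidden, logits


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def nice_student_loss(x, label, feature_targets, W1, W2,
                      aux_units=(0,), aux_weight=0.5):
    """Label loss plus a penalty tying selected hidden units to hand-crafted features.

    feature_targets[i] is the (scaled) hand-crafted value that hidden unit
    aux_units[i] should reproduce. Returns (total, label_loss, aux_loss).
    """
    hidden, logits = forward(x, W1, W2)
    label_loss = cross_entropy(logits, label)
    aux_loss = sum((hidden[i] - t) ** 2
                   for i, t in zip(aux_units, feature_targets)) / len(aux_units)
    return label_loss + aux_weight * aux_loss, label_loss, aux_loss
```

The sketch only evaluates the loss for one example. In practice you would hand the same expression to a framework with automatic differentiation and let gradient descent minimize both terms at once. Note that the hand-crafted feature is needed only as a training target, not as an input at inference time.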
Benefits of the second revision
- The second revision doesn't rely on being fed the hand-crafted features. If we don't feed them, it won't starve, craving more features!
- The network will be less of a black box, because some hidden units will be more interpretable. The black box is a really troublesome problem for neural networks: they have long been good for accuracy but not so explainable to people who crave explanations.
- The network will perform better if our hand-crafted features are useful.
- The network will learn instead of recklessly memorizing the hand-crafted features.
- It should need less training data and less time to converge and generalize.
Extra thought: try to guess why I named it "Nice-Student Network".