Deforestation accounts for nearly 20% of all global carbon emissions, and is responsible for trillions of dollars of economic loss. With 80% of all Amazon timber being illegally logged, a way to detect and deter it is desperately needed. However, there are a number of obstacles. Given the vast and dense nature of rainforests, rangers simply don't have the resources or manpower to physically monitor thousands of acres.
A not-for-profit called Rainforest Connection, led by Topher White, has devised an effective and resourceful solution. Old mobile phones are collected, equipped with solar panels and placed in the branches of trees, where they listen for the sounds of logging trucks and chainsaws. One mobile phone can detect illegal logging over a kilometer away, protecting over 300 hectares of rainforest and preventing 15,000 tons of CO2 release - more than 3,000 cars release in a year.
The detection algorithm behind this is driven by TensorFlow, a deep learning framework developed by the Google Brain Team. I used TensorFlow to develop a model that is at least 93% accurate at detecting distant chainsaw noises in the forest.
Due to the lack of rainforests and illegal loggers in my immediate vicinity, I used the scaper library to simulate a rainforest soundscape. What this code does is overlay various chainsaw noises (from YouTube) on top of a variety of rainforesty noises (also from YouTube), including thunderstorms, animals and flowing water. In order to give a wide representation of possible chainsaw noises, the samples are randomly varied in volume and pitch.
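To make the overlay idea concrete, here is a minimal numpy sketch of the mixing step. It is not the scaper code I used: the `overlay` function, the gain range and the synthetic stand-in signals are all assumptions for illustration, and it only varies volume and position, not pitch.

```python
import numpy as np

def overlay(background, event, rng, gain_db_range=(-30.0, -10.0)):
    """Mix `event` into `background` at a random offset and a random gain."""
    if len(event) > len(background):
        raise ValueError("event must not be longer than background")
    gain_db = rng.uniform(*gain_db_range)
    gain = 10.0 ** (gain_db / 20.0)  # decibels -> linear amplitude
    start = int(rng.integers(0, len(background) - len(event) + 1))
    mixed = background.copy()
    mixed[start:start + len(event)] += gain * event
    return mixed, start, gain_db

rng = np.random.default_rng(0)
sr = 8000  # sample rate in Hz (assumed)
background = 0.1 * rng.standard_normal(sr * 10)   # noise standing in for rainforest audio
t = np.arange(sr * 2) / sr
chainsaw = np.sign(np.sin(2 * np.pi * 110 * t))   # harsh buzz standing in for a chainsaw
mixed, start, gain_db = overlay(background, chainsaw, rng)
```

The negative gain range keeps the chainsaw quieter than it was recorded, which roughly mimics a saw heard from a distance.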
Here are two example soundscapes, the first without a chainsaw and the second with. Can you hear it?
While there have been great advances in visual deep learning, auditory deep learning remains primitive. A spectrogram is a visual representation of a sound, where the strength of each frequency is plotted against time. Convolutional Neural Networks (CNNs) are pretty good at recognising signals in images - like chainsaw noises in spectrograms. In order to leverage the power of CNNs, we have to convert the raw audio data into an image format using the librosa Python library.
This is an example spectrogram and its associated soundscape. See if you can hear some of the features:
Originally the images were 600x400, but this was way too much for my GPU to handle. Scaling down the images to 100x100 produced faster and more accurate results. There is a 50/50 split between soundscapes with chainsaws and soundscapes without.
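For a feel of what a spectrogram computation involves, here is a rough numpy-only sketch of a short-time Fourier transform. It is not the librosa call used for this project; the function name `spectrogram_db` and the `n_fft` and `hop` values are assumptions chosen for illustration.

```python
import numpy as np

def spectrogram_db(signal, n_fft=512, hop=128):
    """Return a (n_fft // 2 + 1, n_frames) log-power spectrogram."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping windowed frames, one row per frame.
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Transpose so rows are frequency bins and columns are time steps.
    return 10.0 * np.log10(power.T + 1e-10)

sr = 8000                               # sample rate in Hz (assumed)
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)     # one second of a 1 kHz test tone
spec = spectrogram_db(tone)
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sr / 512           # convert bin index back to Hz
```

Each column of `spec` is the frequency content of one short slice of audio, so a steady tone shows up as a horizontal stripe at its frequency.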
The following snippet breaks the sample images into train, test and evaluation sets. There are 2000 images in total, 100 in evaluation, and the rest split 80/20 into train/test.
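A minimal sketch of that split, assuming the images are addressed by file path. The file names, the `split_dataset` helper and the seed are hypothetical, and a plain shuffle like this does not guarantee an exact 50/50 class balance in each set.

```python
import random

def split_dataset(paths, n_eval=100, train_fraction=0.8, seed=42):
    """Shuffle, hold out `n_eval` items, then split the rest into train/test."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)
    evaluation = shuffled[:n_eval]
    rest = shuffled[n_eval:]
    n_train = round(len(rest) * train_fraction)
    return rest[:n_train], rest[n_train:], evaluation

# Hypothetical file names standing in for the 2000 spectrogram images.
paths = [f"spectrogram_{i:04d}.png" for i in range(2000)]
train, test, evaluation = split_dataset(paths)
print(len(train), len(test), len(evaluation))  # 1520 380 100
```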
Then, we train a model with two convolutional layers and a single Dense output. Here's each of the layers, briefly explained:
The convolutional layer scans over the image, filtering the important features using a 3x3 'eye'. The weights in the filter determine what the output value is. (Image source: https://www.freecodecamp.org/news/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050/)
The max pooling layer takes the output of the convolutional layer and simplifies the matrix by keeping only the largest value from each 2x2 square. This reduces the amount of work the GPU has to do, whilst retaining the most important information. (Image source: https://medium.com/@adityaraj_64455/it-all-started-with-cnns-alexnet-3023b21bb891)
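To see what a 3x3 filter does, here is a small numpy sketch of a single convolution pass. The kernel weights are hand-picked for illustration (in the real network they are learned), and `conv2d_valid` is a hypothetical helper, not a TensorFlow function.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the weighted patches."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.zeros((5, 5))
image[2, :] = 1.0  # a horizontal line, like a steady tone in a spectrogram

# Hand-picked weights that respond to horizontal lines.
kernel = np.array([[-1.0, -1.0, -1.0],
                   [ 2.0,  2.0,  2.0],
                   [-1.0, -1.0, -1.0]])
features = conv2d_valid(image, kernel)
# features:
# [[-3. -3. -3.]
#  [ 6.  6.  6.]
#  [-3. -3. -3.]]
```

The output is largest where the filter is centred on the line, which is how a filter ends up highlighting one kind of feature.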
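Taking the largest value from each 2x2 square can be sketched in a few lines of numpy. The `max_pool_2x2` helper below is for illustration only, not the framework's implementation.

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the largest value in each non-overlapping 2x2 block."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]  # drop a trailing odd row/column, if any
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]])
pooled = max_pool_2x2(x)
# pooled:
# [[4 2]
#  [2 8]]
```

The 4x4 input becomes 2x2, so the next layer has a quarter of the values to process.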
The dropout layer randomly removes neurons to prevent overfitting - basically the neural network cheating by memorising the images. A dropout rate of 0.4 means 40% of the neurons will be randomly removed at each batch.
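Here is a rough numpy sketch of dropout at training time. It uses the common "inverted dropout" variant, where the surviving values are scaled up so the average activation stays the same; the `dropout` helper is an illustration, not the layer's actual code.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero out roughly `rate` of the values and rescale the survivors."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
activations = np.ones(10_000)
dropped = dropout(activations, rate=0.4, rng=rng)
zero_fraction = float(np.mean(dropped == 0))  # close to 0.4
```

A fresh random mask is drawn on every call, so the network can't rely on any single neuron being present.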
The flatten layer converts the 2D image into a long 1D column for the Dense layer to process. The Dense layer's activation is sigmoid, meaning the output of the neural network is a value between 0 and 1. A value close to 0 suggests the network is confident there is no chainsaw in the sample, and a value close to 1 suggests there is.
The network was trained for about 2 minutes over 200 epochs on a GTX 1660 Ti GPU. The results were fantastic - close to 97% validation accuracy and 0.057 loss with a pretty healthy-looking curve.
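The flatten-then-sigmoid step can be sketched with numpy as follows. The weights here are random rather than trained, the feature map shape is made up, and the 0.5 decision threshold is an assumption rather than something fixed by the model.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def dense_sigmoid(feature_maps, weights, bias):
    """Flatten the feature maps and apply a single sigmoid output unit."""
    flat = feature_maps.reshape(-1)
    return float(sigmoid(flat @ weights + bias))

rng = np.random.default_rng(1)
feature_maps = rng.random((4, 4, 2))        # made-up output of the earlier layers
weights = rng.standard_normal(4 * 4 * 2)    # random stand-in for trained weights
score = dense_sigmoid(feature_maps, weights, bias=0.0)
has_chainsaw = score >= 0.5                 # assumed decision threshold
```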
On 100 evaluation images the network has never seen before, the model scored 93%!
I think there is huge potential for auditory machine learning and IoT devices. Privacy concerns aside, a web of internet-connected, solar-powered microphones running a simple TensorFlow model could detect poaching, traffic congestion, and trespassing over huge areas with little maintenance.