
Use cases

Load, inspect, and manipulate audio files

```
from opensoundscape import Audio
Audio.from_file('/path/audio.wav')
```
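For example, a minimal sketch of common manipulations (the trim, resample, and save methods below match recent OpenSoundscape releases; check the Audio docs for your version):

```
from opensoundscape import Audio

# load a file and inspect basic properties
audio = Audio.from_file('/path/audio.wav')
print(audio.sample_rate, len(audio.samples))

# trim to the first 5 seconds, resample to 22.05 kHz, and save
# (method names match recent OpenSoundscape releases)
clip = audio.trim(0, 5).resample(22050)
clip.save('/path/audio_trimmed.wav')
```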

Head to the Audio tutorial notebook

See also:

  • docs for Audio class

Create and view spectrograms

```
from opensoundscape import Spectrogram, Audio

s = Spectrogram.from_audio(Audio.from_file('/path/audio.wav'))
s.plot()
```

Check out the Spectrogram tutorial notebook
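Spectrograms can also be manipulated before plotting; for instance, a minimal sketch using the bandpass method (the 1-5 kHz band is an illustrative choice):

```
from opensoundscape import Spectrogram, Audio

s = Spectrogram.from_audio(Audio.from_file('/path/audio.wav'))

# keep only the 1-5 kHz band before plotting
# (bandpass takes minimum and maximum frequencies in Hz)
s.bandpass(1000, 5000).plot()
```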

See also:

  • docs for Spectrogram class

Use an existing ML model to recognize sounds

Select your use case below, depending on where your pre-trained model comes from:

Model from the bioacoustics model zoo

See the README of the bioacoustics model zoo for a tutorial.

In short:

List the available models in the GitHub repo bioacoustics-model-zoo:

```
import torch
torch.hub.list('kitzeslab/bioacoustics-model-zoo')
```

Get a ready-to-use model object: choose from the models listed by the previous command.

```
model = torch.hub.load('kitzeslab/bioacoustics-model-zoo','rana_sierrae_cnn')
```

model is an OpenSoundscape CNN object, which you can use as usual. For instance, use the model to generate predictions on an audio file:

```
audio_file_path = './hydrophone_10s.wav'
scores = model.predict([audio_file_path], activation_layer='softmax')
scores
```
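The returned scores object is a pandas DataFrame with one column of scores per class, indexed by clip (file, start time, end time). A minimal sketch of pulling out high-scoring clips (the class name and 0.5 threshold are illustrative assumptions):

```
# scores: pandas DataFrame indexed by (file, start_time, end_time),
# one column per class; class name and threshold below are illustrative
detections = scores[scores['rana_sierrae'] > 0.5]
print(detections)
```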

Model file on your computer

Head to the CNN prediction tutorial notebook
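If you trained and saved a model with OpenSoundscape, a minimal loading sketch (the import path matches recent OpenSoundscape versions and the file path is a placeholder; older versions used opensoundscape.torch.models.cnn):

```
# load a CNN saved with OpenSoundscape's .save() method
# (import path matches recent versions; file path is a placeholder)
from opensoundscape.ml.cnn import load_model

model = load_model('./my_model.model')
scores = model.predict(['./hydrophone_10s.wav'])
```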

Train a model to recognize sounds

Select a use case from below:

Not sure where to begin?

  • Consider how much automated detection and classification will help you versus how much effort and time it will take to develop: how much data do you have? What sort of information do you want from it? Currently, AI approaches for acoustic monitoring are useful for detecting specific sounds in large datasets, such as those recorded by ARUs. More complex tasks such as counting individual organisms, monitoring behavior, or recognizing individuals are, at best, very difficult for current AI methods.
  • If you are trying to detect a relatively common bird species, check whether the BirdNET detector (https://github.com/kahst/BirdNET-Analyzer) works well enough for your needs. Make sure to check false positive and false negative rates on your field data.
  • If you are trying to detect bats, there may be an existing software tool that meets your needs. A quick Google search will get you started.
  • Do you have (or can you acquire) many (i.e., tens to hundreds of) examples of the sound you are interested in? If not, consider signal processing approaches. In particular, if the sound you are interested in contains regular periodic structure in time, like the calls of many frogs, toads, and insects, check out RIBBIT (https://github.com/kitzeslab/ribbit_manuscript_notebooks) and the accelerating series detector (https://github.com/orgs/kitzeslab/repositories?type=all).
  • If you do have labeled data, consider training a CNN using OpenSoundscape (see below).

Train a CNN using Raven annotations

Start with the notebook tutorial on preparing Raven annotations for training a CNN in OpenSoundscape, then proceed to CNN training below
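As a rough sketch of that workflow (argument names vary across OpenSoundscape versions, so check the annotations docs; file paths and clip parameters are illustrative):

```
# rough sketch: convert Raven selection tables into clip labels;
# argument names vary across OpenSoundscape versions
from opensoundscape.annotations import BoxedAnnotations

annotations = BoxedAnnotations.from_raven_files(
    ['./recording1.selections.txt'], audio_files=['./recording1.wav']
)

# one-hot labels for fixed-length clips, ready for CNN training
labels_df = annotations.one_hot_clip_labels(
    clip_duration=3.0, clip_overlap=0, min_label_overlap=0.25
)
```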

Train a CNN from a dataframe of labels

Head to the tutorial notebook for training a CNN
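As a minimal sketch of the workflow covered in that notebook (labels.csv is a hypothetical file of one-hot clip labels, and the architecture and hyperparameters are illustrative):

```
import pandas as pd
from opensoundscape import CNN

# one-hot labels indexed by (file, start_time, end_time), one column
# per class; 'labels.csv' is a hypothetical file
labels = pd.read_csv('labels.csv', index_col=[0, 1, 2])
train_df = labels.sample(frac=0.8)
validation_df = labels.drop(train_df.index)

# ResNet-18 backbone on 3-second clips; hyperparameters are illustrative
model = CNN('resnet18', classes=labels.columns.tolist(), sample_duration=3.0)
model.train(train_df, validation_df, epochs=10, batch_size=64, num_workers=4)
```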

If you're feeling good about those and want to keep making tweaks to improve your model, proceed to the deep end with the Customizing CNN Training notebook. 

Deep learning with distributed/cluster computing

OpenSoundscape supports GPU and multi-GPU acceleration of deep learning models during training and inference (prediction). Here are some tips to get you started:

  • the nvidia-smi command is usually the best way to monitor GPU usage and GPU memory usage.
    • example where we check GPU usage once per second for 100 seconds:
timeout 100 nvidia-smi --query-gpu=timestamp,pci.bus_id,utilization.gpu,memory.used,memory.total, --format=csv -l 1

(append >> cuda_log.csv at the end to log to a file instead of printing to your terminal)

  • install a version of PyTorch that is compatible with your CUDA version. The PyTorch installation page at pytorch.org will help you find the correct PyTorch version. Use nvidia-smi to check your CUDA version.
  • set the num_workers argument to a value greater than 1 to parallelize preprocessing across CPUs when running .train() and .predict() (see the sketch after this list)
  • for training, generally choose large batch sizes that are powers of 2 (at least 64, sometimes as large as 1024+); if you get CUDA out-of-memory errors, lower your batch size
  • the CNN class's .device attribute specifies where the network will run forward and backward passes.
    • OpenSoundscape will automatically try to find and use a cuda device (cuda:0) by default
    • you can specify a cuda device by writing, for instance, cnn.device='cuda:1' where cnn is your opensoundscape.CNN object
    • to parallelize over multiple GPU devices, wrap the CNN object's .network attribute (which is a PyTorch model object) in DataParallel like this: model.network = torch.nn.DataParallel(model.network, device_ids=[0, 1]).cuda(). The device IDs list should specify which CUDA devices to use. Use the command nvidia-smi to list all CUDA-compatible GPU devices on your machine.
    • if you have an Apple Silicon (M1, M2, etc.) chip on your Mac laptop, you can use GPU acceleration by setting cnn.device='mps' where cnn is your opensoundscape.CNN object
  • Even when using GPU nodes, the preprocessing steps (loading audio, creating a spectrogram, converting to a tensor, etc.) happen on the CPU. This matters because preprocessing can bottleneck the entire process. Typically you'll need 5-10 CPU tasks per GPU to avoid a preprocessing bottleneck, but this depends on several things: the speed of the GPU, the size of the network (larger networks -> more work for the GPU), and the amount of preprocessing done on each sample (heavier preprocessing or larger audio samples -> more work for the CPUs).
  • Even more so than preprocessing, I/O (reading data files like WAV audio files from a storage location) can limit the speed of training and prediction. Store your data on the fastest drive you can, and somewhere "close" to where your model is running. From fastest to slowest: internal NVMe drive > SSD > HDD > external HDD > network connection. (Note that some advanced drive configurations allow you to read data in parallel from several drives.) If your data is stored on an HDD (a spinning disk drive) or on a networked device (accessed by your compute machine over a network), I/O will severely bottleneck your training and prediction speeds.
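A minimal sketch putting these settings together (model is an opensoundscape.CNN object; device strings, worker count, and batch size are illustrative):

```
import torch

# model is an opensoundscape.CNN object; values below are illustrative
model.device = 'cuda:0'  # or 'mps' on Apple Silicon

# optional: parallelize over GPUs 0 and 1 (list devices with nvidia-smi)
model.network = torch.nn.DataParallel(model.network, device_ids=[0, 1]).cuda()

# parallelize preprocessing across 8 CPU workers; lower batch_size if
# you hit CUDA out-of-memory errors
audio_files = ['./hydrophone_10s.wav']  # placeholder file list
scores = model.predict(audio_files, batch_size=128, num_workers=8)
```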

Use the RIBBIT method to detect repeated-element sounds

Look at the RIBBIT tutorial notebook.
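As a rough sketch of a RIBBIT analysis (the frequency band, pulse-rate range, and clip duration are illustrative; check the ribbit docs for the exact signature in your version):

```
# rough sketch; band, pulse rates, and clip length are illustrative
from opensoundscape import Audio, Spectrogram
from opensoundscape.ribbit import ribbit

spec = Spectrogram.from_audio(Audio.from_file('/path/audio.wav'))
scores = ribbit(
    spec,
    signal_band=[1000, 2000],    # frequency band of the call, in Hz
    pulse_rate_range=[10, 20],   # expected pulses per second
    clip_duration=5.0,           # score the recording in 5 s clips
)
```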

Detect accelerating call patterns with signal processing

Work through an example of using signal processing to detect the accelerating wing drumming pattern of Ruffed Grouse

Ruffed grouse manuscript notebooks

See also: manuscript

Spatially localize sounds from a synchronized grid of recorders

Acoustic localization methods are available in OpenSoundscape as a "beta" feature, meaning that they are in active development and will continue to improve as we refine our software tools.

Head to the Localization tutorial notebook to see the tools in action.
