@schlameel
Last active November 8, 2023 00:10
De-interleave data using numpy
import numpy as np
CHANNEL_COUNT = 2
# Interleaved samples: channel 0, channel 1, channel 0, channel 1, ...
frames = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
# Channel idx is every CHANNEL_COUNT-th sample, starting at offset idx
deinterleaved = [frames[idx::CHANNEL_COUNT] for idx in range(CHANNEL_COUNT)]
print(deinterleaved[0])
# prints "[0 0 0 0 0 0 0 0 0 0]"
print(deinterleaved[1])
# prints "[1 1 1 1 1 1 1 1 1 1]"
@RGD2

RGD2 commented Nov 8, 2023

Not needed though: you can open your data with a structured dtype that names each channel within a frame. Better, this lets you open memory-mapped files whose channels have different data types or sizes, e.g. the first channel might be U16, the second S16, and so on.
For example:

import numpy as np
# '>u2' is big-endian unsigned 16-bit, '>i2' is big-endian signed 16-bit
dtype = np.dtype([('ch1', '>u2'), ('ch2', '>i2')])
mmf = np.memmap('somebigfile.pcm', dtype=dtype, mode='r')

Then you can just access it like so (which does the de-interleaving for you):

deinterleaved1 = mmf['ch1']
deinterleaved2 = mmf['ch2']

This is better because the data isn't being awkwardly copied into a Python list, so NumPy will leave it where it is. A memory-mapped file is very good for this: on a 64-bit system the file could be terabytes in size, and this will still work fine and go as fast as the drive allows.
Or you might take a slice of the mmf by indexing into it, and then access the channels of that slice. It's a good way to avoid copying all of what could be a huge file!
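
A minimal sketch of the slicing idea above, assuming a two-channel big-endian layout. The file name `example.pcm`, the temporary directory and the sample values are made up for illustration; only `np.memmap` and the structured dtype come from the comment itself.

```python
import os
import tempfile

import numpy as np

# Hypothetical frame layout: unsigned 16-bit then signed 16-bit, big-endian.
frame_dtype = np.dtype([("ch1", ">u2"), ("ch2", ">i2")])

# Write a tiny interleaved file so the example is self-contained.
frames = np.zeros(5, dtype=frame_dtype)
frames["ch1"] = [10, 11, 12, 13, 14]
frames["ch2"] = [-1, -2, -3, -4, -5]
path = os.path.join(tempfile.mkdtemp(), "example.pcm")
frames.tofile(path)

# Map the file read-only, slice by frame index, then pick channels by name.
mmf = np.memmap(path, dtype=frame_dtype, mode="r")
window = mmf[1:4]
ch1 = window["ch1"]
ch2 = window["ch2"]

print(ch1.tolist())  # [11, 12, 13]
print(ch2.tolist())  # [-2, -3, -4]
print(ch1.strides)   # (4,) -> each step skips one whole 4-byte frame
```

The 4-byte stride on a 2-byte channel suggests that `ch1` is a strided view over the mapped frames rather than a packed copy.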

Works for any number of channels, so long as you can make the right dtype object to describe it.
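
One way to build such a dtype for an arbitrary channel list might look like the sketch below. The helper `make_frame_dtype`, the `chN` field names and the three-channel layout are hypothetical, chosen only to illustrate the point.

```python
import struct

import numpy as np


def make_frame_dtype(channel_formats):
    """Build a packed structured dtype with fields ch1, ch2, ... (hypothetical helper)."""
    return np.dtype([(f"ch{i + 1}", fmt) for i, fmt in enumerate(channel_formats)])


# Assumed layout: one unsigned byte, one little-endian int16, one little-endian float32.
frame_dtype = make_frame_dtype(["u1", "<i2", "<f4"])

# Two interleaved frames, packed without padding (7 bytes each).
raw = struct.pack("<Bhf", 1, -100, 0.5) + struct.pack("<Bhf", 2, 200, 1.5)
frames = np.frombuffer(raw, dtype=frame_dtype)

print(frame_dtype.itemsize)    # 7
print(frames["ch1"].tolist())  # [1, 2]
print(frames["ch2"].tolist())  # [-100, 200]
print(frames["ch3"].tolist())  # [0.5, 1.5]
```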

There are a couple of gotchas. They only bite if a channel's data isn't a power-of-two number of bytes (1, 2, 4, 8, etc.). In that case you can't do the above, at least not as of NumPy 1.23.5.

E.g., if the data is a set of booleans stored as flags in a byte, you must use 'u1' for the channel type, and then later use numpy.unpackbits() to separate the flags (you'd name them then). It's similar if the flags are in a word ('u2'), dword ('u4') or qword ('u8'), although in those cases it also matters whether the data is little-endian ('<') or big-endian ('>').
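
A small sketch of the flag-byte case, assuming each frame holds one flags byte followed by a little-endian int16. The field names and values are invented for illustration. The `bitorder` argument to `numpy.unpackbits()` needs NumPy 1.17 or later.

```python
import numpy as np

# Assumed frame layout: 8 boolean flags packed in one byte, then a 16-bit value.
frame_dtype = np.dtype([("flags", "u1"), ("value", "<i2")])
frames = np.array([(0b10000001, 100), (0b00000110, -7)], dtype=frame_dtype)

# Pull out the flags channel (a small contiguous copy of that one channel),
# then expand each byte into 8 columns. bitorder="little" puts bit 0 first.
flags = np.ascontiguousarray(frames["flags"])
bits = np.unpackbits(flags[:, np.newaxis], axis=1, bitorder="little")

print(bits.tolist())
# [[1, 0, 0, 0, 0, 0, 0, 1], [0, 1, 1, 0, 0, 0, 0, 0]]

# Column k is flag bit k for every frame.
bit1 = bits[:, 1].astype(bool)
print(bit1.tolist())  # [False, True]
```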

There's a way to deal with packed 24-bit data too ('>i3' doesn't work!), but it's a bit awkward: open the data as 32-bit, 'break' the stride from 4 to 3 bytes to change how the view accesses the array under the hood, and mask off the resulting overlapping byte on every access using a bitwise AND.
Still better than taking a massive copy though.
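
The 24-bit trick might be sketched roughly as follows. This is one interpretation of the description above, not necessarily the exact code the commenter had in mind. It assumes a single channel of big-endian signed samples and builds the overlapping view with the `np.ndarray` constructor. Because the samples are signed, it drops the overlapping low byte with an arithmetic right shift instead of a bitwise AND, which also keeps the sign. For unsigned little-endian data, `& 0xFFFFFF` would play the same role.

```python
import numpy as np

# Made-up samples, packed as 3-byte big-endian signed integers.
samples = [1, -2, 0x123456, -8388608]
packed = b"".join(s.to_bytes(3, "big", signed=True) for s in samples)

# Each 4-byte read overlaps the next sample by one byte, so the final read
# needs one byte past the packed data. Pad here; a real file would need
# its last sample handled separately.
buf = packed + b"\x00"

# View the buffer as 32-bit integers, but step 3 bytes per element.
overlapped = np.ndarray(
    shape=(len(samples),), dtype=">i4", buffer=buf, strides=(3,)
)

# The low byte of each element belongs to the next sample: shift it out.
decoded = overlapped >> 8

print(decoded.tolist())  # [1, -2, 1193046, -8388608]
```

Only `decoded` is a new array. `overlapped` is a view over the original bytes, so the shift can be applied to a slice of it when the full data is too large to convert at once.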
