Skip to content

Instantly share code, notes, and snippets.

@daleroberts
Last active May 27, 2016 00:00
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save daleroberts/dcef111f7025586d3598f2ba07886041 to your computer and use it in GitHub Desktop.
Save daleroberts/dcef111f7025586d3598f2ba07886041 to your computer and use it in GitHub Desktop.
Benchmarking various ways to mask an array based on a bitmask
# 8=no_contiguity, 9=sea, 10=cloud_acca, 11=cloud_fmask,
# 12=cloudshadow_acca, 13=cloudshadow_fmask, 14=toposhadow
import numpy as np
xsize, ysize, nbands = 4000, 4000, 6
# generate some test data
pq = np.zeros((xsize, ysize), dtype=np.uint16)
for bit in [9,8,11]:
pq[np.random.randint(0,2,pq.shape).astype(np.bool)] |= 2<<bit
# do the magic once
assert(pq.dtype == np.uint16)
bits = lambda *b: np.logical_or.reduce([pq&(2<<s)==0 for s in b], dtype=np.bool)
mask = bits(8, 10, 11, 12)
# example use case
img = np.zeros((xsize, ysize, nbands), dtype=np.float32)
img[mask] = np.nan
nancount = np.count_nonzero(np.isnan(img))
try:
# memory profiling: `python3 -m memory_profiler testmask.py`
# line profiling: `kernprof -l -v testmask.py`
@profile
def test1():
bits = lambda *b: np.logical_or.reduce([pq&(2<<s)==0 for s in b], dtype=np.bool)
mask = bits(8, 10, 11, 12)
img[mask] = np.nan # use broadcast
assert(np.count_nonzero(np.isnan(img))==nancount)
@profile
def test2():
bits = lambda *b: np.logical_or.reduce([pq&(2<<s)==0 for s in b], dtype=np.bool)
mask = np.reshape(np.repeat(bits(8, 10, 11, 12), nbands), (*pq.shape, nbands))
img[mask] = np.nan
assert(np.count_nonzero(np.isnan(img))==nancount)
@profile
def test3():
bits = lambda *b: np.logical_or.reduce([pq&(2<<s)==0 for s in b], dtype=np.bool)
mask = np.dstack([bits(8, 10, 11, 12) for _ in range(nbands)])
img[mask] = np.nan
assert(np.count_nonzero(np.isnan(img))==nancount)
@profile
def test4():
def bits(*b):
result = np.full(pq.shape, False, dtype=bool)
for s in b:
result |= (pq&(2<<s)).astype(bool)
return result
mask = np.reshape(np.repeat(bits(8, 10, 11, 12), nbands), (*pq.shape, nbands))
img[mask] = np.nan
assert(np.count_nonzero(np.isnan(img))==nancount)
@profile
def test5():
bits = lambda *b: np.bitwise_or.reduce([pq&(2<<s) for s in b], dtype=np.uint16)==0
mask = np.reshape(np.repeat(bits(8, 10, 11, 12), nbands), (*pq.shape, nbands))
img[mask] = np.nan
assert(np.count_nonzero(np.isnan(img))==nancount)
@profile
def test5():
bits = lambda *b: np.bitwise_or.reduce([pq&(2<<s) for s in b], dtype=np.uint16)==0
mask = bits(8, 10, 11, 12)
img[mask] = np.nan # use broadcast
assert(np.count_nonzero(np.isnan(img))==nancount)
test1()
test2()
test3()
test4()
test5() # fastest
except NameError:
pass
@sixy6e
Copy link

sixy6e commented May 11, 2016

The one under eotools is nowhere near as neat.
https://github.com/GeoscienceAustralia/eo-tools/blob/stable/eotools/pq_utils.py#L62

Just a question, why create a 3D mask? The broadcasting rules should take care of it.

mask1 = np.reshape(np.repeat(bits(8, 10, 11, 12), nbands), (pq.shape[0], pq.shape[1], nbands))
mask2 = np.reshape(np.repeat(bits(8, 10, 11, 12), 1), (pq.shape[0], pq.shape[1]))
mask1.shape
mask2.shape
img2 = img.copy()
img[mask1] = 1
img2[mask2] = 1
(img - img2).sum()

@daleroberts
Copy link
Author

Thanks, just noticed your comment. Good point, I've added broadcast approach and combined with a new trick there is now a new fastest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment