draft thunder api docs

data types

images

count()

Explicit count of the number of items.

For lazy or distributed data, will force a computation.

first()

Return the first element.

toblocks(size='150')

Convert to Blocks, each representing a subdivision of the larger Images data.

  • size str or tuple of block size per dimension

    String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension". Only valid in spark mode.
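
For example (a minimal sketch, assuming the package is importable as thunder and using the fromrandom loader described in the reading section below):

    import thunder as td

    # small random dataset: 10 images of 50 x 50 pixels
    images = td.images.fromrandom(shape=(10, 50, 50), seed=42)

    # subdivide into blocks of roughly 64 MB each
    blocks = images.toblocks(size='64')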

totimeseries(size='150')

Convert this Images object to a TimeSeries object.

This method is equivalent to images.toblocks(size).toseries().totimeseries().

  • size string memory size optional default = "150"

    String interpreted as memory size in megabytes (e.g. "64").

toseries(size='150')

Convert this Images object to a Series object.

This method is equivalent to images.toblocks(size).toseries().

  • size string memory size optional default = "150"

    String interpreted as memory size in megabytes (e.g. "64").
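
Continuing the sketch above, the conversions are single calls:

    series = images.toseries(size='64')          # Images -> Series
    timeseries = images.totimeseries(size='64')  # Images -> TimeSeries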

tolocal()

Convert to local representation.

tospark(engine=None)

Convert to spark representation.

foreach(func)

Execute a function on each image.

sample(nsamples=100, seed=None)

Extract a random sample of images.

  • nsamples int optional default = 100

    The number of data points to sample.

  • seed int optional default = None

    Random seed.

map(func, dims=None, with_keys=False)

Map an array -> array function over each image.

filter(func)

Filter by applying a function to each image.

reduce(func)

Reduce over images.
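
A sketch of the three functional operators, using the images object from the earlier example (the lambdas are purely illustrative):

    from numpy import maximum

    # map: apply an array -> array function to each image
    clipped = images.map(lambda im: im.clip(0, 1))

    # filter: keep only images whose mean exceeds a threshold
    bright = images.filter(lambda im: im.mean() > 0.5)

    # reduce: combine images pairwise, here via element-wise maximum
    peak = images.reduce(maximum)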

mean()

Compute the mean across images.

var()

Compute the variance across images.

std()

Compute the standard deviation across images.

sum()

Compute the sum across images.

max()

Compute the max across images.

min()

Compute the min across images.

squeeze()

Remove single-dimensional axes from images.

max_projection(axis=2)

Compute maximum projections of images / volumes along the specified dimension.

  • axis int optional default = 2

    Which axis to compute projection along

max_min_projection(axis=2)

Compute maximum-minimum projections of images / volumes along the specified dimension. This computes the sum of the maximum and minimum values along the given dimension.

  • axis int optional default = 2

    Which axis to compute projection along

subsample(factor)

Downsample an image volume by an integer factor.

  • factor positive int or tuple of positive ints

    Stride to use in subsampling. If a single int is passed, each dimension of the image will be downsampled by this same factor. If a tuple is passed, it must have the same dimensionality as the image, and each image dimension will be downsampled by the corresponding stride.

gaussian_filter(sigma=2, order=0)

Spatially smooth images with a gaussian filter.

Filtering will be applied to every image in the collection.

  • sigma scalar or sequence of scalars optional default = 2

    Standard deviation of the gaussian kernel, in pixels. A sequence is interpreted as the standard deviation for each axis; a single scalar is applied equally to all axes.

  • order choice of 0 / 1 / 2 / 3 or sequence from same set optional default = 0

    Order of the gaussian kernel: 0 is a gaussian; higher numbers correspond to derivatives of a gaussian.
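
For instance, with the images object from the earlier sketch:

    # downsample each image by a factor of 2 in every dimension
    small = images.subsample(2)

    # smooth each image with a gaussian kernel of standard deviation 2 pixels
    smooth = images.gaussian_filter(sigma=2)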

uniform_filter(size=2)

Spatially filter images using a uniform filter.

Filtering will be applied to every image in the collection.

  • size int or sequence of ints optional default = 2

    Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis; a single scalar is applied equally to all axes.

median_filter(size=2)

Spatially filter images using a median filter.

Filtering will be applied to every image in the collection.

  • size int or sequence of ints optional default = 2

    Size of the filter neighborhood in pixels. A sequence is interpreted as the neighborhood size for each axis; a single scalar is applied equally to all axes.

localcorr(neighborhood=2)

Correlate every pixel to the average of its local neighborhood.

This algorithm computes, for every spatial record, the correlation coefficient between that record's series and the average series of all records within a local neighborhood, with a size defined by the neighborhood parameter. The neighborhood is currently required to be a single integer, which represents the neighborhood size in both x and y.

  • neighborhood int optional default=2

    Size of the correlation neighborhood (in both the x and y directions), in pixels.
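
For example (a sketch on the images object from earlier):

    # correlate each pixel with the average of its 2-pixel neighborhood
    corr = images.localcorr(neighborhood=2)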

subtract(val)

Subtract a constant value or an image / volume from all images / volumes in the data set.

  • val int float or ndarray

    Value to subtract

topng(path, prefix="image", overwrite=False)

Write 2d or 3d images as PNG files.

Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.

  • path string

    Path to output directory, must be one level below an existing directory.

  • prefix string

    String to prepend to filenames.

  • overwrite bool

    If true, the directory given by path will first be deleted if it exists.

totif(path, prefix="image", overwrite=False)

Write 2d or 3d images as TIF files.

Files will be written into a newly-created directory. Three-dimensional data will be treated as RGB channels.

  • path string

    Path to output directory, must be one level below an existing directory.

  • prefix string

    String to prepend to filenames.

  • overwrite bool

    If true, the directory given by path will first be deleted if it exists.

tobinary(path, prefix="image", overwrite=False)

Write out images or volumes as flat binary files.

Files will be written into a newly-created directory.

  • path string

    Path to output directory, must be one level below an existing directory.

  • prefix string

    String to prepend to filenames.

  • overwrite bool

    If true, the directory given by path will first be deleted if it exists.
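
All three writers follow the same pattern; a sketch, where output is a hypothetical directory that already exists:

    images.topng('output/pngs', prefix='image', overwrite=True)
    images.totif('output/tifs', prefix='image', overwrite=True)
    images.tobinary('output/bin', prefix='image', overwrite=True)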

map_as_series(func, value_size=None, block_size='150')

Efficiently apply a function to each time series.

Applies a function to each time series without transforming all the way to a Series object, but using a Blocks object instead for increased efficiency in the transformation back to Images.

  • func function

Function to apply to each time series. Should take a one-dimensional ndarray and return the transformed one-dimensional ndarray.

  • value_size int optional default=None

    Size of the one-dimensional ndarray resulting from application of func. If not supplied, will be automatically inferred for an extra computational cost.

  • block_size str or tuple of block size per dimension optional default = '150'

    String interpreted as memory size (in megabytes, e.g. "64"). Tuple of ints interpreted as "pixels per dimension".
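
For example, subtracting the temporal mean from each pixel's time series without a full round-trip to Series (a sketch on the images object from earlier):

    # output length equals input length, so value_size could also be inferred
    detrended = images.map_as_series(lambda ts: ts - ts.mean(), block_size='64')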

series

flatten()

Reshape all dimensions but the last into a single dimension.

count()

Explicit count of the number of items.

For lazy or distributed data, will force a computation.

first()

Return the first element.

tolocal()

Convert to local representation.

tospark(engine=None)

Convert to spark representation.

sample(nsamples=100, seed=None)

Extract a random sample of series.

  • nsamples int optional default = 100

    The number of data points to sample.

  • seed int optional default = None

    Random seed.

map(func, index=None, with_keys=False)

Map an array -> array function over each series.

filter(func)

Filter by applying a function to each series.

reduce(func)

Reduce over series.

mean()

Compute the mean across series.

var()

Compute the variance across series.

std()

Compute the standard deviation across series.

sum()

Compute the sum across series.

max()

Compute the max across series.

min()

Compute the min across series.

between(left, right)

Select a subset of values within the given index range.

Inclusive on the left; exclusive on the right.

  • left int

    Left-most index in the desired range.

  • right int

    Right-most index in the desired range.

select(crit)

Select a subset of values that match a given index criterion.

  • crit function list str int

    Criterion function to map to indices, specific index value, or list of indices

center(axis=1)

Center series data by subtracting the mean either within or across records.

  • axis int optional default = 1

    Which axis to center along, within (1) or across (0) records.

standardize(axis=1)

Standardize series data by dividing by the standard deviation either within or across records.

  • axis int optional default = 1

    Which axis to standardize along, within (1) or across (0) records.

zscore(axis=1)

Zscore series data by subtracting the mean and dividing by the standard deviation either within or across records.

  • axis int optional default = 1

    Which axis to zscore along, within (1) or across (0) records.
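
A sketch of the normalization methods, assuming series data generated with the fromrandom loader described in the reading section below:

    import thunder as td

    # random series data: 100 records, each of length 10
    series = td.series.fromrandom(shape=(100, 10), seed=42)

    # subtract the mean within each record, then z-score within each record
    centered = series.center(axis=1)
    zscored = series.zscore(axis=1)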

squelch(threshold)

Set all records that do not exceed the given threshold to 0.

  • threshold scalar

    Level below which to set records to zero.

correlate(signal)

Correlate series data against one or many one-dimensional arrays.

  • signal array or str

    Signal(s) to correlate against, can be a numpy array or a MAT file containing the signal as a variable

series_max()

Compute the value maximum of each record in a Series.

series_min()

Compute the value minimum of each record in a Series.

series_sum()

Compute the value sum of each record in a Series.

series_mean()

Compute the value mean of each record in a Series.

series_median()

Compute the value median of each record in a Series.

series_percentile(q)

Compute the value percentile of each record in a Series.

  • q scalar

    Floating point number between 0 and 100 inclusive, specifying percentile.

series_std()

Compute the value standard deviation of each record in a Series.

series_stat(stat)

Compute a simple statistic for each record in a Series.

  • stat str

    Which statistic to compute.

series_stats()

Compute many statistics for each record in a Series.
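
For example, with the series object from the sketch above:

    # one value per record
    maxima = series.series_max()
    medians = series.series_median()
    p95 = series.series_percentile(95)

    # several statistics at once
    stats = series.series_stats()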

mean_by_panel(length)

Compute the mean across fixed-size panels of each record.

Splits each record into panels of size length, and then computes the mean across panels. The panel length must subdivide the length of each record exactly.

  • length int

    Fixed length with which to subdivide.

select_by_index(val, level=0, squeeze=False, filter=False, return_mask=False)

Select or filter elements of the Series by index values (across levels, if multi-index).

The index is a property of a Series object that assigns a value to each position within the arrays stored in the records of the Series. This function returns a new Series where, within each record, only the elements indexed by the given value(s) are retained. An index where each value is a list of a fixed length is referred to as a 'multi-index', as it provides multiple labels for each index location. Each of the dimensions in these sublists is a 'level' of the multi-index. If the index of the Series is a multi-index, then the selection can proceed by first selecting one or more levels, and then selecting one or more values at each level.

  • val list of lists

    Specifies the selected index values. List must contain one list for each level of the multi-index used in the selection. For any singleton lists, the list may be replaced with just the integer.

  • level list of ints optional default=0

    Specifies which levels in the multi-index to use when performing selection. If a single level is selected, the list can be replaced with an integer. Must be the same length as val.

  • squeeze bool optional default=False

    If True, the multi-index of the resulting Series will drop any levels that contain only a single value because of the selection. Useful if indices are used as unique identifiers.

  • filter bool optional default=False

    If True, selection process is reversed and all index values EXCEPT those specified are selected.

  • return_mask bool optional default=False

    If True, return the mask used to implement the selection.
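
A sketch of simple selection under the default integer index (multi-index selection follows the same pattern, with lists for val and level):

    # keep only the elements whose index value is 2
    subset = series.select_by_index(2)

    # reverse the selection: drop index value 2, keep everything else
    rest = series.select_by_index(2, filter=True)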

aggregate_by_index(function, level=0)

Aggregate data in each record, grouping by index values.

For each unique value of the index, applies a function to the group indexed by that value. Returns a Series indexed by those unique values. For the result to be a valid Series object, the aggregating function should return a simple numeric type. Also allows selection of levels within a multi-index. See select_by_index for more info on indices and multi-indices.

  • function function

    Aggregating function to map to Series values. Should take a list or ndarray as input and return a simple numeric value.

  • level list of ints optional default=0

    Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.

stat_by_index(stat, level=0)

Compute the desired statistic for each unique index value (across levels, if multi-index).

  • stat string

    Statistic to be computed: sum, mean, median, stdev, max, min, count

  • level list of ints optional default=0

    Specifies the levels of the multi-index to use when determining unique index values. If only a single level is desired, can be an int.

sum_by_index(level=0)

Compute sums for each unique index value (across levels, if multi-index).

mean_by_index(level=0)

Compute means for each unique index value (across levels, if multi-index).

median_by_index(level=0)

Compute medians for each unique index value (across levels, if multi-index).

std_by_index(level=0)

Compute standard deviations for each unique index value (across levels, if multi-index).

max_by_index(level=0)

Compute maximum values for each unique index value (across levels, if multi-index).

min_by_index(level=0)

Compute minimum values for each unique index value (across levels, if multi-index).

count_by_index(level=0)

Count the number of elements for each unique index value (across levels, if multi-index).
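
A sketch of the grouped aggregations (most useful when the index contains repeated values, e.g. a condition label per time point; each unique value then yields one aggregated element per record):

    # one mean (or sum) per unique index value, per record
    means = series.mean_by_index(level=0)
    totals = series.sum_by_index(level=0)

    # the generic forms take a statistic name or an arbitrary function
    stds = series.stat_by_index('stdev', level=0)
    sums = series.aggregate_by_index(sum, level=0)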

cov(axis=None)

Compute covariance of a distributed matrix.

  • axis int optional default = None

    Axis for performing mean subtraction, None (no subtraction), 0 (rows) or 1 (columns).

gramian()

Compute gramian of a distributed matrix.

The gramian is defined as the product of the matrix with its transpose, i.e. A^T * A.

times(other)

Multiply a matrix by another one.

Other matrix must be a numpy array, a scalar, or another matrix in local mode.

  • other Matrix scalar or numpy array

    A matrix to multiply with

totimeseries()

Convert Series to TimeSeries, a subclass for time series computation.

toimages(size='150')

Convert this Series object to an Images object.

Equivalent to calling series.toblocks(size).toimages().

  • size str optional default = "150"

    String interpreted as memory size.

tobinary(path, prefix='series', overwrite=False, credentials=None)

Write data to binary files.

  • path string URI or local filesystem path

    Output files will be written underneath path. Directory will be created as a result of this call.

  • prefix str optional default = 'series'

    String prefix for files.

  • overwrite bool

If true, path and all its contents will be deleted and recreated as part of this call.

reading

images

fromrdd(rdd, dims=None, nrecords=None, dtype=None)

Load Images object from a Spark RDD.

Must be a collection of key-value pairs where keys are singleton tuples indexing images, and values are 2d or 3d ndarrays.

  • rdd SparkRDD

    An RDD containing images

  • dims tuple or array optional default = None

    Image dimensions (if provided will avoid check).

  • nrecords int optional default = None

    Number of images (if provided will avoid check).

  • dtype string default = None

    Data numerical type (if provided will avoid check)

fromarray(values, npartitions=None, engine=None)

Load Images object from a local array-like.

First dimension will be used to index images, so remaining dimensions after the first should be the dimensions of the images/volumes, e.g. (3, 100, 200) for 3 x (100, 200) images

  • values array-like

    The array of images

  • npartitions int default = None

    Number of partitions for parallelization (Spark only)

  • engine object default = None

    Computational engine (e.g. a SparkContext for Spark)
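
For example (sketch):

    import thunder as td
    from numpy import random

    # 3 images of 100 x 200 pixels, indexed along the first dimension
    data = random.rand(3, 100, 200)
    images = td.images.fromarray(data)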

fromlist(items, accessor=None, keys=None, dims=None, dtype=None, npartitions=None, engine=None)

Load images from a list of items using the given accessor.

  • accessor function

    Apply to each item from the list to yield an image

  • keys list optional default=None

    An optional list of keys

  • dims tuple optional default=None

    Specify a known image dimension to avoid computation.

  • npartitions int

    Number of partitions for computational engine

frompath(path, accessor=None, ext=None, start=None, stop=None, recursive=False, npartitions=None, dims=None, dtype=None, recount=False, engine=None, credentials=None)

Load images from a path using the given accessor.

Supports both local and remote filesystems.

  • accessor function

    Apply to each item after loading to yield an image.

  • ext str optional default=None

    File extension.

  • npartitions int optional default=None

    Number of partitions for computational engine, if None will use default for engine.

  • dims tuple optional default=None

    Dimensions of images.

  • dtype str optional default=None

    Numerical type of images.

  • start, stop nonnegative int optional default=None

    Indices of files to load, interpreted using Python slicing conventions.

  • recursive boolean optional default=False

    If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.

  • recount boolean optional default=False

    Force subsequent record counting.

frombinary(path, shape=None, dtype=None, ext='bin', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, conf='conf.json', order='C', engine=None, credentials=None)

Load images from flat binary files.

Assumes one image per file, each with the shape and ordering as given by the input arguments.

  • path str

    Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.

  • shape tuple of positive int

    Dimensions of input image data.

  • ext string optional default="bin"

    Extension required on data files to be loaded.

  • start, stop nonnegative int optional default=None

    Indices of the first and last-plus-one file to load, relative to the sorted filenames matching path and ext. Interpreted using python slice indexing conventions.

  • recursive boolean optional default=False

    If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.

  • nplanes positive integer optional default=None

    If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.

  • npartitions int optional default=None

    Number of partitions for computational engine, if None will use default for engine.

fromtif(path, ext='tif', start=None, stop=None, recursive=False, nplanes=None, npartitions=None, engine=None, credentials=None)

Load images from single or multi-page TIF files.

  • path str

    Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.

  • ext string optional default="tif"

    Extension required on data files to be loaded.

  • start, stop nonnegative int optional default=None

    Indices of the first and last-plus-one file to load, relative to the sorted filenames matching 'path' and 'ext'. Interpreted using python slice indexing conventions.

  • recursive boolean optional default=False

    If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.

  • nplanes positive integer optional default=None

    If passed, will cause single files to be subdivided into nplanes separate images. Otherwise, each file is taken to represent one image.

  • npartitions int optional default=None

    Number of partitions for computational engine, if None will use default for engine.
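
For example (sketch; the path is hypothetical):

    import thunder as td

    # load all files ending in .tif beneath a directory tree
    images = td.images.fromtif('data/tifs', ext='tif', recursive=True)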

frompng(path, ext='png', start=None, stop=None, recursive=False, npartitions=None, engine=None, credentials=None)

Load images from PNG files.

  • path str

    Path to data files or directory, specified as either a local filesystem path or in a URI-like format, including scheme. May include a single '*' wildcard character.

  • ext string optional default="tif"

    Extension required on data files to be loaded.

  • start, stop nonnegative int optional default=None

    Indices of the first and last-plus-one file to load, relative to the sorted filenames matching path and ext. Interpreted using python slice indexing conventions.

  • recursive boolean optional default=False

    If true, will recursively descend directories from path, loading all files with an extension matching 'ext'.

  • npartitions int optional default=None

    Number of partitions for computational engine, if None will use default for engine.

fromrandom(shape=(10, 50, 50), npartitions=1, seed=42, engine=None)

Generate random image data.

  • shape tuple optional default=(10, 50, 50)

    Dimensions of images.

  • npartitions int optional default=1

    Number of partitions.

  • seed int optional default=42

    Random seed.

fromexample(name=None, engine=None)

Load example image data.

Data must be downloaded from S3, so this method requires an internet connection.

  • name str

    Name of dataset, if not specified will print options.

series

fromrdd(rdd, nrecords=None, shape=None, index=None, dtype=None)

Load Series object from a Spark RDD.

Assumes keys are tuples with increasing and unique indices, and values are 1d ndarrays. Will try to infer properties that are not explicitly provided.

  • rdd SparkRDD

    An RDD containing series data.

  • shape tuple or array optional default = None

    Total shape of data (if provided will avoid check).

  • nrecords int optional default = None

    Number of records (if provided will avoid check).

  • index array optional default = None

    Index for records, if not provided will use (0, 1, ...)

  • dtype string default = None

    Data numerical type (if provided will avoid check)
    

fromarray(values, index=None, npartitions=None, engine=None)

Load Series object from a local numpy array.

Assumes that all but final dimension index the records, and the size of the final dimension is the length of each record, e.g. a (2, 3, 4) array will be treated as 2 x 3 records of size (4,)

  • values array-like

    An array containing the data.

  • index array optional default = None

    Index for records, if not provided will use (0,1,...,N) where N is the length of each record.

  • npartitions int default = None

    Number of partitions for parallelization (Spark only)

  • engine object default = None

    Computational engine (e.g. a SparkContext for Spark)

fromlist(items, accessor=None, index=None, dtype=None, npartitions=None, engine=None)

Create a Series object from a list of items and optional accessor function.

Will call accessor function on each item from the list, providing a generic interface for data loading.

  • items list

    A list of items to load.

  • accessor function optional default = None

    A function to apply to each item in the list during loading.

  • index array optional default = None

    Index for records, if not provided will use (0,1,...,N) where N is the length of each record.

  • dtype string default = None

    Data numerical type (if provided will avoid check)
    
  • npartitions int default = None

    Number of partitions for parallelization (Spark only)

  • engine object default = None

    Computational engine (e.g. a SparkContext for Spark)

fromtext(path, ext='txt', dtype='float64', skip=0, shape=None, index=None, npartitions=None, engine=None, credentials=None)

Load Series data from text files.

Assumes data are formatted as rows, where each record is a row of numbers separated by spaces e.g. 'v v v v v'. You can optionally specify a fixed number of initial items per row to skip / discard.

  • path string

    Directory to load from, can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), or a single file, or a directory, or a directory with a single wildcard character.

  • ext str optional default = 'txt'

    File extension.

  • dtype dtype or dtype specifier optional default = 'float64'

    Numerical type to use for data after converting from text.

  • skip int optional default = 0

    Number of items in each record to skip.

  • shape tuple or list optional default = None

    Shape of data if known, will be inferred otherwise.

  • index array optional default = None

    Index for records, if not provided will use (0, 1, ...)

  • npartitions int default = None

    Number of partitions for parallelization (Spark only)

  • engine object default = None

    Computational engine (e.g. a SparkContext for Spark)

  • credentials dict default = None

    Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}
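
For example (sketch; the path is hypothetical):

    import thunder as td

    # load space-delimited text files, skipping the first two items of each
    # row (e.g. coordinate labels)
    series = td.series.fromtext('data/text', ext='txt', skip=2)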

frombinary(path, ext='bin', conf='conf.json', dtype=None, shape=None, skip=0, index=None, engine=None, credentials=None)

Load a Series object from flat binary files.

  • path string URI or local filesystem path

    Directory to load from, can be a URI string with scheme (e.g. "file://", "s3n://", or "gs://"), or a single file, or a directory, or a directory with a single wildcard character.

  • ext str optional default = 'bin'

    Optional file extension specifier.

  • conf str optional default = 'conf.json'

    Name of conf file with type and size information.

  • dtype dtype or dtype specifier optional default = None

    Numerical type to use for the data; if not provided, will be taken from the conf file.

  • shape tuple or list optional default = None

    Shape of data if known, will be inferred otherwise.

  • skip int optional default = 0

    Number of items in each record to skip.

  • index array optional default = None

    Index for records, if not provided will use (0, 1, ...)

  • engine object default = None

    Computational engine (e.g. a SparkContext for Spark)

  • credentials dict default = None

    Credentials for remote storage (e.g. S3) in the form {access: ***, secret: ***}

fromrandom(shape=(100, 10), npartitions=1, seed=42, engine=None)

Generate gaussian random series data.

  • shape tuple

    Dimensions of data.

  • npartitions int

    Number of partitions with which to distribute data.

  • seed int

    Randomization seed.

fromexample(name=None, engine=None)

Load example series data.

Data must be downloaded from S3, so this method requires an internet connection.

  • name str

    Name of dataset, options include 'iris' | 'mouse' | 'fish'. If not specified will print options.
