Use conda to install the TileDB Python API and other dependencies:

```shell
mamba env create -f environment.yaml
conda activate aind-demo
```
The `ingest-metadata.py` script requires two arguments:

- `metadata_dir`: Directory containing processed metadata for a particular mouse.
- `array_uri`: Location where the new array will be created. This can be a local file path or an S3 URI.
Each subdirectory within `metadata_dir` represents one coronal section of the mouse brain and is named according to the following convention: `202203030920_60988223_VMSC01601`, where `609882` is the mouse ID and `23` is the section number. The `metadata_processed.csv.gz` file within each subdirectory contains the data to be ingested.
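This naming convention can be split apart with a short regular expression. A minimal sketch, assuming the middle underscore-delimited field is always the mouse ID followed by a two-digit section number (the helper name `parse_section_name` is illustrative, not part of the script):

```python
import re

# Matches e.g. 202203030920_60988223_VMSC01601:
# timestamp _ mouseID+2-digit section _ machine ID.
SECTION_RE = re.compile(r"^(\d+)_(\d+)(\d{2})_(\w+)$")

def parse_section_name(name: str):
    """Split a section directory name into (mouse ID, section number)."""
    match = SECTION_RE.match(name)
    if match is None:
        # Names whose middle field is not all digits don't parse.
        return None
    _timestamp, mouse_id, section, _machine = match.groups()
    return mouse_id, int(section)

print(parse_section_name("202203030920_60988223_VMSC01601"))  # → ('609882', 23)
```

Note that a directory such as `202202170851_609882HK01_VMSC01001` does not match this pattern, which may explain why it is skipped in the ingestion log below.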
Example directory:

```shell
> tree data/mouse_1
data/mouse_1
├── 202202170851_60988201_VMSC01001
│   └── metadata_processed.csv.gz
├── 202202170851_609882HK01_VMSC01001
│   └── metadata_processed.csv.gz
├── 202202170855_60988202_VMSC01601
│   └── metadata_processed.csv.gz
└── 202202170915_60988203_VMSC00401
    └── metadata_processed.csv.gz
```
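Collecting the per-section CSV files from a layout like the one above can be sketched with `pathlib`; this is an illustrative helper (`find_section_csvs` is not part of `ingest-metadata.py`), demonstrated against a throwaway copy of the directory structure:

```python
import tempfile
from pathlib import Path

def find_section_csvs(metadata_dir):
    """Return the per-section metadata files, sorted by directory name."""
    return sorted(Path(metadata_dir).glob("*/metadata_processed.csv.gz"))

# Build a temporary mock of the directory layout and list its CSVs.
with tempfile.TemporaryDirectory() as tmp:
    for section in ("202202170851_60988201_VMSC01001",
                    "202202170855_60988202_VMSC01601"):
        d = Path(tmp) / section
        d.mkdir()
        (d / "metadata_processed.csv.gz").touch()
    csvs = find_section_csvs(tmp)
    print([p.parent.name for p in csvs])
```

Sorting by path keeps the sections in chronological order, since each directory name starts with a timestamp.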
You can run the `ingest-metadata.py` script with the following command:

```shell
> python ingest-metadata.py data/mouse_1 data/arrays/mouse01_processed_metadata
[2022-07-07 06:41:32,967][INFO]: Loading csv 0: data/mouse_1/202202170851_60988201_VMSC01001/metadata_processed.csv.gz
[2022-07-07 06:41:35,925][INFO]: Creating array data/arrays/mouse01_processed_metadata
[2022-07-07 06:41:36,203][INFO]: Ingesting metadata for section 0
[2022-07-07 06:41:38,549][INFO]: Ingested 198093 records
[2022-07-07 06:41:38,549][INFO]: Loading csv 1: data/mouse_1/202202170855_60988202_VMSC01601/metadata_processed.csv.gz
[2022-07-07 06:41:41,272][INFO]: Ingesting metadata for section 1
[2022-07-07 06:41:43,377][INFO]: Ingested 173331 records
[2022-07-07 06:41:43,377][INFO]: Loading csv 2: data/mouse_1/202202170915_60988203_VMSC00401/metadata_processed.csv.gz
[2022-07-07 06:41:45,511][INFO]: Ingesting metadata for section 2
[2022-07-07 06:41:47,169][INFO]: Ingested 137213 records
[2022-07-07 06:41:47,170][INFO]: Finished ingesting all csv files
```