@darencard
Created October 3, 2018 16:40
Installing, authenticating, and downloading using BaseSpace CLI

Installation on Mac computer

  1. Install the BaseSpace CLI to the $HOME/bin directory and make it executable.
# create the target directory if it does not exist yet
mkdir -p "$HOME/bin"
wget "https://api.bintray.com/content/basespace/BaseSpaceCLI-EarlyAccess-BIN/latest/\$latest/amd64-osx/bs?bt_package=latest" \
-O "$HOME/bin/bs"
chmod u+x "$HOME/bin/bs"
  2. Install the BaseSpace copy application to the $HOME/bin directory and make it executable.
# quote the URL so the shell does not treat the ? as a glob character
wget "https://api.bintray.com/content/basespace/BaseSpace-Copy-BIN/\$latest/osx/bscp?bt_package=develop" \
-O "$HOME/bin/bs-cp"
chmod u+x "$HOME/bin/bs-cp"
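Both binaries are only usable by name if $HOME/bin is on your PATH. A minimal sketch for checking this is below; the function name dir_on_path is made up for illustration, and the suggested export line assumes a Bourne-style shell profile.

```shell
# Return success if the given directory is listed in PATH.
# (dir_on_path is a hypothetical helper, not part of the BaseSpace tools.)
dir_on_path() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if ! dir_on_path "$HOME/bin"; then
  # Print the line to add to your shell profile; nothing is modified here.
  echo "export PATH=\"\$HOME/bin:\$PATH\""
fi
```

If the export line is printed, add it to your shell profile and open a new terminal before continuing.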
  3. In your web browser, log in to BaseSpace through either your personal login or a private domain (i.e., organization) login.

  4. Once logged in, authenticate bs.

bs authenticate

Copy the URL supplied by the command and paste it into your web browser. Since you are already signed in, BaseSpace should prompt you to accept the terms and then complete the authentication. The shell prompt returns once this is done.
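To confirm that authentication worked before moving on, something like the sketch below may help. It assumes the installed version of bs provides a whoami subcommand that exits non-zero when no valid credentials are stored; if yours does not, check bs --help for an equivalent. The function name bs_is_authenticated is made up for illustration.

```shell
# Succeed if bs can identify the current user.
# ASSUMPTION: `bs whoami` exists in the installed version and fails
# when no valid credentials are stored.
bs_is_authenticated() {
  bs whoami >/dev/null 2>&1
}

# Usage:
#   if bs_is_authenticated; then
#     echo "authenticated"
#   else
#     echo "run 'bs authenticate' first"
#   fi
```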

  5. List available datasets in BaseSpace.
bs list datasets

For the sake of this tutorial, let's focus on the project CVOS_WGS_J18.

  6. List all datasets associated with this project.
bs list datasets --terse -f csv --filter-field=Project.Name --filter-term=CVOS_WGS_J18
  7. We can also list the attributes associated with each dataset.
# --terse prints one dataset ID per line, which the loop reads in turn
bs list datasets --terse -f csv --filter-field=Project.Name --filter-term=CVOS_WGS_J18 | \
while read -r id; do echo "${id}"; bs list attributes dataset -i "${id}"; done
  8. Let's list the contents of each dataset as well.
bs list datasets --terse -f csv --filter-field=Project.Name --filter-term=CVOS_WGS_J18 | \
while read -r id; do echo "${id}"; bs contents dataset -i "${id}"; done

This shows the path of each file, though it is not yet clear how to use those paths.
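The two loops above can be folded into one pass that writes a single report per project. This is only a sketch built from the same bs commands; the function names list_dataset_ids and describe_project_datasets are made up for illustration.

```shell
# Print one dataset ID per line for the named project
# (same listing command as in the steps above).
list_dataset_ids() {
  bs list datasets --terse -f csv \
    --filter-field=Project.Name --filter-term="$1"
}

# Print attributes and contents for every dataset in a project.
describe_project_datasets() {
  project=$1
  list_dataset_ids "$project" | while read -r id; do
    [ -n "$id" ] || continue              # skip blank lines
    echo "== ${id} =="
    # </dev/null keeps bs from consuming the ID list on stdin
    bs list attributes dataset -i "$id" </dev/null
    bs contents dataset -i "$id" </dev/null
  done
}

# Usage (requires an authenticated bs):
#   describe_project_datasets CVOS_WGS_J18 > CVOS_WGS_J18_datasets.txt
```

Saving the report to a file makes it easy to search for a particular file name later without querying BaseSpace again.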

  9. We can download a dataset as follows. We will focus on dataset ds.bcb4c9f5b0d34ccaa44859af9a5fb5e1 since it is relatively small.
# the download may take a moment to start
cd /path/to/store/data
bs download dataset -i ds.bcb4c9f5b0d34ccaa44859af9a5fb5e1 --extension=fastq.gz -o ./
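To fetch every dataset in a project rather than a single one, the listing and download commands can be combined in a loop. The sketch below uses only the bs invocations shown in this tutorial; the function name download_project_datasets and the one-subdirectory-per-dataset layout are choices made here for illustration.

```shell
# Download the fastq.gz files of every dataset in a project,
# placing each dataset in its own subdirectory of the output directory.
download_project_datasets() {
  project=$1
  outdir=$2
  bs list datasets --terse -f csv \
    --filter-field=Project.Name --filter-term="$project" |
  while read -r id; do
    [ -n "$id" ] || continue              # skip blank lines
    mkdir -p "${outdir}/${id}"
    # </dev/null keeps bs from consuming the ID list on stdin
    bs download dataset -i "$id" --extension=fastq.gz \
      -o "${outdir}/${id}" </dev/null
  done
}

# Usage (requires an authenticated bs):
#   download_project_datasets CVOS_WGS_J18 /path/to/store/data
```

Keeping one subdirectory per dataset avoids file name collisions when several datasets contain files with the same name.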
  10. Unfortunately, bs download does not download an MD5 checksum for verifying the integrity of the data. (It may check this automatically as part of the download, which is something to ask Illumina.) Fortunately, another tool, bs cp, does download checksums.
cd /path/to/store/data
bs cp --write-md5 <basespace_location> ./

What remains is to figure out how to determine the corresponding path on BaseSpace.
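Once a checksum is in hand, comparing it against the local file is straightforward. The sketch below assumes you can read a plain MD5 hex digest from whatever bs cp --write-md5 produces (the format of that output is not shown here, so this is an assumption). The function names file_md5 and verify_md5 are made up for illustration.

```shell
# Print the MD5 hex digest of a file using whichever tool is installed:
# md5sum (GNU coreutils) or md5 (built into macOS).
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | awk '{print $1}'
  else
    md5 -q "$1"
  fi
}

# Succeed only if the file's digest equals the expected lowercase hex digest.
verify_md5() {
  [ "$(file_md5 "$1")" = "$2" ]
}

# Usage (file name and digest are placeholders):
#   verify_md5 sample_R1.fastq.gz <expected_md5> && echo "OK" || echo "MISMATCH"
```

For gzipped FASTQ files, gzip -t <file> is a complementary check that catches truncated downloads even when no checksum is available.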

Yale73 commented Nov 21, 2020

Hi,

How can we download a project folder including many datasets, rather than one dataset?

Thanks,
Yale
