Singularity Workshop

This guide walks you through all the steps needed to use Singularity to

  1. Prepare an open dataset from S3
  2. Send deals to a local emulated storage provider (f02815405)
  3. Make retrievals from the emulated storage provider using HTTP and Bitswap

Prerequisites

  1. Download the latest pre-built Singularity release
  2. Download the latest pre-built sim-sp release
  3. Do not use a package format such as '.deb' or '.rpm', as those install to your system PATH
  4. Extract both archives into the same folder. After extraction, the folder should contain the two executables singularity and sim-sp, plus some docs and a license file
  5. Open a terminal and change the working directory to that folder
  6. Make sure everything works by running
    ./singularity version
    ./sim-sp -h
    For Windows users, use the .exe form of each command throughout the workshop
    singularity.exe version
    sim-sp.exe -h
    For Mac users, you may need to open the executables in Finder once to bypass Gatekeeper
  7. Download the latest IPFS CLI

Setup Singularity

Singularity is backed by an SQL database; in this workshop we will use the default SQLite backend. To initialize the database, run

./singularity admin init

Note: if you suspect you've made a mistake anywhere in the instructions below and would like to restart from scratch, you can reset the database with

./singularity admin reset --really-do-it

Data Preparation

In this workshop, we are going to use CIViC (Clinical Interpretation of Variants in Cancer) as our source dataset.

Connect to the AWS S3 bucket as a storage connection. Let's name the connection civic

./singularity storage create s3 aws --region us-west-2 --name civic --path civic-aws-opendata

Now we can verify the storage connection is saved in the database using the command below

./singularity storage list

We can also see what's inside this storage connection with the command below, which gives additional assurance that the connection is valid. Note that these folders have not been prepared yet.

./singularity storage explore civic

Now create a new preparation named civic using the storage connection civic with default parameters

./singularity prep create --name civic --source civic

Start scanning the data source for files

./singularity prep start-scan civic civic

Now the data source is marked as ready to be scanned, but we have not yet started any worker to scan and prepare the dataset. Normally the dataset worker runs continuously, but in this workshop we are going to run it on demand. The command takes about a minute depending on your Internet speed; it may look like it hangs at created pack job 2 with 43 file range, but it will finish soon and exit once data preparation is complete.

./singularity run dataset-worker --exit-on-error --exit-on-complete

We also want to do one more thing: DAG generation, which captures the full folder structure and is very useful for retrieval. Run the two commands below to complete DAG generation; they should finish almost instantly.

./singularity prep start-daggen civic civic
./singularity run dataset-worker --exit-on-error --exit-on-complete

Great, we have completed the data preparation and can now list all prepared pieces with the command below. It shows the piece_size and piece_cid of each piece, which are important parameters in deal proposals to storage providers.

./singularity prep list-pieces civic

All files and folders in this dataset now have a CID that can be used for retrieval later. The CID on the first line, with an empty path, is called the RootCID: the CID of the root folder of this dataset. If it is not bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q, write it down and substitute it everywhere that CID appears in the instructions below.

./singularity prep explore civic civic
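
[Optional] If your RootCID differs, it can be handy to keep it in a shell variable so you can paste $ROOT_CID into later commands instead of the long CID. This is plain shell convenience, not a Singularity feature

ROOT_CID=bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q
echo $ROOT_CID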

Distribute CAR files

You may start to wonder: where are the CAR files? Singularity uses inline preparation, which stores the mapping between the original data files and the CAR files, so you don't need to provision extra space for CAR files.

You can run the content provider to offer CAR file downloads to storage providers. Do not terminate the content provider until the storage provider has "sealed" the deals in the next section.

./singularity run content-provider

[Optional] You may try downloading CAR files with the command below in another terminal window (replace <piece_cid> with an actual value from the list-pieces output)

wget http://127.0.0.1:7777/piece/<piece_cid>
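
[Optional] If you only want to confirm the content provider responds without keeping the file, a plain curl check works too; it streams the whole CAR and discards it, printing the HTTP status code (a 200 means the piece was served successfully)

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:7777/piece/<piece_cid>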

Deal Making

Deal making involves two parties: a client that sends the deal proposals and a storage provider that does the sealing. In this demo, we're going to import a deliberately leaked private key as our test client

./singularity wallet import 7b2254797065223a22736563703235366b31222c22507269766174654b6579223a226b35507976337148327349586343595a58594f5775453149326e32554539436861556b6c4e36695a5763453d227d
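
For the curious: that long argument is just a hex-encoded JSON key export. Purely as an illustration, you can decode it to see its structure (this key is intentionally public; never do this with a real key)

python3 -c "import sys; print(bytes.fromhex(sys.argv[1]).decode())" 7b2254797065223a22736563703235366b31222c22507269766174654b6579223a226b35507976337148327349586343595a58594f5775453149326e32554539436861556b6c4e36695a5763453d227d

It prints a JSON object with Type and PrivateKey fields, the same key-info format used by Lotus wallet exports.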

We also want to attach this wallet to our preparation civic so that all deal proposals for this data preparation are sent from this wallet

./singularity prep attach-wallet civic f0808055

Now, for the storage provider side, we will run an emulated storage provider in another terminal window. It will accept any Boost deal, download the CAR file referenced by the deal from the Singularity content provider, and offer free retrievals. Do not terminate this window until the end of the workshop.

./sim-sp run

Finally, it's time to send out the deals. To keep things simple, we are going to send deals for all available pieces of this open dataset to the storage provider. Do not change the miner ID f02815405, and do NOT replace {PIECE_CID}.

./singularity deal schedule create --verified=false --preparation civic --provider f02815405 --url-template "http://127.0.0.1:7777/piece/{PIECE_CID}"
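
Singularity fills {PIECE_CID} in per deal, so every proposal points the storage provider at a download URL of the form http://127.0.0.1:7777/piece/<piece_cid> on the content provider (same <piece_cid> placeholder convention as earlier). If you want to double-check what was created, you can list the schedules; the subcommand name below is assumed from the CLI layout above, so skip this check if your build differs

./singularity deal schedule list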

Then run the deal pusher to actually send those deals out.

./singularity run deal-pusher

You will now see things happening in three different terminal windows

  1. Singularity Deal Pusher is sending deal proposals to the emulated storage provider
  2. sim-sp is receiving Boost online deals and trying to download and parse the CAR files from the Singularity Content Provider
  3. The Singularity Content Provider is reading S3 objects and converting them into CAR streams for download

Wait until you see two Deal xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx completed successfully messages; then you are ready for the retrieval section. You may also kill the deal-pusher and content-provider services and leave only sim-sp running for retrievals.

Retrievals

The emulated storage provider offers both HTTP and Bitswap retrievals, similar to Boost.

[Optional] Piece Retrieval

The CAR files can be downloaded with the command below (replace <piece_cid> with an actual value from the list-pieces output)

wget http://127.0.0.1:7778/piece/<piece_cid>

File Retrieval using HTTP

The emulated storage provider enables an IPFS Gateway, so you can browse the dataset using the RootCID by going to http://127.0.0.1:7778/ipfs/bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q. This RootCID comes from the preparation result of ./singularity prep explore civic civic.

You may also browse the files through the IPFS Gateway and download them with any HTTP client.
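
For example, a single file can be fetched directly by appending its path under the RootCID; the path below is a placeholder, so substitute one you see in the gateway listing

wget "http://127.0.0.1:7778/ipfs/bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q/<path/to/file>"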

Bitswap Retrieval

First, initialize and run IPFS with the commands below

ipfs init
ipfs daemon

Then connect to the emulated storage provider using the command below in a different terminal window. If you see a warning saying the repo is locked, just try again.

ipfs swarm connect /ip4/127.0.0.1/tcp/24001/p2p/12D3KooWDeNSud283YaRmhqbZDynLNmtATBxjUPAUJxtPyEXXp9u
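
[Optional] To confirm the connection is up, list your connected peers and look for the provider's peer ID

ipfs swarm peers | grep 12D3KooWDeNSud283YaRmhqbZDynLNmtATBxjUPAUJxtPyEXXp9u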

Finally, you can retrieve the whole dataset with a single RootCID

ipfs get -o out bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q

Now you can examine the out folder, which should contain the whole dataset.
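
A quick sanity check on the retrieved data, using nothing but standard shell tools, is to look at its total size and file count

du -sh out
find out -type f | wc -l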

SgtCoin commented Jan 23, 2024

@xinaxu

I am working on a rewrite of the singularity workshop and wanted to let you know I just wrapped up the first rough draft. I still have some formatting and terminal window management content to update but I wanted to get your thoughts and correct anything I overlooked beforehand. Please let me know if you see anything technically incorrect or otherwise.

https://gist.github.com/SgtCoin/6a9513afedbf8875d01655f039ad9d2e
