This guide walks you through every step needed to use Singularity to:
- Prepare an open dataset from S3
- Send deals to a local emulated storage provider, f02815405
- Make retrievals from the emulated storage provider using HTTP and Bitswap
- Download the latest pre-built Singularity release
- Download the latest pre-built sim-sp release
- Do not use a package format such as '.deb' or '.rpm', because it installs the binaries into your system PATH
- Extract both archives into the same folder. After extraction, the folder should contain two executables, singularity and sim-sp, plus some docs and a license file
- Open a terminal and change the working directory to that folder
- Make sure everything works by running
./singularity version
./sim-sp -h
For Windows users, use the commands below instead, and keep using the .exe form for the whole workshop
singularity.exe version
sim-sp.exe -h
For Mac users, you may need to open each executable in Finder once to bypass Gatekeeper
- Download the latest IPFS CLI
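Before moving on, it can help to confirm that both executables really are in the working folder. The snippet below is a minimal sketch of such a check; `check_tools` is a hypothetical helper written for this workshop and is not part of Singularity or sim-sp. It assumes a POSIX shell, so Windows users would check for the `.exe` names in their own shell instead.

```shell
#!/bin/sh
# Hypothetical helper (not part of Singularity or sim-sp): report whether each
# named tool exists as an executable file inside the given folder.
# Returns 0 when every tool is found and 1 when at least one is missing.
check_tools() {
    dir=$1
    shift
    missing=0
    for tool in "$@"; do
        if [ -x "$dir/$tool" ]; then
            echo "ok: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call for the workshop folder (left commented out on purpose):
# check_tools . singularity sim-sp
```

If either line reads `missing`, re-check the extraction step before running `./singularity version`.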
Singularity is backed by a SQL database. In this workshop, we will use the default SQLite backend. To initialize the database, run
./singularity admin init
Note: if you suspect you've made a mistake anywhere in the instructions below and would like to restart from scratch, you can reset the database using
./singularity admin reset --really-do-it
In this workshop, we are going to use CIViC (Clinical Interpretation of Variants in Cancer) as our source dataset.
Connect to the AWS S3 bucket as a storage connection. Let's name the connection civic
./singularity storage create s3 aws --region us-west-2 --name civic --path civic-aws-opendata
Now we can confirm that the storage connection is saved in the database using the command below
./singularity storage list
We can also see what's inside this storage connection using the command below. This gives us further assurance that the connection is valid. Note that those folders have not been prepared yet.
./singularity storage explore civic
Now create a new preparation named civic that uses the storage connection civic with default parameters
./singularity prep create --name civic --source civic
Start scanning the data source for files
./singularity prep start-scan civic civic
Now the data source is marked as ready to be scanned, but we have not yet started a worker to scan and prepare the dataset. Usually the dataset worker should always be running, but in this workshop we are going to run it on demand. The command takes about one minute to complete, depending on your Internet speed. It may look like it hangs at created pack job 2 with 43 file range, but it will finish soon. The command exits once data preparation is complete.
./singularity run dataset-worker --exit-on-error --exit-on-complete
We also want to do one more thing: DAG generation. The DAG contains all the folder structure information and can be very useful for retrieval. Run the two commands below to complete DAG generation. They should complete almost instantly.
./singularity prep start-daggen civic civic
./singularity run dataset-worker --exit-on-error --exit-on-complete
Great, we have completed the data preparation and can now list all prepared pieces using the command below. It shows the piece_size and piece_cid of each piece, which are very important parameters in deal proposals to storage providers.
./singularity prep list-pieces civic
Also, every file and folder in this dataset now has a CID that can be used for later retrieval. The CID on the first line, with an empty path, is called the RootCID; it is the CID of the root folder of this dataset. If it is not bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q, write it down and use it in place of bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q everywhere in the following instructions
./singularity prep explore civic civic
You may start to wonder: where are the CAR files? Singularity uses inline preparation, which stores the mapping between the original data files and the CAR files, so you don't need to provision extra space for CAR files.
You can run the content provider to offer CAR file downloads to storage providers. Do not terminate the content provider until the storage provider has "sealed" the deal in the next section.
./singularity run content-provider
[Optional] You may try downloading CAR files using the command below in another terminal window (replace <piece_cid> with an actual value from the list-pieces output)
wget http://127.0.0.1:7777/piece/<piece_cid>
Deal making needs two parties: a client that sends the deal proposals and a storage provider that does the sealing. In this demo, we're going to import a leaked private key as our test client
./singularity wallet import 7b2254797065223a22736563703235366b31222c22507269766174654b6579223a226b35507976337148327349586343595a58594f5775453149326e32554539436861556b6c4e36695a5763453d227d
We also want to attach this wallet to our preparation civic so that all deal proposals for this data preparation are sent from this wallet
./singularity prep attach-wallet civic f0808055
Now for the storage provider: we will run an emulated storage provider in another terminal window. It accepts any Boost deal, downloads the CAR file that is part of the deal from the Singularity content provider, and offers free retrievals. Do not terminate this window until the end of the workshop.
./sim-sp run
Finally, it's time to send out the deals. To simplify this process, we are going to send deals for all available pieces from this open dataset to the storage provider. Do not change the miner ID f02815405, and do NOT replace {PIECE_CID}.
./singularity deal schedule create --verified=false --preparation civic --provider f02815405 --url-template "http://127.0.0.1:7777/piece/{PIECE_CID}"
Then run the deal pusher to actually send those deals out.
./singularity run deal-pusher
You will now see things happening in three different terminal windows
- Singularity Deal Pusher is sending deal proposals to the emulated storage provider
- sim-sp is receiving Boost online deals and is trying to download and parse the CAR files from the Singularity Content Provider
- Singularity Content Provider is reading S3 objects and converting them into a CAR stream for download
Wait until you see two lines of the form Deal xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx completed successfully; then you are ready for the upcoming retrieval session. You may also stop the deal-pusher and content-provider services and leave sim-sp running for retrievals.
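The `--url-template` value passed to `deal schedule create` keeps `{PIECE_CID}` as a literal placeholder because it is filled in once per piece when deals are sent. The sketch below only illustrates that substitution; `expand_piece_url` is a hypothetical helper, not Singularity's own code, and `EXAMPLE_PIECE_CID` is a made-up stand-in for a value from the list-pieces output.

```shell
#!/bin/sh
# Illustration only: mimic how a {PIECE_CID} placeholder can be expanded into
# a per-piece download URL. This is NOT how Singularity is implemented.
expand_piece_url() {
    template=$1
    piece_cid=$2
    # CIDs are plain alphanumeric strings, so they are safe in a sed replacement.
    printf '%s\n' "$template" | sed "s|{PIECE_CID}|$piece_cid|g"
}

# EXAMPLE_PIECE_CID is a placeholder, not a real piece CID.
expand_piece_url "http://127.0.0.1:7777/piece/{PIECE_CID}" "EXAMPLE_PIECE_CID"
```

Each expanded URL has the same shape as the optional `wget` test against the content provider on port 7777, which is why one template can cover every piece in the schedule.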
This emulated storage provider offers both HTTP and Bitswap retrievals, similar to Boost.
The CAR files can be downloaded using the command below (replace <piece_cid> with an actual value from the list-pieces output)
wget http://127.0.0.1:7778/piece/<piece_cid>
The emulated storage provider also enables an IPFS Gateway, so you can browse the dataset by its RootCID at http://127.0.0.1:7778/ipfs/bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q (this RootCID comes from the preparation result of ./singularity prep explore civic civic)
You may also browse the files using the IPFS Gateway and download files with any HTTP client.
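When there are several pieces or files to fetch, building the URLs by hand gets tedious. The following is a small sketch of two hypothetical helpers (`piece_urls` and `gateway_url`, neither of which ships with sim-sp) that print the HTTP URLs described above. They assume sim-sp is listening on 127.0.0.1:7778 as in this workshop, and that files under the RootCID follow the usual gateway form `/ipfs/<cid>/<path>`.

```shell
#!/bin/sh
# sim-sp HTTP endpoint used in this workshop.
SIM_SP_HTTP="http://127.0.0.1:7778"

# Hypothetical helper: read piece CIDs (one per line) from standard input and
# print the sim-sp download URL for each one, skipping blank lines.
piece_urls() {
    while IFS= read -r cid || [ -n "$cid" ]; do
        [ -n "$cid" ] || continue
        printf '%s/piece/%s\n' "$SIM_SP_HTTP" "$cid"
    done
}

# Hypothetical helper: print the gateway URL for a RootCID ($1) and an
# optional path inside the dataset ($2).
gateway_url() {
    printf '%s/ipfs/%s%s\n' "$SIM_SP_HTTP" "$1" "${2:+/$2}"
}
```

For example, with the piece CIDs saved one per line in a file of your own (say `pieces.txt`, a name chosen here for illustration), `piece_urls < pieces.txt` prints one URL per piece that can be passed to `wget`.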
First, initialize and run IPFS with the commands below
ipfs init
ipfs daemon
Then connect to the emulated storage provider using the command below in a different terminal window. If you see a warning saying the repo is locked, just try again
ipfs swarm connect /ip4/127.0.0.1/tcp/24001/p2p/12D3KooWDeNSud283YaRmhqbZDynLNmtATBxjUPAUJxtPyEXXp9u
Finally, you can retrieve the whole dataset with a single RootCID
ipfs get -o out bafybeibtm5nxak73c7db7z4xmwuergpkqvjkkw7awuwtsehtg3ca55by3q
Now you can examine the out folder, which should contain the whole dataset
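One quick way to examine the result is to count the files that were retrieved. The helper below is a minimal sketch under the assumption that the dataset was written to a local folder such as `out`; `count_files` is a hypothetical name, and the expected number of files is not stated in this workshop, so treat the count only as a sanity check that the folder is not empty.

```shell
#!/bin/sh
# Hypothetical helper: print how many regular files a folder contains,
# including files in nested subfolders.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example call after retrieval (left commented out on purpose):
# count_files out
```

A result of 0 suggests the retrieval did not complete, in which case re-check that `ipfs daemon` and `sim-sp` are both still running.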
@xinaxu
I am working on a rewrite of the singularity workshop and wanted to let you know I just wrapped up the first rough draft. I still have some formatting and terminal window management content to update but I wanted to get your thoughts and correct anything I overlooked beforehand. Please let me know if you see anything technically incorrect or otherwise.
https://gist.github.com/SgtCoin/6a9513afedbf8875d01655f039ad9d2e