Last active November 22, 2022 14:45
Read data from a cloud object store with Biopython
"""
Reads a fastq file directly from the 1000genomes AWS public dataset
into a Bio.SeqRecord set
Requires an AWS Account
"""
from smart_open import open
from Bio import SeqIO

# file handle-like reference to ~60MB object in S3
# (smart_open decompresses .gz transparently, based on the file extension)
fh = open('s3://1000genomes/phase3/data/NA12878/sequence_read/SRR622461.filt.fastq.gz')
for record in SeqIO.parse(fh, 'fastq'):
    print(record.id)
"""
Reads a fastq file directly from the 1000genomes AWS public dataset
into a Bio.SeqRecord set
Does not require an AWS Account
"""
import io
from gzip import GzipFile

import s3fs
from Bio import SeqIO

# anon=True makes unsigned requests, so no AWS credentials are needed
fs = s3fs.S3FileSystem(anon=True)
with fs.open('1000genomes/phase3/data/NA12878/sequence_read/SRR622461.filt.fastq.gz', 'rb') as f:
    # binary S3 stream -> gzip decompression -> text lines for SeqIO
    for record in SeqIO.parse(io.TextIOWrapper(GzipFile(fileobj=f)), 'fastq'):
        print(record.id)
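The second file wraps a binary handle in `GzipFile` and then `io.TextIOWrapper` before passing it to the parser. Here is a minimal sketch of that layering using only the standard library, so it runs offline. It assumes an in-memory `io.BytesIO` standing in for the S3 object and a made-up two-read payload. `iter_fastq_ids` is a hypothetical helper, not part of Biopython, and it only handles strict 4-line FASTQ records, which `SeqIO.parse` does not require.

```python
import gzip
import io


def iter_fastq_ids(binary_fh):
    """Yield read IDs from a gzip-compressed FASTQ stream opened in binary mode.

    Hypothetical stand-in for SeqIO.parse(..., 'fastq'); assumes every record
    spans exactly four lines.
    """
    # Same layering as above: raw bytes -> gzip decompression -> text decoding
    text = io.TextIOWrapper(gzip.GzipFile(fileobj=binary_fh), encoding="ascii")
    for line_number, line in enumerate(text):
        # In 4-line FASTQ, lines 0, 4, 8, ... are the '@' header lines
        if line_number % 4 == 0:
            # Drop the leading '@' and any description after the first space
            yield line[1:].split()[0]


# Made-up two-read payload, compressed in memory to mimic a .fastq.gz object
raw = b"@read1 desc\nACGT\n+\nIIII\n@read2\nTTGA\n+\nFFFF\n"
fake_s3_object = io.BytesIO(gzip.compress(raw))

ids = list(iter_fastq_ids(fake_s3_object))
print(ids)
```

Any object that supports binary reads, such as the handle returned by `fs.open(..., 'rb')`, could be passed in place of the `BytesIO` buffer.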
Looks like `smart_open` does use boto3, but does not provide anonymous access to S3. That feature seems unique to `s3fs`.