Skip to content

Instantly share code, notes, and snippets.

@masaomi
Last active May 20, 2024 10:46
Show Gist options
  • Save masaomi/870f24db57d295eabfd0094b504c2f1b to your computer and use it in GitHub Desktop.
Save masaomi/870f24db57d295eabfd0094b504c2f1b to your computer and use it in GitHub Desktop.

Day2 Part1 9-10

Goal

  • Understand command line argument and file input/output process

Example1

  • Command line argument

Source code

import sys          # import sys library
                           
print(sys.argv)     # print command line arguments

Result

$ python3 day2_1_example1.py 123 abc
['day2_1_example1.py', '123', 'abc']

Note

  • sys.argv returns the file path and the command line arguments with list
  • Importing sys module is needed before calling sys.argv

Example2

  • Load data from a file
import sys                  # Import sys module to use open method
file = open("input.fa")     # Open the file input.fa and return the file handle
for line in file:           # Repeat reading line by line from the file
    print(line.rstrip())    # Show the line with the line break removed
file.close()                # Close the file

Result

$ python3 day2_1_example2.py  
>seq00000
GAAGACTAGT
>seq00001
GAAGACTAGT
>seq00002
GAAGACTAGT
>seq00003
GAAGACTAGT
>seq00004
GAAGACTAGT
....

Note

  • Before running the script, please check whether the input.fa in the current working directory.

  • To read the contents of a file, you need the following:

    1. import sys module
    2. open a file
    3. read line by line using for statement
    4. close the file
  • The open and close processes are necessary to tell the python interpreter about where the data location is and reading/writing from/to a file safely

Example 3

import sys                  # Import sys module to use open method
file = open(sys.argv[1])    # Open the file and return the file handle
for line in file:           # Repeat reading line by line from the file
    print(line.rstrip())    # Show the line with the line break removed
file.close()                # Close the file

Result

$ python3 day2_1_example3.py  input.fa
>seq00000
GAAGACTAGT
>seq00001
GAAGACTAGT
>seq00002
GAAGACTAGT
>seq00003
GAAGACTAGT
>seq00004
GAAGACTAGT
....

Note

  • Please pay attention to the command line argument compared to Example2

Exercise1

  • Load a FASTA file by specifying the file path with the command line argument, and calculate the genome size (the total number of nucleotides)
import sys      # import sys package for open and close methods

genome_size = 0                             # Initialize a variable to store the genome size
file = open(sys.argv[1])        # Open a fasta file, athal_genome.fa
for line in file:               # Read line by line using for statement
  #
  # Your code here
  #
file.close()                                # Close the file

print("genome size =", genome_size, "bp")   # Show the genome_size

Expected result

$ python day2_1_exercise1.py athal_genome.fa
genome size = 119667750 bp

Advanced exercise1

  • Calculate GC content (the ratio of (GC/genome size)) of A. thaliana genome (athal_genome.fa)

Expected result

$ python day2_1_advancecd1.py athal_genome.fa
Genome size = 119667750 bp
GC content = GC count/genome size = 0.35999748470243653

Advanced exercise2

  • Read a FASTQ file (sample.fq) and calculate 1. the total number of reads, 2. the average length of reads, and 3. GC content

Hint

  • Fastq file format is as follows:
@HWI-ST1357:71:D2FNTACXX:1:1101:1235:2123 1:N:0:CGATGT
AGTCTCTCTGCTACCTTCATCGTCTACCCATGGCGGCCTTAGTTCCTGCGTCCTTACTAAACCCTCCGTTTCTCTTACGAAGGTCTTCGCGGTTCCCGAAA
+
@@<DFFFFHHHHHIIJIIJIIJHGHIIJJCGIIJIJGIJJGHIGAFH@FB@GHFHFFEDFFEEDBDDDADDCCDDCCCD<??BACCCCBBBDBBB@A####
  • The first line is an annotation line
  • The second line is the sequence data
  • The third line is nothing
  • The last line is quality data for each nucleotide

Expected result

$ python day2_1_advanced2.py sample.fq
Total reads = 250
Average read length = 101.0
GC content = GC count/total = 0.4575049504950495
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment