@endrebak
Created January 6, 2017 14:48
import pandas as pd
mir_miR_correspondence = "/local/home/annata/mirna.mature.offset0.txt"
mirna_example_file = "/local/home/annata/SHORTREADS/OFFCONTROL/offcontrol-start/Demux.SRhi10002.Adipocyte%20-%20omental%2c%20donor3.SRhi10002_hg19.11475-119C8.GTGAAA.fastq.gz.filter.shortreads"
### READ FILES
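# Whitespace-separated file without a header row; the first column (id1) becomes the index.
# The raw string keeps "\s" from being treated as an invalid escape sequence.
mirna_df = pd.read_table(mirna_example_file, sep=r"\s+", header=None,
                         names="id1 id2 nb1 mirna_seq score nb2 short_read_seq type end offset".split(),
                         index_col=0)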
miR_df = pd.read_table(mir_miR_correspondence, index_col=0)
## Keep only rows with zero offset
mirna_df = mirna_df.loc[mirna_df.offset == 0]
## Separate the two types of ends
mirna3 = mirna_df.loc[mirna_df.end == 3]
mirna5 = mirna_df.loc[mirna_df.end == 5]
# Join each table with the miR table of canonical sequences (left join on the index)
mirna3_miR = mirna3.join(miR_df, how="left")
mirna5_miR = mirna5.join(miR_df, how="left")
## Remove all but canonical sequences
mirna3_miR = mirna3_miR.loc[mirna3_miR.mirna_seq == mirna3_miR.sequence3p]
mirna5_miR = mirna5_miR.loc[mirna5_miR.mirna_seq == mirna5_miR.sequence5p]
# Example output: first line of mirna5_miR
# You might want to remove some columns?
mirna5_miR.head(1)
# id2 nb1 mirna_seq score nb2 short_read_seq type end offset p5 p3 off5 off3 sequence5p sequence3p
# hsa-let-7a-1 hsa-let-7a-1 6 TGAGGTAGTAGGTTGTATAGTT 11072.5106498 6 TGAGGTAGTAGGTTG start 5 0 hsa-let-7a-5p hsa-let-7a-3p 6.0 57.0 TGAGGTAGTAGGTTGTATAGTT CTATACAATCTACTGTCTTTC
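
# A minimal sketch for the "remove some columns?" question above, assuming only
# a handful of columns are needed downstream. `select_columns` and its default
# column list are hypothetical and not part of the original analysis; adjust
# the list to whatever the next step actually uses.
import pandas as pd

def select_columns(df, columns=("mirna_seq", "short_read_seq", "score", "end", "p5", "p3")):
    """Return a copy of df restricted to the requested columns it actually contains."""
    # Ignore requested names that are missing, so the helper works on either table.
    present = [c for c in columns if c in df.columns]
    return df.loc[:, present].copy()

# Possible usage with the tables built above:
# mirna5_slim = select_columns(mirna5_miR)
# mirna3_slim = select_columns(mirna3_miR)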