@SilasK
Created January 22, 2021 15:06
Hack DRAM to calculate the coverage of all KEGG modules
SilasK commented Feb 26, 2021

I put everything into one neat script.

If you have DRAM installed, you have access to the mag_annotator module:


import pandas as pd
from mag_annotator.utils import get_database_locs
from mag_annotator.summarize_genomes import build_module_net, make_module_coverage_frame

# Input: the DRAM annotation table. Output: coverage of every KEGG module.
annotation_file = "path/to/annotation.tsv"
mouse_output_table = "all_kegg_modules.tsv"

annotations = pd.read_csv(annotation_file, sep='\t', index_col=0)

# Look up the module step form among DRAM's configured database locations
db_locs = get_database_locs()
if 'module_step_form' not in db_locs:
    raise ValueError('Module step form location must be set in order to summarize genomes')

module_steps_form = pd.read_csv(db_locs['module_step_form'], sep='\t')

# Build one network per module from its rows in the step form
all_module_nets = {module: build_module_net(module_df)
                   for module, module_df in module_steps_form.groupby('module')}

# Compute module coverage per genome, grouping annotations by the 'fasta' column
module_coverage_frame = make_module_coverage_frame(annotations, all_module_nets, groupby_column='fasta')

module_coverage_frame.to_csv(mouse_output_table, sep='\t')
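
The table written above is in long format, so a genome-by-module matrix is often easier to inspect. The following is a minimal sketch of that reshaping, not part of DRAM itself. It assumes the coverage table has `genome`, `module`, and `step_coverage` columns; check the header of your own output file, since these names are an assumption here. `pivot_module_coverage` is a hypothetical helper, and the example rows are made up for illustration.

```python
import pandas as pd


def pivot_module_coverage(coverage, value_column="step_coverage"):
    """Reshape a long coverage table into a genome x module matrix.

    Assumes columns named 'genome', 'module' and `value_column` exist.
    Genome/module pairs missing from the table are filled with 0.0.
    """
    return coverage.pivot_table(index="genome", columns="module",
                                values=value_column, fill_value=0.0)


# Made-up example rows standing in for the real coverage table
example = pd.DataFrame({
    "genome": ["MAG1", "MAG1", "MAG2"],
    "module": ["M00001", "M00002", "M00001"],
    "step_coverage": [1.0, 0.5, 0.25],
})

coverage_matrix = pivot_module_coverage(example)
print(coverage_matrix)
```

To apply this to the real output, load it first with `pd.read_csv("all_kegg_modules.tsv", sep="\t", index_col=0)` and pass the resulting frame to the helper.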
