Yichao Shen Jiu9Shen

  • University of Saskatchewan
@Jiu9Shen
Jiu9Shen / flanking_seq_extract_from_vcf.pl
Last active March 28, 2019 17:49
Extract flanking region sequences for markers in VCF from a fasta file.
use warnings;
use strict;
# purpose of this script
# to extract flanking sequences of a specific region from a fasta file using SAMTOOLS
# in testing, run time for 100 markers was approximately 7 seconds
# for a large file containing many markers, run time is about 12 minutes for 10,000 markers
# PARAMETERS:
# 1. $file_in_marker, we expect a vcf file
# 2. $file_in_fasta, we expect a fasta file
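The flanking-region extraction described in the comments above can be sketched roughly as follows. This is a minimal illustration, not the gist's actual code: the helper names `flanking_region` and `fetch_region`, the flank size of 60, and the example VCF line are all assumptions. The only external call is `samtools faidx <fasta> <region>`, which needs samtools installed and is therefore not invoked in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build a samtools region string (CHROM:START-END)
# around the POS column of one tab-separated VCF data line.
sub flanking_region {
    my ($vcf_line, $flank) = @_;
    chomp $vcf_line;
    my ($chrom, $pos) = split /\t/, $vcf_line;   # CHROM and POS are columns 1 and 2
    my $start = $pos - $flank;
    $start = 1 if $start < 1;                    # samtools regions are 1-based
    my $end = $pos + $flank;
    return sprintf('%s:%d-%d', $chrom, $start, $end);
}

# Hypothetical helper: fetch the sequence of a region with `samtools faidx`.
# Assumes samtools is on PATH; the fasta index is created on first use.
sub fetch_region {
    my ($fasta, $region) = @_;
    open(my $fh, '-|', 'samtools', 'faidx', $fasta, $region)
        or die "Cannot run samtools: $!";
    my @lines = <$fh>;
    close $fh;
    shift @lines;          # drop the FASTA header line
    chomp @lines;
    return join '', @lines;
}

# Example with a made-up marker at position 150 and 60 bp of flank per side.
my $region = flanking_region("chr1\t150\trs1\tA\tG\t.\tPASS\t.", 60);
print "$region\n";         # prints chr1:90-210
```

Starting one samtools process per marker is simple but slow for large marker sets, which fits the run times quoted in the comments above.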
Jiu9Shen / Step1_gene_seq_extract.pl
Last active January 30, 2020 18:52
Update gene locations on new genome.
use warnings;
use strict;
# purpose of this script
# to extract sequences of a specific region from a fasta file using SAMTOOLS
# first argument: the list of regions the user wants to extract
# Format of each line:
# marker_name:start_position-end_position
my $file_in_gff3 = $ARGV[0];
my $file_in_genome_fasta = $ARGV[1];
use warnings;
use strict;
use Data::Dumper;
# four input parameters required
# file1: gene name list
# file2: genes in genome, in this case Lentil v1.2
# file3: output file name in simple format
# file4: output file name in gff3 format
my $file_in_name_list = $ARGV[0];
Jiu9Shen / A Script to update marker locations for new genome
Last active November 5, 2018 17:56
Script to update marker locations on new genome. (Four files: README, perl scripts: step 1,2,3)
Script to update locations for new genome
Steps and example:
(example: update marker positions of dry bean (Phaseolus vulgaris) from genome version 0.9 to 1.0)
1: Extract flanking sequence
command for step 1:
perl Step1_sequence_extract_flanking_from_gff.pl --inGff3 120919_pv768_bam_sorted.gff3 --inGenome Phaseolus_vulgaris.main_genome.scaffolds.fasta --regex "ID=Pv09sc(\d+)p(\d+)" --prefix "scaffold" --position 9 --flanking 60 --out drybean_marker_flanking_region.fa
--inGff3: GFF3 file which contains markers from the old genome
Jiu9Shen / marker_find_cross_species.pl
Last active June 22, 2018 19:33
Find potential KASP markers for identifying lentil species
use warnings;
use strict;
# steps
# 1. read reference file and generate arrays containing correlation between germplasm and their species
# of: cultivars, nigricans, orientalis, lamottei, tomentosus, odemensis, ervoides
# 2. read vcf file line by line
# check the flanking region first, skipping lines that are too close to each other
# convert each SNP from vcf to a hapmap-like format, skipping any line where a heterozygous call or an indel is found
# 3. for SNP passed all check
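Step 2 above (convert each SNP call to a base and skip heterozygous calls and indels) might look roughly like the sketch below. The helper names `is_indel` and `gt_to_base` are hypothetical and not taken from the gist. The sketch assumes the standard VCF `GT` encoding (`0` for REF, `1` for ALT, `/` or `|` as separator, `.` for missing), and for simplicity it treats multi-allelic ALT entries the same way as indels.

```perl
use strict;
use warnings;

# Hypothetical helper: true if REF or ALT is not a single base.
# This catches indels and also multi-allelic ALT values such as "A,T".
sub is_indel {
    my ($ref, $alt) = @_;
    return (length($ref) != 1 || length($alt) != 1) ? 1 : 0;
}

# Hypothetical helper: translate one sample field (e.g. "0/0:12") into a base.
# Returns '-' for a missing call and undef for a heterozygous call.
sub gt_to_base {
    my ($sample_field, $ref, $alt) = @_;
    my ($gt) = split /:/, $sample_field;        # GT is the first sub-field
    my ($a1, $a2) = split m{[/|]}, $gt;         # unphased "/" or phased "|"
    return '-' if !defined $a2 || $a1 eq '.' || $a2 eq '.';
    return undef if $a1 ne $a2;                 # heterozygous: caller skips the line
    return $a1 eq '0' ? $ref : $alt;
}

# Example with made-up calls for a biallelic A/G SNP.
for my $field ('0/0:12', '1|1', './.') {
    print gt_to_base($field, 'A', 'G'), "\n";   # prints A, G, - (one per line)
}
```

A caller would drop the whole VCF line as soon as `is_indel` returns true or `gt_to_base` returns undef for any sample, which matches the "skip every line" rule in the comments above.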
Jiu9Shen / extract_genes_in_region.pl
Last active March 2, 2018 20:34
This script is used to extract genes in one or multiple regions and write into an output file.
use warnings;
use strict;
use Getopt::Std;
use Data::Dumper;
# this script is used to help researchers to find genes located in specific (QTL) regions for lentil
# option -g
# essential file
Jiu9Shen / Marker_name_conversion_0.8s_0.8_1.2.pl
Last active February 28, 2018 21:44
Conversion of interesting markers from lens genome version 0.8s to 0.8 to 1.2.
use warnings;
use strict;
use Data::Dumper qw(Dumper);
#----------------------------------------------------------------------------------
# function of this script is converting marker names from 0.8 version (or 0.8 space version) to 1.2 version
# we have data file with name matches between 0.8 and 0.8s, also 0.8 and 1.2
####
# 3 input files required
# $agp_08s_08: one file with match info between 0.8 and 0.8s
Jiu9Shen / Germplasm_comp_m_vs_1.pl
Last active February 28, 2018 22:05
A bash script that compares expression of different germplasms. Because the file is large, it is split by chromosome and multithreading is used to reduce computing time. All results need to be compiled together at the end.
use warnings;
use strict;
#-------------------------------------------------------------------------------------------------------------------------
# function: convert a vcf file into a hash, using Marker name as keys
# 1. read the second vcf file line by line, skipping lines that start with ##
# 2. push germplasm names into the first line
# 3. convert and push all genotype (ATCG or -) into the hash
# 4. return the hash
sub pre_convert_vcf_2_matrix{
<?php
/*
**********************************************************************************************
/*Pre-work
* blast result files are generated using blastn cmd:
* blastn -query **.fasta -db ** -outfmt "7 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue qseq sseq" -out BlastResult_**_bean_v1.txt
* blastn -query **.fasta -db ** -outfmt "7 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue qseq sseq" -out BlastResult_**_bean_v2.txt
*/
// use the input fasta file to build an array of marker names, for further use as keys
Jiu9Shen / .block
Last active July 4, 2017 22:00 — forked from mbostock/.block
Bar Chart
license: gpl-3.0