# Python programming setup
- Desc: This is my current workflow for setting up my Python programming interface
  with JupyterLab on a remote server. I set up JupyterLab on the
  remote server so that it runs continuously. Once this has been done, only
  steps 5-6 are needed to connect my local computer to the running JupyterLab.
- Benefits:
  - Now I get to develop locally using the JupyterLab interface but run
    calculations remotely on the servers :)
  
    
# Requires seqkit (download it first)
# fx2tab converts a FASTA file to tabular format
seqkit fx2tab allORFs.fasta | sort -k1,1 --parallel 32 -S20% > allORFs.sorted.tsv
# grep the list of headers against the tabular FASTA, then convert back to standard FASTA
LC_ALL=C grep -w -F -f <(sort -k1,1 toextract.txt) allORFs.sorted.tsv | seqkit tab2fx > toextract.fasta
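
One caveat with the `grep -w -F` approach: `-w` only requires a non-word character on either side of the match, so an ID such as `orf1` also matches a line whose header is `orf1.2`. Where exact ID matching matters, a seqkit-free sketch in plain awk could look like the following. The helper name `extract_fasta` and the demo file names are hypothetical, and the sketch assumes one ID per line in the list file.

```shell
# Hypothetical helper: extract_fasta IDS_FILE FASTA_FILE
# Prints every FASTA record whose ID (the text between ">" and the first
# whitespace) appears in IDS_FILE.
extract_fasta() {
  awk '
    NR == FNR { want[$1] = 1; next }             # first file: remember wanted IDs
    /^>/      { keep = (substr($1, 2) in want) } # header line: keep or skip record
    keep                                         # print lines of kept records
  ' "$1" "$2"
}

# Small demonstration with throwaway files
tmp=$(mktemp -d)
printf '>orf1 desc\nATG\nCCC\n>orf2\nGGG\n>orf10\nTTT\n' > "$tmp/all.fasta"
printf 'orf1\n' > "$tmp/ids.txt"
extract_fasta "$tmp/ids.txt" "$tmp/all.fasta"
# prints:
# >orf1 desc
# ATG
# CCC
rm -r "$tmp"
```

Unlike the seqkit pipeline, this keeps the original line wrapping of each sequence and needs no sorting, but it holds the whole ID list in memory.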
  
    
# Removes the last "." and everything after it
# You could substitute any other delimiter for "." (e.g., "_", "/")
awk 'BEGIN{FS=OFS="."} NF--' FILE
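
A quick worked example may help show what `NF--` does here: it drops only the final delimiter-separated field, so a name with two extensions loses just the last one. The second command is a hypothetical variant (the function name `strip_after_first_dot` is not from the original) for the case where everything after the first "." should go instead.

```shell
# NF-- drops the last "."-separated field, so only the final extension is removed
printf 'sample1.fastq.gz\nreads.txt\n' | awk 'BEGIN{FS=OFS="."} NF--'
# prints:
# sample1.fastq
# reads

# Hypothetical variant: keep only the text before the FIRST "."
strip_after_first_dot() {
  cut -d . -f 1
}
printf 'sample1.fastq.gz\n' | strip_after_first_dot
# prints:
# sample1
```

Shrinking `NF` relies on awk rebuilding the record, which gawk and mawk do; other awk implementations may behave differently, so it is worth checking the output on a few lines first.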
  
    
seqkit fx2tab original.fasta | awk -F'\t' '{print "seq_"NR"\t"$2}' | seqkit tab2fx > renamed.fasta
# 1: convert FASTA to tabular format
# 2: Here is where you can change the headers. In the example above, each sequence header will
# be changed to "seq_NR" (NR is awk's built-in variable for the number of the current record,
# i.e. the line number). -F'\t' keeps headers that contain spaces in one field.
# 3: convert the table back to FASTA
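
Renaming this way discards the original headers. If they need to be recoverable later, one option is to write a lookup table while renaming. The following is a minimal sketch in plain awk rather than seqkit; the helper name `rename_fasta` and the map file are hypothetical additions, not part of the original workflow.

```shell
# Hypothetical helper: rename_fasta FASTA_FILE [MAP_FILE]
# Replaces each header with seq_1, seq_2, ... and, if MAP_FILE is given,
# records "new ID <tab> original header" there.
rename_fasta() {
  awk -v map="${2:-/dev/null}" '
    /^>/ {
      n++
      print "seq_" n "\t" substr($0, 2) > map   # new ID, tab, original header
      print ">seq_" n
      next
    }
    { print }                                   # sequence lines pass through
  ' "$1"
}

# Small demonstration with throwaway files
tmp=$(mktemp -d)
printf '>geneA some description\nATGC\n>geneB\nGGCC\n' > "$tmp/original.fasta"
rename_fasta "$tmp/original.fasta" "$tmp/map.tsv"
# prints:
# >seq_1
# ATGC
# >seq_2
# GGCC
rm -r "$tmp"
```

The map file can then be joined back against downstream results to restore the original names.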
  
    
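pufm_cor <- cor(pufm_agg_v3_wide, method = "pearson")
pufm_cor <- as.dist(1 - pufm_cor) # turn correlations into a dissimilarity (1 - r)
hc_methods <- c("ward.D", "ward.D2", "single", "complete", "average", "mcquitty", "median", "centroid")
# Cophenetic correlation between the hclust tree and the input distances.
# d has no default: the original `d = d` is self-referential and fails when d is omitted.
coph <- function(hc_method, d, dist_method){
  hc <- hclust(d, method = hc_method)
  coph_cor <- cor(cophenetic(hc), d)
  # tibble() replaces the deprecated data_frame(); the result is returned visibly
  tibble::tibble(hc_method = hc_method, dist_method = dist_method, coph = coph_cor)
}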
  
    
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. -DZLIB_ROOT=/home/mschecht/.linuxbrew/Cellar/zlib/1.2.11 ..
  
    
# du: disk usage
# -sh: show one total per folder (-s) in human-readable units (-h)
du -sh ./* | sort -h
# sort -h: sort by size, interpreting human-readable units (K, M, G)
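
To list only the biggest entries first, the same pipeline can be reversed and truncated. This is a small sketch, assuming a `sort` that supports `-h` (GNU coreutils does); the function name `largest_entries` and the default of five entries are arbitrary choices for illustration.

```shell
# Hypothetical helper: largest_entries DIR [N]
# Prints the N largest entries directly inside DIR (default 5), biggest first.
# The glob skips hidden files and folders.
largest_entries() {
  du -sh -- "$1"/* | sort -rh | head -n "${2:-5}"
}
```

For example, `largest_entries . 3` would show the three largest files or folders in the current directory.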
  
    
## 7. Permuted subsampling of larger matrices
Now let's subsample the "all" and the "knowns" matrices so that they have the same number of components as the "unknowns"
### Matrix subsampling function
```{r}
phyloseq_subsample <- function(phyloseq_obj) {
  subsample_size <- taxa_names(unk_physeq) %>% length() # get number of variables in unk matrix
  matrix <- phyloseq:::veganifyOTU(phyloseq_obj) # Pull out matrix from phyloseq
  sub <- sample(x = seq_len(ncol(matrix)), size = subsample_size, replace = FALSE) # create vector of subsampled variables from larger matrix
  
    
---
title: "Single vs multidomain proteins in refseq"
output:
  html_document:
    df_print: paged
editor_options:
  chunk_output_type: console
---
What is the number of single versus multidomain proteins in the non-redundant refseq database?
  
    
#!/usr/bin/env bash
# $1: whitespace-delimited table; rows are counted per unique value of column 3
# sort first, because uniq -c only merges adjacent duplicates
MULTI=$(awk '{print $3}' "$1" | sort | uniq -c | awk '$1 != 1 {print $2}' | wc -l)
SINGLE=$(awk '{print $3}' "$1" | sort | uniq -c | awk '$1 == 1 {print $2}' | wc -l)
echo "No. of single domain proteins = $SINGLE"
echo "No. of multidomain proteins = $MULTI"
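
The script reads the input twice and depends on identical values in column 3 being adjacent. As an alternative sketch, a single awk pass can tally every value in an array and classify the tallies at the end, which works regardless of row order. The function name `count_domain_classes` is hypothetical, and the sketch assumes, like the script above, that column 3 identifies the protein.

```shell
# Hypothetical helper: count_domain_classes TABLE
# Counts how many distinct values of column 3 occur once (single domain)
# and how many occur more than once (multidomain).
count_domain_classes() {
  awk '
    { hits[$3]++ }                       # tally rows per value of column 3
    END {
      for (id in hits) {
        if (hits[id] == 1) { single++ } else { multi++ }
      }
      printf "No. of single domain proteins = %d\n", single
      printf "No. of multidomain proteins = %d\n", multi
    }
  ' "$1"
}

# Small demonstration: protA appears twice, in non-adjacent rows
tmp=$(mktemp)
printf 'd1 x protA\nd2 x protB\nd3 x protA\nd4 x protC\n' > "$tmp"
count_domain_classes "$tmp"
# prints:
# No. of single domain proteins = 2
# No. of multidomain proteins = 1
rm -f "$tmp"
```

The trade-off is memory: every distinct value of column 3 is held in the array, whereas the `sort | uniq -c` version can spill to disk.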