Philipp Bayer philippbayer

## sessioninfo.txt
R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C               LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8     LC_MONETARY=en_AU.UTF-8

## changes.md

      
createRepeatLandscape.pl changes for EDTA/TESorter classes (last active November 9, 2023 15:20)

I changed the following in RepeatMasker/util/createRepeatLandscape.pl to make classes reported by EDTA and TESorter appear in the plot.
Around line 220 I added CACTA repeats as their own class:
              [ 'DNA/Transib',    '#FF9972' ],
              [ 'DNA/CACTA',      '#D45B2C' ],

I got the color by googling #FF9972 and then clicking around in the color picker that comes up until I had a similar-looking color.
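If you would rather derive a related shade programmatically than click around, a small sketch like the following may help. It is not how #D45B2C was chosen; `scale_lightness` is a hypothetical helper that only adjusts the HLS lightness of a hex color using Python's standard `colorsys` module.

```python
import colorsys


def scale_lightness(hex_color, factor):
    """Return hex_color with its HLS lightness multiplied by factor (clamped to 0..1)."""
    hex_color = hex_color.lstrip("#")
    # Convert the three hex pairs to floats in the range 0..1
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    l = max(0.0, min(1.0, l * factor))
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "#{:02X}{:02X}{:02X}".format(*(round(c * 255) for c in (r, g, b)))


if __name__ == "__main__":
    # Print a darker relative of the DNA/Transib color
    print(scale_lightness("#FF9972", 0.7))
```

A factor below 1 darkens the color and a factor above 1 lightens it; hue and saturation are left alone, so the result stays in the same color family.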
Then, around line 700, I added all these translations:

  
## nextflow.config
// have this as nextflow.config in the folder of your run for Pawsey's Setonix

// I settled on this command for nf-core/mag:
// nextflow run nf-core/mag --input '*R{1,2}.fastq.gz' --outdir results
// --skip_spades --cat_db https://tbb.bio.uu.nl/bastiaan/CAT_prepare/CAT_prepare_20210107.tar.gz
// --gtdb 'https://data.gtdb.ecogenomic.org/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz'
// -resume -profile singularity
// --refine_bins_dastool --postbinning_input both
// --busco_download_path /SOMEWHERE/busco-data.ezlab.org/v5/data
// --disable-jobs-cancellation

## torch.md

      
installing torch/transformers under ROCm on Pawsey (last active October 16, 2023 08:01)

Here's my alias in .bashrc for getting a gpu-dev instance, based on https://support.pawsey.org.au/documentation/display/US/Setonix+GPU+Partition+Quick+Start
alias getgpunode='salloc -p gpu-dev --nodes=1 --gpus-per-node=1 --account=${PAWSEY_PROJECT}-gpu'

First, to make a fresh environment:
mamba create -p `pwd`/transformers transformers python=3.10

Install Torch built for the closest ROCm version: there is no build for 5.4.3 (the current 'new' version on Pawsey) and none for 5.2.3 (the default version). I also set pip's cache directory to somewhere on /scratch.
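Once the install has finished, a quick check on a GPU node can show whether the installed wheel is a ROCm build and whether it sees a GPU. This is a minimal sketch, not part of the original notes; `rocm_report` is a hypothetical helper name, and it relies only on `torch.__version__`, `torch.version.hip` and `torch.cuda.is_available()`.

```python
def rocm_report():
    """Return a small dict describing the visible PyTorch/ROCm setup."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is a version string on ROCm builds and None otherwise
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds expose GPUs through the torch.cuda API
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in rocm_report().items():
        print(f"{key}: {value}")
```

If `hip_version` comes back as `None`, the wheel is probably a CPU or CUDA build rather than a ROCm one.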

  
## computeLCA.py
import os
import sys
import argparse
from statistics import mean
'''
INPUT: tab-delimited blastn output. Assuming that taxonomy ID is in this format:
-outfmt "6 qseqid sseqid staxids sscinames scomnames sskingdoms pident length qlen slen mismatch gapopen gaps qstart qend sstart send stitle evalue bitscore qcovs qcovhsp"

This script also assumes that input has been filtered by 90% identity.
$ awk -F'\t' '{if ($7 > 90) print}' all_results.tsv > all_results.90perc.tsv
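
The same identity filter can be sketched in Python. The point of this version is that it splits strictly on tabs, so multi-word values in sscinames or scomnames (for example "Homo sapiens") cannot shift the pident column. `filter_by_identity` and `PIDENT_COLUMN` are hypothetical names for illustration and are not part of computeLCA.py.

```python
import csv
import io

PIDENT_COLUMN = 6  # zero-based index of pident in the -outfmt string above


def filter_by_identity(lines, min_identity=90.0):
    """Yield rows (lists of fields) whose pident exceeds min_identity."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= PIDENT_COLUMN:
            continue  # skip empty or malformed lines
        if float(row[PIDENT_COLUMN]) > min_identity:
            yield row


if __name__ == "__main__":
    # Two made-up rows, truncated after the length column for brevity
    demo = io.StringIO(
        "q1\ts1\t9606\tHomo sapiens\thuman\tEukaryota\t98.5\t150\n"
        "q2\ts2\t10090\tMus musculus\thouse mouse\tEukaryota\t85.0\t150\n"
    )
    for row in filter_by_identity(demo):
        print("\t".join(row))
```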

## align.sh
#!/bin/bash -l

# SLURM directives
#
# This is an array job with four subtasks

#SBATCH --job-name=align
#SBATCH --time=12:00:00
#SBATCH --cpus-per-task=1
#SBATCH --partition=work

## circlize.md

      
circlize (created March 26, 2021 01:34)

First, a tab-delimited file with genome sizes:
Gm01	58711475	26	60	61
Gm03	52519505	59690052	60	61

etc.
Then, to plot the thing:

  
## EDTA.md

      
Running EDTA on Pawsey with Singularity (last active March 3, 2021 02:34)

First, to download EDTA:
module load singularity
singularity pull EDTA.sif docker://quay.io/biocontainers/edta:1.9.4--0

That'll make a new file called EDTA.sif containing everything in the EDTA v1.9.4 container.
Then we have a problem: Pawsey allows only 1 million files per user, and running several EDTA runs for several genomes at once will hit that limit.
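
To see how close a set of run directories is to that limit, the files can simply be counted. The sketch below is an illustration only: `count_files` and `FILE_LIMIT` are hypothetical names, it counts file entries under one directory tree with `os.walk`, and it does not query Pawsey's actual quota system.

```python
import os
import sys

FILE_LIMIT = 1_000_000  # per-user file limit quoted above


def count_files(root):
    """Count file entries in root and all of its subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    n = count_files(root)
    print(f"{root}: {n} files ({n / FILE_LIMIT:.1%} of the limit)")
```

Running it on each EDTA output directory shows which runs produce the most files and therefore which ones to clean up or archive first.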

  
## similarity.py
def get_similarity(a, suffix):
    # Length of the common prefix shared by a and suffix.
    # zip replaces itertools.izip, which no longer exists in Python 3.
    score = 0
    for x, y in zip(a, suffix):
        if x != y:
            break
        score += 1
    return score

def stringSimilarity(a):

## covid_vs_cash.Rmd

```{r setup}
library(tidyverse)
library(ggrepel)
```

```{r}
df <- readxl::read_xlsx('./Covid_vs_State.xlsx')
head(df)
```