Artem Tarasov lomereiter

## tmux-and-escape.md

      
Created August 7, 2017 15:19
          
- Add `set -g escape-time 10` to `~/.tmux.conf`.
- Also add `set -g default-terminal "screen-256color"`.
- Run `tmux source-file ~/.tmux.conf` to reload the config.
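
The steps above can also be scripted. The following is only a minimal sketch: `ensure_lines` and `TMUX_SETTINGS` are hypothetical names introduced here, not part of tmux or of the original note. It appends the two settings to a config file only when they are not already present, so it can be run repeatedly.

```python
from pathlib import Path

# The two settings quoted from the note above.
TMUX_SETTINGS = [
    "set -g escape-time 10",
    'set -g default-terminal "screen-256color"',
]


def ensure_lines(conf_path, lines):
    """Append each of `lines` to `conf_path` unless already present.

    Hypothetical helper: returns the list of lines that were actually added.
    """
    path = Path(conf_path).expanduser()
    text = path.read_text() if path.exists() else ""
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in lines if line not in present]
    if missing:
        # Start on a fresh line if the file does not end with a newline.
        prefix = "" if text == "" or text.endswith("\n") else "\n"
        with path.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing
```

Calling `ensure_lines("~/.tmux.conf", TMUX_SETTINGS)` would edit the real config; `tmux source-file ~/.tmux.conf` still has to be run afterwards for a running server to pick up the change.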


## non-molecules.ipynb

      
Created May 12, 2017 14:30
    
## flagstat.py
# environment: python3; conda install -c conda-forge fastparquet=0.0.4post1 joblib
# usage: python flagstat.py <dataset.adam>

from collections import Counter
import sys

import fastparquet
from fastparquet.core import read_row_group_file
from fastparquet.schema import SchemaHelper

## mass_spec_formats.md

      
Last active June 15, 2016 08:51
          
Summary of the problem from the mz5 paper (concerning .mzML but just as true for .imzML):

> Although based on excellent ontologies, relying on the extended markup language (XML)
> for the straightforward implementation of mzData, mzXML, and mzML makes for a major
> efficiency bottleneck. XML was designed to be a human readable, textual data format
> with considerable inherent verbosity and redundancy. XML was not designed for efficient
> bulk data storage, and the general modus operandi requires reading complete files to
> construct the XML parse tree. The mzXML and mzML formats partly circumvent these limitations
> by using base-64 encoding and (optional) compression of the raw MS scan data in combination
> with an application-specific indexing system. Despite the improvements gained from these efforts,
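
To make the "base-64 encoding and (optional) compression" step in the quote concrete, here is a minimal sketch using only the Python standard library. The function names `encode_array` and `decode_array` are hypothetical, and the sketch assumes 64-bit little-endian floats with zlib compression; it illustrates the general technique and is not a reader for any real file.

```python
import base64
import struct
import zlib


def encode_array(values, compress=True):
    """Pack floats as little-endian doubles, optionally zlib-compress, then base64-encode."""
    raw = struct.pack("<%dd" % len(values), *values)
    if compress:
        raw = zlib.compress(raw)
    return base64.b64encode(raw).decode("ascii")


def decode_array(text, compressed=True):
    """Reverse of encode_array: base64-decode, optionally decompress, unpack doubles."""
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


# Illustrative values only, not taken from a real spectrum.
mzs = [100.0, 200.5, 300.25]
assert decode_array(encode_array(mzs)) == mzs
```

Even without compression, base64 turns every 3 bytes of binary data into 4 text characters, so the encoded payload is about a third larger than the raw array before any XML markup is added around it.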


## serialization.md

      
1 fork, 3 stars
Last active October 6, 2022 09:23
          
### Serialization: best practices

(In this document I pay attention mostly to data storage in scientific applications, not to web protocols.)

#### Traditional approaches

- XML:
  - slow to parse
  - schemas (.xsd) are human-readable but hard to edit without special software
  - tooling for generating code for reading/writing is limited (mostly to Java)


## testdatasets2.ipynb

      
Last active December 10, 2015 17:21
    
## pipeline.py
import sys
# path to pyIMS parent dir
sys.path.append("/home/lomereiter/github")

from pyIMS.image_measures.level_sets_measure import measure_of_chaos
from pyIMS.image_measures.isotope_image_correlation import isotope_image_correlation
from pyIMS.image_measures.isotope_pattern_match import isotope_pattern_match

import numpy as np
import cPickle  # Python 2 only; on Python 3 the equivalent module is "pickle"

## sambamba_161.d
// compilation: rdmd --build-only -O -release -inline -IBioD sambamba_161.d
// to use LDC: rdmd --compiler=ldmd2 [--force] ...
import bio.bam.reader, bio.bam.writer, std.parallelism;

void main(string[] args) {
  // boilerplate
  defaultPoolThreads = 8;
  auto input = new BamReader(args[1]); // use std.getopt for better args handling
  auto output = new BamWriter(args[2]);
  output.writeSamHeader(input.header);

## range_fixes.patch
diff --git a/wqflask/base/data_set.py b/wqflask/base/data_set.py
index a572a60..b152357 100755
--- a/wqflask/base/data_set.py
+++ b/wqflask/base/data_set.py
@@ -555,12 +555,22 @@ class DataSet(object):
         #  """ % (query_args))

         try:
-            self.id, self.name, self.fullname, self.shortname = g.db.execute("""
+            if self.type != "ProbeSet":

## git-lfs gn2
> git-lfs smudge genotype_files/gemma/HLC.map
LocalWorkingDir=/home/lomereiter/github/genenetwork2
LocalGitDir=/home/lomereiter/github/genenetwork2/.git
LocalMediaDir=/home/lomereiter/github/genenetwork2/.git/lfs/objects
TempDir=/home/lomereiter/github/genenetwork2/.git/lfs/tmp
GIT_DIR=.git

Error accessing media: genotype_files/gemma/HLC.map (84241b81feb7eec3c0b914e223ff23810c69610a6e759baf4797bab4a4850de8)

Error downloading /home/lomereiter/github/genenetwork2/.git/lfs/objects/84/24/84241b81feb7eec3c0b914e223ff23810c69610a6e759baf4797bab4a4850de8.