
Mathias Walzer mwalzer

  • EMBL-EBI
  • Cambridge
@mwalzer
mwalzer / lxml2df4mzq.py
Last active April 19, 2024 21:19
lxml xpath for mzquantml
from lxml import etree

# mzQuantML 1.0.0 namespace, bound to the prefix 'x' used in the XPath queries
NS = {'x': "http://psidev.info/psi/pi/mzQuantML/1.0.0"}

mzq = "file:///path/vis_fix.mzq"
doc = etree.parse(mzq)
# Column header of the peptide-level assay quant layer
header = doc.xpath('/x:MzQuantML/x:PeptideConsensusList/x:AssayQuantLayer/x:ColumnIndex',
                   namespaces=NS)
col_names = ['object_ref'] + header[0].text.split(' ')
# Data matrix of the same layer
dm = doc.xpath('/x:MzQuantML/x:PeptideConsensusList/x:AssayQuantLayer/x:DataMatrix',
               namespaces=NS)
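The preview stops before the data matrix is turned into a table. As a minimal sketch of that step, the block below collects the rows into plain dictionaries keyed by the column names. It uses the standard library's `xml.etree.ElementTree` instead of lxml so that it runs without extra dependencies. The inline `SAMPLE` document, the function name `quant_layer_records`, and the assumption that each `Row` carries an `object_ref` attribute plus space-separated numeric values are illustrative and not taken from the gist.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://psidev.info/psi/pi/mzQuantML/1.0.0"}

# Hypothetical, heavily reduced mzQuantML fragment for illustration only.
SAMPLE = """<MzQuantML xmlns="http://psidev.info/psi/pi/mzQuantML/1.0.0">
  <PeptideConsensusList>
    <AssayQuantLayer>
      <ColumnIndex>ass_0 ass_1</ColumnIndex>
      <DataMatrix>
        <Row object_ref="pep_1">10.5 11.0</Row>
        <Row object_ref="pep_2">3.0 4.5</Row>
      </DataMatrix>
    </AssayQuantLayer>
  </PeptideConsensusList>
</MzQuantML>"""


def quant_layer_records(xml_text):
    """Return (column names, list of row dicts) for the first AssayQuantLayer."""
    root = ET.fromstring(xml_text)
    layer = root.find("x:PeptideConsensusList/x:AssayQuantLayer", NS)
    col_names = ["object_ref"] + layer.find("x:ColumnIndex", NS).text.split()
    records = []
    for row in layer.findall("x:DataMatrix/x:Row", NS):
        # Assumption: every cell is a number that float() can parse.
        values = [row.get("object_ref")] + [float(v) for v in row.text.split()]
        records.append(dict(zip(col_names, values)))
    return col_names, records


col_names, records = quant_layer_records(SAMPLE)
```

A list of dictionaries like `records` can be passed directly to `pandas.DataFrame(records)` if a data frame is wanted. Cells that are not plain numbers would need extra handling.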
@mwalzer
mwalzer / PRIDE_contaminants.msp
Created October 2, 2022 11:44
from PRIDE nfs (2016 folder) because FTP doesn't
Name: IQVR/2
Comment: Spec=Consensus Mods=0 Parent=258.049 Nreps=20 Naa=4 MaxRatio=0.750 PrecursorMzRange=0.0570 Protein=sp|TRYP_PIG|
Num peaks: 32
130.886 897.48
157.784 26.99
174.812 660.3
192.273 365.64
196.799 37.08
213.811 258.83
224.825 3465.14
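For readers who want to work with such an entry programmatically, here is a minimal sketch of a parser for one MSP record of the shape shown above: a `Name:` line, a `Comment:` line of `key=value` tokens, a `Num peaks:` line, then one m/z and intensity pair per line. The function name `parse_msp_entry` is hypothetical, and the sketch assumes comment values contain no spaces or quotes, which holds for this excerpt but not for every MSP file.

```python
def parse_msp_entry(text):
    """Parse a single MSP library entry into a dict (simplified sketch)."""
    entry = {"name": None, "comment": {}, "num_peaks": None, "peaks": []}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Name:"):
            entry["name"] = line.split(":", 1)[1].strip()
        elif line.startswith("Comment:"):
            # Assumption: whitespace-separated key=value tokens without quoting.
            tokens = line.split(":", 1)[1].split()
            entry["comment"] = dict(t.split("=", 1) for t in tokens if "=" in t)
        elif line.startswith("Num peaks:"):
            entry["num_peaks"] = int(line.split(":", 1)[1])
        else:
            # Peak line: m/z followed by intensity.
            mz, intensity = line.split()[:2]
            entry["peaks"].append((float(mz), float(intensity)))
    return entry


# First lines of the entry shown above; the peak list is cut short here,
# so fewer peaks are parsed than "Num peaks" announces.
EXCERPT = """Name: IQVR/2
Comment: Spec=Consensus Mods=0 Parent=258.049 Nreps=20 Naa=4 MaxRatio=0.750 PrecursorMzRange=0.0570 Protein=sp|TRYP_PIG|
Num peaks: 32
130.886 897.48
157.784 26.99
174.812 660.3
"""

entry = parse_msp_entry(EXCERPT)
```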
@mwalzer
mwalzer / biognosis_irts.csv
Last active September 29, 2022 18:57
Peptide sequences, precursor m/z, and iRT score from the Biognosys iRT kit as CSV
iRT peptide      Precursor m/z   iRT
LGGNEQVTR        487.257         -24.92
GAGSSEPVTGLDAK   644.823         0.00
VEATFGVDESNAK    683.828         12.39
YILAGVENSK       547.298         19.79
TPVISGGPYEYR     669.838         28.71
TPVITGAPYEYR     683.854         33.38
DGLDAASYYAPVR    699.339         42.26
ADVTPADFSEWSK    726.836         54.62
GTFIIDPGGVIR     622.854         70.52
@mwalzer
mwalzer / write_in_5_minutes.ipynb
Last active June 23, 2022 09:41
read_in_5_minutes.ipynb
@mwalzer
mwalzer / qc_edges.html
Created June 1, 2022 13:43
CV visualisation with pyvis
<html>
<head>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/vis-network@latest/styles/vis-network.css" type="text/css" />
<script type="text/javascript" src="https://cdn.jsdelivr.net/npm/vis-network@latest/dist/vis-network.min.js"> </script>
<center>
<h1></h1>
</center>
<!-- <link rel="stylesheet" href="../node_modules/vis/dist/vis.min.css" type="text/css" />
<script type="text/javascript" src="../node_modules/vis/dist/vis.js"> </script>-->
@mwalzer
mwalzer / proteomicsdb_api_request.py
Last active April 21, 2022 10:06
a python one-off script to get the normalised intensities of target proteins in proteomicsdb
import json
import pprint
import time
import requests
import pandas as pd
api_target = "https://www.proteomicsdb.org/proteomicsdb/logic/api/proteinexpression.xsodata/InputParams(PROTEINFILTER='{prot_acc}',MS_LEVEL=1,TISSUE_ID_SELECTION='',TISSUE_CATEGORY_SELECTION='tissue;fluid',SCOPE_SELECTION=1,GROUP_BY_TISSUE=1,CALCULATION_METHOD=0,EXP_ID=-1)/Results?$select=UNIQUE_IDENTIFIER,TISSUE_ID,TISSUE_NAME,TISSUE_SAP_SYNONYM,SAMPLE_ID,SAMPLE_NAME,AFFINITY_PURIFICATION,EXPERIMENT_ID,EXPERIMENT_NAME,EXPERIMENT_SCOPE,EXPERIMENT_SCOPE_NAME,PROJECT_ID,PROJECT_NAME,PROJECT_STATUS,UNNORMALIZED_INTENSITY,NORMALIZED_INTENSITY,MIN_NORMALIZED_INTENSITY,MAX_NORMALIZED_INTENSITY,SAMPLES&$format=json"
results = list()
no_joy = list()
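The preview ends before the request loop. The names `results` and `no_joy` suggest that hits and misses are collected separately, and the sketch below shows one way such a loop could look. It is not the author's code. The helper `collect_expression`, the injected `fetch` callable, the shortened `$select` list in `API_TEMPLATE`, and the `{"d": {"results": [...]}}` response envelope are all assumptions made for illustration. The request itself is injected so the sketch runs without network access. With `requests`, `fetch` could be `lambda url: requests.get(url).json()`.

```python
import time

# Shortened variant of the URL template above (fewer $select fields),
# used here for illustration only.
API_TEMPLATE = (
    "https://www.proteomicsdb.org/proteomicsdb/logic/api/proteinexpression.xsodata/"
    "InputParams(PROTEINFILTER='{prot_acc}',MS_LEVEL=1,TISSUE_ID_SELECTION='',"
    "TISSUE_CATEGORY_SELECTION='tissue;fluid',SCOPE_SELECTION=1,GROUP_BY_TISSUE=1,"
    "CALCULATION_METHOD=0,EXP_ID=-1)/Results"
    "?$select=UNIQUE_IDENTIFIER,TISSUE_NAME,NORMALIZED_INTENSITY&$format=json"
)


def collect_expression(accessions, fetch, pause=0.0):
    """Query each accession; return (result rows, accessions without data)."""
    results, no_joy = [], []
    for acc in accessions:
        try:
            payload = fetch(API_TEMPLATE.format(prot_acc=acc))
            # Assumption: OData-style JSON envelope {"d": {"results": [...]}}.
            rows = payload["d"]["results"]
        except (KeyError, TypeError, ValueError, OSError):
            rows = []
        if rows:
            results.extend(rows)
        else:
            no_joy.append(acc)
        time.sleep(pause)  # be polite to the server between requests
    return results, no_joy


def fake_fetch(url):
    """Stand-in for a real HTTP call, returning canned data for one accession."""
    if "P00533" in url:
        return {"d": {"results": [{"UNIQUE_IDENTIFIER": "P00533",
                                   "TISSUE_NAME": "liver",
                                   "NORMALIZED_INTENSITY": "5.1"}]}}
    return {"d": {"results": []}}


results, no_joy = collect_expression(["P00533", "NOT_A_PROTEIN"], fake_fetch)
```

The collected rows are plain dictionaries, so `pd.DataFrame(results)` would give a table with one row per tissue and protein.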

Metabolomics batch runs example

Here, we describe details of a metabolomics mzQC JSON document used to describe a study's quality before and after batch correction methods are applied. For a description of the general structure of mzQC, see the Single-Run Example of mzQC. Find the complete file at the bottom of this document or in the example folder. The mzQC file is made from the acquisitions of GC-ToF-MS polar metabolite data of an Arabidopsis nucleotype-plasmotype diallel study as described in Improved batch correction in untargeted MS-based metabolomics.

    "description": "This dataset is based on the analysis of polar extracts from a nucleotype-plasmotype combination study of Arabidopsis for 58 different genotypes. For details of the used plant material we refer to Flood (2015). Analysis of the polar, derivatized metabolites by GC-ToF-MS (Agilent 6890 GC coupled to a Leco Pegasus III MS) and processing of the data were done as described in Villaf

QC Sample-Run Example of mzQC

Here, we describe details of an mzQC JSON document used for a QC sample mass spectrometry run. For a description of the general structure of mzQC, see the Single-Run Example of mzQC. Find the complete file at the bottom of this document or in the example folder. The mzQC file is made from the acquisition of a QC2 sample as described in QCloud: A cloud-based quality control system for mass spectrometry-based proteomics laboratories. Optional (detailed) descriptions about the file can be placed into mzQC next to the general information about the file.

    "description": "This is an example of an mzQC file produced from a proteomics QC2 sample. 20 ug dried Pierce HeLa protein digest standard from Thermo Fisher Scientific (Part number: 88329) are dissolved in 200 uL of 0.1% formic acid in water to a final concentration of 100 ng/uL. A total amount of 1 uL (100ng) is injected per analysis.",

The metrics describe simple values lik

Single-Run Example of mzQC

Here, we describe an mzQC JSON document used for QC of a single mass spectrometry run. Find the complete file at the bottom of this document or in the example folder. The document's main anchor is between the outer curly brackets:

{ "mzQC": {
...
}
}
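To make the anchor concrete, here is a minimal sketch that loads such a document with Python's standard `json` module and reads the general information stored directly under `mzQC`. The inline text is a reduced stand-in, not a complete mzQC file: the `description` value is a placeholder, and any further members a real file carries are left out.

```python
import json

# Reduced stand-in for an mzQC file; a real file holds more than these fields.
MZQC_TEXT = """
{ "mzQC": {
    "creationDate": "2020-12-09T11:04:16",
    "contactName": "Mathias Walzer",
    "contactAddress": "walzer@ebi.ac.uk",
    "version": "1.0.0",
    "description": "placeholder description"
  }
}
"""

document = json.loads(MZQC_TEXT)

# The whole document hangs off the single top-level "mzQC" key.
top_level_keys = list(document)
mzqc = document["mzQC"]

print(top_level_keys)
print(mzqc["version"], mzqc["creationDate"], mzqc["contactName"])
```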
{
"mzQC": {
"creationDate": "2020-12-09T11:04:16",
"contactName": "Mathias Walzer",
"contactAddress": "walzer@ebi.ac.uk",
"version": "1.0.0",
"description": "This dataset is based on the analysis of polar extracts from a nucleotype-plasmotype combination study of Arabidopsis for 58 different genotypes. For details of the used plant material we refer to Flood (2015). Analysis of the polar, derivatized metabolites by GC-ToF-MS (Agilent 6890 GC coupled to a Leco Pegasus III MS) and processing of the data were done as described in Villafort Carvalho et al. (2015). Here, the number of metabolites (75) is much lower than in the other two data sets, partly because the focus was on the primary rather than the secondary metabolites. The number of samples was 240, with a percentage of non-detects of 16 %; the maximum fraction of non-detects in individual metabolites is 92 %. All metabolites were retained in the analysis. Four batches of 31-89 samples were employed, containing 2-6 QCs per batch, 1