@marinegor
Last active September 14, 2020
scidata

Small-wedge synchrotron and serial XFEL datasets for Cysteinyl leukotriene GPCRs

This GitHub gist contains the scripts needed to reproduce the data processing described in the publication.

SSX

General folder structure

The deposited datasets are organized as follows:

C2_L_C2221
+-- 001_01_01
+-- 002_01_02
|   +-- images
|   |   +-- 002_01_02_0[0-100].cbf
|   |   +-- x_geo_corr.cbf
|   |   +-- y_geo_corr.cbf
|   +-- XDS.INP
|   +-- XDS.INP.modified
|   +-- crystallization.txt
+-- 003_02_01
...
+-- express.py
+-- fin.csv
+-- reject.sh
+-- create_express_inp.py
+-- xdscc.py
+-- xdscc12
+-- rating.py
+-- XSCALE.express.py.INP

Each folder is numbered as XXX_YY_ZZ_NN, where:

  • XXX -- consecutive number in the dataset
  • YY -- crystallization conditions ID
  • ZZ -- loop number within the same crystallization conditions
  • NN -- number of the miniset within the loop
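For bookkeeping, the scheme above is easy to unpack programmatically; a minimal sketch (the field names are illustrative, not part of the deposition):

```python
def parse_miniset_name(folder):
    """Split an XXX_YY_ZZ_NN folder name into its four numeric fields."""
    xxx, yy, zz, nn = folder.split("_")
    return {
        "dataset_number": int(xxx),     # consecutive number in the dataset
        "crystallization_id": int(yy),  # crystallization conditions ID
        "loop_number": int(zz),         # loop within the same conditions
        "miniset_number": int(nn),      # miniset within the loop
    }
```

For example, parse_miniset_name("106_05_03_04") identifies miniset 4 from loop 3 of crystallization condition 5.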

Preparation of input files

create_xscale_inp.py -- creates the fixed-name table fin.csv, assuming the folder structure described above. The table is then used by express.py as its input file. Each line in fin.csv represents one miniset and contains the following comma-separated columns (in that order): name of the folder to be used by express.py for all XDS-related files of this dataset, location of the XDS.INP file for this dataset, location of the raw data for this dataset, and the data range (first and last image numbers).

Usage: python create_xscale_inp.py
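Each fin.csv record is a plain comma-separated line; a minimal sketch of how such a record is consumed, mirroring the five-field split used by express.py (the example paths are illustrative):

```python
def parse_fin_line(line):
    """Split one fin.csv record into the five comma-separated fields."""
    name, data, inp, first, last = line.split(",")
    return name, data, inp, int(first), int(last)

record = parse_fin_line(
    "002_01_02,/data/C2_L_C2221/002_01_02,/data/C2_L_C2221/002_01_02/images,1,100"
)
```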

Integration-1

For the first integration, the input file fin.csv must be ready (see above) and filtered, if necessary. Only the scripts express.py and reject.sh are needed.

  • express.py -- Given a list of folders with XDS.INP files and paths to the respective data sets, the script runs XDS for each data set in the list, optionally adding UNIT_CELL_CONSTANTS, SPACE_GROUP_NUMBER and INCLUDE_RESOLUTION_RANGE, setting SPOT_RANGE equal to DATA_RANGE, and setting REFERENCE_DATA_SET. It also adds MAXIMUM_NUMBER_OF_PROCESSORS and MAXIMUM_NUMBER_OF_JOBS for processing on large clusters, and runs xscale_par afterwards.
data_summary_table = 'fin.csv'
space_group         = '!SPACE_GROUP_NUMBER=  1 \n'
unit_cell_constants = '!UNIT_CELL_CONSTANTS=  30 40 50 90 90 90\n'
max_num_proc        = 'MAXIMUM_NUMBER_OF_PROCESSORS= 80  \n'
max_num_jobs        = 'MAXIMUM_NUMBER_OF_JOBS=       48  \n'
resolution_range    = 'INCLUDE_RESOLUTION_RANGE= 30 3.0 \n'

#reference_data_set = 'REFERENCE_DATA_SET= %s \n'%'../reference.HKL'
reference_data_set = '!REFERENCE_DATA_SET= %s \n'%'../reference.HKL'
use_reference = True

Input parameters are set by manually editing the script. If you want to use the REFERENCE_DATA_SET keyword during integration, uncomment the line with REFERENCE_DATA_SET in the code and comment out the previous one. By default, the file reference.HKL in the folder containing express.py is used as the reference.

Usage:

# for short data processing runs
python express.py
# for a long run, where you may want to log off from the processing server and keep a log file
python express.py |& tee log.express.py_$(date "+%Y_%m_%d_%H_%M") & disown
# to kill mistakenly started processing
pkill -f express.py; pkill -f xds_par; pkill -f xscale_par
  • reject.sh Merges all XDS_ASCII.HKL files in subfolders of the current folder, optionally keeping only those with a particular space group. Then iteratively runs deltaCC12 rejection with the given resolution range and number of cycles, saving all intermediate XSCALE.INP and XSCALE.LP files.

Usage: bash reject.sh with the default configuration (only part of the script is shown):

# will run 4 cycles of rejection
for i in `seq 1 1 4`; do
    xscale_par
    cp XSCALE.LP{,_$i}
    
    # will perform deltaCC rejection in the 32.0-10.0 resolution range with 5 bins
    ./xdscc12 scaled_nonmerged.HKL -dmin 32.0 -dmax 10 -nbin 5 > XDSCC.LP
    
    # will analyse the output file XDSCC.LP
    # and write to good.xdscc the names of only those minisets
    # whose deltaCC12 is higher than 2.0
    python xdscc.py XDSCC.LP 2.0 |& tee log.xdscc_"$i"

Optionally, you may want to inspect the XSCALE.LP tables from all rejection cycles and take the best one as a reference or for further processing:

grep 'Nano' -A 25 XSCALE.LP_* | less
# choose best one, e.g. XSCALE.LP_2
cp xscale.inp_2 XSCALE.INP
xscale_par; cp scaled_nonmerged.HKL reference.HKL

You can also take only minisets with certain space group as initial input for further deltaCC rejection:

# comment this line
# ls */XDS_ASCII.HKL > xscale.inp

# and uncomment this one -- the desired space group number (here 22) is matched by the second `grep`
grep SPACE_GROUP_NUMBER */XDS_ASCII.HKL | grep "22$" | tr ":" " " | awk '{print $1}' > xscale.inp
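The same selection can be done without grep; a sketch in Python, assuming only that XDS_ASCII.HKL header lines start with `!` and include a `!SPACE_GROUP_NUMBER=` entry:

```python
import glob

def header_space_group(lines):
    """Return the space group number declared in an XDS_ASCII.HKL header, or None."""
    for line in lines:
        if line.startswith("!SPACE_GROUP_NUMBER="):
            return int(line.split("=")[1])
        if not line.startswith("!"):
            break  # header is over, reflections follow
    return None

def files_with_space_group(pattern, space_group):
    """Equivalent of the grep pipeline: XDS_ASCII.HKL files with the given space group."""
    selected = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            if header_space_group(handle) == space_group:
                selected.append(path)
    return selected
```

files_with_space_group("*/XDS_ASCII.HKL", 22) then yields the list to write into xscale.inp.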
  • xdscc.py Analyses the output of the xdscc12 utility together with the last XSCALE.INP used, providing a list of datasets with their deltaCC12 values. Saves a list good.xdscc of those whose deltaCC12 is higher than the input value.

Usage:

# run xdscc12, which is executable and located in current folder
# using scaled_nonmerged file (produced with XSCALE using MERGE=FALSE)
# in resolution range 40.0-2.8 and 13 bins
./xdscc12 scaled_nonmerged.HKL -dmin 40.0 -dmax 2.8 -nbin 13 > XDSCC.LP

# analyse the output file XDSCC.LP, produced by xdscc12
# and write to good.xdscc only the filenames with
# deltaCC > 3.0
python xdscc.py XDSCC.LP 3.0

This will produce output of the following kind:

1	056_03_02_01/XDS_ASCII.HKL	    -5.75
3	101_05_02_05/XDS_ASCII.HKL	    -0.86
--	--------------------------	     0.00
4	106_05_03_04/XDS_ASCII.HKL	     0.11
7	109_05_03_07/XDS_ASCII.HKL	     0.35
5	107_05_03_05/XDS_ASCII.HKL	     1.23
10	112_05_03_10/XDS_ASCII.HKL	     1.60
8	110_05_03_08/XDS_ASCII.HKL	     3.31
12	191_13_01_01/XDS_ASCII.HKL	     4.35
9	111_05_03_09/XDS_ASCII.HKL	     4.40
11	113_05_03_11/XDS_ASCII.HKL	     4.69
6	108_05_03_06/XDS_ASCII.HKL	     7.66
13	196_13_02_03/XDS_ASCII.HKL	     8.45
15	204_18_01_01/XDS_ASCII.HKL	    10.39
2	090_04_02_02/XDS_ASCII.HKL	    11.33
14	203_17_01_01/XDS_ASCII.HKL	    24.51

and write the following good.xdscc:

110_05_03_08/XDS_ASCII.HKL
191_13_01_01/XDS_ASCII.HKL
111_05_03_09/XDS_ASCII.HKL
113_05_03_11/XDS_ASCII.HKL
108_05_03_06/XDS_ASCII.HKL
196_13_02_03/XDS_ASCII.HKL
204_18_01_01/XDS_ASCII.HKL
090_04_02_02/XDS_ASCII.HKL
203_17_01_01/XDS_ASCII.HKL
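Conceptually, xdscc.py performs a sort-and-threshold over (dataset, deltaCC12) pairs; a minimal sketch with the parsing of XDSCC.LP omitted (the example values are taken from the table above):

```python
def select_good(datasets, cutoff):
    """Keep datasets whose deltaCC12 exceeds the cutoff, sorted by value."""
    kept = [(name, cc) for name, cc in datasets if cc > cutoff]
    return [name for name, cc in sorted(kept, key=lambda pair: pair[1])]

datasets = [
    ("056_03_02_01/XDS_ASCII.HKL", -5.75),
    ("110_05_03_08/XDS_ASCII.HKL", 3.31),
    ("203_17_01_01/XDS_ASCII.HKL", 24.51),
]
```

With a cutoff of 3.0, only the last two entries survive, as in the good.xdscc listing above.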

Integration-2

For the second integration, you usually update express.py so that the initial unit cell constants match your reference dataset, and also extend the resolution range (if you see that your reference data set has potential for it):

  • express.py
grep '^!UNIT_CELL_CONSTANTS' reference.HKL
>!UNIT_CELL_CONSTANTS= 59.22     45.66     86.77  90.000  91.275  90.000
grep SPACE_GROUP_NUMBER reference.HKL
>!SPACE_GROUP_NUMBER=   4

Modify express.py input parameters:

data_summary_table = 'fin.csv'
space_group         = 'SPACE_GROUP_NUMBER=  4 \n'
unit_cell_constants = 'UNIT_CELL_CONSTANTS=  59.22     45.66     86.77  90.000  91.275  90.000\n'
max_num_proc        = 'MAXIMUM_NUMBER_OF_PROCESSORS= 80  \n'
max_num_jobs        = 'MAXIMUM_NUMBER_OF_JOBS=       48  \n'
resolution_range    = 'INCLUDE_RESOLUTION_RANGE= 30 2.5 \n'

And run it:

python express.py |& tee log.express.py_$(date "+%Y_%m_%d_%H_%M") & disown

Merging

After the second run, you can assume that your minisets are of the best achievable quality, and you can start merging them in the best possible way.

  • reject.sh You may add several cycles of deltaCC rejection in different resolution ranges -- e.g. perform low-resolution rejection first (to get rid of non-isomorphous data), and then high-resolution rejection (to improve your resolution):
# first cycle -- resolution range 30.0-10.0, 10 bins, 5.0 deltaCC cutoff
for i in `seq 1 1 5`; do
    # part of the code omitted
    ./xdscc12 scaled_nonmerged.HKL -dmin 30.0 -dmax 10.0 -nbin 10 > XDSCC.LP
    python xdscc.py XDSCC.LP 5.0 |& tee log.xdscc_"$i"
    # part of the code omitted
    ...
done

# second cycle -- resolution range 5.0-2.5, 23 bins, 1.0 deltaCC cutoff
for i in `seq 6 1 10`; do
    # part of the code omitted
    ./xdscc12 scaled_nonmerged.HKL -dmin 5.0 -dmax 2.5 -nbin 23 > XDSCC.LP
    python xdscc.py XDSCC.LP 1.0 |& tee log.xdscc_"$i"
    # part of the code omitted
    ...
done
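The two-stage schedule above can also be expressed as data rather than two copied loops; a sketch that only generates the xdscc12 command line and cutoff for each cycle (numbers mirror the fragment above):

```python
STAGES = [
    # (cycle numbers, dmin, dmax, nbin, deltaCC cutoff)
    (range(1, 6), 30.0, 10.0, 10, 5.0),  # low-resolution rejection first
    (range(6, 11), 5.0, 2.5, 23, 1.0),   # then high-resolution rejection
]

def rejection_plan():
    """One (cycle, xdscc12 command, cutoff) entry per rejection cycle."""
    plan = []
    for cycles, dmin, dmax, nbin, cutoff in STAGES:
        for i in cycles:
            cmd = ("./xdscc12 scaled_nonmerged.HKL -dmin %.1f -dmax %.1f -nbin %d"
                   % (dmin, dmax, nbin))
            plan.append((i, cmd, cutoff))
    return plan
```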
  • REIDX For some data sets, you may want to run reindexing (this is the case for the C2_S_I4 dataset). To enable further deltaCC rejection, make sure you write your re-indexed datasets (run XSCALE with MERGE=FALSE) as XDS_ASCII.HKL in the corresponding folders.

SFX

For a primer on SFX data processing, please refer to the original CrystFEL tutorial. Here, we discuss the high-level wrappers used during the processing.

The deposited datasets are organized as follows:

C1_Zaf_P1
+-- raw_data
|   +-- r0126-cyslt1-zaf
|   +-- r0127-cyslt1-zaf
|   |   +-- cxilq5415-r0133-c00.cxi
+-- streams
|   +-- c1_zaf_p1_2019_04_29_10_27_19.stream
|   +-- c1_zaf_p1_2019_04_29_17_49_58.stream
+-- logs
+-- scratch
+-- initial.geom
+-- run_crystfel.sh
+-- analyse.sh
+-- c1_zaf.cell
+-- laststream -> streams/c1_zaf_p1_2019_04_29_17_49_58.stream

The folders scratch, logs and streams are necessary for running analyse.sh; please make sure they exist (run mkdir scratch streams logs before running it). Note that laststream points to the most recent stream, for convenience in further analysis.

Preparation of input files

  • find Locate all input files (either *.h5 or *.cxi; in the publication, only *.cxi is used) in your subfolders:

Usage:

find . -name '*.cxi' > cxi.lst
  • list_events Converts the several-events-per-line input file cxi.lst to the one-event-per-line input file cxi-events.lst (needs a proper geometry file, which is provided in each deposition):

Usage:

list_events -i cxi.lst -o cxi-events.lst -g refined.geom

Integration

  • run_crystfel.sh Wrapper for the indexamajig routine that (i) arranges all CrystFEL-related files into subfolders, (ii) automatically stamps each generated stream and its respective log file with the date and time, (iii) links the most recently created stream to laststream, and shuffles the input file list so that you can quickly and reliably check the indexing rate before indexing finishes.
# prefix for all *.stream files in streams folder
PROJECT_NAME="c1_zaf_p1"
# number of cores used for processing
NPROC="95"

# PEAK FINDING PARAMETERS (see `man indexamajig`)
SNR='4.5'
THRESHOLD='210'
HIGHRES='2.5'

LST='cxi-events.lst'
CELL='c1-zaf.cell'

shuf "$LST" > input.lst # your list must have events to enable this
GEOM="initial.geom"

ln -f -s "streams/"$PROJECT_NAME"_${time}.stream" laststream
indexamajig -i input.lst \
--temp-dir=scratch \
-o "streams/"$PROJECT_NAME"_${time}.stream" \
\
-g "$GEOM" \
--peaks=peakfinder8 \
-j "$NPROC" \
--min-snr="$SNR" \
--threshold="$THRESHOLD" \
--highres="$HIGHRES" \
 \
-p "$CELL" \
--check-peaks \
 \
--indexing=felix,dirax,asdf,mosflm,xds,taketwo |& tee logs/log.indexamajig_${time}

Usage:

# short run without logging off
bash run_crystfel.sh
# long background run
bash run_crystfel.sh & disown
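The ${time} variable is not shown being set in the excerpt above; judging from the stream names in the deposition (e.g. c1_zaf_p1_2019_04_29_10_27_19.stream), it follows a %Y_%m_%d_%H_%M_%S pattern. A sketch of the same naming in Python (the format string is an assumption inferred from those names):

```python
import time

def stream_path(project, when=None):
    """Timestamped stream path like streams/<project>_YYYY_MM_DD_HH_MM_SS.stream."""
    stamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime(when))
    return "streams/%s_%s.stream" % (project, stamp)
```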

Integration analysis & merging

  • analysis.sh Wrapper for the process_hkl, partialator, check_hkl and compare_hkl routines; it produces an XSCALE.LP-like statistics table, counts images indexed by the different indexers, prints a command-line histogram of image resolution (for a simple estimate of the push-res parameter) and writes logs.
# Indexing analysis only:
./analysis.sh -i laststream
# Merging with process_hkl and analysis:
./analysis.sh -i laststream --dorate 0 -j 96 --cell c1-zaf.cell --pushres 1.8 -s '-1' --highres 2.53
# Merging with partialator and analysis:
./analysis.sh -i laststream --dorate 1 -j 96 --cell c1-zaf.cell --pushres 1.8 -s '-1' --highres 2.53 --iterations 1 --model unity

Analysis of the input stream will print a text histogram of the resolution estimated by CrystFEL, together with information on each indexer's success:

=================
Indexing details:
=================
.046 	 1986 	 asdf-nolatt-cell
.168 	 7224 	 dirax-nolatt-nocell
.389 	 16717 	 felix-latt-cell
.002 	 121 	 mosflm-latt-cell
.057 	 2470 	 taketwo-latt-cell
0 	 10 	 xds-latt-cell
=================
Indexing summary:
=================
Total number of images for processing:	 43417
Number of processed images:		 42907
Number of indexed:	 28528
Number of crystals:	 28900
Number of spots found:	 2244193
Image indexing rate:		 .66
Crystals percentage:	 .67
Average crystals per image:	 1.01
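The summary figures are plain ratios of counts pulled from the stream (bc with scale=2 truncates, hence the leading-dot values); a sketch reproducing the numbers above:

```python
n_processed, n_indexed, n_crystals = 42907, 28528, 28900

indexing_rate = n_indexed / n_processed       # fraction of processed images indexed
crystal_fraction = n_crystals / n_processed   # crystals found per processed image
crystals_per_image = n_crystals / n_indexed   # average crystals per indexed image
```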

If merging was performed, the following XSCALE.LP-like table will be written:

Center 1/nm  # refs Possible  Compl       Meas   Red   SNR    Std dev       Mean     d(A)    Min 1/nm   Max 1/nm	Rsplit/%	CC	CC*
     1.086     3036     3036 100.00     501807 165.3 10.75    6043.70    4365.55     9.21       0.333      1.838	8.02	0.9914373	0.9978478
     2.076     3028     3028 100.00     309303 102.1  7.94    3191.70    3226.50     4.82       1.838      2.313	12.09	0.9720119	0.9928783
     2.480     2992     2992 100.00     236952  79.2  4.85    2389.33    1890.02     4.03       2.313      2.647	19.67	0.9567785	0.9888943
     2.780     3037     3037 100.00     175450  57.8  2.74    1122.50     866.10     3.60       2.647      2.913	38.39	0.8656982	0.9633355
     3.025     3041     3041 100.00     176418  58.0  1.92     644.22     483.40     3.31       2.913      3.138	56.76	0.7393615	0.9220373
     3.236     3020     3020 100.00     156472  51.8  1.18     468.81     272.44     3.09       3.138      3.334	98.64	0.5750552	0.8545193
     3.422     3037     3037 100.00     132063  43.5  0.72     365.36     170.59     2.92       3.334      3.510	171.59	0.3274640	0.7024014
     3.590     3043     3043 100.00     128703  42.3  0.61     360.10     143.10     2.79       3.510      3.669	211.22	0.2679825	0.6501470
     3.743     3004     3004 100.00     116843  38.9  0.43     364.12     109.43     2.67       3.669      3.816	299.75	0.1926646	0.5684036
     3.884     3063     3063 100.00      94652  30.9  0.31     385.49      79.01     2.57       3.816      3.953	487.25	0.0681203	0.3571438
   -------------------------------------------------------------------------------------------------------------------------------------------------------
     2.143    30301    30301 100.00    2028663  67.0  3.14    2749.07    1159.24     4.67       0.333      3.953	28.23	0.9739452	0.9933784
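The CC* column in this table is derived from CC1/2 via the Karplus-Diederichs relation, CC* = sqrt(2*CC1/2 / (1 + CC1/2)); a quick sanity check against the first shell:

```python
from math import sqrt

def cc_star(cc_half):
    """Estimate the correlation against the true signal from CC1/2."""
    return sqrt(2.0 * cc_half / (1.0 + cc_half))

# first shell of the table: CC = 0.9914373, CC* = 0.9978478
check = cc_star(0.9914373)
```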
#!/bin/bash
dorate="-1"
symmetry='-1'
highres='3.0'
lowres='30.0'
iterations='0'
model='unity'
mincc="0.0"
scale="0";
pushres="inf"
j="6"
# looping over input parameters
cell_set=0
while [[ $# -gt 1 ]]; do
    key="$1"
    case $key in
        -i|--input)
            input="$2"
            shift ;;
        --dorate)
            dorate="$2"
            shift ;;
        -s|--symmetry)
            symmetry="$2"
            shift ;;
        --highres)
            highres="$2"
            shift ;;
        --lowres)
            lowres="$2"
            shift ;;
        --iterations)
            iterations="$2"
            shift ;;
        --mincc)
            mincc="$2"
            shift ;;
        -m|--model)
            model="$2"
            shift ;;
        --scale)
            scale="$2"
            shift ;;
        -p|--pushres)
            pushres="$2"
            shift ;;
        -h|--help)
            echo "Indexing analysis only:"
            echo " ./indexing_analysis.sh output.stream"
            echo "Merging with process_hkl and analysis:"
            echo " ./indexing_analysis.sh output.stream --dorate 0 --pushres 1.0 --highres 2.5 --lowres 30.0 --symmetry 222"
            echo "Merging with partialator and analysis:"
            echo " ./indexing_analysis.sh output.stream --dorate 1 --pushres 1.0 --highres 2.5 --lowres 30.0 --symmetry 222"
            exit 0 ;;
        -j|--nproc)
            j="$2"
            shift ;;
        -c|--cell)
            cell="$2"
            cell_set=1
            shift ;;
        --default)
            DEFAULT=YES ;;
        *)
            # unknown option
            ;;
    esac
    shift # past argument or value
done
# if [[ "$scale" == "1" ]]; then
# echo "YES";
# else
# echo "NO";
# fi
# exit 0;
output=merging_stats_$(md5sum $input | cut -c1-5).csv
#----------------------------
# writes to overall_stats.csv the statistics obtained with check_hkl (SNR, multiplicity, N of refl, etc), plus Rsplit, CC and CC*.
function rate {
    rm stats[0-9].dat &>/dev/null
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom rsplit --nshells=10 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; cat shells.dat > stats1.dat
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom cc --nshells=10 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; grep -a -v "shitcentre" shells.dat > stats2.dat
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom ccstar --nshells=10 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; grep -a -v "shitcentre" shells.dat > stats3.dat
    check_hkl tmp.hkl -y "$symmetry" -p "$cell" --lowres="$lowres" --highres "$highres" &> compare_hkl.log; cat shells.dat > stats4.dat
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom rsplit --nshells=1 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; cat shells.dat > stats5.dat
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom cc --nshells=1 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; grep -a -v "shitcentre" shells.dat > stats6.dat
    compare_hkl tmp.hkl1 tmp.hkl2 -y "$symmetry" -p "$cell" --fom ccstar --nshells=1 --lowres "$lowres" --highres "$highres" &> compare_hkl.log; grep -a -v "shitcentre" shells.dat > stats7.dat
    check_hkl tmp.hkl --nshells 1 -y "$symmetry" -p "$cell" --lowres "$lowres" --highres "$highres" &> compare_hkl.log; cat shells.dat > stats8.dat
    paste stats4.dat <(awk '{print $3}' stats1.dat) <(awk '{print $3}' stats2.dat) <(awk '{print $3}' stats3.dat) | head -1 > overall_stats.csv
    paste stats4.dat <(awk '{print $2}' stats1.dat) <(awk '{print $2}' stats2.dat) <(awk '{print $2}' stats3.dat) | tail -n +2 >> overall_stats.csv
    echo " -------------------------------------------------------------------------------------------------------------------------------------------------------" >> overall_stats.csv
    paste stats8.dat <(awk '{print $2}' stats5.dat) <(awk '{print $2}' stats6.dat) <(awk '{print $2}' stats7.dat) | tail -n +2 >> overall_stats.csv
}
echo "Filename for current run: $input"
echo "Stream generated by: $(grep -a 'Generated by' "$input" | uniq)"
pythonstring='from __future__ import print_function; print(*[i.split("-i")[1].split()[0] for i in open("'$input'").readlines() if "indexamajig" in i],sep="\n")'
NIMAGES_INPUT=$(python2 -c "$pythonstring" | xargs wc -l 2> /dev/null | tail -1 | awk '{print $1}')
if [[ "$NIMAGES_INPUT" -eq 0 ]]; then
NIMAGES_INPUT="n/a (file lists not available)"
fi
#-----------------------
number_of_streams=$(grep -a 'indexamajig' $input | wc -l) # counts the indexamajig invocations, i.e. the number of streams merged into this file
if [[ "$number_of_streams" -gt 1 ]]; then
    echo "Multi-stream mode; number of streams: $number_of_streams"
    echo "indexamajig string: $(grep -a 'indexamajig' $input | tail -1)"
else
    echo "Single-stream mode; number of streams: 1"
    echo "indexamajig string: $(grep -a indexamajig $input)"
fi
echo "md5 checksum: $(md5sum $input)"
echo "Date: $(date -R)"
echo "================="
echo "Indexing details:"
echo "================="
NIMAGES=$(grep -a "Begin chunk" $input | wc -l )
NCRYST=$(grep -a "Begin crystal" $input | wc -l )
# lists all indexing methods used
METHODS=($(egrep -a "indexed_by" "$input" | grep -a -v 'none' | sort | uniq | awk 'NF>1{print $NF}' | tr '\n' ' '))
NINDEXED=0
for i in "${METHODS[@]}"; do
    if [ $i = "none" ]; then
        continue
    fi
    tmp="$(egrep -a -w "$i" "$input" | wc -l)"
    let "NINDEXED=$NINDEXED+$tmp"
    ratio=$(echo " scale=3; $tmp/$NIMAGES" | bc)
    echo -e $ratio "\t" $tmp "\t" "$i"
done
NSPOTS=$(grep -a "num_reflections" "$input" | awk '{print $3;}' | paste -sd+ | bc)
echo "================="
echo "Indexing summary:"
echo "================="
echo "Total number of images for processing: " $NIMAGES_INPUT
echo "Number of processed images: " $NIMAGES
echo "Number of indexed: " $NINDEXED
echo "Number of crystals: " $NCRYST
echo "Number of spots found: " $NSPOTS
#echo "Spots per image: " $(echo "scale=2; $NSPOTS/$NIMAGES" | bc )
#echo "Spots per crystal: " $(echo "scale=2; $NSPOTS/$NCRYST" | bc )
echo "Image indexing rate: " $(echo "scale=2; $NINDEXED/$NIMAGES" | bc )
echo "Crystals percentage: " $(echo "scale=2; $NCRYST/$NIMAGES" | bc)
echo "Average crystals per image: " $(echo "scale=2; $NCRYST/$NINDEXED" | bc)
echo "==================="
echo "Resolution summary:"
echo "==================="
grep 'diffraction_resolution_limit' $input | awk '{print $6}' | sort -n > reslim.txt
python2 -c 'from text_histogram import histogram; histogram([float(elem) for elem in open("reslim.txt").read().split("\n") if elem and float(elem) < 10], buckets=15)'
echo "======================="
echo "Profile radius summary:"
echo "======================="
grep 'profile_radius' $input | awk '{print $3}' | sort -n > profile_radius.txt
python2 -c 'from text_histogram import histogram; histogram([float(elem) for elem in open("profile_radius.txt").read().split("\n") if elem], buckets=15)'
if [[ "$dorate" == "1" ]]; then
    # runs partialator to estimate rmeas and other foms
    partialator -i "$input" -o tmp.hkl --iterations "$iterations" -j "$j" --model "$model" --push-res "$pushres" -y "$symmetry" &> partialator.log
    rate
elif [[ "$dorate" == "0" ]]; then
    if [[ "$scale" == "1" ]]; then
        process_hkl -i "$input" --min-cc "$mincc" --scale -o tmp.hkl -y "$symmetry" --min-res "$lowres" --push-res "$pushres"
        process_hkl -i "$input" --min-cc "$mincc" --scale -o tmp.hkl1 -y "$symmetry" --min-res "$lowres" --push-res "$pushres" --odd-only
        process_hkl -i "$input" --min-cc "$mincc" --scale -o tmp.hkl2 -y "$symmetry" --min-res "$lowres" --push-res "$pushres" --even-only
    else
        process_hkl -i "$input" --min-cc "$mincc" -o tmp.hkl -y "$symmetry" --min-res "$lowres" --push-res "$pushres"
        process_hkl -i "$input" --min-cc "$mincc" -o tmp.hkl1 -y "$symmetry" --min-res "$lowres" --push-res "$pushres" --odd-only
        process_hkl -i "$input" --min-cc "$mincc" -o tmp.hkl2 -y "$symmetry" --min-res "$lowres" --push-res "$pushres" --even-only
    fi
    rate
fi
# nothing was merged -- stop after the indexing analysis
if [[ "$dorate" == "-1" ]]; then
    exit 0
fi
echo "================"
echo "Merging summary:"
echo "================"
echo "Merging stats backup file: $output"
tail -n 5 tmp.hkl | head -n 1
echo "================" >> "$output"
echo "Merging summary:" >> "$output"
echo "================" >> "$output"
tail -n 5 tmp.hkl | head -n 1 >> "$output"
cat overall_stats.csv >> "$output"
rm stats[0-9].dat
cat overall_stats.csv
#!/usr/bin/env python
from __future__ import print_function
import os

for elem in os.listdir("."):
    if "merging" in elem:
        continue
    if not os.path.isfile(elem + "/XDS.INP"):
        continue
    if os.path.isdir(elem):
        print(
            elem,
            "/".join([os.getcwd(), elem]),
            "/".join([os.getcwd(), elem, "images"]),
            1,
            len(os.listdir("/".join([os.getcwd(), elem, "images"]))) - 2,
            sep=",",
        )
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/analysis.sh
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/c1.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/c1.pdb
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/c1_events.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/c1_p21_2019_04_09_11_40_00.stream
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/c1_v1.pdb
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/check-near-bragg
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/initial-predrefine.geom
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/initial.geom
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/initial_v1.geom
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/initial_v2.geom
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/input.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/overall_stats.csv.backup
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/runcrystfel.sh
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/streams.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/tmp.hkl
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/tmp.hkl1
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/tmp.hkl2
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/tmp.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0012-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0013-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0014-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0015-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0016-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0017-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0018-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0019-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0020-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0058-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0059-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0060-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0061-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0062-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0063-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0064-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0065-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0066-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0067-cyslt1-nh4.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/106/C1_Zaf_P21/hdf5/r0119-lys3.tar
https://cxidb.org/data/106/C1_Zaf_P21_stream.tar.gz
https://www.cxidb.org/data/107/6RZ5_CysLT1R_stream.tar.gz
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0127-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0128-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0129-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0130-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0131-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0133-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0180-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_data/r0181-cyslt1-zaf.tar
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/analysis.sh
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/c1-zaf.cell
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/cxi-events.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/cxi.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/index.html
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/input.lst
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/overall_stats.csv
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_files.md5
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/raw_files_list.txt
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/refined.geom
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/runcrystfel.sh
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/streams.tar.gz
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/tmp.hkl
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/tmp.hkl1
http://portal.nersc.gov/archive/home/projects/cxidb/www/107/6RZ5_C1_Zaf_P1/tmp.hkl2
#!/bin/bash
download() {
    local DATASET="$1"
    local file_list="$2"
    mkdir "$DATASET"
    while read url; do
        (cd "$DATASET" && curl -O "$url")
    done < "$file_list"
}

DATASETS=("cxidb_ID106_C1_Zaf_P21" "cxidb_ID107_C1_Zaf_P1" "zenodo_6RZ4_CysLT1R" "zenodo_6RZ6_CysLT2R" "zenodo_6RZ7_CysLT2R" "zenodo_6RZ8_CysLT2R" "zenodo_6RZ9_CysLT2R")
LISTS=("cxidb_id106.txt" "cxidb_id107.txt" "zenodo_cyslt1r_6RZ4.txt" "zenodo_cyslt2r_6RZ6.txt" "zenodo_cyslt2r_6RZ7.txt" "zenodo_cyslt2r_6RZ8.txt" "zenodo_cyslt2r_6RZ9.txt")

# the arrays have 7 entries, so the indices run 0..6
for i in `seq 0 1 6`; do
    echo "Downloading: ${DATASETS[$i]} ${LISTS[$i]}"
    download "${DATASETS[$i]}" "${LISTS[$i]}"
done |& tee download.log
#!/usr/bin/python
from __future__ import print_function
import os
import re
from shutil import copyfile
# looks for pattern in a string; returns True if found, False if not
def Find(pat, text):
match = re.search(pat, text)
return match
data_summary_table = "fin.csv"
working_directory = os.getcwd() # assumes that you are already in processing folder
space_group = "!SPACE_GROUP_NUMBER= 4 \n"
unit_cell_constants = (
"!UNIT_CELL_CONSTANTS= 59.22 45.66 86.77 90.000 91.275 90.000\n"
)
max_num_proc = "MAXIMUM_NUMBER_OF_PROCESSORS= 80 \n"
max_num_jobs = "MAXIMUM_NUMBER_OF_JOBS= 48 \n"
resolution_range = "INCLUDE_RESOLUTION_RANGE= 40 2.0 \n"
# reference_data_set = ''
reference_data_set = "REFERENCE_DATA_SET= %s \n" % "../reference.HKL"
use_reference = len(reference_data_set) > 0

fin = open(data_summary_table).read().split("\n")
log = open("log.express", "w")

# read the input table into XDSs: one entry per miniset
XDSs = dict()
for index, string in enumerate(fin):
    try:
        name, data, inp, data_range_start, data_range_stop = string.split(",")
    except ValueError:
        print("Error while loading string %d:\t%s" % (index, string), file=log)
        continue
    XDSs[name] = [data, inp, data_range_start, data_range_stop]

print("Following folders detected:\n", file=log)
for name in XDSs.keys():
    print("%s\n \t%s\n \t%s\n\n" % (name, XDSs[name][0], XDSs[name][1]), file=log)

# for each dataset folder: patch XDS.INP, then run a two-pass integration
for name in XDSs.keys():
    os.chdir(working_directory)
    xycorr = XDSs[name][1]
    xds = XDSs[name][1] + "/XDS.INP"
    data = XDSs[name][0]
    data_range_start = XDSs[name][2]
    data_range_stop = XDSs[name][3]
    os.chdir(name)
    project_dir = os.getcwd()  # remember current directory
    copyfile("../XSCALE.express.py.INP", "XSCALE.INP")

    with open("XDS.INP", "r") as xds:
        modif = xds.readlines()

    job = False
    noreference = True
    for i, string in enumerate(modif):
        if Find("SPACE_GROUP_NUMBER=", string):
            modif[i] = space_group
            print("### Space group added")
        elif Find("UNIT_CELL_CONSTANTS=", string):
            # modif[i] = 'UNIT_CELL_CONSTANTS= 36.337 35.631 41.277 90.000 93.606 90.000\n'
            modif[i] = unit_cell_constants
            print("### Unit cell constants added")
        elif Find("JOB=", string):
            # keep only the first JOB= line, blank out any further ones
            if job:
                modif[i] = "\n"
            else:
                modif[i] = "JOB=XYCORR INIT COLSPOT IDXREF\n"
                job = True
                print("### Job added")
        elif Find("SECONDS=", string):
            modif[i] = "!" + string
            print("### Seconds commented out")
        elif Find("MAXIMUM_NUMBER_OF_PROCESSORS=", string):
            modif[i] = max_num_proc
            print("### Maximum number of processors added")
        elif Find("MAXIMUM_NUMBER_OF_JOBS=", string):
            modif[i] = max_num_jobs
            print("### Maximum number of jobs added")
        elif Find("X-GEO_CORR=", string):
            os.system("bzip2 -d %s" % (xycorr + "/x_geo_corr.cbf.bz2"))
            modif[i] = "X-GEO_CORR= images/x_geo_corr.cbf\n"
        elif Find("Y-GEO_CORR=", string):
            os.system("bzip2 -d %s" % (xycorr + "/y_geo_corr.cbf.bz2"))
            modif[i] = "Y-GEO_CORR= images/y_geo_corr.cbf\n"
        elif Find("RESOLUTION_RANGE", string):
            modif[i] = resolution_range
            print("### Resolution range added")
        elif Find("LIB", string):
            modif[i] = "LIB=/home/marin/Apps/neggia/build/src/dectris/neggia/plugin/dectris-neggia.so \n"
        elif Find("REFERENCE_DATA_SET", string):
            modif[i] = reference_data_set if use_reference else ""
            noreference = False
        elif Find("SPOT_RANGE", string):
            modif[i] = "SPOT_RANGE= %s %s\n" % (data_range_start, data_range_stop)
    if use_reference and noreference:
        modif.append("REFERENCE_DATA_SET= ../reference.HKL\n")

    with open("XDS.INP", "w") as xds:
        xds.writelines(modif)

    # first pass: indexing only
    os.system("xds_par")
    os.system("xscale_par")
    os.system("cp XDS_ASCII.HKL XDS_ASCII.HKL_old")
    os.system("cp GXPARM.XDS XPARM.XDS")
    os.system("mv CORRECT.LP CORRECT.LP.old")
    os.system("mv XSCALE.LP XSCALE.LP.old")

    # second pass: integration with the refined geometry
    os.system("egrep -v 'JOB|REIDX' XDS.INP > XDS.INP.new")
    os.system(
        'echo "! JOB=XYCORR INIT COLSPOT IDXREF DEFPIX INTEGRATE CORRECT" > XDS.INP'
    )
    os.system('echo "JOB=DEFPIX INTEGRATE CORRECT" >> XDS.INP')
    os.system("cat XDS.INP.new >> XDS.INP")
    os.system("xds_par")
    os.system("xscale_par")
    os.system("ls -alt")
log.close()
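The script above unpacks each fin.csv row into five comma-separated fields. A minimal sketch of validating such a row before a run (the helper name `parse_miniset` and the sample row are ours, not part of the deposited scripts):

```python
# Hypothetical helper (not part of the deposited scripts): validate one
# fin.csv row of the form  name,data,inp,data_range_start,data_range_stop
def parse_miniset(line):
    name, data, inp, start, stop = line.strip().split(",")
    if not name:
        raise ValueError("empty dataset name in: %r" % line)
    # the two range fields must be integer image numbers
    return name, data, inp, int(start), int(stop)

row = parse_miniset("002_01_02,images,XDS.INP,1,100")
```

Rows that do not split into exactly five fields raise `ValueError`, which mirrors how `express.py` skips malformed lines.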
#!/bin/bash
cp XSCALE.INP XSCALE.INP.reject_backup
ls */XDS_ASCII.HKL > xscale.inp
sed -e 's/^/INPUT_FILE= /g' xscale.inp | sed -e 's/$/\nINCLUDE_RESOLUTION_RANGE= 30 2.5/' > XSCALE.INP.reject_0
echo "OUTPUT_FILE= scaled_nonmerged.HKL" > XSCALE.INP
echo "MAXIMUM_NUMBER_OF_PROCESSORS= 80" >> XSCALE.INP
echo "MERGE= FALSE" >> XSCALE.INP
echo "!REFERENCE_DATA_SET= reference.HKL" >> XSCALE.INP
echo "" >> XSCALE.INP
cat XSCALE.INP.reject_0 >> XSCALE.INP
cp XSCALE.INP XSCALE.INP.reject_0
for i in `seq 1 1 5`; do
  xscale_par
  cp XSCALE.LP{,_$i}
  ./xdscc12 scaled_nonmerged.HKL -dmin 30.0 -dmax 10.0 -nbin 7 > XDSCC.LP
  python xdscc.py XDSCC.LP 1.0 |& tee log.xdscc_"$i"
  cp good.xdscc xscale.inp_"$i"
  sed -e 's/^/INPUT_FILE= /g' xscale.inp_"$i" | sed -e 's/$/\nINCLUDE_RESOLUTION_RANGE= 30 2.5/' > XSCALE.INP.reject_"$i"
  echo "OUTPUT_FILE= scaled_nonmerged.HKL" > XSCALE.INP
  echo "MAXIMUM_NUMBER_OF_PROCESSORS= 80" >> XSCALE.INP
  echo "MERGE= FALSE" >> XSCALE.INP
  echo "!REFERENCE_DATA_SET= reference.HKL" >> XSCALE.INP
  echo "" >> XSCALE.INP
  cat XSCALE.INP.reject_"$i" >> XSCALE.INP
  cp scaled_nonmerged.HKL reference.HKL
done
for i in `seq 6 1 9`; do
  xscale_par
  cp XSCALE.LP{,_$i}
  ./xdscc12 scaled_nonmerged.HKL -dmin 10.0 -dmax 2.5 -nbin 15 > XDSCC.LP
  python xdscc.py XDSCC.LP 1.0 |& tee log.xdscc_"$i"
  cp good.xdscc xscale.inp_"$i"
  sed -e 's/^/INPUT_FILE= /g' xscale.inp_"$i" | sed -e 's/$/\nINCLUDE_RESOLUTION_RANGE= 30 2.5/' > XSCALE.INP.reject_"$i"
  echo "OUTPUT_FILE= scaled_nonmerged.HKL" > XSCALE.INP
  echo "MAXIMUM_NUMBER_OF_PROCESSORS= 80" >> XSCALE.INP
  echo "MERGE= FALSE" >> XSCALE.INP
  echo "!REFERENCE_DATA_SET= reference.HKL" >> XSCALE.INP
  echo "" >> XSCALE.INP
  cat XSCALE.INP.reject_"$i" >> XSCALE.INP
  cp scaled_nonmerged.HKL reference.HKL
done
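The loop above alternates scaling with XSCALE and dataset rejection based on the per-dataset ΔCC1/2 written by xdscc12. The selection step it relies on can be pictured as keeping only datasets whose average ΔCC1/2 meets the cutoff; the function name and sample values below are illustrative, not part of the deposited scripts:

```python
# Illustrative only: keep datasets whose average delta-CC1/2 meets the cutoff,
# mirroring what xdscc.py writes to good.xdscc
def select_good(datasets, cutoff=1.0):
    """datasets: dict mapping HKL file name -> list of per-shell delta-CC1/2."""
    return sorted(
        name for name, ccs in datasets.items()
        if sum(ccs) / len(ccs) >= cutoff
    )

good = select_good({"a/XDS_ASCII.HKL": [2.0, 3.0], "b/XDS_ASCII.HKL": [-1.0, 0.5]})
```

Datasets that survive the filter are rescaled against the previous round's merged reflections, which is why the loop copies `scaled_nonmerged.HKL` to `reference.HKL` at each iteration.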
#!/bin/bash
time=$(date "+%Y_%m_%d_%H_%M_%S")
PROJECT_NAME="protein"
NPROC=`nproc`
# PEAK FINDING PARAMETERS
SNR='4.4'
THRESHOLD='20'
HIGHRES='3.0'
LST='c1_events.lst'
CELL='c1_v1.pdb'
# shuf "$LST" | head -n 1000 > input.lst  # optionally take a random 1000-event subset; the list must contain events
shuf "$LST" > input.lst  # randomize the order of the full event list
GEOM="initial_v1.geom"
ln -f -s "streams/"$PROJECT_NAME"_${time}.stream" laststream
indexamajig -i input.lst \
--temp-dir=scratch \
-o "streams/"$PROJECT_NAME"_${time}.stream" \
\
-g "$GEOM" \
--peaks=peakfinder8 \
-j "$NPROC" \
--min-snr="$SNR" \
--threshold="$THRESHOLD" \
--highres="$HIGHRES" \
--max-res=300 \
--min-res=80 \
\
-p "$CELL" \
--check-peaks \
\
--multi \
--indexing=dirax,xds,asdf,taketwo,xgandalf |& tee logs/log.indexamajig_${time}
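A quick way to gauge the indexing rate of the resulting stream is to count chunk and crystal records. The marker strings below are assumed from the output of recent CrystFEL versions, so treat this as a sketch rather than a guaranteed parser:

```python
# Sketch: count processed images ("chunks") and indexed lattices ("crystals")
# in a CrystFEL stream. Marker strings assumed from recent CrystFEL output.
def stream_stats(path):
    chunks = crystals = 0
    with open(path) as fh:
        for line in fh:
            if line.startswith("----- Begin chunk"):
                chunks += 1
            elif line.startswith("--- Begin crystal"):
                crystals += 1
    return chunks, crystals
```

With `--multi` enabled, one chunk may contain several crystals, so the crystal count can exceed the number of indexed images.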
#!/usr/bin/env python
from __future__ import print_function
import sys
import re

# USAGE EXAMPLE:
# xscale_par; grep 'Nano' -A 25 XSCALE.LP; xdscc12 scaled_nonmerged.HKL -dmin 5.0 -dmax 2.5 -nbin 10 > XDSCC.LP; python xdscc.py XDSCC.LP

fin = open(sys.argv[1])
fin_re = open(sys.argv[1])
try:
    cutoff = float(sys.argv[2])
except IndexError:
    cutoff = 1.0


def fill(string, N=15):
    "Pads string with spaces up to length N (or truncates it to N)"
    if len(string) > N:
        return string[:N]
    else:
        return string + " " * (N - len(string))


def get_rejected_crystals(xdscclp, rejection_func=None, mode="noano"):
    """
    Parses an XDSCC12.LP file using the rejection criterion 'rejection_func'.
    Returns a set of numbers -- the bad crystals, numbered as in the
    initial xdscclp file.
    """
    fin = open(xdscclp)
    bad_crystals = set()
    if mode == "noano":
        a = re.compile(r"^a\s+")
        b = re.compile(r"^b\s+")
        c = re.compile(r"^c\s+")
    elif mode == "ano":
        a = re.compile(r"^d\s+")
        b = re.compile(r"^e\s+")
        c = re.compile(r"^f\s+")
    else:
        print("Wrong mode given to get_rejected_crystals: %s" % mode)
        sys.exit(1)
    while True:
        fline = fin.readline()
        if not fline:
            break
        if a.match(fline):
            crystal_number = int(fline.split()[1])
            if crystal_number % 100 == 0:
                print("Working with crystal number %d" % crystal_number, end="\r")
        elif b.match(fline):
            try:
                CC = [float(i) for i in fline.split()[1:]]
            except ValueError:
                # fused columns such as '0.123-100': re-split before parsing
                CC = [float(i) for i in fline.replace("-100", " -100").split()[1:]]
            CC = [i for i in CC if i != 0.0]
        elif c.match(fline):
            Nref = [int(i) for i in fline.split()[1:]]
            CCaverage = sum(CC) / len(CC)
            if CCaverage < 0 and sum(Nref) / len(Nref) > 10:
                bad_crystals.add(crystal_number)
    return bad_crystals


# expressions to parse the HKL file
dataset = lambda string: " ISET=" in string and "INPUT_FILE" in string
reflection_file = lambda string: "reflection file is" in string

getnamesfrom = [i for i in fin_re.readlines() if reflection_file(i)][0].split()[-1]
getnamesfrom = open(getnamesfrom)
fin_re.close()

datasets_from_xscale = dict()
i = 1
for fline in getnamesfrom.readlines():
    if dataset(fline):
        datasets_from_xscale[i] = {"name": fline.split("INPUT_FILE=")[-1][:-1]}
        i += 1
getnamesfrom.close()

# expressions to parse XDSCC.LP
resolution_shells = lambda string: "resolution shells (for lines starting" in string
abcdef = re.compile(r"^[abcdef]\s+", re.M)
next_shells = False
for fline in fin.readlines():
    if resolution_shells(fline):
        next_shells = True
        continue
    elif next_shells:
        shells = [float(i) for i in fline.split() if i]
        next_shells = False
    if "overall" in fline:
        j = 0
    elif abcdef.match(fline):
        current_type, numbers = fline.split()[0], [float(i) for i in fline.split()[1:]]
        if current_type == "a" or current_type == "d":
            j += 1
        datasets_from_xscale[j][current_type] = numbers

iterxds = True
try:
    fin = open("iterxds.log")
except IOError:
    iterxds = False
if iterxds:
    i = 1
    for fline in fin.readlines():
        if "overall" in fline and len(fline.split()) > 3:
            rmeas_overall_low = fline.replace("%", "").split()[0]
        elif "XDS_ASCII" in fline:
            name = fline.split()[-1]
            rmeas_low, rmeas_overall = fline.replace("%", "").split()[:2]
            rmeas_low = float(rmeas_low)
            rmeas_overall = float(rmeas_overall)
            for key in datasets_from_xscale.keys():
                if datasets_from_xscale[key]["name"] == name:
                    datasets_from_xscale[key]["rmeas_low"] = rmeas_low
                    datasets_from_xscale[key]["rmeas_overall"] = rmeas_overall

fout = open("good.xdscc", "w")
padding_length = max([len(elem["name"]) for elem in datasets_from_xscale.values()])
toprint = ["--\t%s\t %8.2f" % ("-" * padding_length, 0)]
for key in datasets_from_xscale.keys():
    name = datasets_from_xscale[key]["name"]
    CCnoano = datasets_from_xscale[key]["b"]
    Nrefsnoano = datasets_from_xscale[key]["c"]
    # CCano = datasets_from_xscale[key]['e']
    # Nrefsano = datasets_from_xscale[key]['f']
    if iterxds:
        rmeas_low = datasets_from_xscale[key]["rmeas_low"]
        rmeas_overall = datasets_from_xscale[key]["rmeas_overall"]
        toprint.append(
            "%d\t%s\t %8.2f\t%2.2f\t%2.2f"
            % (
                key,
                fill(name, N=padding_length),
                sum(CCnoano) / len(CCnoano),
                rmeas_low,
                rmeas_overall,
            )
        )
    else:
        toprint.append(
            "%d\t%s\t %8.2f"
            % (key, fill(name, N=padding_length), sum(CCnoano) / len(CCnoano))
        )
    if sum(CCnoano) / len(CCnoano) >= cutoff:
        print("%s" % name, file=fout)
print(*sorted(toprint, key=lambda f: float(f.split()[2])), sep="\n")
https://zenodo.org/record/3921911/files/6RZ4_C1_Pran.tar.gz?download=1
https://zenodo.org/record/3921911/files/6RZ4_C1_Pran_hkls.tar.gz?download=1
https://zenodo.org/record/3842753/files/6RZ6_C2_L_C2221_hkls.tar.gz?download=1
https://zenodo.org/record/3842753/files/6RZ6_C2_L_C2221.tar.gz?download=1
https://zenodo.org/record/3921930/files/6RZ7_C2_L_F222_hkls.tar.gz?download=1
https://zenodo.org/record/3921930/files/6RZ7_C2_L_F222.tar.gz?download=1
https://zenodo.org/record/3921931/files/6RZ8_C2_S_I4_hkls.tar.gz?download=1
https://zenodo.org/record/3921931/files/6RZ8_C2_S_I4.tar.gz?download=1
https://zenodo.org/record/3921934/files/6RZ9_C2_O_C2221_hkls.tar.gz?download=1
https://zenodo.org/record/3921934/files/6RZ9_C2_O_C2221.tar.gz?download=1
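The archives above can be fetched with any HTTP client; stripping the `?download=1` suffix from each link gives the local file name. A small sketch (the two URLs are copied from the list above):

```python
# Sketch: recover local archive names from the Zenodo links listed above
urls = [
    "https://zenodo.org/record/3921911/files/6RZ4_C1_Pran.tar.gz?download=1",
    "https://zenodo.org/record/3842753/files/6RZ6_C2_L_C2221.tar.gz?download=1",
]
# last path component, with the query string removed
names = [u.rsplit("/", 1)[-1].split("?", 1)[0] for u in urls]
```

The `_hkls` archives contain the reduced reflection files only; the other archives contain the raw images and processing inputs.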