pkeller/aa_final_stage_aniso_stats.md

## aa_final_stage_aniso_stats.md

      
Archiving the final anisotropic data processing statistics in PDBx/mmCIF

In what follows, proposed new category and item names are shown with the prefix gphl.
Some of the changes suggested below also apply to isotropic data processing; however, they arose from considering anisotropic cases, where they are particularly important.
### Completeness

The mmCIF dictionary currently has one measure of completeness. For an entire data set it is _reflns.percent_possible_obs and for a resolution-defined shell it is _reflns_shell.percent_possible_obs. These items refer to data lying between two cut-off surfaces that are defined by the low and high resolution limits, and are therefore spherical. An anisotropic treatment of the data requires the completeness to be calculated with respect to a non-spherical cut-off surface. Unobserved data outside that cut-off surface are not taken into account when calculating anisotropic completeness, so the resulting values differ from the conventional isotropic completeness.
We also note that the current PDBx dictionary does not seem to have items for anomalous completeness.
In order to allow for all four possible types of completeness to be archived, we propose extending the reflns category with the following three items:

_reflns.gphl_percent_possible_obs_anomalous
_reflns.gphl_percent_possible_obs_aniso
_reflns.gphl_percent_possible_obs_aniso_anomalous

and the reflns_shell category with the following three items:

_reflns_shell.gphl_percent_possible_obs_anomalous
_reflns_shell.gphl_percent_possible_obs_aniso
_reflns_shell.gphl_percent_possible_obs_aniso_anomalous
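
The difference between the isotropic and anisotropic completeness can be illustrated with a minimal Python sketch. It assumes that the cut-off surface is a simple ellipsoid (the real STARANISO cut-off surface can have an arbitrary shape), and the function names, the toy reflections and the toy ellipsoid are all hypothetical, chosen only for illustration.

```python
import math

def inside_ellipsoid(s, axes, limits):
    """True if reciprocal-space vector s (1/Angstrom, orthonormal frame)
    lies inside an ellipsoid whose semi-axis lengths are 1/limit."""
    total = 0.0
    for axis, d_limit in zip(axes, limits):
        projection = sum(si * ai for si, ai in zip(s, axis))
        total += (projection * d_limit) ** 2
    return total <= 1.0

def completeness(reflections, d_min, axes=None, limits=None):
    """Percentage of possible reflections that were observed.

    reflections: iterable of (s_vector, observed) pairs.
    With axes/limits given, reflections outside the ellipsoid are
    excluded from both the possible and the observed counts.
    """
    possible = observed = 0
    for s, is_observed in reflections:
        if math.sqrt(sum(c * c for c in s)) > 1.0 / d_min:
            continue  # beyond the spherical high-resolution limit
        if axes is not None and not inside_ellipsoid(s, axes, limits):
            continue  # outside the anisotropic cut-off surface
        possible += 1
        observed += 1 if is_observed else 0
    return 100.0 * observed / possible if possible else 0.0

# Toy data (hypothetical): four possible reflections, two observed.
toy_axes = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
toy_limits = (2.0, 1.0, 1.0)  # diffraction limits in Angstrom along each axis
toy_reflections = [
    ((0.4, 0.0, 0.0), True),
    ((0.9, 0.0, 0.0), False),  # inside the sphere, outside the ellipsoid
    ((0.0, 0.9, 0.0), True),
    ((0.0, 0.0, 0.95), False),
]

iso = completeness(toy_reflections, d_min=1.0)
aniso = completeness(toy_reflections, d_min=1.0, axes=toy_axes, limits=toy_limits)
print(f"isotropic {iso:.1f}%  anisotropic {aniso:.1f}%")
```

In this toy case the unobserved reflection along the weakly diffracting direction counts against the isotropic completeness but not against the anisotropic one, which is why the two values need separate items.
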

### Data on the anisotropy

We suggest creating a new category to hold data that relates specifically to the degree of anisotropy, and linking it to the reflns category. The STARANISO output includes the following:
Diffraction limits & principal axes of ellipsoid fitted to diffraction cut-off surface:

                              1.887         0.9873   0.1573  -0.0204       0.950 _a_* + 0.305 _b_* + 0.065 _c_*
                              1.489        -0.1065   0.7526   0.6498      -0.075 _a_* + 0.679 _b_* + 0.730 _c_*
                              1.569         0.1176  -0.6394   0.7599       0.097 _a_* - 0.676 _b_* + 0.730 _c_*

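Column 1 gives the diffraction limit along each principal axis, columns 2-4 give the axis directions of the ellipsoid expressed as an orthonormal set, and the final column expresses each axis in terms of the reciprocal cell axes.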
    Fraction of data inside cut-off surface:        82.2%  (    45658 /    55540)
    
    Fraction of cut-off surface above threshold:    43.4%  (    1270 /    2927)

Beq:                               19.65    [ = equivalent overall isotropic B factor on Fs.]

                                               B11      B22      B33      B23      B31      B12
Delta-B tensor:                              12.10    -6.43    -5.67     2.44     0.86     6.20

This could be represented in PDBx/mmCIF as:
_gphl_reflns_aniso.reflns_pdbx_ordinal    1       # child of _reflns.pdbx_ordinal
_gphl_reflns_aniso.diffrn_limit_1          1.887
_gphl_reflns_aniso.diffrn_limit_2          1.489
_gphl_reflns_aniso.diffrn_limit_3          1.569

_gphl_reflns_aniso.axis_1_ortho[1]  0.9873
_gphl_reflns_aniso.axis_1_ortho[2]  0.1573
_gphl_reflns_aniso.axis_1_ortho[3] -0.0204
_gphl_reflns_aniso.axis_2_ortho[1] -0.1065
_gphl_reflns_aniso.axis_2_ortho[2]  0.7526
_gphl_reflns_aniso.axis_2_ortho[3]  0.6498
_gphl_reflns_aniso.axis_3_ortho[1]  0.1176
_gphl_reflns_aniso.axis_3_ortho[2] -0.6394
_gphl_reflns_aniso.axis_3_ortho[3]  0.7599

_gphl_reflns_aniso.axis_1_rcell[1]  0.868
_gphl_reflns_aniso.axis_1_rcell[2]  0.451
_gphl_reflns_aniso.axis_1_rcell[3]  0.210
_gphl_reflns_aniso.axis_2_rcell[1] -0.186
_gphl_reflns_aniso.axis_2_rcell[2]  0.897
_gphl_reflns_aniso.axis_2_rcell[3] -0.400
_gphl_reflns_aniso.axis_3_rcell[1] -0.151
_gphl_reflns_aniso.axis_3_rcell[2]  0.401
_gphl_reflns_aniso.axis_3_rcell[3]  0.904

_gphl_reflns_aniso.b[1][1]   31.75
_gphl_reflns_aniso.b[2][2]   13.22
_gphl_reflns_aniso.b[3][3]   13.98
_gphl_reflns_aniso.b[2][3]    2.44
_gphl_reflns_aniso.b[3][1]    0.86
_gphl_reflns_aniso.b[1][2]    6.20

_gphl_reflns_aniso.percent_data_inside_cutoff       82.2
_gphl_reflns_aniso.percent_cutoff_above_threshold   43.4

where we archive the absolute B tensor (Beq added to the diagonal elements of the delta-B tensor), rather than Beq and the delta-B tensor.
All these items could be incorporated directly into the reflns category, if that is thought to be a better solution.
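
As an illustration of how the archived tensor relates to the STARANISO output, the following Python sketch rebuilds the absolute B tensor from Beq and the delta-B tensor, and checks that the reported axis directions are orthonormal to within rounding. It assumes the usual decomposition B = Beq·I + delta-B, in which Beq contributes only to the diagonal elements; the helper names are hypothetical.

```python
# Values copied from the STARANISO output shown above.
beq = 19.65
# Delta-B tensor in the printed order: B11 B22 B33 B23 B31 B12
delta_b = {"11": 12.10, "22": -6.43, "33": -5.67,
           "23": 2.44, "31": 0.86, "12": 6.20}
axes = [
    (0.9873, 0.1573, -0.0204),
    (-0.1065, 0.7526, 0.6498),
    (0.1176, -0.6394, 0.7599),
]

def absolute_b(beq, delta_b):
    """Assumed decomposition: B = Beq * I + delta-B.
    Beq is added to diagonal elements only (key '11', '22', '33')."""
    return {
        key: round(value + (beq if key[0] == key[1] else 0.0), 2)
        for key, value in delta_b.items()
    }

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def max_orthonormality_error(axes):
    """Largest deviation of the Gram matrix of the axes from the identity."""
    return max(
        abs(dot(axes[i], axes[j]) - (1.0 if i == j else 0.0))
        for i in range(3)
        for j in range(3)
    )

abs_b = absolute_b(beq, delta_b)
for key, value in abs_b.items():
    print(f"_gphl_reflns_aniso.b[{key[0]}][{key[1]}]  {value:6.2f}")
print("orthonormality error:", max_orthonormality_error(axes))
```

Under this assumption the trace of the delta-B tensor is zero, so Beq can be recovered as one third of the trace of the absolute tensor; archiving the absolute tensor therefore loses no information.
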
### Reflection redundancy

There is currently no way of representing the individual redundancy of merged reflections. We propose extending the refln category with three new items to cater for this:

_refln.gphl_number_obs (the redundancy of _refln.intensity_meas)
_refln.gphl_number_obs_plus (the redundancy of _refln.pdbx_I_plus)
_refln.gphl_number_obs_minus (the redundancy of _refln.pdbx_I_minus)

For centric reflections, only the first of these three items would be populated.
Including these redundancies would allow better interpretation of individual σ(I) values, and improved visualisation of the effects of detector module gaps, shadowing and cusps (even for data from multi-orientation data collections).
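
A minimal Python sketch of how the three proposed items might be populated from unmerged observations is given below. The input layout (Miller indices already mapped to the asymmetric unit, a '+'/'-' Friedel flag per observation, and a pre-computed set of centric reflections) is an assumption made for illustration, as is the function name.

```python
from collections import defaultdict

def count_observations(observations, centric):
    """Count contributing observations per merged reflection.

    observations: iterable of (hkl, friedel) pairs, friedel being '+' or '-'.
    centric: set of hkl tuples that are centric.
    Returns {hkl: (number_obs, number_obs_plus, number_obs_minus)};
    the plus/minus counts are None (not populated) for centric reflections.
    """
    counts = defaultdict(lambda: [0, 0, 0])
    for hkl, friedel in observations:
        entry = counts[hkl]
        entry[0] += 1
        if friedel == "+":
            entry[1] += 1
        else:
            entry[2] += 1

    result = {}
    for hkl, (n_obs, n_plus, n_minus) in counts.items():
        if hkl in centric:
            result[hkl] = (n_obs, None, None)
        else:
            result[hkl] = (n_obs, n_plus, n_minus)
    return result

# Toy data (hypothetical): one acentric and one centric reflection.
toy_observations = [
    ((1, 2, 3), "+"), ((1, 2, 3), "+"), ((1, 2, 3), "-"),
    ((2, 0, 0), "+"), ((2, 0, 0), "+"),
]
toy_centric = {(2, 0, 0)}

redundancy = count_observations(toy_observations, toy_centric)
for hkl, (n_obs, n_plus, n_minus) in sorted(redundancy.items()):
    print(hkl, n_obs, n_plus, n_minus)
```

The None values would be written as the mmCIF "." (not applicable) marker for _refln.gphl_number_obs_plus and _refln.gphl_number_obs_minus.
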
### Reflection binning by statistical significance

STARANISO uses the local mean I/σ(I) as a measure of statistical significance, and calculates this value for each reflection. A cut-off surface of arbitrary shape is then defined based on a threshold value of this local mean I/σ(I). (For more details, see the STARANISO documentation.) Views of the reciprocal lattice, binned and coloured by statistical significance, are then produced by:

the WebGL viewer on the STARANISO server (in 3D)
autoPROC (as 2D plots of key projections)

An example of a 2D plot is:

[Figure: p0r plot from reprocessing JCSG images for 4IB2]

We propose to archive the measure of statistical significance for each reflection, with an associated status to aid interpretation.
To allow the 2D and 3D plots to be reproduced (or other equivalent views to be generated) the binning determined by STARANISO is also required. Our suggestion on how to do this is as follows:
_gphl_refln_signal.criterion 'local(mean(I/sigI))'

loop_
_gphl_refln_signal_bin.upper_threshold
# The first threshold defines the lower limit of statistical
# significance, i.e. the cut-off surface
  1.20
  7.22
 19.00
 36.81
 48.87
 53.84
 57.69
 
loop_
_refln.index_h
_refln.index_k
_refln.index_l
_refln.gphl_signal_status
_refln.gphl_signal
# ... other items omitted
    1  -37    0    -    .         # Unobservable (grey)
    2  -33    0    <    .         # Observable but unmeasured (blue)
    1  -32    0    o    3.38      # Observed, with associated signal (red, orange, etc.)
   19   18    0    x    0.23      # Observed, but with high individual I/sig(I) (pink)

In this example, the proposed item _refln.gphl_signal_status borrows some of its controlled vocabulary from _refln.status.
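
To show how the archived thresholds could be used to regenerate the binning, here is a minimal Python sketch that assigns a bin index to each _refln.gphl_signal value. The treatment of values that fall exactly on a threshold, the extra bin above the last threshold, and the helper names are assumptions, not part of the proposal.

```python
import bisect

# Values of _gphl_refln_signal_bin.upper_threshold from the example above.
UPPER_THRESHOLDS = [1.20, 7.22, 19.00, 36.81, 48.87, 53.84, 57.69]

def parse_signal(text):
    """Convert an mmCIF value to float; '.' and '?' mean no value."""
    return None if text in (".", "?") else float(text)

def signal_bin(signal, thresholds=UPPER_THRESHOLDS):
    """Return the index of the bin containing signal, or None if absent.

    Bin 0 holds values below the first threshold, i.e. outside the
    cut-off surface. A value equal to a threshold is placed in the
    higher bin (an assumption). Values above the last threshold land
    in an extra bin with index len(thresholds).
    """
    if signal is None:
        return None
    return bisect.bisect_right(thresholds, signal)

# The four example reflections from the loop above.
example_signals = [".", ".", "3.38", "0.23"]
example_bins = [signal_bin(parse_signal(s)) for s in example_signals]
print(example_bins)
```

A viewer would then map each bin index to a colour, and use _refln.gphl_signal_status to colour the reflections that have no signal value.
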
### Goodness of fit

The analysis implemented by STARANISO entails fitting an analytical function to the statistical significance: currently this fit is of an ellipsoid to the boundary between statistically significant and non-significant regions of reciprocal space. In the future, we would also like to archive a measure of the goodness of fit. We are currently considering how best to do this.

  
## staraniso_alldata.ismean-p0r.png

      