code: Quanto
Entirely based on FastQC.
This document is based on FastQC version 0.11.3
Basic quality check modules from FastQC
Input file name, e.g. "SRR000001.fastq.gz". Not suitable as an identifier, because the value can be something like "stdin" depending on the calculation workflow.
- type: String
Either 'Conventional base calls' or 'colorspace'. Most data are base calls; data from SOLiD are colorspace and must be converted to base calls.
- type: Categorical
e.g. 'Sanger / Illumina 1.9'
- type: Categorical
Total number of sequences in the input file.
- type: Integer
Number of sequences removed by the filtering option. Must be 0, since we do not filter any sequences.
- type: Integer
e.g. '36' or '50-150'.
- type: Integer or Range
Overall percent GC.
- type: Integer
Mean, median, lower quartile, upper quartile, 10th percentile, and 90th percentile of the Phred score at each base position.
- type: dataframe (integer, float)
(new module, not yet implemented)
Count of sequences for each Phred score.
- type: dataframe (integer, float)
Percentages of G, A, T, and C at each base position.
- type: dataframe (integer, float)
Count of sequences for each %GC.
- type: dataframe (integer, float)
Percentage of bases called as N at each base position.
- type: dataframe (integer, float)
Count of sequences for each sequence length.
- type: dataframe (integer, float)
Percentage of duplicated sequences among all sequences.
- type: float
Relative count of duplicated sequences for each duplication level.
- type: dataframe (integer, float)
List of sequences that each account for more than 0.1% of the total, with their score (count and percentage) and possible source, e.g. an adaptor sequence.
- type: dataframe (string, float)
Count, overall observed/expected ratio, maximum observed/expected ratio, and position of the maximum observed/expected ratio for each 5-mer.
- type: dataframe (string, float)
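To illustrate the scalar types listed for the basic statistics, the values could be held in a small typed record. This is a minimal sketch in Python and not part of Quanto: the class `BasicStatistics`, the helper function names, and the choice to store the sequence length as a (min, max) pair are assumptions made for illustration. The dictionary keys are the measure names FastQC writes in its Basic Statistics module.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class BasicStatistics:
    """Hypothetical container for the scalar values of the basic statistics."""
    filename: str
    file_type: str
    encoding: str
    total_sequences: int
    filtered_sequences: int
    sequence_length: Tuple[int, int]  # (min, max); both equal for fixed-length reads
    percent_gc: int


def parse_sequence_length(value: str) -> Tuple[int, int]:
    """Turn '36' into (36, 36) and '50-150' into (50, 150)."""
    low, _, high = value.strip().partition("-")
    return int(low), int(high or low)


def parse_basic_statistics(measures: Dict[str, str]) -> BasicStatistics:
    """Convert the raw string values of the module into typed fields."""
    return BasicStatistics(
        filename=measures["Filename"],
        file_type=measures["File type"],
        encoding=measures["Encoding"],
        total_sequences=int(measures["Total Sequences"]),
        filtered_sequences=int(measures["Filtered Sequences"]),
        sequence_length=parse_sequence_length(measures["Sequence length"]),
        percent_gc=int(measures["%GC"]),
    )
```

Storing the length as a pair keeps the "Integer or Range" type uniform, so downstream code does not need to branch on whether the reads have a fixed length.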
Custom quality indicator modules
Minimum length of sequences indicated by sequence length module.
- type: float
Maximum length of sequences indicated by sequence length module.
- type: float
Mean length of sequences indicated by sequence length module.
- type: float
Median length of sequences indicated by sequence length module.
- type: float
Quality indicator for the input file using mean value of per base sequence quality.
- type: float
Quality indicator for the input file using median value of per base sequence quality.
- type: float
Overall percentage of N calls, calculated from per base N content.
- type: float
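The custom indicators above are all derived from FastQC module tables, so they can be sketched in a few lines. The following Python is one possible reading of the descriptions, not Quanto's actual implementation. All function names are hypothetical. It assumes the length distribution has one row per exact length (no binned ranges), that the mean indicator averages the per base "Mean" column, that the median indicator takes the median of the per base "Median" column, and that overall N content is the plain average of the per base N percentages.

```python
from statistics import mean, median
from typing import Dict, Sequence, Tuple


def summarize_lengths(length_counts: Sequence[Tuple[int, float]]) -> Dict[str, float]:
    """Min, max, mean and median read length from (length, count) rows."""
    rows = sorted((length, count) for length, count in length_counts if count > 0)
    if not rows:
        raise ValueError("no sequences in the length distribution")
    total = sum(count for _, count in rows)
    mean_length = sum(length * count for length, count in rows) / total
    # Lower weighted median: the first length at which the cumulative
    # count reaches half of the total number of sequences.
    cumulative = 0.0
    median_length = float(rows[-1][0])
    for length, count in rows:
        cumulative += count
        if cumulative >= total / 2:
            median_length = float(length)
            break
    return {
        "min": float(rows[0][0]),
        "max": float(rows[-1][0]),
        "mean": mean_length,
        "median": median_length,
    }


def overall_mean_quality(per_base_means: Sequence[float]) -> float:
    """Assumed definition: average of the per base mean Phred scores."""
    return mean(per_base_means)


def overall_median_quality(per_base_medians: Sequence[float]) -> float:
    """Assumed definition: median of the per base median Phred scores."""
    return median(per_base_medians)


def overall_n_content(per_base_n_percent: Sequence[float]) -> float:
    """Assumed definition: average of the per base N percentages."""
    return mean(per_base_n_percent)
```

Averaging per base values weights every position equally, which matches the per read figure only when all reads have the same length. For variable-length data, a count-weighted average would be the more careful choice.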
Example plots are available on the official FastQC page. See "good Illumina data" and "bad Illumina data".
##FastQC 0.10.1
>>Basic Statistics pass
#Measure Value
Filename ERR055260.fastq
File type Conventional base calls
Encoding Sanger / Illumina 1.9
Total Sequences 33692804
Filtered Sequences 0
Sequence length 36
%GC 40
>>END_MODULE
>>Per base sequence quality pass
#Base Mean Median Lower Quartile Upper Quartile 10th Percentile 90th Percentile
1 38.09500847718106 39.0 38.0 40.0 35.0 40.0
2 37.703108058326045 39.0 38.0 40.0 33.0 40.0
3 37.38177641730264 39.0 38.0 40.0 33.0 40.0
4 37.75079236504032 39.0 38.0 40.0 33.0 40.0
5 37.715360496561814 39.0 38.0 40.0 33.0 40.0
6 37.88910848737908 39.0 38.0 40.0 35.0 40.0
7 37.7323402647046 39.0 38.0 40.0 33.0 40.0
8 37.696287788929645 39.0 38.0 40.0 33.0 40.0
9 37.65292689798095 39.0 38.0 40.0 33.0 40.0
10 37.574305005899774 39.0 38.0 40.0 33.0 40.0
11 37.67899068299569 39.0 38.0 40.0 33.0 40.0
12 37.39158314042369 39.0 37.0 40.0 33.0 40.0
13 37.38735989441544 39.0 37.0 40.0 33.0 40.0
14 37.2906411410579 39.0 37.0 40.0 33.0 40.0
15 37.17269708392332 39.0 36.0 40.0 32.0 40.0
16 37.22397919152113 39.0 37.0 40.0 33.0 40.0
17 37.10818915516797 39.0 36.0 40.0 32.0 40.0
18 37.02408362925211 39.0 36.0 40.0 32.0 40.0
19 37.07573323371958 39.0 36.0 40.0 32.0 40.0
20 36.95778353739867 39.0 36.0 40.0 31.0 40.0
21 37.08180610316672 39.0 36.0 40.0 33.0 40.0
22 36.990251004339086 39.0 36.0 40.0 32.0 40.0
23 37.02335727237187 39.0 36.0 40.0 32.0 40.0
24 36.93700628181614 39.0 36.0 40.0 32.0 40.0
25 37.028989513606525 39.0 36.0 40.0 32.0 40.0
26 36.98949529400996 39.0 36.0 40.0 33.0 40.0
27 36.79439235748975 39.0 36.0 40.0 32.0 40.0
28 36.543570342201264 38.0 36.0 40.0 31.0 40.0
29 36.43908939724933 38.0 36.0 40.0 31.0 40.0
30 36.523010106252954 38.0 36.0 40.0 31.0 40.0
31 36.429458498022306 38.0 36.0 40.0 31.0 40.0
32 36.27531036004009 38.0 36.0 40.0 31.0 40.0
33 36.12885104487 38.0 35.0 40.0 30.0 40.0
34 35.739400080800635 38.0 35.0 40.0 29.0 40.0
35 35.66179745681006 38.0 35.0 40.0 29.0 40.0
36 35.6744608136503 38.0 35.0 40.0 29.0 40.0
>>END_MODULE
>>Per sequence quality scores pass
#Quality Count
2 50286.0
3 966.0
4 1304.0
5 1936.0
6 2218.0
7 3957.0
8 4201.0
9 5279.0
10 6306.0
11 6334.0
12 8558.0
13 10742.0
14 12620.0
15 14313.0
16 17574.0
17 22568.0
18 27724.0
19 35020.0
20 43862.0
21 54141.0
22 65589.0
23 80119.0
24 101310.0
25 131505.0
26 173663.0
27 230580.0
28 304747.0
29 395319.0
30 505505.0
31 636941.0
32 798371.0
33 1016143.0
34 1363547.0
35 1939626.0
36 2835615.0
37 4343740.0
38 5375975.0
39 1.2696099E7
40 368501.0
>>END_MODULE
>>Per base sequence content warn
#Base G A T C
1 21.568317674005407 27.723905080740685 28.783710017130065 21.924067228123846
2 20.859004009526448 30.14009743146298 29.378557538203598 19.622341020806967
3 19.871759020533563 30.333583765680793 30.451993479885303 19.342663733900338
4 20.112516407814205 29.907275072849743 30.627476606398616 19.352731912937433
5 20.77839328134104 30.234710295214462 30.45225092295607 18.53464550048843
6 19.751613744030394 30.1310583998951 30.18988111572674 19.927446740347765
7 19.529743330257617 30.02529454002178 30.939361132257005 19.505600997463603
8 19.2779686510499 30.019016227293406 30.809971494505817 19.893043627150877
9 19.4511712664278 29.394952282485644 31.11639079655591 20.037485654530645
10 19.733634265818598 29.61723742162058 30.385820896474836 20.263307416085986
11 19.86803428323437 29.365699141470742 30.45093539973882 20.315331175556068
12 19.64830531765774 29.320177685419118 30.60173323656885 20.42978376035429
13 19.906591923901615 29.467431680663918 30.411805440710722 20.214170954723745
14 19.900890409714787 29.49094708769267 30.55503186971319 20.053130632879355
15 19.683514616355467 29.496283538763947 30.76681299662682 20.05338884825377
16 19.587630640655497 29.375961110271497 30.722661135594414 20.31374711347859
17 19.566697387370905 29.37546248747952 30.63057619069045 20.42726393445912
18 19.52858835969841 29.697801346542725 30.523075491134545 20.250534802624323
19 19.756595503300943 29.432590413074557 30.59961705769576 20.211197025928744
20 19.852909837958276 29.28972904718764 30.29984681595512 20.55751429889896
21 19.736086672988094 29.505445732566514 30.431483233036943 20.32698436140845
22 20.00252041949373 29.274455756190548 30.51929130030258 20.20373252401314
23 19.962701827963027 29.42996077144544 30.442031479481496 20.16530592111004
24 19.800753300318966 29.59609416895074 30.577903222302304 20.025249308427995
25 19.891263421783677 29.602993541469186 30.486007070590333 20.01973596615681
26 19.744303667145363 29.335705260385424 30.677747191399167 20.242243881070042
27 19.94891082123505 29.38188304603278 30.572581372777712 20.09662475995446
28 19.752763824584026 29.42595398115277 30.428625055961504 20.392657138301697
29 19.730910493528526 29.43132308014495 30.505187398472387 20.332579027854138
30 20.202162847086 29.29213026470447 30.290274002697696 20.21543288551183
31 19.92989728726722 29.482277901839876 30.353785993200173 20.234038817692728
32 20.130666577989917 29.32103200735859 30.49401461622464 20.05428679842686
33 19.89157010642012 29.65369525514942 30.408405906898427 20.046328731532036
34 19.98428803966568 29.35088750701782 30.687609140515583 19.977215312800915
35 20.029303586605614 29.357663434601644 30.47588440546533 20.13714857332741
36 20.054531525485384 29.322394776047727 30.429153358681578 20.193920339785315
>>END_MODULE
>>Per base GC content pass
#Base %GC
1 43.49238490212925
2 40.481345030333415
3 39.214422754433905
4 39.465248320751634
5 39.31303878182947
6 39.67906048437816
7 39.03534432772122
8 39.17101227820078
9 39.48865692095844
10 39.99694168190459
11 40.18336545879044
12 40.07808907801203
13 40.12076287862536
14 39.95402104259414
15 39.736903464609235
16 39.90137775413409
17 39.99396132183002
18 39.77912316232273
19 39.96779252922968
20 40.41042413685724
21 40.06307103439654
22 40.20625294350687
23 40.12800774907306
24 39.82600260874696
25 39.910999387940485
26 39.986547548215405
27 40.045535581189505
28 40.14542096288572
29 40.063489521382664
30 40.41759573259783
31 40.16393610495995
32 40.18495337641678
33 39.937898837952154
34 39.9615033524666
35 40.16645215993302
36 40.248451865270695
>>END_MODULE
>>Per sequence GC content pass
#GC Content Count
0 2030.0
1 2569.0
2 3108.0
3 3108.0
4 6722.5
5 10337.0
6 10337.0
7 25111.0
8 39885.0
9 39885.0
10 68722.5
11 97560.0
12 97560.0
13 163166.5
14 228773.0
15 382853.0
16 536933.0
17 536933.0
18 1078223.0
19 1619513.0
20 1619513.0
21 1577108.5
22 1534704.0
23 1534704.0
24 1513886.5
25 1493069.0
26 1572658.5
27 1652248.0
28 1652248.0
29 1806665.5
30 1961083.0
31 1961083.0
32 2127069.0
33 2293055.0
34 2293055.0
35 2425408.0
36 2557761.0
37 2557761.0
38 2745179.0
39 2932597.0
40 2927845.0
41 2923093.0
42 2923093.0
43 2821301.0
44 2719509.0
45 2719509.0
46 2603949.0
47 2488389.0
48 2488389.0
49 2340317.0
50 2192245.0
51 2004598.0
52 1816951.0
53 1816951.0
54 1624925.5
55 1432900.0
56 1432900.0
57 1286259.0
58 1139618.0
59 1139618.0
60 952493.0
61 765368.0
62 765368.0
63 635691.5
64 506015.0
65 411790.5
66 317566.0
67 317566.0
68 254479.5
69 191393.0
70 191393.0
71 151128.0
72 110863.0
73 110863.0
74 86738.5
75 62614.0
76 48186.5
77 33759.0
78 33759.0
79 25211.0
80 16663.0
81 16663.0
82 12244.0
83 7825.0
84 7825.0
85 5538.0
86 3251.0
87 3251.0
88 2228.5
89 1206.0
90 817.5
91 429.0
92 429.0
93 292.5
94 156.0
95 156.0
96 144.0
97 132.0
98 132.0
99 167.5
100 203.0
>>END_MODULE
>>Per base N content pass
#Base N-Count
1 0.0
2 0.01642487220713361
3 0.5503192907304479
4 0.004570708926452069
5 7.330942239179619E-4
6 1.1278372675660952E-4
7 0.002813657183296469
8 2.967992809384461E-6
9 0.0029145689388155407
10 0.0032143362125633713
11 0.0012376530015133203
12 0.0
13 0.0
14 0.0
15 0.0
16 0.0
17 0.0
18 0.0
19 0.0
20 0.0
21 0.0
22 0.0
23 0.0
24 0.0
25 5.935985618768922E-6
26 0.02445032476370919
27 0.03007467113749274
28 0.0
29 0.0
30 0.0010744133969971747
31 0.006645335900211808
32 0.03687434266379254
33 0.022093738473057924
34 0.0
35 0.0
36 0.0
>>END_MODULE
>>Sequence Length Distribution pass
#Length Count
36 3.3692804E7
>>END_MODULE
>>Sequence Duplication Levels fail
#Total Duplicate Percentage 79.75348905003098
#Duplication Level Relative count
1 100.0
2 94.58492688413948
3 70.533183352081
4 43.69853768278965
5 25.700787401574804
6 14.77615298087739
7 9.275590551181102
8 6.717660292463442
9 4.893138357705287
10++ 79.76377952755905
>>END_MODULE
>>Overrepresented sequences warn
#Sequence Count Percentage Possible Source
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCG 66145 0.19631788437673514 Illumina Paired End PCR Primer 2 (97% over 36bp)
TTATTCTATGTTATTCTATGTTATTCTATGTTATTC 55521 0.16478592876983464 No Hit
AATAACATAGAATAACATAGAATAACATAGAATAAC 52868 0.1569118438465377 No Hit
ATAGAATAACATAGAATAACATAGAATAACATAGAA 50994 0.1513498253217512 No Hit
CTATGTTATTCTATGTTATTCTATGTTATTCTATGT 50545 0.15001719655033757 No Hit
CATAGAATAACATAGAATAACATAGAATAACATAGA 49336 0.14642889324379177 No Hit
GAATAACATAGAATAACATAGAATAACATAGAATAA 48688 0.14450563390331064 No Hit
TATGTTATTCTATGTTATTCTATGTTATTCTATGTT 48627 0.14432458634193818 No Hit
GTTATTCTATGTTATTCTATGTTATTCTATGTTATT 48349 0.1434994843409293 No Hit
TGTTATTCTATGTTATTCTATGTTATTCTATGTTAT 47439 0.14079861088438944 No Hit
AGAATAACATAGAATAACATAGAATAACATAGAATA 46916 0.13924635064508137 No Hit
TAGAATAACATAGAATAACATAGAATAACATAGAAT 45861 0.13611511823118078 No Hit
ATGTTATTCTATGTTATTCTATGTTATTCTATGTTA 44430 0.1318679205209516 No Hit
ACATAGAATAACATAGAATAACATAGAATAACATAG 41366 0.12277399055299762 No Hit
TCTATGTTATTCTATGTTATTCTATGTTATTCTATG 41338 0.12269088675433484 No Hit
TTCTATGTTATTCTATGTTATTCTATGTTATTCTAT 40405 0.11992174946317916 No Hit
AACATAGAATAACATAGAATAACATAGAATAACATA 38890 0.1154252403569617 No Hit
ATAACATAGAATAACATAGAATAACATAGAATAACA 38263 0.11356430886547762 No Hit
TAACATAGAATAACATAGAATAACATAGAATAACAT 37993 0.11276295080694383 No Hit
>>END_MODULE
>>Kmer Content warn
#Sequence Count Obs/Exp Overall Obs/Exp Max Max Obs/Exp Position
CTATG 3682525 3.1166635 3.6598775 6
CAGCA 1692370 2.2376845 5.340712 17
AGCAG 1409890 1.8827376 5.0392146 18
TGCCG 888770 1.6877342 6.104747 26
GAGCG 846420 1.6750124 6.33894 9
GCCGA 791525 1.5509424 6.0667715 27
GCAGG 780815 1.5451841 6.1764627 19
AGCGG 772975 1.5296693 6.2083983 10
GCGGT 619000 1.187152 5.674342 11
CCGAG 565840 1.1087272 5.6430373 28
>>END_MODULE
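The layout of the data above (a `>>Name status` line, one or more `#` header lines, data rows, then `>>END_MODULE`) can be read with a small parser. The sketch below is illustrative Python, not Quanto's own code. The names `Module` and `parse_fastqc_data` are hypothetical. It assumes the fields in the file are tab-separated, as in FastQC's `fastqc_data.txt`, and it stores any additional `#` line before the column header (such as the total duplicate percentage) in a separate `extra` mapping.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Module:
    """One FastQC module: status, column names, extra header values, rows."""
    status: str
    columns: List[str] = field(default_factory=list)
    extra: Dict[str, str] = field(default_factory=dict)
    rows: List[List[str]] = field(default_factory=list)


def parse_fastqc_data(text: str) -> Dict[str, Module]:
    """Split FastQC data text into modules keyed by module name."""
    modules: Dict[str, Module] = {}
    current: Optional[Module] = None
    for line in text.splitlines():
        if not line or line.startswith("##"):
            continue  # blank line or the version line
        if line.startswith(">>END_MODULE"):
            current = None
        elif line.startswith(">>"):
            # The status (pass/warn/fail) is the last word on the line.
            name, status = line[2:].rsplit(None, 1)
            current = modules[name] = Module(status=status)
        elif current is None:
            continue  # stray line outside any module
        elif line.startswith("#"):
            fields = line[1:].split("\t")
            if current.columns:
                # An earlier '#' line was a key/value pair, not the header.
                key, *value = current.columns
                current.extra[key] = "\t".join(value)
            current.columns = fields
        else:
            current.rows.append(line.split("\t"))
    return modules
```

Values are kept as strings because each module has its own column types. Converting them, for example with `float()`, is left to the caller.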