9.10. errorToolGeneralReport

9.10.1. Overview

To support instrument development and run diagnostics, errorTool can decompose per base error rates in several different ways. The most generic partitioning of the errors, by read length, is described here.

9.10.2. Report Details

The general report results from running errorTool with the following command line arguments:

$ errorTool --reference_file ref.fasta --read_file reads.sms \
             --analysis_type general

The report contains three sub-reports, in which the per base error rates are partitioned by the reference context of the aligned position: the whole read (irrespective of context), non-homopolymer regions, and homopolymer regions. Each sub-report is preceded by its header: Errors in the whole read, Errors in non-homopolymer regions, or Errors in homopolymer regions.

9.10.3. Columns

  • Read Length: Indicates the read length on which the per base error rate was computed. These rates (and the intermediate values used to compute them) are reported in the remaining column entries for each row.
  • Number of Reads: Indicates the number of reads used in the assessment.
  • Cumul. Length: Indicates the total number of reference positions spanned by the reads of a given row’s length. Since this represents the aligned total bases read, it is not Number of Reads x Read Length. To be precise, this is Number of Reads x Read Length + Cumul. Missed - Cumul. Inserts.
  • Cumul. Inserts: Indicates the total number of bases read and not matched with a base in the reference sequence.
  • Cumul. Missed: Indicates the total number of bases not read, but expected in the reference sequence.
  • Cumul. Substs: Indicates the total number of bases read and mismatched with a base in the reference sequence.
  • Cumul. Error: Indicates the total number of erroneous bases read. Specifically, Cumul. Error = Cumul. Inserts + Cumul. Missed + Cumul. Substs.
  • Percent Inserts: Per base insertion rate for reads of a specific length. The value reported is a percentage and is 100 * Cumul. Inserts / Cumul. Length.
  • Percent Missed: Per base deletion rate for reads of a specific length. The value reported is a percentage and is 100 * Cumul. Missed / Cumul. Length.
  • Percent Substs: Per base substitution rate for reads of a specific length. The value reported is a percentage and is 100 * Cumul. Substs / Cumul. Length.
  • Percent Error: Per base total error rate for reads of a specific length. The value reported is a percentage and is 100 * Cumul. Error / Cumul. Length.
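The column definitions above can be sketched as a small calculation. This is an illustrative sketch only, not errorTool's implementation: the class name GeneralRow, its field names, and the rounding to two decimal places are assumptions made here. The input counts are taken from the length-18 row of the "Errors in the whole read" sub-report in the example later in this section.

```python
from dataclasses import dataclass


@dataclass
class GeneralRow:
    """Raw counts for one read length (hypothetical container, not part of errorTool)."""
    read_length: int
    number_of_reads: int
    cumul_inserts: int
    cumul_missed: int
    cumul_substs: int

    @property
    def cumul_length(self) -> int:
        # Number of Reads x Read Length + Cumul. Missed - Cumul. Inserts
        return (self.number_of_reads * self.read_length
                + self.cumul_missed - self.cumul_inserts)

    @property
    def cumul_error(self) -> int:
        # Cumul. Inserts + Cumul. Missed + Cumul. Substs
        return self.cumul_inserts + self.cumul_missed + self.cumul_substs

    def percent(self, count: int) -> float:
        # 100 * count / Cumul. Length; two-decimal rounding is an assumption
        return round(100.0 * count / self.cumul_length, 2)


row = GeneralRow(read_length=18, number_of_reads=1102,
                 cumul_inserts=614, cumul_missed=922, cumul_substs=219)
print(row.cumul_length)                  # 20144
print(row.cumul_error)                   # 1755
print(row.percent(row.cumul_inserts))    # 3.05
print(row.percent(row.cumul_error))      # 8.71
```

The printed values match the corresponding entries of that example row.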

Generic Report Output Information

Reports are stored in a set of files with names that encode the analysis type, flow cell, channel, read pass number, and camera. Specifically, the file <TT>_fc<VV>_ch<XX>_pass<YY>_camera<ZZ> is associated with an error analysis of

  • Analysis type TT, where TT is one of: general, by_nuc, by_cycle or by_position,
  • Flow cell number VV, and
  • Channel XX.

If the --separate_pass_accounting flag is chosen, YY is the pass number; otherwise YY is set to ALL.

If the --by_camera flag is chosen, ZZ is the camera number; otherwise ZZ is set to ALL.
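The naming scheme can be illustrated with a short sketch that builds and parses such file names. The function names report_file_name and parse_report_file_name are hypothetical helpers, not part of errorTool. The sketch also assumes that the numeric fields are written without zero padding and that the name carries no extension; neither detail is specified here, so adjust to the actual output.

```python
import re

ANALYSIS_TYPES = ("general", "by_nuc", "by_cycle", "by_position")


def report_file_name(analysis_type, flow_cell, channel,
                     pass_number=None, camera=None):
    """Build <TT>_fc<VV>_ch<XX>_pass<YY>_camera<ZZ> (hypothetical helper)."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type!r}")
    # YY and ZZ fall back to ALL when passes or cameras are not separated.
    pass_field = "ALL" if pass_number is None else str(pass_number)
    camera_field = "ALL" if camera is None else str(camera)
    return (f"{analysis_type}_fc{flow_cell}_ch{channel}"
            f"_pass{pass_field}_camera{camera_field}")


# The analysis type may itself contain underscores, so it is matched explicitly.
_NAME_RE = re.compile(
    r"^(?P<analysis_type>general|by_nuc|by_cycle|by_position)"
    r"_fc(?P<flow_cell>[^_]+)"
    r"_ch(?P<channel>[^_]+)"
    r"_pass(?P<pass_number>[^_]+)"
    r"_camera(?P<camera>[^_]+)$"
)


def parse_report_file_name(name):
    """Split a report file name into its fields, returned as strings."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a report file name: {name!r}")
    return match.groupdict()


print(report_file_name("general", 1, 2))
# general_fc1_ch2_passALL_cameraALL
print(parse_report_file_name("by_cycle_fc3_ch12_pass2_cameraALL")["channel"])
# 12
```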

If the --by_reference_accounting flag is chosen, each file contains one set of tables per reference. Each set begins with the name of its reference.

Example Report

The example below shows a mock report generated for reads of length 14 to 18. The sub-reports have been formatted in HTML for easier viewing; the reports written by errorTool itself are in text format (ASCII).

Errors in the whole read

Read    Number  Cumul.  Cumul.  Cumul.  Cumul.  Cumul.  Percent Percent Percent Percent
Length  Reads   Length  Inserts Missed  Substs  Error   Inserts Missed  Substs  Error
18      1102    20144   614     922     219     1755    3.05    4.58    1.09    8.71
17      3650    62550   655     1155    321     2131    1.05    1.85    0.51    3.41
16      5768    94516   630     2858    423     3911    0.67    3.02    0.45    4.14
15      6031    95110   670     5315    425     6410    0.70    5.59    0.45    6.74
14      7121    107192  813     8311    552     9676    0.76    7.75    0.51    9.03

Errors in non-homopolymer regions

Read    Number  Cumul.  Cumul.  Cumul.  Cumul.  Cumul.  Percent Percent Percent Percent
Length  Reads   Length  Inserts Missed  Substs  Error   Inserts Missed  Substs  Error
18      1102    16639   614     333     84      1031    3.69    2.00    0.50    6.20
17      3650    54594   655     737     105     1497    1.20    1.35    0.19    2.74
16      5768    89030   628     2395    247     3270    0.71    2.69    0.28    3.67
15      6031    91858   670     4882    307     5859    0.73    5.31    0.33    6.38
14      7121    104848  813     8086    473     9372    0.78    7.71    0.45    8.94

Errors in homopolymer regions

Read    Number  Cumul.  Cumul.  Cumul.  Cumul.  Cumul.  Percent Percent Percent Percent
Length  Reads   Length  Inserts Missed  Substs  Error   Inserts Missed  Substs  Error
18      986     3505    0       589     135     724     0.00    16.80   3.85    20.66
17      3363    7956    0       418     216     634     0.00    5.25    2.71    7.97
16      2276    5486    2       463     176     641     0.04    8.44    3.21    11.68
15      1287    3252    0       433     118     551     0.00    13.31   3.63    16.94
14      957     2344    0       225     79      304     0.00    9.60    3.37    12.97
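A text report laid out like the example above could be read back and checked against the column definitions with a sketch such as the following. It assumes whitespace-separated columns in the order shown and the three sub-report headers exactly as written in this section; the real ASCII layout may differ. The names parse_general_report and inconsistent_rows are hypothetical.

```python
SUB_REPORT_HEADERS = (
    "Errors in the whole read",
    "Errors in non-homopolymer regions",
    "Errors in homopolymer regions",
)
COLUMNS = (
    "read_length", "number_of_reads", "cumul_length", "cumul_inserts",
    "cumul_missed", "cumul_substs", "cumul_error", "percent_inserts",
    "percent_missed", "percent_substs", "percent_error",
)


def parse_general_report(text):
    """Return {sub-report header: [row dict, ...]} for a general report."""
    reports = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in SUB_REPORT_HEADERS:
            current = reports.setdefault(stripped, [])
            continue
        fields = stripped.split()
        if current is None or len(fields) != len(COLUMNS):
            continue
        try:
            # Seven integer count columns followed by four percentages.
            values = [int(f) for f in fields[:7]] + [float(f) for f in fields[7:]]
        except ValueError:
            continue  # column-title lines are not numeric and are skipped
        current.append(dict(zip(COLUMNS, values)))
    return reports


def inconsistent_rows(rows, tolerance=0.006):
    """Rows whose Cumul. Error or Percent Error disagree with the definitions."""
    bad = []
    for row in rows:
        expected_error = (row["cumul_inserts"] + row["cumul_missed"]
                          + row["cumul_substs"])
        expected_percent = 100.0 * row["cumul_error"] / row["cumul_length"]
        if (row["cumul_error"] != expected_error
                or abs(row["percent_error"] - expected_percent) > tolerance):
            bad.append(row)
    return bad


SAMPLE = """\
Errors in the whole read

Read    Number  Cumul.  Cumul.  Cumul.  Cumul.  Cumul.  Percent Percent Percent Percent
Length  Reads   Length  Inserts Missed  Substs  Error   Inserts Missed  Substs  Error
18      1102    20144   614     922     219     1755    3.05    4.58    1.09    8.71
17      3650    62550   655     1155    321     2131    1.05    1.85    0.51    3.41
"""

rows = parse_general_report(SAMPLE)["Errors in the whole read"]
print(len(rows))                # 2
print(inconsistent_rows(rows))  # []
```

The tolerance allows for percentages that were rounded to two decimal places when the report was written.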