10.5. Metadata in Output Text Files

10.5.1. Overview

The output of many of the tools in the Helicos Bioinformatics Suite includes a common header that details the execution environment of the tool. Each header line begins with a pound (#) sign, so parsers can easily separate the header from the report data.

Header fields

  • #PROGRAM: This is the name of the executable.
  • #VERSION: This is the version of the executable that created this report.
  • #DATETIME: The date and time at which the tool was started.
  • #COMMAND: The entire command line executed to generate the report.
  • #PARAMETER: The individual parameter values that make up the command line string in COMMAND, one per line. This list may also include parameters that were left at their default values.
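
The header fields listed above can be collected with a few lines of code. The following is a minimal sketch, not part of the Helicos tools: the function name parse_header is hypothetical, and it assumes each field occupies a single line of the form #KEY=value or #PARAMETER:name=value, as in the example output in this section.

```python
def parse_header(lines):
    """Collect '#'-prefixed metadata lines into a dictionary.

    Assumption: every header field sits on one line, written either as
    '#KEY=value' or as '#PARAMETER:name=value'. Lines that do not start
    with '#' are ignored here.
    """
    header = {"PARAMETER": {}}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line[1:].partition("=")
        if key.startswith("PARAMETER:"):
            header["PARAMETER"][key[len("PARAMETER:"):]] = value
        else:
            header[key] = value
    return header


sample = [
    "#PROGRAM=errorTool",
    "#VERSION=2.0",
    "#DATETIME=2008-06-18-T17:07:57",
    "#PARAMETER:flow_cells=1",
    "#PARAMETER:channels=1,2",
    "1 1 ALL 0 ALL 20 C C 23 0 0 0",
]
header = parse_header(sample)
print(header["PROGRAM"], header["PARAMETER"]["channels"])
```

All values are kept as strings; converting them (for example, splitting channels on commas) is left to the caller.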

10.5.2. Example output

Below is an example from the by_detailed_substitutions report of the errorTool executable.

#PROGRAM=errorTool
#VERSION=2.0
#DATETIME=2008-06-18-T17:07:57
#COMMAND=errorTool --read_file  /helicos/analysis/reads.aligned.sms --reference_file /helicos/analysis/references.qc_set.fasta --uniqueness_option 1 --percent_error 100 --config_file /helicos/applications/helisphere-0.11.0/config/tools/HPDP/hpdp_GL_noHP_config --first_read_number 1 --last_read_number 1 --min_normalized_score 4 --analysis_type by_detailed_substitutions --sample_size 100000 --by_camera --flow_cells 1 --channels 1,2
#PARAMETER:analysis_type=by_detailed_substitutions
#PARAMETER:read_file=/helicos/analysis/reads.aligned.sms
#PARAMETER:reference_file=/helicos/analysis/reference_data/references.qc_set.fasta
#PARAMETER:flow_cells=1
#PARAMETER:channels=1,2
#PARAMETER:first_read_number=1
#PARAMETER:last_read_number=1
#PARAMETER:sample_size=100000
#PARAMETER:percent_error=1
#PARAMETER:config_file=/helicos/applications/helisphere-0.11.0/config/tools/HPDP/hpdp_GL_noHP_config
#PARAMETER:separate_pass_accounting=0
#PARAMETER:by_camera=1
#PARAMETER:by_reference_accounting=0
#PARAMETER:clip_edges=0
#PARAMETER:no_impossimers=0
#PARAMETER:uniqueness_option=1
#PARAMETER:min_normalized_score=4
FlowCell Channel Position Camera ReferenceContext ReadLength ReferenceValue ReadValue NumberOfBases NumberOfErrors PercentErrorRate PlusMinus
1 1 ALL 0 ALL 20 C C 23 0 0 0
1 1 ALL 0 ALL 20 A C 29 0 0 0
1 1 ALL 0 ALL 20 G C 19 0 0 0
...
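
The data rows that follow the header can be read in a similar way. The sketch below is illustrative only: the name parse_report is hypothetical, and it assumes that columns are separated by whitespace, that the first line not starting with # holds the column names, and that those column names sit on a single line.

```python
def parse_report(lines):
    """Return (columns, rows) from the non-header part of a report.

    Assumptions: whitespace-separated columns, and the first non-'#',
    non-blank line is the column header. Lines whose field count differs
    from the header (such as a trailing '...') are skipped.
    """
    body = [ln.split() for ln in lines
            if ln.strip() and not ln.startswith("#")]
    if not body:
        return [], []
    columns, data = body[0], body[1:]
    rows = [dict(zip(columns, values))
            for values in data if len(values) == len(columns)]
    return columns, rows


sample = [
    "#PROGRAM=errorTool",
    "FlowCell Channel Position Camera ReferenceContext ReadLength "
    "ReferenceValue ReadValue NumberOfBases NumberOfErrors "
    "PercentErrorRate PlusMinus",
    "1 1 ALL 0 ALL 20 C C 23 0 0 0",
    "1 1 ALL 0 ALL 20 A C 29 0 0 0",
    "...",
]
columns, rows = parse_report(sample)
print(len(columns), len(rows))
print(rows[1]["ReferenceValue"], rows[1]["NumberOfBases"])
```

Each row is returned as a dictionary keyed by column name, with values left as strings.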

In addition to common header values, most reports include results broken out by the four main experimental fields:

FlowCell
The flow cell is the proprietary reaction vessel of the HeliScope single molecule sequencer. There are two flow cells per machine, designated "1" and "2" in tool parameters and outputs.
Channel
Each flow cell has 25 channels into which different samples may be placed. In tool parameters and outputs these are numbered 1-25.
Camera
Images can be taken from each channel by one of four cameras, designated "0", "1", "2", or "3".
Position
X, Y location of a strand within an image.
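
The ranges given above for FlowCell, Channel, and Camera lend themselves to a simple sanity check on parsed values. This is a sketch under stated assumptions: the function check_fields and the constant names are hypothetical, values are passed as strings as they would appear in a report, and only plain numeric values are handled, so an aggregate value such as ALL would be reported as out of range.

```python
# Ranges taken from the field descriptions above.
VALID_FLOW_CELLS = range(1, 3)   # "1" and "2"
VALID_CHANNELS = range(1, 26)    # 1 through 25
VALID_CAMERAS = range(0, 4)      # "0" through "3"


def check_fields(flow_cell, channel, camera):
    """Return a list of problems; an empty list means all values are in range."""
    problems = []
    for name, value, valid in (
        ("FlowCell", flow_cell, VALID_FLOW_CELLS),
        ("Channel", channel, VALID_CHANNELS),
        ("Camera", camera, VALID_CAMERAS),
    ):
        # Non-numeric values (including aggregates such as "ALL") are flagged.
        if not value.isdigit() or int(value) not in valid:
            problems.append(f"{name}={value!r} is outside the documented range")
    return problems


print(check_fields("1", "2", "0"))
print(check_fields("3", "26", "4"))
```

A caller that expects aggregate values in these columns would need to skip them before calling the check.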