1.2. Release Notes

1.2.1. Current Release

The 2010-R1.0.9 beta release of the HeliSphere software, released 02/09/10, includes the following features and changes:

  • SQLite:

    • The pipelines now generate reports via sqlite database files. Note: you must ensure that pysqlite is installed on your system. See the instructions in the Prerequisites section of the installation documentation. Please download and run the small examples to confirm that pysqlite is working properly.
  • Affine Gaps:

    • The indexDPgenomic aligner now supports affine gaps with distinct penalty terms for gap-open and gap-extend. This means that a set of adjacent indels can be assigned a different score than an equal number of dispersed indels, which increases the sensitivity for indel detection and allows alignment to longer indels (on the order of 3-4bp). This in turn allows those indels to be detected by snpSniffer and the Resequencing pipeline that calls it. Note that there is a computational cost to this sensitivity: the dynamic programming phase of indexDPgenomic takes approximately twice as long when affine gaps are enabled, so the Resequencing pipeline will be slower. Other pipelines do not use affine gaps and so should be unaffected.
  • snpSniffer:

    • Longer Indels: now detects short indels up to 4bp long
    • Strand Confirmation: SNP Sniffer can now call SNPs on forward and reverse strands separately and produce an integrated report of agreement.
  • Resequencing pipeline now supports barcoded samples:

    • Takes sms files containing barcoded reads as input.
    • Analyzes each set of barcoded reads separately.
    • Consolidates the yield, strand length, error and SNP count data for the barcoded reads in a database.
  • Pypeline architecture changes:

    • Extensibility: Developer documentation is now provided to allow HeliSphere users to extend existing pipelines or develop their own using HeliSphere’s scons-based process management architecture.
    • Uniform Reporting: pipelines now generate reports via sqlite databases.
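
Since reports now depend on sqlite, a quick check along the following lines may help confirm that Python can create and read a sqlite database before running a pipeline. This is only a sketch: the `report` table and its columns are hypothetical and do not reflect the actual HeliSphere report schema. It assumes the standard-library `sqlite3` module (Python 2.5 and later); on Python 2.4 the separately installed pysqlite package is imported as `from pysqlite2 import dbapi2 as sqlite3` instead.

```python
import sqlite3


def check_sqlite():
    """Create a throwaway in-memory database and read one row back."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table; not the real HeliSphere report schema.
        conn.execute("CREATE TABLE report (metric TEXT, value REAL)")
        conn.execute("INSERT INTO report VALUES (?, ?)", ("yield", 1.0))
        row = conn.execute("SELECT metric, value FROM report").fetchone()
    finally:
        conn.close()
    return row


def list_tables(db_path):
    """Return the table names stored in a sqlite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


print(check_sqlite())
```

Calling `list_tables` on a report database produced by a pipeline run would be one way to see which tables it contains without assuming their names in advance.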

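The effect of affine gaps described above can be illustrated with a small scoring sketch. The penalty values and the convention used here (the first base of a gap costs gap-open, each further base costs gap-extend) are assumptions chosen for illustration; they are not the indexDPgenomic defaults or its exact scoring scheme.

```python
def affine_gap_penalty(gap_lengths, gap_open=4.0, gap_extend=1.0):
    """Total penalty for a list of gap run lengths under an affine model.

    Assumed convention: a gap of n bases costs gap_open + gap_extend * (n - 1).
    """
    return sum(gap_open + gap_extend * (n - 1) for n in gap_lengths if n > 0)


def linear_gap_penalty(gap_lengths, per_base=4.0):
    """Total penalty when every gapped base costs the same amount."""
    return sum(per_base * n for n in gap_lengths)


adjacent = [3]          # one 3bp indel
dispersed = [1, 1, 1]   # three separate 1bp indels

print(affine_gap_penalty(adjacent))    # 6.0
print(affine_gap_penalty(dispersed))   # 12.0
print(linear_gap_penalty(adjacent))    # 12.0
print(linear_gap_penalty(dispersed))   # 12.0
```

With a single per-base penalty the two cases score identically, whereas the affine model penalizes the single 3bp indel less than three dispersed 1bp indels, which is what makes longer indels easier to align.
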
1.2.2. 2009-R1

The 2009-R1 release of the HeliSphere software, released Oct 2009, included the following features and changes:

  • Installation Changes:

    • Supported LINUX platforms now include RedHat 5 and Ubuntu 9.04 (64-bit variants only).
    • The installation process now uses Python’s scons instead of LINUX make when compiling from source. scons 1.2.0 or higher and Python 2.4-2.6 must be installed before building from source. See http://www.scons.org.
    • A simpler installation process is now supported for RedHat and Ubuntu systems using the native automated package management capabilities on those systems. This process downloads and installs binary images directly, avoiding the need to compile source code. Source code install is available for unsupported LINUX platforms.
    • samtools must be installed for certain components of HeliSphere to work. See http://samtools.sourceforge.net.
    • Hardcoded pathnames have been eliminated, making it possible to install the system in any directory.
  • New Pypeline Architecture:

    • The HeliSphere analysis pipelines are based on a software framework written in the Python programming language, called pypeline (for Python pipeline).
    • All analysis pipelines are now invoked through a common command named pypeline.
    • pypeline uses simple configuration files in place of the XML used by the old pipeline script, analysis_controller.pl, which has been retired.
    • pypeline is based on scons. It constructs a dependency tree of the files needed for an analysis. If some of these files are already present, they are not rebuilt. If an analysis fails partway through and needs to be restarted, it should pick up where it left off.
    • pypeline supports efficient use of parallel computing resources on both SGE-enabled and non-SGE systems. (See http://gridengine.sunsource.net.)
  • New/revised analysis pypelines: The system includes the following analysis pipelines, invoked via the pypeline command:

    • The oligo pipeline is used to analyze control oligo channels to assess HeliScope performance.
    • The midrun pipeline is used to analyze control oligo channels midway through a run.
    • The basic pipeline performs the common core steps of most analysis pipelines, including read filtering, alignment, and alignment filtering. (Formerly named mini_pipeline.pl.)
    • The resequencing pipeline identifies differences between observed sequence data from a single channel and a reference sequence.
    • The DGE pipeline performs digital gene expression analysis.
  • New tools: The following tools have been added:

    • download_srf provides a command-line alternative to the web interface for downloading SRF files from the HeliScope.
    • align2viz allows sequence-level visualization of aligned reads in a region of interest using a Web browser. Differences from the reference are highlighted.
    • catAlign allows multiple alignment files to be combined into a single file.
    • align2sam converts Helicos binary alignment files to SAM or BAM format.
    • fasta2fastq converts fasta files produced by sms2txt into fastq files with a constant error rate. Useful for submitting Helicos data to NCBI’s SRA.
  • Deprecated Tools: The following tools have been removed from the public distribution: analysis_controller.pl, palmer, bcgen, simmer.
  • Improvements to the SNP Sniffer Polymorphism Detector:

    • The SNP Sniffer is embedded in a comprehensive resequencing pipeline.
    • It detects homozygous or heterozygous indels up to length 2.
    • It detects SNPs in homopolymeric contexts.
    • It detects homozygous or heterozygous homopolymer length polymorphisms.
    • It no longer misclassifies homozygous substitutions in homopolymers as heterozygous changes.
  • Documentation Changes:

    • The documentation is now versioned in tandem with the software as web (.html) pages.
    • It is available on the Web at open.helicosbio.com and installed locally in $HELICOS_ANALYSIS_HOME/docs.
  • Executable examples: Working code examples are now available as part of the software release distribution files. These examples can be used for tutorials, to test the installation, and as performance benchmarks. See Section 2.8, “Installing the Small Examples” for more information.

    • Fast tests can verify HeliSphere functionality.
    • Full scale tests can be obtained from Helicos to assess system performance.
  • Reference Repository: Commonly used references are now available for download from our ftp site at ftp://ftp.helicosbio.com/pub/distribution.
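
The restart behavior noted under New Pypeline Architecture (files already present are not rebuilt, so a failed analysis picks up where it left off) can be sketched as a walk over a dependency tree. This is a simplified illustration only: the file names and the `touch` action are hypothetical, and the sketch checks nothing but file existence, whereas scons itself decides what to rebuild using its own up-to-date checks.

```python
import os
import tempfile


def build(target, rules, built=None):
    """Build a target after its dependencies, skipping files that exist."""
    if built is None:
        built = []
    deps, action = rules[target]
    for dep in deps:
        build(dep, rules, built)
    if not os.path.exists(target):
        action(target)
        built.append(target)
    return built


def touch(path):
    """Stand-in for a real analysis step: just create the output file."""
    with open(path, "w") as handle:
        handle.write("placeholder\n")


# Hypothetical three-step analysis: filter reads, align, report.
workdir = tempfile.mkdtemp()
reads = os.path.join(workdir, "filtered_reads.txt")
aligned = os.path.join(workdir, "alignments.txt")
report = os.path.join(workdir, "report.txt")

rules = {
    reads: ([], touch),
    aligned: ([reads], touch),
    report: ([aligned], touch),
}

touch(reads)                    # simulate a run that failed after step one
first = build(report, rules)    # builds only the two missing files
second = build(report, rules)   # nothing left to do
```

In this sketch the first call creates only the alignment and report files, and the second call builds nothing, mirroring how a restarted analysis avoids repeating completed steps.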