3.3. Analysis of Biological Sample Channels

The process for analyzing biological sample data varies with the protocol and the analysis pipeline. First, group your channels into compatible sets. Channels are compatible if they will be analyzed with the same pipeline and the same parameter settings. The parameter most likely to vary between analyses that use the same pipeline is the reference. For example, if you have DGE channels for human and rat, you will need separate DGE pipeline runs for them, because the reference file is a parameter. Similarly, to explore different normscore or minlength filtering parameters on the same datasets, you need separate runs, because each run uses a single set of parameters.
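
As a rough illustration of this grouping, the sketch below reads a small manifest and prints one line per compatible set. The manifest format (channel, pipeline, reference, separated by whitespace) and the function name group_channels are assumptions made for this example; they are not part of the analysis software.

```shell
#!/bin/sh
# group_channels [FILE...]
# Hypothetical helper: reads manifest lines of the form
#   <channel> <pipeline> <reference>
# from FILE or standard input and prints one line per compatible set:
#   <pipeline> <reference>: <channel> <channel> ...
# Sets are printed in the order in which they first appear.
group_channels() {
    awk '
        /^#/ || NF < 3 { next }            # skip comments and incomplete lines
        {
            key = $2 " " $3                # pipeline plus reference defines a set
            if (!(key in members)) order[++n] = key
            members[key] = members[key] " " $1
        }
        END {
            for (i = 1; i <= n; i++) print order[i] ":" members[order[i]]
        }
    ' "$@"
}
```

For example, `printf 'fc1.ch22 dge human\nfc1.ch24 dge rat\n' | group_channels` prints one line for the human set and one for the rat set. Each printed line corresponds to one separate pipeline run. If you also vary filtering parameters, the manifest would need a further column for them.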

Your site should establish a policy for where analysis results will be stored. This might be under the run directory you created earlier, or somewhere else. Alternatively, the data files may be distributed to sample owners to process themselves, in which case samples will also be grouped by owner.

The typical analysis involves the steps listed below, after one preliminary step.

Download and convert: To analyze channels with real samples, first follow the instructions in Section 3.2, “Full Run Oligo Analysis” for the download and sms conversion steps. If you have already done an oligo analysis, these steps are complete and there is no need to repeat them.

  1. Create an analysis directory: It is a good policy to create a subdirectory to group the results of each analysis run. You can move (mv), copy (cp) or symbolically link (ln -s) the sms files for those channels to an sms directory under your analysis directory; we recommend linking. For example, if you are analyzing a group of human DGE channels together, you might do:

              $ mkdir dge_human
              $ cd dge_human
              $ cp $HELICOS_ANALYSIS_HOME/sample/run.dge.conf .
              $ mkdir sms
              $ cd sms
              $ ln -s ../../sms/*fc1.ch22 .
              $ ln -s ../../sms/*fc1.ch23 .
              ...
              $ cd ..

    Run these commands under the run directory. They create a directory called dge_human, copy a configuration file template into it, create an sms subdirectory, and create symbolic links from there to the DGE channels in the run. (Substitute the appropriate flowcell and channel numbers.) The SMS files then appear to be in run/dge_human/sms without using extra disk space, while their original location in run/sms is preserved.

  2. Prepare references: Ensure that you have all the reference and supporting files you need for your analysis. See Section 2.10, “Installing References”.
  3. Edit the config file template as appropriate for the pipeline you wish to run. At a minimum, this typically involves specifying the channels to analyze and the reference to use. For details, see the documentation for the pipeline of interest in Chapter 4, Pipelines.
  4. Run the pipeline using the pypeline command as described in the documentation for the pipeline of interest in Chapter 4, Pipelines.
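
When a group contains many channels, the linking in the "Create an analysis directory" step can be scripted. The following is a minimal sketch, assuming the layout used in the example above (the original files in run/sms, the analysis directory directly under the run directory). The function name link_channels is hypothetical and is not supplied with the analysis software.

```shell
#!/bin/sh
# link_channels SUFFIX...
# Hypothetical helper. Run it from inside the analysis directory
# (for example dge_human), which is assumed to sit directly under the
# run directory, next to the sms directory holding the original files.
# For each suffix (for example fc1.ch22) it links every matching file
# from ../sms into ./sms using relative links, as in the manual example.
link_channels() {
    mkdir -p sms || return 1
    for suffix in "$@"; do
        found=0
        for f in ../sms/*"$suffix"; do
            [ -e "$f" ] || continue              # the pattern matched nothing
            found=1
            [ -L "sms/${f##*/}" ] && continue    # already linked on an earlier run
            # The link lives in ./sms, so its target needs one more "../".
            ln -s "../$f" sms/ || return 1
        done
        if [ "$found" -eq 0 ]; then
            echo "no sms file found for $suffix" >&2
            return 1
        fi
    done
}
```

From the analysis directory, `link_channels fc1.ch22 fc1.ch23` would then replace the individual ln -s commands. The function stops with an error when a suffix matches no file, which catches a mistyped flowcell or channel number before the pipeline is run.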