Chapter 10. Miscellaneous

Table of Contents

10.1. palmer
10.2. bcgen
10.3. fasta2fastq
10.4. File Formats and Extensions
10.5. Metadata in Output Text Files

10.1. palmer

10.1.1. Purpose

Use precomputed edit-distance read variants to assign reads to references. This process is commonly referred to as barcode matching. Palmer works by precomputing the user-specified k-error variants of each reference and storing them in a hash map of <variant ⇒ reference>. Then, for each input read, palmer looks up the read sequence in this table and maps the read to the appropriate reference if it falls within the k-error budget.
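The precompute-then-lookup idea can be illustrated with a minimal Python sketch. It is not palmer's implementation: the function names, the ACGT alphabet, the reference names in the usage example, and the choice to drop variants shared by more than one reference are all assumptions made for illustration. The allow_sub, allow_ins and allow_del flags are modelled loosely on the options of the same names listed under Parameters.

```python
from typing import Dict, Set

ALPHABET = "ACGT"  # assumption: plain nucleotide alphabet


def one_edit_variants(seq: str, allow_sub: bool = True,
                      allow_ins: bool = True, allow_del: bool = True) -> Set[str]:
    """Return every sequence exactly one allowed edit operation away from seq."""
    out: Set[str] = set()
    for i in range(len(seq)):
        if allow_del:
            out.add(seq[:i] + seq[i + 1:])              # delete base i
        if allow_sub:
            for b in ALPHABET:
                if b != seq[i]:
                    out.add(seq[:i] + b + seq[i + 1:])  # substitute base i
    if allow_ins:
        for i in range(len(seq) + 1):
            for b in ALPHABET:
                out.add(seq[:i] + b + seq[i:])          # insert b before position i
    return out


def k_error_variants(seq: str, k: int, **flags: bool) -> Set[str]:
    """Return seq plus every sequence reachable with at most k allowed edits."""
    seen = {seq}
    frontier = {seq}
    for _ in range(k):
        frontier = {v for s in frontier
                    for v in one_edit_variants(s, **flags)} - seen
        seen |= frontier
    return seen


def build_table(references: Dict[str, str], k: int, **flags: bool) -> Dict[str, str]:
    """Build the <variant => reference name> map for all references."""
    table: Dict[str, str] = {}
    ambiguous: Set[str] = set()
    for name, seq in references.items():
        for variant in k_error_variants(seq, k, **flags):
            if variant in table and table[variant] != name:
                ambiguous.add(variant)   # claimed by two different references
            else:
                table[variant] = name
    for variant in ambiguous:            # assumption: ambiguous variants are dropped
        del table[variant]
    return table


# Hypothetical references; a read is assigned with a single dictionary lookup.
refs = {"bc1": "ACGT", "bc2": "TTGA"}
table = build_table(refs, k=1)
print(table.get("ACGA"))   # one substitution away from bc1
print(table.get("GGGG"))   # not within one edit of any reference
```

The table grows quickly with k and with reference length, which is the usual trade-off of this approach: more memory up front in exchange for constant-time assignment of each read.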

Palmer supports one-pass and two-pass fixed reference start oligo reads.

10.1.2. Usage

$ palmer --reference_file_pass1 ref.fasta --max_errors_pass1 1 --read_file reads.sms

10.1.3. Parameters

Generic options:
  --help                     produce help message
  --version                  produce version message
  --reference_file_pass1 arg first pass reference FASTA file name

Required options:
  --read_file arg         read file name
  --flow_cells arg (=1-2) flowcells to process
  --channels arg (=1-25)  channels to process
  --hits arg (=hits.csv)  filename for count output

Optional options:
  --reference_file_pass2 arg  second pass reference FASTA file name
  --max_errors_pass1 arg (=0) first pass error budget
  --max_errors_pass2 arg (=0) second pass error budget
  --spikes arg                filename for spike reads output
  --discards arg              filename for discarded reads output
  --discard_length arg (=6)   minimum discard read length
  --split_hits arg (=0)       split hits into distinct sms files after trimming
                              barcode prefix
  --spacer_char arg           nucleotide spacer found between a sample barcode 
                              and the sample read
  --allow_del arg (=1)        allow deletions in matches
  --allow_sub arg (=1)        allow substitutions in matches
  --allow_ins arg (=1)        allow insertions in matches
  --hits_pass1 arg            filename for count output from pass 1 of a 2 pass
                              read
  --hits_pass2 arg            filename for count output from pass 2 of a 2 pass
                              read
  --prefix arg                Prefix for output files

10.1.4. Comments

  • To process one-pass reference oligos, omit the reference_file_pass2 and max_errors_pass2 options.
  • To process just a single pass of two-pass reference oligos, omit the reference_file_pass# and max_errors_pass# options, where # is the pass you want to ignore. When palmer is run with just one reference, it treats two-pass reads as one-pass reads. This allows you to perform counting on just one of the passes.
  • Spike reference oligos are specified in the FASTA file record description metadata using the <key=value> pair of spike=1.
  • The max_errors_pass# option can be overridden in the FASTA file record description metadata using the <key=value> pair of edits=# for barcodes or spike_edits=# for spike reference oligos.
  • The exact flowcells and channels to process can be specified using the flow_cells and channels options respectively. Use commas and dashes to specify the range. For example, to process channels 1, 5 through 12, 17, and 20 through 25, the value of the channels option could be 1,5-12,17,20-25
  • Samples with prefix barcodes placed in the same channel can be split into distinct SMS read files using the split_hits option. In this usage mode, the reference file (specified with reference_file_pass1) contains the expected barcodes, max_errors_pass1 determines the maximum edit distance, and the spacer_char argument defines the expected nucleotide used to ligate the barcode oligo to the sample template. Palmer writes reads to SMS files named <Reference_Id>.sms, where Reference_Id is the name of the reference barcode to which the reads map. The barcode prefix together with the specified spacer_char nucleotides is removed from the reads placed in each SMS file.
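The comma-and-dash range syntax described above for the flow_cells and channels options can be sketched in Python as follows. This is an illustration of the syntax only, not palmer's parser: the function name parse_ranges is hypothetical, and the handling of stray whitespace, empty items and reversed ranges is an assumption.

```python
from typing import List


def parse_ranges(spec: str) -> List[int]:
    """Expand a spec such as '1,5-12,17,20-25' into a sorted list of integers."""
    values = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue                      # assumption: empty items are ignored
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"reversed range: {part!r}")
            values.update(range(low, high + 1))   # both ends inclusive
        else:
            values.add(int(part))
    return sorted(values)


print(parse_ranges("1,5-12,17,20-25"))
# [1, 5, 6, 7, 8, 9, 10, 11, 12, 17, 20, 21, 22, 23, 24, 25]
```

With this reading, the default channels value of 1-25 expands to every channel from 1 to 25 inclusive.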