Chapter 7. Alignment Tools

Table of Contents

7.1. preprocessDB
7.2. indexDPgenomic
7.3. filterAlign
7.4. sortAlign
7.5. align2sam
7.6. align2txt
7.7. align2viz
7.8. splitAlign
7.9. catAlign
7.10. templates
7.11. indexDP

7.1. preprocessDB

7.1.1. Purpose

preprocessDB creates a database of index files from a set of references in a FASTA file. This database is used by indexDPgenomic.

7.1.2. Usage

$ preprocessDB --reference_file references.fasta ...

7.1.3. Parameters

Generic options:
  --help                Produce help message

Required options:
  --reference_file arg  Reference file name (fasta)
  --out_prefix arg      Prefix for output file name

Optional options:
  --seed_size arg (=18) Seed size
  --memory_usage arg    Memory usage in GB (default 4 GB)
  --max_db_dup arg      Mers with more than this many duplicates are ignored
                        (default 65000)
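
As an illustration of how the options above combine, the following sketch builds a full command line and runs it only if the tool and the reference file are present. The file name references.fasta, the prefix references, and the 8 GB memory setting are assumptions chosen for the example; the option names and the other values come from the parameter list above.

```shell
#!/bin/sh
# Hypothetical inputs: substitute your own reference file and output prefix.
REFERENCE=references.fasta
OUT_PREFIX=references

# Build the command from the documented options.
# --seed_size and --max_db_dup are shown at their documented defaults;
# --memory_usage 8 is an illustrative value (the documented default is 4 GB).
# This simple string form assumes the file names contain no spaces.
CMD="preprocessDB --reference_file $REFERENCE --out_prefix $OUT_PREFIX --seed_size 18 --memory_usage 8 --max_db_dup 65000"

# Show the command that would be run.
echo "$CMD"

# Run only when the tool is on PATH and the reference file exists.
if command -v preprocessDB >/dev/null 2>&1 && [ -f "$REFERENCE" ]; then
    $CMD
fi
```

Printing the command before running it makes it easy to record exactly which settings produced a given database.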

7.1.4. Output

Database files

preprocessDB creates four output files, where OUTPUT_PREFIX is the value given to --out_prefix:

OUTPUT_PREFIX_all_left_mers_DB
OUTPUT_PREFIX_all_right_mers_DB
OUTPUT_PREFIX_all_left_mers_DB_index
OUTPUT_PREFIX_all_right_mers_DB_index
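
Because a run on a large reference can take a long time, it may be worth confirming that all four files exist before handing the database to indexDPgenomic. The helper functions below are a minimal sketch and are not part of the tool; their names (expected_db_files, check_db_files) are hypothetical, and only the file name suffixes are taken from the list above.

```shell
# Print the four database file names documented for a given output prefix.
expected_db_files() {
    prefix=$1
    for suffix in all_left_mers_DB all_right_mers_DB \
                  all_left_mers_DB_index all_right_mers_DB_index; do
        echo "${prefix}_${suffix}"
    done
}

# Report each documented output file that is missing.
# Returns 0 if all four files exist, 1 otherwise.
# Assumes the prefix contains no whitespace.
check_db_files() {
    missing=0
    for f in $(expected_db_files "$1"); do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return $missing
}
```

For example, `check_db_files references && echo "database complete"` prints the message only when every expected file is present for the prefix references.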

7.1.5. Comments

  • preprocessDB can be I/O intensive on large references. It may take many hours to process a human-sized genome.
  • The more RAM available, the faster preprocessDB will run. Use the --memory_usage option to control how much memory it uses.
  • preprocessDB is typically executed in the reference directory so that the index files are available to all users.