indexDP version INDEX_DP_v_1.3.082208
Generic options:
--help Produce help message
Required options:
--reads_file arg Read file name (fasta)
--reference_file arg Reference file name (fasta)
--seed_size arg Seed size
--num_errors arg Number of errors in seed
--weight arg Weight of template
--out_prefix arg Prefix for output file name
--config_file arg Prefix for output file name
--template_repository arg Repository for template files
Optional options:
--read_file_type arg Type of read file (fasta or sms, default is
fasta)
--flow_cell arg Flow cell number (required with sms read file)
--channel arg Channel number (required with sms read file)
--pass arg Pass to be aligned 1, 2 (required with sms read
file)
--num_blocks arg Number of blocks of reads a channel is partitioned
into (required with sms reads file)
--block_index arg Index of block of reads to be aligned (required
with sms read file)
--terse_only Produce terse only output
--binary_output Produce binary output only (default is the
standard text output)
--best_only Print best only match
--multithread Run with multiple threads
--max_hit_duplication arg Maximum number of times a seed can align before
it is filtered (default 25)
--percent_error arg Percent error in read threshold (default 30%)
--read_step arg Step between kmers in read (default 1)
--min_norm_score arg Min normalized score of alignments to be output
(default 0)
--aligned_files_threshold arg Normalized score threshold for specifying read
as aligned (default 4.0)
--strands arg Strand option for reference: forward/both
(forward)
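
The options above can be assembled into a command line programmatically. The following is a minimal sketch, not an official wrapper: the executable name `indexDP` is assumed from the version banner, the function `build_indexdp_command` is hypothetical, all file names and numeric values are placeholders, and the space-separated `--option value` form is assumed to be accepted.

```python
import shlex


def build_indexdp_command(reads, reference, out_prefix, config_file,
                          template_repository, seed_size, num_errors, weight,
                          best_only=False, strands=None,
                          executable="indexDP"):
    """Assemble an argv list using only options listed in the help text.

    The executable name and the "--option value" form are assumptions.
    """
    cmd = [
        executable,
        "--reads_file", reads,
        "--reference_file", reference,
        "--seed_size", str(seed_size),
        "--num_errors", str(num_errors),
        "--weight", str(weight),
        "--out_prefix", out_prefix,
        "--config_file", config_file,
        "--template_repository", template_repository,
    ]
    # Optional switches are appended only when requested.
    if best_only:
        cmd.append("--best_only")
    if strands is not None:
        cmd += ["--strands", strands]
    return cmd


# Placeholder inputs; substitute real paths and template parameters.
cmd = build_indexdp_command(
    reads="reads.fasta",
    reference="reference.fasta",
    out_prefix="run1",
    config_file="indexdp.cfg",
    template_repository="templates",
    seed_size=20,
    num_errors=2,
    weight=16,
    best_only=True,
    strands="both",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the placeholders are replaced with real values.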
Unlike search and alignment tools with persistent indexes (e.g. BLAST), indexDP stores the read set, the reference set, and their indexes in RAM; the size of the indexes depends on the read-length and reference-length distributions. As a result, indexDP can be memory intensive for real-world tasks.
In tests within Helicos, aligning 10 million reads (a ~400 MB file) against RefSeq transcripts using the 20:16:2 template family consumed approximately 6 GB of RAM. Systems with 8 GB per core are recommended. If the read and reference sets are much larger, jobs can be broken up into smaller read sets and run in parallel, either manually or using distributed resource management (DRM) software such as Sun Grid Engine.
Due to caching and other issues, memory use does not necessarily scale linearly with input size. Even for small data sets, approximately 4 GB of RAM should be available.
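
For fasta read files, breaking a large read set into smaller ones can be done with a short script. The sketch below is one possible approach, not part of indexDP: the function `split_fasta` and the chunk naming scheme are hypothetical, and records are distributed round-robin so that each chunk receives a similar number of reads.

```python
def split_fasta(path, num_chunks, out_prefix):
    """Distribute FASTA records round-robin into num_chunks files.

    Returns the list of chunk file paths. Multi-line records are kept
    intact because a new record starts only at a '>' header line.
    """
    paths = [f"{out_prefix}.{i}.fasta" for i in range(num_chunks)]
    outs = [open(p, "w") for p in paths]
    try:
        record = -1  # index of the current record; -1 before any header
        with open(path) as src:
            for line in src:
                if line.startswith(">"):
                    record += 1
                if record >= 0:  # skip any text before the first header
                    outs[record % num_chunks].write(line)
    finally:
        for out in outs:
            out.close()
    return paths
```

Each chunk can then be aligned as a separate job, with its own `--reads_file` and `--out_prefix`, so that the per-job memory footprint stays within the available RAM.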