16S rRNA RefSeq: V15.23    Genomic RefSeq: V10.1
Page Help::Blast Advanced Parameters
Full list of the BLAST Advanced options

See https://www.ncbi.nlm.nih.gov/books/NBK279684/ for a complete list from NCBI
option

type

default value

description and notes

db

string

none

BLAST database name.

query

string

stdin

Query file name.

query_loc

string

none

Location on the query sequence (Format: start-stop)

out

string

stdout

Output file name

evalue

real

10.0

Expect value (E) for saving hits

subject

string

none

File with subject sequence(s) to search.

subject_loc

string

none

Location on the subject sequence (Format: start-stop).

show_gis

flag

N/A

Show NCBI GIs in report.

num_descriptions

integer

500

Show one-line descriptions for this number of database sequences.

num_alignments

integer

250

Show alignments for this number of database sequences.

max_target_seqs

integer

500

Number of aligned sequences to keep. Use with report formats that do not have separate definition line and alignment sections such as tabular (all outfmt > 4). Not compatible with num_descriptions or num_alignments. Ties are broken by order of sequences in the database.

max_hsps

integer

none

Maximum number of HSPs (alignments) to keep for any single query-subject pair. The HSPs shown will be the best as judged by expect value. This number should be an integer that is one or greater. If this option is not set, BLAST shows all HSPs meeting the expect value criteria. Setting it to one will show only the best HSP for every query-subject pair.

html

flag

N/A

Produce HTML output

gilist

string

none

Restrict search of database to GIs listed in this file. Local searches only.

negative_gilist

string

none

Restrict search of database to everything except the GIs listed in this file. Local searches only.

entrez_query

string

none

Restrict search with the given Entrez query. Remote searches only.

culling_limit

integer

none

Delete a hit that is enveloped by at least this many higher-scoring hits.

best_hit_overhang

real

none

Best Hit algorithm overhang value (recommended value: 0.1)

best_hit_score_edge

real

none

Best Hit algorithm score edge value (recommended value: 0.1)

dbsize

integer

none

Effective size of the database

searchsp

integer

none

Effective length of the search space

import_search_strategy

string

none

Search strategy file to read.

export_search_strategy

string

none

Record search strategy to this file.

parse_deflines

flag

N/A

Parse query and subject bar delimited sequence identifiers (e.g., gi|129295).

num_threads

integer

1

Number of threads (CPUs) to use in the BLAST search.

remote

flag

N/A

Execute the search on NCBI servers.

outfmt

string

0

alignment view options:

0 = pairwise,

1 = query-anchored showing identities,

2 = query-anchored no identities,

3 = flat query-anchored, show identities,

4 = flat query-anchored, no identities,

5 = XML Blast output,

6 = tabular,

7 = tabular with comment lines,

8 = Text ASN.1,

9 = Binary ASN.1,

10 = Comma-separated values,

11 = BLAST archive format (ASN.1),

12 = Seqalign (JSON),

13 = Multiple-file BLAST JSON,

14 = Multiple-file BLAST XML2,

15 = Single-file BLAST JSON,

16 = Single-file BLAST XML2,

17 = Sequence Alignment/Map (SAM),

18 = Organism Report

Options 6, 7, and 10 can be additionally configured to produce a custom format specified by space delimited format specifiers.

The supported format specifiers are:

qseqid means Query Seq-id

qgi means Query GI

qacc means Query accession

sseqid means Subject Seq-id

sallseqid means All subject Seq-id(s), separated by a ';'

sgi means Subject GI

sallgi means All subject GIs

sacc means Subject accession

sallacc means All subject accessions

qstart means Start of alignment in query

qend means End of alignment in query

sstart means Start of alignment in subject

send means End of alignment in subject

qseq means Aligned part of query sequence

sseq means Aligned part of subject sequence

evalue means Expect value

bitscore means Bit score

score means Raw score

length means Alignment length

pident means Percentage of identical matches

nident means Number of identical matches

mismatch means Number of mismatches

positive means Number of positive-scoring matches

gapopen means Number of gap openings

gaps means Total number of gaps

ppos means Percentage of positive-scoring matches

frames means Query and subject frames separated by a '/'

qframe means Query frame

sframe means Subject frame

btop means Blast traceback operations (BTOP)

staxids means unique Subject Taxonomy ID(s), separated by a ';' (in numerical order)

sscinames means unique Subject Scientific Name(s), separated by a ';'

scomnames means unique Subject Common Name(s), separated by a ';'

sblastnames means unique Subject Blast Name(s), separated by a ';' (in alphabetical order)

sskingdoms means unique Subject Super Kingdom(s), separated by a ';' (in alphabetical order)

stitle means Subject Title

salltitles means All Subject Title(s), separated by a '<>'

sstrand means Subject Strand

qcovs means Query Coverage Per Subject (for all HSPs)

qcovhsp means Query Coverage Per HSP

qcovus is a measure of Query Coverage that counts a position in a subject sequence only once; a second alignment of the same position to the query does not count towards this measure.

When not provided, the default value is:

'qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore', which is equivalent to the keyword 'std'

mt_mode

integer

0

Values can be 0 or 1.

Value of 0 indicates that BLAST should multi-thread by having each thread work on a part of the database. This is appropriate for a small number of queries and a larger database (e.g., larger than Swissprot). Value 1 indicates that each thread will take a batch of queries and process them independently. This works well for large query sets (10k residues per thread for BLASTP and 2 million bases per thread for BLASTN) and smaller databases (e.g., Swissprot).
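The custom tabular formats described under outfmt above are easy to request and read back from a script. The format number and its specifiers must reach BLAST as a single argument (in a shell they are quoted together). Columns must then be parsed in the same order they were requested. The minimal sketch below assumes tab-separated outfmt 6 or 7 text. The helper names, file names, and the database name "my_db" are hypothetical, and only the specifier names listed above are used.

```python
from typing import Dict, Iterable, Iterator, List, Sequence

# Columns behind the 'std' keyword, in the order given above.
STD_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart",
              "send", "nident", "positive", "gaps", "score"}
FLOAT_FIELDS = {"pident", "ppos", "evalue", "bitscore"}


def expand_specifiers(specifiers: Sequence[str]) -> List[str]:
    """Replace the 'std' keyword by its twelve columns; empty means 'std'."""
    columns: List[str] = []
    for spec in specifiers or ["std"]:
        columns.extend(STD_COLUMNS if spec == "std" else [spec])
    return columns


def outfmt_arg(fmt: int, specifiers: Sequence[str]) -> str:
    """Build the single -outfmt value, e.g. '6 std staxids'."""
    if fmt not in (6, 7, 10):
        raise ValueError("only outfmt 6, 7 and 10 accept format specifiers")
    return " ".join([str(fmt), *specifiers])


def blast_argv(program: str, query: str, db: str, out: str,
               specifiers: Sequence[str]) -> List[str]:
    # The whole format string is ONE list element, so no shell quoting is
    # needed when this list is passed to subprocess.run().
    return [program, "-query", query, "-db", db, "-out", out,
            "-outfmt", outfmt_arg(6, specifiers)]


def parse_tabular(lines: Iterable[str],
                  specifiers: Sequence[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per hit from outfmt 6 or 7 text."""
    columns = expand_specifiers(specifiers)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):  # blank lines, outfmt 7 comments
            continue
        fields = line.split("\t")
        if len(fields) != len(columns):
            raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
        row: Dict[str, object] = {}
        for name, value in zip(columns, fields):
            if name in INT_FIELDS:
                row[name] = int(value)
            elif name in FLOAT_FIELDS:
                row[name] = float(value)
            else:
                row[name] = value  # IDs, or ';'-separated values such as staxids
        yield row


argv = blast_argv("blastn", "query.fasta", "my_db", "hits.tsv", ["std", "staxids"])
```

The same specifier list is passed to both functions, so the parser always expects the columns that were requested. Outfmt 10 output is comma-separated and would need a CSV reader instead of the tab split.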

Table C2:

blastn application options. The blastn application searches a nucleotide query against nucleotide subject sequences or a nucleotide database. An option of type “flag” takes no arguments, but if present the argument is true. Four different tasks are supported: 1.) “megablast”, for very similar sequences (e.g., sequencing errors), 2.) “dc-megablast”, typically used for inter-species comparisons, 3.) “blastn”, the traditional program used for inter-species comparisons, 4.) “blastn-short”, optimized for sequences shorter than 30 nucleotides.

option

task(s)

type

default value

description and notes

word_size

megablast

integer

28

Length of initial exact match.

word_size

dc-megablast

integer

11

Number of matching nucleotides in initial match. dc-megablast allows non-consecutive letters to match.

word_size

blastn

integer

11

Length of initial exact match.

word_size

blastn-short

integer

7

Length of initial exact match.

gapopen

megablast

integer

0

Cost to open a gap. See appendix “BLASTN reward/penalty values”.

gapextend

megablast

integer

none

Cost to extend a gap. This default is a function of reward/penalty value. See appendix “BLASTN reward/penalty values”.

gapopen

blastn, blastn-short, dc-megablast

integer

5

Cost to open a gap. See appendix “BLASTN reward/penalty values”.

gapextend

blastn, blastn-short, dc-megablast

integer

2

Cost to extend a gap. See appendix “BLASTN reward/penalty values”.

reward

megablast

integer

1

Reward for a nucleotide match.

penalty

megablast

integer

-2

Penalty for a nucleotide mismatch.

reward

blastn, dc-megablast

integer

2

Reward for a nucleotide match.

penalty

blastn, dc-megablast

integer

-3

Penalty for a nucleotide mismatch.

reward

blastn-short

integer

1

Reward for a nucleotide match.

penalty

blastn-short

integer

-3

Penalty for a nucleotide mismatch.

strand

all

string

both

Query strand(s) to search against database/subject. Choice of both, minus, or plus.

dust

all

string

20 64 1

Filter query sequence with dust.

filtering_db

all

string

none

Mask query using the sequences in this database.

window_masker_taxid

all

integer

none

Enable WindowMasker filtering using a Taxonomic ID.

window_masker_db

all

string

none

Enable WindowMasker filtering using this file.

soft_masking

all

boolean

true

Apply filtering locations as soft masks (i.e., only for finding initial matches).

lcase_masking

all

flag

N/A

Use lower case filtering in query and subject sequence(s).

db_soft_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).

db_hard_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).

perc_identity

all

integer

0

Percent identity cutoff.

template_type

dc-megablast

string

coding

Discontiguous MegaBLAST template type. Allowed values are coding, optimal and coding_and_optimal.

template_length

dc-megablast

integer

18

Discontiguous MegaBLAST template length.

use_index

megablast

boolean

false

Use MegaBLAST database index. Indices may be created with the makembindex application.

index_name

megablast

string

none

MegaBLAST database index name.

xdrop_ungap

all

real

20

Heuristic value (in bits) for ungapped extensions.

xdrop_gap

all

real

30

Heuristic value (in bits) for preliminary gapped extensions.

xdrop_gap_final

all

real

100

Heuristic value (in bits) for final gapped alignment.

no_greedy

megablast

flag

N/A

Use non-greedy dynamic programming extension.

min_raw_gapped_score

all

integer

none

Minimum raw gapped score to keep an alignment in the preliminary gapped and trace-back stages. Normally set based upon expect value.

ungapped

all

flag

N/A

Perform ungapped alignment.

window_size

dc-megablast

integer

40

Multiple hits window size, use 0 to specify 1-hit algorithm
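The word size and scoring defaults in the blastn table differ by task. This matters when a script overrides only some of them. The sketch below stores those defaults in a lookup table, with values copied from the rows above. `None` marks the megablast gap extension cost, which depends on the reward/penalty pair. The sketch emits only the options the caller actually changes. The helper names are hypothetical. Any reward/penalty/gap combination should still be checked against the appendix “BLASTN reward/penalty values”.

```python
from typing import List, NamedTuple, Optional


class BlastnDefaults(NamedTuple):
    word_size: int
    reward: int
    penalty: int
    gapopen: int
    gapextend: Optional[int]  # None: depends on reward/penalty (megablast)


# Values copied from the blastn table above.
BLASTN_DEFAULTS = {
    "megablast": BlastnDefaults(28, 1, -2, 0, None),
    "dc-megablast": BlastnDefaults(11, 2, -3, 5, 2),
    "blastn": BlastnDefaults(11, 2, -3, 5, 2),
    "blastn-short": BlastnDefaults(7, 1, -3, 5, 2),
}


def effective_settings(task: str, **overrides: int) -> BlastnDefaults:
    """Task defaults with overrides applied; unknown option names raise ValueError."""
    return BLASTN_DEFAULTS[task]._replace(**overrides)


def task_argv(task: str, **overrides: int) -> List[str]:
    """Pass -task plus only the options the caller changed; BLAST supplies the rest."""
    effective_settings(task, **overrides)  # validates the task and option names
    argv = ["-task", task]
    for name, value in overrides.items():
        argv += [f"-{name}", str(value)]
    return argv
```

For example, `task_argv("blastn", reward=1, penalty=-2)` yields `["-task", "blastn", "-reward", "1", "-penalty", "-2"]`. The defaults table then remains available for logging the effective settings of a run.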

Table C3:

blastp application options. The blastp application searches a protein sequence against protein subject sequences or a protein database. An option of type “flag” takes no arguments, but if present the argument is true. Three different tasks are supported: 1.) “blastp”, for standard protein-protein comparisons, 2.) “blastp-short”, optimized for query sequences shorter than 30 residues, and 3.) “blastp-fast”, a faster version that uses a larger word size, as described in https://www.ncbi.nlm.nih.gov/pubmed/17921491. This table reflects the 2.2.27 BLAST+ release.

option

task

type

default value

description and notes

word_size

blastp

integer

3

Word size of initial match. Valid word sizes are 2-7.

word_size

blastp-short

integer

2

Word size of initial match.

word_size

blastp-fast

integer

6

Word size of initial match.

gapopen

blastp

integer

11

Cost to open a gap.

gapextend

blastp

integer

1

Cost to extend a gap.

gapopen

blastp-short

integer

9

Cost to open a gap.

gapextend

blastp-short

integer

1

Cost to extend a gap.

matrix

blastp

string

BLOSUM62

Scoring matrix name.

matrix

blastp-short

string

PAM30

Scoring matrix name.

threshold

blastp

integer

11

Minimum score to add a word to the BLAST lookup table.

threshold

blastp-short

integer

16

Minimum score to add a word to the BLAST lookup table.

threshold

blastp-fast

integer

21

Minimum score to add a word to the BLAST lookup table.

comp_based_stats

blastp and blastp-fast

string

2

Use composition-based statistics:

D or d: default (equivalent to 2)

0 or F or f: no composition-based statistics

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2 or T or t: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally

comp_based_stats

blastp-short

string

0

Use composition-based statistics:

D or d: default (equivalent to 2)

0 or F or f: no composition-based statistics

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2 or T or t: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally

seg

all

string

no

Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).

soft_masking

blastp

boolean

false

Apply filtering locations as soft masks (i.e., only for finding initial matches).

lcase_masking

all

flag

N/A

Use lower case filtering in query and subject sequence(s).

db_soft_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).

db_hard_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).

xdrop_gap_final

all

real

25

Heuristic value (in bits) for final gapped alignment.

window_size

blastp and blastp-fast

integer

40

Multiple hits window size, use 0 to specify 1-hit algorithm.

window_size

blastp-short

integer

15

Multiple hits window size, use 0 to specify 1-hit algorithm.

use_sw_tback

all

flag

N/A

Compute locally optimal Smith-Waterman alignments.
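The seg option in the table above accepts three forms: 'yes', 'window locut hicut', or 'no'. The default is 'no' for blastp and 12 2.2 2.5 in the translated-search tables. A wrapper can check the value before a run starts. The parser below is a hypothetical sketch. Treating 'yes' as 12 2.2 2.5 is an assumption based on those listed defaults. The locut <= hicut test is a sanity check added for this example, not a documented BLAST rule.

```python
from typing import Optional, Tuple

SegParams = Tuple[int, float, float]
ASSUMED_YES: SegParams = (12, 2.2, 2.5)  # assumption: what 'yes' stands for


def parse_seg(value: str) -> Optional[SegParams]:
    """Return None when SEG is disabled, else (window, locut, hicut)."""
    text = value.strip()
    if text == "no":
        return None
    if text == "yes":
        return ASSUMED_YES
    parts = text.split()
    if len(parts) != 3:
        raise ValueError(f"expected 'yes', 'no' or 'window locut hicut', got {value!r}")
    window, locut, hicut = int(parts[0]), float(parts[1]), float(parts[2])
    # Sanity check for this sketch only: positive window, locut not above hicut.
    if window < 1 or locut > hicut:
        raise ValueError(f"implausible SEG parameters: {value!r}")
    return window, locut, hicut
```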

Table C4:

blastx application options. The blastx application translates a nucleotide query and searches it against protein subject sequences or a protein database. Two different tasks are supported: 1.) “blastx” for standard translated nucleotide-protein comparison and 2.) “blastx-fast”, a faster version that uses a larger word size, based on https://www.ncbi.nlm.nih.gov/pubmed/17921491.

option

task

type

default value

description and notes

word_size

blastx

integer

3

Word size for initial match. Valid word sizes are 2-7.

word_size

blastx-fast

integer

6

Word size for initial match.

gapopen

all

integer

11

Cost to open a gap.

gapextend

all

integer

1

Cost to extend a gap.

matrix

all

string

BLOSUM62

Scoring matrix name.

threshold

blastx

integer

12

Minimum score to add a word to the BLAST lookup table.

threshold

blastx-fast

integer

21

Minimum score to add a word to the BLAST lookup table.

seg

all

string

12 2.2 2.5

Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).

soft_masking

all

boolean

false

Apply filtering locations as soft masks (i.e., only for finding initial matches).

lcase_masking

all

flag

N/A

Use lower case filtering in query and subject sequence(s).

db_soft_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).

db_hard_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).

xdrop_gap_final

all

real

25

Heuristic value (in bits) for final gapped alignment.

window_size

all

integer

40

Multiple hits window size, use 0 to specify 1-hit algorithm.

strand

all

string

both

Query strand(s) to search against database/subject. Choice of both, minus, or plus.

query_genetic_code

all

integer

1

Genetic code to translate query, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt

max_intron_length

all

integer

0

Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking).

comp_based_stats

all

integer

2

Use composition-based statistics for blastx:

D or d: default (equivalent to 2)

0 or F or f: no composition-based statistics

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2 or T or t: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally

Default = '2'
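Several spellings select the same comp_based_stats mode. A wrapper script may therefore want to normalize the value before logging runs or comparing them. The minimal sketch below transcribes the list above, and the function name is hypothetical. 'D' maps to mode 2 exactly as the list states, even for a task whose own default differs (blastp-short defaults to 0).

```python
_CBS_ALIASES = {
    "D": 2, "d": 2,            # default (equivalent to 2)
    "0": 0, "F": 0, "f": 0,    # no composition-based statistics
    "1": 1,                    # NAR 29:2994-3005, 2001
    "2": 2, "T": 2, "t": 2,    # Bioinformatics 21:902-911, 2005, conditioned
    "3": 3,                    # same adjustment, applied unconditionally
}


def normalize_comp_based_stats(value: str) -> int:
    """Map any accepted spelling of -comp_based_stats to its numeric mode."""
    try:
        return _CBS_ALIASES[value.strip()]
    except KeyError:
        raise ValueError(f"unsupported comp_based_stats value: {value!r}") from None
```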

Table C5:

tblastn application options. The tblastn application searches a protein query against nucleotide subject sequences or a nucleotide database translated at search time. Two different tasks are supported: 1.) “tblastn” for a standard protein-translated nucleotide comparison and 2.) “tblastn-fast” for a faster version with a larger word size, based on https://www.ncbi.nlm.nih.gov/pubmed/17921491.

option

task

type

default value

description and notes

word_size

tblastn

integer

3

Word size for initial match. Valid word sizes are 2-7.

word_size

tblastn-fast

integer

6

Word size for initial match.

gapopen

all

integer

11

Cost to open a gap.

gapextend

all

integer

1

Cost to extend a gap.

matrix

all

string

BLOSUM62

Scoring matrix name.

threshold

tblastn

integer

13

Minimum score to add a word to the BLAST lookup table.

threshold

tblastn-fast

integer

21

Minimum score to add a word to the BLAST lookup table.

seg

all

string

12 2.2 2.5

Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).

soft_masking

all

boolean

false

Apply filtering locations as soft masks (i.e., only for finding initial matches).

lcase_masking

all

flag

N/A

Use lower case filtering in query and subject sequence(s).

db_soft_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).

db_hard_mask

all

integer

none

Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).

xdrop_gap_final

all

real

25

Heuristic value (in bits) for final gapped alignment.

window_size

all

integer

40

Multiple hits window size, use 0 to specify 1-hit algorithm.

db_gen_code

all

integer

1

Genetic code to translate subject sequences, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt

max_intron_length

all

integer

0

Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking).

comp_based_stats

all

string

2

Use composition-based statistics for tblastn:

D or d: default (equivalent to 2)

0 or F or f: no composition-based statistics

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2 or T or t: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally

Default = '2'

Table C6:

tblastx application options. The tblastx application searches a translated nucleotide query against translated nucleotide subject sequences or a translated nucleotide database. An option of type “flag” takes no arguments, but if present the argument is true. This table reflects the 2.2.27 BLAST+ release. Only ungapped searches are supported for tblastx.

option

type

default value

description and notes

word_size

integer

3

Word size for initial match.

matrix

string

BLOSUM62

Scoring matrix name.

threshold

integer

13

Minimum word score to add the word to the BLAST lookup table.

seg

string

12 2.2 2.5

Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).

soft_masking

boolean

false

Apply filtering locations as soft masks (i.e., only for finding initial matches).

lcase_masking

flag

N/A

Use lower case filtering in query and subject sequence(s).

db_soft_mask

integer

none

Filtering algorithm ID to apply to the BLAST database as soft mask (i.e., only for finding initial matches).

db_hard_mask

integer

none

Filtering algorithm ID to apply to the BLAST database as hard mask (i.e., sequence is masked for all phases of search).

strand

string

both

Query strand(s) to search against database subject sequences. Choice of both, minus, or plus.

query_genetic_code

integer

1

Genetic code to translate query, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt

db_gen_code

integer

1

Genetic code to translate subject sequences, see ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt

max_intron_length

integer

0

Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments (a negative value disables linking)
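tblastx translates both the query (query_genetic_code) and the subject sequences (db_gen_code) in all six reading frames before comparing them. The sketch below illustrates that step for genetic code 1 only. It uses the standard-code amino-acid string in TCAG codon order; other codes would need their own string from the gc.prt file referenced above. The frame labels are a common convention assumed here: +1 to +3 read the given strand from offsets 0, 1, and 2, and -1 to -3 read its reverse complement. BLAST performs this translation internally, so the code is meant for understanding or checking results, not as a replacement.

```python
from typing import Dict

BASES = "TCAG"
# Genetic code 1 (standard): amino acid for each codon, codons in TCAG order.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def translate(seq: str) -> str:
    """Translate from position 0; codons containing non-ACGT letters become 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if all(base in BASES for base in codon):
            index = (16 * BASES.index(codon[0]) + 4 * BASES.index(codon[1])
                     + BASES.index(codon[2]))
            protein.append(STANDARD_CODE[index])
        else:
            protein.append("X")
    return "".join(protein)


def six_frames(seq: str) -> Dict[int, str]:
    """Frames +1..+3 read the given strand, -1..-3 its reverse complement."""
    minus = seq.translate(COMPLEMENT)[::-1]
    frames: Dict[int, str] = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(minus[offset:])
    return frames
```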

Table C7:

rpsblast application options. The rpsblast application searches a protein query against the conserved domain database (CDD), which is a set of protein profiles. Many of the common options such as matrix or word threshold are set when the CDD is built and cannot be changed by the rpsblast application. A search-ready CDD can be downloaded from ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/

option

type

default value

description and notes

window_size

integer

40

Multiple hits window size, use 0 to specify 1-hit algorithm.

xdrop_ungap

real

15

Heuristic value (in bits) for ungapped extensions.

xdrop_gap

real

25

Heuristic value (in bits) for preliminary gapped extensions.

xdrop_gap_final

real

40

Heuristic value (in bits) for final gapped alignment.

seg

string

12 2.2 2.5

Filter query sequence with SEG (Format: 'yes', 'window locut hicut', or 'no' to disable).

soft_masking

boolean

false

Apply filtering locations as soft masks (i.e., only for finding initial matches).

mt_mode

integer

0

Set to 1 if a large number of queries are to be searched and you wish to use multiple threads, as specified by the num_threads argument.

comp_based_stats

integer

2

Use composition-based statistics for rpsblast:

D or d: default (equivalent to 2)

0 or F or f: no composition-based statistics

1: Composition-based statistics as in NAR 29:2994-3005, 2001

2 or T or t: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, conditioned on sequence properties

3: Composition-based score adjustment as in Bioinformatics 21:902-911, 2005, unconditionally

Default = '2'

Table C8:

Makeblastdb application options. This application builds a BLAST database. An option of type “flag” takes no arguments, but if present the argument is true. Starting with the 2.10.0 release, makeblastdb produces version 5 databases by default, which use LMDB. LMDB requires virtual memory (at least 600 GB, but 800 GB is recommended) to build an index. If makeblastdb cannot access enough virtual memory, it will produce a message containing the string “mdb_env_open”. Virtual memory is just that (virtual) and does not depend on the hardware in your system. In general, we recommend that BLAST users simply set the virtual memory limit to unlimited. The alternative is to use an environment variable (BLASTDB_LMDB_MAP_SIZE) to set the required virtual memory lower, but this runs the risk of LMDB not being able to complete indexing the database. For a smaller database (tens of millions of letters), it may be possible to use a value of 100 million.

option

type

default value

description and notes

in

string

stdin

Input file/database name

input_type

string

fasta

Input file type, it may be any of the following:

fasta: for FASTA file(s)

blastdb: for BLAST database(s)

asn1_txt: for Seq-entries in text ASN.1 format

asn1_bin: for Seq-entries in binary ASN.1 format

dbtype

string

prot

Molecule type of input, values can be nucl or prot.

title

string

none

Title for BLAST database. If not set, the input file name will be used.

parse_seqids

flag

N/A

Parse bar delimited sequence identifiers (e.g., gi|129295) in FASTA input.

hash_index

flag

N/A

Create index of sequence hash values.

mask_data

string

none

Comma-separated list of input files containing masking data as produced by NCBI masking applications (e.g., dustmasker, segmasker, windowmasker).

out

string

input file name

Name of BLAST database to be created. Input file name is used if none provided. This field is required if input consists of multiple files.

max_file_size

string

1GB

Maximum file size to use for BLAST database. 4GB is the maximum supported by the database structure.

blastdb_version

integer

5

Version 5 (taxonomy aware) is the default starting with the 2.10.0 release. Value must be 4 or 5.

taxid

integer

none

Taxonomy ID to assign to all sequences.

taxid_map

string

none

File with two columns mapping sequence ID to the taxonomy ID. The first column is the sequence ID represented as one of:

1. fasta with accessions (e.g., emb|X17276.1|)

2. fasta with GI (e.g., gi|4)

3. GI as a bare number (e.g., 4)

4. A local ID. The local ID must be prefixed with "lcl" (e.g., lcl|4).

The second column should be the NCBI taxonomy ID (e.g., 9606 for human).

logfile

string

none

Program log file (default is stderr).
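Two parts of this table combine naturally in a script. The sketch below first writes a taxid_map file. It then prepares a makeblastdb run that lowers the LMDB map size through BLASTDB_LMDB_MAP_SIZE, as described in the table introduction. Nothing is executed. The file names, the sequence IDs, and the single-space column separator are assumptions for illustration. Depending on the sequence IDs used, further options such as -parse_seqids may also be needed. 100 million is the value the introduction suggests only for small databases, and raising the virtual memory limit remains the recommended route.

```python
import io
import os
from typing import Dict, List, Mapping, Optional, TextIO, Tuple


def write_taxid_map(mapping: Mapping[str, int], handle: TextIO) -> int:
    """Write 'sequence_id taxid' lines; return the number of lines written."""
    count = 0
    for seq_id, taxid in mapping.items():
        # Whitespace separates the two columns, so it cannot appear in an ID.
        if not seq_id or any(ch.isspace() for ch in seq_id):
            raise ValueError(f"bad sequence ID: {seq_id!r}")
        if taxid < 1:
            raise ValueError(f"bad taxonomy ID for {seq_id}: {taxid}")
        handle.write(f"{seq_id} {taxid}\n")
        count += 1
    return count


def makeblastdb_invocation(fasta: str, out: str, dbtype: str, taxid_map: str,
                           lmdb_map_size: Optional[int] = None
                           ) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one makeblastdb run; nothing is executed here."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    env = dict(os.environ)  # a copy: the current process environment is untouched
    if lmdb_map_size is not None:
        env["BLASTDB_LMDB_MAP_SIZE"] = str(lmdb_map_size)
    argv = ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-out", out,
            "-taxid_map", taxid_map]
    return argv, env


buffer = io.StringIO()  # stands in for open("taxid_map.txt", "w")
written = write_taxid_map({"lcl|1": 9606, "lcl|2": 9606}, buffer)
argv, env = makeblastdb_invocation("small_set.fasta", "small_set", "nucl",
                                   "taxid_map.txt", lmdb_map_size=100_000_000)
# To run it: subprocess.run(argv, env=env, check=True)
```

Passing a modified copy of the environment confines the lower map size to this one build. Other BLAST commands run from the same session are unaffected.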

Table C9:

Makeprofiledb application options. This application builds an RPS-BLAST database. An option of type “flag” takes no arguments, but if present the argument is true. COBALT (a multiple sequence alignment program) and DELTA-BLAST both use RPS-BLAST searches as part of their processing but use specialized versions of the database. This application can build databases for COBALT, DELTA-BLAST, and a standard RPS-BLAST search. The “dbtype” option (see entry in table) determines which flavor of the database is built.

option

type

default value

description and notes

in

string

stdin

Input file that contains a list of scoremat files (delimited by space, tab, or newline)

binary

flag

N/A

The scoremat files are binary ASN.1

title

string

none

Title for RPS-BLAST database. If not set, the input file name will be used.

threshold

real

9.82

Threshold for RPSBLAST lookup table.

out

string

input file name

Name of BLAST database to be created. Input file name is used if none provided.

max_file_size

string

1GB

Maximum file size to use for BLAST database.

dbtype

string

rps

Intended use of the RPS-BLAST database: one of rps, cobalt, or delta.

index

flag

N/A

Creates index files.

gapopen

integer

none

Cost to open a gap. Used only if scoremat files do not contain PSSM scores, otherwise ignored.

gapextend

integer

none

Cost to extend a gap by one residue. Used only if scoremat files do not contain PSSM scores, otherwise ignored.

scale

real

100

PSSM scale factor.

matrix

string

BLOSUM62

Matrix to use in constructing PSSM. One of BLOSUM45, BLOSUM50, BLOSUM62, BLOSUM80, BLOSUM90, PAM250, PAM30 or PAM70. Used only if scoremat files do not contain PSSM scores, otherwise ignored.

obsr_threshold

real

6

Exclude domains with maximum number of independent observations below this value (for use in DELTA-BLAST searches).

exclude_invalid

boolean

true

Exclude domains that do not pass validation test (for use in DELTA-BLAST searches).

max_smp_vol

integer

2500

Maximum number of SMP files per DB volume. Increasing this number will decrease the number of BLAST database volumes produced.

taxid

integer

none

Taxonomy ID to assign to all sequences.

taxid_map

string

none

File with two columns mapping sequence ID to the taxonomy ID. The first column is the sequence ID represented as one of:

1. fasta with accessions (e.g., emb|X17276.1|)

2. fasta with GI (e.g., gi|4)

3. GI as a bare number (e.g., 4)

4. A local ID. The local ID must be prefixed with "lcl" (e.g., lcl|4).

The second column should be the NCBI taxonomy ID (e.g., 9606 for human).

logfile

string

none

Program log file (default is stderr).
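The -in file for makeprofiledb is simply a list of scoremat file names, delimited by space, tab, or newline. The hypothetical sketch below writes such a list with one name per line and assembles a command from options in this table. The file names and the database name are made up. The check on -dbtype mirrors the three values listed above.

```python
import io
from typing import Iterable, List, TextIO

PROFILE_DB_TYPES = ("rps", "cobalt", "delta")


def write_scoremat_list(paths: Iterable[str], handle: TextIO) -> int:
    """Write one scoremat file name per line; return how many were written."""
    count = 0
    for path in paths:
        # Space, tab and newline are delimiters, so names cannot contain them.
        if not path or any(ch.isspace() for ch in path):
            raise ValueError(f"unusable scoremat file name: {path!r}")
        handle.write(path + "\n")
        count += 1
    return count


def makeprofiledb_argv(list_file: str, out: str, dbtype: str = "rps",
                       binary: bool = False) -> List[str]:
    """Assemble a makeprofiledb command line; nothing is executed here."""
    if dbtype not in PROFILE_DB_TYPES:
        raise ValueError(f"dbtype must be one of {PROFILE_DB_TYPES}")
    argv = ["makeprofiledb", "-in", list_file, "-out", out, "-dbtype", dbtype]
    if binary:
        argv.append("-binary")  # flag: the scoremat files are binary ASN.1
    return argv


buffer = io.StringIO()  # stands in for open("scoremats.txt", "w")
write_scoremat_list(["domain1.smp", "domain2.smp"], buffer)
argv = makeprofiledb_argv("scoremats.txt", "my_profiles", dbtype="delta")
```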

Table C10:

Blastdbcmd application options. This application reads a BLAST database and produces reports.

option

type

default value

description and notes

db

string

nr

BLAST database name.

dbtype

string

guess

Molecule type stored in BLAST database, one of nucl, prot, or guess.

entry

string

none

Comma-delimited search string(s) of sequence identifiers, e.g., 555, AC147927, 'gnl|dbname|tag', or 'all' to select all sequences in the database.

entry_batch

string

none

Input file for batch processing. The format requires one entry per line; each line should begin with the sequence ID followed by any of the following optional specifiers (in any order): range (format: 'from-to', inclusive, in 1-based coordinates), strand ('plus' or 'minus'), or masking algorithm ID (an integer value identifying one of the available masking algorithms). Omitting the end of the range (e.g., '10-') is supported, but there should not be any spaces around the '-'.

pig

integer

none

PIG (protein identity group) to retrieve.

info

flag

N/A

Print BLAST database information.

range

string

none

Range of sequence to extract (Format: start-stop).

strand

string

plus

Strand of nucleotide sequence to extract. Choice of plus or minus.

mask_sequence_with

string

none

Produce lower-case masked FASTA using the algorithm IDs specified.

out

string

stdout

Output file name.

outfmt

string

%f

Output format, where the available format specifiers are:

%f means sequence in FASTA format

%s means sequence data (without defline)

%a means accession

%g means GI

%o means ordinal id (OID)

%t means sequence title

%l means sequence length

%T means taxid

%L means common taxonomic name

%S means scientific name

%P means PIG

%mX means sequence masking data, where X is an optional comma-separated list of integers specifying the algorithm ID(s) to display (all masks are displayed if X is absent or invalid). Masking data will be displayed as a series of 'N-M' values separated by ';', or the word 'none' if none are available.

For every format except '%f', each line of output will correspond to a sequence.

target_only

flag

N/A

Definition line should contain target GI only.

get_dups

flag

N/A

Retrieve duplicate accessions.

line_length

integer

80

Line length for output.

ctrl_a

flag

N/A

Use Ctrl-A as the non-redundant definition line separator.
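Two blastdbcmd options in this table lend themselves to small helpers. The first helper writes -entry_batch lines. The second reads back the masking data printed by %m. Both are hypothetical sketches based only on the formats described above. Separating entry_batch fields with single spaces is an assumption. The mask parser does not interpret whether the printed N-M coordinates are 0- or 1-based.

```python
from typing import List, Optional, Tuple


def entry_batch_line(seq_id: str, start: Optional[int] = None,
                     stop: Optional[int] = None, strand: Optional[str] = None,
                     mask_algo: Optional[int] = None) -> str:
    """Build one -entry_batch line: the ID, then any optional specifiers."""
    fields = [seq_id]
    if start is not None:
        if start < 1 or (stop is not None and stop < start):
            raise ValueError("range must be 1-based with from <= to")
        # No spaces around '-'; leaving out 'to' gives an open range such as '10-'.
        fields.append(f"{start}-" if stop is None else f"{start}-{stop}")
    elif stop is not None:
        raise ValueError("a range end needs a range start")
    if strand is not None:
        if strand not in ("plus", "minus"):
            raise ValueError("strand must be 'plus' or 'minus'")
        fields.append(strand)
    if mask_algo is not None:
        fields.append(str(mask_algo))
    return " ".join(fields)


def parse_mask_field(field: str) -> List[Tuple[int, int]]:
    """Turn a %m value such as '0-10;25-40' (or 'none') into (N, M) pairs."""
    field = field.strip()
    if not field or field == "none":
        return []
    intervals = []
    for item in field.split(";"):
        item = item.strip()
        if not item:  # tolerate a trailing ';'
            continue
        start, stop = item.split("-")
        intervals.append((int(start), int(stop)))
    return intervals


batch = "\n".join([entry_batch_line("AC147927", 10, 20, "minus"),
                   entry_batch_line("555", 10)]) + "\n"
```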

Table C11:

Makembindex application options. The indexed databases created by makembindex are used by production MegaBLAST software and by a new srsearch utility designed to quickly search for nearly exact matches (up to one mismatch) of short queries against a genomic database. When a FASTA formatted file is used as the input, then masking by lower case letters is incorporated in the index. Makembindex can currently build two types of indices, called “old style” and “new style” indexing. The NCBI offers full support for the new style and has deprecated the old style. A MegaBLAST search with a new style index requires that both the index and the corresponding BLAST database be present. The index structure is described in PMID:18567917. Please cite this paper in any publication that uses makembindex.

option

type

default value

description and notes

input

string

stdin

Input file name or BLAST database name, depending on the value of the iformat parameter. For FASTA formatted input, this parameter is optional and defaults to the program's standard input stream.

output

string

none

The resulting index name. The index itself can consist of multiple files, called volumes, named <index_name>.00.idx, <index_name>.01.idx, ...

This option should not be used with new style indices.

iformat

string

fasta

The input format selector. Possible values are 'fasta' and 'blastdb'.

old_style_index

boolean

false

The old_style_index is no longer supported. If set to 'false' the new style index is created. New style indices require a BLAST database as input (use -iformat blastdb), which can be downloaded from the NCBI FTP site or created with makeblastdb. The option -output is ignored for a new style index. New style indices are always created at the same location as the corresponding BLAST database.

db_mask

integer

none

Exclude masked regions of BLAST db from the index. Use makeblastdb to discover the algorithm ID to be used as input for this argument.

legacy

boolean

true

This is a compatibility feature to support current production MegaBLAST. If true, then -stride, -nmer, and -ws_hint are ignored. The legacy format must be used for BLAST.

nmer

integer

12

N-mer size to use. Ignored if -legacy is specified.

ws_hint

integer

28

This is an optimization hint for makembindex that indicates an expected minimum match size in searches that use the index. If n is the value of the -nmer parameter and s is the value of the -stride parameter, then the value of -ws_hint must be at least n + s - 1.

stride

integer

5

makembindex will index every stride-th N-mer of the database.

volsize

integer

1536

Target index volume size in megabytes.
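With the defaults in this table, n = 12 and s = 5 give a minimum ws_hint of 12 + 5 - 1 = 16, well below the default ws_hint of 28. The constraint is cheap to check before starting a long indexing run. It only matters when -legacy is false, since the legacy format ignores -stride, -nmer, and -ws_hint. The helpers below are hypothetical.

```python
def min_ws_hint(nmer: int = 12, stride: int = 5) -> int:
    """Smallest -ws_hint allowed for the given -nmer and -stride values."""
    if nmer < 1 or stride < 1:
        raise ValueError("nmer and stride must be positive")
    return nmer + stride - 1


def check_ws_hint(ws_hint: int = 28, nmer: int = 12, stride: int = 5) -> int:
    """Raise if ws_hint < nmer + stride - 1; otherwise return that minimum."""
    minimum = min_ws_hint(nmer, stride)
    if ws_hint < minimum:
        raise ValueError(f"-ws_hint {ws_hint} is below nmer + stride - 1 = {minimum}")
    return minimum
```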