decoyFasta

This module takes a FASTA file and creates a decoy database by shuffling or reversing each sequence. The generated decoy database FASTA file can then be used for library searching with proteomics data.

Usage

usage: moPepGen decoyFasta [-h] -i <file> -o <file> [--decoy-string <value>]
                           [--decoy-string-position <value>]
                           [--method {reverse,shuffle}] [--enzyme <value>]
                           [--non-shuffle-pattern <value>]
                           [--shuffle-max-attempts <number>]
                           [--keep-peptide-nterm <choice>]
                           [--keep-peptide-cterm <choice>] [--seed <number>]
                           [--order <choice>] [--debug-level <value|number>]
                           [-q]

Generate decoy database FASTA file.

optional arguments:
  -h, --help            show this help message and exit
  -i <file>, --input-path <file>
                        Input FASTA file. Valid formats: ['.fa', '.fasta']
                        (default: None)
  -o <file>, --output-path <file>
                        File path to the output file. Valid formats: ['.fa',
                        '.fasta'] (default: None)
  --method {reverse,shuffle}
                        Method to be used to generate the decoy sequences from
                        target sequences. (default: reverse)
  --enzyme <value>      Enzymatic cleavage rule. Amino acids at cleavage sites
                        will be kept unmodified. Set it to None to turn off
                        this behavior. (default: None)
  --non-shuffle-pattern <value>
                        Residues to not shuffle and keep at the original
                        position. Separate by comma (e.g. "K,R") (default: )
  --shuffle-max-attempts <number>
                        Maximal attempts to shuffle a sequence to avoid any
                        identical decoy sequence. (default: 30)
  --keep-peptide-nterm <choice>
                        Whether to keep the peptide N terminus constant.
                        (default: true)
  --keep-peptide-cterm <choice>
                        Whether to keep the peptide C terminus constant.
                        (default: true)
  --seed <number>       Random seed number. (default: None)
  --order <choice>      Order of target and decoy sequences to write in the
                        output FASTA. (default: juxtaposed)
  --debug-level <value|number>
                        Debug level. (default: INFO)
  -q, --quiet           Quiet (default: False)

Decoy Database Parameters:
  --decoy-string <value>
                        The decoy string that is combined with the FASTA
                        header for decoy sequences. (default: DECOY_)
  --decoy-string-position <value>
                        Should the decoy string be placed at the start or end
                        of FASTA headers? (default: prefix)
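
As an illustration, a typical invocation can be assembled from the flags listed above. The sketch below builds the command line in Python and only prints it. The file names `variant_peptides.fasta` and `variant_peptides_decoy.fasta` and the helper `build_decoy_command` are hypothetical placeholders, not part of moPepGen.

```python
import shlex


def build_decoy_command(input_path, output_path, method="reverse", seed=None):
    """Assemble a moPepGen decoyFasta command using only documented flags."""
    cmd = [
        "moPepGen", "decoyFasta",
        "-i", input_path,
        "-o", output_path,
        "--method", method,
        "--decoy-string", "DECOY_",
        "--decoy-string-position", "prefix",
        "--order", "juxtaposed",
    ]
    # --seed is optional; it is only added when a value is supplied.
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


# Hypothetical file names, used for illustration only.
cmd = build_decoy_command(
    "variant_peptides.fasta",
    "variant_peptides_decoy.fasta",
    method="shuffle",
    seed=42,
)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run` would execute the command, assuming moPepGen is installed and on the `PATH`.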

Arguments

-h, --help

show this help message and exit

-i, --input-path <file> Path

Input FASTA file. Valid formats: ['.fa', '.fasta']

-o, --output-path <file> Path

File path to the output file. Valid formats: ['.fa', '.fasta']

--decoy-string <value> str

The decoy string that is combined with the FASTA header for decoy sequences.
Default: DECOY_

--decoy-string-position <value> str

Should the decoy string be placed at the start or end of FASTA headers?
Default: prefix
Choices: ['prefix', 'suffix']
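
The effect of these two options on a header can be sketched as follows. This is a minimal illustration, not moPepGen's actual code: the function `decoy_header` and the example headers are hypothetical, and the exact way the string is joined to the header is an assumption.

```python
def decoy_header(header, decoy_string="DECOY_", position="prefix"):
    """Combine a decoy string with a FASTA header (assumed plain concatenation)."""
    if position == "prefix":
        return decoy_string + header
    if position == "suffix":
        return header + decoy_string
    raise ValueError(f"position must be 'prefix' or 'suffix', got {position!r}")


# Hypothetical headers, used for illustration only.
print(decoy_header("ENSP0000001|SNV-1"))
print(decoy_header("ENSP0000001|SNV-1", decoy_string="_DECOY", position="suffix"))
```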

--method str

Method to be used to generate the decoy sequences from target sequences.
Default: reverse
Choices: ['reverse', 'shuffle']
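
The two methods differ in a simple way, which the following sketch shows on a single sequence. The function names are hypothetical and the code ignores every other option (`--enzyme`, `--non-shuffle-pattern`, terminus handling), so it should be read as a conceptual illustration rather than the tool's implementation.

```python
import random


def reverse_decoy(seq):
    """Reverse the whole sequence."""
    return seq[::-1]


def shuffle_decoy(seq, seed=None):
    """Randomly permute the residues; a seed makes the result reproducible."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


target = "MKTAYR"
print(reverse_decoy(target))  # RYATKM
print(shuffle_decoy(target, seed=1))
```

Both methods preserve the amino acid composition, and therefore the mass, of the target sequence.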

--enzyme <value> str

Enzymatic cleavage rule. Amino acids at cleavage sites will be kept unmodified. Set it to None to turn off this behavior.
Choices: [None, 'arg-c', 'asp-n', 'bnps-skatole', 'caspase 1', 'caspase 2', 'caspase 3', 'caspase 4', 'caspase 5', 'caspase 6', 'caspase 7', 'caspase 8', 'caspase 9', 'caspase 10', 'chymotrypsin high specificity', 'chymotrypsin low specificity', 'clostripain', 'cnbr', 'enterokinase', 'factor xa', 'formic acid', 'glutamyl endopeptidase', 'granzyme b', 'hydroxylamine', 'iodosobenzoic acid', 'lysc', 'lysn', 'ntcb', 'pepsin ph1.3', 'pepsin ph2.0', 'proline endopeptidase', 'proteinase k', 'staphylococcal peptidase i', 'thermolysin', 'thrombin', 'trypsin', 'trypsin_exception']

--non-shuffle-pattern <value> str

Residues to not shuffle and keep at the original position. Separate by comma (e.g. "K,R").
Default: (empty string)
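
One way to picture this option is a shuffle in which the listed residues act as fixed anchors while all other residues are permuted among the remaining positions. The sketch below assumes the pattern is a comma-separated list of single residues; the function `shuffle_keep_fixed` is hypothetical.

```python
import random


def shuffle_keep_fixed(seq, non_shuffle_pattern="K,R", seed=None):
    """Shuffle residues while keeping the listed residues at their positions."""
    fixed = {r for r in non_shuffle_pattern.split(",") if r}
    rng = random.Random(seed)
    # Collect and shuffle only the residues that are allowed to move.
    movable = [aa for aa in seq if aa not in fixed]
    rng.shuffle(movable)
    movable_iter = iter(movable)
    # Rebuild the sequence: fixed residues stay, others are drawn in shuffled order.
    return "".join(aa if aa in fixed else next(movable_iter) for aa in seq)


decoy = shuffle_keep_fixed("MKTAYRGGK", "K,R", seed=7)
print(decoy)
```

With an empty pattern, every residue is free to move.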

--shuffle-max-attempts <number> int

Maximal attempts to shuffle a sequence to avoid any identical decoy sequence.
Default: 30
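
The retry logic this option controls might look roughly like the sketch below: shuffle, check whether the result collides with a known sequence, and stop after a fixed number of attempts. The function `shuffle_with_retries`, the `existing` set, and the fallback of returning the last attempt are assumptions made for illustration.

```python
import random


def shuffle_with_retries(seq, existing, max_attempts=30, seed=None):
    """Shuffle until the result is not in `existing`, up to `max_attempts` times."""
    rng = random.Random(seed)
    residues = list(seq)
    candidate = seq
    for _ in range(max_attempts):
        rng.shuffle(residues)
        candidate = "".join(residues)
        if candidate not in existing:
            return candidate
    # Assumed fallback: give up and return the last candidate.
    return candidate


seen = {"MKTAYR"}
print(shuffle_with_retries("MKTAYR", seen, seed=11))
```

Low-complexity sequences such as `AAAA` can never produce a distinct shuffle, which is why an upper bound on attempts is needed.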

--keep-peptide-nterm <choice> str

Whether to keep the peptide N terminus constant.
Default: true
Choices: ['true', 'false']

--keep-peptide-cterm <choice> str

Whether to keep the peptide C terminus constant.
Default: true
Choices: ['true', 'false']
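
To show how terminus handling could interact with an enzymatic cleavage rule, the sketch below splits a protein after K or R (a simplified trypsin-like rule that ignores the proline exception) and reverses each peptide while optionally holding its first and last residue in place. All function names are hypothetical, and the interpretation of "peptide terminus" as the first and last residue of each cleaved peptide is an assumption.

```python
def split_after(seq, sites="KR"):
    """Split a sequence after every residue listed in `sites`."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        if aa in sites:
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def reverse_peptide(pep, keep_nterm=True, keep_cterm=True):
    """Reverse a peptide, optionally keeping its first and/or last residue."""
    head = pep[:1] if keep_nterm else ""
    rest = pep[1:] if keep_nterm else pep
    tail = rest[-1:] if keep_cterm else ""
    core = rest[:-1] if keep_cterm and rest else rest
    return head + core[::-1] + tail


def reverse_within_peptides(seq, sites="KR", keep_nterm=True, keep_cterm=True):
    """Apply reverse_peptide to every cleaved peptide and rejoin the pieces."""
    return "".join(
        reverse_peptide(p, keep_nterm, keep_cterm) for p in split_after(seq, sites)
    )


print(reverse_within_peptides("MKTAYRGG"))  # MKTYARGG
```

Keeping the C-terminal residue fixed preserves the cleavage sites, so target and decoy digest into peptides of the same lengths.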

--seed <number> int

Random seed number.

--order <choice> str

Order of target and decoy sequences to write in the output FASTA.
Default: juxtaposed
Choices: ['juxtaposed', 'target_first', 'decoy_first']
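
The three choices can be pictured with the sketch below, which arranges two parallel lists of entries. It assumes that `juxtaposed` means each target is immediately followed by its own decoy; the function `order_entries` and the example entries are hypothetical.

```python
def order_entries(targets, decoys, order="juxtaposed"):
    """Arrange target and decoy entries according to the chosen order."""
    if order == "juxtaposed":
        # Assumed meaning: target 1, decoy 1, target 2, decoy 2, ...
        return [entry for pair in zip(targets, decoys) for entry in pair]
    if order == "target_first":
        return list(targets) + list(decoys)
    if order == "decoy_first":
        return list(decoys) + list(targets)
    raise ValueError(f"unknown order: {order!r}")


targets = ["PEP1", "PEP2"]
decoys = ["DECOY_PEP1", "DECOY_PEP2"]
for choice in ("juxtaposed", "target_first", "decoy_first"):
    print(choice, order_entries(targets, decoys, choice))
```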

--debug-level <value|number> str

Debug level.
Default: INFO

-q, --quiet

Quiet
Default: False