callAltTranslation
callAltTranslation calls peptide sequences from coding transcripts that harbor any alternative translation event.
Reference Version
The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.
Usage
usage: moPepGen callAltTranslation [-h] -o <file> [--w2f-reassignment]
[--selenocysteine-termination] [-g <file>]
[-a <file>]
[--reference-source {GENCODE,ENSEMBL}]
[-p <file>]
[--invalid-protein-as-noncoding]
[--index-dir [<file>]] [-c <value>]
[--cleavage-exception <value>]
[-m <number>] [-w <number>] [-l <number>]
[-x <number>]
[--debug-level <value|number>] [-q]
optional arguments:
-h, --help show this help message and exit
-o <file>, --output-path <file>
Output path to the alternative translation peptide
FASTA. Valid formats: ['.fa', '.fasta'] (default:
None)
--w2f-reassignment Include peptides with W > F (Tryptophan to
Phenylalanine) reassignment. (default: False)
--selenocysteine-termination
Include peptides of selenoproteins in which UGA is
treated as termination instead of Sec. (default:
False)
--debug-level <value|number>
Debug level. (default: INFO)
-q, --quiet Quiet (default: False)
Reference Files:
-g <file>, --genome-fasta <file>
Path to the genome assembly FASTA file. Only ENSEMBL
and GENCODE are supported. Its version must be the
same as the annotation GTF and proteome FASTA
(default: None)
-a <file>, --annotation-gtf <file>
Path to the annotation GTF file. Only ENSEMBL and
GENCODE are supported. Its version must be the same as
the genome and proteome FASTA. (default: None)
--reference-source {GENCODE,ENSEMBL}
Source of reference genome and annotation. (default:
None)
-p <file>, --proteome-fasta <file>
Path to the translated protein sequence FASTA file.
Only ENSEMBL and GENCODE are supported. Its version
must be the same as genome FASTA and annotation GTF.
(default: None)
--invalid-protein-as-noncoding
Treat any transcript whose protein sequence is
invalid (contains the * symbol) as noncoding.
(default: False)
--index-dir [<file>] Path to the directory of index files generated by
moPepGen generateIndex. If given, --genome-fasta,
--proteome-fasta and --annotation-gtf will be
ignored. (default: None)
Cleavage Parameters:
-c <value>, --cleavage-rule <value>
Enzymatic cleavage rule. (default: trypsin)
--cleavage-exception <value>
Enzymatic cleavage exception. (default: auto)
-m <number>, --miscleavage <number>
Number of cleavages to allow per non-canonical
peptide. (default: 2)
-w <number>, --min-mw <number>
The minimal molecular weight of the non-canonical
peptides. (default: 500.0)
-l <number>, --min-length <number>
The minimal length of non-canonical peptides,
inclusive. (default: 7)
-x <number>, --max-length <number>
The maximum length of non-canonical peptides,
inclusive. (default: 25)
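For scripted pipelines, the documented flags can be assembled programmatically. The following is a minimal Python sketch; the helper name `build_call_alt_translation_cmd` is hypothetical and not part of moPepGen, and it only builds the argument list (it does not run the tool). It checks the documented `.fa`/`.fasta` output suffix and, following the note that `--index-dir` causes `--genome-fasta`, `--proteome-fasta` and `--annotation-gtf` to be ignored, omits `-g`, `-a` and `-p` whenever an index directory is supplied.

```python
from pathlib import Path
from typing import List, Optional


def build_call_alt_translation_cmd(
    output_path: str,
    index_dir: Optional[str] = None,
    genome_fasta: Optional[str] = None,
    annotation_gtf: Optional[str] = None,
    proteome_fasta: Optional[str] = None,
    reference_source: Optional[str] = None,
    w2f_reassignment: bool = False,
    selenocysteine_termination: bool = False,
    cleavage_rule: str = "trypsin",
    miscleavage: int = 2,
    min_mw: float = 500.0,
    min_length: int = 7,
    max_length: int = 25,
) -> List[str]:
    """Hypothetical helper: build argv for `moPepGen callAltTranslation`."""
    # Only .fa and .fasta are documented as valid output formats.
    if Path(output_path).suffix not in {".fa", ".fasta"}:
        raise ValueError("output path must end in .fa or .fasta")
    if reference_source is not None and reference_source not in {"GENCODE", "ENSEMBL"}:
        raise ValueError("reference_source must be GENCODE or ENSEMBL")
    if min_length > max_length:
        raise ValueError("min_length must not exceed max_length")

    cmd = ["moPepGen", "callAltTranslation", "-o", output_path]
    if index_dir is not None:
        # The reference files would be ignored anyway, so do not pass them.
        cmd += ["--index-dir", index_dir]
    else:
        for flag, path in (("-g", genome_fasta), ("-a", annotation_gtf), ("-p", proteome_fasta)):
            if path is not None:
                cmd += [flag, path]
    if reference_source is not None:
        cmd += ["--reference-source", reference_source]
    if w2f_reassignment:
        cmd.append("--w2f-reassignment")
    if selenocysteine_termination:
        cmd.append("--selenocysteine-termination")
    # Cleavage parameters, using the documented defaults unless overridden.
    cmd += [
        "-c", cleavage_rule,
        "-m", str(miscleavage),
        "-w", str(min_mw),
        "-l", str(min_length),
        "-x", str(max_length),
    ]
    return cmd
```

The resulting list could then be passed to `subprocess.run(cmd, check=True)`, assuming `moPepGen` is on the `PATH`. Building a list rather than a shell string avoids quoting problems with file paths.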
Arguments
-h, --help
show this help message and exit
-o, --output-path <file> Path
Output path to the alternative translation peptide FASTA. Valid formats: ['.fa', '.fasta']
--w2f-reassignment
Include peptides with W > F (Tryptophan to Phenylalanine) reassignment.
Default: False
--selenocysteine-termination
Include peptides of selenoproteins in which UGA is treated as a termination codon instead of being decoded as selenocysteine (Sec).
Default: False
-g, --genome-fasta <file> Path
Path to the genome assembly FASTA file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the annotation GTF and proteome FASTA.
-a, --annotation-gtf <file> Path
Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.
--reference-source str
Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']
-p, --proteome-fasta <file> Path
Path to the translated protein sequence FASTA file. Only ENSEMBL and GENCODE are supported. Its version must be the same as genome FASTA and annotation GTF.
--invalid-protein-as-noncoding
Treat any transcript whose protein sequence is invalid (contains the * symbol) as noncoding.
Default: False
--index-dir <file> Path
Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
-c, --cleavage-rule <value> str
Enzymatic cleavage rule.
Default: trypsin
Choices: ['arg-c', 'asp-n', 'bnps-skatole', 'caspase 1', 'caspase 2', 'caspase 3', 'caspase 4', 'caspase 5', 'caspase 6', 'caspase 7', 'caspase 8', 'caspase 9', 'caspase 10', 'chymotrypsin high specificity', 'chymotrypsin low specificity', 'clostripain', 'cnbr', 'enterokinase', 'factor xa', 'formic acid', 'glutamyl endopeptidase', 'granzyme b', 'hydroxylamine', 'iodosobenzoic acid', 'lysc', 'lysn', 'ntcb', 'pepsin ph1.3', 'pepsin ph2.0', 'proline endopeptidase', 'proteinase k', 'staphylococcal peptidase i', 'thermolysin', 'thrombin', 'trypsin', 'trypsin_exception']
--cleavage-exception <value> str
Enzymatic cleavage exception.
Default: auto
-m, --miscleavage <number> int
Number of miscleavages (missed cleavage sites) to allow per non-canonical peptide.
Default: 2
-w, --min-mw <number> float
The minimum molecular weight of the non-canonical peptides.
Default: 500.0
-l, --min-length <number> int
The minimum length of non-canonical peptides, inclusive.
Default: 7
-x, --max-length <number> int
The maximum length of non-canonical peptides, inclusive.
Default: 25
--debug-level <value|number> str
Debug level.
Default: INFO
-q, --quiet
Quiet
Default: False
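To make the interplay of the cleavage parameters concrete, here is a minimal Python sketch of digesting one protein with up to `miscleavage` missed cleavages and then applying the length and molecular-weight filters. It is not moPepGen's implementation. Assumptions: a simplified trypsin rule (cleave after K or R unless the next residue is P), monoisotopic residue masses for the molecular weight, only the 20 standard amino acids, and no modelling of `--cleavage-exception` or of the step that removes canonical peptides. The names `trypsin_sites`, `peptide_mass` and `digest` are hypothetical.

```python
import re
from typing import List

# Monoisotopic residue masses (Da). Assumption: the real tool may use another convention.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def peptide_mass(seq: str) -> float:
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def trypsin_sites(protein: str) -> List[int]:
    # Simplified trypsin: cut C-terminal to K/R when not followed by P.
    return [m.end() for m in re.finditer(r"[KR](?!P)", protein)]


def digest(protein: str, miscleavage: int = 2, min_length: int = 7,
           max_length: int = 25, min_mw: float = 500.0) -> List[str]:
    bounds = sorted({0, len(protein), *trypsin_sites(protein)})
    peptides = []
    for i in range(len(bounds) - 1):
        # j = i + 1 means no missed cleavage; j = i + 1 + n skips n internal sites.
        for j in range(i + 1, min(i + 2 + miscleavage, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if min_length <= len(pep) <= max_length and peptide_mass(pep) >= min_mw:
                peptides.append(pep)
    return peptides
```

For example, `digest("MKWVTFISLLLLFSSAYSRGVFRR", miscleavage=0)` keeps only `WVTFISLLLLFSSAYSR`, because `MK`, `GVFR` and `R` are shorter than the default minimum length of 7; raising `miscleavage` admits longer peptides that span the skipped K/R sites.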
Alternative translation
Alternative translation occurs when a different peptide is produced from the same transcript without any change to the transcript's nucleotide sequence.
Selenocysteine Termination
In eukaryotes, the UGA codon on some mRNAs can be decoded as selenocysteine (Sec) instead of being recognized as a stop codon; the resulting proteins are called selenoproteins. However, UGA decoding is regulated by complex signals, including mRNA and Sec-tRNA abundance, which can result in two proteoforms: one in which UGA is read through, and one that terminates at that codon. Selenocysteine termination refers to the latter case. Selenocysteine terminations are not written to any GVF files; instead, they are represented in the format SECT-<pos>, where pos is the position in the gene of the selenocysteine UGA that is recognized as a stop codon.
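The two proteoforms can be illustrated with a minimal Python sketch that translates a selenoprotein coding sequence both ways. This is an illustration, not moPepGen's implementation. Assumptions: the input is an uppercase DNA CDS (A, C, G, T only) ending with its annotated stop codon, every internal in-frame TGA is a Sec site, and the positions in the labels are 0-based offsets within the CDS, not the gene coordinates that moPepGen uses for SECT-<pos>. The function name `sec_termination_proteoforms` is hypothetical.

```python
from typing import Dict, Tuple

# Standard genetic code (NCBI table 1), built from the TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def sec_termination_proteoforms(cds: str) -> Tuple[str, Dict[str, str]]:
    """Return the Sec read-through protein and one truncated proteoform per in-frame UGA.

    Hypothetical helper: label positions are 0-based offsets within `cds`,
    not gene coordinates.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Drop the annotated terminal stop codon so it is not mistaken for a Sec site.
    if codons and CODON_TABLE[codons[-1]] == "*":
        codons = codons[:-1]
    readthrough = []
    terminated = {}
    for idx, codon in enumerate(codons):
        if codon == "TGA":
            # Termination proteoform: everything translated before this UGA.
            terminated[f"SECT-{idx * 3}"] = "".join(readthrough)
            readthrough.append("U")  # read-through: UGA decoded as selenocysteine
        else:
            readthrough.append(CODON_TABLE[codon])
    return "".join(readthrough), terminated
```

For instance, `sec_termination_proteoforms("ATGAAATGAGGCTGATAA")` returns `("MKUGU", {"SECT-6": "MK", "SECT-12": "MKUG"})`: the read-through protein carries selenocysteine (U) at both UGA codons, while each termination proteoform stops at one of them.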
Tryptophan > Phenylalanine Codon Reassignment
Tryptophan > phenylalanine substitutants, described in Pataskar et al., arise when cellular tryptophan is depleted and phenylalanine is reassigned to tryptophan codons so that protein synthesis can continue. This process occurs largely in tumor cells. As with selenocysteine termination, W > F substitutants are not written to GVF files; instead, they are represented in the format W2F-<pos>. Note that pos is a peptide coordinate (i.e., zero-based from the start of the peptide).
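A minimal Python sketch of how W2F-<pos> labels relate to a peptide follows. It assumes that any combination of tryptophan positions in a peptide may be substituted and that each substituted position gets its own label; moPepGen may enumerate or label variants differently. The function name `w2f_variants` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def w2f_variants(peptide: str) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: every W > F substitution pattern of a peptide.

    Positions are 0-based peptide coordinates, matching the W2F-<pos> convention.
    """
    w_positions = [i for i, aa in enumerate(peptide) if aa == "W"]
    variants = []
    # Assumption: all non-empty subsets of W positions may be substituted.
    for n in range(1, len(w_positions) + 1):
        for subset in combinations(w_positions, n):
            seq = list(peptide)
            for pos in subset:
                seq[pos] = "F"
            variants.append(("".join(seq), [f"W2F-{pos}" for pos in subset]))
    return variants
```

For the peptide `AWGKW`, this yields `AFGKW` (W2F-1), `AWGKF` (W2F-4) and `AFGKF` (W2F-1 and W2F-4). Because the number of subsets doubles with each additional W, a real implementation may restrict which combinations it reports.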