parseREDItools
parseREDItools takes RNA editing results called by REDItools and saves them as a GVF file. The GVF file can then be used to call variant peptides using callVariant.
Reference Version
The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.
Usage
usage: moPepGen parseREDItools [-h] -i <file> -o <file>
[--transcript-id-column <number>]
[--min-coverage-alt <number>]
[--min-frequency-alt <value>]
[--min-coverage-rna <value>]
[--min-coverage-dna <number>] --source SOURCE
[-a <file>]
[--reference-source {GENCODE,ENSEMBL}]
[--codon-table {SGC5,Hexamita Nuclear,Euplotid Nuclear,Ascidian Mitochondrial,Thraustochytrium Mitochondrial,Standard,Gracilibacteria,SGC8,Condylostoma Nuclear,Flatworm Mitochondrial,SGC1,Protozoan Mitochondrial,SGC3,Pachysolen tannophilus Nuclear,Archaeal,Chlorophycean Mitochondrial,Pterobranchia Mitochondrial,Echinoderm Mitochondrial,Blastocrithidia Nuclear,Invertebrate Mitochondrial,SGC0,Mycoplasma,Yeast Mitochondrial,Mesodinium Nuclear,Vertebrate Mitochondrial,Bacterial,Trematode Mitochondrial,SGC4,Balanophoraceae Plastid,Blepharisma Macronuclear,Mold Mitochondrial,Karyorelict Nuclear,Spiroplasma,SGC9,Scenedesmus obliquus Mitochondrial,Plant Plastid,Coelenterate Mitochondrial,Alternative Yeast Nuclear,Dasycladacean Nuclear,Candidate Division SR1,SGC2,Cephalodiscidae Mitochondrial,Peritrich Nuclear,Ciliate Nuclear,Alternative Flatworm Mitochondrial}]
[--chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]]
[--start-codons [START_CODONS [START_CODONS ...]]]
[--chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]]
[--index-dir [<file>]]
[--debug-level <value|number>] [-q]
Parse the REDItools result to a GVF format of variant records for moPepGen to
call variant peptides. The genome
optional arguments:
-h, --help show this help message and exit
-i <file>, --input-path <file>
File path to REDItools' TSV output. Valid formats:
['.tsv', '.txt'] (default: None)
-o <file>, --output-path <file>
File path to the output file. Valid formats: ['.gvf']
(default: None)
--transcript-id-column <number>
The column index for transcript ID. If your REDItools
table does not contains it, use the AnnotateTable.py
from the REDItools package. (default: 17)
--min-coverage-alt <number>
Minimal read coverage of alterations to be parsed.
(default: 3)
--min-frequency-alt <value>
Minimal frequency of alteration to be parsed.
(default: 0.1)
--min-coverage-rna <value>
Minimal read coverage at the alteration site of RNAseq
data of reference and all alterations. (default: 10)
--min-coverage-dna <number>
Minimal read coverage at the alteration site of WGS.
Set it to -1 to skip checking this. (default: 10)
--source SOURCE Variant source (e.g. gSNP, sSNV, Fusion) (default:
None)
--debug-level <value|number>
Debug level. (default: INFO)
-q, --quiet Quiet (default: False)
Reference Files:
-a <file>, --annotation-gtf <file>
Path to the annotation GTF file. Only ENSEMBL and
GENCODE are supported. Its version must be the same as
the genome and proteome FASTA. (default: None)
--reference-source {GENCODE,ENSEMBL}
Source of reference genome and annotation. (default:
None)
--codon-table {SGC5,Hexamita Nuclear,Euplotid Nuclear,Ascidian Mitochondrial,Thraustochytrium Mitochondrial,Standard,Gracilibacteria,SGC8,Condylostoma Nuclear,Flatworm Mitochondrial,SGC1,Protozoan Mitochondrial,SGC3,Pachysolen tannophilus Nuclear,Archaeal,Chlorophycean Mitochondrial,Pterobranchia Mitochondrial,Echinoderm Mitochondrial,Blastocrithidia Nuclear,Invertebrate Mitochondrial,SGC0,Mycoplasma,Yeast Mitochondrial,Mesodinium Nuclear,Vertebrate Mitochondrial,Bacterial,Trematode Mitochondrial,SGC4,Balanophoraceae Plastid,Blepharisma Macronuclear,Mold Mitochondrial,Karyorelict Nuclear,Spiroplasma,SGC9,Scenedesmus obliquus Mitochondrial,Plant Plastid,Coelenterate Mitochondrial,Alternative Yeast Nuclear,Dasycladacean Nuclear,Candidate Division SR1,SGC2,Cephalodiscidae Mitochondrial,Peritrich Nuclear,Ciliate Nuclear,Alternative Flatworm Mitochondrial}
Codon table. Defaults to "Standard". Supported codon
tables: {'SGC5', 'Hexamita Nuclear', 'Euplotid
Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium
Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8',
'Condylostoma Nuclear', 'Flatworm Mitochondrial',
'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen
tannophilus Nuclear', 'Archaeal', 'Chlorophycean
Mitochondrial', 'Pterobranchia Mitochondrial',
'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear',
'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma',
'Yeast Mitochondrial', 'Mesodinium Nuclear',
'Vertebrate Mitochondrial', 'Bacterial', 'Trematode
Mitochondrial', 'SGC4', 'Balanophoraceae Plastid',
'Blepharisma Macronuclear', 'Mold Mitochondrial',
'Karyorelict Nuclear', 'Spiroplasma', 'SGC9',
'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
'Coelenterate Mitochondrial', 'Alternative Yeast
Nuclear', 'Dasycladacean Nuclear', 'Candidate Division
SR1', 'SGC2', 'Cephalodiscidae Mitochondrial',
'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative
Flatworm Mitochondrial'} (default: Standard)
--chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]
Chromosome specific codon table. Must be specified in
the format of "chrM:SGC1", where "chrM" is the
chromosome name and "SGC1" is the codon table to use
to translate genes on chrM. Supported codon tables:
{'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear',
'Ascidian Mitochondrial', 'Thraustochytrium
Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8',
'Condylostoma Nuclear', 'Flatworm Mitochondrial',
'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen
tannophilus Nuclear', 'Archaeal', 'Chlorophycean
Mitochondrial', 'Pterobranchia Mitochondrial',
'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear',
'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma',
'Yeast Mitochondrial', 'Mesodinium Nuclear',
'Vertebrate Mitochondrial', 'Bacterial', 'Trematode
Mitochondrial', 'SGC4', 'Balanophoraceae Plastid',
'Blepharisma Macronuclear', 'Mold Mitochondrial',
'Karyorelict Nuclear', 'Spiroplasma', 'SGC9',
'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
'Coelenterate Mitochondrial', 'Alternative Yeast
Nuclear', 'Dasycladacean Nuclear', 'Candidate Division
SR1', 'SGC2', 'Cephalodiscidae Mitochondrial',
'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative
Flatworm Mitochondrial'}. By default, "SGC1" is
assigned to mitochondrial chromosomes. (default: [])
--start-codons [START_CODONS [START_CODONS ...]]
Default start codon(s) to use for novel ORF
translation. Defaults to ["ATG"]. (default: ['ATG'])
--chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]
Chromosome specific start codon(s). For example,
"chrM:ATG,ATA,ATT".By defualt, mitochondrial
chromosome name is automatically inferred andstart
codon "ATG", "ATA", "ATT", "ATC" and "GTG" are
assigned to it. (default: [])
--index-dir [<file>] Path to the directory of index files generated by
moPepGen generateIndex. If given, --genome-fasta,
--proteome-fasta and --anntotation-gtf will be
ignored. (default: None)
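As an illustration of the usage text above, here is a minimal sketch that assembles one possible invocation from Python. The file names, the index directory, and the `RNAEditing` source label are placeholders chosen for this example, not values prescribed by moPepGen; only the flag names come from the help text.

```python
import shlex

# Hypothetical paths and source label; replace them with your own values.
cmd = [
    "moPepGen", "parseREDItools",
    "--input-path", "reditools_annotated.tsv",  # REDItools TSV output
    "--output-path", "rna_editing.gvf",         # GVF file to write
    "--source", "RNAEditing",                   # label for this variant source
    "--index-dir", "index",                     # output of moPepGen generateIndex
    "--min-coverage-alt", "3",
    "--min-frequency-alt", "0.1",
    "--min-coverage-rna", "10",
    "--min-coverage-dna", "-1",                 # -1 skips the WGS coverage check
]

# Print the shell-quoted command so it can be inspected or pasted into a terminal.
print(shlex.join(cmd))

# To execute it directly instead:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell-quoting problems when paths contain spaces.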
Arguments
-h, --help
show this help message and exit
-i, --input-path <file> Path
File path to REDItools' TSV output. Valid formats: ['.tsv', '.txt']
-o, --output-path <file> Path
File path to the output file. Valid formats: ['.gvf']
--transcript-id-column <number> int
The column index for transcript ID. If your REDItools table does not contain it, add it with AnnotateTable.py from the REDItools package.
Default: 17
--min-coverage-alt <number> int
Minimum read coverage of the alteration for a site to be parsed.
Default: 3
--min-frequency-alt <value> float
Minimum frequency of the alteration for a site to be parsed.
Default: 0.1
--min-coverage-rna <value> int
Minimum RNA-seq read coverage at the alteration site, counting reads that support the reference and all alterations.
Default: 10
--min-coverage-dna <number> int
Minimum WGS read coverage at the alteration site. Set it to -1 to skip this check.
Default: 10
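To make the interplay of the four thresholds above concrete, the following is a rough sketch of how a single site could be checked against them. It is not moPepGen's implementation: the function name and its arguments are hypothetical, and the alteration frequency is assumed here to be the alteration read count divided by the total RNA-seq coverage at the site.

```python
def passes_filters(alt_reads, rna_reads, dna_reads,
                   min_coverage_alt=3, min_frequency_alt=0.1,
                   min_coverage_rna=10, min_coverage_dna=10):
    """Return True if a site clears all four thresholds (illustrative only)."""
    # --min-coverage-alt: enough reads must support the alteration.
    if alt_reads < min_coverage_alt:
        return False
    # --min-coverage-rna: enough RNA-seq reads must cover the site overall.
    if rna_reads < min_coverage_rna:
        return False
    # --min-frequency-alt. Assumption: frequency = alt reads / total RNA reads.
    if rna_reads == 0 or alt_reads / rna_reads < min_frequency_alt:
        return False
    # --min-coverage-dna: -1 disables the WGS coverage check, as documented.
    if min_coverage_dna != -1 and dna_reads < min_coverage_dna:
        return False
    return True


print(passes_filters(alt_reads=5, rna_reads=40, dna_reads=25))
print(passes_filters(alt_reads=5, rna_reads=40, dna_reads=0, min_coverage_dna=-1))
```

With the documented defaults, a site with 5 alteration reads out of 40 RNA-seq reads has a frequency of 0.125 and passes, while 3 out of 40 (0.075) falls below the 0.1 cutoff.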
--source str
Variant source (e.g. gSNP, sSNV, Fusion)
-a, --annotation-gtf <file> Path
Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.
--reference-source str
Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']
--codon-table str
Codon table used for translation. Defaults to "Standard".
Default: Standard
Choices: {'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8', 'Condylostoma Nuclear', 'Flatworm Mitochondrial', 'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen tannophilus Nuclear', 'Archaeal', 'Chlorophycean Mitochondrial', 'Pterobranchia Mitochondrial', 'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear', 'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma', 'Yeast Mitochondrial', 'Mesodinium Nuclear', 'Vertebrate Mitochondrial', 'Bacterial', 'Trematode Mitochondrial', 'SGC4', 'Balanophoraceae Plastid', 'Blepharisma Macronuclear', 'Mold Mitochondrial', 'Karyorelict Nuclear', 'Spiroplasma', 'SGC9', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Coelenterate Mitochondrial', 'Alternative Yeast Nuclear', 'Dasycladacean Nuclear', 'Candidate Division SR1', 'SGC2', 'Cephalodiscidae Mitochondrial', 'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative Flatworm Mitochondrial'}
--chr-codon-table str
Chromosome-specific codon table, given in the format "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table used to translate genes on chrM. Any codon table listed under --codon-table is supported. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
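The "chrM:SGC1" format described above can be read as a chromosome name and a codon table name separated by the first colon. The sketch below shows one way to turn such entries into a lookup table. The helper name is hypothetical and this is not how moPepGen itself parses the option.

```python
def parse_chr_codon_tables(entries):
    """Map chromosome name -> codon table name from 'chrom:table' strings."""
    mapping = {}
    for entry in entries:
        # Split on the first colon only; table names may contain spaces.
        chrom, sep, table = entry.partition(":")
        if not sep or not chrom or not table:
            raise ValueError(f"expected 'chrom:table', got {entry!r}")
        mapping[chrom] = table
    return mapping


print(parse_chr_codon_tables(["chrM:SGC1", "chrX:Standard"]))
```

Because the option accepts several values, each chromosome gets its own entry on the command line, for example `--chr-codon-table chrM:SGC1`.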
--start-codons str
Default start codon(s) to use for novel ORF translation. Defaults to ["ATG"].
Default: ['ATG']
--chr-start-codons str
Chromosome-specific start codon(s), for example "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is inferred automatically and the start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
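The sketch below illustrates how the start codon options described above might combine when choosing start codons for a chromosome. All names are hypothetical. Two points are assumptions rather than documented behaviour: that an explicit chromosome-specific entry takes precedence over the mitochondrial defaults, and that the mitochondrial chromosome is called "chrM" (moPepGen infers the name automatically).

```python
DEFAULT_START_CODONS = ["ATG"]
# Documented defaults for the mitochondrial chromosome.
MITO_START_CODONS = ["ATG", "ATA", "ATT", "ATC", "GTG"]


def parse_chr_start_codons(entries):
    """Map chromosome name -> list of start codons from 'chrom:C1,C2' strings."""
    mapping = {}
    for entry in entries:
        chrom, sep, codons = entry.partition(":")
        if not sep or not chrom or not codons:
            raise ValueError(f"expected 'chrom:codon[,codon...]', got {entry!r}")
        mapping[chrom] = [c.strip().upper() for c in codons.split(",") if c.strip()]
    return mapping


def start_codons_for(chrom, chr_start_codons, mito_chrom="chrM"):
    """Pick start codons for a chromosome (precedence order is an assumption)."""
    if chrom in chr_start_codons:      # explicit chromosome-specific entry
        return chr_start_codons[chrom]
    if chrom == mito_chrom:            # mitochondrial defaults
        return MITO_START_CODONS
    return DEFAULT_START_CODONS        # general default


overrides = parse_chr_start_codons(["chrM:ATG,ATA,ATT"])
print(start_codons_for("chrM", overrides))
print(start_codons_for("chr1", overrides))
```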
--index-dir <file> Path
Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
--debug-level <value|number> str
Debug level.
Default: INFO
-q, --quiet
Quiet
Default: False