parseREDItools

parseREDItools takes RNA editing results called by REDItools and saves them as a GVF file. The GVF file can then be used to call variant peptides with callVariant.

Reference Version

The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.

Usage

usage: moPepGen parseREDItools [-h] -i <file> -o <file>
                               [--transcript-id-column <number>]
                               [--min-coverage-alt <number>]
                               [--min-frequency-alt <value>]
                               [--min-coverage-rna <value>]
                               [--min-coverage-dna <number>] --source SOURCE
                               [--skip-failed] [-a <file>]
                               [--reference-source {GENCODE,ENSEMBL}]
                               [--codon-table {Thraustochytrium Mitochondrial,SGC0,Gracilibacteria,Plant Plastid,SGC8,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Invertebrate Mitochondrial,SGC5,SGC9,Blepharisma Macronuclear,Bacterial,SGC2,Yeast Mitochondrial,Pterobranchia Mitochondrial,Hexamita Nuclear,Echinoderm Mitochondrial,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Pachysolen tannophilus Nuclear,Coelenterate Mitochondrial,Condylostoma Nuclear,SGC3,Protozoan Mitochondrial,SGC4,Peritrich Nuclear,Trematode Mitochondrial,Archaeal,Spiroplasma,Alternative Flatworm Mitochondrial,Mesodinium Nuclear,SGC1,Blastocrithidia Nuclear,Mold Mitochondrial,Alternative Yeast Nuclear,Standard,Flatworm Mitochondrial,Dasycladacean Nuclear,Vertebrate Mitochondrial,Karyorelict Nuclear,Balanophoraceae Plastid,Cephalodiscidae Mitochondrial,Candidate Division SR1,Ciliate Nuclear,Mycoplasma}]
                               [--chr-codon-table [CHR_CODON_TABLE ...]]
                               [--start-codons [START_CODONS ...]]
                               [--chr-start-codons [CHR_START_CODONS ...]]
                               [--index-dir [<file>]]
                               [--debug-level <value|number>] [-q]

Parse the REDItools result to a GVF format of variant records for moPepGen to
call variant peptides. The genome

options:
  -h, --help            show this help message and exit
  -i <file>, --input-path <file>
                        File path to REDItools' TSV output. Valid formats:
                        ['.tsv', '.txt'] (default: None)
  -o <file>, --output-path <file>
                        File path to the output file. Valid formats: ['.gvf']
                        (default: None)
  --transcript-id-column <number>
                        The column index for transcript ID. If your REDItools
                        table does not contain it, use the AnnotateTable.py
                        from the REDItools package. (default: 17)
  --min-coverage-alt <number>
                        Minimal read coverage of alterations to be parsed.
                        (default: 3)
  --min-frequency-alt <value>
                        Minimal frequency of alteration to be parsed.
                        (default: 0.1)
  --min-coverage-rna <value>
                        Minimal read coverage at the alteration site of RNAseq
                        data of reference and all alterations. (default: 10)
  --min-coverage-dna <number>
                        Minimal read coverage at the alteration site of WGS.
                        Set it to -1 to skip checking this. (default: 10)
  --source SOURCE       Variant source (e.g. gSNP, sSNV, Fusion) (default:
                        None)
  --skip-failed         When set, the failed records will be skipped.
                        (default: False)
  --debug-level <value|number>
                        Debug level. (default: INFO)
  -q, --quiet           Quiet (default: False)

Reference Files:
  -a <file>, --annotation-gtf <file>
                        Path to the annotation GTF file. Only ENSEMBL and
                        GENCODE are supported. Its version must be the same as
                        the genome and proteome FASTA. (default: None)
  --reference-source {GENCODE,ENSEMBL}
                        Source of reference genome and annotation. (default:
                        None)
  --codon-table {Thraustochytrium Mitochondrial,SGC0,Gracilibacteria,Plant Plastid,SGC8,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Invertebrate Mitochondrial,SGC5,SGC9,Blepharisma Macronuclear,Bacterial,SGC2,Yeast Mitochondrial,Pterobranchia Mitochondrial,Hexamita Nuclear,Echinoderm Mitochondrial,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Pachysolen tannophilus Nuclear,Coelenterate Mitochondrial,Condylostoma Nuclear,SGC3,Protozoan Mitochondrial,SGC4,Peritrich Nuclear,Trematode Mitochondrial,Archaeal,Spiroplasma,Alternative Flatworm Mitochondrial,Mesodinium Nuclear,SGC1,Blastocrithidia Nuclear,Mold Mitochondrial,Alternative Yeast Nuclear,Standard,Flatworm Mitochondrial,Dasycladacean Nuclear,Vertebrate Mitochondrial,Karyorelict Nuclear,Balanophoraceae Plastid,Cephalodiscidae Mitochondrial,Candidate Division SR1,Ciliate Nuclear,Mycoplasma}
                        Codon table. Defaults to "Standard". Supported codon
                        tables: {'Thraustochytrium Mitochondrial', 'SGC0',
                        'Gracilibacteria', 'Plant Plastid', 'SGC8',
                        'Chlorophycean Mitochondrial', 'Ascidian
                        Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5',
                        'SGC9', 'Blepharisma Macronuclear', 'Bacterial',
                        'SGC2', 'Yeast Mitochondrial', 'Pterobranchia
                        Mitochondrial', 'Hexamita Nuclear', 'Echinoderm
                        Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus
                        obliquus Mitochondrial', 'Pachysolen tannophilus
                        Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma
                        Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4',
                        'Peritrich Nuclear', 'Trematode Mitochondrial',
                        'Archaeal', 'Spiroplasma', 'Alternative Flatworm
                        Mitochondrial', 'Mesodinium Nuclear', 'SGC1',
                        'Blastocrithidia Nuclear', 'Mold Mitochondrial',
                        'Alternative Yeast Nuclear', 'Standard', 'Flatworm
                        Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate
                        Mitochondrial', 'Karyorelict Nuclear',
                        'Balanophoraceae Plastid', 'Cephalodiscidae
                        Mitochondrial', 'Candidate Division SR1', 'Ciliate
                        Nuclear', 'Mycoplasma'} (default: Standard)
  --chr-codon-table [CHR_CODON_TABLE ...]
                        Chromosome specific codon table. Must be specified in
                        the format of "chrM:SGC1", where "chrM" is the
                        chromosome name and "SGC1" is the codon table to use
                        to translate genes on chrM. Supported codon tables:
                        {'Thraustochytrium Mitochondrial', 'SGC0',
                        'Gracilibacteria', 'Plant Plastid', 'SGC8',
                        'Chlorophycean Mitochondrial', 'Ascidian
                        Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5',
                        'SGC9', 'Blepharisma Macronuclear', 'Bacterial',
                        'SGC2', 'Yeast Mitochondrial', 'Pterobranchia
                        Mitochondrial', 'Hexamita Nuclear', 'Echinoderm
                        Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus
                        obliquus Mitochondrial', 'Pachysolen tannophilus
                        Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma
                        Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4',
                        'Peritrich Nuclear', 'Trematode Mitochondrial',
                        'Archaeal', 'Spiroplasma', 'Alternative Flatworm
                        Mitochondrial', 'Mesodinium Nuclear', 'SGC1',
                        'Blastocrithidia Nuclear', 'Mold Mitochondrial',
                        'Alternative Yeast Nuclear', 'Standard', 'Flatworm
                        Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate
                        Mitochondrial', 'Karyorelict Nuclear',
                        'Balanophoraceae Plastid', 'Cephalodiscidae
                        Mitochondrial', 'Candidate Division SR1', 'Ciliate
                        Nuclear', 'Mycoplasma'}. By default, "SGC1" is
                        assigned to mitochondrial chromosomes. (default: [])
  --start-codons [START_CODONS ...]
                        Default start codon(s) to use for novel ORF
                        translation. Defaults to ["ATG"]. (default: ['ATG'])
  --chr-start-codons [CHR_START_CODONS ...]
                        Chromosome specific start codon(s). For example,
                        "chrM:ATG,ATA,ATT". By default, the mitochondrial
                        chromosome name is automatically inferred and start
                        codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
                        assigned to it. (default: [])
  --index-dir [<file>]  Path to the directory of index files generated by
                        moPepGen generateIndex. If given, --genome-fasta,
                        --proteome-fasta and --annotation-gtf will be
                        ignored. (default: None)
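
As an illustration of the usage above, the following Python sketch assembles a parseREDItools command line from the required options (-i, -o, --source) plus --index-dir. The file names, the index directory, and the "RNAEditing" source label are hypothetical placeholders, and the helper function is not part of moPepGen. The command is only printed here, not executed.

```python
import shlex


def build_parse_reditools_cmd(input_path, output_path, source, index_dir,
                              min_coverage_dna=None):
    """Assemble the argument list for a moPepGen parseREDItools call.

    Illustrative helper only; it is not provided by moPepGen.
    """
    cmd = [
        "moPepGen", "parseREDItools",
        "--input-path", input_path,
        "--output-path", output_path,
        "--source", source,
        "--index-dir", index_dir,
    ]
    # Only pass the WGS coverage threshold when the caller overrides it.
    if min_coverage_dna is not None:
        cmd += ["--min-coverage-dna", str(min_coverage_dna)]
    return cmd


cmd = build_parse_reditools_cmd(
    input_path="reditools_annotated.tsv",  # hypothetical file name
    output_path="rna_editing.gvf",         # hypothetical file name
    source="RNAEditing",                   # hypothetical source label
    index_dir="index",                     # hypothetical index directory
    min_coverage_dna=-1,                   # -1 skips the WGS coverage check
)

# Print the shell-quoted command for inspection.
print(shlex.join(cmd))
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)` in an environment where moPepGen is installed.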

Arguments

-h, --help

show this help message and exit

-i, --input-path <file> Path

File path to REDItools' TSV output. Valid formats: ['.tsv', '.txt']

-o, --output-path <file> Path

File path to the output file. Valid formats: ['.gvf']

--transcript-id-column <number> int

The column index for transcript ID. If your REDItools table does not contain it, use the AnnotateTable.py from the REDItools package.
Default: 17

--min-coverage-alt <number> int

Minimal read coverage of alterations to be parsed.
Default: 3

--min-frequency-alt <value> float

Minimal frequency of alteration to be parsed.
Default: 0.1

--min-coverage-rna <value> int

Minimal RNA-seq read coverage at the alteration site, counting reads that support the reference and all alterations.
Default: 10

--min-coverage-dna <number> int

Minimal read coverage at the alteration site of WGS. Set it to -1 to skip checking this.
Default: 10
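
The four thresholds above can be read as a per-site filter. The sketch below is one plausible interpretation, not moPepGen's actual implementation: the `EditingSite` fields are hypothetical, the alteration frequency is assumed to be alteration reads divided by RNA-seq coverage, and a site is assumed to pass when it meets each threshold (greater than or equal).

```python
from dataclasses import dataclass


@dataclass
class EditingSite:
    """Hypothetical per-site summary; field names are illustrative only."""
    alt_reads: int     # RNA-seq reads supporting the alteration
    rna_coverage: int  # RNA-seq reads at the site (reference + all alterations)
    dna_coverage: int  # WGS reads at the site


def passes_filters(site, min_coverage_alt=3, min_frequency_alt=0.1,
                   min_coverage_rna=10, min_coverage_dna=10):
    """Return True if the site meets every threshold (assumed semantics)."""
    # A site without RNA-seq reads cannot support an alteration.
    if site.rna_coverage == 0 or site.rna_coverage < min_coverage_rna:
        return False
    if site.alt_reads < min_coverage_alt:
        return False
    # Assumed definition: frequency = alteration reads / RNA-seq coverage.
    if site.alt_reads / site.rna_coverage < min_frequency_alt:
        return False
    # -1 disables the WGS coverage check, as documented above.
    if min_coverage_dna != -1 and site.dna_coverage < min_coverage_dna:
        return False
    return True


site = EditingSite(alt_reads=5, rna_coverage=20, dna_coverage=0)
print(passes_filters(site))                       # fails the WGS check
print(passes_filters(site, min_coverage_dna=-1))  # WGS check skipped
```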

--source str

Variant source (e.g. gSNP, sSNV, Fusion)

--skip-failed

When set, the failed records will be skipped.
Default: False

-a, --annotation-gtf <file> Path

Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.

--reference-source str

Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']

--codon-table str

Codon table used for translation. The supported codon tables are listed under Choices below.
Default: Standard
Choices: {'Thraustochytrium Mitochondrial', 'SGC0', 'Gracilibacteria', 'Plant Plastid', 'SGC8', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5', 'SGC9', 'Blepharisma Macronuclear', 'Bacterial', 'SGC2', 'Yeast Mitochondrial', 'Pterobranchia Mitochondrial', 'Hexamita Nuclear', 'Echinoderm Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4', 'Peritrich Nuclear', 'Trematode Mitochondrial', 'Archaeal', 'Spiroplasma', 'Alternative Flatworm Mitochondrial', 'Mesodinium Nuclear', 'SGC1', 'Blastocrithidia Nuclear', 'Mold Mitochondrial', 'Alternative Yeast Nuclear', 'Standard', 'Flatworm Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate Mitochondrial', 'Karyorelict Nuclear', 'Balanophoraceae Plastid', 'Cephalodiscidae Mitochondrial', 'Candidate Division SR1', 'Ciliate Nuclear', 'Mycoplasma'}

--chr-codon-table str

Chromosome-specific codon table. Must be specified in the format "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table used to translate genes on chrM. The supported codon tables are the same as the choices for --codon-table. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
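
To make the "chrM:SGC1" format concrete, here is a minimal sketch of how such values could be split into a chromosome-to-table mapping, with a fallback to the global codon table. The function names are hypothetical and this is not moPepGen's own parsing code; it also does not validate the table name against the supported list.

```python
def parse_chr_codon_tables(specs):
    """Turn ["chrM:SGC1", ...] into {"chrM": "SGC1", ...}."""
    tables = {}
    for spec in specs:
        # Split on the first colon: chromosome name, then codon table name.
        chrom, sep, table = spec.partition(":")
        if not sep or not chrom or not table:
            raise ValueError(f"expected 'chromosome:table', got {spec!r}")
        tables[chrom] = table
    return tables


def codon_table_for(chrom, chr_tables, default="Standard"):
    """Pick the chromosome-specific table, else the global default."""
    return chr_tables.get(chrom, default)


chr_tables = parse_chr_codon_tables(["chrM:SGC1"])
print(codon_table_for("chrM", chr_tables))
print(codon_table_for("chr1", chr_tables))
```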

--start-codons str

Default start codon(s) to use for novel ORF translation.
Default: ['ATG']

--chr-start-codons str

Chromosome-specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and the start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
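
The resolution order described above can be sketched as follows. This is an assumption-laden illustration rather than moPepGen's code: the mitochondrial chromosome name is passed in explicitly instead of being inferred, an explicit --chr-start-codons entry is assumed to take precedence over the mitochondrial default, and both function names are hypothetical.

```python
# Mitochondrial start codons listed in the option description above.
MITO_START_CODONS = ["ATG", "ATA", "ATT", "ATC", "GTG"]


def parse_chr_start_codons(specs):
    """Turn ["chrM:ATG,ATA,ATT", ...] into {"chrM": ["ATG", "ATA", "ATT"]}."""
    parsed = {}
    for spec in specs:
        chrom, sep, codons = spec.partition(":")
        codon_list = [c.strip().upper() for c in codons.split(",") if c.strip()]
        if not sep or not chrom or not codon_list:
            raise ValueError(f"expected 'chromosome:CODON,CODON', got {spec!r}")
        parsed[chrom] = codon_list
    return parsed


def start_codons_for(chrom, chr_start_codons, default=("ATG",), mito_chrom="chrM"):
    """Resolve start codons: explicit entry, then mitochondrial default, then global default."""
    if chrom in chr_start_codons:
        return chr_start_codons[chrom]
    if chrom == mito_chrom:
        return list(MITO_START_CODONS)
    return list(default)


overrides = parse_chr_start_codons(["chrM:ATG,ATA,ATT"])
print(start_codons_for("chrM", overrides))
print(start_codons_for("chrM", {}))
print(start_codons_for("chr1", overrides))
```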

--index-dir <file> Path

Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.

--debug-level <value|number> str

Debug level.
Default: INFO

-q, --quiet

Quiet
Default: False