parseRMATS
parseRMATS takes the alternative splicing events called by
rMATS and converts them into a GVF file.
All five rMATS alternative splicing event types are supported: skipped exons (SE),
alternative 5' splice sites (A5SS), alternative 3' splice sites (A3SS), mutually exclusive exons (MXE), and
retained introns (RI). Both the JC and the JCEC tab-separated output files are supported.
The resulting GVF file can then be used to call variant peptides with
callVariant.
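rMATS writes one junction count table per event type, each in a JC and a JCEC variant. As a rough sketch (not part of moPepGen), the Python helper below collects the table for each event type from an rMATS output directory and assembles the matching parseRMATS argument list. The helper name `build_parse_rmats_args`, its parameters, and the file-matching pattern are assumptions for illustration; only the flags themselves (`--se`, `--a5ss`, `--a3ss`, `--mxe`, `--ri`, `-o`, `--source`, `--index-dir`) come from the parseRMATS usage.

```python
from pathlib import Path
from typing import List, Optional

# Event types accepted by parseRMATS, mapped to their command-line flags.
EVENT_FLAGS = {
    "SE": "--se",
    "A5SS": "--a5ss",
    "A3SS": "--a3ss",
    "MXE": "--mxe",
    "RI": "--ri",
}


def build_parse_rmats_args(
    rmats_dir: Path,
    output_path: Path,
    source: str,
    index_dir: Optional[Path] = None,
    counts: str = "JC",
) -> List[str]:
    """Assemble an argument list for `moPepGen parseRMATS` (hypothetical helper)."""
    if counts not in ("JC", "JCEC"):
        raise ValueError("counts must be 'JC' or 'JCEC'")
    args = ["moPepGen", "parseRMATS"]
    for event, flag in EVENT_FLAGS.items():
        # Matches 'SE.MATS.JC.txt' as well as prefixed names such as 'x_SE.MATS.JC.txt'.
        matches = sorted(rmats_dir.glob(f"*{event}.MATS.{counts}.txt"))
        if len(matches) > 1:
            raise ValueError(f"multiple {event} files found: {matches}")
        if matches:
            args += [flag, str(matches[0])]
    if len(args) == 2:
        raise FileNotFoundError(f"no rMATS {counts} files found in {rmats_dir}")
    args += ["-o", str(output_path), "--source", source]
    if index_dir is not None:
        args += ["--index-dir", str(index_dir)]
    return args
```

The returned list could be passed to `subprocess.run(args, check=True)`. Event types without a matching table are simply left out, since each event flag is optional. The `--source` value is a user-chosen label; any example value (such as `AltSplice`) is illustrative only.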
Reference Version
The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.
Usage
usage: moPepGen parseRMATS [-h] [--se <file>] [--a5ss <file>] [--a3ss <file>]
[--mxe <file>] [--ri <file>] [--min-ijc MIN_IJC]
[--min-sjc MIN_SJC] -o <file> --source SOURCE
[-g <file>] [-a <file>]
[--reference-source {GENCODE,ENSEMBL}]
[--codon-table {Standard,Alternative Flatworm Mitochondrial,Trematode Mitochondrial,SGC8,SGC3,Protozoan Mitochondrial,Ciliate Nuclear,Gracilibacteria,Spiroplasma,Dasycladacean Nuclear,Invertebrate Mitochondrial,Balanophoraceae Plastid,Peritrich Nuclear,Mesodinium Nuclear,SGC5,Candidate Division SR1,Blastocrithidia Nuclear,SGC1,Bacterial,Alternative Yeast Nuclear,Yeast Mitochondrial,Scenedesmus obliquus Mitochondrial,Plant Plastid,Flatworm Mitochondrial,SGC2,Archaeal,Mycoplasma,Euplotid Nuclear,SGC9,Mold Mitochondrial,Thraustochytrium Mitochondrial,Hexamita Nuclear,Coelenterate Mitochondrial,Chlorophycean Mitochondrial,Pachysolen tannophilus Nuclear,Ascidian Mitochondrial,SGC0,Blepharisma Macronuclear,Karyorelict Nuclear,SGC4,Echinoderm Mitochondrial,Condylostoma Nuclear,Vertebrate Mitochondrial,Pterobranchia Mitochondrial,Cephalodiscidae Mitochondrial}]
[--chr-codon-table [CHR_CODON_TABLE ...]]
[--start-codons [START_CODONS ...]]
[--chr-start-codons [CHR_START_CODONS ...]]
[--index-dir [<file>]]
[--debug-level <value|number>] [-q]
Parse the rMATS result to GVF format of variant records for moPepGen to call
variant peptides.
options:
-h, --help show this help message and exit
--se <file> File path to the SE (skipped exons) junction count
file output by rMATS. The file name should look like
'*_SE.MATS.JC.txt' or '*_SE.MATS.JCEC.txt'. Valid
formats: ['.tsv', '.txt'] (default: None)
--a5ss <file> File path to the A5SS (alternative 5' splice site)
junction count file output by rMATS. The file name
should look like '*_A5SS.MATS.JC.txt' or
'*_A5SS.MATS.JCEC.txt'. Valid formats: ['.tsv',
'.txt'] (default: None)
--a3ss <file> File path to the A3SS (alternative 3' splice site)
junction count file output by rMATS. The file name
should look like '*_A3SS.MATS.JC.txt' or
'*_A3SS.MATS.JCEC.txt'. Valid formats: ['.tsv',
'.txt'] (default: None)
--mxe <file> File path to the MXE (mutually exclusive exons)
junction count file output by rMATS. The file name
should look like '*_MXE.MATS.JC.txt' or
'*_MXE.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
(default: None)
--ri <file> File path to the RI (retained intron) junction count
file output by rMATS. The file name should look like
'*_RI.MATS.JC.txt' or '*_RI.MATS.JCEC.txt'. Valid
formats: ['.tsv', '.txt'] (default: None)
--min-ijc MIN_IJC Minimum junction read count for the inclusion version
to be analyzed. (default: 1)
--min-sjc MIN_SJC Minimum junction read count for the skipped version to
be analyzed. (default: 1)
-o <file>, --output-path <file>
File path to the output file. Valid formats: ['.gvf']
(default: None)
--source SOURCE Variant source (e.g. gSNP, sSNV, Fusion) (default:
None)
--debug-level <value|number>
Debug level. (default: INFO)
-q, --quiet Quiet (default: False)
Reference Files:
-g <file>, --genome-fasta <file>
Path to the genome assembly FASTA file. Only ENSEMBL
and GENCODE are supported. Its version must be the
same as the annotation GTF and proteome FASTA
(default: None)
-a <file>, --annotation-gtf <file>
Path to the annotation GTF file. Only ENSEMBL and
GENCODE are supported. Its version must be the same as
the genome and proteome FASTA. (default: None)
--reference-source {GENCODE,ENSEMBL}
Source of reference genome and annotation. (default:
None)
--codon-table {Standard,Alternative Flatworm Mitochondrial,Trematode Mitochondrial,SGC8,SGC3,Protozoan Mitochondrial,Ciliate Nuclear,Gracilibacteria,Spiroplasma,Dasycladacean Nuclear,Invertebrate Mitochondrial,Balanophoraceae Plastid,Peritrich Nuclear,Mesodinium Nuclear,SGC5,Candidate Division SR1,Blastocrithidia Nuclear,SGC1,Bacterial,Alternative Yeast Nuclear,Yeast Mitochondrial,Scenedesmus obliquus Mitochondrial,Plant Plastid,Flatworm Mitochondrial,SGC2,Archaeal,Mycoplasma,Euplotid Nuclear,SGC9,Mold Mitochondrial,Thraustochytrium Mitochondrial,Hexamita Nuclear,Coelenterate Mitochondrial,Chlorophycean Mitochondrial,Pachysolen tannophilus Nuclear,Ascidian Mitochondrial,SGC0,Blepharisma Macronuclear,Karyorelict Nuclear,SGC4,Echinoderm Mitochondrial,Condylostoma Nuclear,Vertebrate Mitochondrial,Pterobranchia Mitochondrial,Cephalodiscidae Mitochondrial}
Codon table. Defaults to "Standard". Supported codon
tables: {'Standard', 'Alternative Flatworm
Mitochondrial', 'Trematode Mitochondrial', 'SGC8',
'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear',
'Gracilibacteria', 'Spiroplasma', 'Dasycladacean
Nuclear', 'Invertebrate Mitochondrial',
'Balanophoraceae Plastid', 'Peritrich Nuclear',
'Mesodinium Nuclear', 'SGC5', 'Candidate Division
SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial',
'Alternative Yeast Nuclear', 'Yeast Mitochondrial',
'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
'Flatworm Mitochondrial', 'SGC2', 'Archaeal',
'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold
Mitochondrial', 'Thraustochytrium Mitochondrial',
'Hexamita Nuclear', 'Coelenterate Mitochondrial',
'Chlorophycean Mitochondrial', 'Pachysolen tannophilus
Nuclear', 'Ascidian Mitochondrial', 'SGC0',
'Blepharisma Macronuclear', 'Karyorelict Nuclear',
'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma
Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia
Mitochondrial', 'Cephalodiscidae Mitochondrial'}
(default: Standard)
--chr-codon-table [CHR_CODON_TABLE ...]
Chromosome specific codon table. Must be specified in
the format of "chrM:SGC1", where "chrM" is the
chromosome name and "SGC1" is the codon table to use
to translate genes on chrM. Supported codon tables:
{'Standard', 'Alternative Flatworm Mitochondrial',
'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan
Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria',
'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate
Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich
Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate
Division SR1', 'Blastocrithidia Nuclear', 'SGC1',
'Bacterial', 'Alternative Yeast Nuclear', 'Yeast
Mitochondrial', 'Scenedesmus obliquus Mitochondrial',
'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2',
'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9',
'Mold Mitochondrial', 'Thraustochytrium
Mitochondrial', 'Hexamita Nuclear', 'Coelenterate
Mitochondrial', 'Chlorophycean Mitochondrial',
'Pachysolen tannophilus Nuclear', 'Ascidian
Mitochondrial', 'SGC0', 'Blepharisma Macronuclear',
'Karyorelict Nuclear', 'SGC4', 'Echinoderm
Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate
Mitochondrial', 'Pterobranchia Mitochondrial',
'Cephalodiscidae Mitochondrial'}. By default, "SGC1"
is assigned to mitochondrial chromosomes. (default:
[])
--start-codons [START_CODONS ...]
Default start codon(s) to use for novel ORF
translation. Defaults to ["ATG"]. (default: ['ATG'])
--chr-start-codons [CHR_START_CODONS ...]
Chromosome-specific start codon(s). For example,
"chrM:ATG,ATA,ATT". By default, the mitochondrial
chromosome name is automatically inferred and start
codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
assigned to it. (default: [])
--index-dir [<file>] Path to the directory of index files generated by
moPepGen generateIndex. If given, --genome-fasta,
--proteome-fasta and --annotation-gtf will be
ignored. (default: None)
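The `--min-ijc` and `--min-sjc` thresholds apply to the inclusion and skipped junction counts that rMATS reports for each event. In rMATS JC/JCEC tables these counts live in columns such as `IJC_SAMPLE_1` and `SJC_SAMPLE_1`, with replicate counts separated by commas. The sketch below shows one plausible way to apply the thresholds. It assumes counts are summed over the replicates of sample 1, which is an illustrative choice and not necessarily how moPepGen aggregates them; `total_count` and `classify_events` are hypothetical names.

```python
import csv
from typing import Iterator, TextIO, Tuple


def total_count(field: str) -> int:
    """Sum comma-separated replicate counts, e.g. '3,5' -> 8; empty -> 0."""
    return sum(int(x) for x in field.split(",") if x.strip() not in ("", "NA"))


def classify_events(
    handle: TextIO, min_ijc: int = 1, min_sjc: int = 1
) -> Iterator[Tuple[str, bool, bool]]:
    """Yield (event ID, inclusion passes, skipped passes) for each event row.

    Assumption: counts are summed over the replicates of sample 1 only.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        ijc = total_count(row["IJC_SAMPLE_1"])
        sjc = total_count(row["SJC_SAMPLE_1"])
        yield row["ID"], ijc >= min_ijc, sjc >= min_sjc
```

With the defaults of 1, an event form with no supporting junction reads fails its check, so only forms backed by at least one read would be considered.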
Arguments
-h, --help
show this help message and exit
--se <file> Path
File path to the SE (skipped exons) junction count file output by rMATS. The file name should look like '*_SE.MATS.JC.txt' or '*_SE.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
--a5ss <file> Path
File path to the A5SS (alternative 5' splice site) junction count file output by rMATS. The file name should look like '*_A5SS.MATS.JC.txt' or '*_A5SS.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
--a3ss <file> Path
File path to the A3SS (alternative 3' splice site) junction count file output by rMATS. The file name should look like '*_A3SS.MATS.JC.txt' or '*_A3SS.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
--mxe <file> Path
File path to the MXE (mutually exclusive exons) junction count file output by rMATS. The file name should look like '*_MXE.MATS.JC.txt' or '*_MXE.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
--ri <file> Path
File path to the RI (retained intron) junction count file output by rMATS. The file name should look like '*_RI.MATS.JC.txt' or '*_RI.MATS.JCEC.txt'. Valid formats: ['.tsv', '.txt']
--min-ijc int
Minimum junction read count for the inclusion version to be analyzed.
Default: 1
--min-sjc int
Minimum junction read count for the skipped version to be analyzed.
Default: 1
-o, --output-path <file> Path
File path to the output file. Valid formats: ['.gvf']
--source str
Variant source (e.g. gSNP, sSNV, Fusion)
-g, --genome-fasta <file> Path
Path to the genome assembly FASTA file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the annotation GTF and proteome FASTA.
-a, --annotation-gtf <file> Path
Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.
--reference-source str
Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']
--codon-table str
Codon table. Defaults to "Standard". Supported codon tables: {'Standard', 'Alternative Flatworm Mitochondrial', 'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria', 'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate Division SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial', 'Alternative Yeast Nuclear', 'Yeast Mitochondrial', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2', 'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold Mitochondrial', 'Thraustochytrium Mitochondrial', 'Hexamita Nuclear', 'Coelenterate Mitochondrial', 'Chlorophycean Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Ascidian Mitochondrial', 'SGC0', 'Blepharisma Macronuclear', 'Karyorelict Nuclear', 'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia Mitochondrial', 'Cephalodiscidae Mitochondrial'}
Default: Standard
Choices: {'Standard', 'Alternative Flatworm Mitochondrial', 'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria', 'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate Division SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial', 'Alternative Yeast Nuclear', 'Yeast Mitochondrial', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2', 'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold Mitochondrial', 'Thraustochytrium Mitochondrial', 'Hexamita Nuclear', 'Coelenterate Mitochondrial', 'Chlorophycean Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Ascidian Mitochondrial', 'SGC0', 'Blepharisma Macronuclear', 'Karyorelict Nuclear', 'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia Mitochondrial', 'Cephalodiscidae Mitochondrial'}
--chr-codon-table str
Chromosome specific codon table. Must be specified in the format of "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table to use to translate genes on chrM. Supported codon tables: {'Standard', 'Alternative Flatworm Mitochondrial', 'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria', 'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate Division SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial', 'Alternative Yeast Nuclear', 'Yeast Mitochondrial', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2', 'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold Mitochondrial', 'Thraustochytrium Mitochondrial', 'Hexamita Nuclear', 'Coelenterate Mitochondrial', 'Chlorophycean Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Ascidian Mitochondrial', 'SGC0', 'Blepharisma Macronuclear', 'Karyorelict Nuclear', 'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia Mitochondrial', 'Cephalodiscidae Mitochondrial'}. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
--start-codons str
Default start codon(s) to use for novel ORF translation. Defaults to ["ATG"].
Default: ['ATG']
--chr-start-codons str
Chromosome-specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
--index-dir <file> Path
Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
--debug-level <value|number> str
Debug level.
Default: INFO
-q, --quiet
Quiet
Default: False
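`--chr-codon-table` and `--chr-start-codons` take values of the form `CHROM:VALUE`, such as "chrM:SGC1" or "chrM:ATG,ATA,ATT". As a hypothetical sketch of that format (not moPepGen's own parser), the functions below split each value on its first colon; start codons are further split on commas and checked to be three-letter DNA codons. The names `parse_chr_codon_tables` and `parse_chr_start_codons` are illustrative only.

```python
from typing import Dict, List


def parse_chr_codon_tables(specs: List[str]) -> Dict[str, str]:
    """Parse values like 'chrM:SGC1' into {'chrM': 'SGC1'}."""
    tables: Dict[str, str] = {}
    for spec in specs:
        # Split on the first colon only; table names may contain spaces.
        chrom, sep, table = spec.partition(":")
        if not sep or not chrom or not table:
            raise ValueError(f"expected 'CHROM:TABLE', got {spec!r}")
        tables[chrom] = table
    return tables


def parse_chr_start_codons(specs: List[str]) -> Dict[str, List[str]]:
    """Parse values like 'chrM:ATG,ATA,ATT' into {'chrM': ['ATG', 'ATA', 'ATT']}."""
    codons: Dict[str, List[str]] = {}
    for spec in specs:
        chrom, sep, value = spec.partition(":")
        if not sep or not chrom or not value:
            raise ValueError(f"expected 'CHROM:CODON[,CODON...]', got {spec!r}")
        items = [c.strip().upper() for c in value.split(",")]
        # Each start codon must be exactly three nucleotides from A/C/G/T.
        bad = [c for c in items if len(c) != 3 or set(c) - set("ACGT")]
        if bad:
            raise ValueError(f"invalid codon(s) {bad} in {spec!r}")
        codons[chrom] = items
    return codons
```

Because some codon table names contain spaces (for example "Vertebrate Mitochondrial"), such values need to be quoted on the command line.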