parseCIRCexplorer
parseCIRCexplorer takes the circRNA results identified by CIRCexplorer and saves them as a GVF file. The GVF file can later be used to call variant peptides with callVariant. Note that only known circRNAs are supported (*_circular_known.txt).
Reference Version
The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.
Usage
usage: moPepGen parseCIRCexplorer [-h] -i <file> -o <file> [--circexplorer3]
[--min-read-number <number>]
[--min-fpb-circ <number>]
[--min-circ-score <number>]
[--intron-start-range <number>]
[--intron-end-range <number>]
[--skip-failed] --source SOURCE [-a <file>]
[--reference-source {GENCODE,ENSEMBL}]
[--codon-table {Alternative Yeast Nuclear,Protozoan Mitochondrial,Vertebrate Mitochondrial,Blepharisma Macronuclear,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Ciliate Nuclear,Mesodinium Nuclear,Balanophoraceae Plastid,SGC9,Cephalodiscidae Mitochondrial,Trematode Mitochondrial,Pachysolen tannophilus Nuclear,SGC2,Yeast Mitochondrial,SGC5,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Peritrich Nuclear,Archaeal,Coelenterate Mitochondrial,Bacterial,Mold Mitochondrial,SGC3,Hexamita Nuclear,Pterobranchia Mitochondrial,Plant Plastid,Condylostoma Nuclear,Blastocrithidia Nuclear,Gracilibacteria,Alternative Flatworm Mitochondrial,Echinoderm Mitochondrial,Invertebrate Mitochondrial,SGC0,Candidate Division SR1,Dasycladacean Nuclear,SGC4,Flatworm Mitochondrial,SGC8,Thraustochytrium Mitochondrial,SGC1,Spiroplasma,Mycoplasma,Standard,Karyorelict Nuclear}]
[--chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]]
[--start-codons [START_CODONS [START_CODONS ...]]]
[--chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]]
[--index-dir [<file>]]
[--debug-level <value|number>] [-q]
Parse CIRCexplorer result to a TSV format for moPepGen to call variant
peptides
optional arguments:
-h, --help show this help message and exit
-i <file>, --input-path <file>
File path to CIRCexplorer's TSV output. Valid formats:
['.tsv', '.txt'] (default: None)
-o <file>, --output-path <file>
File path to the output file. Valid formats: ['.gvf']
(default: None)
--circexplorer3 Using circRNA results from CIRCexplorer3 (default:
False)
--min-read-number <number>
Minimal number of junction read counts. (default: 1)
--min-fpb-circ <number>
Minimal FPBcirc value for CIRCexplorer3. Recommends
to 1, defaults to None (default: None)
--min-circ-score <number>
Minimal CIRCscore value for CIRCexplorer3. Recommends
to 1, defaults to None (default: None)
--intron-start-range <number>
The range of difference allowed between the intron
start and the reference position. (default: -2,0)
--intron-end-range <number>
The range of difference allowed between the intron end
and the reference position. (default: -100,5)
--skip-failed When set, the failed records will be skipped.
(default: False)
--source SOURCE Variant source (e.g. gSNP, sSNV, Fusion) (default:
None)
--debug-level <value|number>
Debug level. (default: INFO)
-q, --quiet Quiet (default: False)
Reference Files:
-a <file>, --annotation-gtf <file>
Path to the annotation GTF file. Only ENSEMBL and
GENCODE are supported. Its version must be the same as
the genome and proteome FASTA. (default: None)
--reference-source {GENCODE,ENSEMBL}
Source of reference genome and annotation. (default:
None)
--codon-table {Alternative Yeast Nuclear,Protozoan Mitochondrial,Vertebrate Mitochondrial,Blepharisma Macronuclear,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Ciliate Nuclear,Mesodinium Nuclear,Balanophoraceae Plastid,SGC9,Cephalodiscidae Mitochondrial,Trematode Mitochondrial,Pachysolen tannophilus Nuclear,SGC2,Yeast Mitochondrial,SGC5,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Peritrich Nuclear,Archaeal,Coelenterate Mitochondrial,Bacterial,Mold Mitochondrial,SGC3,Hexamita Nuclear,Pterobranchia Mitochondrial,Plant Plastid,Condylostoma Nuclear,Blastocrithidia Nuclear,Gracilibacteria,Alternative Flatworm Mitochondrial,Echinoderm Mitochondrial,Invertebrate Mitochondrial,SGC0,Candidate Division SR1,Dasycladacean Nuclear,SGC4,Flatworm Mitochondrial,SGC8,Thraustochytrium Mitochondrial,SGC1,Spiroplasma,Mycoplasma,Standard,Karyorelict Nuclear}
Codon table. Defaults to "Standard". Supported codon
tables: {'Alternative Yeast Nuclear', 'Protozoan
Mitochondrial', 'Vertebrate Mitochondrial',
'Blepharisma Macronuclear', 'Chlorophycean
Mitochondrial', 'Ascidian Mitochondrial', 'Ciliate
Nuclear', 'Mesodinium Nuclear', 'Balanophoraceae
Plastid', 'SGC9', 'Cephalodiscidae Mitochondrial',
'Trematode Mitochondrial', 'Pachysolen tannophilus
Nuclear', 'SGC2', 'Yeast Mitochondrial', 'SGC5',
'Euplotid Nuclear', 'Scenedesmus obliquus
Mitochondrial', 'Peritrich Nuclear', 'Archaeal',
'Coelenterate Mitochondrial', 'Bacterial', 'Mold
Mitochondrial', 'SGC3', 'Hexamita Nuclear',
'Pterobranchia Mitochondrial', 'Plant Plastid',
'Condylostoma Nuclear', 'Blastocrithidia Nuclear',
'Gracilibacteria', 'Alternative Flatworm
Mitochondrial', 'Echinoderm Mitochondrial',
'Invertebrate Mitochondrial', 'SGC0', 'Candidate
Division SR1', 'Dasycladacean Nuclear', 'SGC4',
'Flatworm Mitochondrial', 'SGC8', 'Thraustochytrium
Mitochondrial', 'SGC1', 'Spiroplasma', 'Mycoplasma',
'Standard', 'Karyorelict Nuclear'} (default: Standard)
--chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]
Chromosome specific codon table. Must be specified in
the format of "chrM:SGC1", where "chrM" is the
chromosome name and "SGC1" is the codon table to use
to translate genes on chrM. Supported codon tables:
{'Alternative Yeast Nuclear', 'Protozoan
Mitochondrial', 'Vertebrate Mitochondrial',
'Blepharisma Macronuclear', 'Chlorophycean
Mitochondrial', 'Ascidian Mitochondrial', 'Ciliate
Nuclear', 'Mesodinium Nuclear', 'Balanophoraceae
Plastid', 'SGC9', 'Cephalodiscidae Mitochondrial',
'Trematode Mitochondrial', 'Pachysolen tannophilus
Nuclear', 'SGC2', 'Yeast Mitochondrial', 'SGC5',
'Euplotid Nuclear', 'Scenedesmus obliquus
Mitochondrial', 'Peritrich Nuclear', 'Archaeal',
'Coelenterate Mitochondrial', 'Bacterial', 'Mold
Mitochondrial', 'SGC3', 'Hexamita Nuclear',
'Pterobranchia Mitochondrial', 'Plant Plastid',
'Condylostoma Nuclear', 'Blastocrithidia Nuclear',
'Gracilibacteria', 'Alternative Flatworm
Mitochondrial', 'Echinoderm Mitochondrial',
'Invertebrate Mitochondrial', 'SGC0', 'Candidate
Division SR1', 'Dasycladacean Nuclear', 'SGC4',
'Flatworm Mitochondrial', 'SGC8', 'Thraustochytrium
Mitochondrial', 'SGC1', 'Spiroplasma', 'Mycoplasma',
'Standard', 'Karyorelict Nuclear'}. By default, "SGC1"
is assigned to mitochondrial chromosomes. (default:
[])
--start-codons [START_CODONS [START_CODONS ...]]
Default start codon(s) to use for novel ORF
translation. Defaults to ["ATG"]. (default: ['ATG'])
--chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]
Chromosome specific start codon(s). For example,
"chrM:ATG,ATA,ATT". By default, the mitochondrial
chromosome name is automatically inferred and start
codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
assigned to it. (default: [])
--index-dir [<file>] Path to the directory of index files generated by
moPepGen generateIndex. If given, --genome-fasta,
--proteome-fasta and --annotation-gtf will be
ignored. (default: None)
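As an illustration of the usage above, the following is a minimal Python sketch that assembles a parseCIRCexplorer command line. It uses only flags that appear in the help text. The helper name `build_parse_circexplorer_cmd`, the file names, and the `circRNA` source label are hypothetical and chosen purely for the example.

```python
import shlex


def build_parse_circexplorer_cmd(input_path, output_path, source,
                                 index_dir=None, annotation_gtf=None,
                                 min_read_number=None, circexplorer3=False):
    """Assemble an argv list for `moPepGen parseCIRCexplorer`.

    Hypothetical helper: it only strings together flags shown in the
    help text and does not validate them against moPepGen itself.
    """
    cmd = [
        "moPepGen", "parseCIRCexplorer",
        "-i", str(input_path),
        "-o", str(output_path),
        "--source", source,
    ]
    # Per the help text, --index-dir takes precedence over the
    # individual reference files, so only one of the two is passed.
    if index_dir is not None:
        cmd += ["--index-dir", str(index_dir)]
    elif annotation_gtf is not None:
        cmd += ["--annotation-gtf", str(annotation_gtf)]
    if min_read_number is not None:
        cmd += ["--min-read-number", str(min_read_number)]
    if circexplorer3:
        cmd.append("--circexplorer3")
    return cmd


# All paths and the source label below are placeholders.
cmd = build_parse_circexplorer_cmd(
    "sample_circular_known.txt",
    "sample_circRNA.gvf",
    source="circRNA",
    index_dir="index",
    min_read_number=2,
)
print(shlex.join(cmd))
```

With moPepGen installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`.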
Arguments
-h, --help
show this help message and exit
-i, --input-path <file> Path
File path to CIRCexplorer's TSV output. Valid formats: ['.tsv', '.txt']
-o, --output-path <file> Path
File path to the output file. Valid formats: ['.gvf']
--circexplorer3
Use circRNA results from CIRCexplorer3.
Default: False
--min-read-number <number> int
Minimal number of junction read counts.
Default: 1
--min-fpb-circ <number> float
Minimal FPBcirc value for CIRCexplorer3. A value of 1 is recommended. Defaults to None.
--min-circ-score <number> float
Minimal CIRCscore value for CIRCexplorer3. A value of 1 is recommended. Defaults to None.
--intron-start-range <number> str
The range of difference allowed between the intron start and the reference position.
Default: -2,0
--intron-end-range <number> str
The range of difference allowed between the intron end and the reference position.
Default: -100,5
--skip-failed
When set, the failed records will be skipped.
Default: False
--source str
Variant source (e.g. gSNP, sSNV, Fusion)
-a, --annotation-gtf <file> Path
Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.
--reference-source str
Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']
--codon-table str
Codon table. Defaults to "Standard". The supported codon tables are listed under Choices.
Default: Standard
Choices: {'Alternative Yeast Nuclear', 'Protozoan Mitochondrial', 'Vertebrate Mitochondrial', 'Blepharisma Macronuclear', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Ciliate Nuclear', 'Mesodinium Nuclear', 'Balanophoraceae Plastid', 'SGC9', 'Cephalodiscidae Mitochondrial', 'Trematode Mitochondrial', 'Pachysolen tannophilus Nuclear', 'SGC2', 'Yeast Mitochondrial', 'SGC5', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Peritrich Nuclear', 'Archaeal', 'Coelenterate Mitochondrial', 'Bacterial', 'Mold Mitochondrial', 'SGC3', 'Hexamita Nuclear', 'Pterobranchia Mitochondrial', 'Plant Plastid', 'Condylostoma Nuclear', 'Blastocrithidia Nuclear', 'Gracilibacteria', 'Alternative Flatworm Mitochondrial', 'Echinoderm Mitochondrial', 'Invertebrate Mitochondrial', 'SGC0', 'Candidate Division SR1', 'Dasycladacean Nuclear', 'SGC4', 'Flatworm Mitochondrial', 'SGC8', 'Thraustochytrium Mitochondrial', 'SGC1', 'Spiroplasma', 'Mycoplasma', 'Standard', 'Karyorelict Nuclear'}
--chr-codon-table str
Chromosome specific codon table. Must be specified in the format of "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table to use to translate genes on chrM. Supported codon tables: {'Alternative Yeast Nuclear', 'Protozoan Mitochondrial', 'Vertebrate Mitochondrial', 'Blepharisma Macronuclear', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Ciliate Nuclear', 'Mesodinium Nuclear', 'Balanophoraceae Plastid', 'SGC9', 'Cephalodiscidae Mitochondrial', 'Trematode Mitochondrial', 'Pachysolen tannophilus Nuclear', 'SGC2', 'Yeast Mitochondrial', 'SGC5', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Peritrich Nuclear', 'Archaeal', 'Coelenterate Mitochondrial', 'Bacterial', 'Mold Mitochondrial', 'SGC3', 'Hexamita Nuclear', 'Pterobranchia Mitochondrial', 'Plant Plastid', 'Condylostoma Nuclear', 'Blastocrithidia Nuclear', 'Gracilibacteria', 'Alternative Flatworm Mitochondrial', 'Echinoderm Mitochondrial', 'Invertebrate Mitochondrial', 'SGC0', 'Candidate Division SR1', 'Dasycladacean Nuclear', 'SGC4', 'Flatworm Mitochondrial', 'SGC8', 'Thraustochytrium Mitochondrial', 'SGC1', 'Spiroplasma', 'Mycoplasma', 'Standard', 'Karyorelict Nuclear'}. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
--start-codons str
Default start codon(s) to use for novel ORF translation. Defaults to ["ATG"].
Default: ['ATG']
--chr-start-codons str
Chromosome-specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and the start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
--index-dir <file> Path
Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
--debug-level <value|number> str
Debug level.
Default: INFO
-q, --quiet
Quiet
Default: False
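Several of the arguments above take structured string values: the intron ranges ("-2,0", "-100,5") and the chromosome-specific settings ("chrM:SGC1", "chrM:ATG,ATA,ATT"). The sketch below shows one way such values could be interpreted. It is not moPepGen's own parsing code: the function names are hypothetical, and it assumes that the range bounds are inclusive and that the difference is computed as observed position minus reference position.

```python
def parse_range(text):
    """Parse a "low,high" string such as "-2,0" into an (int, int) tuple."""
    low_text, high_text = text.split(",")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError(f"lower bound exceeds upper bound: {text!r}")
    return low, high


def within_range(observed, reference, allowed):
    """Check whether observed - reference falls inside the allowed range.

    Assumption: both bounds are inclusive and the sign convention is
    observed minus reference.
    """
    low, high = allowed
    return low <= observed - reference <= high


def parse_chr_spec(text):
    """Split "chrM:ATG,ATA,ATT" into ("chrM", ["ATG", "ATA", "ATT"])."""
    chrom, sep, value = text.partition(":")
    if not sep or not chrom or not value:
        raise ValueError(f"expected '<chromosome>:<value>', got {text!r}")
    return chrom, value.split(",")


# The default ranges from the argument list above.
start_range = parse_range("-2,0")
end_range = parse_range("-100,5")

# An intron start 2 bp upstream of the reference is accepted,
# while one 1 bp downstream is not.
print(within_range(998, 1000, start_range))
print(within_range(1001, 1000, start_range))

# Chromosome-specific codon table and start codons.
print(parse_chr_spec("chrM:SGC1"))
print(parse_chr_spec("chrM:ATG,ATA,ATT"))
```

Keeping the parsing separate from the range check makes it easier to validate user input once, up front, before any circRNA records are processed.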