parseCIRCexplorer

parseCIRCexplorer takes the circRNA results identified by CIRCexplorer and saves them as a GVF file. The GVF file can later be used to call variant peptides with callVariant. Note that only known circRNAs are supported (*_circular_known.txt).

Reference Version

The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.

Usage

usage: moPepGen parseCIRCexplorer [-h] -i <file> -o <file> [--circexplorer3]
                                  [--min-read-number <number>]
                                  [--min-fpb-circ <number>]
                                  [--min-circ-score <number>]
                                  [--intron-start-range <number>]
                                  [--intron-end-range <number>]
                                  [--skip-failed] --source SOURCE [-a <file>]
                                  [--reference-source {GENCODE,ENSEMBL}]
                                  [--codon-table {SGC5,Hexamita Nuclear,Euplotid Nuclear,Ascidian Mitochondrial,Thraustochytrium Mitochondrial,Standard,Gracilibacteria,SGC8,Condylostoma Nuclear,Flatworm Mitochondrial,SGC1,Protozoan Mitochondrial,SGC3,Pachysolen tannophilus Nuclear,Archaeal,Chlorophycean Mitochondrial,Pterobranchia Mitochondrial,Echinoderm Mitochondrial,Blastocrithidia Nuclear,Invertebrate Mitochondrial,SGC0,Mycoplasma,Yeast Mitochondrial,Mesodinium Nuclear,Vertebrate Mitochondrial,Bacterial,Trematode Mitochondrial,SGC4,Balanophoraceae Plastid,Blepharisma Macronuclear,Mold Mitochondrial,Karyorelict Nuclear,Spiroplasma,SGC9,Scenedesmus obliquus Mitochondrial,Plant Plastid,Coelenterate Mitochondrial,Alternative Yeast Nuclear,Dasycladacean Nuclear,Candidate Division SR1,SGC2,Cephalodiscidae Mitochondrial,Peritrich Nuclear,Ciliate Nuclear,Alternative Flatworm Mitochondrial}]
                                  [--chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]]
                                  [--start-codons [START_CODONS [START_CODONS ...]]]
                                  [--chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]]
                                  [--index-dir [<file>]]
                                  [--debug-level <value|number>] [-q]

Parse CIRCexplorer results into GVF format for moPepGen to call variant
peptides

optional arguments:
  -h, --help            show this help message and exit
  -i <file>, --input-path <file>
                        File path to CIRCexplorer's TSV output. Valid formats:
                        ['.tsv', '.txt'] (default: None)
  -o <file>, --output-path <file>
                        File path to the output file. Valid formats: ['.gvf']
                        (default: None)
  --circexplorer3       Use circRNA results from CIRCexplorer3 (default:
                        False)
  --min-read-number <number>
                        Minimal number of junction read counts. (default: 1)
  --min-fpb-circ <number>
                        Minimal FPBcirc value for CIRCexplorer3. Recommended
                        value is 1. (default: None)
  --min-circ-score <number>
                        Minimal CIRCscore value for CIRCexplorer3. Recommended
                        value is 1. (default: None)
  --intron-start-range <number>
                        The range of difference allowed between the intron
                        start and the reference position. (default: -2,0)
  --intron-end-range <number>
                        The range of difference allowed between the intron end
                        and the reference position. (default: -100,5)
  --skip-failed         When set, the failed records will be skipped.
                        (default: False)
  --source SOURCE       Variant source (e.g. gSNP, sSNV, Fusion) (default:
                        None)
  --debug-level <value|number>
                        Debug level. (default: INFO)
  -q, --quiet           Quiet (default: False)

Reference Files:
  -a <file>, --annotation-gtf <file>
                        Path to the annotation GTF file. Only ENSEMBL and
                        GENCODE are supported. Its version must be the same as
                        the genome and proteome FASTA. (default: None)
  --reference-source {GENCODE,ENSEMBL}
                        Source of reference genome and annotation. (default:
                        None)
  --codon-table {SGC5,Hexamita Nuclear,Euplotid Nuclear,Ascidian Mitochondrial,Thraustochytrium Mitochondrial,Standard,Gracilibacteria,SGC8,Condylostoma Nuclear,Flatworm Mitochondrial,SGC1,Protozoan Mitochondrial,SGC3,Pachysolen tannophilus Nuclear,Archaeal,Chlorophycean Mitochondrial,Pterobranchia Mitochondrial,Echinoderm Mitochondrial,Blastocrithidia Nuclear,Invertebrate Mitochondrial,SGC0,Mycoplasma,Yeast Mitochondrial,Mesodinium Nuclear,Vertebrate Mitochondrial,Bacterial,Trematode Mitochondrial,SGC4,Balanophoraceae Plastid,Blepharisma Macronuclear,Mold Mitochondrial,Karyorelict Nuclear,Spiroplasma,SGC9,Scenedesmus obliquus Mitochondrial,Plant Plastid,Coelenterate Mitochondrial,Alternative Yeast Nuclear,Dasycladacean Nuclear,Candidate Division SR1,SGC2,Cephalodiscidae Mitochondrial,Peritrich Nuclear,Ciliate Nuclear,Alternative Flatworm Mitochondrial}
                        Codon table. Defaults to "Standard". Supported codon
                        tables: {'SGC5', 'Hexamita Nuclear', 'Euplotid
                        Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium
                        Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8',
                        'Condylostoma Nuclear', 'Flatworm Mitochondrial',
                        'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen
                        tannophilus Nuclear', 'Archaeal', 'Chlorophycean
                        Mitochondrial', 'Pterobranchia Mitochondrial',
                        'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear',
                        'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma',
                        'Yeast Mitochondrial', 'Mesodinium Nuclear',
                        'Vertebrate Mitochondrial', 'Bacterial', 'Trematode
                        Mitochondrial', 'SGC4', 'Balanophoraceae Plastid',
                        'Blepharisma Macronuclear', 'Mold Mitochondrial',
                        'Karyorelict Nuclear', 'Spiroplasma', 'SGC9',
                        'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
                        'Coelenterate Mitochondrial', 'Alternative Yeast
                        Nuclear', 'Dasycladacean Nuclear', 'Candidate Division
                        SR1', 'SGC2', 'Cephalodiscidae Mitochondrial',
                        'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative
                        Flatworm Mitochondrial'} (default: Standard)
  --chr-codon-table [CHR_CODON_TABLE [CHR_CODON_TABLE ...]]
                        Chromosome specific codon table. Must be specified in
                        the format of "chrM:SGC1", where "chrM" is the
                        chromosome name and "SGC1" is the codon table to use
                        to translate genes on chrM. Supported codon tables:
                        {'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear',
                        'Ascidian Mitochondrial', 'Thraustochytrium
                        Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8',
                        'Condylostoma Nuclear', 'Flatworm Mitochondrial',
                        'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen
                        tannophilus Nuclear', 'Archaeal', 'Chlorophycean
                        Mitochondrial', 'Pterobranchia Mitochondrial',
                        'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear',
                        'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma',
                        'Yeast Mitochondrial', 'Mesodinium Nuclear',
                        'Vertebrate Mitochondrial', 'Bacterial', 'Trematode
                        Mitochondrial', 'SGC4', 'Balanophoraceae Plastid',
                        'Blepharisma Macronuclear', 'Mold Mitochondrial',
                        'Karyorelict Nuclear', 'Spiroplasma', 'SGC9',
                        'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
                        'Coelenterate Mitochondrial', 'Alternative Yeast
                        Nuclear', 'Dasycladacean Nuclear', 'Candidate Division
                        SR1', 'SGC2', 'Cephalodiscidae Mitochondrial',
                        'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative
                        Flatworm Mitochondrial'}. By default, "SGC1" is
                        assigned to mitochondrial chromosomes. (default: [])
  --start-codons [START_CODONS [START_CODONS ...]]
                        Default start codon(s) to use for novel ORF
                        translation. Defaults to ["ATG"]. (default: ['ATG'])
  --chr-start-codons [CHR_START_CODONS [CHR_START_CODONS ...]]
                        Chromosome specific start codon(s). For example,
                        "chrM:ATG,ATA,ATT". By default, mitochondrial
                        chromosome name is automatically inferred and start
                        codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
                        assigned to it. (default: [])
  --index-dir [<file>]  Path to the directory of index files generated by
                        moPepGen generateIndex. If given, --genome-fasta,
                        --proteome-fasta and --annotation-gtf will be
                        ignored. (default: None)
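
As an illustration of the usage text above, the following sketch assembles a parseCIRCexplorer command line from Python. Only flags listed in the help output are used; the helper name `build_parse_circexplorer_cmd`, the file names, the index directory, and the source label are hypothetical placeholders, not values prescribed by moPepGen.

```python
from typing import List, Optional


def build_parse_circexplorer_cmd(
    input_path: str,
    output_path: str,
    source: str,
    index_dir: str,
    circexplorer3: bool = False,
    min_read_number: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list for `moPepGen parseCIRCexplorer`.

    Hypothetical helper for illustration; it only builds the list and
    does not run the command.
    """
    cmd = [
        "moPepGen", "parseCIRCexplorer",
        "-i", input_path,
        "-o", output_path,
        "--source", source,
        "--index-dir", index_dir,
    ]
    if circexplorer3:
        # Flag for results produced by CIRCexplorer3.
        cmd.append("--circexplorer3")
    if min_read_number is not None:
        cmd += ["--min-read-number", str(min_read_number)]
    return cmd


cmd = build_parse_circexplorer_cmd(
    input_path="sample_circular_known.txt",  # hypothetical input file
    output_path="sample_circRNA.gvf",        # hypothetical output file
    source="circRNA",                        # hypothetical source label
    index_dir="index",                       # hypothetical index directory
    min_read_number=2,
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once moPepGen is installed and the paths point to real files.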

Arguments

-h, --help

show this help message and exit

-i, --input-path <file> Path

File path to CIRCexplorer's TSV output. Valid formats: ['.tsv', '.txt']

-o, --output-path <file> Path

File path to the output file. Valid formats: ['.gvf']

--circexplorer3

Use circRNA results from CIRCexplorer3
Default: False

--min-read-number <number> int

Minimal number of junction read counts.
Default: 1

--min-fpb-circ <number> float

Minimal FPBcirc value for CIRCexplorer3. The recommended value is 1; by default (None) no threshold is set.

--min-circ-score <number> float

Minimal CIRCscore value for CIRCexplorer3. The recommended value is 1; by default (None) no threshold is set.
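
The three thresholds (--min-read-number, --min-fpb-circ, --min-circ-score) can be pictured as a record filter. The sketch below is not moPepGen's implementation: `CircRecord` and its field names are hypothetical, and it assumes that each threshold is an inclusive lower bound and that a threshold left at None is not applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CircRecord:
    """Hypothetical stand-in for one CIRCexplorer row (not a moPepGen class)."""
    read_number: int
    fpb_circ: Optional[float] = None    # only reported by CIRCexplorer3
    circ_score: Optional[float] = None  # only reported by CIRCexplorer3


def passes_filters(
    record: CircRecord,
    min_read_number: int = 1,
    min_fpb_circ: Optional[float] = None,
    min_circ_score: Optional[float] = None,
) -> bool:
    """Return True if the record meets every threshold that is set."""
    if record.read_number < min_read_number:
        return False
    # A threshold of None means "do not filter on this value" (assumption).
    if min_fpb_circ is not None:
        if record.fpb_circ is None or record.fpb_circ < min_fpb_circ:
            return False
    if min_circ_score is not None:
        if record.circ_score is None or record.circ_score < min_circ_score:
            return False
    return True


records = [
    CircRecord(read_number=5, fpb_circ=2.0, circ_score=1.5),
    CircRecord(read_number=5, fpb_circ=0.4, circ_score=1.5),
    CircRecord(read_number=0, fpb_circ=2.0, circ_score=1.5),
]
kept = [r for r in records if passes_filters(r, min_fpb_circ=1, min_circ_score=1)]
```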

--intron-start-range <number> str

The range of difference allowed between the intron start and the reference position.
Default: -2,0

--intron-end-range <number> str

The range of difference allowed between the intron end and the reference position.
Default: -100,5
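
Both intron range options take a "lower,upper" string such as -2,0 or -100,5. The sketch below shows one way such a string could be parsed and applied. It is an illustration only: the function names are hypothetical, and it assumes the bounds are inclusive and that the difference is computed as observed position minus reference position, which may differ from what moPepGen does internally.

```python
from typing import Tuple


def parse_range(value: str) -> Tuple[int, int]:
    """Parse a 'lower,upper' string such as '-2,0' into a pair of ints."""
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"Expected 'lower,upper', got {value!r}")
    lower, upper = int(parts[0]), int(parts[1])
    if lower > upper:
        raise ValueError(f"Lower bound exceeds upper bound in {value!r}")
    return lower, upper


def within_range(observed: int, reference: int, allowed: Tuple[int, int]) -> bool:
    """Check whether observed - reference falls inside the inclusive range."""
    lower, upper = allowed
    return lower <= observed - reference <= upper


start_range = parse_range("-2,0")   # default of --intron-start-range
end_range = parse_range("-100,5")   # default of --intron-end-range

# An intron start 2 bases upstream of the reference position is accepted,
# while one 1 base downstream is not (under the assumptions stated above).
print(within_range(998, 1000, start_range))
print(within_range(1001, 1000, start_range))
```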

--skip-failed

When set, the failed records will be skipped.
Default: False

--source str

Variant source (e.g. gSNP, sSNV, Fusion)

-a, --annotation-gtf <file> Path

Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.

--reference-source str

Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']

--codon-table str

Codon table. Defaults to "Standard". Supported codon tables: {'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8', 'Condylostoma Nuclear', 'Flatworm Mitochondrial', 'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen tannophilus Nuclear', 'Archaeal', 'Chlorophycean Mitochondrial', 'Pterobranchia Mitochondrial', 'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear', 'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma', 'Yeast Mitochondrial', 'Mesodinium Nuclear', 'Vertebrate Mitochondrial', 'Bacterial', 'Trematode Mitochondrial', 'SGC4', 'Balanophoraceae Plastid', 'Blepharisma Macronuclear', 'Mold Mitochondrial', 'Karyorelict Nuclear', 'Spiroplasma', 'SGC9', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Coelenterate Mitochondrial', 'Alternative Yeast Nuclear', 'Dasycladacean Nuclear', 'Candidate Division SR1', 'SGC2', 'Cephalodiscidae Mitochondrial', 'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative Flatworm Mitochondrial'}
Default: Standard
Choices: {'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8', 'Condylostoma Nuclear', 'Flatworm Mitochondrial', 'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen tannophilus Nuclear', 'Archaeal', 'Chlorophycean Mitochondrial', 'Pterobranchia Mitochondrial', 'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear', 'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma', 'Yeast Mitochondrial', 'Mesodinium Nuclear', 'Vertebrate Mitochondrial', 'Bacterial', 'Trematode Mitochondrial', 'SGC4', 'Balanophoraceae Plastid', 'Blepharisma Macronuclear', 'Mold Mitochondrial', 'Karyorelict Nuclear', 'Spiroplasma', 'SGC9', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Coelenterate Mitochondrial', 'Alternative Yeast Nuclear', 'Dasycladacean Nuclear', 'Candidate Division SR1', 'SGC2', 'Cephalodiscidae Mitochondrial', 'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative Flatworm Mitochondrial'}

--chr-codon-table str

Chromosome specific codon table. Must be specified in the format of "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table to use to translate genes on chrM. Supported codon tables: {'SGC5', 'Hexamita Nuclear', 'Euplotid Nuclear', 'Ascidian Mitochondrial', 'Thraustochytrium Mitochondrial', 'Standard', 'Gracilibacteria', 'SGC8', 'Condylostoma Nuclear', 'Flatworm Mitochondrial', 'SGC1', 'Protozoan Mitochondrial', 'SGC3', 'Pachysolen tannophilus Nuclear', 'Archaeal', 'Chlorophycean Mitochondrial', 'Pterobranchia Mitochondrial', 'Echinoderm Mitochondrial', 'Blastocrithidia Nuclear', 'Invertebrate Mitochondrial', 'SGC0', 'Mycoplasma', 'Yeast Mitochondrial', 'Mesodinium Nuclear', 'Vertebrate Mitochondrial', 'Bacterial', 'Trematode Mitochondrial', 'SGC4', 'Balanophoraceae Plastid', 'Blepharisma Macronuclear', 'Mold Mitochondrial', 'Karyorelict Nuclear', 'Spiroplasma', 'SGC9', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Coelenterate Mitochondrial', 'Alternative Yeast Nuclear', 'Dasycladacean Nuclear', 'Candidate Division SR1', 'SGC2', 'Cephalodiscidae Mitochondrial', 'Peritrich Nuclear', 'Ciliate Nuclear', 'Alternative Flatworm Mitochondrial'}. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
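
The "chrM:SGC1" format maps a chromosome name to a codon table, with --codon-table as the fallback for every other chromosome. A minimal sketch of that lookup follows; the function names are hypothetical and this is not moPepGen's own parser. It assumes the chromosome name itself contains no colon.

```python
from typing import Dict, Iterable


def parse_chr_codon_tables(values: Iterable[str]) -> Dict[str, str]:
    """Turn entries like 'chrM:SGC1' into {'chrM': 'SGC1'}."""
    mapping: Dict[str, str] = {}
    for value in values:
        # Split on the first colon; table names may contain spaces.
        chrom, sep, table = value.partition(":")
        if not sep or not chrom or not table:
            raise ValueError(f"Expected 'chromosome:table', got {value!r}")
        mapping[chrom] = table
    return mapping


def codon_table_for(chrom: str, mapping: Dict[str, str], default: str = "Standard") -> str:
    """Pick the chromosome specific table, falling back to the default."""
    return mapping.get(chrom, default)


tables = parse_chr_codon_tables(["chrM:SGC1"])
print(codon_table_for("chrM", tables))
print(codon_table_for("chr1", tables))
```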

--start-codons str

Default start codon(s) to use for novel ORF translation. Defaults to ["ATG"].
Default: ['ATG']

--chr-start-codons str

Chromosome specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
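
The "chrM:ATG,ATA,ATT" format carries a chromosome name followed by a comma-separated codon list. The sketch below parses it into a dictionary and falls back to the general start codons for other chromosomes. The function names are hypothetical, and the upper-casing and three-letter check are assumptions made for the example rather than documented behavior.

```python
from typing import Dict, Iterable, List


def parse_chr_start_codons(values: Iterable[str]) -> Dict[str, List[str]]:
    """Turn entries like 'chrM:ATG,ATA,ATT' into {'chrM': ['ATG', 'ATA', 'ATT']}."""
    mapping: Dict[str, List[str]] = {}
    for value in values:
        chrom, sep, codon_text = value.partition(":")
        if not sep or not chrom or not codon_text:
            raise ValueError(f"Expected 'chromosome:codon,codon', got {value!r}")
        codons = [codon.strip().upper() for codon in codon_text.split(",")]
        # Assumption for this sketch: every start codon is three letters long.
        if any(len(codon) != 3 for codon in codons):
            raise ValueError(f"Invalid codon in {value!r}")
        mapping[chrom] = codons
    return mapping


def start_codons_for(chrom: str, mapping: Dict[str, List[str]], default: List[str]) -> List[str]:
    """Pick chromosome specific start codons, else the general default."""
    return mapping.get(chrom, default)


chr_codons = parse_chr_start_codons(["chrM:ATG,ATA,ATT"])
print(start_codons_for("chrM", chr_codons, ["ATG"]))
print(start_codons_for("chr1", chr_codons, ["ATG"]))
```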

--index-dir <file> Path

Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.

--debug-level <value|number> str

Debug level.
Default: INFO

-q, --quiet

Quiet
Default: False