parseFusionCatcher

parseFusionCatcher takes the fusion transcripts identified by FusionCatcher and saves them as a GVF file. The GVF file can later be used to call variant peptides with callVariant.
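FusionCatcher reports each candidate fusion as one row of a tab-separated table. The sketch below shows the kind of fields a parser pulls out of such a row before any filtering. It is not moPepGen's own parser. The column names are assumptions based on FusionCatcher's `final-list_candidate-fusion-genes.txt`, the `chrom:position:strand` fusion-point format is likewise assumed, and the gene IDs in the example are made up.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, TextIO

# Assumed FusionCatcher column names (final-list_candidate-fusion-genes.txt).
GENE_5P = "Gene_1_id(5end_fusion_partner)"
GENE_3P = "Gene_2_id(3end_fusion_partner)"
POINT_5P = "Fusion_point_for_gene_1(5end_fusion_partner)"
POINT_3P = "Fusion_point_for_gene_2(3end_fusion_partner)"
COMMON = "Counts_of_common_mapping_reads"
SPANNING_UNIQUE = "Spanning_unique_reads"


@dataclass
class Breakpoint:
    """One side of a fusion junction."""
    chrom: str
    position: int
    strand: str


@dataclass
class FusionRow:
    """The subset of a FusionCatcher row kept by this sketch."""
    gene_5p: str
    gene_3p: str
    point_5p: Breakpoint
    point_3p: Breakpoint
    common_mapping_reads: int
    spanning_unique_reads: int


def parse_breakpoint(text: str) -> Breakpoint:
    # Assumed format "chrom:position:strand", e.g. "12:6537583:+".
    chrom, position, strand = text.split(":")
    return Breakpoint(chrom, int(position), strand)


def read_fusioncatcher(handle: TextIO) -> Iterator[FusionRow]:
    """Yield one FusionRow per data line; extra columns are ignored."""
    for row in csv.DictReader(handle, delimiter="\t"):
        yield FusionRow(
            gene_5p=row[GENE_5P],
            gene_3p=row[GENE_3P],
            point_5p=parse_breakpoint(row[POINT_5P]),
            point_3p=parse_breakpoint(row[POINT_3P]),
            common_mapping_reads=int(row[COMMON]),
            spanning_unique_reads=int(row[SPANNING_UNIQUE]),
        )


# A one-row table with made-up gene IDs.
header = "\t".join([GENE_5P, GENE_3P, POINT_5P, POINT_3P, COMMON, SPANNING_UNIQUE])
line = "\t".join(["ENSG_A", "ENSG_B", "1:1000:+", "2:2000:-", "0", "7"])
rows = list(read_fusioncatcher(io.StringIO(header + "\n" + line + "\n")))
print(rows[0].point_3p)  # Breakpoint(chrom='2', position=2000, strand='-')
```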

Reference Version

The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.

Usage

usage: moPepGen parseFusionCatcher [-h] -i <file> -o <file>
                                   [--max-common-mapping <number>]
                                   [--min-spanning-unique <number>]
                                   [--skip-failed] --source SOURCE [-g <file>]
                                   [-a <file>]
                                   [--reference-source {GENCODE,ENSEMBL}]
                                   [--codon-table {Standard,Alternative Flatworm Mitochondrial,Trematode Mitochondrial,SGC8,SGC3,Protozoan Mitochondrial,Ciliate Nuclear,Gracilibacteria,Spiroplasma,Dasycladacean Nuclear,Invertebrate Mitochondrial,Balanophoraceae Plastid,Peritrich Nuclear,Mesodinium Nuclear,SGC5,Candidate Division SR1,Blastocrithidia Nuclear,SGC1,Bacterial,Alternative Yeast Nuclear,Yeast Mitochondrial,Scenedesmus obliquus Mitochondrial,Plant Plastid,Flatworm Mitochondrial,SGC2,Archaeal,Mycoplasma,Euplotid Nuclear,SGC9,Mold Mitochondrial,Thraustochytrium Mitochondrial,Hexamita Nuclear,Coelenterate Mitochondrial,Chlorophycean Mitochondrial,Pachysolen tannophilus Nuclear,Ascidian Mitochondrial,SGC0,Blepharisma Macronuclear,Karyorelict Nuclear,SGC4,Echinoderm Mitochondrial,Condylostoma Nuclear,Vertebrate Mitochondrial,Pterobranchia Mitochondrial,Cephalodiscidae Mitochondrial}]
                                   [--chr-codon-table [CHR_CODON_TABLE ...]]
                                   [--start-codons [START_CODONS ...]]
                                   [--chr-start-codons [CHR_START_CODONS ...]]
                                   [--index-dir [<file>]]
                                   [--debug-level <value|number>] [-q]

Parse the FusionCatcher result to GVF format of variant records for moPepGen
to call variant peptides. The genome

options:
  -h, --help            show this help message and exit
  -i <file>, --input-path <file>
                        File path to FusionCatcher's output TSV file. Valid
                        formats: ['.tsv', '.txt'] (default: None)
  -o <file>, --output-path <file>
                        File path to the output file. Valid formats: ['.gvf']
                        (default: None)
  --max-common-mapping <number>
                        Maximal number of common mapping reads. (default: 0)
  --min-spanning-unique <number>
                        Minimal spanning unique reads. (default: 5)
  --skip-failed         When set, the failed records will be skipped.
                        (default: False)
  --source SOURCE       Variant source (e.g. gSNP, sSNV, Fusion) (default:
                        None)
  --debug-level <value|number>
                        Debug level. (default: INFO)
  -q, --quiet           Quiet (default: False)

Reference Files:
  -g <file>, --genome-fasta <file>
                        Path to the genome assembly FASTA file. Only ENSEMBL
                        and GENCODE are supported. Its version must be the
                        same as the annotation GTF and proteome FASTA
                        (default: None)
  -a <file>, --annotation-gtf <file>
                        Path to the annotation GTF file. Only ENSEMBL and
                        GENCODE are supported. Its version must be the same as
                        the genome and proteome FASTA. (default: None)
  --reference-source {GENCODE,ENSEMBL}
                        Source of reference genome and annotation. (default:
                        None)
  --codon-table {Standard,Alternative Flatworm Mitochondrial,Trematode Mitochondrial,SGC8,SGC3,Protozoan Mitochondrial,Ciliate Nuclear,Gracilibacteria,Spiroplasma,Dasycladacean Nuclear,Invertebrate Mitochondrial,Balanophoraceae Plastid,Peritrich Nuclear,Mesodinium Nuclear,SGC5,Candidate Division SR1,Blastocrithidia Nuclear,SGC1,Bacterial,Alternative Yeast Nuclear,Yeast Mitochondrial,Scenedesmus obliquus Mitochondrial,Plant Plastid,Flatworm Mitochondrial,SGC2,Archaeal,Mycoplasma,Euplotid Nuclear,SGC9,Mold Mitochondrial,Thraustochytrium Mitochondrial,Hexamita Nuclear,Coelenterate Mitochondrial,Chlorophycean Mitochondrial,Pachysolen tannophilus Nuclear,Ascidian Mitochondrial,SGC0,Blepharisma Macronuclear,Karyorelict Nuclear,SGC4,Echinoderm Mitochondrial,Condylostoma Nuclear,Vertebrate Mitochondrial,Pterobranchia Mitochondrial,Cephalodiscidae Mitochondrial}
                        Codon table. Defaults to "Standard". Supported codon
                        tables: {'Standard', 'Alternative Flatworm
                        Mitochondrial', 'Trematode Mitochondrial', 'SGC8',
                        'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear',
                        'Gracilibacteria', 'Spiroplasma', 'Dasycladacean
                        Nuclear', 'Invertebrate Mitochondrial',
                        'Balanophoraceae Plastid', 'Peritrich Nuclear',
                        'Mesodinium Nuclear', 'SGC5', 'Candidate Division
                        SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial',
                        'Alternative Yeast Nuclear', 'Yeast Mitochondrial',
                        'Scenedesmus obliquus Mitochondrial', 'Plant Plastid',
                        'Flatworm Mitochondrial', 'SGC2', 'Archaeal',
                        'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold
                        Mitochondrial', 'Thraustochytrium Mitochondrial',
                        'Hexamita Nuclear', 'Coelenterate Mitochondrial',
                        'Chlorophycean Mitochondrial', 'Pachysolen tannophilus
                        Nuclear', 'Ascidian Mitochondrial', 'SGC0',
                        'Blepharisma Macronuclear', 'Karyorelict Nuclear',
                        'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma
                        Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia
                        Mitochondrial', 'Cephalodiscidae Mitochondrial'}
                        (default: Standard)
  --chr-codon-table [CHR_CODON_TABLE ...]
                        Chromosome specific codon table. Must be specified in
                        the format of "chrM:SGC1", where "chrM" is the
                        chromosome name and "SGC1" is the codon table to use
                        to translate genes on chrM. Supported codon tables:
                        {'Standard', 'Alternative Flatworm Mitochondrial',
                        'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan
                        Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria',
                        'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate
                        Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich
                        Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate
                        Division SR1', 'Blastocrithidia Nuclear', 'SGC1',
                        'Bacterial', 'Alternative Yeast Nuclear', 'Yeast
                        Mitochondrial', 'Scenedesmus obliquus Mitochondrial',
                        'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2',
                        'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9',
                        'Mold Mitochondrial', 'Thraustochytrium
                        Mitochondrial', 'Hexamita Nuclear', 'Coelenterate
                        Mitochondrial', 'Chlorophycean Mitochondrial',
                        'Pachysolen tannophilus Nuclear', 'Ascidian
                        Mitochondrial', 'SGC0', 'Blepharisma Macronuclear',
                        'Karyorelict Nuclear', 'SGC4', 'Echinoderm
                        Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate
                        Mitochondrial', 'Pterobranchia Mitochondrial',
                        'Cephalodiscidae Mitochondrial'}. By default, "SGC1"
                        is assigned to mitochondrial chromosomes. (default:
                        [])
  --start-codons [START_CODONS ...]
                        Default start codon(s) to use for novel ORF
                        translation. Defaults to ["ATG"]. (default: ['ATG'])
  --chr-start-codons [CHR_START_CODONS ...]
                        Chromosome specific start codon(s). For example,
                        "chrM:ATG,ATA,ATT". By default, the mitochondrial
                        chromosome name is automatically inferred and start
                        codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
                        assigned to it. (default: [])
  --index-dir [<file>]  Path to the directory of index files generated by
                        moPepGen generateIndex. If given, --genome-fasta,
                        --proteome-fasta and --annotation-gtf will be
                        ignored. (default: None)
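The usage line marks `-i`, `-o` and `--source` as required, while the reference files can come either from `-g`/`-a` or from `--index-dir`. The sketch below assembles such a command line as an argv list; only flags shown in the help text are used, and the file names (`fusion_catcher.txt`, `fusion_catcher.gvf`, `index`) and the helper `build_command` are hypothetical.

```python
import shlex
from typing import List, Optional


def build_command(input_path: str, output_path: str, source: str,
                  index_dir: Optional[str] = None,
                  genome_fasta: Optional[str] = None,
                  annotation_gtf: Optional[str] = None,
                  min_spanning_unique: Optional[int] = None,
                  skip_failed: bool = False) -> List[str]:
    """Assemble an argv list for `moPepGen parseFusionCatcher` (hypothetical helper)."""
    # -i, -o and --source appear without brackets in the usage line, i.e. required.
    cmd = ["moPepGen", "parseFusionCatcher",
           "-i", input_path, "-o", output_path, "--source", source]
    if index_dir is not None:
        # With --index-dir, the FASTA/GTF arguments would be ignored anyway.
        cmd += ["--index-dir", index_dir]
    else:
        if genome_fasta is not None:
            cmd += ["-g", genome_fasta]
        if annotation_gtf is not None:
            cmd += ["-a", annotation_gtf]
    if min_spanning_unique is not None:
        cmd += ["--min-spanning-unique", str(min_spanning_unique)]
    if skip_failed:
        cmd.append("--skip-failed")
    return cmd


cmd = build_command("fusion_catcher.txt", "fusion_catcher.gvf", "Fusion",
                    index_dir="index", skip_failed=True)
print(shlex.join(cmd))
# moPepGen parseFusionCatcher -i fusion_catcher.txt -o fusion_catcher.gvf --source Fusion --index-dir index --skip-failed
```

Keeping the command as a list rather than a single string means it can be handed to `subprocess.run(cmd, check=True)` without shell quoting issues, which matters for codon table names that contain spaces.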

Arguments

-h, --help

show this help message and exit

-i, --input-path <file> Path

File path to FusionCatcher's output TSV file. Valid formats: ['.tsv', '.txt']. Required.

-o, --output-path <file> Path

File path to the output file. Valid formats: ['.gvf']. Required.

--max-common-mapping <number> int

Maximum number of common mapping reads.
Default: 0

--min-spanning-unique <number> int

Minimum number of spanning unique reads.
Default: 5
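Taken together, the two thresholds above act as a per-record filter on read support. A minimal sketch, assuming both bounds are inclusive (a call with exactly `--max-common-mapping` common mapping reads or exactly `--min-spanning-unique` spanning unique reads is kept); moPepGen's exact comparison is not stated here, and `passes_thresholds` is a hypothetical helper.

```python
def passes_thresholds(common_mapping_reads: int,
                      spanning_unique_reads: int,
                      max_common_mapping: int = 0,
                      min_spanning_unique: int = 5) -> bool:
    """Return True if a fusion call survives both read-count filters.

    Assumption: both bounds are inclusive.
    """
    return (common_mapping_reads <= max_common_mapping
            and spanning_unique_reads >= min_spanning_unique)


# (common mapping reads, spanning unique reads) for three made-up calls
calls = [(0, 12), (3, 20), (0, 2)]
kept = [c for c in calls if passes_thresholds(*c)]
print(kept)  # [(0, 12)]
```

With the defaults, any call that shares even one read with another locus is dropped, as is any call supported by fewer than five spanning unique reads.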

--skip-failed

When set, the failed records will be skipped.
Default: False
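The difference `--skip-failed` makes can be sketched as an error-handling policy around the per-record conversion: without the flag the first bad record aborts the run, with it the record is dropped. This is an illustration only; it assumes a failed conversion surfaces as `ValueError`, and `convert_records` is a hypothetical helper, not moPepGen's code.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def convert_records(records: Iterable[T],
                    convert: Callable[[T], R],
                    skip_failed: bool = False) -> List[R]:
    """Convert each record; on failure either re-raise or skip it.

    Assumption: a failed conversion raises ValueError.
    """
    converted: List[R] = []
    for record in records:
        try:
            converted.append(convert(record))
        except ValueError:
            if not skip_failed:
                raise  # default: the first bad record stops the run
            # --skip-failed: drop the bad record and continue
    return converted


print(convert_records(["1", "oops", "3"], int, skip_failed=True))  # [1, 3]
```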

--source str

Variant source (e.g. gSNP, sSNV, Fusion). Required.

-g, --genome-fasta <file> Path

Path to the genome assembly FASTA file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the annotation GTF and proteome FASTA.

-a, --annotation-gtf <file> Path

Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.

--reference-source str

Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']

--codon-table str

Codon table used to translate genes.
Default: Standard
Choices: {'Standard', 'Alternative Flatworm Mitochondrial', 'Trematode Mitochondrial', 'SGC8', 'SGC3', 'Protozoan Mitochondrial', 'Ciliate Nuclear', 'Gracilibacteria', 'Spiroplasma', 'Dasycladacean Nuclear', 'Invertebrate Mitochondrial', 'Balanophoraceae Plastid', 'Peritrich Nuclear', 'Mesodinium Nuclear', 'SGC5', 'Candidate Division SR1', 'Blastocrithidia Nuclear', 'SGC1', 'Bacterial', 'Alternative Yeast Nuclear', 'Yeast Mitochondrial', 'Scenedesmus obliquus Mitochondrial', 'Plant Plastid', 'Flatworm Mitochondrial', 'SGC2', 'Archaeal', 'Mycoplasma', 'Euplotid Nuclear', 'SGC9', 'Mold Mitochondrial', 'Thraustochytrium Mitochondrial', 'Hexamita Nuclear', 'Coelenterate Mitochondrial', 'Chlorophycean Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Ascidian Mitochondrial', 'SGC0', 'Blepharisma Macronuclear', 'Karyorelict Nuclear', 'SGC4', 'Echinoderm Mitochondrial', 'Condylostoma Nuclear', 'Vertebrate Mitochondrial', 'Pterobranchia Mitochondrial', 'Cephalodiscidae Mitochondrial'}

--chr-codon-table str

Chromosome-specific codon table. Must be specified in the format "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table used to translate genes on chrM. Supported codon tables are the same as the choices for --codon-table. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
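The `CHROM:TABLE` format can be parsed by splitting each value at its first colon. A minimal sketch, not moPepGen's implementation: `mito_chrom` is a hypothetical parameter standing in for moPepGen's own inference of the mitochondrial chromosome name, and `SUPPORTED` is abbreviated here for brevity.

```python
from typing import Dict, Iterable, Optional, Set


def parse_chr_codon_table(values: Iterable[str],
                          supported: Set[str],
                          mito_chrom: Optional[str] = None) -> Dict[str, str]:
    """Turn ["chrM:SGC1", ...] into {"chrM": "SGC1", ...}."""
    mapping: Dict[str, str] = {}
    for value in values:
        # Split at the first colon; table names may contain spaces.
        chrom, sep, table = value.partition(":")
        if not sep or not chrom or not table:
            raise ValueError(f"expected CHROM:TABLE, got {value!r}")
        if table not in supported:
            raise ValueError(f"unsupported codon table: {table!r}")
        mapping[chrom] = table
    # Documented default: mitochondrial chromosomes get SGC1
    # unless the user assigned them a table explicitly.
    if mito_chrom is not None and mito_chrom not in mapping:
        mapping[mito_chrom] = "SGC1"
    return mapping


SUPPORTED = {"Standard", "SGC1", "Vertebrate Mitochondrial"}  # abbreviated
print(parse_chr_codon_table(["chrM:Vertebrate Mitochondrial"], SUPPORTED, "chrM"))
# {'chrM': 'Vertebrate Mitochondrial'}
print(parse_chr_codon_table([], SUPPORTED, "chrM"))  # {'chrM': 'SGC1'}
```

Because several table names contain spaces, a value such as "chrM:Vertebrate Mitochondrial" has to be quoted on the shell.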

--start-codons str

Default start codon(s) to use for novel ORF translation.
Default: ['ATG']

--chr-start-codons str

Chromosome-specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and the start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
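Values such as "chrM:ATG,ATA,ATT" map a chromosome to a comma-separated codon list. The sketch below parses them and applies the documented mitochondrial default. It is an illustration under assumptions: `mito_chrom` is a hypothetical stand-in for moPepGen's automatic inference of the mitochondrial chromosome name, and normalising codons to upper case is a choice of this sketch.

```python
from typing import Dict, Iterable, List, Optional

# Default mitochondrial start codons listed in the help text.
MITO_START_CODONS = ["ATG", "ATA", "ATT", "ATC", "GTG"]


def parse_chr_start_codons(values: Iterable[str],
                           mito_chrom: Optional[str] = None) -> Dict[str, List[str]]:
    """Turn ["chrM:ATG,ATA,ATT", ...] into {"chrM": ["ATG", "ATA", "ATT"], ...}."""
    mapping: Dict[str, List[str]] = {}
    for value in values:
        chrom, sep, codon_text = value.partition(":")
        if not sep or not chrom or not codon_text:
            raise ValueError(f"expected CHROM:CODON[,CODON...], got {value!r}")
        # Normalise case (a choice of this sketch).
        codons = [c.strip().upper() for c in codon_text.split(",")]
        for codon in codons:
            # A codon is exactly three nucleotides.
            if len(codon) != 3 or not set(codon) <= set("ACGT"):
                raise ValueError(f"invalid codon: {codon!r}")
        mapping[chrom] = codons
    # Documented default for the mitochondrial chromosome, unless overridden.
    if mito_chrom is not None and mito_chrom not in mapping:
        mapping[mito_chrom] = list(MITO_START_CODONS)
    return mapping


print(parse_chr_start_codons(["chrM:ATG,ATA,ATT"]))
# {'chrM': ['ATG', 'ATA', 'ATT']}
```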

--index-dir <file> Path

Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
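The precedence rule above (an index directory overrides the individual reference files) can be sketched as a small resolver. This is a hypothetical helper, not moPepGen's code; in particular, requiring both `-g` and `-a` when no index is given is an assumption of this sketch.

```python
from pathlib import Path
from typing import NamedTuple, Optional


class ReferenceInputs(NamedTuple):
    index_dir: Optional[Path]
    genome_fasta: Optional[Path]
    annotation_gtf: Optional[Path]


def resolve_reference(index_dir: Optional[str] = None,
                      genome_fasta: Optional[str] = None,
                      annotation_gtf: Optional[str] = None) -> ReferenceInputs:
    """Apply the documented rule: --index-dir overrides the FASTA/GTF paths."""
    if index_dir is not None:
        # The FASTA/GTF arguments are ignored, not merged.
        return ReferenceInputs(Path(index_dir), None, None)
    # Assumption of this sketch: without an index, both files must be given.
    if genome_fasta is None or annotation_gtf is None:
        raise ValueError("give --index-dir, or both --genome-fasta and --annotation-gtf")
    return ReferenceInputs(None, Path(genome_fasta), Path(annotation_gtf))


refs = resolve_reference(index_dir="index", genome_fasta="genome.fa")
print(refs.genome_fasta)  # None (ignored because an index directory was given)
```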

--debug-level <value|number> str

Debug level, given as a level name (e.g. INFO) or a number.
Default: INFO
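The `<value|number>` metavar suggests that the level may be given either as a name or as a number. A minimal sketch of that conversion with Python's standard `logging` module; accepting lower-case names is an assumption of this sketch, and `parse_debug_level` is a hypothetical helper.

```python
import logging


def parse_debug_level(value: str) -> int:
    """Accept either a level name ("INFO", "debug") or a number ("20")."""
    if value.isdigit():
        return int(value)
    # logging.getLevelName maps a registered name back to its number and
    # returns the string "Level <name>" for names it does not know.
    level = logging.getLevelName(value.upper())
    if not isinstance(level, int):
        raise ValueError(f"unknown debug level: {value!r}")
    return level


print(parse_debug_level("INFO"), parse_debug_level("debug"), parse_debug_level("30"))
# 20 10 30
```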

-q, --quiet

Quiet
Default: False