parseSTARFusion
parseSTARFusion takes the fusion transcripts identified by STAR-Fusion and saves them as a GVF file. The GVF file can later be used to call variant peptides with callVariant.
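The Python sketch below shows how a `parseSTARFusion` command line could be assembled. It uses only flags that appear in the usage text. The file and directory names are hypothetical placeholders, and `build_parse_star_fusion_cmd` is a made-up helper, not part of moPepGen. Using `--index-dir` instead of `-g`/`-a` is one of the two ways to supply reference files.

```python
import shlex


def build_parse_star_fusion_cmd(input_path, output_path, index_dir,
                                source="Fusion", min_est_j=None,
                                skip_failed=False):
    """Build an argv list for `moPepGen parseSTARFusion`.

    Hypothetical helper for illustration; paths are placeholders.
    """
    cmd = [
        "moPepGen", "parseSTARFusion",
        "--input-path", input_path,    # STAR-Fusion output (.tsv/.txt)
        "--output-path", output_path,  # GVF file to write
        "--index-dir", index_dir,      # index built by generateIndex
        "--source", source,            # variant source label
    ]
    if min_est_j is not None:
        cmd += ["--min-est-j", str(min_est_j)]
    if skip_failed:
        cmd.append("--skip-failed")
    return cmd


# Print the command as it would be typed in a shell.
print(shlex.join(build_parse_star_fusion_cmd(
    "star_fusion_output.tsv", "star_fusion.gvf", "moPepGen_index",
    min_est_j=5)))
```

You could run the returned list with `subprocess.run(cmd, check=True)`. This assumes `moPepGen` is on `PATH` and the placeholder paths have been replaced with real ones.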
Reference Version
The versions of the reference genome FASTA, proteome FASTA, and annotation GTF MUST be consistent across all analyses.
Usage
usage: moPepGen parseSTARFusion [-h] -i <file> -o <file>
[--min-est-j <number>] [--skip-failed]
--source SOURCE [-g <file>] [-a <file>]
[--reference-source {GENCODE,ENSEMBL}]
[--codon-table {Thraustochytrium Mitochondrial,SGC0,Gracilibacteria,Plant Plastid,SGC8,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Invertebrate Mitochondrial,SGC5,SGC9,Blepharisma Macronuclear,Bacterial,SGC2,Yeast Mitochondrial,Pterobranchia Mitochondrial,Hexamita Nuclear,Echinoderm Mitochondrial,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Pachysolen tannophilus Nuclear,Coelenterate Mitochondrial,Condylostoma Nuclear,SGC3,Protozoan Mitochondrial,SGC4,Peritrich Nuclear,Trematode Mitochondrial,Archaeal,Spiroplasma,Alternative Flatworm Mitochondrial,Mesodinium Nuclear,SGC1,Blastocrithidia Nuclear,Mold Mitochondrial,Alternative Yeast Nuclear,Standard,Flatworm Mitochondrial,Dasycladacean Nuclear,Vertebrate Mitochondrial,Karyorelict Nuclear,Balanophoraceae Plastid,Cephalodiscidae Mitochondrial,Candidate Division SR1,Ciliate Nuclear,Mycoplasma}]
[--chr-codon-table [CHR_CODON_TABLE ...]]
[--start-codons [START_CODONS ...]]
[--chr-start-codons [CHR_START_CODONS ...]]
[--index-dir [<file>]]
[--debug-level <value|number>] [-q]
Parse STAR-Fusion output into GVF-format variant records that moPepGen can use to call
variant peptides.
options:
-h, --help show this help message and exit
-i <file>, --input-path <file>
File path to STAR-Fusion's output file. Valid formats:
['.tsv', '.txt'] (default: None)
-o <file>, --output-path <file>
File path to the output file. Valid formats: ['.gvf']
(default: None)
--min-est-j <number> Minimum estimated junction reads (`est_J`) required for a
record to be included. Defaults to no filtering by `est_J`. (default: -1)
--skip-failed When set, the failed records will be skipped.
(default: False)
--source SOURCE Variant source (e.g. gSNP, sSNV, Fusion) (default:
None)
--debug-level <value|number>
Debug level. (default: INFO)
-q, --quiet Quiet (default: False)
Reference Files:
-g <file>, --genome-fasta <file>
Path to the genome assembly FASTA file. Only ENSEMBL
and GENCODE are supported. Its version must be the
same as the annotation GTF and proteome FASTA
(default: None)
-a <file>, --annotation-gtf <file>
Path to the annotation GTF file. Only ENSEMBL and
GENCODE are supported. Its version must be the same as
the genome and proteome FASTA. (default: None)
--reference-source {GENCODE,ENSEMBL}
Source of reference genome and annotation. (default:
None)
--codon-table {Thraustochytrium Mitochondrial,SGC0,Gracilibacteria,Plant Plastid,SGC8,Chlorophycean Mitochondrial,Ascidian Mitochondrial,Invertebrate Mitochondrial,SGC5,SGC9,Blepharisma Macronuclear,Bacterial,SGC2,Yeast Mitochondrial,Pterobranchia Mitochondrial,Hexamita Nuclear,Echinoderm Mitochondrial,Euplotid Nuclear,Scenedesmus obliquus Mitochondrial,Pachysolen tannophilus Nuclear,Coelenterate Mitochondrial,Condylostoma Nuclear,SGC3,Protozoan Mitochondrial,SGC4,Peritrich Nuclear,Trematode Mitochondrial,Archaeal,Spiroplasma,Alternative Flatworm Mitochondrial,Mesodinium Nuclear,SGC1,Blastocrithidia Nuclear,Mold Mitochondrial,Alternative Yeast Nuclear,Standard,Flatworm Mitochondrial,Dasycladacean Nuclear,Vertebrate Mitochondrial,Karyorelict Nuclear,Balanophoraceae Plastid,Cephalodiscidae Mitochondrial,Candidate Division SR1,Ciliate Nuclear,Mycoplasma}
Codon table. Defaults to "Standard". Supported codon
tables: {'Thraustochytrium Mitochondrial', 'SGC0',
'Gracilibacteria', 'Plant Plastid', 'SGC8',
'Chlorophycean Mitochondrial', 'Ascidian
Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5',
'SGC9', 'Blepharisma Macronuclear', 'Bacterial',
'SGC2', 'Yeast Mitochondrial', 'Pterobranchia
Mitochondrial', 'Hexamita Nuclear', 'Echinoderm
Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus
obliquus Mitochondrial', 'Pachysolen tannophilus
Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma
Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4',
'Peritrich Nuclear', 'Trematode Mitochondrial',
'Archaeal', 'Spiroplasma', 'Alternative Flatworm
Mitochondrial', 'Mesodinium Nuclear', 'SGC1',
'Blastocrithidia Nuclear', 'Mold Mitochondrial',
'Alternative Yeast Nuclear', 'Standard', 'Flatworm
Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate
Mitochondrial', 'Karyorelict Nuclear',
'Balanophoraceae Plastid', 'Cephalodiscidae
Mitochondrial', 'Candidate Division SR1', 'Ciliate
Nuclear', 'Mycoplasma'} (default: Standard)
--chr-codon-table [CHR_CODON_TABLE ...]
Chromosome specific codon table. Must be specified in
the format of "chrM:SGC1", where "chrM" is the
chromosome name and "SGC1" is the codon table to use
to translate genes on chrM. Supported codon tables:
{'Thraustochytrium Mitochondrial', 'SGC0',
'Gracilibacteria', 'Plant Plastid', 'SGC8',
'Chlorophycean Mitochondrial', 'Ascidian
Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5',
'SGC9', 'Blepharisma Macronuclear', 'Bacterial',
'SGC2', 'Yeast Mitochondrial', 'Pterobranchia
Mitochondrial', 'Hexamita Nuclear', 'Echinoderm
Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus
obliquus Mitochondrial', 'Pachysolen tannophilus
Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma
Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4',
'Peritrich Nuclear', 'Trematode Mitochondrial',
'Archaeal', 'Spiroplasma', 'Alternative Flatworm
Mitochondrial', 'Mesodinium Nuclear', 'SGC1',
'Blastocrithidia Nuclear', 'Mold Mitochondrial',
'Alternative Yeast Nuclear', 'Standard', 'Flatworm
Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate
Mitochondrial', 'Karyorelict Nuclear',
'Balanophoraceae Plastid', 'Cephalodiscidae
Mitochondrial', 'Candidate Division SR1', 'Ciliate
Nuclear', 'Mycoplasma'}. By default, "SGC1" is
assigned to mitochondrial chromosomes. (default: [])
--start-codons [START_CODONS ...]
Default start codon(s) to use for novel ORF
translation. Defaults to ["ATG"]. (default: ['ATG'])
--chr-start-codons [CHR_START_CODONS ...]
Chromosome specific start codon(s). For example,
"chrM:ATG,ATA,ATT". By default, the mitochondrial
chromosome name is automatically inferred and start
codons "ATG", "ATA", "ATT", "ATC" and "GTG" are
assigned to it. (default: [])
--index-dir [<file>] Path to the directory of index files generated by
moPepGen generateIndex. If given, --genome-fasta,
--proteome-fasta and --annotation-gtf will be
ignored. (default: None)
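`--chr-codon-table` and `--chr-start-codons` accept strings such as `chrM:SGC1` and `chrM:ATG,ATA,ATT`. The sketch below shows one way to split these strings into per-chromosome mappings. It is not moPepGen's own parser, and `parse_chr_codon_tables` and `parse_chr_start_codons` are hypothetical names. The string is split on the first colon only, and everything after it is kept unchanged. This preserves codon table names that contain spaces, such as `Vertebrate Mitochondrial`.

```python
def parse_chr_codon_tables(specs):
    """Map chromosome -> codon table, e.g. ['chrM:SGC1'] -> {'chrM': 'SGC1'}.

    Hypothetical helper for illustration; not moPepGen's implementation.
    """
    tables = {}
    for spec in specs:
        # Split on the first colon only; the table name is kept as-is.
        chrom, sep, table = spec.partition(":")
        if not (sep and chrom and table):
            raise ValueError(f"expected 'CHR:TABLE', got {spec!r}")
        tables[chrom] = table
    return tables


def parse_chr_start_codons(specs):
    """Map chromosome -> start codons, e.g. ['chrM:ATG,ATA'] -> {'chrM': ['ATG', 'ATA']}.

    Hypothetical helper for illustration; not moPepGen's implementation.
    """
    codons = {}
    for spec in specs:
        chrom, sep, value = spec.partition(":")
        # Comma-separated codons, normalised to upper case.
        parsed = [c.strip().upper() for c in value.split(",") if c.strip()]
        if not (sep and chrom and parsed):
            raise ValueError(f"expected 'CHR:CODON[,CODON...]', got {spec!r}")
        # Each codon must be exactly three nucleotides from A/C/G/T.
        for codon in parsed:
            if len(codon) != 3 or set(codon) - set("ACGT"):
                raise ValueError(f"invalid codon {codon!r} in {spec!r}")
        codons[chrom] = parsed
    return codons


print(parse_chr_codon_tables(["chrM:Vertebrate Mitochondrial"]))
# {'chrM': 'Vertebrate Mitochondrial'}
print(parse_chr_start_codons(["chrM:ATG,ATA,ATT"]))
# {'chrM': ['ATG', 'ATA', 'ATT']}
```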
Arguments
-h, --help
show this help message and exit
-i, --input-path <file> Path
File path to STAR-Fusion's output file. Valid formats: ['.tsv', '.txt']
-o, --output-path <file> Path
File path to the output file. Valid formats: ['.gvf']
--min-est-j <number> float
Minimum estimated junction reads (`est_J`) required for a record to be included. Defaults to no filtering by `est_J`.
Default: -1
--skip-failed
When set, the failed records will be skipped.
Default: False
--source str
Variant source (e.g. gSNP, sSNV, Fusion)
-g, --genome-fasta <file> Path
Path to the genome assembly FASTA file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the annotation GTF and proteome FASTA
-a, --annotation-gtf <file> Path
Path to the annotation GTF file. Only ENSEMBL and GENCODE are supported. Its version must be the same as the genome and proteome FASTA.
--reference-source str
Source of reference genome and annotation.
Choices: ['GENCODE', 'ENSEMBL']
--codon-table str
Codon table. Defaults to "Standard". Supported codon tables: {'Thraustochytrium Mitochondrial', 'SGC0', 'Gracilibacteria', 'Plant Plastid', 'SGC8', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5', 'SGC9', 'Blepharisma Macronuclear', 'Bacterial', 'SGC2', 'Yeast Mitochondrial', 'Pterobranchia Mitochondrial', 'Hexamita Nuclear', 'Echinoderm Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4', 'Peritrich Nuclear', 'Trematode Mitochondrial', 'Archaeal', 'Spiroplasma', 'Alternative Flatworm Mitochondrial', 'Mesodinium Nuclear', 'SGC1', 'Blastocrithidia Nuclear', 'Mold Mitochondrial', 'Alternative Yeast Nuclear', 'Standard', 'Flatworm Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate Mitochondrial', 'Karyorelict Nuclear', 'Balanophoraceae Plastid', 'Cephalodiscidae Mitochondrial', 'Candidate Division SR1', 'Ciliate Nuclear', 'Mycoplasma'}
Default: Standard
Choices: {'Thraustochytrium Mitochondrial', 'SGC0', 'Gracilibacteria', 'Plant Plastid', 'SGC8', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5', 'SGC9', 'Blepharisma Macronuclear', 'Bacterial', 'SGC2', 'Yeast Mitochondrial', 'Pterobranchia Mitochondrial', 'Hexamita Nuclear', 'Echinoderm Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4', 'Peritrich Nuclear', 'Trematode Mitochondrial', 'Archaeal', 'Spiroplasma', 'Alternative Flatworm Mitochondrial', 'Mesodinium Nuclear', 'SGC1', 'Blastocrithidia Nuclear', 'Mold Mitochondrial', 'Alternative Yeast Nuclear', 'Standard', 'Flatworm Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate Mitochondrial', 'Karyorelict Nuclear', 'Balanophoraceae Plastid', 'Cephalodiscidae Mitochondrial', 'Candidate Division SR1', 'Ciliate Nuclear', 'Mycoplasma'}
--chr-codon-table str
Chromosome specific codon table. Must be specified in the format of "chrM:SGC1", where "chrM" is the chromosome name and "SGC1" is the codon table to use to translate genes on chrM. Supported codon tables: {'Thraustochytrium Mitochondrial', 'SGC0', 'Gracilibacteria', 'Plant Plastid', 'SGC8', 'Chlorophycean Mitochondrial', 'Ascidian Mitochondrial', 'Invertebrate Mitochondrial', 'SGC5', 'SGC9', 'Blepharisma Macronuclear', 'Bacterial', 'SGC2', 'Yeast Mitochondrial', 'Pterobranchia Mitochondrial', 'Hexamita Nuclear', 'Echinoderm Mitochondrial', 'Euplotid Nuclear', 'Scenedesmus obliquus Mitochondrial', 'Pachysolen tannophilus Nuclear', 'Coelenterate Mitochondrial', 'Condylostoma Nuclear', 'SGC3', 'Protozoan Mitochondrial', 'SGC4', 'Peritrich Nuclear', 'Trematode Mitochondrial', 'Archaeal', 'Spiroplasma', 'Alternative Flatworm Mitochondrial', 'Mesodinium Nuclear', 'SGC1', 'Blastocrithidia Nuclear', 'Mold Mitochondrial', 'Alternative Yeast Nuclear', 'Standard', 'Flatworm Mitochondrial', 'Dasycladacean Nuclear', 'Vertebrate Mitochondrial', 'Karyorelict Nuclear', 'Balanophoraceae Plastid', 'Cephalodiscidae Mitochondrial', 'Candidate Division SR1', 'Ciliate Nuclear', 'Mycoplasma'}. By default, "SGC1" is assigned to mitochondrial chromosomes.
Default: []
--start-codons str
Default start codon(s) to use for novel ORF translation. Defaults to ["ATG"].
Default: ['ATG']
--chr-start-codons str
Chromosome specific start codon(s). For example, "chrM:ATG,ATA,ATT". By default, the mitochondrial chromosome name is automatically inferred and start codons "ATG", "ATA", "ATT", "ATC" and "GTG" are assigned to it.
Default: []
--index-dir <file> Path
Path to the directory of index files generated by moPepGen generateIndex. If given, --genome-fasta, --proteome-fasta and --annotation-gtf will be ignored.
--debug-level <value|number> str
Debug level.
Default: INFO
-q, --quiet
Quiet
Default: False
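`--min-est-j` filters records using STAR-Fusion's `est_J` column. Its default of -1 keeps every record. The sketch below shows a threshold of this kind. It makes three assumptions:

- The input is a tab-separated file whose header starts with `#FusionName` and includes an `est_J` column.
- The comparison is inclusive (`>=`). This document does not specify moPepGen's exact rule.
- `filter_by_est_j` is a hypothetical helper.

The gene names and counts in `EXAMPLE` are made up.

```python
import csv
import io

# Made-up STAR-Fusion-style rows (tab-separated); real files have more columns.
EXAMPLE = (
    "#FusionName\tJunctionReadCount\tSpanningFragCount\test_J\test_S\n"
    "GENEA--GENEB\t10\t4\t9.5\t4.0\n"
    "GENEC--GENED\t1\t0\t0.5\t0.0\n"
)


def filter_by_est_j(tsv_text, min_est_j=-1.0):
    """Return fusion names whose est_J is at least min_est_j.

    Hypothetical helper; the inclusive comparison is an assumption.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["#FusionName"]
        for row in reader
        if float(row["est_J"]) >= min_est_j
    ]


print(filter_by_est_j(EXAMPLE))     # ['GENEA--GENEB', 'GENEC--GENED']
print(filter_by_est_j(EXAMPLE, 1))  # ['GENEA--GENEB']
```

With the default of -1, every record passes because `est_J` is never negative. This matches the "no filtering" behavior described for `--min-est-j`.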