Pipeline Steps - Variant Calling
SomaticSniper
1. SomaticSniper
v1.0.5.0
Compare a pair of tumor and normal BAM files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in VCF format.
2. Filter out ambiguous positions.
This takes several steps, listed below, and starts with the same input files given to SomaticSniper. These are used to generate a list of high confidence indels to assist SNV filtering.
a. Get indel pileup summaries
Summarize counts of reads that support reference, alternate and other alleles for given sites. This is done for both of the input BAM files and the results are used in the next step.
b. Filter indel pileup outputs
Use samtools.pl varFilter to filter each pileup output (tumor and normal), then further filter each to keep only indels with QUAL > 20. samtools.pl is packaged with SomaticSniper.
c. Filter SomaticSniper VCF
Use snpfilter.pl (packaged with SomaticSniper):
i. Filter VCF using normal indel pileup (from step b).
ii. Filter VCF output from step i using tumor indel pileup (from step b).
d. Summarize alignment information for retained SNV positions
Extract positions from the filtered VCF file and pass them to bam-readcount to generate a summary of read alignment metrics for each position.
e. Final filtering of SNVs using metrics summarized above
Use fpfilter.pl and highconfidence.pl (packaged with SomaticSniper), resulting in a final high confidence VCF file.
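The position extraction in step d can be sketched as follows. This is a minimal illustration, not the pipeline's own code: the function name vcf_to_site_list is hypothetical, and the output layout (tab-separated chromosome, start, end, with start equal to end for a single position) is an assumption about the site list that bam-readcount accepts.

```python
def vcf_to_site_list(vcf_lines):
    """Return 'chrom<TAB>pos<TAB>pos' strings, one per VCF record.

    Hypothetical helper; the three-column layout is an assumed
    site-list format for bam-readcount.
    """
    sites = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and VCF header lines (they start with '#').
        if not line or line.startswith("#"):
            continue
        # The first two VCF columns are CHROM and POS.
        chrom, pos = line.split("\t")[:2]
        sites.append(f"{chrom}\t{pos}\t{pos}")
    return sites


# Made-up records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG\t.\t.\t.\n",
    "chr2\t67890\t.\tC\tT\t.\t.\t.\n",
]
print("\n".join(vcf_to_site_list(example_vcf)))
```

Writing the returned lines to a file would give one position per line for the read-count step.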
Strelka2
1. Manta
v1.6.0
The input pair of tumor/normal BAM files is used by Manta to produce candidate small indels via the Manta somatic configuration protocol. Note that larger (structural) variants are also produced and can be retrieved from the intermediate files directory if saving intermediate files is enabled.
2. Strelka2
v2.9.10
The input pair of tumor/normal BAM files, along with the candidate small indel file produced by Manta, are used by Strelka2 to create lists of somatic single nucleotide and small indel variants, both in VCF format. Lower quality variants that did not pass filtering are subsequently removed, yielding .SNV-pass.vcf.gz and .Indel-pass.vcf.gz files.
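The removal of variants that did not pass filtering could look like the sketch below. It is an illustration under stated assumptions rather than the pipeline's actual implementation: keep_pass_records is a hypothetical name, input is assumed to be uncompressed VCF text, and the example records (including the LowEVS label) are made up.

```python
def keep_pass_records(vcf_lines):
    """Keep header lines and records whose FILTER column is exactly 'PASS'."""
    kept = []
    for line in vcf_lines:
        # Header lines are carried over unchanged so the output stays a valid VCF.
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the seventh VCF column (index 6).
        if len(fields) > 6 and fields[6] == "PASS":
            kept.append(line)
    return kept


# Made-up records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
    "chr1\t200\t.\tC\tT\t.\tLowEVS\t.\n",
]
for line in keep_pass_records(example_vcf):
    print(line, end="")
```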
GATK Mutect2
1. Define intervals for scattering
The params.intersect_regions of the reference genome are split into x intervals for parallelization, where x is defined by params.scatter_count.
2. Call small somatic variants
Call somatic variants with Mutect2.
3. Merge
Merge scattered outputs (VCFs, statistics).
4. Learn read orientations
Create an artifact prior table based on read orientations with GATK's LearnReadOrientationModel.
5. Filter
Filter variants with GATK's FilterMutectCalls, using the read orientation prior table and the contamination table as well as standard filters.
6. Split VCF
Split filtered VCF into separate files for each variant type: SNVs, MNVs and INDELs.
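One way to picture the split by variant type is the sketch below. The classification rule is a simplifying assumption (biallelic records, no symbolic alleles) and may differ from the rule GATK applies; classify_variant and split_by_type are hypothetical names.

```python
def classify_variant(ref, alt):
    """Classify one biallelic record by REF/ALT length (assumed rule)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"    # single base substituted
    if len(ref) == len(alt):
        return "MNV"    # several adjacent bases substituted
    return "INDEL"      # allele lengths differ: insertion or deletion


def split_by_type(records):
    """Group (chrom, pos, ref, alt) tuples into one list per variant type."""
    groups = {"SNV": [], "MNV": [], "INDEL": []}
    for chrom, pos, ref, alt in records:
        groups[classify_variant(ref, alt)].append((chrom, pos, ref, alt))
    return groups


# Made-up records, for illustration only.
example_records = [
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "AC", "GT"),
    ("chr2", 300, "A", "ATT"),
    ("chr2", 400, "CAG", "C"),
]
for variant_type, items in split_by_type(example_records).items():
    print(variant_type, len(items))
```

Each group would then be written to its own output file.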
MuSE
1. MuSE call
Pre-filter and calculate position-specific summary statistics using the Markov substitution model.
2. MuSE sump
Compute tier-based cutoffs from a sample-specific error model.
3. Filter VCF
MuSE output has SNVs labeled as PASS or one of Tier 1-5 for the lower confidence calls (Tier 5 is lowest). This step keeps only SNVs labeled PASS.
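The confidence ordering described above can be sketched as a ranked list, with the pipeline's behavior corresponding to a PASS-only threshold. This is a hypothetical illustration: the label spellings (Tier1 through Tier5, without spaces) are an assumption about how the labels appear in the VCF, and keep_at_or_above is not part of MuSE or the pipeline.

```python
# Labels from most to least confident (assumed spellings).
CONFIDENCE_ORDER = ["PASS", "Tier1", "Tier2", "Tier3", "Tier4", "Tier5"]


def keep_at_or_above(calls, threshold="PASS"):
    """Keep (variant_id, label) pairs at least as confident as threshold.

    An unrecognized label raises ValueError, so unexpected values
    are never silently kept or dropped.
    """
    cutoff = CONFIDENCE_ORDER.index(threshold)
    return [
        (variant_id, label)
        for variant_id, label in calls
        if CONFIDENCE_ORDER.index(label) <= cutoff
    ]


# Made-up calls, for illustration only.
example_calls = [
    ("chr1:100", "PASS"),
    ("chr1:200", "Tier1"),
    ("chr2:300", "Tier5"),
]
# The default threshold mirrors the step above: only PASS calls remain.
print(keep_at_or_above(example_calls))
```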