Pipeline Steps - Variant Calling

SomaticSniper

1. SomaticSniper v1.0.5.0

Compare a pair of tumor and normal BAM files and output an unfiltered list of single nucleotide positions that are different between tumor and normal, in VCF format.

2. Filter out ambiguous positions.

This takes several steps, listed below, starting from the same tumor and normal BAM files given to SomaticSniper. The BAM files are first used to generate a list of high-confidence indels, which then assists SNV filtering.

a. Get indel pileup summaries

Summarize counts of reads that support reference, alternate and other alleles for given sites. This is done for both of the input BAM files and the results are used in the next step.

b. Filter indel pileup outputs

Use samtools.pl varFilter to filter each pileup output (tumor and normal), then further filter each to keep only indels with QUAL > 20. samtools.pl is packaged with SomaticSniper.
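
The QUAL > 20 cut can be pictured as a simple column filter over the tab-separated pileup rows. The following is a minimal sketch only: the function name keep_high_quality_indels is hypothetical, and the position of the quality column (0-based index 5 here) is an assumption that should be checked against the actual varFilter output.

```python
def keep_high_quality_indels(pileup_lines, qual_column=5, min_qual=20):
    """Return the pileup rows whose quality value is strictly above min_qual.

    qual_column is a 0-based column index. Its default is an assumption,
    not a documented property of the pipeline's pileup files.
    """
    kept = []
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= qual_column:
            continue  # skip short or malformed rows
        try:
            qual = float(fields[qual_column])
        except ValueError:
            continue  # skip rows where the column is not numeric
        if qual > min_qual:
            kept.append(line)
    return kept
```

The same function would be applied once to the tumor pileup and once to the normal pileup, since both are filtered independently.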

c. Filter SomaticSniper VCF

Use snpfilter.pl (packaged with SomaticSniper) in two passes:

i. Filter the SomaticSniper VCF using the normal indel pileup (from step b).

ii. Filter the VCF output from step i using the tumor indel pileup (from step b).

d. Summarize alignment information for retained SNV positions

Extract positions from filtered VCF file and use with bam-readcount to generate a summary of read alignment metrics for each position.
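
Extracting positions amounts to reading the CHROM and POS columns of each VCF record. The sketch below is illustrative, not the pipeline's own code: the function name vcf_to_site_list is hypothetical, and the three-column "chromosome, start, end" output is the layout assumed here for a bam-readcount site list.

```python
def vcf_to_site_list(vcf_lines):
    """Turn VCF records into 'chrom<TAB>pos<TAB>pos' site-list rows.

    Header lines (starting with '#') and blank lines are skipped.
    The three-column output layout is an assumption about what the
    downstream read-count tool expects.
    """
    sites = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos = line.split("\t")[:2]
        # A single-nucleotide site starts and ends at the same position.
        sites.append(f"{chrom}\t{pos}\t{pos}")
    return sites
```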

e. Final filtering of SNVs using metrics summarized above

Use fpfilter.pl and highconfidence.pl (packaged with SomaticSniper), resulting in a final high confidence VCF file.

Strelka2

1. Manta v1.6.0

Manta uses the input pair of tumor/normal BAM files to produce candidate small indels via its somatic configuration protocol. Larger (structural) variants are also produced and can be retrieved from the intermediate files directory if saving intermediate files is enabled.

2. Strelka2 v2.9.10

Strelka2 uses the input pair of tumor/normal BAM files, along with the candidate small indel file produced by Manta, to create lists of somatic single nucleotide and small indel variants, both in VCF format. Lower-quality variants that did not pass filtering are subsequently removed, yielding .SNV-pass.vcf.gz and .Indel-pass.vcf.gz files.

GATK Mutect2

1. Define intervals for scattering

The regions of the reference genome given by params.intersect_regions are split into x intervals for parallelization, where x is set by params.scatter_count.
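
To illustrate the idea of scattering, the sketch below distributes a list of regions across a fixed number of bins so that each bin covers a similar number of bases. It is not the method the pipeline or GATK uses to define intervals. The function name scatter_regions, the greedy largest-first strategy, and the half-open (chrom, start, end) region tuples are all assumptions made for the example.

```python
def scatter_regions(regions, scatter_count):
    """Distribute (chrom, start, end) regions over scatter_count bins.

    Regions are treated as half-open, so length = end - start.
    Greedy strategy: place each region, largest first, into the bin
    that currently covers the fewest bases. Regions are never subdivided.
    """
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    bins = [[] for _ in range(scatter_count)]
    totals = [0] * scatter_count
    by_size = sorted(regions, key=lambda r: r[2] - r[1], reverse=True)
    for region in by_size:
        smallest = totals.index(min(totals))
        bins[smallest].append(region)
        totals[smallest] += region[2] - region[1]
    return bins
```

Each resulting bin could then be handed to an independent variant-calling job, which is the purpose of scattering.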

2. Call small somatic variants

Call somatic variants with Mutect2.

3. Merge

Merge scattered outputs (VCFs, statistics).

4. Learn read orientations

Create artifact prior table based on read orientations with GATK's LearnReadOrientationModel.

5. Filter

Filter variants with GATK's FilterMutectCalls, using read orientation prior table and contamination table as well as standard filters.

6. Split VCF

Split filtered VCF into separate files for each variant type: SNVs, MNVs and INDELs.
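
One common way to separate the three variant types is by comparing REF and ALT allele lengths. The sketch below follows that convention as an assumption; it is not necessarily the rule the pipeline applies. The function names are hypothetical, multi-allelic records are classified by their first ALT allele only, and symbolic alleles are not handled.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"    # single base substituted
    if len(ref) == len(alt):
        return "MNV"    # several adjacent bases substituted
    return "INDEL"      # lengths differ: insertion or deletion


def split_by_type(vcf_lines):
    """Group VCF records into SNV, MNV and INDEL lists.

    Header and blank lines are skipped. Only the first ALT allele of a
    multi-allelic record is considered (a simplification).
    """
    groups = {"SNV": [], "MNV": [], "INDEL": []}
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        ref, first_alt = fields[3], fields[4].split(",")[0]
        groups[classify_variant(ref, first_alt)].append(line)
    return groups
```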

MuSE

1. MuSE call

Pre-filters sites and calculates position-specific summary statistics using the Markov substitution model.

2. MuSE sump

Computes tier-based cutoffs from a sample-specific error model.

3. Filter VCF

MuSE output has SNVs labeled as PASS or one of Tier 1-5 for the lower confidence calls (Tier 5 is lowest). This step keeps only SNVs labeled PASS.
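
Keeping only PASS calls is a filter on the FILTER column (the seventh column) of each VCF record. The following is a minimal sketch with a hypothetical function name, keep_pass, and is not the pipeline's own implementation.

```python
def keep_pass(vcf_lines):
    """Keep header lines and records whose FILTER column equals 'PASS'."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # headers are always retained
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th VCF column (0-based index 6).
        if len(fields) > 6 and fields[6] == "PASS":
            kept.append(line)
    return kept
```

Records carrying any tier label in the FILTER column are dropped by this check, which matches the behavior described above.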