Outputs

| Tool Outputs | Type | Description |
|---|---|---|
| SomaticSniper-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (somaticsniper) |
| Strelka2-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (strelka2) |
| Strelka2-{version}_{sample_id}_Indel.vcf.gz | .vcf.gz | Filtered Indel VCF (strelka2) |
| Mutect2-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (mutect2) |
| Mutect2-{version}_{sample_id}_Indel.vcf.gz | .vcf.gz | Filtered Indel VCF (mutect2) |
| Mutect2-{version}_{sample_id}_MNV.vcf.gz | .vcf.gz | Filtered MNV VCF (mutect2) |
| Mutect2-{version}_{sample_id}_filteringStats.tsv | .tsv | FilterMutectCalls output (mutect2 QC) |
| MuSE-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (MuSE) |
| report.html, timeline.html, trace.txt | .html, .txt | Nextflow logs |

| Intersect Outputs | Type | Description |
|---|---|---|
| isec-1-or-more | directory | BCFtools isec README.txt and sites.txt, all variants |
| isec-2-or-more | directory | BCFtools isec README.txt and sites.txt, variants shared by 2 or more tools |
| SomaticSniper-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
| Strelka2-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
| Mutect2-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
| MuSE-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
| BCFtools-{version}_{sample_id}_Venn-diagram.tiff | .tiff | Venn diagram with intersection counts for all variants (1-or-more) |
| BCFtools-{version}_{sample_id}_SNV-concat.vcf.gz | .vcf.gz | Single SNV VCF with all 2-or-more variants and mixed annotation |
| BCFtools-{version}_{sample_id}_SNV-concat.maf.gz | .maf.gz | Single SNV MAF with all 2-or-more variants and mixed annotation |
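
As a rough illustration of how the 1-or-more and 2-or-more sets relate, the sketch below tallies how many tools support each site listed in a BCFtools isec sites.txt. It assumes the usual tab-separated layout (chromosome, position, reference allele, alternate allele, then a string of 0/1 flags with one character per input VCF). The function name and the example rows are hypothetical and are not taken from the pipeline.

```python
from collections import Counter
from typing import Iterable


def count_sites_by_support(lines: Iterable[str]) -> Counter:
    """Tally sites by the number of input VCFs that contain them."""
    support = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Assumed layout: CHROM, POS, REF, ALT, presence flags (e.g. "1011").
        flags = line.split("\t")[-1]
        support[flags.count("1")] += 1
    return support


# Hypothetical rows, for illustration only (four input VCFs).
example_rows = [
    "chr1\t1000\tA\tG\t1000",
    "chr1\t2000\tC\tT\t1100",
    "chr2\t3000\tG\tA\t1111",
    "chr2\t4000\tT\tC\t0110",
]
support_counts = count_sites_by_support(example_rows)
# Sites seen by at least two tools correspond to the 2-or-more set.
two_or_more = sum(n for k, n in support_counts.items() if k >= 2)
print(support_counts, two_or_more)
```

An open file handle for a sites.txt can be passed in place of `example_rows`, since the function only iterates over lines.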

Performance Validation

Testing was performed on the Boutros Lab SLURM Development cluster. The metrics below will be updated, where relevant, with additional testing and tuning outputs. The pipeline version used here is v4.0.0-rc.1.

Whole Exomes

General estimates, with wide variation, are that whole exome sequences require 16 CPUs and 32 GB of memory to run all of the pipeline algorithms. If MuSE is excluded, 8 CPUs and 16 GB of memory are sufficient. Run time for a test pair of exome tumor/normal input BAMs of 4 GB/5 GB was 1 to 2 hours in both cases.

Whole Genomes

General estimates, with wide variation, are that whole genome sequences require 72 CPUs and 144 GB of memory to run all of the pipeline algorithms. If MuSE is excluded, 8 CPUs and 16 GB of memory are sufficient, but run time could be very long. Run time for a test pair of WGS tumor/normal input BAMs of 400 GB/200 GB was 15 hours with 72 CPUs/144 GB, and 52 hours with 8 CPUs/16 GB excluding MuSE.
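
To put the two whole-genome configurations side by side, the arithmetic below converts the run times quoted above into allocated CPU-hours. This is only an upper bound that assumes every allocated CPU is reserved for the whole run, and the two runs are not like-for-like because the smaller one excludes MuSE. The function name is hypothetical.

```python
def allocated_cpu_hours(cpus: int, wall_hours: float) -> float:
    """Upper bound on CPU-hours: allocated CPUs times wall-clock hours."""
    return cpus * wall_hours


# Figures quoted in the text for the WGS test pair.
all_tools = allocated_cpu_hours(cpus=72, wall_hours=15)    # all algorithms
without_muse = allocated_cpu_hours(cpus=8, wall_hours=52)  # MuSE excluded
print(all_tools, without_muse)
```

Under these assumptions the larger allocation finishes sooner but reserves more CPU-hours in total, which may matter when cluster time is the limiting resource.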

Mutect2

Duration: 3h 25m 24s

  • Process call_sSNVInAssembledChromosomes_Mutect2 has been split into 50 intervals, so the following table shows one of those processes:

| process_name | max_duration | max_cpu | max_peak_vmem |
|---|---|---|---|
| call_sSNVInNonAssembledChromosomes_Mutect2 | 32m 44s | 142.0% | 33.1 GB |
| call_sSNVInAssembledChromosomes_Mutect2 | 1h 20m 12s | 123.8% | 7.8 GB |
| run_LearnReadOrientationModel_GATK | 31m 5s | 106.8% | 10.2 GB |

SomaticSniper

Duration: 9h 21m 23s

| process_name | max_duration | max_cpu | max_peak_vmem |
|---|---|---|---|
| convert_BAM2Pileup_SAMtools | 4h 18m 29s | 98.2% | 1.9 GB |
| call_sSNV_SomaticSniper | 8h 48m 45s | 98.7% | 511.6 MB |
| generate_ReadCount_bam_readcount | 29m 33s | 75.9% | 261.5 MB |

Strelka2

Strelka2's runtime improves significantly when the --callRegions option is used to exclude the non-canonical regions of the genome. Here are the results for CPCG0196:

  • Sample: CPCG0196
  • Normal BAM: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/outputs/bwa-mem2_and_hisat2-2.2.1/bwa-mem2/bams/a-full-CPCG0196-B1/align-DNA-20210424-024139/pipeline-alignDNA.inputs.CPCG0196-B1.bam
  • Tumor BAM: /hot/resource/pipeline_testing_set/WGS/GRCh38/A/full/CPCG0000000196-T001-P01-F.bam

Without --callRegions:

| process_name | max_duration | max_cpu | max_peak_vmem |
|---|---|---|---|
| call_sIndel_Manta | 1h 24m 26s | 2724.2% | 23.2 GB |
| call_sSNV_Strelka2 | 22h 32m 24s | 511.3% | 17.4 GB |

With --callRegions:

| process_name | max_duration | max_cpu | max_peak_vmem |
|---|---|---|---|
| call_sIndel_Manta | 1h 35m 25s | 1848.6% | 11.7 GB |
| call_sSNV_Strelka2 | 59m 19s | 3234.0% | 8.2 GB |

Therefore, we strongly suggest using --callRegions if the non-canonical regions are not needed. The bed.gz input file for --callRegions can be found here: /hot/ref/tool-specific-input/Strelka2/GRCh38/strelka2_call_region.bed.gz. For other genome versions, you can use UCSC LiftOver to convert it.
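
To quantify the difference for this CPCG0196 run, the call_sSNV_Strelka2 durations from the two tables above can be converted to seconds and compared. This is a minimal sketch: the helper name is hypothetical, and it assumes durations use only the d/h/m/s units seen in this document.

```python
import re

# Seconds per unit for the duration suffixes used in the tables.
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def duration_to_seconds(text: str) -> int:
    """Convert a duration such as '22h 32m 24s' to seconds."""
    return sum(int(value) * _UNIT_SECONDS[unit]
               for value, unit in re.findall(r"(\d+)\s*([dhms])", text))


# call_sSNV_Strelka2 max_duration values from the tables above.
without_regions = duration_to_seconds("22h 32m 24s")
with_regions = duration_to_seconds("59m 19s")
speedup = without_regions / with_regions
print(f"{speedup:.1f}x")  # prints "22.8x"
```

The call_sIndel_Manta duration is roughly unchanged in the same comparison (1h 24m 26s without, 1h 35m 25s with), so the gain is concentrated in the Strelka2 SNV calling step.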

MuSE v2.0

MuSE v2.0 was tested with a normal/tumor paired CPCG0196 WGS sample on an F32 slurm-dev node.

Duration: 1d 11h 6m 54s

| process_name | max_duration | max_cpu | max_peak_vmem |
|---|---|---|---|
| call_sSNV_MuSE | 3h 44m 15s | 3181.7% | 60.4 GB |
| run_sump_MuSE | 1d 7h 22m 2s | 100.0% | 41.6 GB |