Outputs
Tool Outputs | Type | Description |
---|---|---|
SomaticSniper-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (somaticsniper) |
Strelka2-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (strelka2) |
Strelka2-{version}_{sample_id}_Indel.vcf.gz | .vcf.gz | Filtered Indel VCF (strelka2) |
Mutect2-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (mutect2) |
Mutect2-{version}_{sample_id}_Indel.vcf.gz | .vcf.gz | Filtered Indel VCF (mutect2) |
Mutect2-{version}_{sample_id}_MNV.vcf.gz | .vcf.gz | Filtered MNV VCF (mutect2) |
Mutect2-{version}_{sample_id}_filteringStats.tsv | .tsv | FilterMutectCalls output (mutect2 QC) |
MuSE-{version}_{sample_id}_SNV.vcf.gz | .vcf.gz | Filtered SNV VCF (MuSE) |
report.html, timeline.html, trace.txt | .html, .txt | Nextflow logs |
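
The naming pattern in the table above, `{Tool}-{version}_{sample_id}_{type}.vcf.gz`, can be sketched as a small helper for checking that a run produced the expected files. This is only an illustration: the function name `expected_vcfs`, the version string `x.y.z`, and the sample ID `SAMPLE01` are hypothetical placeholders, not values used by the pipeline.

```python
# Variant types each tool produces, taken from the Tool Outputs table.
TOOL_VARIANT_TYPES = {
    "SomaticSniper": ["SNV"],
    "Strelka2": ["SNV", "Indel"],
    "Mutect2": ["SNV", "Indel", "MNV"],
    "MuSE": ["SNV"],
}


def expected_vcfs(tool_versions, sample_id):
    """Return the filtered VCF names expected for the given tools.

    tool_versions maps a tool name to the version string used in the run.
    """
    names = []
    for tool, version in tool_versions.items():
        for variant_type in TOOL_VARIANT_TYPES[tool]:
            names.append(f"{tool}-{version}_{sample_id}_{variant_type}.vcf.gz")
    return names


# Placeholder versions and sample ID, for illustration only.
example = expected_vcfs({"Strelka2": "x.y.z", "MuSE": "x.y.z"}, "SAMPLE01")
for name in example:
    print(name)
```

Comparing this list against the contents of the output directory is one quick way to spot a tool that failed to emit a file.
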
Intersect Outputs | Type | Description |
---|---|---|
isec-1-or-more | directory | BCFtools isec README.txt and sites.txt, all variants |
isec-2-or-more | directory | BCFtools isec README.txt and sites.txt, variants shared by 2 or more tools |
SomaticSniper-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
Strelka2-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
Mutect2-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
MuSE-{version}_{sample_id}_consensus-variants.vcf.gz | .vcf.gz | 2-or-more SNV VCF |
BCFtools-{version}_{sample_id}_Venn-diagram.tiff | .tiff | Venn diagram with intersection counts for all variants (1-or-more) |
BCFtools-{version}_{sample_id}_SNV-concat.vcf.gz | .vcf.gz | Single SNV VCF with all 2-or-more variants and mixed annotation |
BCFtools-{version}_{sample_id}_SNV-concat.maf.gz | .maf.gz | Single SNV MAF with all 2-or-more variants and mixed annotation |
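
The difference between the 1-or-more and 2-or-more sets can be illustrated with plain Python sets. This is a conceptual sketch only: the pipeline itself uses BCFtools isec, and the variant tuples and the function name `consensus_sites` below are made up for the example.

```python
from collections import Counter


def consensus_sites(calls_by_tool, min_tools=2):
    """Return the sites reported by at least `min_tools` tools.

    calls_by_tool maps a tool name to a set of (chrom, pos, ref, alt) tuples.
    """
    counts = Counter()
    for sites in calls_by_tool.values():
        # set() guards against a site being listed twice by the same tool.
        counts.update(set(sites))
    return {site for site, n in counts.items() if n >= min_tools}


# Invented example calls, not real pipeline output.
calls = {
    "SomaticSniper": {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")},
    "Strelka2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "Mutect2": {("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")},
    "MuSE": {("chr4", 400, "T", "A")},
}

one_or_more = consensus_sites(calls, min_tools=1)  # union of all calls
two_or_more = consensus_sites(calls, min_tools=2)  # consensus set
print(len(one_or_more), len(two_or_more))
```

In this toy input, four distinct sites are called in total and two of them are shared by at least two tools, which mirrors the split between the `isec-1-or-more` and `isec-2-or-more` directories.
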
Performance Validation
Testing was performed on the Boutros Lab SLURM Development cluster. The metrics below will be updated where relevant with additional testing and tuning outputs. The pipeline version used here is v4.0.0-rc.1.
Whole Exomes
General estimates, with wide variation, are that whole exome sequences require 16 CPUs and 32 GB of memory to run all of the pipeline algorithms. If MuSE is excluded, 8 CPUs and 16 GB of memory are sufficient. Run time for a test pair of exome tumor/normal input BAMs of 4 GB/5 GB was 1 to 2 hours in both cases.
Whole Genomes
General estimates, with wide variation, are that whole genome sequences require 72 CPUs and 144 GB of memory to run all of the pipeline algorithms. If MuSE is excluded, 8 CPUs and 16 GB of memory are sufficient, but run time could be very long. Run time for a test pair of WGS tumor/normal input BAMs of 400 GB/200 GB was 15 hours with 72 CPUs/144 GB, and 52 hours with 8 CPUs/16 GB excluding MuSE.
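
The exome and genome estimates above can be collected into a small lookup for checking whether a node is large enough before submitting a run. This is a rough sketch: the names `RESOURCE_ESTIMATES`, `estimate_resources`, and `fits_node` are hypothetical, and the numbers are the approximate figures quoted in this section, not hard requirements.

```python
# (cpus, memory_gb) keyed by (data_type, include_muse), from the text above.
RESOURCE_ESTIMATES = {
    ("exome", True): (16, 32),
    ("exome", False): (8, 16),
    ("genome", True): (72, 144),
    ("genome", False): (8, 16),  # sufficient, but run time could be very long
}


def estimate_resources(data_type, include_muse=True):
    """Return the (cpus, memory_gb) estimate for a run."""
    return RESOURCE_ESTIMATES[(data_type, include_muse)]


def fits_node(data_type, include_muse, node_cpus, node_memory_gb):
    """Check whether a node meets the estimated CPU and memory needs."""
    cpus, memory_gb = estimate_resources(data_type, include_muse)
    return node_cpus >= cpus and node_memory_gb >= memory_gb


# Example: a hypothetical node with 32 CPUs and 64 GB of memory.
print(fits_node("exome", True, 32, 64))
print(fits_node("genome", True, 32, 64))
```
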
Mutect2
Duration: 3h 25m 24s
- Process `call_sSNVInAssembledChromosomes_Mutect2` has been split into 50 intervals, so the following table shows one of those processes:
process_name | max_duration | max_cpu | max_peak_vmem |
---|---|---|---|
call_sSNVInNonAssembledChromosomes_Mutect2 | 32m 44s | 142.0% | 33.1 GB |
call_sSNVInAssembledChromosomes_Mutect2 | 1h 20m 12s | 123.8% | 7.8 GB |
run_LearnReadOrientationModel_GATK | 31m 5s | 106.8% | 10.2 GB |
SomaticSniper
Duration: 9h 21m 23s
process_name | max_duration | max_cpu | max_peak_vmem |
---|---|---|---|
convert_BAM2Pileup_SAMtools | 4h 18m 29s | 98.2% | 1.9 GB |
call_sSNV_SomaticSniper | 8h 48m 45s | 98.7% | 511.6 MB |
generate_ReadCount_bam_readcount | 29m 33s | 75.9% | 261.5 MB |
Strelka2
Strelka2's runtime improves significantly when the `--callRegions` option is used to exclude the non-canonical regions of the genome. Here are the results for CPCG0196:
Sample: CPCG0196
Normal BAM: /hot/software/pipeline/pipeline-align-DNA/Nextflow/development/outputs/bwa-mem2_and_hisat2-2.2.1/bwa-mem2/bams/a-full-CPCG0196-B1/align-DNA-20210424-024139/pipeline-alignDNA.inputs.CPCG0196-B1.bam
Tumor BAM: /hot/resource/pipeline_testing_set/WGS/GRCh38/A/full/CPCG0000000196-T001-P01-F.bam
Without `--callRegions`:
process_name | max_duration | max_cpu | max_peak_vmem |
---|---|---|---|
call_sIndel_Manta | 1h 24m 26s | 2724.2% | 23.2 GB |
call_sSNV_Strelka2 | 22h 32m 24s | 511.3% | 17.4 GB |
With `--callRegions`:
process_name | max_duration | max_cpu | max_peak_vmem |
---|---|---|---|
call_sIndel_Manta | 1h 35m 25s | 1848.6% | 11.7 GB |
call_sSNV_Strelka2 | 59m 19s | 3234.0% | 8.2 GB |
Therefore, we strongly suggest using `--callRegions` if the non-canonical regions are unnecessary. The input `bed.gz` file for `--callRegions` can be found here: `/hot/ref/tool-specific-input/Strelka2/GRCh38/strelka2_call_region.bed.gz`. For other genome versions, you can use UCSC LiftOver to convert it.
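
To put a number on the improvement, the `max_duration` values from the two Strelka2 tables can be converted to seconds and compared. The sketch below assumes durations are written with `d`, `h`, `m`, and `s` suffixes as in the tables; the helper name `parse_duration` is hypothetical. It covers only the `call_sSNV_Strelka2` process; `call_sIndel_Manta` was slightly slower in the run with `--callRegions`.

```python
import re

# Seconds represented by each unit suffix used in the tables.
UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text):
    """Convert a duration such as '1d 7h 22m 2s' to seconds."""
    parts = re.findall(r"(\d+)([dhms])", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


# call_sSNV_Strelka2 durations from the tables above.
without_regions = parse_duration("22h 32m 24s")
with_regions = parse_duration("59m 19s")

speedup = without_regions / with_regions
print(f"call_sSNV_Strelka2 speedup with --callRegions: {speedup:.1f}x")
```

For this sample the SNV calling step ran roughly 23 times faster with `--callRegions`, a single-sample observation rather than a general guarantee.
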
MuSE v2.0
MuSE v2.0 was tested with a normal/tumor paired CPCG0196 WGS sample on an F32 slurm-dev node. Duration: 1d 11h 6m 54s
process_name | max_duration | max_cpu | max_peak_vmem |
---|---|---|---|
call_sSNV_MuSE | 3h 44m 15s | 3181.7% | 60.4 GB |
run_sump_MuSE | 1d 7h 22m 2s | 100.0% | 41.6 GB |