Overview

The align-DNA Nextflow pipeline aligns paired-end data using BWA-MEM2 and/or HISAT2, Picard Tools and SAMtools. The pipeline has been engineered to run in a four-layer stack in a scalable cloud-based environment: CycleCloud, Slurm, Nextflow and Docker. It has been validated with the SMC-HET dataset and the GRCh38 reference, using paired-end FASTQ files created with BAMSurgeon.

The pipeline should be run WITH A SINGLE SAMPLE AT A TIME. Otherwise, resource allocation and Nextflow errors could cause the pipeline to fail.

Developer's Notes:

For some reads with low mapping quality, BWA-MEM2 assigns different genomic positions depending on the number of CPUs used. To reproduce a run exactly, set the same number of CPUs (bwa_mem_number_of_cpus) as in the original run.

BWA-MEM2 currently supports only five CPU instruction sets: AVX, AVX2, AVX512, SSE4.1 and SSE4.2. However, we have only tested the pipeline on AVX2 and AVX512 CPUs.
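
Before submitting a job, it can be useful to check which of these instruction sets a node actually provides. The following is a minimal Python sketch, not part of the pipeline, that reads the Linux /proc/cpuinfo flags. The function names are hypothetical, and the mapping of "AVX512" to the avx512bw flag is an assumption; adjust it if your BWA-MEM2 build targets a different AVX512 subset.

```python
from pathlib import Path

# Instruction-set names as written in this README, mapped to the flag names
# Linux reports in /proc/cpuinfo. Using "avx512bw" for AVX512 is an assumption.
FLAG_FOR_INSTRUCTION_SET = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AVX": "avx",
    "AVX2": "avx2",
    "AVX512": "avx512bw",
}

# Instruction sets the pipeline has been tested on, per the note above.
TESTED = ("AVX2", "AVX512")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def supported_instruction_sets(cpuinfo_text):
    """List the BWA-MEM2-relevant instruction sets found in the cpuinfo text."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [name for name, flag in FLAG_FOR_INSTRUCTION_SET.items() if flag in flags]


def has_tested_instruction_set(cpuinfo_text):
    """True if the CPU offers at least one instruction set the pipeline was tested on."""
    return any(name in TESTED for name in supported_instruction_sets(cpuinfo_text))


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        text = cpuinfo.read_text()
        print("Supported:", supported_instruction_sets(text))
        print("Tested set available:", has_tested_instruction_set(text))
    else:
        print("/proc/cpuinfo not found; this sketch only works on Linux.")
```

Run the script on a compute node, not the login node, since the two may have different CPUs.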

We benchmarked the pipeline on our Slurm cluster. Using 56 CPUs for alignment (bwa_mem_number_of_cpus) gave the best performance. See Testing and Validation.