Overview
The align-DNA Nextflow pipeline, aligns paired-end data utilizing BWA-MEM2 and/or HISAT2, Picard Tools and SAMtools. The pipeline has been engineered to run in a 4 layer stack in a cloud-based scalable environment of CycleCloud, Slurm, Nextflow and Docker. Additionally, it has been validated with the SMC-HET dataset and reference GRCh38, where paired-end fastq’s were created with BAM Surgeon.
The pipeline should be run WITH A SINGLE SAMPLE AT A TIME. Otherwise resource allocation and Nextflow errors could cause the pipeline to fail.
Developer's Notes:
For some reads with low mapping qualities, BWA-MEM2 assigns them to different genomic positions when using different CPU-numbers. If you want to 100% reproduce a run, the same CPU-number (
bwa_mem_number_of_cpus
) needs to be set.BWA-MEM2 now only supports five CPU instruction set, AVX, AVX2, AVX512, SSE4.1 and SSE4.2. However we only tested the pipeline on AVX2 and AVX512 CPUs.
We performed a benchmarking on our SLURM cluster. Using 56 CPUs for alignment (
bwa_mem_number_of_cpus
) gives it the best performance. See Testing and Validation.