Testing and Validation

Test Data Set

This pipeline was tested using the synthesized SMC-HET dataset and a multi-lane real sample, CPCG0196-B1, with reference genome version GRCh38. Benchmarking compared BWA-MEM2 v2.1, v2.0, and the original BWA. BWA-MEM2 cuts the runtime roughly in half compared with the original BWA while producing an almost identical output BAM. See here for the benchmarking.

Validation <10.0.0>

Metric	Result
raw total sequences	1.0000000
filtered sequences	NaN
sequences	1.0000000
is sorted	1.0000000
1st fragments	1.0000000
last fragments	1.0000000
reads mapped	1.0000000
reads mapped and paired	1.0000001
reads unmapped	0.9999950
reads properly paired	0.9999999
reads paired	1.0000000
reads duplicated	0.9999949
reads MQ0	1.0000009
reads QC failed	NaN
non-primary alignments	0.9999757
total length	1.0000000
bases mapped	1.0000000
bases mapped (cigar)	1.0000000
bases trimmed	NaN
bases duplicated	0.9999958
mismatches	0.9999987
error rate	0.9999987
average length	1.0000000
maximum length	1.0000000
average quality	1.0000000
insert size average	1.0000000
insert size standard deviation	1.0000000
inward oriented pairs	0.9999991
outward oriented pairs	1.0000477
pairs with other orientation	0.9999726
pairs on different chromosomes	1.0000416
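
The table does not state how its values were derived. The metric names match the summary numbers (`SN` lines) of a `samtools stats` report, so one plausible reading, assumed here and not confirmed by the source, is that each value is the ratio of a metric between two pipeline versions run on the same sample. Under that assumption, a minimal sketch of such a comparison might look like the following. The function names and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def parse_summary_numbers(report_text):
    """Collect the 'SN' (summary number) lines of a samtools stats report.

    Each SN line is tab-separated: 'SN', 'metric name:', value and,
    for some metrics, a trailing '# comment' field.
    """
    metrics = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[0] == "SN":
            name = fields[1].rstrip(":")
            metrics[name] = float(fields[2])
    return metrics


def metric_ratios(new_metrics, old_metrics):
    """Return new/old for every metric present in both reports.

    A zero denominator gives NaN when both values are zero and
    infinity otherwise, so the division never raises.
    """
    ratios = {}
    for name, old_value in old_metrics.items():
        if name not in new_metrics:
            continue
        new_value = new_metrics[name]
        if old_value == 0:
            ratios[name] = math.nan if new_value == 0 else math.inf
        else:
            ratios[name] = new_value / old_value
    return ratios


# Invented numbers for illustration only; not results from this pipeline.
old_report = (
    "SN\traw total sequences:\t1000\n"
    "SN\tfiltered sequences:\t0\n"
    "SN\treads unmapped:\t200\t# hypothetical comment\n"
)
new_report = (
    "SN\traw total sequences:\t1000\n"
    "SN\tfiltered sequences:\t0\n"
    "SN\treads unmapped:\t199\n"
)

ratios = metric_ratios(
    parse_summary_numbers(new_report),
    parse_summary_numbers(old_report),
)
for name, value in ratios.items():
    print(f"{name}\t{value:.7f}")
```

Read this way, a value of 1.0000000 means the two versions agree exactly on that metric, and a metric that is zero in both reports has no defined ratio, which would account for the NaN entries.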

Validation Tool

A template for validating your input files is included. For more information on the tool, see https://github.com/uclahs-cds/package-PipeVal