Testing and Validation
Test Data Set
This pipeline was tested on the synthesized SMC-HET dataset and on a multi-lane real sample, CPCG0196-B1, using reference genome version GRCh38. Some benchmarking has also been done comparing BWA-MEM2 v2.1, BWA-MEM2 v2.0, and the original BWA. BWA-MEM2 reduces runtime by approximately half compared to the original BWA, while producing an almost identical output BAM. See here for the benchmarking.
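One way to put a number on "almost identical" output is to check how many primary reads land at the same place in both alignments. The sketch below is an illustration, not the benchmarking method used here. It assumes both alignments have been converted to SAM text, for example with `samtools view -h`, and it treats a read as concordant when its RNAME, POS and CIGAR match. The function names `alignment_keys` and `concordance` are hypothetical.

```python
import math


def alignment_keys(sam_text):
    """Map (QNAME, mate bits) -> (RNAME, POS, CIGAR) for primary records in SAM text."""
    keys = {}
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip empty lines and header lines
        fields = line.split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        # 0x40 / 0x80 distinguish the first and last segment of a pair
        read_id = (fields[0], flag & 0xC0)
        keys[read_id] = (fields[2], int(fields[3]), fields[5])
    return keys


def concordance(sam_a, sam_b):
    """Fraction of primary reads in sam_a placed identically in sam_b (NaN if sam_a is empty)."""
    a, b = alignment_keys(sam_a), alignment_keys(sam_b)
    if not a:
        return math.nan
    same = sum(1 for read_id, placement in a.items() if b.get(read_id) == placement)
    return same / len(a)
```

Keying on read name plus mate bits makes the comparison independent of record order, so the two files do not need to be sorted the same way.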
Validation <10.0.0>
Metric | Result |
---|---|
raw total sequences | 1.0000000 |
filtered sequences | NaN |
sequences | 1.0000000 |
is sorted | 1.0000000 |
1st fragments | 1.0000000 |
last fragments | 1.0000000 |
reads mapped | 1.0000000 |
reads mapped and paired | 1.0000001 |
reads unmapped | 0.9999950 |
reads properly paired | 0.9999999 |
reads paired | 1.0000000 |
reads duplicated | 0.9999949 |
reads MQ0 | 1.0000009 |
reads QC failed | NaN |
non-primary alignments | 0.9999757 |
total length | 1.0000000 |
bases mapped | 1.0000000 |
bases mapped (cigar) | 1.0000000 |
bases trimmed | NaN |
bases duplicated | 0.9999958 |
mismatches | 0.9999987 |
error rate | 0.9999987 |
average length | 1.0000000 |
maximum length | 1.0000000 |
average quality | 1.0000000 |
insert size average | 1.0000000 |
insert size standard deviation | 1.0000000 |
inward oriented pairs | 0.9999991 |
outward oriented pairs | 1.0000477 |
pairs with other orientation | 0.9999726 |
pairs on different chromosomes | 1.0000416 |
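The Result values behave like ratios between two runs. Rows whose underlying counts are normally zero (filtered sequences, reads QC failed, bases trimmed) show NaN, which is what 0/0 produces. The metric names also match the SN (summary numbers) section of `samtools stats`. Building on those two observations, the sketch below shows one way such a table could be regenerated. It assumes each Result is the new run's value divided by a reference run's value. The function names `parse_sn`, `ratio_table` and `format_row` are hypothetical and not part of this pipeline.

```python
import math


def parse_sn(stats_text):
    """Collect the SN (summary numbers) lines of `samtools stats` output into a dict."""
    metrics = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue
        fields = line.split("\t")
        # fields[1] looks like "reads mapped:", fields[2] holds the value,
        # and an optional fields[3] carries a "# ..." comment.
        metrics[fields[1].rstrip(":")] = float(fields[2])
    return metrics


def ratio_table(new_text, ref_text):
    """Assumed definition of Result: new / reference for every reference metric."""
    new, ref = parse_sn(new_text), parse_sn(ref_text)
    table = {}
    for name, ref_val in ref.items():
        new_val = new.get(name)
        if new_val is None:
            table[name] = math.nan  # metric missing from the new run
        elif ref_val == 0:
            # 0/0 is undefined (NaN); a nonzero value over zero is infinite
            table[name] = math.nan if new_val == 0 else math.inf
        else:
            table[name] = new_val / ref_val
    return table


def format_row(name, value):
    """Render one row in the 'metric | value |' layout of the table above."""
    shown = "NaN" if math.isnan(value) else f"{value:.7f}"
    return f"{name} | {shown} |"
```

For example, with toy inputs where the reference run maps 1990 reads and the new run maps 1991, `format_row("reads mapped", ...)` yields `reads mapped | 1.0005025 |`.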
Validation Tool
A template for validating your input files is included. For more information on the tool, see https://github.com/uclahs-cds/package-PipeVal