Testing and Validation
Test Data Set
This pipeline was tested on the synthesized SMC-HET dataset and on a multi-lane real sample, CPCG0196-B1, using reference genome version GRCh38. BWA-MEM2 v2.1 and v2.0 were benchmarked against the original BWA. BWA-MEM2 cuts the runtime roughly in half compared with the original BWA, while producing an almost identical output BAM. See here for the benchmarking.
Validation \<10.0.0>
| Metric | Result |
|---|---|
| raw total sequences | 1.0000000 |
| filtered sequences | NaN |
| sequences | 1.0000000 |
| is sorted | 1.0000000 |
| 1st fragments | 1.0000000 |
| last fragments | 1.0000000 |
| reads mapped | 1.0000000 |
| reads mapped and paired | 1.0000001 |
| reads unmapped | 0.9999950 |
| reads properly paired | 0.9999999 |
| reads paired | 1.0000000 |
| reads duplicated | 0.9999949 |
| reads MQ0 | 1.0000009 |
| reads QC failed | NaN |
| non-primary alignments | 0.9999757 |
| total length | 1.0000000 |
| bases mapped | 1.0000000 |
| bases mapped (cigar) | 1.0000000 |
| bases trimmed | NaN |
| bases duplicated | 0.9999958 |
| mismatches | 0.9999987 |
| error rate | 0.9999987 |
| average length | 1.0000000 |
| maximum length | 1.0000000 |
| average quality | 1.0000000 |
| insert size average | 1.0000000 |
| insert size standard deviation | 1.0000000 |
| inward oriented pairs | 0.9999991 |
| outward oriented pairs | 1.0000477 |
| pairs with other orientation | 0.9999726 |
| pairs on different chromosomes | 1.0000416 |
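
The metric names in the table match the summary numbers (`SN` rows) reported by `samtools stats`. The source does not state how the Result values were computed. Assuming each one is the ratio of a metric between two runs, the comparison could be sketched in Python as below. The function names `parse_summary_numbers` and `metric_ratios` are hypothetical and not part of the pipeline, and treating a zero baseline as NaN is also an assumption.

```python
import math


def parse_summary_numbers(stats_text):
    """Collect the 'SN' (summary numbers) rows of `samtools stats` output.

    Each SN row is tab-separated: 'SN', '<metric name>:', '<value>',
    optionally followed by a '# comment' field.
    """
    metrics = {}
    for line in stats_text.splitlines():
        if not line.startswith("SN\t"):
            continue  # skip comments and the other sections of the report
        fields = line.split("\t")
        name = fields[1].rstrip(":")
        metrics[name] = float(fields[2])
    return metrics


def metric_ratios(new, old):
    """Return new/old for every metric present in both runs.

    Assumption: a zero baseline is reported as NaN instead of raising
    ZeroDivisionError.
    """
    ratios = {}
    for name, new_value in new.items():
        if name not in old:
            continue
        old_value = old[name]
        ratios[name] = new_value / old_value if old_value != 0 else math.nan
    return ratios


if __name__ == "__main__":
    # Tiny made-up inputs, for illustration only (not real pipeline output).
    run_a = "SN\traw total sequences:\t200\nSN\treads QC failed:\t0\n"
    run_b = "SN\traw total sequences:\t200\nSN\treads QC failed:\t0\n"
    ratios = metric_ratios(parse_summary_numbers(run_a),
                           parse_summary_numbers(run_b))
    for name, value in ratios.items():
        print(f"| {name} | {value:.7f} |")
```

Under this assumption, a value close to 1 means the two runs agree on that metric, and NaN would appear where the baseline count is zero.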
Validation Tool
Included is a template for validating your input files. For more information on the tool, see https://github.com/uclahs-cds/package-PipeVal