Example:
---
patient_id: 'patient_id'
dataset_id: 'dataset_id'
input:
  normal:
    - path: /absolute/path/to/normal.bam
      read_length: length
  tumor:
    - path: /absolute/path/to/tumor.bam
      read_length: length
Config
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| algorithm | list | no | List of tools to be run: ['fastqc', 'samtools_stats', 'collectwgsmetrics', 'collecthsmetrics', 'mosdepth_coverage', 'mosdepth_quantize', 'qualimap_bamqc'], default = ['stats', 'collectwgsmetrics'] |
| reference | path | yes/no | Reference fasta is required only for CollectWgsMetrics and CollectHsMetrics |
| intervals_bed | path | no | Absolute path to BED file with intervals to process |
| output_dir | path | yes | Not required if blcds_registered_dataset = true |
| blcds_registered_dataset | boolean | no | Default is false. Only uclahs_cds users should change this. When true, BLCDS folder structure is used |
| work_dir | path | no | Path of working directory for Nextflow. When included, Nextflow intermediate files and logs will be saved to this directory. With uclahs_cds = true, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
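As a rough illustration of the fields above, the following sketch shows how they might be set in a Nextflow config. It assumes the fields live inside a params scope, which is the usual Nextflow convention but is not stated in this section, and every path is a placeholder.

```groovy
// Minimal sketch; the params scope and all paths are assumptions for illustration.
params {
    // Tools to run; names are taken from the list in the table above
    algorithm = ['samtools_stats', 'collectwgsmetrics']

    // Needed here because collectwgsmetrics is in the algorithm list
    reference = '/absolute/path/to/reference.fa'

    // Required unless blcds_registered_dataset = true
    output_dir = '/absolute/path/to/output'

    blcds_registered_dataset = false
}
```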
SAMtools stats specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| stats_max_rgs_per_sample | integer | no | If a sample has more than this number of readgroups, SAMtools stats will not run per-readgroup analysis. Default = 20 |
| stats_max_libs_per_sample | integer | no | If a sample has more than this number of libraries, SAMtools stats will not run per-library analysis. Default = 20 |
| stats_remove_duplicates | boolean | no | Ignore reads marked as duplicate. Default = false |
| stats_additional_options | string | no | Any additional options recognized by samtools stats |
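A short sketch of how the SAMtools stats fields might be combined, again assuming a params scope. The two limits are left at their documented defaults, and only duplicate handling is changed.

```groovy
// Sketch only; the params scope is an assumption.
params {
    algorithm = ['samtools_stats']

    stats_max_rgs_per_sample = 20   // documented default
    stats_max_libs_per_sample = 20  // documented default

    // Ignore reads marked as duplicate (the documented default is false)
    stats_remove_duplicates = true
}
```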
Picard CollectWgsMetrics specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| cwm_coverage_cap | integer | no | Cap coverage at this value. Default = 250 |
| cwm_minimum_mapping_quality | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
| cwm_minimum_base_quality | integer | no | Ignore bases with quality below this value. Default = 20 |
| cwm_use_fast_algorithm | boolean | no | If true, fast algorithm is used |
| cwm_additional_options | string | no | Any additional options recognized by CollectWgsMetrics |
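The CollectWgsMetrics fields could be set along these lines. This is a sketch under the assumption of a params scope; the reference path is a placeholder, and the default of cwm_use_fast_algorithm is not given above, so it is set explicitly here.

```groovy
// Sketch only; the params scope and the reference path are assumptions.
params {
    algorithm = ['collectwgsmetrics']
    reference = '/absolute/path/to/reference.fa'  // required for CollectWgsMetrics

    cwm_coverage_cap = 250            // documented default
    cwm_minimum_mapping_quality = 20  // documented default
    cwm_minimum_base_quality = 20     // documented default
    cwm_use_fast_algorithm = true     // default not stated in the table
}
```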
Picard CollectHsMetrics specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| chm_bait_intervals_bed | path | no | If not defined, intervals_bed will be used |
| chm_coverage_cap | integer | no | Cap coverage at this value. Default = 250 |
| chm_minimum_mapping_quality | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
| chm_minimum_base_quality | integer | no | Ignore bases with quality below this value. Default = 20 |
| chm_per_base_output | boolean | no | Default = false |
| chm_additional_options | string | no | Any additional options recognized by CollectHsMetrics |
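To make the bait-interval fallback concrete, here is a hedged sketch for a targeted-sequencing run. The params scope and all file paths are assumptions; the only behaviour relied on is the one described in the table, namely that intervals_bed is used when chm_bait_intervals_bed is not defined.

```groovy
// Sketch only; the params scope and all paths are placeholders.
params {
    algorithm = ['collecthsmetrics']
    reference = '/absolute/path/to/reference.fa'  // required for CollectHsMetrics

    // Intervals to process
    intervals_bed = '/absolute/path/to/targets.bed'

    // Optional: omit this line to fall back to intervals_bed
    chm_bait_intervals_bed = '/absolute/path/to/baits.bed'

    chm_per_base_output = false  // documented default
}
```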
FastQC specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| fastqc_level | string | yes | 'readgroup', 'library' or 'sample' |
| fastqc_additional_options | string | no | Any additional options recognized by FastQC |
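Since fastqc_level is marked as required, a FastQC run might be configured as in the sketch below. The params scope is an assumption, and 'sample' is simply one of the three documented choices.

```groovy
// Sketch only; the params scope is an assumption.
params {
    algorithm = ['fastqc']

    // One of 'readgroup', 'library' or 'sample'
    fastqc_level = 'sample'
}
```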
Qualimap specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| bamqc_output_format | string | no | Choice of 'pdf' or 'html', default = 'html'. HTML output is needed for MultiQC |
| bamqc_additional_options | string | no | Any additional options recognized by bamqc |
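A minimal Qualimap sketch, assuming a params scope. It keeps the 'html' format because the table notes that HTML output is what MultiQC needs.

```groovy
// Sketch only; the params scope is an assumption.
params {
    algorithm = ['qualimap_bamqc']

    // 'pdf' or 'html'; keep 'html' if the results should feed into MultiQC
    bamqc_output_format = 'html'
}
```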
mosdepth window-based coverage specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| mosdepth_windows | integer | no | Window size for mosdepth window-based coverage calculations. Not used if intervals_bed is defined. Default = 500 |
| mosdepth_use_fast_algorithm | boolean | no | The fast algorithm ignores read pair overlaps and CIGARs. It should not be used on libraries with small insert sizes. Default = false |
| mosdepth_per_base_output | boolean | no | Output coverage for every base. Default = true |
| mosdepth_additional_options | string | no | Any additional options recognized by mosdepth. --mapq 20 recommended |
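The window-based mosdepth fields might be combined as follows. This is a sketch assuming a params scope; it leaves intervals_bed undefined so that mosdepth_windows takes effect, and passes the recommended --mapq 20 through the additional options string.

```groovy
// Sketch only; the params scope is an assumption.
params {
    algorithm = ['mosdepth_coverage']

    // Only used when intervals_bed is not defined
    mosdepth_windows = 500               // documented default

    mosdepth_use_fast_algorithm = false  // documented default
    mosdepth_per_base_output = true      // documented default

    // Recommended in the table above
    mosdepth_additional_options = '--mapq 20'
}
```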
mosdepth quantize specific configuration
| Field | Type | Required | Description |
|:------|:-----|:---------|:------------|
| mosdepth_quantize_cutoffs | string | no | Cutoffs for coverage regions. Default = 0:1:5:150 |
| mosdepth_quantize_use_fast_algorithm | boolean | no | The fast algorithm ignores read pair overlaps and CIGARs. It should not be used on libraries with small insert sizes. Default = false |
| mosdepth_q0_label | string | no | Label for the lowest coverage regions. Default = Q0 |
| mosdepth_q1_label | string | no | Label for the next coverage regions. Default = Q1 |
| mosdepth_q2_label | string | no | Label for the next coverage regions. Default = Q2 |
| mosdepth_q3_label | string | no | Label for the highest coverage regions. Default = Q3 |
| mosdepth_quantize_additional_options | string | no | Any additional options recognized by mosdepth. --mapq 20 recommended |
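For the quantize step, descriptive labels can make the output easier to read than Q0 to Q3. The sketch below assumes a params scope; the label strings are illustrative values chosen for this example, not pipeline defaults.

```groovy
// Sketch only; the params scope is an assumption and the labels are illustrative.
params {
    algorithm = ['mosdepth_quantize']

    mosdepth_quantize_cutoffs = '0:1:5:150'  // documented default

    // Replace the default Q0..Q3 labels with descriptive names
    mosdepth_q0_label = 'NO_COVERAGE'
    mosdepth_q1_label = 'LOW_COVERAGE'
    mosdepth_q2_label = 'CALLABLE'
    mosdepth_q3_label = 'HIGH_COVERAGE'

    // Recommended in the table above
    mosdepth_quantize_additional_options = '--mapq 20'
}
```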
Base resource allocation updaters
The default resource allocations for each process can be found in the node-specific config files. To update the base allocations (cpus or memory) for processes, use the following structure:
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
Note: Resource updates are applied in the order they are provided, so if a process appears twice in the memory list, it is updated twice, in the order given.
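To illustrate the ordering rule, here is a sketch in which the same process appears twice in the memory list. Assuming each entry multiplies whatever allocation the previous entry left behind, the two updates would compound to four times the base memory. That compounding is an inference from the note above rather than something this section states explicitly.

```groovy
// Sketch only; the compounding behaviour described in the comments is an assumption.
base_resource_update {
    memory = [
        // First update: base memory x 2
        [['run_CollectWgsMetrics_Picard'], 2],
        // Second update, applied to the already-doubled value: x 2 again
        [['run_CollectWgsMetrics_Picard'], 2]
    ]
}
```

If a single larger multiplier is intended, listing the process once is the less error-prone way to express it.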
Examples:
- To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
- To double memory for run_CollectWgsMetrics_Picard and triple memory for run_statsSamples_SAMtools and run_bamqc_Qualimap:
base_resource_update {
    memory = [
        ['run_CollectWgsMetrics_Picard', 2],
        [['run_statsSamples_SAMtools', 'run_bamqc_Qualimap'], 3]
    ]
}
- To double CPUs and memory for run_CollectWgsMetrics_Picard and double memory for run_statsSamples_SAMtools:
base_resource_update {
    cpus = [
        ['run_CollectWgsMetrics_Picard', 2]
    ]
    memory = [
        [['run_CollectWgsMetrics_Picard', 'run_statsSamples_SAMtools'], 2]
    ]
}