Inputs
Input YAML
Example:
---
patient_id: 'patient_id'
dataset_id: 'dataset_id'
input:
  normal:
    - path: /absolute/path/to/normal.bam
      read_length: length
  tumor:
    - path: /absolute/path/to/tumor.bam
      read_length: length
Config
Field | Type | Required | Description |
---|---|---|---|
algorithms | list | no | List of tools to run, chosen from ['stats', 'collectwgsmetrics', 'bamqc']. Default = ['stats', 'collectwgsmetrics'] |
reference | path | yes/no | Reference FASTA. Required only when running CollectWgsMetrics |
output_dir | path | yes | Path to the output directory. Not required if blcds_registered_dataset = true |
blcds_registered_dataset | boolean | no | Default is false. Only uclahs_cds users should change this. When true, the BLCDS folder structure is used |
work_dir | path | no | Path of the working directory for Nextflow. When included, Nextflow intermediate files and logs are saved to this directory. With uclahs_cds = true, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
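
As a rough illustration of the fields above, a minimal config fragment might look like the following. This is a sketch only: it assumes the fields are set inside a `params` scope of the Nextflow config file, which this section does not state, and every path is a placeholder.

```groovy
// Sketch only: the params scope and all paths are assumptions
params {
    // Run SAMtools stats and Picard CollectWgsMetrics (the documented default)
    algorithms = ['stats', 'collectwgsmetrics']

    // Needed here because 'collectwgsmetrics' is in the algorithms list
    reference = '/absolute/path/to/reference.fa'

    // Required because blcds_registered_dataset is left at its default (false)
    output_dir = '/absolute/path/to/output'
    blcds_registered_dataset = false
}
```

If `algorithms` contained only `'stats'` and `'bamqc'`, the table above suggests `reference` could be omitted.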
SAMtools specific configuration
Field | Type | Required | Description |
---|---|---|---|
remove_duplicates | boolean | no | Ignore reads marked as duplicate. Default = false |
samtools_stats_additional_options | string | no | Any additional options recognized by samtools stats |
Picard specific configuration
Field | Type | Required | Description |
---|---|---|---|
cwm_coverage_cap | integer | no | Cap coverage at this value. Default = 250 |
cwm_minimum_mapping_quality | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
cwm_minimum_base_quality | integer | no | Ignore bases with quality below this value. Default = 20 |
cwm_use_fast_algorithm | boolean | no | If true, the fast algorithm is used |
cwm_additional_options | string | no | Any additional options recognized by CollectWgsMetrics |
Qualimap specific configuration
Field | Type | Required | Description |
---|---|---|---|
bamqc_outformat | string | no | Choice of 'pdf' or 'html'. Default = 'pdf' |
bamqc_additional_options | string | no | Any additional options recognized by bamqc |
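
The tool-specific fields from the SAMtools, Picard, and Qualimap tables above could be combined along these lines. As before, this is a hedged sketch that assumes the fields live in a `params` scope; the non-default values are chosen only for illustration, and the `*_additional_options` fields are left out because their exact pass-through format is not described here.

```groovy
// Sketch only: the params scope is an assumption and values are illustrative
params {
    // Enable all three tools so every setting below takes effect
    algorithms = ['stats', 'collectwgsmetrics', 'bamqc']

    // SAMtools stats: ignore reads marked as duplicate (default is false)
    remove_duplicates = true

    // Picard CollectWgsMetrics: documented defaults, written out explicitly
    cwm_coverage_cap = 250
    cwm_minimum_mapping_quality = 20
    cwm_minimum_base_quality = 20
    cwm_use_fast_algorithm = true

    // Qualimap bamqc: report as HTML instead of the default PDF
    bamqc_outformat = 'html'
}
```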
Base resource allocation updaters
To update the base resource (cpus or memory) allocations for processes, use the following structure. The default allocations can be found in the node-specific config files.
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given.
Examples:
- To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
- To double memory for run_CollectWgsMetrics_Picard and triple memory for run_stats_SAMtools and run_bamqc_Qualimap:
base_resource_update {
    memory = [
        ['run_CollectWgsMetrics_Picard', 2],
        [['run_stats_SAMtools', 'run_bamqc_Qualimap'], 3]
    ]
}
- To double CPUs and memory for run_CollectWgsMetrics_Picard and double memory for run_stats_SAMtools:
base_resource_update {
    cpus = [
        ['run_CollectWgsMetrics_Picard', 2]
    ]
    memory = [
        [['run_CollectWgsMetrics_Picard', 'run_stats_SAMtools'], 2]
    ]
}
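
The ordering rule noted earlier in this section (a process listed twice is updated twice) can be sketched with one more example. It assumes each multiplier is applied to the result of the previous update, so the effects compound; this section does not spell out that behaviour, so treat the arithmetic in the comments as an interpretation, not a guarantee.

```groovy
base_resource_update {
    memory = [
        // First update: double memory for both processes
        [['run_stats_SAMtools', 'run_bamqc_Qualimap'], 2],
        // Second update: run_stats_SAMtools is updated again.
        // Under the compounding assumption this gives 2 x 3 = 6 times its base memory,
        // while run_bamqc_Qualimap stays at 2 times its base memory.
        ['run_stats_SAMtools', 3]
    ]
}
```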