Inputs

Input YAML

Example:

```yaml
---
patient_id: 'patient_id'
dataset_id: 'dataset_id'
input:
  normal:
    - path: /absolute/path/to/normal.bam
      read_length: length
  tumor:
    - path: /absolute/path/to/tumor.bam
      read_length: length
```
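The sketch below shows one way to sanity-check the structure above once the YAML has been parsed into a Python dict. It is a hypothetical helper, not part of the pipeline. It assumes `read_length` is a positive integer number of bases, because the example above only shows a placeholder.

```python
from pathlib import PurePosixPath


def validate_input(config):
    """Return a list of problems in a parsed input YAML; an empty list means it looks valid."""
    problems = []
    for key in ("patient_id", "dataset_id"):
        value = config.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"missing or empty '{key}'")

    inputs = config.get("input")
    if not isinstance(inputs, dict) or not inputs:
        problems.append("missing 'input' mapping")
        return problems

    # Each sample type (e.g. normal, tumor) holds a list of BAM entries.
    for sample_type, entries in inputs.items():
        if not isinstance(entries, list) or not entries:
            problems.append(f"input.{sample_type} must be a non-empty list")
            continue
        for i, entry in enumerate(entries):
            where = f"input.{sample_type}[{i}]"
            if not isinstance(entry, dict):
                problems.append(f"{where} must be a mapping")
                continue
            path = entry.get("path")
            if not isinstance(path, str) or not PurePosixPath(path).is_absolute():
                problems.append(f"{where}.path must be an absolute path")
            # Assumption: read_length is a positive integer number of bases.
            length = entry.get("read_length")
            if not isinstance(length, int) or isinstance(length, bool) or length <= 0:
                problems.append(f"{where}.read_length must be a positive integer")
    return problems
```

Suppose the normal entry is valid and the tumor entry uses a relative `path`. The function then returns the single message `input.tumor[0].path must be an absolute path`.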

Config

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `algorithm` | list | no | List of tools to run. Options: ['fastqc', 'samtools_stats', 'collectwgsmetrics', 'collecthsmetrics', 'mosdepth_coverage', 'mosdepth_quantize', 'qualimap_bamqc']. Default = ['stats', 'collectwgsmetrics'] |
| `reference` | path | conditional | Reference FASTA. Required only when `CollectWgsMetrics` or `CollectHsMetrics` is run |
| `intervals_bed` | path | no | Absolute path to a BED file with the intervals to process |
| `output_dir` | path | yes | Path to the output directory. Not required if `blcds_registered_dataset` = `true` |
| `blcds_registered_dataset` | boolean | no | Default = `false`. Only `uclahs_cds` users should change this. When `true`, the BLCDS folder structure is used |
| `work_dir` | path | no | Path to the Nextflow working directory. When set, Nextflow intermediate files and logs are saved to this directory. For `uclahs_cds` users, the default is `/scratch`, which should only be changed for testing/development. Setting it to `/hot` can cause high server latency, and setting it to `/tmp` can run into disk space limitations |
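The conditional requirements in this table can be checked before launching a run. Below is a minimal sketch that assumes the config is already loaded into a dict. `check_config` is a hypothetical helper, and the default tool list is copied verbatim from the `algorithm` row.

```python
# Default copied verbatim from the `algorithm` row above.
DEFAULT_ALGORITHMS = ["stats", "collectwgsmetrics"]
# Tools that need the reference FASTA, per the `reference` row.
NEEDS_REFERENCE = {"collectwgsmetrics", "collecthsmetrics"}


def check_config(params):
    """Return a list of missing-field problems for the top-level config."""
    problems = []
    algorithms = params.get("algorithm", DEFAULT_ALGORITHMS)
    # A reference is needed only if a Picard metrics tool is selected.
    if NEEDS_REFERENCE & set(algorithms) and not params.get("reference"):
        problems.append("reference is required for collectwgsmetrics/collecthsmetrics")
    # output_dir may be omitted only for BLCDS-registered datasets.
    if not params.get("blcds_registered_dataset", False) and not params.get("output_dir"):
        problems.append("output_dir is required unless blcds_registered_dataset is true")
    return problems
```

With the default `algorithm` list, `CollectWgsMetrics` runs. A config that omits `reference` is therefore flagged even if it sets nothing else.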

SAMtools specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `stats_max_rgs_per_sample` | integer | no | If a sample has more than this number of read groups, per-read-group analysis with `samtools stats` is skipped. Default = 20 |
| `stats_max_libs_per_sample` | integer | no | If a sample has more than this number of libraries, per-library analysis with `samtools stats` is skipped. Default = 20 |
| `stats_remove_duplicates` | boolean | no | Ignore reads marked as duplicates. Default = `false` |
| `stats_additional_options` | string | no | Any additional options recognized by `samtools stats` |
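Here is a sketch of how these fields could translate into a decision and a command line. The helper names are hypothetical. It assumes sample-level stats always run, and that `stats_remove_duplicates` maps to the `samtools stats --remove-dups` flag. How the pipeline actually wires this is not shown here.

```python
import shlex


def samtools_stats_levels(n_readgroups, n_libraries, max_rgs=20, max_libs=20):
    """Levels at which samtools stats would run for one sample."""
    levels = ["sample"]  # assumption: sample-level stats always run
    if n_readgroups <= max_rgs:  # skipped only when the count exceeds the limit
        levels.append("readgroup")
    if n_libraries <= max_libs:
        levels.append("library")
    return levels


def samtools_stats_command(bam, remove_duplicates=False, additional_options=""):
    """Build a `samtools stats` argument list from the config fields."""
    cmd = ["samtools", "stats"]
    if remove_duplicates:
        cmd.append("--remove-dups")  # exclude reads flagged as duplicates
    cmd += shlex.split(additional_options)  # e.g. "--threads 4"
    cmd.append(bam)
    return cmd
```

Note that the limits are inclusive. A sample with exactly 20 read groups still gets per-read-group stats.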

Picard CollectWgsMetrics specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `cwm_coverage_cap` | integer | no | Cap coverage at this value. Default = 250 |
| `cwm_minimum_mapping_quality` | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
| `cwm_minimum_base_quality` | integer | no | Ignore bases with quality below this value. Default = 20 |
| `cwm_use_fast_algorithm` | boolean | no | If `true`, the fast algorithm is used |
| `cwm_additional_options` | string | no | Any additional options recognized by `CollectWgsMetrics` |
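The `cwm_*` fields correspond to Picard `CollectWgsMetrics` options of the same meaning. Below is a minimal sketch of that mapping. It assumes a `picard` launcher on the PATH (as installed by bioconda; equivalent to `java -jar picard.jar`) and Picard's `--OPTION value` syntax. The helper name is hypothetical, and the `false` default for `cwm_use_fast_algorithm` is an assumption because the table does not state one.

```python
import shlex

CWM_DEFAULTS = {
    "cwm_coverage_cap": 250,
    "cwm_minimum_mapping_quality": 20,
    "cwm_minimum_base_quality": 20,
    "cwm_use_fast_algorithm": False,  # assumption: default not stated in the table
    "cwm_additional_options": "",
}


def collect_wgs_metrics_command(params, bam, reference, output):
    """Build a Picard CollectWgsMetrics argument list from the cwm_* fields."""
    p = {**CWM_DEFAULTS, **params}  # user values override defaults
    cmd = [
        "picard", "CollectWgsMetrics",
        "--INPUT", bam,
        "--OUTPUT", output,
        "--REFERENCE_SEQUENCE", reference,
        "--COVERAGE_CAP", str(p["cwm_coverage_cap"]),
        "--MINIMUM_MAPPING_QUALITY", str(p["cwm_minimum_mapping_quality"]),
        "--MINIMUM_BASE_QUALITY", str(p["cwm_minimum_base_quality"]),
        # Picard booleans are passed as lowercase true/false values.
        "--USE_FAST_ALGORITHM", str(p["cwm_use_fast_algorithm"]).lower(),
    ]
    return cmd + shlex.split(p["cwm_additional_options"])
```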

Picard CollectHsMetrics specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `chm_bait_intervals_bed` | path | no | Bait intervals. If not defined, `intervals_bed` is used |
| `chm_coverage_cap` | integer | no | Cap coverage at this value. Default = 250 |
| `chm_minimum_mapping_quality` | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
| `chm_minimum_base_quality` | integer | no | Ignore bases with quality below this value. Default = 20 |
| `chm_per_base_output` | boolean | no | Write per-base coverage output. Default = `false` |
| `chm_additional_options` | string | no | Any additional options recognized by `CollectHsMetrics` |
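The bait-interval fallback can be sketched as below. This helper is hypothetical and makes two assumptions: `intervals_bed` also serves as the target intervals, and `chm_per_base_output` maps to Picard's `--PER_BASE_COVERAGE` output file, here with a made-up default filename. Picard `CollectHsMetrics` expects Picard interval lists, so a BED file would presumably be converted first (for example with Picard `BedToIntervalList`). That conversion is not shown.

```python
def collect_hs_metrics_interval_args(params, per_base_path="per_base_coverage.tsv"):
    """Resolve bait/target intervals for CollectHsMetrics from the chm_* fields."""
    targets = params.get("intervals_bed")  # assumption: also used as target intervals
    # Fall back to intervals_bed when no separate bait file is given.
    baits = params.get("chm_bait_intervals_bed") or targets
    if not baits or not targets:
        raise ValueError("CollectHsMetrics needs both bait and target intervals")
    args = ["--BAIT_INTERVALS", baits, "--TARGET_INTERVALS", targets]
    if params.get("chm_per_base_output", False):
        # per_base_path is a hypothetical output filename.
        args += ["--PER_BASE_COVERAGE", per_base_path]
    return args
```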

FastQC specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `fastqc_level` | string | yes | Level at which FastQC is run: 'readgroup', 'library', or 'sample' |
| `fastqc_additional_options` | string | no | Any additional options recognized by `FastQC` |
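One way to read `fastqc_level` is as a grouping of the BAM's read groups by their SAM `@RG` tags: `ID` for read group, `LB` for library, and `SM` for sample. The sketch below shows that grouping. It is a hypothetical illustration and does not claim to describe how the pipeline splits reads before running FastQC.

```python
from collections import defaultdict

# SAM @RG tag that identifies each fastqc_level grouping.
LEVEL_TAG = {"readgroup": "ID", "library": "LB", "sample": "SM"}


def group_readgroups(readgroups, level):
    """Map each group key at the requested level to its read-group IDs."""
    if level not in LEVEL_TAG:
        raise ValueError(f"fastqc_level must be one of {sorted(LEVEL_TAG)}, got {level!r}")
    tag = LEVEL_TAG[level]
    groups = defaultdict(list)
    for rg in readgroups:
        groups[rg[tag]].append(rg["ID"])
    return dict(groups)
```

With `'library'`, read groups that share an `LB` value are reported together. With `'sample'`, everything under one `SM` value forms a single group.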

Qualimap specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `bamqc_output_format` | string | no | Report format: `pdf` or `html`. Default = `html`. `html` is required for MultiQC |
| `bamqc_additional_options` | string | no | Any additional options recognized by `bamqc` |
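Below is a minimal sketch of turning these two fields into a `qualimap bamqc` call through its `-bam`, `-outdir`, and `-outformat` options. The helper name is hypothetical. It accepts only the two formats listed in the table.

```python
import shlex


def bamqc_command(bam, outdir, output_format="html", additional_options=""):
    """Build a qualimap bamqc argument list from the bamqc_* fields."""
    fmt = output_format.lower()
    if fmt not in ("pdf", "html"):
        raise ValueError(f"bamqc_output_format must be 'pdf' or 'html', got {output_format!r}")
    # Qualimap expects the format name in upper case (PDF or HTML).
    cmd = ["qualimap", "bamqc", "-bam", bam, "-outdir", outdir, "-outformat", fmt.upper()]
    return cmd + shlex.split(additional_options)
```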

mosdepth window-based coverage specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `mosdepth_windows` | integer | no | Window size, in bases, for `mosdepth` window-based coverage. Not used if `intervals_bed` is defined. Default = 500 |
| `mosdepth_use_fast_algorithm` | boolean | no | The fast algorithm does not correct read-pair overlaps or inspect internal CIGAR operations, so it should not be used on libraries with small insert sizes. Default = `false` |
| `mosdepth_per_base_output` | boolean | no | Output coverage for every base. Default = `true` |
| `mosdepth_additional_options` | string | no | Any additional options recognized by `mosdepth`; `--mapq 20` is recommended |
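`mosdepth --by` accepts either a BED file or an integer window size, which explains why `mosdepth_windows` is ignored when `intervals_bed` is set. Below is a sketch of that choice plus the other fields, using mosdepth's `--fast-mode` and `--no-per-base` flags. The helper name is hypothetical.

```python
import shlex


def mosdepth_coverage_command(prefix, bam, params):
    """Build a mosdepth argument list from the mosdepth_* fields."""
    cmd = ["mosdepth"]
    # --by takes a BED file if one is given, otherwise an integer window size.
    by = params.get("intervals_bed") or str(params.get("mosdepth_windows", 500))
    cmd += ["--by", by]
    if params.get("mosdepth_use_fast_algorithm", False):
        cmd.append("--fast-mode")
    if not params.get("mosdepth_per_base_output", True):
        cmd.append("--no-per-base")  # per-base output is on by default
    cmd += shlex.split(params.get("mosdepth_additional_options", ""))
    cmd += [prefix, bam]  # mosdepth takes the output prefix, then the BAM
    return cmd
```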

mosdepth quantize specific configuration

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `mosdepth_quantize_cutoffs` | string | no | Colon-separated coverage cutoffs that define the quantized coverage regions. Default = `0:1:5:150` |
| `mosdepth_quantize_use_fast_algorithm` | boolean | no | The fast algorithm does not correct read-pair overlaps or inspect internal CIGAR operations, so it should not be used on libraries with small insert sizes. Default = `false` |
| `mosdepth_q0_label` | string | no | Label for the lowest-coverage regions. Default = `Q0` |
| `mosdepth_q1_label` | string | no | Label for the next-higher coverage regions. Default = `Q1` |
| `mosdepth_q2_label` | string | no | Label for the next-higher coverage regions. Default = `Q2` |
| `mosdepth_q3_label` | string | no | Label for the highest-coverage regions. Default = `Q3` |
| `mosdepth_quantize_additional_options` | string | no | Any additional options recognized by `mosdepth`; `--mapq 20` is recommended |
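mosdepth takes quantize bin labels from the `MOSDEPTH_Q0`, `MOSDEPTH_Q1`, ... environment variables rather than from command-line flags. Below is a sketch of building both the argument list and the environment from these fields. The helper name is hypothetical, and the sketch makes no claim about how the pipeline sets these variables.

```python
import os
import shlex


def mosdepth_quantize_invocation(prefix, bam, params):
    """Return (argv, env) for a quantize run; labels travel as MOSDEPTH_Q* env vars."""
    cutoffs = params.get("mosdepth_quantize_cutoffs", "0:1:5:150")
    env = dict(os.environ)  # copy so the caller's environment is untouched
    for i in range(4):
        # mosdepth reads the label for quantize bin i from MOSDEPTH_Q{i}.
        env[f"MOSDEPTH_Q{i}"] = params.get(f"mosdepth_q{i}_label", f"Q{i}")
    cmd = ["mosdepth", "--quantize", cutoffs]
    if params.get("mosdepth_quantize_use_fast_algorithm", False):
        cmd.append("--fast-mode")
    cmd += shlex.split(params.get("mosdepth_quantize_additional_options", ""))
    cmd += [prefix, bam]
    return cmd, env
```

The pair can then be passed straight to `subprocess.run(cmd, env=env, check=True)`.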

Base resource allocation updaters

To update the base resource (CPU or memory) allocations for processes, use the following structure. The default allocations can be found in the node-specific config files.

```groovy
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
```

Note: Resource updates are applied in the order they are provided, so a process listed twice in the `memory` list is updated twice, in the order given.
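Below is a sketch of the update semantics described above, written in Python rather than Nextflow. It applies each `[names, multiplier]` rule in order, treats an empty list as "all processes", and accepts a single process name as a plain string, as the examples below do. The function name is hypothetical. Memory is modeled as a plain number of GB, which is an assumption, and the sketch does not clamp results to any node limits.

```python
import copy


def apply_resource_updates(allocations, updates):
    """Apply base_resource_update-style multipliers, in the order given.

    allocations: {process_name: {"cpus": int, "memory": float}}  (memory in GB, assumed)
    updates:     {"cpus": [[names, multiplier], ...], "memory": [...]}
    """
    result = copy.deepcopy(allocations)  # leave the defaults untouched
    for resource, rules in updates.items():
        for names, multiplier in rules:
            if isinstance(names, str):  # a single process name is allowed
                names = [names]
            # An empty list applies the multiplier to every process.
            targets = names if names else list(result)
            for name in targets:
                result[name][resource] *= multiplier  # KeyError for unknown processes
    return result
```

A process listed twice is multiplied twice. For example, doubling and then tripling the memory of `run_bamqc_Qualimap` gives six times its base allocation.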

Examples:

To double the memory of all processes:

```groovy
base_resource_update {
    memory = [
        [[], 2]
    ]
}
```

To double memory for `run_CollectWgsMetrics_Picard` and triple memory for `run_statsSamples_SAMtools` and `run_bamqc_Qualimap`:

```groovy
base_resource_update {
    memory = [
        ['run_CollectWgsMetrics_Picard', 2],
        [['run_statsSamples_SAMtools', 'run_bamqc_Qualimap'], 3]
    ]
}
```

To double CPUs and memory for `run_CollectWgsMetrics_Picard` and double memory for `run_statsSamples_SAMtools`:

```groovy
base_resource_update {
    cpus = [
        ['run_CollectWgsMetrics_Picard', 2]
    ]
    memory = [
        [['run_CollectWgsMetrics_Picard', 'run_statsSamples_SAMtools'], 2]
    ]
}
```