Inputs

Input YAML

Example:

---
patient_id: 'patient_id'
dataset_id: 'dataset_id'
input:
  normal:
    - path: /absolute/path/to/normal.bam
      read_length: length
  tumor:
    - path: /absolute/path/to/tumor.bam
      read_length: length
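
The example uses absolute BAM paths. As a minimal sketch of a pre-flight check, assuming the YAML has already been parsed into nested dicts and lists, the following flags any path that is not absolute. The helper name relative_bam_paths and the read_length value of 150 are hypothetical and are not part of the pipeline.

```python
import posixpath


def relative_bam_paths(input_data):
    """Return (sample_type, path) pairs whose BAM path is not absolute.

    Hypothetical helper: it works on the parsed YAML as nested dicts and
    lists, mirroring the structure of the example above.
    """
    problems = []
    for sample_type, entries in input_data.get('input', {}).items():
        for entry in entries:
            # posixpath keeps the check identical on every platform
            if not posixpath.isabs(entry['path']):
                problems.append((sample_type, entry['path']))
    return problems


# Parsed form of an input YAML; read_length 150 is an illustrative value.
parsed = {
    'patient_id': 'patient_id',
    'dataset_id': 'dataset_id',
    'input': {
        'normal': [{'path': '/absolute/path/to/normal.bam', 'read_length': 150}],
        'tumor': [{'path': 'relative/tumor.bam', 'read_length': 150}],
    },
}

print(relative_bam_paths(parsed))
```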

Config

| Field | Type | Required | Description |
|---|---|---|---|
| algorithms | list | no | Tools to run, any of ['stats', 'collectwgsmetrics', 'bamqc']. Default = ['stats', 'collectwgsmetrics'] |
| reference | path | conditional | Reference FASTA. Required only when CollectWgsMetrics is run |
| output_dir | path | yes | Not required if blcds_registered_dataset = true |
| blcds_registered_dataset | boolean | no | Default is false. Only uclahs_cds users should change this. When true, the BLCDS folder structure is used |
| work_dir | path | no | Path of the working directory for Nextflow. When included, Nextflow intermediate files and logs are saved to this directory. With uclahs_cds = true, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot can lead to high server latency, and changing it to /tmp can lead to disk space limitations |
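
The conditional requirements above can be written out as a small check. This is an illustrative sketch only, not the pipeline's own validation: the function name missing_required_fields and the plain-dict representation of the config are assumptions, and it encodes only the rules stated in the table.

```python
DEFAULT_ALGORITHMS = ['stats', 'collectwgsmetrics']  # default listed in the table


def missing_required_fields(config):
    """Return the names of required fields that are absent from a config dict.

    Hypothetical helper: it encodes only the rules stated in the table,
    not the pipeline's actual validation logic.
    """
    missing = []
    algorithms = config.get('algorithms', DEFAULT_ALGORITHMS)

    # reference is needed only when CollectWgsMetrics is run
    if 'collectwgsmetrics' in algorithms and not config.get('reference'):
        missing.append('reference')

    # output_dir is needed unless the BLCDS folder structure is used
    if not config.get('blcds_registered_dataset', False) and not config.get('output_dir'):
        missing.append('output_dir')

    return missing


# SAMtools stats only, so no reference is needed
print(missing_required_fields({
    'algorithms': ['stats'],
    'output_dir': '/absolute/path/to/output',
}))
```

With an empty config the default algorithms include collectwgsmetrics, so both reference and output_dir are reported as missing.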

SAMtools specific configuration

| Field | Type | Required | Description |
|---|---|---|---|
| remove_duplicates | boolean | no | Ignore reads marked as duplicate. Default = false |
| samtools_stats_additional_options | string | no | Any additional options recognized by samtools stats |

Picard specific configuration

| Field | Type | Required | Description |
|---|---|---|---|
| cwm_coverage_cap | integer | no | Cap coverage at this value. Default = 250 |
| cwm_minimum_mapping_quality | integer | no | Ignore reads with mapping quality below this value. Default = 20 |
| cwm_minimum_base_quality | integer | no | Ignore bases with quality below this value. Default = 20 |
| cwm_use_fast_algorithm | boolean | no | If true, the fast algorithm is used |
| cwm_additional_options | string | no | Any additional options recognized by CollectWgsMetrics |

Qualimap specific configuration

| Field | Type | Required | Description |
|---|---|---|---|
| bamqc_outformat | string | no | Choice of 'pdf' or 'html'. Default = 'pdf' |
| bamqc_additional_options | string | no | Any additional options recognized by bamqc |

Base resource allocation updaters

The default resource allocations can be found in the node-specific config files. To update the base allocations (cpus or memory) for processes, use the following structure:

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so a process that appears twice in the memory list is updated twice, in the order given.

Examples:

  • To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
  • To double memory for run_CollectWgsMetrics_Picard and triple memory for run_stats_SAMtools and run_bamqc_Qualimap:
base_resource_update {
    memory = [
        ['run_CollectWgsMetrics_Picard', 2],
        [['run_stats_SAMtools', 'run_bamqc_Qualimap'], 3]
    ]
}
  • To double CPUs and memory for run_CollectWgsMetrics_Picard and double memory for run_stats_SAMtools:
base_resource_update {
    cpus = [
        ['run_CollectWgsMetrics_Picard', 2]
    ]
    memory = [
        [['run_CollectWgsMetrics_Picard', 'run_stats_SAMtools'], 2]
    ]
}
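
To make the ordering note concrete, here is a minimal model of how a list of updates compounds. It is a sketch under stated assumptions, not the pipeline's implementation: the function name apply_updates and the base allocations are hypothetical, an empty name list is taken to target every process, and a bare string is treated as a single process name, as in the examples above.

```python
def apply_updates(base, updates):
    """Apply [process_names, multiplier] updates in the order given.

    Assumptions for this sketch: an empty name list targets every process,
    and a bare string is treated as a single process name.
    """
    result = dict(base)  # leave the base allocations untouched
    for names, multiplier in updates:
        if isinstance(names, str):
            names = [names]
        targets = names if names else list(result)
        for name in targets:
            result[name] = result[name] * multiplier
    return result


# Hypothetical base memory in GB; the real defaults are in the
# node-specific config files.
base_memory = {
    'run_stats_SAMtools': 2,
    'run_CollectWgsMetrics_Picard': 4,
    'run_bamqc_Qualimap': 8,
}

# run_CollectWgsMetrics_Picard appears twice, so it is updated twice
updates = [
    ['run_CollectWgsMetrics_Picard', 2],
    [['run_stats_SAMtools', 'run_CollectWgsMetrics_Picard'], 3],
]

updated = apply_updates(base_memory, updates)
print(updated)
```

In this model run_CollectWgsMetrics_Picard ends at 4 x 2 x 3 = 24, run_stats_SAMtools at 2 x 3 = 6, and run_bamqc_Qualimap stays at 8.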