Configuration

The following parameters are available at the metapipeline level:

Parameter	Type	Required	Description
`output_dir`	path	yes	Absolute path to directory where output files will be saved
`leading_work_dir`	path	yes	Absolute path to a common working directory that is accessible from all nodes (for example, under `/hot`). Cannot be `/scratch` or any node-specific directory.
`pipeline_work_dir`	path	yes	Absolute path to the directory where each individual pipeline writes its outputs before they are copied to `output_dir`. Suggested: `/scratch`
`project_id`	string	yes	Project identifier used to name the main output directory of the run
`save_intermediate_files`	boolean	yes	Whether to save intermediate files. Default: `false`
`partition`	string	yes	Partition type for submitting each processing job
`clusterOptions`	string	yes	Additional `slurm` submission options
`max_parallel_jobs`	integer	yes	Number of jobs to submit at once. Default: 5
`cluster_submission_interval`	integer	yes	Time in minutes to wait between job submissions. Default: 90
`sample_mode`	string	yes	Mode for sample calling. Options: `paired`, `single`, `multi`. Default: `paired`
`requested_pipelines`	list	yes	List of pipelines requested.
`use_original_intervals`	boolean	yes	Whether original intervals should be used with pipelines rather than expanded intervals generated by calculate-targeted-coverage
`pipeline_params`	namespace	yes	Namespace containing parameters for each individual pipeline. Parameters for the requested pipelines must be given.
`override_realignment`	boolean	yes	Whether to override conversion to FASTQ and realignment when given BAM input. Default: `false`
`override_recalibrate_bam`	boolean	yes	Whether to override recalibrate-BAM pipeline when given BAM input. Default: `false`
`src_snv_tool`	string	yes	Which SNV tool's output to use for SRC. Default: `BCFtools-Intersect`
`src_cna_tool`	string	yes	Which CNA tool's output to use for SRC. Default: `Battenberg`
`override_src_precursor_disable`	boolean	yes	Whether to override the automatic disabling of either call-sSNV or call-sCNA when the respective outputs are provided in the input. Default: `false`
`src_run_all_combinations`	boolean	yes	TO-DO: Whether to run SRC using all combinations of SNV tool and CNA tool. Default: `false`
`run_downstream_pipelines_serially`	boolean	no	Whether to run pipelines downstream of recalibrate-BAM sequentially. Note: if this option is used in conjunction with `downstream_pipeline_order`, any pipelines with a given ordering will be run sequentially regardless of the value of this parameter. Default: `false`
`downstream_pipeline_order`	list	no	List indicating specific order in which to run pipelines downstream of recalibrate-BAM. Default: no order
`input_csv`	path	no	Absolute path to input CSV when using CSV input
`status_email_address`	string	no	Email address to notify when child pipelines start and complete. Default: ``
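
As an illustration, a few of the metapipeline-level parameters above might be set in a config file as follows. This is a minimal sketch, not a complete config: the paths and project ID are placeholders, the pipeline names in downstream_pipeline_order are assumed to use the hyphenated form, and the remaining required parameters (including pipeline_params) are left out.

```groovy
params {
    // Placeholder paths and identifier; adjust to the local environment
    output_dir = '/hot/path/to/output'
    leading_work_dir = '/hot/path/to/work'   // must be visible from all nodes
    pipeline_work_dir = '/scratch'           // suggested value from the table
    project_id = 'example_project'
    save_intermediate_files = false

    // Job submission controls, shown with their documented defaults
    max_parallel_jobs = 5
    cluster_submission_interval = 90

    sample_mode = 'paired'

    // Optional ordering of downstream pipelines (assumed list format)
    downstream_pipeline_order = ['call-gSNP', 'call-sSNV']

    // Remaining required parameters omitted from this sketch
}
```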

UCLAHS-CDS WGS global sample job submission parameters

The following parameters control the global number and rate of WGS job submissions. These limits are enabled by default. For non-WGS samples, or when running outside the UCLAHS-CDS environment, set uclahs_cds_wgs to false in the config file params.

Input Parameter	Type	Required	Description
`uclahs_cds_wgs`	boolean	yes	Whether global job number and submission limits should be applied. Default: `true`
`global_rate_limit`	integer	yes	Time in minutes between submission of any WGS jobs. Default: 90
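
For example, a run on non-WGS samples could switch the global limits off as sketched below. The fragment assumes the setting lives in the same params block as the other metapipeline-level parameters.

```groovy
params {
    // Do not apply the global WGS job number and submission limits
    uclahs_cds_wgs = false
}
```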

Pipeline selection

Pipeline selection is controlled by the requested_pipelines parameter. Given the list of requested pipelines, metapipeline-DNA will automatically identify any necessary dependencies and enable them for the run.

Pipeline selection follows some default behaviors:

When given BAM input, the default pipeline selector will perform conversion to FASTQ, re-align the FASTQs, and perform recalibration. This default behavior can be disabled with the override_realignment and override_recalibrate_bam parameters. With override_realignment, the back-conversion to FASTQ and re-alignment will be disabled. With override_recalibrate_bam, recalibration of the BAM using recalibrate-BAM will be disabled.
When SNV or CNA calls are given as inputs, metapipeline-DNA will automatically disable the call-sSNV and call-sCNA pipelines, respectively, and use the given inputs for call-SRC. Enabling override_src_precursor_disable overrides this behavior, so that metapipeline-DNA runs the call-sSNV and call-sCNA pipelines to generate inputs for call-SRC from the BAM or FASTQ inputs. Note: This option only has an effect when mixed inputs are provided, because the call-sSNV and call-sCNA pipelines require sequencing data as inputs.
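
The behaviors above could be configured as in the following sketch, for a run that starts from BAM input and should use those BAMs as-is. The hyphenated pipeline names in requested_pipelines are an assumption about the expected list format; the default template.config is the place to confirm the exact values.

```groovy
params {
    // Assumed format: a list of pipeline names; dependencies are enabled automatically
    requested_pipelines = ['call-sSNV', 'call-sCNA', 'call-SRC']

    // Skip back-conversion to FASTQ and re-alignment of the input BAMs
    override_realignment = true

    // Skip recalibration of the input BAMs with recalibrate-BAM
    override_recalibrate_bam = true

    // Documented default: call-sSNV and call-sCNA stay disabled when their
    // outputs are supplied as inputs
    override_src_precursor_disable = false
}
```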

Pipeline-specific params

Each pipeline has a set of parameters that must be provided. The available parameters for each pipeline are documented in the links in the steps. Additionally, the default template.config contains the default set of parameters that need to be defined for each pipeline. Any additional supported parameters can be added as needed. The following keys are used as the pipeline names in this namespace:

Pipeline	Key
`convert-BAM2FASTQ`	`convert_BAM2FASTQ`
`align-DNA`	`align_DNA`
`recalibrate-BAM`	`recalibrate_BAM`
`calculate-targeted-coverage`	`calculate_targeted_coverage`
`generate-SQC-BAM`	`generate_SQC_BAM`
`call-gSNP`	`call_gSNP`
`call-sSNV`	`call_sSNV`
`call-mtSNV`	`call_mtSNV`
`call-gSV`	`call_gSV`
`call-sSV`	`call_sSV`
`call-sCNA`	`call_sCNA`
`call-SRC`	`call_SRC`

Each pipeline also defines a set of resources per process to run. These resources can be modified if necessary on a per-process per-pipeline basis by using the base_resource_update functionality for the specific pipeline (this functionality is defined in each pipeline's README). For example, to double the base memory of all processes in the call-sSNV pipeline:

params {
    ...
    pipeline_params {
        ...
        call_sSNV {
            ...
            base_resource_update {
                memory = [
                    [[], 2]
                ]
            }
        }
        ...
    }
}

Intervals

For targeted or exome sequencing, target intervals can be provided in BED format to some of the steps to control processing. The following steps accept intervals:

Step/pipeline	Parameter name
`call-sSNV`	`intersect_regions`
`call-gSNP`	`intervals`
`recalibrate-BAM`	`intervals`
`calculate-targeted-coverage`	`target_bed`

For the respective pipeline params, provide the full path to the intervals file in the generated config to make use of the targets. For example:

params {
    ...
    pipeline_params {
        ...
        call_sSNV {
            ...
            intersect_regions = "/full/path/to/intervals"
            ...
        }
        ...
    }
}
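
The other pipelines in the table take their intervals in the same way, each under its own key and parameter name. The sketch below only shows where each parameter lives; the path is a placeholder, each pipeline's other parameters are omitted, and whether one file suits every step depends on the analysis (see use_original_intervals).

```groovy
params {
    pipeline_params {
        call_sSNV {
            intersect_regions = '/full/path/to/intervals.bed'
        }
        call_gSNP {
            intervals = '/full/path/to/intervals.bed'
        }
        recalibrate_BAM {
            intervals = '/full/path/to/intervals.bed'
        }
        calculate_targeted_coverage {
            target_bed = '/full/path/to/intervals.bed'
        }
    }
}
```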

Sample modes

The metapipeline supports running samples in three modes: single, paired, and multi. This is controlled by the sample_mode parameter. In paired or multi sample modes, each patient is expected to have exactly one normal sample and one or more tumor samples.

Given the set of input patients and samples, samples are grouped according to the run mode as follows:

Single sample mode

All samples are processed individually, regardless of patient, as separate jobs.

Normal samples will go through germline calling (call-gSNP, call-gSV) and somatic SNV calling with Mutect2's normal-only mode.
Tumor samples will go through germline calling (call-gSNP) and somatic SNV calling with Mutect2's tumor-only mode.

Paired sample mode

All samples from the same patient are submitted as a single job, with each normal-tumor pair processed separately in the same job.

Individual samples will go through the convert-BAM2FASTQ and align-DNA pipelines.
The normal sample will then be paired with each tumor sample and each pair will go through recalibration and the somatic calling pipelines.
The normal sample will go through call-gSV.

Multi sample mode

All samples from the same patient are processed as a single job.

Individual samples will go through the convert-BAM2FASTQ and align-DNA pipelines.
The recalibration and germline SNP calling will then proceed on the entire set of samples together.
Somatic SNV calling will proceed in two ways: (1) the normal sample will be paired with each tumor sample and run through the call-sSNV pipeline; (2) if Mutect2 was requested, the entire set of samples will also go through multi-sample calling with just Mutect2 in call-sSNV.
The normal sample will be paired with each tumor sample and each pair will go through call-mtSNV, call-sSV, and call-sCNA.
The normal sample will go through call-gSV.
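
Selecting a mode is a single setting in the config. The fragment below is a sketch that picks multi sample mode; the other metapipeline-level parameters are omitted.

```groovy
params {
    // One of: 'paired' (default), 'single', 'multi'
    sample_mode = 'multi'
}
```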