Inputs

Input YAML

Field	Type	Description
sample_id	string	Sample ID
normal	path	Set to absolute path to input BAM

---
input:
  BAM:
    normal:
      - "/path/to/input/BAM"

Note: This pipeline is designed to detect germline SVs. To stay consistent with other Boutros Lab Nextflow pipelines, the input YAML format mirrors that of the lab's somatic and germline variant calling pipelines. The sample type tags, whether labeled as normal or tumor, do NOT influence the germline SV/CNV calling processes in this pipeline.

Nextflow Config File Parameters

Input Parameter	Required	Type	Description
`dataset_id`	yes	string	Boutros lab dataset id.
`blcds_registered_dataset`	yes	boolean	Whether the dataset should be registered in the Boutros Lab Data registry. Default value is `false`.
`genome_build`	no	string	Genome build for the circos plot, `hg19` or `hg38`. Default is `hg38`.
`variant_type`	yes	list	List containing variant types to call. Default is `["gSV", "gCNV"]`.
`run_discovery`	yes	boolean	Specifies whether or not to run the "discovery" branch of the pipeline. Default value is `true`. (either `run_discovery` or `run_regenotyping` must be `true`)
`run_regenotyping`	yes	boolean	Specifies whether or not to run the "regenotyping" branch of the pipeline. Default value is `false`. (either `run_discovery` or `run_regenotyping` must be `true`)
`merged_sites`	yes	path	The path to the merged sites.bcf file. Must be populated if running the regenotyping branch.
`run_delly`	true	boolean	Whether or not the workflow should run Delly (either `run_delly` or `run_manta` must be set to `true`).
`run_manta`	true	boolean	Whether or not the workflow should run Manta (either `run_delly` or `run_manta` must be set to `true`).
`run_qc`	no	boolean	Optional parameter to indicate whether subsequent quality checks should be run on Delly outputs. Default value is `false`.
`reference_fasta`	yes	path	Absolute path to the reference genome `FASTA` file. The reference genome is used by Delly for SV calling.
`exclusion_file`	yes	path	Absolute path to the Delly reference genome `exclusion` file, used to exclude suggested regions from SV calling. On Slurm, an HG38 exclusion file is located at `/hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv`.
`mappability_map`	yes	path	Absolute path to the Delly mappability map, which supports GC and mappability fragment correction in CNV calling.
`map_qual`	no	integer	Minimum paired-end (PE) mapping quality threshold for Delly.
`save_intermediate_files`	yes	boolean	Optional parameter to indicate whether intermediate files will be saved. Default value is `false`.
`output_dir`	yes	path	Absolute path to the directory where the output files will be saved.
`work_dir`	optional	path	Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With `ucla_cds`, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively.
`docker_container_registry`	optional	string	Registry containing tool Docker images. Default: `ghcr.io/uclahs-cds`

An example of the Nextflow input parameters config file can be found here.
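
As a rough illustration of how the parameters in the table above fit together, a `params` block for a discovery-only run might look like the sketch below. The dataset ID and all `/path/to/...` values are placeholders, the empty `merged_sites` value is an assumption for runs that skip regenotyping, and the pipeline's actual template config may contain additional lines that are not shown here. `map_qual` is left out because its default is not documented above.

```groovy
// Sketch only: placeholder values, not the pipeline's shipped template.
params {
    dataset_id = 'DATASET_ID'                      // placeholder
    blcds_registered_dataset = false               // documented default

    genome_build = 'hg38'                          // documented default
    variant_type = ['gSV', 'gCNV']                 // documented default

    // Branch selection: at least one of these must be true
    run_discovery = true                           // documented default
    run_regenotyping = false                       // documented default
    merged_sites = ''                              // assumption: empty when not regenotyping

    // Caller selection: at least one of these must be true
    run_delly = true
    run_manta = true
    run_qc = false                                 // documented default

    reference_fasta = '/path/to/reference.fasta'   // placeholder
    exclusion_file = '/hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv'
    mappability_map = '/path/to/mappability.map'   // placeholder

    save_intermediate_files = false                // documented default
    output_dir = '/path/to/output/directory'       // placeholder
}
```

To run the regenotyping branch instead, `run_regenotyping` would be set to `true` and `merged_sites` pointed at the merged sites.bcf file.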

Base resource allocation updaters

To optionally update the base resource (cpus or memory) allocations for processes, add the following structure, with the necessary parts filled in, to the input.config file. The default allocations can be found in the node-specific config files.

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given. Examples:

To double memory of all processes:

base_resource_update {
    memory = [
        [[], 2]
    ]
}

To double memory for call_gSV_Delly and triple memory for run_validate_PipeVal and call_gSV_Manta:

base_resource_update {
    memory = [
        ['call_gSV_Delly', 2],
        [['run_validate_PipeVal', 'call_gSV_Manta'], 3]
    ]
}

To double CPUs and memory for call_gSV_Manta and double memory for run_validate_PipeVal:

base_resource_update {
    cpus = [
        ['call_gSV_Manta', 2]
    ]
    memory = [
        [['call_gSV_Manta', 'run_validate_PipeVal'], 2]
    ]
}
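
The ordering note above can be illustrated with a process that appears twice in the same list. This is a sketch under one assumption: that each update multiplies the allocation left by the previous update, which the note implies but does not state outright. Under that assumption, `call_gSV_Delly` would end up with six times its base memory and `call_gSV_Manta` with three times:

```groovy
base_resource_update {
    memory = [
        // First update: doubles call_gSV_Delly only
        ['call_gSV_Delly', 2],
        // Second update: applied afterwards, to both processes
        [['call_gSV_Delly', 'call_gSV_Manta'], 3]
    ]
}
```

If the intent is simply to triple both processes, listing each process only once avoids the compounding.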