Inputs

Input YAML

| Field | Type | Description |
|---|---|---|
| sample_id | string | Sample ID |
| normal | path | Absolute path to the input BAM |
---
input:
  BAM:
    normal:
      - "/path/to/input/BAM"

Note: This pipeline is designed to detect germline SVs. For consistency with other Boutros Lab Nextflow pipelines, the input YAML uses the same format as the somatic and germline variant calling pipelines. However, the sample type tag (normal or tumor) does NOT influence the germline SV/CNV calling processes in this pipeline.

Nextflow Config File Parameters

| Input Parameter | Required | Type | Description |
|---|---|---|---|
| dataset_id | yes | string | Boutros Lab dataset ID. |
| blcds_registered_dataset | yes | boolean | Whether the dataset should be registered in the Boutros Lab data registry. Default value is false. |
| variant_type | yes | list | List of variant types to call. Default is ["gSV", "gCNV"]. |
| run_discovery | yes | boolean | Whether to run the "discovery" branch of the pipeline. Default value is true. (Either run_discovery or run_regenotyping must be true.) |
| run_regenotyping | yes | boolean | Whether to run the "regenotyping" branch of the pipeline. Default value is false. (Either run_discovery or run_regenotyping must be true.) |
| merged_sites | yes | path | Path to the merged sites.bcf file. Must be populated if running the regenotyping branch. |
| run_delly | yes | boolean | Whether the workflow should run Delly. (Either run_delly or run_manta must be set to true.) |
| run_manta | yes | boolean | Whether the workflow should run Manta. (Either run_delly or run_manta must be set to true.) |
| run_qc | no | boolean | Whether subsequent quality checks should be run on Delly outputs. Default value is false. |
| reference_fasta | yes | path | Absolute path to the reference genome FASTA file. The reference genome is used by Delly for SV calling. |
| exclusion_file | yes | path | Absolute path to the Delly reference genome exclusion file, used to exclude suggested regions from SV calling. On Slurm, an HG38 exclusion file is located at /hot/ref/tool-specific-input/Delly/hg38/human.hg38.excl.tsv |
| mappability_map | yes | path | Absolute path to the Delly mappability map, which supports GC and mappability fragment correction in CNV calling. |
| map_qual | no | integer | Minimum paired-end (PE) mapping quality threshold for Delly. |
| save_intermediate_files | yes | boolean | Whether intermediate files will be saved. Default value is false. |
| output_dir | yes | path | Absolute path to the directory where the output files will be saved. |
| work_dir | no | path | Path of the working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
| docker_container_registry | no | string | Registry containing tool Docker images. Default: ghcr.io/uclahs-cds |

An example of the Nextflow input parameters config file can be found here.
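
As a rough illustration of how the parameters in the table above might be set for a discovery run, the fragment below is a minimal sketch rather than the pipeline's actual template. The enclosing `params { }` scope, the dataset ID, the `map_qual` value, and all `/path/to/...` locations are assumptions made for this example; only the parameter names, the documented defaults, and the Slurm exclusion file path come from the table.

```groovy
// Sketch only: values below are hypothetical unless noted as documented defaults.
params {
    dataset_id = 'DATASET01'                 // hypothetical dataset ID
    blcds_registered_dataset = false         // documented default

    variant_type = ['gSV', 'gCNV']           // documented default
    run_discovery = true                     // documented default
    run_regenotyping = false                 // documented default
    merged_sites = ''                        // only needs a real path for the regenotyping branch

    run_delly = true                         // at least one of run_delly / run_manta must be true
    run_manta = true
    run_qc = false                           // documented default

    reference_fasta = '/path/to/reference/genome.fa'
    exclusion_file = '/hot/ref/tool-specific-input/Delly/hg38/human.hg38.excl.tsv'
    mappability_map = '/path/to/delly/mappability/map'
    map_qual = 20                            // hypothetical threshold

    save_intermediate_files = false          // documented default
    output_dir = '/path/to/output/directory'
}
```

Leaving `work_dir` and `docker_container_registry` unset keeps their defaults, as described in the table.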

Base resource allocation updaters

To optionally update the base resource (cpus or memory) allocations for processes, add the following structure, with the necessary parts filled in, to the input.config file. The default allocations can be found in the node-specific config files.

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given.

Examples:

  • To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
  • To double memory for call_gSV_Delly and triple memory for run_validate_PipeVal and call_gSV_Manta:
base_resource_update {
    memory = [
        ['call_gSV_Delly', 2],
        [['run_validate_PipeVal', 'call_gSV_Manta'], 3]
    ]
}
  • To double CPUs and memory for call_gSV_Manta and double memory for run_validate_PipeVal:
base_resource_update {
    cpus = [
        ['call_gSV_Manta', 2]
    ]
    memory = [
        [['call_gSV_Manta', 'run_validate_PipeVal'], 2]
    ]
}
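
The note above on update order can be illustrated with a process that appears in two entries. This is a sketch under the assumption that successive multipliers compound and that no node-specific resource cap applies: `call_gSV_Delly` would end up with four times its base memory (doubled, then doubled again), while `call_gSV_Manta` would be doubled once.

```groovy
base_resource_update {
    memory = [
        // First update: double memory for both processes.
        [['call_gSV_Delly', 'call_gSV_Manta'], 2],
        // Second update: call_gSV_Delly is listed again, so it is multiplied a second time.
        ['call_gSV_Delly', 2]
    ]
}
```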