Inputs
Input YAML
Field | Type | Description |
---|---|---|
sample_id | string | Sample ID |
normal | path | Absolute path to the input BAM file |
---
input:
  BAM:
    normal:
      - "/path/to/input/BAM"
Note: This pipeline is designed to detect germline SVs. To maintain consistency with other Boutros Lab Nextflow pipelines, the input YAML format mirrors that of other somatic or germline variant calling pipelines. However, the sample type tags, whether labeled as normal or tumor, do NOT influence the germline SV/CNV calling processes in this pipeline.
Nextflow Config File Parameters
Input Parameter | Required | Type | Description |
---|---|---|---|
dataset_id | yes | string | Boutros Lab dataset ID. |
blcds_registered_dataset | yes | boolean | Whether the dataset should be registered in the Boutros Lab Data registry. Default value is false. |
variant_type | yes | list | List of variant types to call. Default is ["gSV", "gCNV"]. |
run_discovery | yes | boolean | Whether to run the "discovery" branch of the pipeline. Default value is true. (Either run_discovery or run_regenotyping must be true.) |
run_regenotyping | yes | boolean | Whether to run the "regenotyping" branch of the pipeline. Default value is false. (Either run_discovery or run_regenotyping must be true.) |
merged_sites | yes | path | Path to the merged sites.bcf file. Must be populated if running the regenotyping branch. |
run_delly | true | boolean | Whether the workflow should run Delly. (Either run_delly or run_manta must be set to true.) |
run_manta | true | boolean | Whether the workflow should run Manta. (Either run_delly or run_manta must be set to true.) |
run_qc | no | boolean | Optional parameter to indicate whether subsequent quality checks should be run on Delly outputs. Default value is false. |
reference_fasta | yes | path | Absolute path to the reference genome FASTA file. The reference genome is used by Delly for SV calling. |
exclusion_file | yes | path | Absolute path to the Delly exclusion file, which lists reference genome regions to exclude from SV calling. On Slurm, an HG38 exclusion file is located at /hot/ref/tool-specific-input/Delly/hg38/human.hg38.excl.tsv |
mappability_map | yes | path | Absolute path to the Delly mappability map, which supports GC and mappability fragment correction in CNV calling. |
map_qual | no | integer | Minimum paired-end (PE) mapping quality threshold for Delly. |
save_intermediate_files | yes | boolean | Optional parameter to indicate whether intermediate files will be saved. Default value is false. |
output_dir | yes | path | Absolute path to the directory where the output files will be saved. |
work_dir | no | path | Path of the working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
docker_container_registry | no | string | Registry containing tool Docker images. Default: ghcr.io/uclahs-cds |
An example of the Nextflow input parameters config file can be found here.
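For orientation, the parameters in the table above might be set as in the following sketch. It assumes the parameters live in a standard Nextflow `params` scope; all paths and the dataset ID are placeholders, and the real template config may contain additional lines (for example includes or setup calls) that are not shown here.

```groovy
// Illustrative sketch only: every value below is a placeholder, not a recommendation.
params {
    dataset_id = 'DATASET01'                 // hypothetical dataset ID
    blcds_registered_dataset = false

    // Variant types to call
    variant_type = ['gSV', 'gCNV']

    // At least one of the two branches must be enabled
    run_discovery = true
    run_regenotyping = false
    merged_sites = ''                        // populate when run_regenotyping is true

    // At least one of the two callers must be enabled
    run_delly = true
    run_manta = true
    run_qc = false

    reference_fasta = '/path/to/reference.fa'        // placeholder path
    exclusion_file = '/path/to/exclusion_file.tsv'   // placeholder path
    mappability_map = '/path/to/mappability_map'     // placeholder path
    // map_qual = 20                         // optional; the value shown is a placeholder

    save_intermediate_files = false
    output_dir = '/path/to/output'           // placeholder path
}
```

Optional parameters such as work_dir and docker_container_registry are left out of the sketch so that their documented defaults apply.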
Base resource allocation updaters
To optionally update the base resource (cpus or memory) allocations for processes, add the following structure, with the necessary parts filled in, to the input.config file. The default allocations can be found in the node-specific config files.
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given. Examples:
- To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
- To double memory for call_gSV_Delly and triple memory for run_validate_PipeVal and call_gSV_Manta:
base_resource_update {
    memory = [
        ['call_gSV_Delly', 2],
        [['run_validate_PipeVal', 'call_gSV_Manta'], 3]
    ]
}
- To double CPUs and memory for call_gSV_Manta and double memory for run_validate_PipeVal:
base_resource_update {
    cpus = [
        ['call_gSV_Manta', 2]
    ]
    memory = [
        [['call_gSV_Manta', 'run_validate_PipeVal'], 2]
    ]
}
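As a further illustration of the ordering rule, the sketch below lists the same process in two entries of the memory list. Per the note above, call_gSV_Delly would be updated twice, first by 2 and then by 3, while call_gSV_Manta would be updated once by 3. The net effect on call_gSV_Delly is an assumption here: if the multipliers compound, it would end up at 6 times its base memory, which is worth verifying against the actual allocations of a run before relying on it.

```groovy
base_resource_update {
    memory = [
        // First update: applies to call_gSV_Delly only
        ['call_gSV_Delly', 2],
        // Second update: applied afterwards, to both processes
        [['call_gSV_Delly', 'call_gSV_Manta'], 3]
    ]
}
```

Listing a process only once per resource avoids this ambiguity.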