Inputs

Input YAML

| Field | Type | Description |
|---|---|---|
| patient_id | string | Patient ID (will be standardized according to data storage structure in the near future) |
| normal_BAM | path | Set to the absolute path to the normal BAM |
| tumor_BAM | path | Set to the absolute path to the tumor BAM |

---
patient_id: "patient_id"
input:
  BAM:
    normal:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"
    tumor:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"

For normal-only or tumor-only samples, omit the `normal` or `tumor` field that does not apply.
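
The rule above can be sketched in Python as a small helper that builds the input structure and leaves out whichever state has no BAMs. This is only an illustration and not part of the pipeline: `build_input` and `to_yaml` are hypothetical names, and the check that at least one state is present is an assumption rather than documented behavior.

```python
from pathlib import PurePosixPath


def build_input(patient_id, normal=(), tumor=()):
    """Return the input mapping, leaving out any state with no BAMs."""
    bams = {}
    for state, paths in (("normal", normal), ("tumor", tumor)):
        paths = list(paths)
        if not paths:
            continue  # normal-only or tumor-only: skip the other state
        for p in paths:
            # The input table asks for absolute paths.
            if not PurePosixPath(p).is_absolute():
                raise ValueError(f"{state} BAM path is not absolute: {p}")
        bams[state] = paths
    if not bams:
        # Assumption: an input with no BAMs at all is never intended.
        raise ValueError("at least one normal or tumor BAM is required")
    return {"patient_id": patient_id, "input": {"BAM": bams}}


def to_yaml(doc):
    """Render the fixed structure returned by build_input as YAML text."""
    lines = ["---", f'patient_id: "{doc["patient_id"]}"', "input:", "  BAM:"]
    for state, paths in doc["input"]["BAM"].items():
        lines.append(f"    {state}:")
        lines.extend(f'      - "{p}"' for p in paths)
    return "\n".join(lines) + "\n"


# Tumor-only sample: the rendered YAML has no `normal` key.
print(to_yaml(build_input("patient_id", tumor=["/absolute/path/to/BAM"])))
```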

Config

| Input Parameter | Required | Type | Description |
|---|---|---|---|
| `dataset_id` | Yes | string | Dataset ID |
| `blcds_registered_dataset` | Yes | boolean | Set to true when using the BLCDS folder structure; use false for now |
| `output_dir` | Yes | string | Must be set if `blcds_registered_dataset = false` |
| `save_intermediate_files` | Yes | boolean | Set to true to publish intermediate files; set to false to disable publishing. When disabled, intermediate files are deleted, which allows processing of large BAMs |
| `cache_intermediate_pipeline_steps` | No | boolean | Set to true to enable Nextflow process caching; defaults to false |
| `scatter_count` | Yes | integer | Number of intervals to split into for parallelization |
| `intervals` | Yes | path | Use all .list in inputs for WGS; for targeted exome, set to the absolute path to the interval file (with .interval_list, .list, .intervals, or .bed suffix) |
| `reference_fasta` | Yes | path | Absolute path to reference genome fasta file, e.g., `/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta` |
| `bundle_mills_and_1000g_gold_standard_indels_vcf_gz` | Yes | path | Absolute path to Mills & 1000G Gold Standard Indels file, e.g., `/hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz` |
| `bundle_v0_dbsnp138_vcf_gz` | Yes | path | Absolute path to dbSNP file, e.g., `/hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz` |
| `bundle_hapmap_3p3_vcf_gz` | Yes | path | Absolute path to HapMap 3.3 file, e.g., `/hot/ref/tool-specific-input/GATK/GRCh38/hapmap_3.3.hg38.vcf.gz` |
| `bundle_omni_1000g_2p5_vcf_gz` | Yes | path | Absolute path to 1000 Genomes OMNI 2.5 file, e.g., `/hot/ref/tool-specific-input/GATK/GRCh38/1000G_omni2.5.hg38.vcf.gz` |
| `bundle_phase1_1000g_snps_high_conf_vcf_gz` | Yes | path | Absolute path to 1000 Genomes phase 1 high-confidence file, e.g., `/hot/ref/tool-specific-input/GATK/GRCh38/1000G_phase1.snps.high_confidence.hg38.vcf.gz` |
| `work_dir` | No | path | Path of the working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs are saved to this directory. With ucla_cds, the default is `/scratch` and should only be changed for testing/development. Changing this directory to `/hot` or `/tmp` can lead to high server latency and potential disk space limitations, respectively. |
| `docker_container_registry` | No | string | Registry containing tool Docker images. Default: `ghcr.io/uclahs-cds` |
| `base_resource_update` | No | namespace | Namespace of parameters to update base resource allocations in the pipeline. Usage and structure are detailed in `template.config` and below. |
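
A few of the constraints in the table can be checked before launching a run. The Python sketch below is not part of the pipeline: `check_params` is a hypothetical helper that takes the parameters as a plain dict. It also makes two assumptions: `scatter_count` must be positive (the table only says integer), and the suffix check applies only when an interval file path is given.

```python
# Suffixes listed for the targeted exome interval file.
INTERVAL_SUFFIXES = (".interval_list", ".list", ".intervals", ".bed")


def check_params(params):
    """Return a list of problems found in a dict of pipeline parameters."""
    problems = []

    # output_dir is needed whenever the BLCDS folder structure is not used.
    if not params.get("blcds_registered_dataset") and not params.get("output_dir"):
        problems.append("output_dir must be set when blcds_registered_dataset is false")

    # scatter_count is documented as an integer; bool is excluded explicitly
    # because Python treats True/False as integers. "Positive" is an assumption.
    scatter = params.get("scatter_count")
    if isinstance(scatter, bool) or not isinstance(scatter, int) or scatter < 1:
        problems.append("scatter_count must be a positive integer")

    # Assumption: the suffix is only checked when an interval file path is given.
    intervals = params.get("intervals")
    if intervals and not str(intervals).endswith(INTERVAL_SUFFIXES):
        problems.append("intervals must end in one of: " + ", ".join(INTERVAL_SUFFIXES))

    return problems


# Hypothetical values, for illustration only.
example = {
    "blcds_registered_dataset": False,
    "output_dir": "/path/to/output",
    "scatter_count": 50,
    "intervals": "/path/to/targets.bed",
}
print(check_params(example))
```

An empty list means none of the sketched checks failed; it does not guarantee that the pipeline itself will accept the configuration.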

Base resource allocation updaters

To update the base resource (cpus or memory) allocations for processes, use the following structure and add the necessary parts. The default allocations can be found in the node-specific config files.

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it is updated twice, in the order given.
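
The ordering rule can be illustrated with a short Python sketch. It is a model of the behavior described in the note, not the pipeline's own implementation. `apply_updates` is a hypothetical name, the base values are made up, and any limits the pipeline may place on the final allocation are ignored.

```python
def apply_updates(base, updates):
    """Apply [processes, multiplier] pairs in order and return new allocations."""
    result = dict(base)  # leave the caller's base allocations untouched
    for processes, multiplier in updates:
        if isinstance(processes, str):
            processes = [processes]  # a single process name, as in the examples
        # An empty list selects every process.
        targets = processes or list(result)
        for name in targets:
            result[name] = result[name] * multiplier
    return result


# Hypothetical base memory allocations, in GB.
base_memory = {"run_ApplyVQSR_GATK": 2, "run_validate_PipeVal": 1}

# run_ApplyVQSR_GATK appears in both entries, so it is multiplied twice.
updated = apply_updates(
    base_memory,
    [
        ["run_ApplyVQSR_GATK", 2],
        [["run_ApplyVQSR_GATK", "run_validate_PipeVal"], 3],
    ],
)
print(updated)  # {'run_ApplyVQSR_GATK': 12, 'run_validate_PipeVal': 3}
```

Because each multiplier acts on the result of the previous one, a process listed twice ends up scaled by the product of its multipliers (2 × 3 = 6 in this sketch).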

Examples:

To double memory of all processes:

base_resource_update {
    memory = [
        [[], 2]
    ]
}

To double memory for run_ApplyVQSR_GATK and triple memory for run_validate_PipeVal and run_HaplotypeCallerVCF_GATK:

base_resource_update {
    memory = [
        ['run_ApplyVQSR_GATK', 2],
        [['run_validate_PipeVal', 'run_HaplotypeCallerVCF_GATK'], 3]
    ]
}

To double CPUs and memory for run_ApplyVQSR_GATK and double memory for run_validate_PipeVal:

base_resource_update {
    cpus = [
        ['run_ApplyVQSR_GATK', 2]
    ]
    memory = [
        [['run_ApplyVQSR_GATK', 'run_validate_PipeVal'], 2]
    ]
}