Inputs
Input YAML
Field | Type | Description |
---|---|---|
patient_id | string | Patient ID (will be standardized according to data storage structure in the near future) |
normal_BAM | path | Set to absolute path to normal BAM |
tumor_BAM | path | Set to absolute path to tumour BAM |
recalibration_table | path | (Optional) Absolute path to recalibration table |
Input without pre-existing recalibration table(s):
---
patient_id: "patient_id"
input:
  BAM:
    normal:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"
    tumor:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"
Input with existing recalibration table(s):
---
patient_id: "patient_id"
input:
  BAM:
    normal:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"
    tumor:
      - "/absolute/path/to/BAM"
      - "/absolute/path/to/BAM"
  recalibration_table:
    - "/absolute/path/to/recalibration/table1"
    - "/absolute/path/to/recalibration/table2"
For normal-only or tumour-only samples, exclude the fields for the other state.
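As a minimal sketch of the normal-only and tumour-only cases, the Python below renders an input YAML that contains only the states actually supplied. The helper name `render_input_yaml` and the sample path are hypothetical and not part of the pipeline, and placing `recalibration_table` under `input` next to `BAM` is an assumption about the nesting.

```python
def render_input_yaml(patient_id, normal=None, tumor=None, recalibration_tables=None):
    """Render an input YAML string, omitting any state with no BAMs (hypothetical helper)."""
    states = {"normal": normal or [], "tumor": tumor or []}
    if not any(states.values()):
        raise ValueError("at least one normal or tumor BAM is required")

    lines = ["---", f'patient_id: "{patient_id}"', "input:", "  BAM:"]
    for state, bams in states.items():
        if not bams:
            continue  # exclude the fields for the state that is absent
        lines.append(f"    {state}:")
        lines.extend(f'      - "{bam}"' for bam in bams)
    if recalibration_tables:
        # Assumption: recalibration_table sits under input, next to BAM.
        lines.append("  recalibration_table:")
        lines.extend(f'    - "{table}"' for table in recalibration_tables)
    return "\n".join(lines) + "\n"


# Normal-only sample: no tumor key is written at all.
normal_only_yaml = render_input_yaml("patient_id", normal=["/absolute/path/to/BAM"])
print(normal_only_yaml)
```

Calling the helper with only `tumor=[...]` gives the tumour-only equivalent.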
Config
Input Parameter | Required | Type | Description |
---|---|---|---|
dataset_id | Yes | string | Dataset ID |
blcds_registered_dataset | Yes | boolean | Set to true when using BLCDS folder structure; use false for now |
output_dir | Yes | string | Must be set if blcds_registered_dataset = false |
save_intermediate_files | Yes | boolean | Set to false to disable publishing of intermediate files; true otherwise. Disabling this option deletes intermediate files to allow for processing of large BAMs |
aligner | Yes | string | Original aligner used to align input BAMs; formatted as \ |
cache_intermediate_pipeline_steps | No | boolean | Set to true to enable process caching from Nextflow; defaults to false |
is_emit_original_quals | Yes | boolean | Set to true to emit original quality scores; false to omit |
is_DOC_run | Yes | boolean | Set to true to run GATK DepthOfCoverage (very time-consuming for large BAMs); false otherwise |
parallelize_by_chromosome | Yes | boolean | Whether to parallelize by chromosome or by scattering intervals |
scatter_count | Yes | integer | Number of intervals to divide into for parallelization |
intervals | Yes | path | Use all .list in inputs for WGS; set to absolute path to targeted exome interval file (with .interval_list, .list, .intervals, or .bed suffix) |
gatk_ir_compression | No | integer | Compression level for BAMs output by IndelRealigner. Default: 0. Range: 0-9 |
reference_fasta | Yes | path | Absolute path to reference genome fasta file, e.g., /hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta |
bundle_mills_and_1000g_gold_standard_indels_vcf_gz | Yes | path | Absolute path to Mills & 1000G Gold Standard Indels file, e.g., /hot/ref/tool-specific-input/GATK/GRCh38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz |
bundle_known_indels_vcf_gz | Yes | path | Absolute path to known indels file, e.g., /hot/ref/tool-specific-input/GATK/GRCh38/Homo_sapiens_assembly38.known_indels.vcf.gz |
bundle_v0_dbsnp138_vcf_gz | Yes | path | Absolute path to dbsnp file, e.g., /hot/ref/tool-specific-input/GATK/GRCh38/resources_broad_hg38_v0_Homo_sapiens_assembly38.dbsnp138.vcf.gz |
bundle_contest_hapmap_3p3_vcf_gz | Yes | path | Absolute path to HapMap 3.3 biallelic sites file, e.g., /hot/ref/tool-specific-input/GATK/GRCh38/Biallelic/hapmap_3.3.hg38.BIALLELIC.PASS.2021-09-01.vcf.gz |
work_dir | No | path | Path of working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
base_resource_update | No | namespace | Namespace of parameters to update base resource allocations in the pipeline. Usage and structure are detailed in template.config and below. |
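Several rows in the table above carry constraints that can be checked before launching a run. The following is a rough sketch of such a pre-flight check, not something the pipeline ships: `check_config` and the sample values are hypothetical, the parameters are assumed to have been loaded into a plain dict, and an empty `intervals` value is left unchecked.

```python
from pathlib import PurePosixPath

# Interval file suffixes listed in the table above.
INTERVAL_SUFFIXES = (".interval_list", ".list", ".intervals", ".bed")


def check_config(params):
    """Return a list of problems found in a dict of config parameters (hypothetical helper)."""
    problems = []
    if not params.get("blcds_registered_dataset", False) and not params.get("output_dir"):
        problems.append("output_dir must be set when blcds_registered_dataset is false")

    compression = params.get("gatk_ir_compression", 0)  # documented default: 0
    if not 0 <= compression <= 9:
        problems.append("gatk_ir_compression must be in the range 0-9")

    intervals = params.get("intervals", "")
    if intervals and not intervals.endswith(INTERVAL_SUFFIXES):
        problems.append("intervals must end in one of: " + ", ".join(INTERVAL_SUFFIXES))

    for key in ("reference_fasta", "intervals"):
        value = params.get(key, "")
        if value and not PurePosixPath(value).is_absolute():
            problems.append(f"{key} must be an absolute path")
    return problems


# Hypothetical values chosen to trigger three of the checks.
sample = {
    "blcds_registered_dataset": False,
    "output_dir": "",
    "gatk_ir_compression": 12,
    "intervals": "/absolute/path/to/targets.txt",
    "reference_fasta": "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta",
}
problems = check_config(sample)
for problem in problems:
    print(problem)
```

The check covers only constraints written in the table; it does not confirm that the referenced files exist.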
The parameters below have default values defined in default.config and generally do not need to be set by the user.
Optional Parameter | Type | Description |
---|---|---|
metapipeline_delete_input_bams | boolean | Set to true to delete the input BAM files once the initial processing step is complete. WARNING: This option should NOT be used for individual runs of recalibrate-BAM; it's intended for metapipeline-DNA to optimize disk space usage by removing files that are no longer needed from the workDir. |
metapipeline_final_output_dir | string | Absolute path for the final output directory of metapipeline-DNA that's expected to contain the output BAM from align-DNA. WARNING: This option should not be used for individual runs of recalibrate-BAM; it's intended for metapipeline-DNA to optimize disk space usage. |
metapipeline_states_to_delete | list | List of states for which to delete input BAMs. WARNING: This option should not be used for individual runs of recalibrate-BAM; it's intended for metapipeline-DNA to optimize disk space usage. |
cache_intermediate_pipeline_steps | boolean | Enable process caching from Nextflow. |
ucla_cds | boolean | Overwrite default memory and CPU values by cluster-specific configs. |
docker_container_registry | string | Registry containing tool Docker images. |
docker_image_gatk, gatk_version | string | Docker image name and version for GATK. |
docker_image_pipeval, pipeval_version | string | Docker image name and version for PipeVal. |
docker_image_gatk3, gatk3_version | string | Docker image name and version for GATK3. |
docker_image_picard, picard_version | string | Docker image name and version for Picard. |
docker_image_samtools, samtools_version | string | Docker image name and version for SAMtools. |
reference_fasta_fai, reference_fasta_dict | path | Index and dictionary files for the required input. Default: Matching .fai and .dict files in the same directory. |
bundle_v0_dbsnp138_vcf_gz_tbi | path | Index file for the required input. Default: Matching .tbi file in the same directory. |
bundle_known_indels_vcf_gz_tbi | path | Index file for the required input. Default: Matching .tbi file in the same directory. |
bundle_contest_hapmap_3p3_vcf_gz_tbi | path | Index file for the required input. Default: Matching .tbi file in the same directory. |
bundle_mills_and_1000g_gold_standard_indels_vcf_gz_tbi | path | Index file for the required input. Default: Matching .tbi file in the same directory. |
Base resource allocation updaters
To update the base resource (cpus or memory) allocations for processes, use the following structure and add the necessary parts. The default allocations can be found in the node-specific config files.
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
Note: Resource updates are applied in the order they're provided, so if a process is included twice in the memory list, it will be updated twice, in the order given.
Examples:
- To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
- To double memory for run_ApplyBQSR_GATK and triple memory for run_validate_PipeVal and run_IndelRealigner_GATK:
base_resource_update {
    memory = [
        ['run_ApplyBQSR_GATK', 2],
        [['run_validate_PipeVal', 'run_IndelRealigner_GATK'], 3]
    ]
}
- To double CPUs and memory for run_ApplyBQSR_GATK and double memory for run_validate_PipeVal:
base_resource_update {
    cpus = [
        ['run_ApplyBQSR_GATK', 2]
    ]
    memory = [
        [['run_ApplyBQSR_GATK', 'run_validate_PipeVal'], 2]
    ]
}
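The ordering rule for repeated updates can be modelled with a short Python sketch. This is a simplified illustration rather than the pipeline's implementation: `apply_resource_updates` and the base values are hypothetical, an empty process list is assumed to select every process (as in the "double memory of all processes" example), and any caps or rounding the pipeline may apply are ignored.

```python
def apply_resource_updates(base, updates):
    """Apply [processes, multiplier] entries in order to a dict of base allocations."""
    updated = dict(base)
    for processes, multiplier in updates:
        if isinstance(processes, str):
            processes = [processes]  # a bare process name is accepted, as in the examples
        # Assumption: an empty list selects every process, as in the "[[], 2]" example.
        targets = processes or list(updated)
        for name in targets:
            updated[name] *= multiplier
    return updated


# Hypothetical base memory allocations in GB; real defaults live in the node-specific configs.
base_memory = {"run_ApplyBQSR_GATK": 4, "run_validate_PipeVal": 1}

# run_ApplyBQSR_GATK appears twice, so it is scaled by 2 and then by 3.
memory_updates = [
    ("run_ApplyBQSR_GATK", 2),
    (["run_ApplyBQSR_GATK", "run_validate_PipeVal"], 3),
]
updated_memory = apply_resource_updates(base_memory, memory_updates)
print(updated_memory)
```

Under these assumptions run_ApplyBQSR_GATK ends at 24 GB (4 × 2 × 3) and run_validate_PipeVal at 3 GB (1 × 3), which shows why listing a process twice compounds the multipliers.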