Nextflow Config File Parameters
Field | Required | Type | Description |
---|---|---|---|
dataset_id | yes | string | Boutros Lab dataset id |
blcds_registered_dataset | yes | boolean | Affirms whether the dataset should be registered in the Boutros Lab Data registry. Default value is false. |
genome_build | no | string | Genome build for the circos plot, hg19 or hg38. Default is hg38. |
algorithm | yes | list | List of SV callers to run: any combination of delly, manta, and gridss2. The list can contain a single caller of choice. |
reference_fasta | yes | path | Absolute path to the reference genome FASTA file. The reference genome is used by Delly for structural variant calling. GRCh37 - /hot/resource/reference-genome/GRCh37-EBI-hs37d5/hs37d5.fa, GRCh38 - /hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta |
save_intermediate_files | yes | boolean | Optional parameter to indicate whether intermediate files will be saved. Default value is false. |
output_dir | yes | path | Absolute path to the directory where the output files will be saved. |
work_dir | no | path | Path of the working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
verbose | yes | boolean | If set to true, the values of input channels will be printed. This can be used for debugging. |
docker_container_registry | no | string | Registry containing tool Docker images. Default: ghcr.io/uclahs-cds |
An example of the Nextflow Input Parameters Config file can be found here.
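As a rough illustration of how the fields in the table above might be set, here is a minimal sketch. It assumes the parameters live in a standard Nextflow `params` scope, which this section does not confirm; the `dataset_id` and `output_dir` values are hypothetical placeholders, and the remaining values are the defaults or example paths listed in the table.

```groovy
// Sketch only: the params scope and the placeholder values are assumptions.
params {
    dataset_id = 'EXAMPLE_DATASET'          // hypothetical dataset id
    blcds_registered_dataset = false        // documented default
    genome_build = 'hg38'                   // documented default
    algorithm = ['delly', 'manta']          // any combination of delly, manta, gridss2

    // GRCh38 reference path as listed in the table
    reference_fasta = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'

    save_intermediate_files = false         // documented default
    output_dir = '/path/to/output'          // hypothetical absolute path
    verbose = false
    docker_container_registry = 'ghcr.io/uclahs-cds'  // documented default
}
```

The linked example config remains the authoritative template; this fragment only shows the expected value types.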
DELLY Specific Parameters
Field | Required | Type | Description |
---|---|---|---|
exclusion_file | yes | path | Absolute path to the Delly reference genome exclusion file utilized to remove suggested regions for structural variant calling. GRCh37 - /hot/resource/tool-specific-input/Delly/GRCh37-EBI-hs37d/human.hs37d5.excl.tsv, GRCh38 - /hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv |
map_qual | yes | integer | Minimum paired-end (PE) mapping quality (MAPQ) for Delly. Default set to 20. |
min_clique_size | yes | integer | Minimum number of supporting PE or split-read (SR) alignments required for a clique to be identified as a structural variant by Delly. Adjust this parameter to control the sensitivity and specificity of Delly variant calling. Default set to 5. |
mad_cutoff | yes | integer | Insert size cutoff, median+s*MAD (deletions only) for Delly. Default set to 15. |
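A minimal sketch of the Delly-specific settings, using the documented defaults and the GRCh38 exclusion file from the table above. Placing them inside a `params` scope is an assumption.

```groovy
// Sketch only: the params scope is an assumption; values are the documented defaults.
params {
    exclusion_file = '/hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv'
    map_qual = 20          // minimum paired-end mapping quality
    min_clique_size = 5    // minimum supporting PE/SR alignments per clique
    mad_cutoff = 15        // insert size cutoff, median+s*MAD (deletions only)
}
```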
GRIDSS2 Specific Parameters
Field | Required | Type | Description |
---|---|---|---|
gridss2_blacklist | yes | path | Path to GRIDSS2 blacklist BED file. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/ENCFF001TDO.bed and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ENCFF356LFX.bed |
gridss2_reference_fasta | yes | path | Path to GRIDSS2 reference FASTA file. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/hs37d5.fa and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta |
gridss2_pon_dir | yes | path | Path to GRIDSS2 Panel Of Normals (PON) directory. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/ and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ |
other_jvm_heap | no | string | Increase other_jvm_heap if GRIDSS2 fails with an OutOfMemory error. Default is 4.GB |
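A minimal sketch of the GRIDSS2-specific settings for GRCh38, using the paths from the table above. The `params` scope is an assumption, and the raised `other_jvm_heap` value is a hypothetical example written in the same form as the documented default (4.GB).

```groovy
// Sketch only: the params scope and the 8.GB value are assumptions.
params {
    gridss2_blacklist = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ENCFF356LFX.bed'
    gridss2_reference_fasta = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
    gridss2_pon_dir = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/'

    // Hypothetical increase over the 4.GB default, for runs that hit OutOfMemory
    other_jvm_heap = 8.GB
}
```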
Base resource allocation updaters
To optionally update the base resource (cpus or memory) allocations for processes, add the following structure, with the necessary parts filled in, to the input.config file. The default allocations can be found in the node-specific config files.
base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}
Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given. Examples:
- To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
- To double memory for call_sSV_Delly and triple memory for run_validate_PipeVal and call_sSV_Manta:
base_resource_update {
    memory = [
        ['call_sSV_Delly', 2],
        [['run_validate_PipeVal', 'call_sSV_Manta'], 3]
    ]
}
- To double CPUs and memory for call_sSV_Manta and double memory for run_validate_PipeVal:
base_resource_update {
    cpus = [
        ['call_sSV_Manta', 2]
    ]
    memory = [
        [['call_sSV_Manta', 'run_validate_PipeVal'], 2]
    ]
}
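The ordering note above can be illustrated with a fragment in which one process is matched twice. Assuming each multiplier is applied to the result of the previous update (an interpretation of the note, not something this section states explicitly), call_sSV_Delly would end up with 2 x 3 = 6 times its base memory, while every other process would get 2 times.

```groovy
base_resource_update {
    memory = [
        // Applied first: the empty list targets all processes, as in the first example
        [[], 2],
        // Applied second: call_sSV_Delly is updated again, on top of the doubling
        [['call_sSV_Delly'], 3]
    ]
}
```

If only a single combined factor is wanted for a process, listing it once with that factor avoids depending on the order of updates.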