Nextflow Config File Parameters

| Field | Required | Type | Description |
|---|---|---|---|
| dataset_id | yes | string | Boutros Lab dataset id |
| blcds_registered_dataset | yes | boolean | Indicates whether the dataset should be registered in the Boutros Lab Data registry. Default value is false. |
| genome_build | no | string | Genome build for the circos plot, hg19 or hg38. Default is hg38. |
| algorithm | yes | list | List containing any combination of the SV callers delly, manta, and gridss2. The list may contain a single caller. |
| reference_fasta | yes | path | Absolute path to the reference genome FASTA file. The reference genome is used by Delly for structural variant calling. GRCh37 - /hot/resource/reference-genome/GRCh37-EBI-hs37d5/hs37d5.fa, GRCh38 - /hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta |
| save_intermediate_files | yes | boolean | Indicates whether intermediate files will be saved. Default value is false. |
| output_dir | yes | path | Absolute path to the directory where the output files will be saved. |
| work_dir | no | path | Path of the working directory for Nextflow. When included in the sample config file, Nextflow intermediate files and logs will be saved to this directory. With ucla_cds, the default is /scratch and should only be changed for testing/development. Changing this directory to /hot or /tmp can lead to high server latency and potential disk space limitations, respectively. |
| verbose | yes | boolean | If set to true, the values of input channels are printed; useful for debugging. |
| docker_container_registry | no | string | Registry containing tool Docker images. Default: ghcr.io/uclahs-cds |

An example of the Nextflow Input Parameters Config file can be found here.
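
For orientation, the general parameters in the table above might be set as in the following sketch. It assumes the fields live inside a `params` block, as is conventional for Nextflow config files; the dataset id and output directory are placeholders, not values from this pipeline.

```groovy
// Hypothetical input.config fragment; adjust every value to your own run.
params {
    dataset_id = 'EXAMPLE_DATASET'            // placeholder id
    blcds_registered_dataset = false          // documented default
    genome_build = 'hg38'                     // 'hg19' or 'hg38'
    algorithm = ['delly', 'manta']            // any combination of delly, manta, gridss2
    reference_fasta = '/hot/resource/reference-genome/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
    save_intermediate_files = false           // documented default
    output_dir = '/path/to/output'            // placeholder absolute path
    verbose = false
    docker_container_registry = 'ghcr.io/uclahs-cds'  // documented default
}
```

work_dir is left out here so that the default working directory is used.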

DELLY Specific Parameters

| Field | Required | Type | Description |
|---|---|---|---|
| exclusion_file | yes | path | Absolute path to the Delly reference genome exclusion file utilized to remove suggested regions for structural variant calling. GRCh37 - /hot/resource/tool-specific-input/Delly/GRCh37-EBI-hs37d/human.hs37d5.excl.tsv, GRCh38 - /hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv |
| map_qual | yes | integer | Minimum paired-end (PE) mapping quality (MAPQ) for Delly. Default set to 20. |
| min_clique_size | yes | integer | Minimum number of supporting PE or split-read (SR) alignments required for a clique to be identified as a structural variant by Delly. Adjust this parameter to control the sensitivity and specificity of Delly variant calling. Default set to 5. |
| mad_cutoff | yes | integer | Insert size cutoff, median+s*MAD (deletions only) for Delly. Default set to 15. |
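
As a sketch, the Delly parameters with their documented defaults and the GRCh38 exclusion file could look like the following, again assuming they are set inside a `params` block:

```groovy
// Hypothetical fragment: Delly settings at their documented defaults.
params {
    exclusion_file = '/hot/resource/tool-specific-input/Delly/hg38/human.hg38.excl.tsv'
    map_qual = 20          // minimum PE mapping quality
    min_clique_size = 5    // minimum supporting PE/SR alignments per clique
    mad_cutoff = 15        // insert size cutoff, median+s*MAD (deletions only)
}
```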

GRIDSS2 Specific Parameters

| Field | Required | Type | Description |
|---|---|---|---|
| gridss2_blacklist | yes | path | Path to GRIDSS2 blacklist BED file. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/ENCFF001TDO.bed and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ENCFF356LFX.bed |
| gridss2_reference_fasta | yes | path | Path to GRIDSS2 reference FASTA file. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/hs37d5.fa and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta |
| gridss2_pon_dir | yes | path | Path to GRIDSS2 Panel Of Normals (PON) directory. GRCh37 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh37-EBI-hs37d5/ and GRCh38 - /hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ |
| other_jvm_heap | no | string | Increase other_jvm_heap if GRIDSS2 fails with an OutOfMemory error. Default is 4.GB |
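
A possible GRCh38 setup for GRIDSS2 is sketched below, assuming a `params` block. The heap value of 8.GB is only an illustrative increase over the default, and writing it as a Nextflow memory literal is an assumption; quote it if the pipeline expects a plain string.

```groovy
// Hypothetical fragment: GRIDSS2 inputs for GRCh38.
params {
    gridss2_blacklist = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/ENCFF356LFX.bed'
    gridss2_reference_fasta = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta'
    gridss2_pon_dir = '/hot/resource/tool-specific-input/GRIDSS2-2.13.2/GRCh38-BI-20160721/'
    other_jvm_heap = 8.GB   // illustrative; raise only after an OutOfMemory error
}
```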

Base resource allocation updaters

To optionally update the base resource allocations (cpus or memory) for processes, add a block with the following structure to the input.config file. The default allocations can be found in the node-specific config files.

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it is updated twice, in the order given.

Examples:

  • To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
  • To double memory for call_sSV_Delly and triple memory for run_validate_PipeVal and call_sSV_Manta:
base_resource_update {
    memory = [
        ['call_sSV_Delly', 2],
        [['run_validate_PipeVal', 'call_sSV_Manta'], 3]
    ]
}
  • To double CPUs and memory for call_sSV_Manta and double memory for run_validate_PipeVal:
base_resource_update {
    cpus = [
        ['call_sSV_Manta', 2]
    ]
    memory = [
        [['call_sSV_Manta', 'run_validate_PipeVal'], 2]
    ]
}
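
To illustrate the ordering note, the sketch below lists call_sSV_Delly twice in the memory list. Assuming each multiplier is applied to the result of the previous update and that no upper limit intervenes, the process would end up with six times its base memory (2x, then 3x):

```groovy
// Hypothetical fragment: the same process appears in two entries.
base_resource_update {
    memory = [
        [[], 2],                   // first: double memory for all processes
        [['call_sSV_Delly'], 3]    // then: triple the already doubled Delly memory
    ]
}
```

If only a net 3x is wanted for call_sSV_Delly, list it in a single entry instead of combining it with the all-process entry.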