Inputs

input.yaml

This input YAML must comply with the format in the provided template.

Field Type Description
patient_id string Name of patient.
normal_BAM path Absolute path to normal BAM file.
tumor_BAM path Absolute path to tumor BAM file.

Single Mode

Provide either a normal sample or tumor sample and leave the other entry blank in the YAML. The data will be organized by the provided sample's ID.

Paired Mode

Provide both a normal sample and a tumor sample. The data will be organized under the tumor sample ID.
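
The two modes can be summarized in a short Python sketch. It is illustrative only and is not taken from the pipeline's code: the function name resolve_mode and the returned labels are hypothetical, and it assumes that a blank YAML entry is read as None or an empty string.

```python
from typing import Optional, Tuple


def resolve_mode(normal_bam: Optional[str], tumor_bam: Optional[str]) -> Tuple[str, str]:
    """Return (mode, organizing_sample) for one input.yaml entry.

    Hypothetical helper: a blank entry is assumed to be None or "".
    """
    has_normal = bool(normal_bam and normal_bam.strip())
    has_tumor = bool(tumor_bam and tumor_bam.strip())

    if has_normal and has_tumor:
        # Paired mode: output is organized under the tumor sample ID.
        return ("paired", "tumor")
    if has_tumor:
        # Single mode: output is organized by the one provided sample.
        return ("single", "tumor")
    if has_normal:
        return ("single", "normal")
    raise ValueError("Provide at least one of normal_BAM or tumor_BAM")
```

For example, resolve_mode("/path/to/normal.bam", "") selects single mode keyed on the normal sample, while passing both paths selects paired mode keyed on the tumor sample.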

input.config

The config file can take 7 arguments. See provided template.

Input Parameter Required Type Description
1 dataset_id yes string Dataset identifier attached to pipeline output.
2 output_dir yes path Absolute path to location of output.
3 mt_ref_genome_dir yes path Absolute path to directory containing mitochondrial ref genome and mt ref genome index files. Path: /hot/ref/mitochondria_ref/genome_fasta
4 gmapdb yes path Absolute path to gmapdb directory. Path: /hot/ref/mitochondria_ref/gmapdb/gmapdb_2021-03-08
5 save_intermediate_files no boolean Save intermediate files. If true, the unmerged, unsorted, and duplicate-unmarked BAM files are saved in addition to the final BAM. Default is false.
6 cache_intermediate_pipeline_steps no boolean Enable caching so that a failed pipeline can resume from the end of the last successfully completed process (if true, the default submission script must be modified). Default is false.
7 base_resource_update no namespace Namespace of parameters to update base resource allocations in the pipeline. Usage and structure are detailed in template.config and below.

Base resource allocation updaters

To update the base resource allocations (cpus or memory) for processes, use the following structure and add the parts you need. The default allocations can be found in the node-specific config files.

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it will be updated twice, in the order given.

Examples:

  • To double memory of all processes:
base_resource_update {
    memory = [
        [[], 2]
    ]
}
  • To double memory for convert_mitoCaller2vcf_mitoCaller and triple memory for Validate_Inputs and call_heteroplasmy:
base_resource_update {
    memory = [
        ['convert_mitoCaller2vcf_mitoCaller', 2],
        [['Validate_Inputs', 'call_heteroplasmy'], 3]
    ]
}
  • To double CPUs and memory for convert_mitoCaller2vcf_mitoCaller and double memory for Validate_Inputs:
base_resource_update {
    cpus = [
        ['convert_mitoCaller2vcf_mitoCaller', 2]
    ]
    memory = [
        [['convert_mitoCaller2vcf_mitoCaller', 'Validate_Inputs'], 2]
    ]
}
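
The ordering rule and the empty-list convention can be made concrete with a small Python sketch. This is not the pipeline's implementation: apply_updates and the base values are hypothetical, and the sketch assumes that each multiplier is applied to the current value (so repeated updates compound) and ignores any limits the node-specific configs may impose.

```python
from typing import Dict, List, Sequence, Union


def apply_updates(
    base: Dict[str, float],
    updates: Sequence[Sequence[Union[str, List[str], float]]],
) -> Dict[str, float]:
    """Apply [process name(s), multiplier] entries in the order given.

    Hypothetical model of base_resource_update for one resource
    (for example memory in GB). An empty name list targets every process.
    """
    result = dict(base)  # leave the caller's mapping untouched
    for names, multiplier in updates:
        if isinstance(names, str):
            names = [names]  # a single process name is treated as a one-item list
        targets = names if names else list(result)
        for name in targets:
            result[name] = result[name] * multiplier
    return result


# Hypothetical base allocations, in GB.
base_memory = {"Validate_Inputs": 1.0, "call_heteroplasmy": 2.0}

# call_heteroplasmy appears in both entries, so it is updated twice.
updated = apply_updates(
    base_memory,
    [
        ["call_heteroplasmy", 2],
        [["Validate_Inputs", "call_heteroplasmy"], 3],
    ],
)
```

Under these assumptions, call_heteroplasmy ends at 2.0 x 2 x 3 = 12.0 and Validate_Inputs at 1.0 x 3 = 3.0, which shows why listing a process twice in the same list should be a deliberate choice.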