Inputs

input.yaml

This input YAML must comply with the format in the provided template.

Field	Type	Description
patient_id	string	Name of patient.
normal_BAM	path	Absolute path to normal BAM file.
tumor_BAM	path	Absolute path to tumor BAM file.

Single Mode

Provide either a normal sample or a tumor sample, and leave the other entry blank in the YAML. The data will be organized under the provided sample's ID.

Paired Mode

Provide both a normal sample and a tumor sample in the YAML. The data will be organized under the tumor sample ID.

input.config

The config file can take 7 parameters. See the provided template.

	Input Parameter	Required	Type	Description
1	`dataset_id`	yes	string	Dataset identifier attached to pipeline output.
2	`output_dir`	yes	path	Absolute path to location of output.
3	`mt_ref_genome_dir`	yes	path	Absolute path to the directory containing the mitochondrial reference genome and its index files. Path: `/hot/ref/mitochondria_ref/genome_fasta`
4	`gmapdb`	yes	path	Absolute path to the gmapdb directory. Path: `/hot/ref/mitochondria_ref/gmapdb/gmapdb_2021-03-08`
5	`save_intermediate_files`	no	boolean	Save intermediate files. If `true`, the unmerged, unsorted, and duplicate-unmarked BAM files are saved in addition to the final BAM. Default is `false`.
6	`cache_intermediate_pipeline_steps`	no	boolean	Enable caching so that a failed pipeline can resume from the end of the last successfully completed process (if `true`, the default submission script must be modified). Default is `false`.
7	`base_resource_update`	no	namespace	Namespace of parameters to update base resource allocations in the pipeline. Usage and structure are detailed in `template.config` and below.
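
As a rough illustration, the parameters in the table might be filled in as shown here. This is a sketch rather than the provided template: the enclosing `params` block is an assumption based on common Nextflow config layout, and the `dataset_id` and `output_dir` values are hypothetical placeholders. The two reference paths are the ones listed in the table. Defer to `template.config` for the authoritative structure.

```groovy
// Sketch only: the `params` wrapper is assumed, not taken from template.config
params {
    // Hypothetical placeholder values; replace with your own
    dataset_id = 'example_dataset'
    output_dir = '/path/to/output'

    // Reference paths as listed in the parameter table
    mt_ref_genome_dir = '/hot/ref/mitochondria_ref/genome_fasta'
    gmapdb = '/hot/ref/mitochondria_ref/gmapdb/gmapdb_2021-03-08'

    // Optional parameters, shown at their documented defaults
    save_intermediate_files = false
    cache_intermediate_pipeline_steps = false
}
```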

Base resource allocation updaters

The default resource allocations can be found in the node-specific config files. To optionally update the base resource (cpus or memory) allocations for processes, use the following structure, adding only the parts you need:

base_resource_update {
    memory = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
    cpus = [
        [['process_name', 'process_name2'], <multiplier for resource>],
        [['process_name3', 'process_name4'], <different multiplier for resource>]
    ]
}

Note: Resource updates are applied in the order they are provided, so if a process is included twice in the memory list, it is updated twice, in the order given.
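
To make the ordering rule concrete, here is a sketch in which one process appears in two memory entries. The process names are taken from this README. The assumption, read from the note rather than confirmed against the pipeline code, is that each multiplier is applied to the result of the previous update, so the factors compound.

```groovy
base_resource_update {
    memory = [
        // First update: both processes get 2x their base memory
        [['Validate_Inputs', 'call_heteroplasmy'], 2],
        // Second update: call_heteroplasmy is multiplied again,
        // presumably ending at 2 x 3 = 6x its base memory
        ['call_heteroplasmy', 3]
    ]
}
```

Under that assumption, `Validate_Inputs` would end at 2x its base memory and `call_heteroplasmy` at 6x. Listing each process once avoids the compounding.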

Examples:

To double memory of all processes:

base_resource_update {
    memory = [
        [[], 2]
    ]
}

To double memory for `convert_mitoCaller2vcf_mitoCaller` and triple memory for `Validate_Inputs` and `call_heteroplasmy`:

base_resource_update {
    memory = [
        ['convert_mitoCaller2vcf_mitoCaller', 2],
        [['Validate_Inputs', 'call_heteroplasmy'], 3]
    ]
}

To double CPUs and memory for `convert_mitoCaller2vcf_mitoCaller` and double memory for `Validate_Inputs`:

base_resource_update {
    cpus = [
        ['convert_mitoCaller2vcf_mitoCaller', 2]
    ]
    memory = [
        [['convert_mitoCaller2vcf_mitoCaller', 'Validate_Inputs'], 2]
    ]
}