How To Run

Requirements

Currently supported Nextflow version: v23.04.2

Run steps

Below is a summary of how to run the pipeline. See here for full instructions.

Pipelines should be run WITH A SINGLE SAMPLE AT A TIME. Running multiple samples at once can cause resource-allocation issues and Nextflow errors that make the pipeline fail.

  1. The recommended way of running the pipeline is to directly use the source code located here: /hot/software/pipeline/pipeline-call-gSV/Nextflow/release/, rather than cloning a copy of the pipeline.

  2. The source code should never be modified when running our pipelines.

  3. Create a config file for input, output, and parameters. An example config file can be found here. See Nextflow Config File Parameters for a detailed description of each variable in the config file.

  4. Do not modify the source template.config directly; instead, copy it from the pipeline release folder to your project-specific folder and modify the copy there.

  5. Create the input YAML using the template. See Input YAML for a detailed description.

  6. Again, do not directly modify the source template YAML file. Instead, copy it from the pipeline release folder to your project-specific folder and modify it there.

  7. The pipeline can be executed locally using the command below:

nextflow run path/to/main.nf -config path/to/sample-specific.config -params-file path/to/input.yaml
  • For example, path/to/main.nf could be: /hot/software/pipeline/pipeline-call-gSV/Nextflow/release/5.0.0-rc.1/main.nf
  • path/to/sample-specific.config is the path to where you saved your project-specific copy of template.config
  • path/to/input.yaml is the path to where you saved your sample-specific copy of call-gSV-input.yaml
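Steps 4 and 6 above amount to staging your own copies of the templates before editing them. A minimal shell sketch of that staging, where the template locations inside the release folder and all destination paths are illustrative assumptions rather than verified paths:

```shell
# Stage project-specific copies of the templates before editing them.
# NOTE: the template locations inside the release folder are assumed here;
# check the actual release directory for the real filenames and layout.
RELEASE_DIR=/hot/software/pipeline/pipeline-call-gSV/Nextflow/release/5.0.0-rc.1
PROJECT_DIR=/path/to/your/project   # placeholder

mkdir -p "$PROJECT_DIR"
cp "$RELEASE_DIR"/config/template.config "$PROJECT_DIR"/sample-specific.config
cp "$RELEASE_DIR"/input/call-gSV-input.yaml "$PROJECT_DIR"/input.yaml
```

Editing the copies rather than the release files keeps the source tree pristine and leaves a project-specific record of the exact parameters used for each run.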

To submit to UCLAHS-CDS's Azure cloud, use the submission script here with the command below:

python path/to/submit_nextflow_pipeline.py \
    --nextflow_script path/to/main.nf \
    --nextflow_config path/to/sample-specific.config \
    --nextflow_yaml path/to/input.yaml \
    --pipeline_run_name <sample_name> \
    --partition_type F16 \
    --email <your UCLA email, jdoe@ucla.edu>

In the above command, the partition type can be changed based on the size of the dataset; an F16 node is generally recommended for larger datasets such as A-full.

Note: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the Docker Introduction on Confluence to set up a personal access token (PAT) for your GitHub account and log into the registry on the cluster before running this pipeline.
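The registry login itself follows GitHub's standard Container Registry flow; a sketch, assuming you have already created a PAT with the scopes described in the Confluence guide (the variable and username below are placeholders):

```shell
# Authenticate Docker against the GitHub Container Registry (ghcr.io).
# CR_PAT and <github-username> are placeholders; substitute your own PAT
# and GitHub username. Piping via stdin avoids the token appearing in
# shell history or process listings.
export CR_PAT=<your personal access token>
echo $CR_PAT | docker login ghcr.io -u <github-username> --password-stdin
```

A successful login is cached in your Docker credentials on the cluster, so this normally only needs to be repeated when the PAT expires or is revoked.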