How to Run

Requirements

Currently supported Nextflow version: v23.04.2

Run steps

Below is a summary of how to run the pipeline. See here for full instructions.

Pipelines should be run WITH A SINGLE SAMPLE AT A TIME. Otherwise, resource allocation and Nextflow errors could cause the pipeline to fail.

  1. The recommended way to run the pipeline is to use the source code directly from the release location at /hot/software/pipeline/pipeline-call-sSV/Nextflow/release/, rather than cloning a copy of the pipeline.

  2. The source code should never be modified when running our pipelines.

  3. Create a config file for input, output, and parameters. An example config file can be found here. See Nextflow Config File Parameters for a detailed description of each variable in the config file; an illustrative sketch of a project-specific config also appears after the command examples below.

  4. Do not modify the source template.config directly; instead, copy it from the pipeline release folder to your project-specific folder and modify the copy there.

  5. Create the input YAML using the template. See Input YAML for a detailed description of each column; an illustrative YAML sketch also appears after the command examples below.

  6. Again, do not modify the source template input YAML file directly. Instead, copy it from the pipeline release folder to your project-specific folder and modify the copy there.

  7. The pipeline can be executed locally using the command below, with the input supplied as YAML:

nextflow run path/to/main.nf -config path/to/sample-specific.config -params-file path/to/input.yaml
  • For example, path/to/main.nf could be: /hot/software/pipeline/pipeline-call-sSV/Nextflow/release/6.0.0-rc.1/main.nf
  • path/to/sample-specific.config is the path to where you saved your project-specific copy of template.config
  • path/to/input.yaml is the path to where you saved your sample-specific copy of input-sSV.yaml
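
For illustration, a project-specific config might look like the sketch below. Except for algorithm, which is named later in this document, the parameter names here are assumptions made for illustration only; the authoritative variable list is in template.config and the Nextflow Config File Parameters documentation.

    // sample-specific.config -- illustrative sketch only.
    // Copy template.config from the release folder and edit that copy;
    // all parameter names below except algorithm are hypothetical.
    params {
        dataset_id      = 'A-mini'                 // hypothetical sample/dataset label
        output_dir      = '/path/to/output'        // hypothetical output location
        algorithm       = ['delly', 'manta']       // SV callers to run
        reference_fasta = '/path/to/reference.fa'  // hypothetical reference genome path
    }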

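Similarly, a minimal input YAML might look like the sketch below. The exact field names and nesting are assumptions; copy the input-sSV.yaml template from the release folder for the real structure described under Input YAML.

    # input.yaml -- illustrative sketch; field names are assumptions.
    ---
    input:
      BAM:
        tumor: /path/to/tumor.bam    # hypothetical tumor BAM path
        normal: /path/to/normal.bam  # hypothetical normal BAM path
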
To submit to UCLAHS-CDS's Azure cloud, use the submission script here with the command below:

python path/to/submit_nextflow_pipeline.py \
    --nextflow_script path/to/main.nf \
    --nextflow_config path/to/sample-specific.config \
    --nextflow_yaml path/to/input.yaml \
    --pipeline_run_name <sample_name> \
    --partition_type F16 \
    --email <your UCLA email, e.g. jdoe@ucla.edu>

In the above command, the partition type can be changed based on the size of the dataset. Currently, an F16 node is generally recommended for larger datasets such as A-full, and an F2 node for smaller datasets such as A-mini.

* Manta SV calling will not work on an F2 node due to incompatible resources. To test the pipeline on tasks not relevant to Manta, set algorithm = ['delly'] in the sample-specific config file, as shown below.
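
A sketch of the relevant line in your copy of the config (the algorithm parameter is named in this document; its surrounding context in template.config may differ):

    // In your sample-specific copy of template.config:
    // run only DELLY so the pipeline fits on an F2 node.
    algorithm = ['delly']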

Note: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the Docker Introduction on Confluence to set up a PAT for your GitHub account and log into the registry on the cluster before running this pipeline.
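
As an illustration, the registry login typically looks like the command below. The authoritative steps, including the required PAT scopes, are in the Confluence Docker Introduction; the username placeholder and the GITHUB_PAT variable here are assumptions.

    # Authenticate to the GitHub Container Registry with a PAT
    # (stored here in the hypothetical environment variable GITHUB_PAT):
    echo "$GITHUB_PAT" | docker login ghcr.io -u <github_username> --password-stdin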