How To Run
Requirements
Currently supported Nextflow versions: v23.04.2
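Because only one Nextflow version is listed as supported, it may be worth checking the installed version before launching a run. The following is a minimal sketch, not part of the pipeline: the helper name `is_supported_nextflow` is hypothetical, and it assumes that `nextflow -v` prints a line containing the word "version" followed by the version number.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the pipeline.
SUPPORTED_NEXTFLOW_VERSION="23.04.2"

# Succeeds if the given version string names the supported version,
# either exactly or followed by a build suffix (".NNNN" or " build NNNN").
is_supported_nextflow() {
    local version_string="$1"
    case "$version_string" in
        *"version ${SUPPORTED_NEXTFLOW_VERSION}") return 0 ;;
        *"version ${SUPPORTED_NEXTFLOW_VERSION}."*) return 0 ;;
        *"version ${SUPPORTED_NEXTFLOW_VERSION} "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query Nextflow if it is actually installed.
if command -v nextflow >/dev/null 2>&1; then
    if is_supported_nextflow "$(nextflow -v 2>/dev/null)"; then
        echo "Nextflow ${SUPPORTED_NEXTFLOW_VERSION} found"
    else
        echo "Warning: installed Nextflow is not ${SUPPORTED_NEXTFLOW_VERSION}" >&2
    fi
fi
```

The check only warns; whether to stop on a mismatch is left to the user.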
Run steps
Below is a summary of how to run the pipeline. See here for full instructions.
Pipelines should be run WITH A SINGLE SAMPLE AT A TIME. Otherwise, resource allocation and Nextflow errors could cause the pipeline to fail.
- The recommended way of running the pipeline is to directly use the source code located at /hot/software/pipeline/pipeline-call-gSV/Nextflow/release/ rather than cloning a copy of the pipeline.
  - The source code should never be modified when running our pipelines.
- Create a config file for input, output, and parameters. An example config file can be found here. See Nextflow Config File Parameters for a detailed description of each variable in the config file.
  - Do not directly modify the source template.config. Instead, copy it from the pipeline release folder to your project-specific folder and modify it there.
- Create the input YAML using the template. See Input YAML for a detailed description.
  - Again, do not directly modify the source template YAML file. Instead, copy it from the pipeline release folder to your project-specific folder and modify it there.
- The pipeline can be executed locally using the command below:
nextflow run path/to/main.nf -config path/to/sample-specific.config
- For example, path/to/main.nf could be: /hot/software/pipeline/pipeline-call-gSV/Nextflow/release/5.0.0-rc.1/main.nf
- path/to/sample-specific.config is the path to where you saved your project-specific copy of template.config
- path/to/input.yaml is the path to where you saved your sample-specific copy of call-gSV-input.yaml
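The copy-then-run steps above can be sketched as two small shell helpers. This is an illustration only: the function names `copy_template` and `run_single_sample`, the `NEXTFLOW_BIN` override, and every path passed to them are hypothetical placeholders. The local command above does not show how the input YAML is supplied, so the optional third argument, which passes it through Nextflow's `-params-file` option, is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the steps above; not part of the pipeline.

# Copy a template into a project folder under a new name. The source file is
# never modified, and an existing project-specific copy is never overwritten.
copy_template() {
    local template="$1" project_dir="$2" new_name="$3"
    local dest="${project_dir}/${new_name}"
    if [ ! -f "$template" ]; then
        echo "Template not found: $template" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "Refusing to overwrite existing file: $dest" >&2
        return 1
    fi
    # Print the destination path so callers can capture it.
    mkdir -p "$project_dir" && cp "$template" "$dest" && echo "$dest"
}

# Run the pipeline locally for exactly one sample.
# NEXTFLOW_BIN is an assumed override (useful for dry runs); default: nextflow.
run_single_sample() {
    local main_nf="$1" config="$2" input_yaml="${3:-}"
    if [ -n "$input_yaml" ]; then
        # Assumption: the input YAML is passed with -params-file.
        "${NEXTFLOW_BIN:-nextflow}" run "$main_nf" -config "$config" -params-file "$input_yaml"
    else
        "${NEXTFLOW_BIN:-nextflow}" run "$main_nf" -config "$config"
    fi
}

# Example usage (all paths are placeholders):
#   config=$(copy_template path/to/template.config path/to/project sample-specific.config)
#   run_single_sample path/to/main.nf "$config"
```

Setting `NEXTFLOW_BIN=echo` prints the arguments instead of launching Nextflow, which is a cheap way to review a command before running it.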
To submit to UCLAHS-CDS's Azure cloud, use the submission script here with the command below:
python path/to/submit_nextflow_pipeline.py \
--nextflow_script path/to/main.nf \
--nextflow_config path/to/sample-specific.config \
--nextflow_yaml path/to/input.yaml \
--pipeline_run_name <sample_name> \
--partition_type F16 \
--email <your UCLA email, jdoe@ucla.edu>
In the above command, the partition type can be changed based on the size of the dataset. An F16 node is generally recommended for larger datasets like A-full.
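Since the partition type is the one argument expected to vary with dataset size, the submission command can be wrapped so that it defaults to F16 and accepts an override. The sketch below is hypothetical and not part of the pipeline or the submission script: the function name `print_submit_command` is invented for illustration, it only prints the command for review rather than running it, and it assumes none of the arguments contain spaces.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the submission command above; not part of the pipeline.

# Print the submission command for one sample.
# Arguments: SCRIPT MAIN_NF CONFIG YAML RUN_NAME EMAIL [PARTITION]
# PARTITION defaults to F16, the value used in the command above.
print_submit_command() {
    if [ "$#" -lt 6 ]; then
        echo "usage: print_submit_command SCRIPT MAIN_NF CONFIG YAML RUN_NAME EMAIL [PARTITION]" >&2
        return 1
    fi
    local partition="${7:-F16}"
    printf 'python %s --nextflow_script %s --nextflow_config %s --nextflow_yaml %s --pipeline_run_name %s --partition_type %s --email %s\n' \
        "$1" "$2" "$3" "$4" "$5" "$partition" "$6"
}

# Example usage (all values are placeholders):
#   print_submit_command path/to/submit_nextflow_pipeline.py path/to/main.nf \
#       path/to/sample-specific.config path/to/input.yaml sample_name jdoe@ucla.edu
```

Printing first keeps the one-sample-per-run rule visible: each call produces exactly one submission command, which can be inspected and then pasted into the terminal.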
Note: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the Docker Introduction on Confluence to set up a PAT for your GitHub account and log into the registry on the cluster before running this pipeline.