How to Run:
Requirements
Currently supported Nextflow versions: v23.04.2
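Before launching a run, it can help to confirm that the Nextflow on your PATH matches the supported version. The following is a minimal sketch, not part of the pipeline: the helper names extract_version and is_supported_version are hypothetical, and it assumes the version appears as an X.Y.Z number in the output of nextflow -version.

```shell
#!/bin/sh
# Sketch: check the installed Nextflow against the supported version.
# The function names below are illustrative, not part of the pipeline.
SUPPORTED_NEXTFLOW_VERSION="23.04.2"

# Print the first X.Y.Z version number found in the given text.
extract_version() {
    printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeed only if the first version number in the text is the supported one.
is_supported_version() {
    [ "$(extract_version "$1")" = "$SUPPORTED_NEXTFLOW_VERSION" ]
}

# Only query Nextflow when it is actually installed.
if command -v nextflow >/dev/null 2>&1; then
    if is_supported_version "$(nextflow -version 2>&1)"; then
        echo "Nextflow version OK"
    else
        echo "Warning: expected Nextflow v${SUPPORTED_NEXTFLOW_VERSION}" >&2
    fi
fi
```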
Run steps
Below is a summary of how to run the pipeline. See here for full instructions.
Pipelines should be run WITH A SINGLE SAMPLE AT A TIME. Otherwise resource allocation and Nextflow errors could cause the pipeline to fail.
- The recommended way of running the pipeline is to directly use the source code located here: /hot/software/pipeline/pipeline-call-sSV/Nextflow/release/, rather than cloning a copy of the pipeline.
  - The source code should never be modified when running our pipelines.
- Create a config file for input, output, and parameters. An example config file can be found here. See Nextflow Config File Parameters for a detailed description of each variable in the config file.
  - Do not directly modify the source template.config. Instead, copy it from the pipeline release folder to your project-specific folder and modify it there.
- Create the input YAML using the template. See Input YAML for a detailed description of each column.
  - Again, do not directly modify the source template input YAML file. Instead, copy it from the pipeline release folder to your project-specific folder and modify it there.
- The pipeline can be executed locally using the command below:
  - YAML input
    nextflow run path/to/main.nf -config path/to/sample-specific.config -params-file path/to/input.yaml
  - For example:
    - path/to/main.nf could be: /hot/software/pipeline/pipeline-call-sSV/Nextflow/release/6.0.0-rc.1/main.nf
    - path/to/sample-specific.config is the path to where you saved your project-specific copy of template.config
    - path/to/input.yaml is the path to where you saved your sample-specific copy of input-sSV.yaml
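The copy-then-run steps above can be sketched as a pair of small shell helpers. This is an illustration under stated assumptions, not a script shipped with the pipeline: copy_template and build_run_command are hypothetical names, and the location of the template files inside the release folder is not specified here, so all paths are left as placeholders.

```shell
#!/bin/sh
# Sketch only: helper names and all paths are illustrative placeholders.

# Copy a template into the project folder unless a copy already exists,
# so the release folder is never modified and earlier edits are kept.
copy_template() {
    src="$1"
    dest="$2"
    if [ -e "$dest" ]; then
        echo "Keeping existing $dest"
    else
        mkdir -p "$(dirname "$dest")"
        cp "$src" "$dest"
        echo "Copied $src to $dest"
    fi
}

# Print the local Nextflow command for a single sample.
# Paths containing spaces would need extra quoting.
build_run_command() {
    main_nf="$1"
    config="$2"
    yaml="$3"
    printf 'nextflow run %s -config %s -params-file %s\n' \
        "$main_nf" "$config" "$yaml"
}

# Example usage (placeholder paths; edit the copies before running):
#   copy_template path/to/release/template.config path/to/sample-specific.config
#   copy_template path/to/release/input-sSV.yaml  path/to/input.yaml
#   build_run_command path/to/main.nf path/to/sample-specific.config path/to/input.yaml
```

Printing the command rather than executing it leaves room to review the edited config and YAML first, and keeps each invocation tied to a single sample.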
To submit to UCLAHS-CDS's Azure cloud, use the submission script here with the command below:
python path/to/submit_nextflow_pipeline.py \
--nextflow_script path/to/main.nf \
--nextflow_config path/to/sample-specific.config \
--nextflow_yaml path/to/input.yaml \
--pipeline_run_name <sample_name> \
--partition_type F16 \
--email <your UCLA email, jdoe@ucla.edu>
In the above command, the partition type can be changed based on the size of the dataset. Currently, an F16 node is generally recommended for larger datasets such as A-full, and an F2 node for smaller datasets such as A-mini.
* Manta SV calling does not work on an F2 node because the node's resources are incompatible. To test the pipeline on an F2 node for tasks that do not involve Manta, set algorithm = ['delly'] in the sample-specific config file.
Note: Because this pipeline uses an image stored in the GitHub Container Registry, you must follow the steps listed in the Docker Introduction on Confluence to set up a PAT for your GitHub account and log into the registry on the cluster before running this pipeline.