Chapter 3 Reproducing a Workflow/Pipeline

What is a Workflow/Pipeline?

Workflow refers to running different steps in a serial manner, where each step output can be input of another.

Traditionally people used to write simple workflows using shell scripting and make files.

But writing workflows in Bioinformatics can be quite challenging with those traditional methods. As sometime it involves lot of complex steps. For that purpose there are some dedicated languages available.

One such language we are going to discuss here.

3.1 A basic Nextflow guide

What are the basic things you need to a single step in a workflow?

  • Input
  • Output
  • Executable Commands
  • Environment (Ignore for now, will be get back to this in latter part)

Nextflow basic structure

process process_name {

    input:
    Single/Multiple Input

    output:
    Single/Multiple Output

    script:
    """
    whatever commands you like to execute.
    """ 
}

Each step in nextflow refereed as process.

Nextflow script written in a file called with extension .nf

Lets see a simple nextflow script to do quality check of FASTQ files.

File Name - fastqc.nf

params.forward = "./test_data/FASTQ/SRR1039508_1.fastq.gz"
params.reverse = "./test_data/FASTQ/SRR1039508_2.fastq.gz"

process fastqc {
	publishDir ".", mode: 'copy'

    input:
    params.forward
    params.reverse

    output:
    file("*") into fastqc_ch

    script:
    """
    mkdir -p fastqc_output
    fastqc -o fastqc_output -f fastq -q ${params.forward} ${params.reverse}
    """ 
}

Now lets extend it.

File Name - quality_check.nf