nf-core/proteinfamilies

Generation and updating of protein families

metagenomics, protein-families, proteomics

These pages are for an old version of the pipeline (1.3.0). The latest stable release is 2.2.0.

A known regression advisory with severity high has been issued for this version of the pipeline.
See the advisory entry for more information.

Launch version 1.3.0 https://github.com/nf-core/proteinfamilies

Input/output options

Define where the pipeline should find input data and save output data.

Path to a comma-separated file (.csv) containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
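The pattern above can be checked before a run is launched. The following is a minimal sketch in Python, assuming the pattern is applied to the whole path string; the helper name `is_valid_samplesheet_path` is hypothetical and not part of the pipeline.

```python
import re

# Pattern copied from the schema entry above: no whitespace, must end in ".csv".
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the whole path matches the samplesheet pattern."""
    # fullmatch rejects strings with trailing characters such as a newline.
    return SAMPLESHEET_PATTERN.fullmatch(path) is not None
```

Under this pattern, a path containing a space (for example `my samples.csv`) is rejected even though it ends in `.csv`.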

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Email address for completion summary. Example: name.surname@example.com

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
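One consequence of the email pattern above is that the final domain label must be 2 to 5 letters long, so addresses with longer top-level domains do not match. A small Python sketch illustrating this; the function name `matches_email_pattern` is hypothetical.

```python
import re

# Pattern copied from the schema entry above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def matches_email_pattern(address: str) -> bool:
    """Return True if the address matches the schema's email pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None
```

With this sketch, `name.surname@example.com` matches, while an address ending in a ten-letter top-level domain such as `.technology` does not.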

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Institutional config options

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Generic options

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

Email address for completion summary, only when pipeline fails. Example: name.surname@example.com

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails. Example: 25.MB

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
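To make the size pattern concrete, the sketch below parses strings such as the default `25.MB` into a byte count. It is an illustration only: the helper `parse_size` is hypothetical, the regular expression is the schema pattern with capture groups added, and it assumes binary multiples (1 KB = 1024 bytes).

```python
import re

# Same shape as the schema pattern above, with named groups added for parsing.
SIZE_PATTERN = re.compile(r"^(?P<num>\d+(?:\.\d+)?)\.?\s*(?P<prefix>K|M|G|T)?B$")

# Assumption: binary multiples (1 KB = 1024 bytes).
MULTIPLIERS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as '25.MB' into a number of bytes."""
    match = SIZE_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid size string: {text!r}")
    number = float(match.group("num"))
    return int(number * MULTIPLIERS[match.group("prefix")])
```

Strings that fall outside the pattern, such as `25MiB`, raise a `ValueError` in this sketch.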

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. File name must also be set in the MultiQC config file

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/proteinfamilies/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string
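The default suffix format `yyyy-MM-dd_HH-mm-ss` uses Java-style date pattern letters. For readers who want to reproduce the same suffix in a script, an equivalent in Python could look like the sketch below; the function name `default_trace_suffix` is hypothetical.

```python
from datetime import datetime

# strftime equivalent of the Java-style pattern yyyy-MM-dd_HH-mm-ss.
TRACE_SUFFIX_FORMAT = "%Y-%m-%d_%H-%M-%S"


def default_trace_suffix(now=None):
    """Format a timestamp the way the default trace report suffix is described."""
    now = now or datetime.now()
    return now.strftime(TRACE_SUFFIX_FORMAT)
```

For example, 5 March 2024 at 14:07:09 is rendered as `2024-03-05_14-07-09`.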

Clustering parameters

Use these parameters to control the flow of the clustering subworkflow execution.

Save the db output folder of mmseqs createdb

type: boolean

Choose clustering algorithm. Either simple ‘cluster’ for medium size inputs, or ‘linclust’ for less sensitive clustering of larger datasets.

type: string

mmseqs parameter for minimum sequence identity

type: number

default: 0.5

mmseqs parameter for minimum sequence coverage ratio

type: number

default: 0.9

mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence

type: integer
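The three coverage modes differ only in which sequence the coverage ratio is measured against. The following Python sketch illustrates the idea under a simplifying assumption: a single aligned length is divided by the query and target lengths. mmseqs itself computes coverage from the alignment, so this is not its implementation, and the function name `passes_coverage` is hypothetical.

```python
def passes_coverage(aligned_len, query_len, target_len, min_cov=0.9, cov_mode=0):
    """Check a coverage ratio under the three modes listed above.

    Simplified illustration: one aligned length is used for both sequences.
    """
    query_cov = aligned_len / query_len
    target_cov = aligned_len / target_len
    if cov_mode == 0:  # both query and target must be covered
        return query_cov >= min_cov and target_cov >= min_cov
    if cov_mode == 1:  # only the target must be covered
        return target_cov >= min_cov
    if cov_mode == 2:  # only the query must be covered
        return query_cov >= min_cov
    raise ValueError(f"unsupported coverage mode: {cov_mode}")
```

A 100-residue query aligned over 90 residues to a 200-residue target passes in mode 2 (query coverage 0.9) but fails in modes 0 and 1 (target coverage 0.45).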

Save the clustering output folder of mmseqs cluster or linclust

type: boolean

Minimum size a cluster chunk must reach for a seed Multiple Sequence Alignment to be created from it.

type: integer

default: 25
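As an illustration of a size threshold of this kind, the sketch below groups (representative, member) pairs into clusters and keeps only those that reach the minimum size. It assumes the threshold is inclusive and that clusters arrive as two-column pairs; both are assumptions, and `filter_clusters` is a hypothetical helper rather than pipeline code.

```python
from collections import defaultdict


def filter_clusters(pairs, min_size=25):
    """Group (representative, member) pairs and keep clusters of at least min_size."""
    clusters = defaultdict(list)
    for representative, member in pairs:
        clusters[representative].append(member)
    # Assumption: a cluster with exactly min_size members is kept.
    return {
        rep: members for rep, members in clusters.items() if len(members) >= min_size
    }
```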

Save membership-filtered initial mmseqs clusters in fasta format

type: boolean

Alignment parameters

Use these parameters to control the Multiple Sequence Alignment subworkflow execution.

Choose alignment tool. FAMSA is recommended as the option with the best combination of time, memory, and accuracy.

type: string

Whether to trim gaps from the Multiple Sequence Alignment (MSA).

hidden

type: boolean

default: true

Choose the output format of the clipped alignment.

type: string

default: clipkit

Choose if ClipKIT should only clip gaps at the ends of the MSAs.

type: boolean

default: true

Multiple Sequence Alignment (MSA) positions with gappiness greater than this threshold will be trimmed

type: number

default: 0.5
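The gappiness threshold and the ends-only option can be illustrated together. The Python sketch below computes the fraction of gaps per alignment column and drops columns above the threshold, either everywhere or only in the leading and trailing runs. It is a simplified illustration of the idea, not ClipKIT's implementation, and the function names are hypothetical.

```python
def column_gappiness(alignment):
    """Return the fraction of gap characters ('-') in each alignment column."""
    n_seqs = len(alignment)
    return [
        sum(seq[i] == "-" for seq in alignment) / n_seqs
        for i in range(len(alignment[0]))
    ]


def trim_gappy_columns(alignment, threshold=0.5, ends_only=True):
    """Drop columns whose gappiness is greater than the threshold."""
    keep = [g <= threshold for g in column_gappiness(alignment)]
    if ends_only:
        # Keep everything between the first and last retained column,
        # so only the gappy runs at both ends are removed.
        first = next((i for i, k in enumerate(keep) if k), len(keep))
        last = next((i for i in range(len(keep) - 1, -1, -1) if keep[i]), -1)
        keep = [first <= i <= last for i in range(len(keep))]
    return ["".join(c for c, k in zip(seq, keep) if k) for seq in alignment]
```

With `ends_only=True`, gappy columns inside the alignment are preserved; with `ends_only=False`, every column above the threshold is removed.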

Set to true to recruit additional sequences from the input FASTA file using the family Hidden Markov Models (HMMs) to refine the alignments

hidden

type: boolean

default: true

Whether to generate the target results file of hmmsearch.

hidden

type: boolean

Whether to generate the domain results file of hmmsearch.

hidden

type: boolean

default: true

hmmsearch e-value cutoff threshold for reported results

type: number

default: 0.001

Save the output of hmmsearch (.domtbl.gz and .tbl.gz)

type: boolean

hmmsearch minimum length filter: the ratio of hit envelope (env) length to query length, expressed as a fraction.

type: number

default: 0.9
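The e-value cutoff and the length filter can be read as two conditions a hit must satisfy. The sketch below is an assumption-laden illustration: it takes the envelope length as `env_to - env_from + 1` and compares it with the query length. How the pipeline computes the ratio is not stated here, and `DomainHit` and `keep_hit` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    target: str
    env_from: int  # envelope start, 1-based inclusive
    env_to: int  # envelope end, inclusive
    query_len: int  # length of the query
    evalue: float


def keep_hit(hit, max_evalue=0.001, min_length_ratio=0.9):
    """Keep a hit only if it passes both the e-value and the length filter."""
    env_len = hit.env_to - hit.env_from + 1
    return hit.evalue <= max_evalue and env_len / hit.query_len >= min_length_ratio
```

A hit whose envelope spans 95 of 100 query positions with an e-value of 1e-10 is kept under the defaults, while one spanning only 50 positions is dropped.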

Save family fasta files after recruiting sequences with hmmsearch

type: boolean

Redundancy removal parameters

Use these parameters to control the redundancy removal subworkflow execution.

Removal of between-family redundancy via hmmsearch.

hidden

type: boolean

default: true

hmmsearch minimum length filter for redundant family removal: the ratio of hit envelope (env) length to query length, expressed as a fraction.

type: number

default: 0.9

Save only the fasta files of non-redundant families (might still contain redundant sequences)

type: boolean

Removal of inside-family redundancy of sequences via mmseqs clustering.

hidden

type: boolean

default: true

mmseqs parameter for minimum sequence identity

type: number

default: 0.9

mmseqs parameter for minimum sequence coverage ratio

type: number

default: 0.9

mmseqs parameter for coverage mode: 0 for both, 1 for target and 2 for query sequence

type: integer

Save the final family fasta files with sequence redundancy removed

type: boolean
