nf-core/proteinfamilies
Generation and updating of protein families
This documentation is for version 1.3.0. The latest stable release is 2.0.0.
See the advisory entry for more information.
Define where the pipeline should find input data and save output data.
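The samplesheet path and the completion-summary email address in this section must match the regular expressions listed with them. The pipeline itself checks parameters against its JSON schema at runtime. The Python sketch below only reproduces the two patterns with the standard `re` module, to show what they accept. It assumes Python's regex semantics agree with the schema validator for these simple patterns, and the helper names are hypothetical.

```python
import re

# Patterns copied from the input path and email parameters in this section.
INPUT_CSV_PATTERN = re.compile(r"^\S+\.csv$")
EMAIL_PATTERN = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def is_valid_input_path(path: str) -> bool:
    """Hypothetical helper: True if the path has no whitespace and ends in '.csv'."""
    return INPUT_CSV_PATTERN.fullmatch(path) is not None


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if the address matches the documented email pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


for candidate in ["samplesheet.csv", "my samples.csv", "samplesheet.tsv"]:
    print(candidate, is_valid_input_path(candidate))  # True, False, False
print(is_valid_email("name.surname@example.com"))  # True
```

Note that a path containing spaces is rejected by the `\S+` part of the pattern, not only a wrong extension.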
Path to comma-separated file ‘.csv’ containing information about the samples in the experiment.
Type: string; Pattern: ^\S+\.csv$

The output directory where the results will be saved. Absolute paths are required when storing results on cloud infrastructure.
Type: string

Email address for the completion summary. Example: name.surname@example.com
Type: string; Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

MultiQC report title. Printed as the page header and used for the filename if not otherwise specified.
Type: string

Parameters used to describe centralised config profiles. These should not be edited.
Git commit id for institutional configs.
Type: string; Default: master

Base directory for institutional configs.
Type: string; Default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.
Type: string

Institutional config description.
Type: string

Institutional config contact information.
Type: string

Institutional config URL link.
Type: string

Less common options for the pipeline, typically set in a config file.
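The MultiQC attachment size limit in this section is a string such as the default `25.MB`, constrained by the pattern `^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$`. Below is a minimal Python sketch that validates such a string and converts it to bytes. The 1024-based multipliers are an assumption (they follow the convention of Nextflow's `MemoryUnit`), and `size_to_bytes` is a hypothetical helper, not part of the pipeline.

```python
import re

# Pattern documented for the MultiQC email attachment size limit.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
NUMBER_PREFIX = re.compile(r"\d+(\.\d+)?")

# Assumption: 1024-based multipliers (Nextflow MemoryUnit convention).
MULTIPLIERS = {None: 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(value: str) -> int:
    """Validate a size string such as '25.MB' or '1.5 GB' and return its size in bytes."""
    match = SIZE_PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid size string: {value!r}")
    number = float(NUMBER_PREFIX.match(value).group(0))  # leading number, e.g. '25' or '1.5'
    unit = match.group(2)  # 'K', 'M', 'G', 'T', or None for plain bytes
    return int(number * MULTIPLIERS[unit])


print(size_to_bytes("25.MB"))  # 26214400
```

The pattern is case-sensitive, so a value such as `25 mb` is rejected rather than silently interpreted.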
Display version and exit.
Type: boolean

Method used to save pipeline results to the output directory.
Type: string

Email address for the completion summary, sent only when the pipeline fails. Example: name.surname@example.com
Type: string; Pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send a plain-text email instead of HTML.
Type: boolean

File size limit when attaching MultiQC reports to summary emails.
Type: string; Default: 25.MB; Pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Do not use coloured log outputs.
Type: boolean

Incoming hook URL for a messaging service.
Type: string

Custom config file to supply to MultiQC.
Type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.
Type: string

Custom MultiQC YAML file containing HTML, including a methods description.
Type: string

Whether to validate parameters against the schema at runtime.
Type: boolean; Default: true

Base URL or local path to the location of the pipeline test dataset files.
Type: string; Default: https://raw.githubusercontent.com/nf-core/test-datasets/proteinfamilies/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.
Type: string

Use these parameters to control the flow of the clustering subworkflow execution.
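The clustering step is controlled by the algorithm choice (`cluster` or `linclust`), the minimum sequence identity, the coverage ratio and the coverage mode, followed by a minimum cluster size before a seed MSA is built. The sketch below assembles an MMseqs2 command line from those values and filters a two-column `representative<TAB>member` membership table (the layout written by `mmseqs createtsv`) by cluster size. It assumes the size threshold is inclusive and that membership is read from such a table; the function names and toy input are hypothetical, and this is not the pipeline's own module.

```python
from collections import defaultdict


def mmseqs_cluster_args(algorithm, seq_db, cluster_db, tmp_dir,
                        min_seq_id=0.5, coverage=0.9, cov_mode=0):
    """Build an mmseqs2 clustering command as an argument list (not executed here).

    cov_mode defaults to 0, which is MMseqs2's own default.
    """
    if algorithm not in ("cluster", "linclust"):
        raise ValueError("algorithm must be 'cluster' or 'linclust'")
    if cov_mode not in (0, 1, 2):
        raise ValueError("cov_mode must be 0 (both), 1 (target) or 2 (query)")
    return ["mmseqs", algorithm, seq_db, cluster_db, tmp_dir,
            "--min-seq-id", str(min_seq_id),
            "-c", str(coverage),
            "--cov-mode", str(cov_mode)]


def clusters_for_msa(tsv_lines, min_size=25):
    """Group 'representative<TAB>member' lines; keep clusters with at least min_size members."""
    clusters = defaultdict(list)
    for line in tsv_lines:
        line = line.strip()
        if not line:
            continue
        representative, member = line.split("\t")
        clusters[representative].append(member)
    return {rep: members for rep, members in clusters.items() if len(members) >= min_size}


print(" ".join(mmseqs_cluster_args("linclust", "seqDB", "clusterDB", "tmp")))
# mmseqs linclust seqDB clusterDB tmp --min-seq-id 0.5 -c 0.9 --cov-mode 0
toy_tsv = ["r1\tr1", "r1\ts2", "r1\ts3", "r4\tr4"]
print(clusters_for_msa(toy_tsv, min_size=3))  # {'r1': ['r1', 's2', 's3']}
```

`mmseqs createtsv` lists each representative as a member of its own cluster, so the count above includes the representative.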
Save the database output folder of mmseqs createdb.
Type: boolean

Clustering algorithm: either ‘cluster’ for medium-sized inputs, or ‘linclust’ for less sensitive clustering of larger datasets.
Type: string

mmseqs parameter for minimum sequence identity.
Type: number; Default: 0.5

mmseqs parameter for minimum sequence coverage ratio.
Type: number; Default: 0.9

mmseqs parameter for coverage mode: 0 for both query and target, 1 for target only, 2 for query only.
Type: integer

Save the clustering output folder of mmseqs cluster or linclust.
Type: boolean

Minimum cluster (chunk) size required to build a seed Multiple Sequence Alignment from a cluster.
Type: integer; Default: 25

Save the membership-filtered initial mmseqs clusters in FASTA format.
Type: boolean

Use these parameters to control the Multiple Sequence Alignment subworkflow execution.
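Gap trimming in this subworkflow is done with ClipKIT, using a gappiness threshold (default 0.5) and an option to clip only at the alignment ends. To illustrate what those two settings mean, here is a plain-Python sketch of threshold-based column trimming. It is not ClipKIT's implementation: treating `-` and `.` as gap characters, the strict "greater than" comparison (taken from the parameter description), and the function names are all assumptions.

```python
def column_gappiness(msa, gap_chars="-."):
    """Fraction of gap characters in each column of an alignment (list of equal-length strings)."""
    if len({len(seq) for seq in msa}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_seqs = len(msa)
    return [sum(seq[i] in gap_chars for seq in msa) / n_seqs for i in range(len(msa[0]))]


def trim_gappy_columns(msa, threshold=0.5, ends_only=True, gap_chars="-."):
    """Drop columns whose gappiness is greater than threshold.

    With ends_only=True, only gappy columns at the start and end of the alignment
    are removed; gappy columns in the interior are kept.
    """
    gappiness = column_gappiness(msa, gap_chars)
    n_cols = len(gappiness)
    if ends_only:
        keep = [True] * n_cols
        start = 0
        while start < n_cols and gappiness[start] > threshold:
            keep[start] = False
            start += 1
        end = n_cols - 1
        while end >= start and gappiness[end] > threshold:
            keep[end] = False
            end -= 1
    else:
        keep = [g <= threshold for g in gappiness]
    return ["".join(ch for ch, k in zip(seq, keep) if k) for seq in msa]


msa = ["--AC-GT", "--ACTGT", "A-AC-G-"]
print(trim_gappy_columns(msa))                   # ['AC-GT', 'ACTGT', 'AC-G-']
print(trim_gappy_columns(msa, ends_only=False))  # ['ACGT', 'ACGT', 'ACG-']
```

In the toy alignment, column 5 is two-thirds gaps: ends-only trimming keeps it because it lies inside the alignment, while full trimming removes it.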
Alignment tool to use. FAMSA is recommended as offering the best combination of runtime, memory usage and accuracy.
Type: string

Whether to trim gaps from the Multiple Sequence Alignment (MSA).
Type: boolean; Default: true

Output format of the clipped alignment.
Type: string; Default: clipkit

Whether ClipKIT should only clip gaps at the ends of the MSAs.
Type: boolean; Default: true

MSA positions with gappiness greater than this threshold will be trimmed.
Type: number; Default: 0.5

Set to true to recruit additional sequences from the input FASTA file using the family Hidden Markov Models (HMMs) to refine the alignments.
Type: boolean; Default: true

Whether to generate the target results file of hmmsearch.
Type: boolean

Whether to generate the domain results file of hmmsearch.
Type: boolean; Default: true

hmmsearch e-value cutoff threshold for reported results.
Type: number; Default: 0.001

Save the output of hmmsearch (.domtbl.gz and .tbl.gz).
Type: boolean

hmmsearch minimum length filter: the minimum ratio of the hit envelope (env) length to the query length.
Type: number; Default: 0.9

Save family FASTA files after recruiting sequences with hmmsearch.
Type: boolean

Use these parameters to control the redundancy removal subworkflow execution.
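Both the sequence-recruitment step and the between-family redundancy removal filter hmmsearch hits by how much of the query the hit envelope spans (default 0.9). The sketch below reads the envelope coordinates and the query length from HMMER's `--domtblout` table (the `.domtbl.gz` output, once decompressed) and keeps hits whose envelope length is at least that fraction of the query (model) length. The exact rule the pipeline applies, including whether the comparison is inclusive, is an assumption; the helper names and the two toy lines are made up for illustration.

```python
def parse_domtblout_line(line):
    """Pick the fields used here from one hmmsearch --domtblout data line (whitespace-separated)."""
    fields = line.split()
    return {
        "target": fields[0],          # target sequence name
        "query": fields[3],           # query profile HMM name
        "qlen": int(fields[5]),       # query (model) length
        "env_from": int(fields[19]),  # envelope start on the target
        "env_to": int(fields[20]),    # envelope end on the target
    }


def passes_length_filter(hit, min_ratio=0.9):
    """Assumed rule: keep hits whose envelope length is at least min_ratio of the query length."""
    env_len = hit["env_to"] - hit["env_from"] + 1
    return env_len / hit["qlen"] >= min_ratio


def filter_domtblout(lines, min_ratio=0.9):
    """Return parsed hits that pass the length filter, skipping comment and blank lines."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        hit = parse_domtblout_line(line)
        if passes_length_filter(hit, min_ratio):
            kept.append(hit)
    return kept


# Toy lines in domtblout column order (values invented for illustration).
toy_lines = [
    "# target name  accession  tlen  query name ...",
    "seqA - 120 famX - 100 1e-30 105.2 0.1 1 1 1e-32 1e-30 104.9 0.1 1 100 5 100 3 102 0.98 -",
    "seqB - 300 famX - 100 1e-05 20.1 0.2 1 1 2e-07 1e-05 19.8 0.2 40 90 12 60 10 62 0.90 -",
]
print([hit["target"] for hit in filter_domtblout(toy_lines)])  # ['seqA']
```

Only the first 21 columns are read, so free-text target descriptions at the end of each line do not affect parsing.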
Whether to remove between-family redundancy via hmmsearch.
Type: boolean; Default: true

hmmsearch minimum length filter for redundant family removal: the minimum ratio of the hit envelope (env) length to the query length.
Type: number; Default: 0.9

Save only the FASTA files of non-redundant families (these might still contain redundant sequences).
Type: boolean

Whether to remove within-family sequence redundancy via mmseqs clustering.
Type: boolean; Default: true

mmseqs parameter for minimum sequence identity.
Type: number; Default: 0.9

mmseqs parameter for minimum sequence coverage ratio.
Type: number; Default: 0.9

mmseqs parameter for coverage mode: 0 for both query and target, 1 for target only, 2 for query only.
Type: integer

Save the final family FASTA files with sequence redundancy removed.
Type: boolean
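Within-family redundancy removal uses the same kind of mmseqs thresholds as the clustering step: minimum identity, coverage ratio and coverage mode. To make the three coverage modes concrete, here is a simplified Python sketch of a coverage check on a single alignment. It approximates MMseqs2's notion of coverage (aligned span divided by full sequence length) and is not the pipeline's or MMseqs2's code; `coverage_ok` is a hypothetical name.

```python
def coverage_ok(q_start, q_end, q_len, t_start, t_end, t_len, min_cov=0.9, cov_mode=0):
    """Simplified mmseqs-style coverage check for one alignment.

    Coordinates are 1-based and inclusive. Coverage is the aligned span divided by
    the full sequence length. cov_mode 0 requires both query and target coverage,
    1 only target coverage, and 2 only query coverage.
    """
    q_cov = (q_end - q_start + 1) / q_len
    t_cov = (t_end - t_start + 1) / t_len
    if cov_mode == 0:
        return q_cov >= min_cov and t_cov >= min_cov
    if cov_mode == 1:
        return t_cov >= min_cov
    if cov_mode == 2:
        return q_cov >= min_cov
    raise ValueError("cov_mode must be 0, 1 or 2")


# A 50-residue fragment aligned end to end inside a 200-residue sequence.
for mode in (0, 1, 2):
    print(mode, coverage_ok(1, 50, 50, 20, 69, 200, min_cov=0.9, cov_mode=mode))
# 0 False
# 1 False
# 2 True
```

Under mode 2 a short fragment contained in a longer family member passes the coverage criterion, whereas mode 0 rejects it, so the chosen mode decides whether such fragments can be grouped with longer sequences.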