Commands
These are the command-line commands of Annif, with their REST API equivalents where applicable.
Most of these commands take a projectid parameter. Project IDs are alphanumeric strings that may also contain underscores and hyphens (A-Za-z0-9_-).
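The character set above can be checked before a project ID is passed to a command. The following is a minimal sketch, not part of Annif: the helper name `is_valid_project_id` is hypothetical, and it assumes that an ID must contain at least one character.

```python
import re

# Character set quoted in the text above: A-Z, a-z, 0-9, "_" and "-".
# Requiring at least one character is an assumption of this sketch.
PROJECT_ID_RE = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_project_id(project_id: str) -> bool:
    """Return True if project_id consists only of the allowed characters."""
    return PROJECT_ID_RE.fullmatch(project_id) is not None
```

For example, `is_valid_project_id("my-project_1")` is accepted, while an ID containing a space is rejected.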
annif loadvoc <projectid> <subjectfile>
Parameters:
- subjectfile: path to a file containing subjects in a subject vocabulary format
This loads the vocabulary to be used in subject indexing. Although projectid is a parameter of the command, the vocabulary is shared by all projects that have the same vocab identifier in the project configuration, so it only needs to be loaded for one of those projects. If a vocabulary has already been loaded, invoking loadvoc again with a new subject file updates Annif's internal vocabulary: labels are updated and any subject not appearing in the new subject file is removed. New subjects won't be suggested until the project is retrained with the updated vocabulary.
REST equivalent: N/A
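To make the update rules concrete, here is a small sketch that compares an old and a new vocabulary and reports what a reload would change. It is an illustration only, not Annif's implementation: the function name `diff_vocabulary` and the representation of a vocabulary as a dict from subject identifier to label are assumptions.

```python
def diff_vocabulary(old: dict, new: dict):
    """Compare two vocabularies given as {subject_id: label} dicts.

    Returns (relabeled, removed, added):
      relabeled: subjects present in both whose label changed
      removed:   subjects missing from the new file (dropped on reload)
      added:     subjects only in the new file (not suggested until retraining)
    """
    relabeled = {sid: new[sid] for sid in old if sid in new and old[sid] != new[sid]}
    removed = sorted(sid for sid in old if sid not in new)
    added = sorted(sid for sid in new if sid not in old)
    return relabeled, removed, added
```

A non-empty `added` list is a hint that the project should be retrained before those subjects can appear in suggestions.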
annif list-projects
REST equivalent: GET /projects/
Show a list of currently defined projects. Projects are defined in a configuration file, normally called projects.cfg. See Project configuration for details.
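The REST equivalents for listing and showing projects could be called from Python roughly as follows. This sketch only builds the request objects and does not send them. The helper name `projects_request` is hypothetical, and the base URL is left as a parameter because this page does not say whether the API is mounted under a path prefix.

```python
from typing import Optional
from urllib.request import Request


def projects_request(base_url: str, project_id: Optional[str] = None) -> Request:
    """Build (but do not send) a GET request for the project list or one project."""
    root = base_url.rstrip("/")
    # Without a project ID this targets GET /projects/,
    # otherwise GET /projects/<projectid>.
    path = "/projects/" if project_id is None else f"/projects/{project_id}"
    return Request(root + path, method="GET")
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server, for example the development server started by `annif run`.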
annif show-project <projectid>
REST equivalent: GET /projects/<projectid>
annif clear <projectid>
Reset a project to its original, untrained state by removing the data files of the model.
REST equivalent: N/A
annif train <projectid> <path> [<path2> ...] [--projects FILE] [--backend-param BACKEND.PARAM=VAL]
or
annif train <projectid> --cached [--projects FILE] [--backend-param BACKEND.PARAM=VAL]
Parameters:
- path: path(s) to a directory containing text files in the corpus format, or a TSV file (possibly gzipped)
- projects: Set path to config file
- backend-param: Override a backend parameter of the config file
- cached: If set, reuse preprocessed training data from the previous run. See Reusing preprocessed training data
This will train the project using all the documents from the given directory or TSV file in a single batch operation.
REST equivalent: N/A
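When training is scripted, the two invocation forms shown above can be assembled as an argument list. The sketch below is a hypothetical helper, not part of Annif; it assumes, based on the two synopsis forms, that corpus paths and --cached are alternatives rather than being combined.

```python
def build_train_command(project_id, paths=(), cached=False,
                        projects_file=None, backend_params=None):
    """Return the argv list for `annif train`, following the synopsis above."""
    # Assumption from the two synopsis forms: either paths or --cached, never both.
    if cached == bool(paths):
        raise ValueError("give either corpus paths or cached=True")
    argv = ["annif", "train", project_id]
    if cached:
        argv.append("--cached")
    else:
        argv.extend(paths)
    if projects_file is not None:
        argv += ["--projects", projects_file]
    # Each override is passed as BACKEND.PARAM=VAL.
    for name, value in (backend_params or {}).items():
        argv += ["--backend-param", f"{name}={value}"]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`. Building a list rather than a shell string avoids quoting problems with paths that contain spaces.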
annif learn <projectid> <path> [<path2> ...] [--projects FILE] [--backend-param BACKEND.PARAM=VAL]
Parameters:
- path: path(s) to a directory containing text files in the corpus format, or a TSV file (possibly gzipped)
- projects: Set path to config file
- backend-param: Override a backend parameter of the config file
This will continue training an already trained project using all the documents from the given directory or TSV file in a single batch operation. Not supported by all backends.
REST equivalent: POST /projects/<projectid>/learn
annif suggest <projectid> [--limit MAX] [--threshold THRESHOLD] [--projects FILE] [--backend-param BACKEND.PARAM=VAL] <document.txt
This will read a text document from standard input and suggest subjects for it.
Parameters:
- limit: maximum number of subjects to return
- threshold: minimum score threshold, below which results will not be returned
- projects: Set path to projects.cfg
- backend-param: Override a backend parameter of the config file
REST equivalent: POST /projects/<projectid>/suggest
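The combined effect of limit and threshold can be pictured as a filter over scored suggestions. This is a minimal sketch of the semantics described above, not Annif's actual code: the function name and the `(subject, score)` tuple format are assumptions, as is keeping a result whose score equals the threshold exactly.

```python
def filter_suggestions(scored, limit, threshold):
    """Keep at most `limit` suggestions whose score is at least `threshold`.

    `scored` is a list of (subject, score) tuples; the result is sorted
    by descending score.
    """
    # Drop results that fall below the minimum score.
    kept = [(subject, score) for subject, score in scored if score >= threshold]
    # Highest-scoring suggestions first, then cut to the maximum count.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]
```

A low threshold with a small limit returns only the few top-ranked subjects, while a high threshold with a large limit returns every subject the model is confident about.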
annif eval <projectid> [--limit MAX] [--threshold THRESHOLD] [--projects FILE] <path> [<path2> ...]
You need to supply the documents in one of the supported Document corpus formats, i.e. either as a directory or as a TSV file. It is possible to give multiple corpora (even mixing corpus formats), in which case they will all be processed in the same run.
The output is a list of statistical measures.
Parameters:
- limit: maximum number of subjects to return
- threshold: minimum score threshold, below which results will not be returned
- projects: Set path to projects.cfg
- path: path(s) to a directory containing text files in the corpus format or a TSV file (possibly gzipped)
REST equivalent: N/A
annif optimize <projectid> <path> [--projects FILE] [--backend-param BACKEND.PARAM=VAL] [<path2> ...]
As with eval, you need to supply the documents in one of the supported Document corpus formats.
This command will read each document, assign subjects to it using different limit and threshold values, and compare the results with the gold standard subjects.
The output is a list of parameter combinations and their scores. From the output, you can determine the optimum limit and threshold parameters depending on which measure you want to target.
Parameters:
- path: path(s) to a directory containing text files in the corpus format or a TSV file (possibly gzipped)
- projects: Set path to projects.cfg
- backend-param: Override a backend parameter of the config file
REST equivalent: N/A
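The search that optimize performs can be sketched as a grid over limit and threshold values, scoring each combination against the gold standard. The code below is a simplified illustration under stated assumptions: it uses the F1 score as the only measure, averages it over documents, and expects a non-empty list of documents as `(scored_suggestions, gold_subjects)` pairs. None of the names come from Annif.

```python
from itertools import product


def f1_score(suggested: set, gold: set) -> float:
    """F1 of a suggested subject set against the gold standard set."""
    true_positives = len(suggested & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(suggested)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def optimize(documents, limits, thresholds):
    """Return a list of (limit, threshold, mean F1) over all combinations.

    `documents` is a non-empty list of (scored, gold) pairs, where `scored`
    is a list of (subject, score) tuples and `gold` is a set of subjects.
    """
    results = []
    for limit, threshold in product(limits, thresholds):
        scores = []
        for scored, gold in documents:
            ranked = sorted(scored, key=lambda item: item[1], reverse=True)
            chosen = {subj for subj, score in ranked[:limit] if score >= threshold}
            scores.append(f1_score(chosen, gold))
        results.append((limit, threshold, sum(scores) / len(scores)))
    return results
```

Picking the row with the highest score, for example with `max(results, key=lambda row: row[2])`, gives the combination that is best for this one measure; a different measure may favour different values.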
annif run
This will start a development web server on http://localhost:5000/.
REST equivalent: N/A