Most of the parameters specific to the recoll GUI are set through the Preferences menu and stored in the standard Qt place ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit this by hand.
Recoll indexing options are set inside text configuration files located in a configuration directory. There can be several such directories, each of which defines the parameters for one index.
The configuration files can be edited by hand or through the Index configuration dialog (Preferences menu). The GUI tool will try to respect your formatting and comments as much as possible, so it is quite possible to use both ways.
The most accurate documentation for the configuration parameters is given by comments inside the default files, and we will just give a general overview here.
By default, for each index, there are two sets of configuration files. System-wide configuration files are kept in a directory named like /usr/[local/]share/recoll/examples, and define default values, shared by all indexes. For each index, a parallel set of files defines the customized parameters.
In addition (as of Recoll version 1.19.7), it is possible to specify two additional configuration directories which will be stacked before and after the user configuration directory. These are defined by the RECOLL_CONFTOP and RECOLL_CONFMID environment variables. Values from configuration files inside the top directory will override user ones, and values from configuration files inside the middle directory will override system ones and be overridden by user ones. These two variables may be of use to applications which augment Recoll functionality and need to add configuration data without disturbing the user's files. Please note that the two values, currently single directories, will probably be interpreted as colon-separated lists in the future: do not use colon characters inside the directory paths.
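The precedence rules above can be pictured as a lookup through a stack of directories. The following Python sketch is only an illustration. It assumes each directory's files have already been parsed into a plain dict. The helper names `config_stack` and `effective_config` are hypothetical and are not part of Recoll.

```python
from typing import Dict, List, Optional

def config_stack(system: str, user: str,
                 top: Optional[str] = None,
                 mid: Optional[str] = None) -> List[str]:
    """Return configuration directories, highest priority first.

    Order taken from the text: TOP > user > MID > system.
    """
    stack = []
    if top:
        stack.append(top)
    stack.append(user)
    if mid:
        stack.append(mid)
    stack.append(system)
    return stack

def effective_config(stack: List[str],
                     values: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge per-directory parameter dicts; earlier directories win."""
    merged: Dict[str, str] = {}
    # Apply from lowest to highest priority so later updates override.
    for confdir in reversed(stack):
        merged.update(values.get(confdir, {}))
    return merged
```

In practice, `top` and `mid` would come from the RECOLL_CONFTOP and RECOLL_CONFMID environment variables. When neither is set, only the user and system directories take part.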
The default location of the configuration is the .recoll directory in your home. Most people will only use this directory.
This location can be changed, or others can be added, with the RECOLL_CONFDIR environment variable or the -c option parameter to recoll and recollindex.
If the .recoll directory does not exist when recoll or recollindex are started, it will be created with a set of empty configuration files. recoll will give you a chance to edit the configuration file before starting indexing. recollindex will proceed immediately. To avoid mistakes, the automatic directory creation will only occur for the default location, not if -c or RECOLL_CONFDIR were used (in these cases, you will have to create the directory).
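The directory selection and creation rules can be summarized in a short Python sketch. It rests on a few assumptions:

- The -c option is assumed to take precedence over RECOLL_CONFDIR; the text does not state the order.
- The function names are hypothetical.
- Writing the empty configuration files is left out.

```python
import os
from typing import Mapping, Optional, Tuple

def resolve_confdir(cli_dir: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Tuple[str, bool]:
    """Return (configuration directory, automatic creation allowed)."""
    if env is None:
        env = os.environ
    if cli_dir:                               # -c option (assumed to win)
        return os.path.expanduser(cli_dir), False
    if env.get("RECOLL_CONFDIR"):             # environment variable
        return os.path.expanduser(env["RECOLL_CONFDIR"]), False
    home = env.get("HOME") or os.path.expanduser("~")
    return os.path.join(home, ".recoll"), True  # default location

def ensure_confdir(cli_dir: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Create the default directory if missing; refuse for explicit ones."""
    confdir, may_create = resolve_confdir(cli_dir, env)
    if not os.path.isdir(confdir):
        if not may_create:
            raise FileNotFoundError(f"{confdir} does not exist, create it first")
        # Recoll also writes empty configuration files here; omitted.
        os.makedirs(confdir)
    return confdir
```

Refusing to create explicitly named directories mirrors the mistake-avoidance rule above. A typo in a -c argument then produces an error instead of a silent new, empty index.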
All configuration files share the same format. For example, a short extract of the main configuration file might look as follows:
# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
There are three kinds of lines:
Comment (starts with #) or empty.
Parameter affectation (name = value).
Section definition ([somedirname]).
Depending on the type of configuration file, section definitions either separate groups of parameters or allow redefining some parameters for a directory sub-tree. They stay in effect until another section definition, or the end of file, is encountered. Some of the parameters used for indexing are looked up hierarchically from the current directory location upwards. Not all parameters can be meaningfully redefined; this is specified for each parameter in the next section.
When found at the beginning of a file path, the tilde character (~) is expanded to the name of the user's home directory, as a shell would do.
White space is used for separation inside lists. List elements with embedded spaces can be quoted using double-quotes.
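The format rules above (comments, `name = value` lines, sections, tilde expansion, quoted list elements, hierarchical lookup) can be illustrated with a small Python reader. This is a sketch under several assumptions:

- Trailing-backslash continuation lines are inferred from the skippedNames example below.
- `shlex` also treats single quotes and backslashes specially, which Recoll may not.
- All function names are hypothetical.

```python
import os
import shlex
from typing import Dict, List, Optional

Conf = Dict[str, Dict[str, str]]

def parse_conf(text: str) -> Conf:
    """Parse configuration text into {section: {name: value}}.

    Parameters before any section line go under the "" key.
    """
    sections: Conf = {"": {}}
    current, pending = "", ""
    for raw in text.splitlines():
        line = f"{pending} {raw.strip()}" if pending else raw.strip()
        if line.endswith("\\"):               # continued on next line (assumed)
            pending = line[:-1].rstrip()
            continue
        pending = ""
        if not line or line.startswith("#"):  # empty or comment
            continue
        if line.startswith("[") and line.endswith("]"):  # section definition
            name = line[1:-1].strip()
            if name.startswith(("/", "~")):   # directory section
                name = os.path.normpath(os.path.expanduser(name))
            current = name
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:                               # name = value
            sections[current][name.strip()] = value.strip()
    return sections

def split_list(value: str) -> List[str]:
    """Split on white space; double quotes protect embedded spaces.

    A leading tilde is expanded as a shell would do.
    """
    return [os.path.expanduser(item) for item in shlex.split(value)]

def lookup(sections: Conf, name: str, path: str) -> Optional[str]:
    """Look `name` up from the section for `path` upwards, then top level."""
    path = os.path.normpath(os.path.expanduser(path))
    while True:
        if name in sections.get(path, {}):
            return sections[path][name]
        parent = os.path.dirname(path)
        if parent == path:                    # reached the root
            return sections[""].get(name)
        path = parent
```

With the extract shown earlier, `lookup(conf, "defaultcharset", path)` returns `utf-8` for any file under the `[~/somedirectory-with-utf8-txt-files]` section. For other files it returns the top-level value, if any.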
Encoding issues. Most of the configuration parameters are plain ASCII. Two particular sets of values may cause encoding issues:
File path parameters may contain non-ASCII characters and should use the exact same byte values as found in the file system directory. Usually, this means that the configuration file should use the system default locale encoding.
The unac_except_trans parameter should be encoded in UTF-8. If your system locale is not UTF-8, and you also need to specify non-ASCII file paths, this poses a difficulty because common text editors cannot handle multiple encodings in a single file. In this relatively unlikely case, you can edit the configuration file as two separate text files with appropriate encodings, and concatenate them to create the complete configuration.
recoll.conf is the main configuration file. It defines things like what to index (top directories and things to ignore), and the default character set to use for document types which do not specify it internally.
The default configuration will index your home directory. If this is not appropriate, start recoll to create a blank configuration, click , and edit the configuration file before restarting the command. This will start the initial indexing, which may take some time.
Most of the following parameters can be changed from the Index Configuration menu in the recoll interface. Some can only be set by editing the configuration file.
topdirs
Specifies the list of directories or files to index (recursively for directories). You can use symbolic links as elements of this list. See the followLinks option about following symbolic links found under the top elements (not followed by default).
skippedNames
A space-separated list of patterns for names of files or directories that should be completely ignored. The list defined in the default file is:
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
    *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
    .recoll* xapiandb recollrc recoll.conf
The list can be redefined at any sub-directory in the indexed area.
The top-level directories are not affected by this list (that is, a directory in topdirs might match and would still be indexed).
The list in the default configuration does not exclude hidden directories (names beginning with a dot), which means that it may index quite a few things that you do not want. On the other hand, email user agents like thunderbird usually store messages in hidden directories, and you probably want this indexed. One possible solution is to have .* in skippedNames, and add things like ~/.thunderbird or ~/.evolution in topdirs.
Not even the file names are indexed for patterns in this list. See the recoll_noindex variable in mimemap for an alternative approach which indexes the file names.
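The following Python sketch shows how name patterns of this kind prune a tree walk. It makes three simplifying assumptions:

- Python's `fnmatch.fnmatchcase` stands in for fnmatch(3).
- Only the last path element is tested.
- The top directory itself is never tested.

`DEFAULT_SKIPPED` copies the default list above, and the helper names are hypothetical.

```python
import fnmatch
import os
from typing import Iterable, List

DEFAULT_SKIPPED = ("#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn "
                   "*~ .beagle .git .hg .bzr loop.ps .xsession-errors "
                   ".recoll* xapiandb recollrc recoll.conf").split()

def is_skipped_name(path: str, patterns: Iterable[str]) -> bool:
    """True if the last path element matches one of the patterns."""
    name = os.path.basename(path.rstrip("/"))
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)

def walk_indexable(topdir: str, patterns: List[str]) -> List[str]:
    """Return files under topdir, pruning skipped names on the way."""
    found = []
    # os.walk does not follow symbolic links to directories by default.
    for dirpath, dirnames, filenames in os.walk(topdir):
        # Prune in place so os.walk does not descend into skipped dirs.
        dirnames[:] = [d for d in dirnames if not is_skipped_name(d, patterns)]
        found.extend(os.path.join(dirpath, f) for f in filenames
                     if not is_skipped_name(f, patterns))
    return found
```

Adding `.*` to the pattern list excludes every hidden file and directory below a top directory. A hidden directory listed directly in topdirs, such as ~/.thunderbird, is still walked.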
skippedPaths and daemSkippedPaths
A space-separated list of patterns for paths of files or directories that should be skipped. There is no default in the sample configuration file, but the code always adds the configuration and database directories in there.
skippedPaths is used both by batch and real time indexing. daemSkippedPaths can be used to specify things that should be indexed at startup, but not monitored.
Example of use for skipping text files only in a specific directory:
skippedPaths = ~/somedir/*.txt
skippedPathsFnmPathname
The values in the *skippedPaths variables are matched by default with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags. This means that '/' characters must be matched explicitly. You can set skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
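Python's own `fnmatch` module has no FNM_PATHNAME or FNM_LEADING_DIR flags. The sketch below therefore translates a simple pattern to a regular expression to show what the two flags change.

- With FNM_PATHNAME, wildcards never match '/'.
- With FNM_LEADING_DIR, a pattern also matches anything below a matching directory.
- Bracket expressions such as `[a-z]` are deliberately not supported, and the function names are hypothetical.

```python
import re

def glob_to_regex(pattern: str, pathname: bool = True,
                  leading_dir: bool = True) -> "re.Pattern":
    """Translate a pattern using only '*', '?' and literals to a regex."""
    any_char = "[^/]" if pathname else "."   # FNM_PATHNAME: no '/' in wildcards
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    # FNM_LEADING_DIR: the pattern may match a leading directory of the path.
    tail = "(?:/.*)?" if leading_dir else ""
    return re.compile("".join(parts) + tail + r"\Z", re.DOTALL)

def path_matches(path: str, pattern: str, pathname: bool = True,
                 leading_dir: bool = True) -> bool:
    return glob_to_regex(pattern, pathname, leading_dir).match(path) is not None
```

With the default flags, `/*/dir3` matches `/dir1/dir3` but not `/dir1/dir2/dir3`. Passing `pathname=False` (the equivalent of skippedPathsFnmPathname = 0) makes the second path match too.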
zipSkippedNames
A space-separated list of patterns for names of files or directories that should be ignored inside zip archives. This is used directly by the zip filter, and has a function similar to skippedNames, but works independently. Can be redefined for filesystem subdirectories. For versions up to 1.19, you will need to update the Zip filter and install a supplementary Python module. The details are described on the Recoll wiki.
followLinks
Specifies if the indexer should follow symbolic links while walking the file tree. The default is to ignore symbolic links to avoid multiple indexing of linked files. No effort is made to avoid duplication when this option is set to true. This option can be set individually for each of the topdirs members by using sections. It cannot be changed below the topdirs level.
indexedmimetypes
Recoll normally indexes any file which it knows how to read. This list lets you restrict the indexed mime types to what you specify. If the variable is unspecified or the list empty (the default), all supported types are processed. Can be redefined for subdirectories.
compressedfilemaxkbs
Size limit for compressed (.gz or .bz2) files. These need to be decompressed in a temporary directory for identification, which can be very wasteful if 'uninteresting' big compressed files are present. Negative means no limit, 0 means no processing of any compressed file. Defaults to -1.
textfilemaxmbs
Maximum size for text files. Very big text files are often uninteresting logs. Set to -1 to disable (default 20MB).
textfilepagekbs
If set to other than -1, text files will be indexed as multiple documents of the given page size. This may be useful if you do want to index very big text files as it will both reduce memory usage at index time and help with loading data to the preview window. A size of a few megabytes would seem reasonable (default: 1MB).
membermaxkbs
This defines the maximum size in kilobytes for an archive member (zip, tar or rar at the moment). Bigger entries will be skipped.
indexallfilenames
Recoll indexes file names in a special section of the database to allow specific file names searches using wild cards. This parameter decides if file name indexing is performed only for files with mime types that would qualify them for full text indexing, or for all files inside the selected subtrees, independently of mime type.
usesystemfilecommand
Decide if we use the file -i system command as a final step for determining the mime type for a file (the main procedure uses suffix associations as defined in the mimemap file). This can be useful for files with suffix-less names, but it will also cause the indexing of many bogus "text" files.
processwebqueue
If this is set, process the directory where Web browser plugins copy visited pages for indexing.
webqueuedir
The path to the web indexing queue. This is hard-coded in the Firefox plugin as ~/.recollweb/ToIndex so there should be no need to change it.
Changing some of these parameters will imply a full reindex. Also, when using multiple indexes, it may not make sense to search indexes that don't share the values for these parameters, because they usually affect both search and index operations.
indexStripChars
Decide if we strip characters of diacritics and convert them to lower-case before terms are indexed. If we don't, searches sensitive to case and diacritics can be performed, but the index will be bigger, and some marginal weirdness may sometimes occur. The default is a stripped index (indexStripChars = 1) for now. When using multiple indexes for a search, this parameter must be defined identically for all. Changing the value implies an index reset.
maxTermExpand
Maximum expansion count for a single term (e.g.: when using wildcards). The default of 10000 is reasonable and will avoid queries that appear frozen while the engine is walking the term list.
maxXapianClauses
Maximum number of elementary clauses we can add to a single Xapian query. In some cases, the result of term expansion can be multiplicative, and we want to avoid using excessive memory. The default of 100000 should be both high enough in most cases and compatible with current typical hardware configurations.
nonumbers
If this is set to true, no terms will be generated for numbers. For example "123", "1.5e6" and "192.168.1.4" would not be indexed ("value123" would still be). Numbers are often quite interesting to search for, and this should probably not be set except for special situations, e.g. scientific documents with huge amounts of numbers in them. This can only be set for a whole index, not for a subtree.
nocjk
If this is set to true, specific East Asian (Chinese, Korean, Japanese) character/word splitting is turned off. This will save a small amount of CPU if you have no CJK documents. If your document base does include such text but you are not interested in searching it, setting nocjk may be a significant time and space saver.
cjkngramlen
This lets you adjust the size of n-grams used for indexing CJK text. The default value of 2 is probably appropriate in most cases. A value of 3 would allow more precision and efficiency on longer words, but the index will be approximately twice as large.
indexstemminglanguages
A list of languages for which the stem expansion databases will be built. See recollindex(1) or use the recollindex -l command for possible values. You can add a stem expansion database for a different language by using recollindex -s, but it will be deleted during the next indexing. Only languages listed in the configuration file are permanent.
defaultcharset
The name of the character set used for files that do not contain a character set definition (i.e. plain text files). This can be redefined for any sub-directory. If it is not set at all, the character set used is the one defined by the NLS environment (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
unac_except_trans
This is a list of characters, encoded in UTF-8, which should be handled specially when converting text to unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet citizenship and should not be turned into an a. Each element in the space-separated list has the special character as first element and the translation following. The handling of both the lowercase and upper-case versions of a character should be specified, as membership in the list will turn off both standard accent and case processing. Example for Swedish:
unac_except_trans = åå Åå ää Ää öö Öö
Note that the translation is not limited to a single character; you could very well have something like üue in the list.
The default value set for unac_except_trans can't be listed here because I have trouble with SGML and UTF-8, but it only contains ligature decompositions: German ss, oe, ae, fi, fl.
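The sketch below shows how an exception list of this form could interact with case and diacritics stripping. It is a swapped-in approximation: Python's `unicodedata` NFD decomposition stands in for the unac library Recoll actually uses, and the function names are hypothetical. A listed character bypasses both lowercasing and accent removal; everything else is lowercased and stripped.

```python
import unicodedata
from typing import Dict

def parse_unac_except(value: str) -> Dict[str, str]:
    """'åå Åå üue' -> {'å': 'å', 'Å': 'å', 'ü': 'ue'}."""
    table = {}
    for item in value.split():
        table[item[0]] = item[1:]  # first char is the key, the rest its translation
    return table

def fold(text: str, exceptions: Dict[str, str]) -> str:
    """Lowercase and strip diacritics, except for characters in `exceptions`."""
    out = []
    for ch in text:
        if ch in exceptions:
            out.append(exceptions[ch])  # no standard accent or case processing
            continue
        decomposed = unicodedata.normalize("NFD", ch.lower())
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)
```

With the Swedish list, "Åsa Café" folds to "åsa cafe". Without it, the same text becomes "asa cafe".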
This parameter can't be defined for subdirectories, it is global, because there is no way to do otherwise when querying. If you have document sets which would need different values, you will have to index and query them separately.
maildefcharset
This can be used to define the default character set specifically for email messages which don't specify it. This is mainly useful for readpst (libpst) dumps, which are utf-8 but do not say so.
localfields
This allows setting fields for all documents under a given directory. Typical usage would be to set an "rclaptg" field, to be used in mimeview to select a specific viewer. If several fields are to be set, they should be separated with a semi-colon (';') character, which there is currently no way to escape. Also note the initial semi-colon. Example:
localfields= ;rclaptg=gnus;other = val
Then select a specific viewer with mimetype|tag=... in mimeview.
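A minimal Python sketch of how such a value splits into fields and how an rclaptg tag could form the mimeview key (`mimetype|tag`). The function names are hypothetical, and the handling of blank entries is an assumption.

```python
from typing import Dict

def parse_localfields(value: str) -> Dict[str, str]:
    """Parse ';rclaptg=gnus;other = val' into {'rclaptg': 'gnus', 'other': 'val'}.

    Semi-colons cannot be escaped, so they never appear inside a value.
    """
    fields = {}
    for part in value.split(";"):
        name, sep, val = part.partition("=")
        if sep and name.strip():  # skip the empty entry before the first ';'
            fields[name.strip()] = val.strip()
    return fields

def viewer_key(mimetype: str, fields: Dict[str, str]) -> str:
    """Build the mimeview key: 'mimetype|tag' when an rclaptg tag is set."""
    tag = fields.get("rclaptg")
    return f"{mimetype}|{tag}" if tag else mimetype
```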
noxattrfields
Recoll versions 1.19 and later automatically translate file extended attributes into document fields (to be processed according to the parameters from the fields file). Setting this variable to 1 will disable the behaviour.
metadatacmds
This allows executing external commands for each file and storing the output in Recoll document fields. This could be used for example to index external tag data. The value is a list of field names and commands, don't forget an initial semi-colon. Example:
[/some/area/of/the/fs]
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
As a specially disgusting hack brought by Recoll 1.19.7, if a "field name" begins with rclmulti, the data returned by the command is expected to contain multiple field values, in configuration file format. This allows setting several fields by executing a single command. Example:
metadatacmds = ; rclmulti1 = somecmd %f
If somecmd returns data in the form of:
field1 = value1
field2 = "value for field2"
field1 and field2 will be set inside the document metadata.
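The following Python sketch illustrates the two behaviours described above: one command per field, or several fields from an rclmulti command. It rests on these assumptions:

- %f is substituted after the command is split into words, so file names with spaces stay one argument.
- Surrounding double quotes are removed from rclmulti values.
- All function names are hypothetical.

```python
import shlex
import subprocess
from typing import Dict, List

def parse_metadatacmds(value: str) -> Dict[str, str]:
    """'; tags = tmsu tags %f; other = cmd -xx %f' -> {field: command}."""
    cmds = {}
    for part in value.split(";"):
        name, sep, cmd = part.partition("=")
        if sep and name.strip():
            cmds[name.strip()] = cmd.strip()
    return cmds

def build_argv(cmd: str, filename: str) -> List[str]:
    """Split a command into words and substitute %f with the file name."""
    return [filename if arg == "%f" else arg for arg in shlex.split(cmd)]

def parse_multi_output(output: str) -> Dict[str, str]:
    """Parse rclmulti command output given in configuration-file format."""
    fields = {}
    for line in output.splitlines():
        if line.lstrip().startswith("#"):
            continue
        name, sep, val = line.partition("=")
        if sep:
            val = val.strip()
            if len(val) >= 2 and val[0] == val[-1] == '"':
                val = val[1:-1]  # drop surrounding double quotes (assumption)
            fields[name.strip()] = val
    return fields

def run_metadatacmds(value: str, filename: str) -> Dict[str, str]:
    """Run each command for `filename` and collect the resulting fields."""
    result: Dict[str, str] = {}
    for field, cmd in parse_metadatacmds(value).items():
        out = subprocess.run(build_argv(cmd, filename),
                             capture_output=True, text=True).stdout
        if field.startswith("rclmulti"):
            result.update(parse_multi_output(out))
        else:
            result[field] = out.strip()
    return result
```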
dbdir
The name of the Xapian data directory. It will be created if needed when the index is initialized. If this is not an absolute path, it will be interpreted relative to the configuration directory. The value can have embedded spaces but starting or trailing spaces will be trimmed. You cannot use quotes here.
idxstatusfile
The name of the scratch file where the indexer process updates its status. Default: idxstatus.txt inside the configuration directory.
maxfsoccuppc
Maximum file system occupation before we stop indexing. The value is a percentage, corresponding to what the "Capacity" df output column shows. The default value is 0, meaning no checking.
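The percentage check can be approximated in Python with `shutil.disk_usage`. On POSIX systems its `free` value counts the space available to unprivileged users, so used / (used + free) is close to df's capacity column. Two details are assumptions: whether Recoll compares with `>=` or `>`, and the treatment of negative values as "no check". The function names are hypothetical.

```python
import shutil

def fs_occupation_percent(path: str) -> float:
    """Approximate df's capacity column: used / (used + available) * 100."""
    usage = shutil.disk_usage(path)
    denom = usage.used + usage.free
    return 100.0 * usage.used / denom if denom else 0.0

def should_stop_indexing(path: str, maxfsoccuppc: int) -> bool:
    """True when the file system holding `path` is fuller than the limit."""
    if maxfsoccuppc <= 0:  # 0 (the default) disables the check
        return False
    return fs_occupation_percent(path) >= maxfsoccuppc
```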
mboxcachedir
The directory where mbox message offsets cache files are held. This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful to share a directory between different configurations.
mboxcacheminmbs
The minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The default is 5 MB.
webcachedir
This is only used by the web browser plugin indexing code, and defines where the cache for visited pages will live. Default: $RECOLL_CONFDIR/webcache
webcachemaxmbs
This is only used by the web browser plugin indexing code, and defines the maximum size for the web page cache. Default: 40 MB.
idxflushmb
Threshold (megabytes of new text data) where we flush from memory to disk index. Setting this can help control memory usage. A value of 0 means no explicit flushing, letting Xapian use its own default, which is to flush every 10000 documents (or XAPIAN_FLUSH_THRESHOLD documents). This gives little memory usage control, as memory usage also depends on average document size. The default value is 10, and it is probably a bit low. If your system usually has free memory, you can try higher values between 20 and 80. In my experience, values beyond 100 are always counterproductive.
The Recoll indexing process recollindex can use multiple threads to speed up indexing on multiprocessor systems. The work done to index files is divided in several stages and some of the stages can be executed by multiple threads. The stages are:
You can also read a longer document about the transformation of Recoll indexing to multithreading.
The threads configuration is controlled by two configuration file parameters.
thrQSizes
This variable defines the job input queues configuration. There are three possible queues for stages 2, 3 and 4, and this parameter should give the queue depth for each stage (three integer values). If a value of -1 is used for a given stage, no queue is used, and the thread will go on performing the next stage. In practice, deep queues have not been shown to increase performance. A value of 0 for the first queue tells Recoll to perform autoconfiguration (no need for the two other values in this case); this is the default configuration.
thrTCounts
This defines the number of threads used for each stage. If a value of -1 is used for one of the queue depths, the corresponding thread count is ignored. It makes no sense to use a value other than 1 for the last stage because updating the Xapian index is necessarily single-threaded (and protected by a mutex).
The following example would use three queues (of depth 2), and 4 threads for converting source documents, 2 for processing their text, and one to update the index. This was tested to be the best configuration on the test system (a quad-processor machine with multiple disks).
thrQSizes = 2 2 2
thrTCounts = 4 2 1
The following example would use a single queue, and the complete processing for each document would be performed by a single thread (several documents will still be processed in parallel in most cases). The threads will use mutual exclusion when entering the index update stage. In practice the performance would generally be close to the previous case, but worse in certain cases (e.g. a Zip archive would be processed purely sequentially), so the previous approach is preferred. YMMV... The last two values for thrTCounts are ignored.
thrQSizes = 2 -1 -1
thrTCounts = 6 1 1
The following example would disable multithreading. Indexing will be performed by a single thread.
thrQSizes = -1 -1 -1
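To make the queue and thread settings more concrete, here is a Python sketch of a staged pipeline driven by thrQSizes/thrTCounts-style lists. It is an illustration under stated assumptions, not Recoll's C++ implementation:

- The stage functions are hypothetical stand-ins for document conversion, text processing and index update.
- The feeding loop plays the role of the file walker.
- The 0 (autoconfiguration) value is not modeled.
- As in the text, a -1 queue depth folds a stage into the previous thread, and the index update stage is protected by a mutex.

```python
import queue
import threading
from typing import Callable, Iterable, Sequence

_STOP = object()  # sentinel telling a worker thread to exit

def run_pipeline(items: Iterable, stages: Sequence[Callable],
                 qsizes: Sequence[int], tcounts: Sequence[int]) -> None:
    """Run items through `stages`, grouped according to qsizes/tcounts."""
    lock = threading.Lock()
    last = stages[-1]

    def locked_last(x):
        with lock:  # index updates are serialized
            return last(x)

    funcs = list(stages[:-1]) + [locked_last]
    groups = []  # each group: [chain of stage functions, queue or None, threads]
    for i, f in enumerate(funcs):
        if qsizes[i] == -1 and groups:
            groups[-1][0].append(f)  # no queue: the same thread continues
        elif qsizes[i] == -1:
            groups.append([[f], None, 0])  # run by the feeding thread
        else:
            groups.append([[f], queue.Queue(maxsize=qsizes[i]), tcounts[i]])

    def deliver(gi, item):
        if gi == len(groups):
            return
        chain, q, _ = groups[gi]
        if q is not None:
            q.put(item)  # blocks while the queue is full
            return
        for f in chain:
            item = f(item)
        deliver(gi + 1, item)

    def worker(gi):
        chain, q, _ = groups[gi]
        while True:
            item = q.get()
            if item is _STOP:
                return
            for f in chain:
                item = f(item)
            deliver(gi + 1, item)

    threads = []
    for gi, (_, q, n) in enumerate(groups):
        ts = [threading.Thread(target=worker, args=(gi,))
              for _ in range(n)] if q is not None else []
        for t in ts:
            t.start()
        threads.append(ts)

    for item in items:
        deliver(0, item)
    # Stop groups in order, so every item reaches the next queue first.
    for gi, (_, q, _n) in enumerate(groups):
        for _ in threads[gi]:
            q.put(_STOP)
        for t in threads[gi]:
            t.join()
```

For example, `run_pipeline(docs, [convert, process, update], [2, 2, 2], [4, 2, 1])` mirrors the first configuration above. `[-1, -1, -1]` runs everything in the calling thread.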
autodiacsens
If the index is not stripped, decide if we automatically trigger diacritics sensitivity if the search term has accented characters (not in unac_except_trans). Otherwise, you need to use the query language and the D modifier to specify diacritics sensitivity. Default is no.
autocasesens
If the index is not stripped, decide if we automatically trigger character case sensitivity if the search term has upper-case characters in any but the first position. Otherwise, you need to use the query language and the C modifier to specify character-case sensitivity. Default is yes.
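The two automatic triggers can be expressed as simple predicates on a search term. This Python sketch is an approximation: it uses `unicodedata` to detect accents, and the function names are hypothetical. Both checks only matter when the index is not stripped.

```python
import unicodedata

def wants_case_sensitivity(term: str) -> bool:
    """autocasesens: upper-case characters in any but the first position."""
    return any(ch.isupper() for ch in term[1:])

def wants_diacritics_sensitivity(term: str, unac_except: str = "") -> bool:
    """autodiacsens: accented characters not listed in unac_except_trans."""
    excepted = {item[0] for item in unac_except.split()}
    for ch in term:
        if ch in excepted:
            continue
        if any(unicodedata.combining(c) for c in unicodedata.normalize("NFD", ch)):
            return True
    return False
```

Under these rules, "Paris" stays case-insensitive while "McDonald" triggers case sensitivity.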
loglevel, daemloglevel
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of debug/information messages. 2 only lists errors. The daem version is specific to the indexing monitor daemon.
logfilename, daemlogfilename
Where the messages should go. 'stderr' can be used as a special value, and is the default. The daem version is specific to the indexing monitor daemon.
mondelaypatterns
This allows specifying wildcard path patterns (processed with fnmatch(3) with no flags) to match files which change too often and for which a delay should be observed before re-indexing. This is a space-separated list, each entry being a pattern and a time in seconds, separated by a colon. You can use double quotes if a path entry contains white space. Example:
mondelaypatterns = *.log:20 "this one has spaces*:10"
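The list format can be parsed with `shlex.split`, which handles the double quotes, followed by a split on the last colon. In this Python sketch, the first matching pattern is assumed to win, and the function names are hypothetical. `fnmatch.fnmatchcase` without flags lets `*` match '/' characters, as fnmatch(3) with no flags does.

```python
import fnmatch
import shlex
from typing import List, Tuple

def parse_delay_patterns(value: str) -> List[Tuple[str, int]]:
    """'*.log:20 "a b*:10"' -> [('*.log', 20), ('a b*', 10)]."""
    result = []
    for entry in shlex.split(value):
        pattern, _, seconds = entry.rpartition(":")
        result.append((pattern, int(seconds)))
    return result

def reindex_delay(path: str, patterns: List[Tuple[str, int]]) -> int:
    """Delay for the first matching pattern, 0 if none matches."""
    for pattern, seconds in patterns:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0
```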
monixinterval
Minimum interval (seconds) for processing the indexing queue. The real time monitor does not process each event when it comes in, but will wait this time for the queue to accumulate to diminish overhead and in order to aggregate multiple events to the same file. Default: 30 seconds.
monauxinterval
Period (in seconds) at which the real time monitor will regenerate the auxiliary databases (spelling, stemming) if needed. The default is one hour.
monioniceclass, monioniceclassdata
These allow defining the ionice class and data used by the indexer (default class 3, no data).
filtermaxseconds
Maximum filter execution time, after which it is aborted. Some postscript programs just loop...
filtersdir
A directory to search for the external filter scripts used to index some types of files. The value should not be changed, except if you want to modify one of the default scripts. The value can be redefined for any sub-directory.
iconsdir
The name of the directory where recoll result list icons are stored. You can change this if you want different images.
idxabsmlen
Recoll stores an abstract for each indexed file inside the database. The text can come from an actual 'abstract' section in the document or will just be the beginning of the document. It is stored in the index so that it can be displayed inside the result lists without decoding the original file. The idxabsmlen parameter defines the size of the stored abstract. The default value is 250 bytes.
The search interface gives you the choice to display this stored text or a synthetic abstract built by extracting text around the search terms. If you always prefer the synthetic abstract, you can reduce this value and save a little space.
aspellLanguage
Language definitions to use when creating the aspell dictionary. The value must match a set of aspell language definition files. You can type "aspell config" to see where these are installed (look for data-dir). The default if the variable is not set is to use your desktop national language environment to guess the value.
noaspell
If this is set, the aspell dictionary generation is turned off. Useful for cases where you don't need the functionality or when it is unusable because aspell crashes during dictionary generation.
mhmboxquirks
This allows defining location-related quirks for the mailbox handler. Currently only the tbird flag is defined, and it should be set for directories which hold Thunderbird data, as their folder format is weird.
The fields file contains information about dynamic fields handling in Recoll. Some very basic fields have hard-wired behaviour, and, mostly, you should not change the original data inside the fields file. But you can create custom fields fitting your data and handle them just like they were native ones.
The fields file has several sections, which each define an aspect of fields processing. Quite often, you'll have to modify several sections to obtain the desired behaviour.
We will only give a short description here, you should refer to the comments inside the file for more detailed information.
Field names should be lowercase alphabetic ASCII.
A field becomes indexed (searchable) by having a prefix defined in this section.
A field becomes stored (displayable inside results) by having its name listed in this section (typically with an empty value).
This section defines lists of synonyms for the canonical names used inside the [prefixes] and [stored] sections.
Some filters may need specific configuration for handling fields. Only the email message filter currently has such a section (named [mail]). It allows indexing arbitrary email headers in addition to the ones indexed by default. Other such sections may appear in the future.
Here follows a small example of a personal fields file. This would extract a specific email header and use it as a searchable field, with data displayable inside result lists. (Side note: as the email filter does no decoding on the values, only plain ASCII headers can be indexed, and only the first occurrence will be used for headers that occur several times.)
[prefixes]
# Index mailmytag contents (with the given prefix)
mailmytag = XMTAG

[stored]
# Store mailmytag inside the document data record (so that it can be
# displayed - as %(mailmytag) - in result lists).
mailmytag =

[mail]
# Extract the X-My-Tag mail header, and use it internally with the
# mailmytag field name
x-my-tag = mailmytag
Recoll versions 1.19 and later process user extended file attributes as documents fields by default.
Attributes are processed as fields of the same name, after removing the user prefix on Linux.
The [xattrtofields] section of the fields file allows specifying translations from extended attribute names to Recoll field names. An empty translation disables use of the corresponding attribute data.
mimemap specifies the file name extension to mime type mappings.
For file names without an extension, or with an unknown one, the system's file -i command will be executed to determine the mime type (this can be switched off inside the main configuration file).
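The two-step mime type determination (suffix lookup, then `file -i` as a fallback) can be sketched in Python as follows. It rests on a few assumptions:

- The mimemap content is represented as a plain dict.
- Extensions are assumed to be compared case-insensitively.
- `file -i` behaves as in GNU file, printing `name: type; charset=...`.
- The function names are hypothetical.

```python
import os
import subprocess
from typing import Dict, Optional

def mime_from_suffix(path: str, mimemap: Dict[str, str]) -> Optional[str]:
    """Look up the lower-cased extension (e.g. '.blob') in a mimemap dict."""
    _, ext = os.path.splitext(path)
    return mimemap.get(ext.lower()) if ext else None

def mime_from_file_command(path: str) -> Optional[str]:
    """Ask `file -i`, whose output looks like 'name: text/plain; charset=...'."""
    try:
        out = subprocess.run(["file", "-i", path], capture_output=True,
                             text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    _, _, rest = out.rpartition(": ")
    return rest.split(";")[0].strip() or None

def mime_type(path: str, mimemap: Dict[str, str],
              usesystemfilecommand: bool = True) -> Optional[str]:
    """Suffix association first, then the optional system command."""
    mt = mime_from_suffix(path, mimemap)
    if mt is None and usesystemfilecommand:
        mt = mime_from_file_command(path)
    return mt
```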
The mappings can be specified on a per-subtree basis, which may be useful in some cases. Example: gaim logs have a .txt extension but should be handled specially, which is possible because they are usually all located in one place.
mimemap also has a recoll_noindex variable which is a list of suffixes. Matching files will be skipped (which avoids unnecessary decompressions or file executions). This is partially redundant with skippedNames in the main configuration file, with a few differences: it will not affect directories, it cannot be made dependent on the file-system location (it is a configuration-wide parameter), and the file names will still be indexed (not even the file names are indexed for patterns in skippedNames).
recoll_noindex is used mostly for things known to be unindexable by a given Recoll version. Having it there avoids cluttering the more user-oriented and locally customized skippedNames.
mimeconf specifies how the different mime types are handled for indexing, and which icons are displayed in the recoll result lists.
Changing the parameters in the [index] section is probably not a good idea except if you are a Recoll developer.
The [icons] section allows you to change the icons which are displayed by recoll in the result lists (the values are the basenames of the png images inside the iconsdir directory specified in recoll.conf).
mimeview specifies which programs are started when you click on an Open link in a result list. For example, HTML is normally displayed using firefox, but you may prefer Konqueror; your openoffice.org program might be named ooffice instead of openoffice, etc.
Changes to this file can be done by direct editing, or through the recoll GUI preferences dialog.
If Use desktop preferences to choose document editor is checked in the Recoll GUI preferences, all mimeview entries will be ignored except the one labelled application/x-all (which is set to use xdg-open by default).
In this case, the xallexcepts top level variable defines a list of mime type exceptions which will be processed according to the local entries instead of being passed to the desktop. This is so that specific Recoll options such as a page number or a search string can be passed to applications that support them, such as the evince viewer.
As for the other configuration files, the normal usage is to have a mimeview inside your own configuration directory, with just the non-default entries, which will override those from the central configuration file.
All viewer definition entries must be placed under a [view] section.
The keys in the file are normally mime types. You can add an application tag to specialize the choice for an area of the filesystem (using a localfields specification in recoll.conf). The syntax for the key is mimetype|tag.
The nouncompforviewmts entry (placed at the top level, outside of the [view] section) holds a list of mime types that should not be uncompressed before starting the viewer (if they are found compressed, e.g. mydoc.doc.gz).
The right side of each assignment holds a command to be executed for opening the file. The following substitutions are performed:
%D. Document date
%f. File name. This may be the name of a temporary file if it was necessary to create one (e.g. to extract a subdocument from a container).
%F. Original file name. Same as %f except if a temporary file is used.
%i. Internal path, for subdocuments of containers. The format depends on the container type. If this appears in the command line, Recoll will not create a temporary file to extract the subdocument, expecting the called application (possibly a script) to be able to handle it.
%M. Mime type
%p. Page index. Only significant for a subset of document types, currently only PDF, Postscript and DVI files. Can be used to start the editor at the right page for a match or snippet.
%s. Search term. The value will only be set for documents with indexed page numbers (e.g. PDF). The value will be one of the matched search terms. It would allow pre-setting the value in the "Find" entry inside Evince for example, for easy highlighting of the term.
%U, %u. Url.
In addition to the predefined values above, all strings like %(fieldname) will be replaced by the value of the field named fieldname for the document. This could be used in combination with field customisation to help with opening the document.
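The following Python sketch shows one way to expand these substitutions in a viewer command line. Assumptions:

- The command is split into words before substitution, so a file name with spaces stays a single argument.
- Unknown or unset values expand to an empty string.
- `someviewer` and its options are hypothetical, and the function name is hypothetical.

```python
import re
import shlex
from typing import Dict, List

# %(fieldname) or one of the single-letter substitutions listed above.
_SUBST = re.compile(r"%\((\w+)\)|%([DfFiMpsUu])")

def expand_viewer_command(cmd: str, values: Dict[str, str],
                          fields: Dict[str, str]) -> List[str]:
    """Split `cmd` into words, then substitute %x and %(field) markers."""
    def repl(m: "re.Match") -> str:
        if m.group(1) is not None:
            return fields.get(m.group(1), "")
        return values.get(m.group(2), "")
    return [_SUBST.sub(repl, arg) for arg in shlex.split(cmd)]
```

For example, with `values = {"p": "3", "s": "recoll", "f": "/home/me/doc.pdf"}`, the command `someviewer --page=%p --find=%s %f` becomes `["someviewer", "--page=3", "--find=recoll", "/home/me/doc.pdf"]`.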
ptrans specifies query-time path translations. These can be useful in multiple cases.
The file has a section for any index which needs translations, either the main one or additional query indexes. The sections are named with the Xapian index directory names. No slash character should exist at the end of the paths (all comparisons are textual). An example should make things sufficiently clear:
[/home/me/.recoll/xapiandb]
/this/directory/moved = /to/this/place

[/path/to/additional/xapiandb]
/server/volume1/docdir = /net/server/volume1/docdir
/server/volume2/docdir = /net/server/volume2/docdir
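This Python sketch shows how a path found in a given index could be rewritten with these sections. It rests on these assumptions:

- The first matching entry wins.
- A prefix only matches at a path-component boundary, so `/this/directory/moved` does not rewrite `/this/directory/movedx`. The text only says that comparisons are textual, so Recoll may behave differently here.
- The function name is hypothetical.

```python
from typing import Dict

def translate_path(path: str, index_dir: str,
                   ptrans: Dict[str, Dict[str, str]]) -> str:
    """Apply the first matching prefix translation for `index_dir`."""
    for src, dst in ptrans.get(index_dir.rstrip("/"), {}).items():
        if path == src or path.startswith(src + "/"):
            return dst + path[len(src):]
    return path  # no translation for this path or index
```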
Imagine that you have some kind of file which does not have indexable content, but for which you would like to have a functional Open link in the result list (when found by file name). The file names end in .blob and can be displayed by application blobviewer.
You need two entries in the configuration files for this to work:
In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the following line:
.blob = application/x-blobapp
Note that the mime type is made up here, and you could call it diesel/oil just the same.
In $RECOLL_CONFDIR/mimeview under the [view] section, add:
application/x-blobapp = blobviewer %f
We are supposing that blobviewer wants a file name parameter here; you would use %u if it liked URLs better.
If you just wanted to change the application used by Recoll to display a mime type which it already knows, you would just need to edit mimeview. The entries you add in your personal file override those in the central configuration, which you do not need to alter. mimeview can also be modified from the GUI.
Let us now imagine that the above .blob files actually contain indexable text and that you know how to extract it with a command line program. Getting Recoll to index the files is easy. You need to perform the above alteration, and also to add data to the mimeconf file (typically in ~/.recoll/mimeconf):
Under the [index] section, add the following line (more about the rclblob indexing script later):
application/x-blobapp = exec rclblob
Under the [icons] section, you should choose an icon to be displayed for the files inside the result lists. Icons are normally 64x64 pixel PNG files which live in /usr/[local/]share/recoll/images.
Under the [categories] section, you should add the mime type where it makes sense (you can also create a category). Categories may be used for filtering in advanced search.
The rclblob filter should be an executable program or script which exists inside /usr/[local/]share/recoll/filters. It will be given a file name as argument and should output the text or html contents on the standard output.
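A minimal sketch of what rclblob could look like as a Python script, under the contract just described: one file name argument, HTML on standard output. The .blob format is unknown here, so `extract_text` is a placeholder that assumes the file simply holds UTF-8 text. The HTML wrapper with a charset meta tag is also an assumption; the filter programming section is the authority on the expected output.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob filter: print a .blob file's text as HTML."""
import html
import sys

def extract_text(data: bytes) -> str:
    # Placeholder: assumes the blob is plain UTF-8 text. A real filter
    # would decode the actual .blob format here.
    return data.decode("utf-8", errors="replace")

def to_html(text: str) -> str:
    """Wrap escaped text in a minimal HTML document."""
    return ("<html><head>"
            '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
            "</head><body><pre>" + html.escape(text) + "</pre></body></html>")

def main(path: str) -> int:
    with open(path, "rb") as f:
        sys.stdout.write(to_html(extract_text(f.read())))
    return 0

# Recoll passes the file name as the first command line argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1]))
```

Escaping the text with `html.escape` keeps characters like `<` and `&` from being read as markup by the indexer.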
The filter programming section describes in more detail how to write a filter.