The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI.
The language is based on the (seemingly defunct) Xesam user search language specification.
If the results of a query language search puzzle you and you doubt what was actually searched for, you can use the GUI Show Query link at the top of the result list to check the exact query that Xapian finally executed.
Here follows a sample request that we are going to explain:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is depends on the document type, for example the From: header for an email message), and containing either beatles or lennon and either live or unplugged but not potatoes (in any part of the document).
An element is composed of an optional field specification and a value, separated by a colon (the field separator is the last colon in the element). Examples: Eugenie, author:balzac and dc:title:grandet.
The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).
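The "last colon" rule can be illustrated with a minimal Python sketch. This is not Recoll's parser: the function name split_element is hypothetical, and the sketch ignores quoting (a colon inside a quoted value would be misread).

```python
def split_element(element):
    """Split a query element into (field, value) at the LAST colon.

    Returns (None, element) when the element has no field specification.
    Illustrative only: quoted values containing colons are not handled.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        return None, element
    return field, value


for element in ("Eugenie", "author:balzac", "dc:title:grandet"):
    print(element, "->", split_element(element))
```

With the three examples above, only dc:title:grandet contains more than one colon, and everything before the last one (dc:title) is taken as the field name.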
All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (in capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3), not (word1 AND word2) OR word3. Explicit parentheses are not supported.
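The precedence of OR over the implicit AND can be sketched as a grouping step. The following Python fragment is only an illustration under simplifying assumptions: elements are split on white space, quoted phrases and negation are not handled, and the function name group_elements is hypothetical rather than part of Recoll.

```python
def group_elements(tokens):
    """Group tokens into AND-ed groups of OR-ed alternatives.

    Each inner list holds elements joined by OR; the outer list is the
    implicit AND. Only a literal, capitalized "OR" acts as an operator.
    """
    groups = []
    join_next = False
    for tok in tokens:
        if tok == "OR":
            join_next = True
            continue
        if join_next and groups:
            groups[-1].append(tok)   # OR binds to the previous element
        else:
            groups.append([tok])     # start a new AND-ed group
        join_next = False
    return groups


print(group_elements("word1 word2 OR word3".split()))
print(group_elements("Beatles OR Lennon Live OR Unplugged".split()))
```

The first call yields one group for word1 and one group holding word2 and word3, which is the word1 AND (word2 OR word3) reading described above.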
An element preceded by a - specifies a term that should not appear. Pure negative queries are forbidden.
As usual, words inside quotes define a phrase (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result.
Modifiers can be set on a phrase clause, for example to specify a proximity search (unordered). See the modifier section.
Recoll currently manages the following default fields:
title, subject or caption are synonyms which specify data to be searched for in the document title or subject.
author or from for searching the document's originators.
recipient or to for searching the document's recipients.
keyword for searching the document-specified keywords (few documents actually have any).
filename for the document's file name.
ext specifies the file name extension (Ex: ext:html).
The field syntax also supports a few field-like, but special, criteria:
dir for filtering the results on file location (Ex: dir:/home/me/somedir). -dir also works to find results not in the specified directory (release >= 1.15.8). A tilde inside the value will be expanded to the home directory. Wildcards will be expanded, but please have a look at an important limitation of wildcards in path filters.
Relative paths also make sense: for example, dir:share/doc would match either /usr/share/doc or /usr/local/share/doc.
Several dir clauses can be specified, both positive and negative. For example the following makes sense:
dir:recoll dir:src -dir:utils -dir:common
This would select results which have both recoll and src in the path (in any order), and which have neither utils nor common.
You can also use OR conjunctions with dir: clauses.
A special aspect of dir clauses is that the values in the index are not transcoded to UTF-8, and never lower-cased or unaccented, but stored as binary. This means that you need to enter the values in the exact lower or upper case, and that searches for names with diacritics may sometimes be impossible because of character set conversion issues. Non-ASCII UNIX file paths are an unending source of trouble and are best avoided.
You need to use double-quotes around the path value if it contains space characters.
size for filtering the results on file size. Example: size<10000. You can use <, > or = as operators. You can specify a range like the following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be used as (decimal) multipliers. Ex: size>1k to search for files bigger than 1000 bytes.
date for searching or filtering on dates. The syntax for the argument is based on the ISO8601 standard for dates and time intervals. Only dates are supported, no times. The general syntax is 2 elements separated by a / character. Each element can be a date or a period of time. Periods are specified as PnYnMnD. The n numbers are the respective numbers of years, months or days, any of which may be missing. Dates are specified as YYYY-MM-DD. The days and months parts may be missing. If the / is present but an element is missing, the missing element is interpreted as the lowest or highest date in the index. Examples:
2001-03-01/2002-05-01: the basic syntax for an interval of dates.
2001-03-01/P1Y2M: the same specified with a period.
2001/: from the beginning of 2001 to the latest date in the index.
2001: the whole year of 2001.
P2D/: means 2 days ago up to now if there are no documents with dates in the future.
/2003: all documents from 2003 or older.
Periods can also be specified with small letters (for example p2y).
mime or format for specifying the mime type. This one is quite special because you can specify several values which will be OR'ed (the normal default for the language is AND). Ex: mime:text/plain mime:text/html. Specifying an explicit boolean operator before a mime specification is not supported and will produce strange results. You can filter out certain types by using negation (-mime:some/type), and you can use wildcards in the value (mime:text/*). Note that mime is the ONLY field with an OR default. With ext terms, for example, you do need to write OR explicitly.
type or rclcat for specifying the category (as in text/media/presentation/etc.). The classification of mime types in categories is defined in the Recoll configuration (mimeconf), and can be modified or extended. The default category names are those which permit filtering results in the main GUI screen. Categories are OR'ed like mime types above. Category clauses cannot be negated with -.
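The size multipliers and the date period notation described above can be made concrete with a short Python sketch. It is not Recoll code: parse_size and parse_period are hypothetical helper names, and only the rules stated in this section are implemented (decimal k/m/g/t multipliers in either case, and PnYnMnD periods with optional parts and optional small letters).

```python
import re

# Decimal multipliers, as stated in the text: 1k is 1000 bytes.
_MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}

# Period: P followed by optional years, months and days, in that order.
_PERIOD = re.compile(r"p(?:(\d+)y)?(?:(\d+)m)?(?:(\d+)d)?", re.IGNORECASE)


def parse_size(text):
    """Convert a size value such as '1k' or '10000' into a byte count."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a size value: {text!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS.get(suffix.lower(), 1)


def parse_period(text):
    """Convert a period such as 'P1Y2M' into a (years, months, days) tuple."""
    match = _PERIOD.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"not a period: {text!r}")
    return tuple(int(part) if part else 0 for part in match.groups())


print(parse_size("1k"))        # size>1k compares against this byte count
print(parse_period("P1Y2M"))   # period used in 2001-03-01/P1Y2M
print(parse_period("p2y"))     # small letters are accepted too
```

Applied to the interval example, adding the period P1Y2M (one year, two months) to 2001-03-01 gives 2002-05-01, which is why the first two date examples describe the same interval.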
Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wild-card on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.
The document filters used while indexing have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.
Some characters are recognized as search modifiers when found immediately after the closing double quote of a phrase, as in "some term"modifierchars. The actual "phrase" can be a single term of course. Supported modifiers:
l can be used to turn off stemming (mostly makes sense with p because stemming is off by default for phrases).
o can be used to specify a "slack" for phrase and proximity searches: the number of additional terms that may be found between the specified ones. If o is followed by an integer number, this is the slack, else the default is 10.
p can be used to turn the default phrase search into a proximity one (unordered). Example: "order any in"p
C will turn on case sensitivity (if the index supports it).
D will turn on diacritics sensitivity (if the index supports it).
A weight can be specified for a query element by specifying a decimal value at the start of the modifiers. Example: "Important"2.5.
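To make the modifier syntax concrete, here is a minimal Python sketch that decodes a quoted clause followed by an optional weight and modifier characters. It is an illustration only, not Recoll's implementation: the function name parse_clause and the dictionary keys are hypothetical, only the modifiers listed in this section are recognized, and the weight is assumed to come before any modifier letters, as stated above.

```python
import re

# "phrase", then an optional decimal weight, then modifier characters.
_CLAUSE = re.compile(r'"([^"]*)"(\d+(?:\.\d+)?)?((?:o\d*|[lpCD])*)')


def parse_clause(text):
    """Decode '"some term"modifierchars' into a dictionary of settings."""
    match = _CLAUSE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a quoted clause with modifiers: {text!r}")
    phrase, weight, mods = match.groups()
    result = {
        "phrase": phrase,
        "weight": float(weight) if weight else None,
        "slack": None,            # only set when the o modifier is present
        "no_stemming": False,     # l
        "proximity": False,       # p
        "case_sensitive": False,  # C
        "diacritics_sensitive": False,  # D
    }
    for mod in re.findall(r"o\d*|[lpCD]", mods):
        if mod[0] == "o":
            # An explicit integer is the slack; a bare o means 10.
            result["slack"] = int(mod[1:]) if len(mod) > 1 else 10
        elif mod == "l":
            result["no_stemming"] = True
        elif mod == "p":
            result["proximity"] = True
        elif mod == "C":
            result["case_sensitive"] = True
        elif mod == "D":
            result["diacritics_sensitive"] = True
    return result


print(parse_clause('"order any in"p'))
print(parse_clause('"Important"2.5'))
print(parse_clause('"some term"o5p'))
```

The third call combines a slack of 5 with the proximity modifier; combining modifiers this way is an assumption of the sketch, suggested by the modifierchars notation but not spelled out in the text.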