[2022-03-02] Grabbed from https://wiki.zimbra.com/wiki/Zimbra_Web_Client_Search_Tips
Some type of query is always applied to produce the view that you see in the Zimbra interface. This topic describes in detail the search grammar used by Zimbra's Search feature and its most overlooked part: the main search bar.
- TIP: You can set your General Options to 'Always show search string' to see the current query in the Search toolbar. For example, when this option is set, clicking your Inbox folder shows the query string 'in:inbox'.
- If you set "Initial mail search:" to something other than in:inbox, you are taken to that folder or result set first upon login.
Simple searches can be done by just entering a word into the search field. Bare words (words without a search operator) are treated as if they were given to the 'content:' operator, which matches any text in the message.
More advanced searches can be done by specifying a search operator. A search operator is a special keyword followed by a colon, followed by some other parameter specific to that operator. For example:
- in:inbox (the operator is "in" and the parameter is "inbox") returns messages which are in the folder named "inbox".
- from:someone (the operator is "from" and the parameter is "someone") returns messages which have the word "someone" in the sender's email address.
You can prefix any keyword with the word "not" to specify items that do not have that criterion, for example not in:inbox. Search is case insensitive, meaning that "in:inbox" is the same as "in:Inbox". The minus sign (-) is a synonym for NOT, so not in:inbox is the same as -in:inbox.
In most cases, it is not necessary to include punctuation-type characters in your search string, as these are ignored by the search code. There are certain times where this is not true (for example, searching for a time '9:30' in a message) and in those cases you should enclose the search parameter in quotation marks. For example: subject:"9:30" will return messages which have the string 9:30 in the subject.
Allowable characters in the search parameter:
- The following characters cannot appear anywhere in a search parameter unless it is enclosed in quotes: ~ ' ! # $ % ^ & * ( ) _ ? / { } [ ] ; :
- The following characters are allowed in a search parameter as long as they are not the first character: - + < >
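The quoting rule can be sketched as a small helper. This is only an illustration: the name quote_param is hypothetical and not part of Zimbra, the two character sets are copied from the lists above, and the helper is meant for literal values, not for wildcard patterns such as do*.

```python
# Characters that may not appear anywhere in an unquoted parameter (copied from the list above).
FORBIDDEN_ANYWHERE = set("~'!#$%^&*()_?/{}[];:")
# Characters that are allowed, but not as the first character.
FORBIDDEN_FIRST = set("-+<>")


def quote_param(value: str) -> str:
    """Return value unchanged if it is safe unquoted, otherwise wrap it in double quotes."""
    if '"' in value:
        # This sketch does not attempt to handle embedded double quotes.
        raise ValueError("values containing a double quote are not supported here")
    needs_quotes = (
        value == ""
        or any(ch in FORBIDDEN_ANYWHERE or ch.isspace() for ch in value)
        or value[0] in FORBIDDEN_FIRST
    )
    return f'"{value}"' if needs_quotes else value
```

For example, "subject:" + quote_param("9:30") yields subject:"9:30", while quote_param("inbox") is returned as-is.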
If multiple search terms are entered (separated by spaces) they are ANDed together by default. in:inbox foo means "return me messages which are in the inbox AND which have the word foo in them". For searches using multiple criteria, you can either find items that match one of the specified criteria or all of them. You can perform both types of searches using the Advanced search builder in the UI.
For all search panes other than the Basic search, the rules are:
- Searching for messages that match any of the specified criteria is called an 'OR' search, because if the message contains either X or Y, then it is considered a match. For Advanced search panes with multiple check boxes, making multiple selections within a single pane creates an 'OR' search for those items.
- Searching for messages that contain both X and Y is called an 'AND' search, because the message must meet all the specified criteria in order to be considered a match. For Advanced search panes with check boxes, opening multiple instances of the same pane and making different check box selections in each one causes the criteria to be specified as an 'AND' search.
Only "OR" appears in a query. If you chose the option to show the search query in the Search bar, the Search text box updates to show the resulting query as you make selections in the Advanced search. With the 'AND' type of search, the word 'AND' does not appear.
Tip: Using parentheses with AND and OR. Words within parentheses are considered as a unit. For example, from:(john thomas) is equivalent to from:john AND from:thomas. If you use OR in the parentheses, as in from:(john or smith), the search is for results matching from:john OR from:smith.
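As a rough sketch of how these combinations look as query strings, the helpers below assemble AND, OR and grouped terms. The function names are made up for this illustration and are not a Zimbra API; they only concatenate text.

```python
def all_of(*terms: str) -> str:
    """AND search: terms separated by spaces are ANDed by default."""
    return " ".join(terms)


def any_of(*terms: str) -> str:
    """OR search: join with OR and parenthesize so the group is treated as a unit."""
    return "(" + " OR ".join(terms) + ")"


def group(operator: str, *words: str, use_or: bool = False) -> str:
    """Apply one operator to a parenthesized group, e.g. from:(john thomas)."""
    joiner = " OR " if use_or else " "
    return f"{operator}:(" + joiner.join(words) + ")"
```

With these, all_of("in:inbox", group("from", "john", "smith", use_or=True)) produces in:inbox from:(john OR smith).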
The asterisk (*) can be used as a wildcard in a search to find content that contains words that have similar spellings.
Use the asterisk * as a wildcard after a prefix. For example, the search string do* returns items such as do, dog, door, etc.
content: Specifies text that the message must contain. For example, content:bananas finds all items containing the word "bananas".
from: Specifies a sender name or email address that is in the From header. This can be text, as in "John Smith III", an email address such as "joe@acme.com", or a domain such as "@zimbra.com".
to: Same as from: except that it specifies one of the people to whom the email was addressed in the To: header.
cc: Same as from: except that it specifies a recipient in the Cc: header of the message.
toccme: Same as from: except that it specifies me as one of the people to whom the email was addressed in the To: or Cc: header.
subject: Specifies text that must appear in the subject header of the message. An example might be subject:"new vacation policy" (the quotes are needed because the text contains spaces).
in: Specifies a folder. For example, in:sent would show all items in your 'Sent' folder.
under: Specifies searching a folder and its sub-folders.
has: Specifies an attribute that the message must have. The types of object you can specify are "attachment", "phone", or "url". For example, has:attachment would find all messages which contain one or more attachments of any type.
filename: Specifies an attachment file name. For example, filename:query.txt would find messages with a file attachment named "query.txt".
type: Specifies a search within attachments of a specified type. The types of attachment you can specify are "text", "word", "excel", and "pdf". For example, type:word "hello" finds messages with attachments that are Microsoft Word documents and searches within those attachments for the word "hello".
attachment: Specifies any item with a certain type of attachment. For example, attachment:word would find all messages with Word attachments.
is: Searches for messages with a certain status. For example, is:unread will find all unread messages. Allowable values are "unread", "read", "flagged", "unflagged", "sent", "draft", "received", "replied", "unreplied", "forwarded", "unforwarded", "anywhere", "remote" (in a shared folder), "local", "invite", "solo" (no other messages in conversation), "tome", "fromme", "ccme", "tofromme", "fromccme", "tofromccme" (to, from, or cc me, including my aliases).
date: Use this keyword to specify a date, using the format that is default for your browser's locale (for US English the format is mm/dd/yyyy). For example, date:2/1/2007 would find messages dated February 1, 2007. The greater than (>) or less than (<) symbols can be used instead of after or before. >= and <= are also allowed.
after: Specifies mail sent after a certain date. For example, after:2/1/2007 specifies mail sent after February 1, 2007.
before: Same as after: except specifies mail sent before the specified date.
size: Specifies messages whose total size, including attachments, is a specified number of bytes, kilobytes, or megabytes. For example, size:12kb would find messages that are exactly 12 KB in size. The greater than (>) or less than (<) symbols can be used instead of bigger or smaller.
larger: Similar to size: except specifies greater than the specified size.
smaller: Similar to size: except specifies smaller than the specified size.
tag: Finds messages which have been tagged with a specified tag. For example, tag:amber will find messages that have a tag called "amber" applied.
Searching email header fields can be quite useful. When searching header fields, use one of two formats:
#FIELDNAME:string
field[FIELDNAME]:string
Use the 'Show Original' feature to view the raw message source and expose all searchable headers indicated by FIELDNAME:.
- Search all messages sent with Outlook or Microsoft Outlook so you can tell those senders to convert to the Zimbra Web Client. Remember to use double quotes with search strings that contain spaces. #X-Mailer:Outlook #X-Mailer:"Microsoft Outlook"
- Find messages with a certain DKIM signature. #DKIM-Signature:d=newsletters.slashdot.org
- Find messages that do not have a DKIM-Signature header. Note the search is prefixed with a "-". -#DKIM-Signature:*
- Search for messages relayed through a particular server. #Received:"from mail-lb0-f182.google.com"
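The header-search examples can be generated with a small helper like the one below. It is a sketch only: header_query and its parameters are hypothetical names, and it simply builds the two documented forms (#FIELDNAME:string and field[FIELDNAME]:string), quoting values that contain spaces.

```python
def header_query(field_name: str, value: str, negate: bool = False,
                 bracket_form: bool = False) -> str:
    """Build a header search term in either the # form or the field[...] form."""
    if any(ch.isspace() for ch in value):
        # Search strings with spaces must be enclosed in double quotes.
        value = f'"{value}"'
    prefix = f"field[{field_name}]:" if bracket_form else f"#{field_name}:"
    query = prefix + value
    # A leading "-" is the synonym for "not".
    return "-" + query if negate else query
```

For instance, header_query("DKIM-Signature", "*", negate=True) gives -#DKIM-Signature:* as in the third example.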
As provided in /opt/zimbra/docs/query.txt
Query Syntax
---------------------------------------------------------------------------------------------------
content:(TEXT)
subject:[>,<,>=,<=](TEXT)
msgid:(TEXT) // Message-Id: field from mime header
envto:(TEXT|EMAIL_ADDR|DOMAIN) // x-envelope-to mime header
envfrom:(TEXT|EMAIL|DOMAIN) // x-envelope-from mime-header
contact:(TEXT) // special-case searching for contact picker (matches type=contact documents only)
to:(TEXT|EMAIL_ADDR|DOMAIN)
from:[>,<,>=,<=]({TEXT}|{EMAIL_ADDR}|{DOMAIN})
cc:(TEXT|EMAIL|DOMAIN)
tofrom:(TEXT|EMAIL|DOMAIN) // TO or FROM
tocc:(TEXT|EMAIL|DOMAIN) // TO or CC
fromcc:(TEXT|EMAIL|DOMAIN) // FROM or CC
tofromcc:(TEXT|EMAIL|DOMAIN) // TO or FROM or CC
in:(FOLDER_LABEL) // in the specified folder
under:(FOLDER_LABEL) // in the specified folder and all descendants
inid:(FOLDER_ID) // in the specified folder
underid:{FOLDER_ID} // in the specified folder and all descendants
has:(attachment|OBJECT_TYPE)
filename:(TEXT)
type:(RAW_MIME_TYPE|FRIENDLY_MIME_TYPE)
attachment:(RAW_MIME_TYPE|FRIENDLY_MIME_TYPE)
is:(anywhere|unread|read|flagged|unflagged|sent|received|replied|unreplied|
forwarded|unforwarded|invite|solo|tome|fromme|ccme|tofromme|fromccme|
tofromccme|local|remote)
date:[>,<,>=,<=](DATE) // created date
mdate:[>,<,>=,<=](DATE) // modified date
day:[>,<,>=,<=](DATE)
week:[>,<,>=,<=](DATE)
month:[>,<,>=,<=](DATE)
year:[>,<,>=,<=](DATE)
after:(DATE)
before:(DATE)
size:(SIZE)
bigger:(SIZE)
larger:(SIZE)
smaller:(SIZE)
tag:(TAG)
priority:(high|low)
message:(DB_MSG_ID)
my:(MY_SAVED_SEARCH_NAME) // not supported yet
modseq:[>,<,>=,<=](CHANGE_ID)
conv:(DB_CONV_ID)
conv-count:(NUM)
conv-minm:(NUM)
conv-maxm:(NUM)
conv-start:(DATE)
conv-end:(DATE)
appt-start:[>,<,>=,<=](DATE)
appt-end:[>,<,>=,<=](DATE)
author:(TEXT)
title:(TEXT)
keywords:(TEXT)
company:(TEXT)
metadata:(TEXT)
item:(all|none|[0-9]+|{[0-9]+(,[0-9]+)*})
field[FIELDNAME]:(TEXT)|[>,<,>=,<=](NUMBER)
#FIELDNAME:(TEXT)|[>,<,>=,<=](NUMBER)
sort: overrides the sort field specified in the <SearchRequest>
FRIENDLY_MIME_TYPE:"text"|"application"|"word"|"msword"|"excel"|"xls"|"ppt"|"pdf"|"ms-tnef"|"image"|"jpeg"|"gif"|"bmp"|"none"|"any"
TEXT: text string, must be enclosed in double quotes if it contains spaces
EMAIL_ADDR: text string, no spaces, with @ sign
DOMAIN: such as *.com
FOLDER_LABEL: mail|trash|spam|anywhere
TAG: tag_name
OBJECT_TYPE: "phone" "url" "credit-card" etc...types of parsed objects
DATE: absolute-date = mm/dd/yyyy (locale sensitive) OR
relative-date = {+|-}nnnn{mi|minute[s]|h|hour[s]|d|day[s]|w|week[s]|m|month[s]|y|year[s]}
SIZE: ([<>])?n+{b,kb,mb} // default is b
DB_MSG_ID: ??
NUM: ([<>])?n+
---------------------------------------------------------------------------------------------------
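To make the SIZE rule in the grammar concrete, here is a hedged Python sketch of a parser for it. It follows the grammar line as written (optional < or >, digits, optional b/kb/mb unit, default b). The name parse_size is hypothetical, and treating 1 kb as 1024 bytes is an assumption of this sketch, not something the grammar states.

```python
import re

# SIZE: ([<>])?n+{b,kb,mb}   (default unit is b, per the grammar above)
SIZE_RE = re.compile(r"^([<>])?(\d+)(b|kb|mb)?$", re.IGNORECASE)

# Assumption: binary multiples; the source does not say whether kb means 1000 or 1024 bytes.
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024 * 1024}


def parse_size(text: str):
    """Return (comparator, size_in_bytes); comparator is '<', '>' or '=' when omitted."""
    match = SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a valid SIZE value: {text!r}")
    comparator, digits, unit = match.groups()
    unit = (unit or "b").lower()
    return comparator or "=", int(digits) * UNIT_BYTES[unit]
```

parse_size("12kb") returns ("=", 12288) under the 1024-byte assumption, and parse_size(">5mb") returns (">", 5242880).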
Test Cases
----------
ski
after:3/1/2004
subject:linux
subject:"code has"
linux or has:ssn
larger:1M
is:flagged
not is:flagged
not in:junk
-is:read
- is a synonym for "not" and may immediately precede a field
Fields
---------------------------------------------------------------------------------------------------
CONTENT:
The CONTENT field (e.g. email message body) is tokenized by word. We treat the following Unicode ranges as CJK characters.
* 2E80-2EFF CJK Radicals Supplement
* 2F00-2FDF Kangxi Radicals
* 2FF0-2FFF Ideographic Description Characters
*------------------------------------------------------------------------------
* 3000-303F [EXCLUDE] CJK Symbols and Punctuation
*------------------------------------------------------------------------------
* 3040-309F Hiragana
* 30A0-30FF Katakana
* 3100-312F Bopomofo
* 3130-318F Hangul Compatibility Jamo
* 3190-319F Kanbun
* 31A0-31BF Bopomofo Extended
* 31C0-31EF CJK Strokes
* 31F0-31FF Katakana Phonetic Extensions
* 3200-32FF Enclosed CJK Letters and Months
* 3300-33FF CJK Compatibility
* 3400-4DBF CJK Unified Ideographs Extension A
* 4DC0-4DFF Yijing Hexagram Symbols
* 4E00-9FFF CJK Unified Ideographs
*------------------------------------------------------------------------------
* AC00-D7AF Hangul Syllables
* D7B0-D7FF Hangul Jamo Extended-B
*------------------------------------------------------------------------------
* FF00-FFEF Halfwidth and Fullwidth Forms
For CJK character sequences, we apply bigram tokenization, where words are constructed from every subsequence of 2
characters (e.g. ABCD consists of AB, BC and CD) regardless of the grammatical structure. Therefore, CJK character
sequences are searchable by a subsequence of 2 characters or a combination of those. For example, suppose you have
the text content "ABCDEFG". Searches for "AB", "BC", "BCD" and "BCDE" are all hits, even if "ABCDEFG" would grammatically
be tokenized into "ABC", "DEF" and "G".
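The bigram behaviour described above can be illustrated with a few lines of Python. This is a simplified model rather than the server's analyzer: the function names are invented for the example, Latin letters stand in for CJK characters as in the text, and a hit is defined here as the query's bigrams appearing consecutively in the content's bigrams.

```python
def bigrams(text: str) -> list:
    """Every subsequence of 2 adjacent characters, e.g. ABCD -> AB, BC, CD."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_hit(content: str, query: str) -> bool:
    """True if the query's bigrams occur as a consecutive run in the content's bigrams."""
    content_tokens = bigrams(content)
    query_tokens = bigrams(query)
    n = len(query_tokens)
    if n == 0:
        # In this sketch a single character produces no bigram and therefore never matches.
        return False
    return any(content_tokens[i:i + n] == query_tokens
               for i in range(len(content_tokens) - n + 1))
```

With content "ABCDEFG", the queries "AB", "BC", "BCD" and "BCDE" all return True, while "AC" returns False.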
For non-CJK character sequences, words are split on whitespace [\r|\n|\t|\f] or punctuation [_|-|/|.|,] with some
exceptions:
* We recognize email addresses by a pattern of "*@*.*[.*]".
* We recognize host names by a pattern of "*.*[.*]".
* We recognize numbers by a pattern of "*[0-9]([_|-|/|.|,][0-9])*".
After tokenization, all words become case-insensitive, and the following common English words are trimmed.
"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on",
"or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
Therefore, non CJK character sequences are only searchable by a word or a combination of words. For example, suppose
you have a text content: "3401 Hillview Ave, Palo Alto, CA 94304 USA". Searching by "3401", "ave" and "palo alto" are
all hits. But, substrings of each word such as "hill", "view", "94" or "US" are not hits.
Wildcards (*) are only supported at the end of a word, where the search behaves as a prefix search.
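A simplified model of the non-CJK rules may help when predicting what will match. The sketch below is not the real analyzer: it ignores the email, host name and number exceptions, treats a multi-word term as a phrase of consecutive tokens (an assumption), and uses invented function names. The stop-word list is copied from the text above.

```python
import re

STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into",
    "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then",
    "there", "these", "they", "this", "to", "was", "will", "with",
}


def tokenize(text: str) -> list:
    """Lowercase, split on whitespace and on _ - / . , then drop stop words."""
    words = re.split(r"[\s_\-/.,]+", text.lower())
    return [w for w in words if w and w not in STOP_WORDS]


def matches(content: str, term: str) -> bool:
    """Whole-word (or phrase) match; a trailing * turns the term into a prefix search."""
    tokens = tokenize(content)
    if term.endswith("*"):
        prefix = term[:-1].lower()
        return any(t.startswith(prefix) for t in tokens)
    wanted = tokenize(term)
    n = len(wanted)
    return n > 0 and any(tokens[i:i + n] == wanted
                         for i in range(len(tokens) - n + 1))
```

Against "3401 Hillview Ave, Palo Alto, CA 94304 USA", matches returns True for "3401", "ave", "palo alto" and "hill*", and False for "hill", "view", "94" and "US".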
CONTACT:
The CONTACT field works in the same manner as the CONTENT field, except that it is also searchable by stop words and is
always translated to a prefix search.
header-related
from:
to:
cc:{name|domain}
subject:
saved-search-related: (UNIMPLEMENTED)
my:{name-of-saved-search}
i.e., could have a saved search of "domain:example.zimbra.com"
called "zimbra" and say:
my:zimbra
object-related fields:
has:attachment constrains search to messages with attachments
has:{phone|url|ssn|po...} messages with "objects"
mime-related
filename:{file-name} constrains search to messages with attachments of given name
type:{mime-type} constrains search to blobs of the given type
attachment:{mime-type} constrains search to messages with attachments of the given type
mime-type = raw-mime-type | mime-type-alias
raw-mime-type = type/sub-type (i.e., text/plain)
# aliases are "user-friendly" mime types
mime-type-alias = "word" | "excel" | "pdf", "image", "audio", etc...
EXAMPLES:
type:word "hello" searches only within Word docs for "hello"
attachment:word "hello" searches within messages that have word docs for "hello"
attachment:image/* matches all messages that have image attachments
flags
is:anywhere --> in any folder (overrides spam-trash setting for that query part)
note that "is:anywhere" does NOT imply "across all mountpoints". Mountpoints must
be explicitly included with an "in:" term -- in:mountpointname.
is:unread
is:read
is:flagged
is:unflagged
is:sent
is:received
is:invite
is:solo --> true if the item has no conversation
The flag name can also be used in a search:
tag:\{FlagName} where {FlagName} is one of the following values:
Attached, Answered, Copied, Deleted, Draft, Flagged, Forwarded, Notified, Subscribed, Unread
date-related fields
after:{date}
before:{date}
date = {absolute-date} | {relative-date}
absolute-date = mm/dd/yyyy (locale sensitive)
relative-date = [+/-]nnnn{minute,hour,day,week,month,year}
NOTE: absolute-date is locale sensitive. Our implementation delegates it to
JDK's DateFormat class whose behavior is as follows:
* ar - dd/mm/yyyy
* be - dd.mm.yyyy
* bg - yyyy-mm-dd
* ca - dd/mm/yyyy
* cs - dd.mm.yyyy
* da - dd-mm-yyyy
* de - dd.mm.yyyy
* el - dd/mm/yyyy
* en - mm/dd/yyyy (default)
* en_AU - dd/mm/yyyy
* en_CA - dd/mm/yyyy
* en_GB - dd/mm/yyyy
* en_IE - dd/mm/yyyy
* en_IN - dd/mm/yyyy
* en_NZ - dd/mm/yyyy
* en_ZA - yyyy/mm/dd
* es - dd/mm/yyyy
* es_DO - mm/dd/yyyy
* es_HN - mm-dd-yyyy
* es_PR - mm-dd-yyyy
* es_SV - mm-dd-yyyy
* et - dd.mm.yyyy
* fi - dd.mm.yyyy
* fr - dd/mm/yyyy
* fr_CA - yyyy-mm-dd
* fr_CH - dd.mm.yyyy
* hr - yyyy.MM.dd
* hr_HR - dd.MM.yyyy.
* hu - yyyy.MM.dd.
* is - dd.mm.yyyy
* it - dd/mm/yyyy
* it_CH - dd.mm.yyyy
* iw - dd/mm/yyyy
* ja - yyyy/mm/dd
* ko - yyyy. mm. dd
* lt - yyyy.mm.dd
* lv - yyyy.dd.mm
* mk - dd.mm.yyyy
* nl - dd-mm-yyyy
* nl_BE - dd/mm/yyyy
* no - dd.mm.yyyy
* pl - yyyy-mm-dd
* pl_PL - dd.mm.yyyy
* pt - dd-mm-yyyy
* pt_BR - dd/mm/yyyy
* ro - dd.mm.yyyy
* ru - dd.mm.yyyy
* sk - dd.mm.yyyy
* sl - dd.mm.yyyy
* sq - yyyy-mm-dd
* sv - yyyy-mm-dd
* th - dd/mm/yyyy
* tr - dd.mm.yyyy
* uk - dd.mm.yyyy
* vi - dd/mm/yyyy
* zh - yyyy-mm-dd
* zh_TW - yyyy/mm/dd
In case of format error, it falls back to mm/dd/yyyy.
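When building date queries from a script, the locale table can be turned into a lookup. The sketch below copies only a handful of entries from the table and uses a hypothetical helper name, format_query_date. Two choices are assumptions of the sketch, not documented behaviour: unknown locales use the en pattern, and day and month are written without zero padding (as in the date:2/1/2007 example); whether the server accepts unpadded values in every locale is not verified here.

```python
from datetime import date

# A few entries copied from the table above; {d}, {m}, {y} stand for dd, mm, yyyy.
LOCALE_PATTERNS = {
    "en": "{m}/{d}/{y}",
    "en_GB": "{d}/{m}/{y}",
    "de": "{d}.{m}.{y}",
    "ja": "{y}/{m}/{d}",
    "sv": "{y}-{m}-{d}",
}
# Assumption of this sketch: locales missing from the lookup use the en pattern.
DEFAULT_PATTERN = LOCALE_PATTERNS["en"]


def format_query_date(day: date, locale: str = "en") -> str:
    """Format a date for use in date:, after: or before: under the given locale."""
    pattern = LOCALE_PATTERNS.get(locale, DEFAULT_PATTERN)
    return pattern.format(d=day.day, m=day.month, y=day.year)
```

For example, "after:" + format_query_date(date(2007, 2, 1)) gives after:2/1/2007, and the same date with locale "de" gives 1.2.2007.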
NOTE: need to figure out how to represent "this week", "last week", "this month", etc. probably
some special casing of relative dates and use with after/before. i.e., maybe "after:-2d AND before:0d" means
yesterday? i.e., for relative day/week/month/year, you zero out month/week/day/hour/minute?
last 4 hours: after:-4hour
today: after:0day
yesterday: (after:-2day AND before:0day)
this week: after:0week
last week: (after:-2week AND before:0week)
this month: after:0month
last month: (after:-2month AND before:0month)
this year: after:0year
last year: (after:-2year AND before:0year)
last year and older: before:0year
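The relative-date recipes listed above follow a regular pattern, which the following sketch captures. The helper names are hypothetical and the functions only reproduce the strings shown in the list; they have not been checked against a server.

```python
UNITS = ("minute", "hour", "day", "week", "month", "year")


def _check(unit: str) -> None:
    if unit not in UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")


def last_n(count: int, unit: str) -> str:
    """E.g. last 4 hours -> after:-4hour."""
    _check(unit)
    return f"after:-{count}{unit}"


def current_period(unit: str) -> str:
    """E.g. today -> after:0day, this week -> after:0week."""
    _check(unit)
    return f"after:0{unit}"


def previous_period(unit: str) -> str:
    """E.g. yesterday -> (after:-2day AND before:0day)."""
    _check(unit)
    return f"(after:-2{unit} AND before:0{unit})"
```

previous_period("month") returns (after:-2month AND before:0month), matching the "last month" entry.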
appointment search operators
---------------------------------------------------------------------------------------------------
appt-start: appt-end:
Search based on the start and end times of the appointment.
For non-recurring appointments, this is basically what you
expect. For recurring appointments, the start and end times are
the *earliest possible* time (start of the first instance in the
recurrence) and *latest possible* time, or sometime in 2099 if
the recurrence has no end.
size-related fields
larger:{size}
smaller:{size}
size:{size}
size is [<>]nnnn{b,kb,mb,gb} # default is kb?
tag-related fields
tag:{user-defined-tag}
domain-related fields
domain:{domain-list}
EXAMPLES: domain:stanford.edu OR: domain:*.org
db-related fields
message:{db-message-id} # constrain searches to a particular message
conversation-related-fields:
conv:{db-conv-id} # constrain searches to a particular conversation
* conv-min-count:{num} # constrain searches to conversations of a particular length
* conv-max-count:{num} # constrain searches to conversations of a particular length
conv-start:{date}
conv-end:{date}
metadata-related fields
author:
title:
keywords:
company:
metadata:
The metadata fields refer to the metadata of a non-textual attachment.
The fields author, title, keywords, company refer to the metadata fields
of the same name in the document.
The field metadata aggregates all the metadata fields including the above four.
E.g.,
author:acme finds all attachments whose author is acme
metadata:acme finds all attachments where acme appears in any metadata
fields, including author.
misc fields
* minm:nnnn # constrain to conversations with at least nnnn messages
* maxm:nnnn # constrain to conversations with at most nnnn messages
other-mime-specific fields
how do we want to handle doc properties from word, pdf, mp3, etc?
i.e.:
genre:rock OR artist:rush
title:"customer visit"
keywords:security
author:ross
* maybe {mime-type-alias}.field? i.e.:
audio.genre:rock OR audio.artist:rush (or mp3.*?)
word.title:hello
where the mime-type-alias can be left off if field is unambiguous?
do we want to try and promote certain fields that we can share between multiple types? (title, author, keywords)
Structured-Data Searching
---------------------------------------------------------------------------------------------------
Search/Indexing now has the ability to analyze and store data in
Name-Value pairs in such a way that they can be searched for in a
structured way by the query language.
For example, Contact name-value pairs are indexed this way.
Structured data is stored in the "l.field" lucene field, and it should be added to the index document in a format like this:
"fieldOne:value1 value2 value3 value4\nFieldTwo:value2 value3 value4"
The search language has been extended to allow field searches to be expressed like this:
#FieldName:value
For example, to find a contact with the last name "Davis" you would use a search query like:
#lastname:davis
OR
FIELD[lastname]:davis
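To show how the name-value format and the two query forms relate, here is a toy matcher in Python. It is not how Lucene evaluates the "l.field" field: the function names are invented, matching is done on whole lowercase values, and treating field names as case insensitive is an assumption based on the #lastname example.

```python
import re

QUERY_RE = re.compile(r"(?:#([^:\s]+)|field\[([^\]]+)\]):(\S+)", re.IGNORECASE)


def parse_structured(doc: str) -> dict:
    """Parse 'name:v1 v2\\nname2:v3' into {name: [values]} with lowercase keys and values."""
    fields = {}
    for line in doc.split("\n"):
        if not line:
            continue
        name, _, values = line.partition(":")
        fields[name.lower()] = values.lower().split()
    return fields


def field_matches(doc: str, query: str) -> bool:
    """Evaluate '#name:value' or 'FIELD[name]:value' against a structured document."""
    match = QUERY_RE.fullmatch(query)
    if match is None:
        raise ValueError(f"not a field query: {query!r}")
    name = (match.group(1) or match.group(2)).lower()
    value = match.group(3).lower()
    return value in parse_structured(doc).get(name, [])
```

Given the document "firstName:jane\nlastName:davis", both #lastname:davis and FIELD[lastname]:davis match in this model.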