Solr Searching Guide
CLAIMS Direct is a web service that provides access to the IFI CLAIMS Global Patent Database, a data warehouse that contains patent records from over 100 patenting authorities stored in a common XML format. Each publication, including all published applications and granted patents, is represented by a separate record in the data.
The data warehouse is indexed in Solr, the fast open-source enterprise search platform from the Apache Lucene project. The search interface is a single search box, into which you can type simple or complex queries. The data warehouse is searchable by field. Field names and sample searches are provided below, along with an introduction to CLAIMS Solr search basics. For more information about Solr searching, see the Solr Reference Guide.
Important Information
Field names are case-sensitive; always enter them in lower case.
Enter operators in all caps (AND, OR, NOT, TO).
Use straight quotes (") to enclose phrases. This matters only when you copy and paste from a source that uses "smart" (curly) quotes.
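A small checker can catch the slips listed above before a query is submitted. This is a minimal Python sketch, not part of CLAIMS Direct: the function name lint_query and the field name Title are hypothetical, and the regex checks are heuristics rather than a real Solr parser (for example, it ignores everything inside straight-quoted phrases, where words are search terms).

```python
import re


def lint_query(query: str) -> list[str]:
    """Flag common syntax slips in a CLAIMS Solr query (heuristic, not a parser)."""
    problems = []
    # Smart (curly) quotes usually come from text copied out of a word processor.
    if "\u201c" in query or "\u201d" in query:
        problems.append("replace smart quotes with straight quotes")
    # Blank out straight-quoted phrases: words inside them are search terms.
    unquoted = re.sub(r'"[^"]*"', '""', query)
    # Field names are case-sensitive and must be lower case.
    for field in re.findall(r"\b(\w+):", unquoted):
        if field != field.lower():
            problems.append(f"field name {field!r} must be lower case")
    # Lower-case operators are searched as ordinary terms.
    for op in re.findall(r"\b(and|or|not|to)\b", unquoted):
        problems.append(f"{op!r} will be searched as a term; write {op.upper()}")
    return problems


print(lint_query('"solar energy" AND heating'))
# []
print(lint_query("Title:solar and wind"))
# ["field name 'Title' must be lower case", "'and' will be searched as a term; write AND"]
```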
Boolean Operators
The Solr index supports AND, OR, and NOT as Boolean operators. Boolean operators must be ALL CAPS. If you enter these operators in lower case letters, the system will search them as terms.
AND
The AND operator matches documents where both terms exist anywhere in the text of a single document. This is equivalent to an intersection using sets. For example, to search for documents that contain "solar energy" and "heating", use the following query:
"solar energy" AND heating
NOT
The NOT operator excludes documents that contain the term after NOT. This is equivalent to a difference using sets. The symbol ! can be used in place of the word NOT. For example, to search for documents that contain "solar energy" but not "heating", use the following query:
"solar energy" NOT heating
OR
The OR operator links two terms and matches documents that contain either term (or both). This is equivalent to a union using sets. The symbol || can be used in place of the word OR. For example, to search for documents that contain either "solar energy" or "wind power", use the following query:
"solar energy" OR "wind power"
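The three Boolean examples above can also be generated programmatically. A minimal Python sketch; the helper names as_term and combine are hypothetical and not part of any CLAIMS Direct client library.

```python
def as_term(text: str) -> str:
    """Quote multi-word text so Solr treats it as a phrase."""
    return f'"{text}"' if " " in text else text


def combine(left: str, op: str, right: str) -> str:
    """Join two terms with a Boolean operator, forced to upper case."""
    op = op.upper()  # lower-case and/or/not would be searched as ordinary words
    if op not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {op}")
    return f"{as_term(left)} {op} {as_term(right)}"


print(combine("solar energy", "and", "heating"))    # "solar energy" AND heating
print(combine("solar energy", "not", "heating"))    # "solar energy" NOT heating
print(combine("solar energy", "or", "wind power"))  # "solar energy" OR "wind power"
```

The symbol forms (! for NOT, || for OR) are equivalent; the sketch emits the word forms because they are easier to read in logs.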
Default Operator
In CLAIMS Direct, the default operator is AND. This means that if no operator is specified, the system assumes AND. In the above examples, we explicitly included the operator in all cases for purposes of clarity.
Default Fields
When no field is specified in the query, the search is directed to the title, abstract, description, and claims fields.
This means that if you are searching for multiple values in a specified field, you must place the query in parentheses or quotes. Otherwise, all of the values after the first space will be searched in the default fields.
For example, the following query will return erroneous results because it will search for "General" in the assignee field, but will search for "Electric" in the default fields:
|
To return the correct results, the entire phrase must be in quotes:
|
Similarly, parentheses must be used in searches containing Boolean operators such as "OR". The following search will not return the expected results because it will search for records which contain "a61j" in the CPC field OR "a61k" in the default fields:
|
To return the correct results, both values must be placed in parentheses:
|
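In code, the rule is: quote a single multi-word value, and wrap several values in parentheses so none of them falls through to the default fields. This sketch uses hypothetical helper and field names (fielded, assignee_field, cpc_field); the real field names are listed in the Solr Search Fields tables.

```python
def fielded(field: str, *values: str, op: str = "OR") -> str:
    """Scope one or more values to a single field.

    One multi-word value becomes a quoted phrase; several values are
    grouped in parentheses so every value stays inside the field.
    """
    if field != field.lower():
        raise ValueError("field names are case-sensitive; use lower case")
    if not values:
        raise ValueError("at least one value is required")
    quoted = [f'"{v}"' if " " in v else v for v in values]
    if len(quoted) == 1:
        return f"{field}:{quoted[0]}"
    return f"{field}:(" + f" {op.upper()} ".join(quoted) + ")"


# assignee_field and cpc_field are placeholders, not real CLAIMS Direct field names.
print(fielded("assignee_field", "General Electric"))  # assignee_field:"General Electric"
print(fielded("cpc_field", "a61j", "a61k"))           # cpc_field:(a61j OR a61k)
```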
Wildcards
? -- Use the question mark to represent exactly one character at the end of or within a word. To search for British or American spellings, use a query like this: sterili?e.
* -- Use the asterisk to represent 0 to many characters. For example, to search for test, tests, testing, tester, etc., use the search: test*. To retrieve sulphur or sulfur, use the search: sul*ur.
Note: You cannot use * or ? as the first character of a search term.
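Python's fnmatch module gives ? (exactly one character) and * (zero or more characters) the same meaning, so it can serve as a rough offline preview of which words a pattern covers. This is only an approximation of Solr's term matching (fnmatch also treats [ ] as character sets), and the helper check_wildcard and the word list are hypothetical.

```python
from fnmatch import fnmatchcase


def check_wildcard(term: str) -> str:
    """Reject patterns that start with a wildcard, which CLAIMS Solr does not accept."""
    if term[:1] in ("*", "?"):
        raise ValueError(f"a search term cannot start with a wildcard: {term!r}")
    return term


vocabulary = ["sterilise", "sterilize", "sterilisation", "test", "tests",
              "testing", "tester", "sulphur", "sulfur"]

for pattern in ("sterili?e", "test*", "sul*ur"):
    check_wildcard(pattern)
    # fnmatchcase matches the whole word, so "sterilisation" is not matched by sterili?e.
    print(pattern, "->", [w for w in vocabulary if fnmatchcase(w, pattern)])
```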
Range Searching
Range queries use the TO operator to match documents whose field values fall between a specified lower and upper bound. The brackets around a range query determine whether each bound is included:
Square brackets [ & ] are used to include the upper and lower bounds.
Curly brackets { & } are used to exclude the upper and lower bounds.
Note: In a range query, the operator TO must be ALL CAPS.
For example, the following search will find documents whose publication dates have values between 20020101 and 20021231, including the specified dates:
|
You can also mix your use of brackets as in the following example, which will return records with publication dates of 20020101 through 20021231, but will not return records with a publication date of 20030101:
|
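Range bounds can be built from real dates so that the brackets and the YYYYMMDD format stay consistent. A minimal sketch: date_range and the field name pubdate_field are hypothetical, and it assumes publication dates are indexed as YYYYMMDD values, as in the examples above.

```python
from datetime import date


def date_range(field: str, start: date, end: date,
               include_start: bool = True, include_end: bool = True) -> str:
    """Build a Solr range query over YYYYMMDD dates.

    Square brackets include a bound; curly brackets exclude it.
    """
    lo = "[" if include_start else "{"
    hi = "]" if include_end else "}"
    # TO must be upper case.
    return f"{field}:{lo}{start:%Y%m%d} TO {end:%Y%m%d}{hi}"


# Both bounds included.
print(date_range("pubdate_field", date(2002, 1, 1), date(2002, 12, 31)))
# pubdate_field:[20020101 TO 20021231]

# Mixed brackets: 20020101 included, 20030101 excluded.
print(date_range("pubdate_field", date(2002, 1, 1), date(2003, 1, 1), include_end=False))
# pubdate_field:[20020101 TO 20030101}
```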
Phrases and Proximity
A phrase is a group of words surrounded by double quotes, such as "fuel cell". To retrieve only documents containing the phrase exactly as searched, place the phrase within quotes, as shown in the example below:
"fuel cell"
For facet or string fields (identified as "string" in the Field Type column of their respective Solr Search Fields tables), ampersands are replaced by spaces during the search process. To search for a phrase containing an ampersand, enclose the string in quotes (and, depending on the web client, HTML-escape the ampersand as &amp;). Alternatively, you can replace the ampersand with a space. For example, you could search:
"Rhodes&Schwartz"
or
( Rhodes Schwartz )
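Both options can be produced from the raw value, as in the sketch below. The helper names are hypothetical, and the field prefix is left out because the examples above do not show which string field they target. The last line illustrates a related point: when a query is sent as an HTTP GET parameter, the ampersand must also be percent-encoded (%26), otherwise it splits the URL's parameters.

```python
import html
from urllib.parse import urlencode


def quoted_with_ampersand(value: str, html_escape: bool = False) -> str:
    """Option 1: keep the ampersand and quote the whole string."""
    quoted = f'"{value}"'
    # Some web clients need the & written as the HTML entity &amp;
    return html.escape(quoted, quote=False) if html_escape else quoted


def ampersand_as_space(value: str) -> str:
    """Option 2: replace the ampersand with a space and group the words."""
    return "( " + " ".join(value.replace("&", " ").split()) + " )"


print(quoted_with_ampersand("Rhodes&Schwartz"))        # "Rhodes&Schwartz"
print(quoted_with_ampersand("Rhodes&Schwartz", True))  # "Rhodes&amp;Schwartz"
print(ampersand_as_space("Rhodes&Schwartz"))           # ( Rhodes Schwartz )
print(urlencode({"q": quoted_with_ampersand("Rhodes&Schwartz")}))  # q=%22Rhodes%26Schwartz%22
```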
CLAIMS Solr supports finding words that are within a specific proximity to one another. To execute a proximity search, use the tilde symbol "~" at the end of a phrase. For example, to search for "solar" and "generation" within 5 words of each other in a document, use the following search query:
"solar generation"~5
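The difference between an exact phrase and a proximity search is only the ~N suffix after the closing quote. A minimal sketch with hypothetical helper names (phrase, near):

```python
def phrase(words: str) -> str:
    """Exact phrase: the words must appear together, in this order."""
    return f'"{words}"'


def near(words: str, slop: int) -> str:
    """Proximity search: the phrase followed by ~N, the allowed word distance."""
    if slop < 0:
        raise ValueError("proximity must be a non-negative integer")
    return f"{phrase(words)}~{slop}"


print(phrase("fuel cell"))          # "fuel cell"
print(near("solar generation", 5))  # "solar generation"~5
```

Strictly, Solr measures proximity as the number of word-position moves needed to line the document up with the phrase, so matching the same two words in reverse order takes a proximity of at least 2.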
You cannot use a wildcard inside a phrase with the default parser. However, you can search phrases that include wildcards and OR'd terms by using the complex phrase parser. For more information about the Solr Complex Phrase Query Parser, see https://solr.apache.org/guide/8_11/other-parsers.html#complex-phrase-query-parser.
A search of this type must be prefixed with the complex phrase parser's local-parameter syntax, {!complexphrase}. The specific syntax is as follows:
|
To search abstracts for documents related to solar energy storage modules, use wildcards in your phrase query, as shown in the example below:
|
By default, a complex phrase search matches the words in the order specified. You can also search for a phrase containing the same words in any order, as shown in the example below:
|
To search abstracts for documents containing information about thermal barriers, you might use a complex phrase search such as the one provided in the example below:
|
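A small builder makes the {!complexphrase} prefix and the word-order switch explicit. In Solr's complex phrase parser, the inOrder parameter defaults to true; setting it to false lets the words match in any order. The helper name complex_phrase, the field name abstract_field, and the sample phrases are hypothetical; consult the Solr Search Fields tables for real field names.

```python
def complex_phrase(field: str, phrase: str, in_order: bool = True) -> str:
    """Prefix a phrase query with Solr's complex phrase parser.

    inOrder=true requires the words in the given order;
    inOrder=false lets them appear in any order.
    """
    flag = "true" if in_order else "false"
    # {{ and }} are literal braces inside the f-string.
    return f'{{!complexphrase inOrder={flag}}}{field}:"{phrase}"'


print(complex_phrase("abstract_field", "solar* stor* modul*"))
# {!complexphrase inOrder=true}abstract_field:"solar* stor* modul*"
print(complex_phrase("abstract_field", "thermal barrier*", in_order=False))
# {!complexphrase inOrder=false}abstract_field:"thermal barrier*"
```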
Fuzzy Search
Solr supports fuzzy searching based on Levenshtein (edit) distance. To run a fuzzy search, append a tilde (~) to a single word; the query then matches terms similar to the queried term.
For example, the following query returns 'thermic', 'thermo', and 'thermal', but it also returns 'dermal'.
|
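To fine-tune how similar the matched terms must be, add a number after the tilde. In the Solr 8 syntax, this number is the maximum edit distance: 0, 1, or 2, with 2 as the default when no number is given. A value between 0 and 1 (with 1 being the highest similarity) is the older similarity notation, which defaulted to 0.5; Solr converts such values to an edit distance.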
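To see why 'dermal' turns up alongside 'thermic' and 'thermo', it helps to compute the edit distances. The sketch below assumes the example query was thermal~ (the original query is not shown) with at most two edits, Solr's default in current versions. It computes plain Levenshtein distance; Lucene by default also counts swapping two adjacent characters as a single edit, which this sketch does not. The function names and word list are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Words within max_edits edits of term, mimicking term~max_edits."""
    return [w for w in vocabulary if levenshtein(term, w) <= max_edits]


vocab = ["thermal", "thermic", "thermo", "dermal", "normal"]
print(fuzzy_matches("thermal", vocab))
# ['thermal', 'thermic', 'thermo', 'dermal']  ('normal' needs 3 edits)
```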
You can use fuzzy searches to ferret out spelling variations and errors. For example:
|
or
|
If you want to use fuzzy search within a phrase, you can use the complex phrase query parser, as shown in the example below:
|
In complex phrase queries, the fuzzy (edit distance) parameter can be used only on words inside the phrase, whereas the proximity parameter is placed outside the phrase, after the closing quote, as in this example:
|
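Putting fuzzy terms inside the phrase and the proximity value after the closing quote can be sketched as follows. The helpers fuzzy and fuzzy_phrase, the field name abstract_field, and the example words are hypothetical; {!complexphrase} is Solr's local-parameter prefix for the complex phrase parser, and the 0 to 2 edit range assumes Solr's current fuzzy syntax.

```python
from typing import Optional


def fuzzy(word: str, max_edits: int = 2) -> str:
    """Fuzzy term: word~N matches terms within N edits."""
    if max_edits not in (0, 1, 2):
        raise ValueError("Solr fuzzy edit distance must be 0, 1, or 2")
    return f"{word}~{max_edits}"


def fuzzy_phrase(field: str, words: list[str], slop: Optional[int] = None) -> str:
    """Complex phrase query; fuzzy terms go inside, proximity goes after the quote."""
    body = " ".join(words)
    query = f'{{!complexphrase}}{field}:"{body}"'
    return query if slop is None else f"{query}~{slop}"


print(fuzzy_phrase("abstract_field", [fuzzy("thermal", 1), "barrier*"], slop=3))
# {!complexphrase}abstract_field:"thermal~1 barrier*"~3
```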
Case Sensitivity
Searches in CLAIMS Solr are not case-sensitive. Search terms may be entered in caps or lower case, regardless of case in the documents. The one exception to this rule is for facet or string fields. Because they return only exact matches, they are case-sensitive. These fields are identified as "string" fields in the Field Type column in their respective Solr Search Field tables.
Note: You must enter Operators in ALL CAPS and enter field names in lower case.
Complex Queries
Searches in CLAIMS Solr can include multiple search fields and multiple criteria per field. A few examples are provided to illustrate these more complex queries. Please consult the Solr Search Fields section for descriptions and additional search examples on a field-by-field basis.
Examples:
To search for European or PCT applications published in 2010 that have title, abstract, and claims in English that concern intraocular lenses, the search syntax would look like this:
|
The query can also be written on a single line:
|
To search for US granted patents published in the second quarter of 2010, issued to Chevron or Exxon or Total, the search syntax would look like this:
|
To search for EP publications with publication dates since 1 December 2010 in Cooperative Patent Classification H01G 9, the search syntax would look like this:
|
To search for any publications in December 2010 in US Class 208/415 or Cooperative Patent Classification C10G 1/00, the search syntax would look like this:
|
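Complex queries like these are easiest to keep correct when each clause is parenthesized before the clauses are joined with AND. The sketch below roughly mirrors the Chevron/Exxon/Total example. All field names are placeholders, the restriction to granted patents is left out because the relevant field and values are not shown here, and some fields are case-sensitive string fields, so value spelling must match the index. The helper names any_of, between, and all_of are hypothetical.

```python
from urllib.parse import urlencode


def any_of(field: str, *values: str) -> str:
    """field:(a OR b ...) keeps every value inside the field."""
    return f"{field}:(" + " OR ".join(values) + ")"


def between(field: str, start: str, end: str) -> str:
    """Inclusive range over YYYYMMDD values; TO must be upper case."""
    return f"{field}:[{start} TO {end}]"


def all_of(*clauses: str) -> str:
    """AND clauses together, parenthesizing each so inner ORs stay scoped."""
    return " AND ".join(f"({c})" for c in clauses)


# Placeholder field names; look up the real ones in the Solr Search Fields tables.
query = all_of(
    any_of("country_field", "US"),
    between("pubdate_field", "20100401", "20100630"),  # second quarter of 2010
    any_of("assignee_field", "Chevron", "Exxon", "Total"),
)
print(query)
print(urlencode({"q": query}))  # percent-encoded for use as an HTTP GET parameter
```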
Further Reading