Class AnalyzingInfixSuggester
- java.lang.Object
-
- org.apache.lucene.search.suggest.Lookup
-
- org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, Accountable
- Direct Known Subclasses:
BlendedInfixSuggester
public class AnalyzingInfixSuggester extends Lookup implements java.io.Closeable
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. This also highlights the tokens that match. This suggester supports payloads. Matches are sorted only by the suggest weight; it would be nice to support blended score + weight sort in the future. As a result, this suggester works best when there is a strong a-priori ranking of all the suggestions.
This suggester supports contexts, including arbitrary binary terms.
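The matching rule described above (a prefix match against any token, ranked by weight alone) can be illustrated with a small standard-library sketch. It is not the Lucene implementation: InfixMatchSketch, Entry, and suggest are hypothetical names, and whitespace splitting plus lower-casing stand in for a real Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class InfixMatchSketch {
    /** A suggestion and its precomputed weight (hypothetical type for this sketch). */
    static final class Entry {
        final String text;
        final long weight;
        Entry(String text, long weight) { this.text = text; this.weight = weight; }
    }

    /** True if any whitespace-separated token of text starts with prefix, ignoring case. */
    static boolean anyTokenStartsWith(String text, String prefix) {
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.startsWith(p)) return true;
        }
        return false;
    }

    /** Returns up to num matching texts, ordered by weight only (highest first). */
    static List<String> suggest(List<Entry> entries, String prefix, int num) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (anyTokenStartsWith(e.text, prefix)) hits.add(e);
        }
        hits.sort((a, b) -> Long.compare(b.weight, a.weight));
        List<String> out = new ArrayList<>();
        for (Entry e : hits.subList(0, Math.min(num, hits.size()))) out.add(e.text);
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = List.of(
            new Entry("lend me your ear", 8),
            new Entry("a penny saved is a penny earned", 10));
        System.out.println(suggest(entries, "ear", 5));
    }
}
```

Because the order depends only on the stored weight, how well the typed text matches has no influence on the ranking, which is why a reliable a-priori weight matters.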
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
-
Field Summary
Fields
private boolean allTermsRequired
private boolean closeIndexWriterOnBuild
private boolean commitOnBuild
protected static java.lang.String CONTEXTS_FIELD_NAME
- Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
static boolean DEFAULT_ALL_TERMS_REQUIRED
- Default boolean clause option for multiple terms matching (all terms required).
protected static boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
- Default option to close the IndexWriter once the index has been built.
static boolean DEFAULT_HIGHLIGHT
- Default highlighting option.
static int DEFAULT_MIN_PREFIX_CHARS
- Default minimum number of leading characters before PrefixQuery is used (4).
private Directory dir
protected static java.lang.String EXACT_TEXT_FIELD_NAME
- Field name used for the indexed text, as a StringField, for exact lookup.
private boolean highlight
protected Analyzer indexAnalyzer
- Analyzer used at index time.
(package private) int minPrefixChars
protected Analyzer queryAnalyzer
- Analyzer used at search time.
protected SearcherManager searcherMgr
- IndexSearcher used for lookups.
protected java.lang.Object searcherMgrLock
- Used to manage concurrent access to searcherMgr.
private static Sort SORT
- How we sort the postings and search results.
protected static java.lang.String TEXT_FIELD_NAME
- Field name used for the indexed text.
protected static java.lang.String TEXTGRAMS_FIELD_NAME
- Edge n-grams for searching short prefixes without a PrefixQuery; controlled by minPrefixChars.
protected IndexWriter writer
- Used for ongoing NRT additions/updates.
-
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
-
-
Constructor Summary
Constructors
AnalyzingInfixSuggester(Directory dir, Analyzer analyzer)
- Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild)
- Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight)
- Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild)
- Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
-
Method Summary
Methods
void add(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
- Adds a new suggestion.
void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
- This method is handy as we do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, here may not be its best location.
protected void addNonMatch(java.lang.StringBuilder sb, java.lang.String text)
- Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
protected void addPrefixMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed, java.lang.String prefixToken)
- Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
protected void addWholeMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed)
- Called while highlighting a single result, to append the whole matched token to the provided fragments list.
void build(InputIterator iter)
- Builds up a new internal Lookup representation based on the given InputIterator.
private Document buildDocument(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
void close()
void commit()
- Commits all pending changes made to this suggester to disk.
protected java.util.List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, java.lang.CharSequence charSequence, boolean doHighlight, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken)
- Create the results based on the search hits.
private void ensureOpen()
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
- Subclass can override this to tweak the Query before searching.
java.util.Collection<Accountable> getChildResources()
- Returns nested resources of this class.
long getCount()
- Get the number of entries the lookup was built with.
protected Directory getDirectory(java.nio.file.Path path)
- Subclass can override to choose a specific Directory implementation.
private Analyzer getGramAnalyzer()
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
- Override this to customize index settings, e.g. which codec to use.
protected Query getLastTokenQuery(java.lang.String token)
- This is called if the last token isn't ended (e.g. user did not type a space after it).
protected FieldType getTextFieldType()
- Subclass can override this method to change the field type of the text field, e.g. to change the index options.
protected java.lang.Object highlight(java.lang.String text, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken)
- Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
boolean load(DataInput out)
- Discard current lookup data and load it from a previously saved copy.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, int num, boolean allTermsRequired, boolean doHighlight)
- Lookup, without any context.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight)
- Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean onlyMorePopular, int num)
- Look up a key and return possible completion for this key.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight)
- Lookup, with context but without booleans.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight)
- This is an advanced method providing the capability to send down to the suggester any arbitrary Lucene query to be used to filter the result of the suggester.
long ramBytesUsed()
- Return the memory usage of this object in bytes.
void refresh()
- Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once in the end.
boolean store(DataOutput in)
- Persist the constructed lookup data to a directory.
private BooleanQuery toQuery(java.util.Map<BytesRef,BooleanClause.Occur> contextInfo)
private BooleanQuery toQuery(java.util.Set<BytesRef> contextInfo)
void update(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
- Updates a previous suggestion, matching the exact same text as before.
-
-
-
Field Detail
-
TEXTGRAMS_FIELD_NAME
protected static final java.lang.String TEXTGRAMS_FIELD_NAME
Edge n-grams for searching short prefixes without a PrefixQuery; controlled by minPrefixChars.
- See Also:
- Constant Field Values
-
TEXT_FIELD_NAME
protected static final java.lang.String TEXT_FIELD_NAME
Field name used for the indexed text.
- See Also:
- Constant Field Values
-
EXACT_TEXT_FIELD_NAME
protected static final java.lang.String EXACT_TEXT_FIELD_NAME
Field name used for the indexed text, as a StringField, for exact lookup.
- See Also:
- Constant Field Values
-
CONTEXTS_FIELD_NAME
protected static final java.lang.String CONTEXTS_FIELD_NAME
Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
- See Also:
- Constant Field Values
-
queryAnalyzer
protected final Analyzer queryAnalyzer
Analyzer used at search time
-
indexAnalyzer
protected final Analyzer indexAnalyzer
Analyzer used at index time
-
dir
private final Directory dir
-
minPrefixChars
final int minPrefixChars
-
allTermsRequired
private final boolean allTermsRequired
-
highlight
private final boolean highlight
-
commitOnBuild
private final boolean commitOnBuild
-
closeIndexWriterOnBuild
private final boolean closeIndexWriterOnBuild
-
writer
protected IndexWriter writer
Used for ongoing NRT additions/updates.
-
searcherMgr
protected SearcherManager searcherMgr
IndexSearcher used for lookups.
-
searcherMgrLock
protected final java.lang.Object searcherMgrLock
Used to manage concurrent access to searcherMgr
-
DEFAULT_MIN_PREFIX_CHARS
public static final int DEFAULT_MIN_PREFIX_CHARS
Default minimum number of leading characters before PrefixQuery is used (4).
- See Also:
- Constant Field Values
-
DEFAULT_ALL_TERMS_REQUIRED
public static final boolean DEFAULT_ALL_TERMS_REQUIRED
Default boolean clause option for multiple terms matching (all terms required).
- See Also:
- Constant Field Values
-
DEFAULT_HIGHLIGHT
public static final boolean DEFAULT_HIGHLIGHT
Default highlighting option.
- See Also:
- Constant Field Values
-
DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
protected static final boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
Default option to close the IndexWriter once the index has been built.
- See Also:
- Constant Field Values
-
SORT
private static final Sort SORT
How we sort the postings and search results.
-
-
Constructor Detail
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use the pre-built dictionary.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use the pre-built dictionary.
allTermsRequired
- All terms in the suggest query must be matched.
highlight
- Highlight suggest query in suggestions.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars
- Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild
- Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use the pre-built dictionary.
allTermsRequired
- All terms in the suggest query must be matched.
highlight
- Highlight suggest query in suggestions.
closeIndexWriterOnBuild
- If true, the IndexWriter will be closed after the index has finished building.
- Throws:
java.io.IOException
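The minPrefixChars trade-off described for these constructors can be pictured with a standard-library sketch. It is only an illustration under stated assumptions: EdgeGramSketch, edgeGrams, and pathFor are hypothetical names, and the choice to generate grams up to length minPrefixChars - 1 is an assumption of this sketch, not a statement about how the suggester configures its gram analyzer.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeGramSketch {
    /** Leading substrings of token with lengths 1..maxLen, capped at the token length. */
    static List<String> edgeGrams(String token, int maxLen) {
        List<String> grams = new ArrayList<>();
        for (int len = 1; len <= Math.min(maxLen, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    /** Names the lookup path a typed prefix would take in this sketch. */
    static String pathFor(String typedPrefix, int minPrefixChars) {
        // Shorter than minPrefixChars: exact match against a pre-indexed gram.
        // Otherwise: a prefix scan over the full terms.
        return typedPrefix.length() < minPrefixChars ? "edge-gram term" : "prefix query";
    }

    public static void main(String[] args) {
        int minPrefixChars = 4; // the documented default
        System.out.println(edgeGrams("penny", minPrefixChars - 1));
        System.out.println(pathFor("pe", minPrefixChars));
        System.out.println(pathFor("penn", minPrefixChars));
    }
}
```

Every indexed token contributes extra gram terms, which is the index-size cost; in exchange, a one- or two-character prefix becomes a single exact-term lookup instead of a scan over every term that starts with it.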
-
-
Method Detail
-
getIndexWriterConfig
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
Override this to customize index settings, e.g. which codec to use.
-
getDirectory
protected Directory getDirectory(java.nio.file.Path path) throws java.io.IOException
Subclass can override to choose a specific Directory implementation.
- Throws:
java.io.IOException
-
build
public void build(InputIterator iter) throws java.io.IOException
Description copied from class: Lookup
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
-
commit
public void commit() throws java.io.IOException
Commits all pending changes made to this suggester to disk.
- Throws:
java.io.IOException
- See Also:
IndexWriter.commit()
-
getGramAnalyzer
private Analyzer getGramAnalyzer()
-
ensureOpen
private void ensureOpen() throws java.io.IOException
- Throws:
java.io.IOException
-
add
public void add(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
Adds a new suggestion. Be sure to use update(BytesRef, java.util.Set<BytesRef>, long, BytesRef) instead if you want to replace a previous suggestion. After adding or updating a batch of new suggestions, you must call refresh() in the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<BytesRef>, boolean, int).
- Throws:
java.io.IOException
-
update
public void update(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
Updates a previous suggestion, matching the exact same text as before. Use this to change the weight or payload of an already added suggestion. If you know this text is not already present you can use add(BytesRef, java.util.Set<BytesRef>, long, BytesRef) instead. After adding or updating a batch of new suggestions, you must call refresh() in the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<BytesRef>, boolean, int).
- Throws:
java.io.IOException
-
buildDocument
private Document buildDocument(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
- Throws:
java.io.IOException
-
refresh
public void refresh() throws java.io.IOException
Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once in the end.
- Throws:
java.io.IOException
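The interplay of add, update, and refresh documented above can be modelled with a toy, standard-library-only class. This is a sketch of the visibility contract, not of the Lucene internals: RefreshSketch and weightsFor are hypothetical, and the assumption that add never replaces an existing entry is inferred from the advice to use update for replacements.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshSketch {
    static final class Entry {
        final String text;
        final long weight;
        Entry(String text, long weight) { this.text = text; this.weight = weight; }
    }

    private final List<Entry> written = new ArrayList<>(); // changes accepted so far
    private List<Entry> searchable = new ArrayList<>();    // snapshot that lookups see

    /** Appends a suggestion; an existing entry with the same text is left in place. */
    void add(String text, long weight) {
        written.add(new Entry(text, weight));
    }

    /** Replaces any entry with exactly the same text. */
    void update(String text, long weight) {
        written.removeIf(e -> e.text.equals(text));
        written.add(new Entry(text, weight));
    }

    /** Makes everything written so far visible to lookups. */
    void refresh() {
        searchable = new ArrayList<>(written);
    }

    /** Weights of the visible entries whose text equals the given text. */
    List<Long> weightsFor(String text) {
        List<Long> out = new ArrayList<>();
        for (Entry e : searchable) {
            if (e.text.equals(text)) out.add(e.weight);
        }
        return out;
    }

    public static void main(String[] args) {
        RefreshSketch s = new RefreshSketch();
        s.add("lucene", 5);
        System.out.println(s.weightsFor("lucene")); // nothing visible before refresh()
        s.refresh();
        System.out.println(s.weightsFor("lucene"));
    }
}
```

Taking one snapshot per batch is the reason to call refresh once at the end rather than after every single change.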
-
getTextFieldType
protected FieldType getTextFieldType()
Subclass can override this method to change the field type of the text field, e.g. to change the index options.
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean onlyMorePopular, int num) throws java.io.IOException
Description copied from class: Lookup
Look up a key and return possible completion for this key.
- Specified by:
lookup in class Lookup
- Parameters:
key
- lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
contexts
- contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
onlyMorePopular
- return only more popular results
num
- maximum number of results to return
- Returns:
- a list of possible completions, with their relative weight (e.g. popularity)
- Throws:
java.io.IOException
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Lookup, without any context.
- Throws:
java.io.IOException
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Lookup, with context but without booleans. Context booleans default to SHOULD, so each suggestion must have at least one of the contexts.
- Throws:
java.io.IOException
-
getLastTokenQuery
protected Query getLastTokenQuery(java.lang.String token) throws java.io.IOException
This is called if the last token isn't ended (e.g. user did not type a space after it). Return an appropriate Query clause to add to the BooleanQuery.
- Throws:
java.io.IOException
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
- Throws:
java.io.IOException
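How a contextInfo map of contexts and occurrence flags could filter suggestions is sketched below using only the standard library. The Occur enum here is a local stand-in for BooleanClause.Occur (limited to three values), ContextFilterSketch and accepts are hypothetical names, and the rule applied is the usual boolean-query convention: every MUST context is present, no MUST_NOT context is present, and when there is no MUST clause at least one SHOULD context matches.

```java
import java.util.Map;
import java.util.Set;

class ContextFilterSketch {
    /** Local stand-in for BooleanClause.Occur, limited to the values used here. */
    enum Occur { MUST, SHOULD, MUST_NOT }

    /** True if a suggestion carrying suggestionContexts passes the contextInfo filter. */
    static boolean accepts(Set<String> suggestionContexts, Map<String, Occur> contextInfo) {
        boolean hasMust = false;
        boolean hasShould = false;
        boolean shouldMatched = false;
        for (Map.Entry<String, Occur> clause : contextInfo.entrySet()) {
            boolean present = suggestionContexts.contains(clause.getKey());
            switch (clause.getValue()) {
                case MUST:
                    if (!present) return false;
                    hasMust = true;
                    break;
                case MUST_NOT:
                    if (present) return false;
                    break;
                case SHOULD:
                    hasShould = true;
                    shouldMatched |= present;
                    break;
            }
        }
        // Without a MUST clause, at least one SHOULD clause (if any exist) has to match.
        return hasMust || !hasShould || shouldMatched;
    }

    public static void main(String[] args) {
        Map<String, Occur> info = Map.of("books", Occur.SHOULD, "music", Occur.SHOULD);
        System.out.println(accepts(Set.of("books"), info));
        System.out.println(accepts(Set.of("films"), info));
    }
}
```

With every flag set to SHOULD, this reduces to the behavior of the Set-based overload: a suggestion passes when it has at least one of the requested contexts.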
-
toQuery
private BooleanQuery toQuery(java.util.Map<BytesRef,BooleanClause.Occur> contextInfo)
-
toQuery
private BooleanQuery toQuery(java.util.Set<BytesRef> contextInfo)
-
addContextToQuery
public void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
This method is handy as we do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, here may not be its best location.
- Parameters:
query
- a BooleanQuery.Builder instance
context
- the context
clause
- one of BooleanClause.Occur
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
This is an advanced method providing the capability to send down to the suggester any arbitrary Lucene query to be used to filter the result of the suggester.
- Overrides:
lookup in class Lookup
- Parameters:
key
- the keyword being looked for
contextQuery
- an arbitrary Lucene query to be used to filter the result of the suggester. addContextToQuery(BooleanQuery.Builder, BytesRef, BooleanClause.Occur) could be used to build this contextQuery.
num
- number of items to return
allTermsRequired
- whether all searched terms must match
doHighlight
- if true, the matching term will be highlighted in the search result
- Returns:
- the result of the suggester
- Throws:
java.io.IOException
- if there is an IO exception while reading data from the index
-
createResults
protected java.util.List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, java.lang.CharSequence charSequence, boolean doHighlight, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken) throws java.io.IOException
Create the results based on the search hits. Can be overridden by subclass to add particular behavior (e.g. weight transformation). Note that there is no prefix token (the prefixToken argument will be null) whenever the final token in the incoming request was in fact finished (had trailing characters, such as white-space).
- Throws:
java.io.IOException
- If there are problems reading fields from the underlying Lucene index.
-
finishQuery
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
Subclass can override this to tweak the Query before searching.
-
highlight
protected java.lang.Object highlight(java.lang.String text, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken) throws java.io.IOException
Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
- Throws:
java.io.IOException
-
addNonMatch
protected void addNonMatch(java.lang.StringBuilder sb, java.lang.String text)
Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
text
- The text chunk to add
-
addWholeMatch
protected void addWholeMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed)
Called while highlighting a single result, to append the whole matched token to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
surface
- The surface form (original) text
analyzed
- The analyzed token corresponding to the surface form text
-
addPrefixMatch
protected void addPrefixMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed, java.lang.String prefixToken)
Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
- Parameters:
sb
- The StringBuilder to append to
surface
- The fragment of the surface form (indexed during build(InputIterator)) corresponding to this match
analyzed
- The analyzed token that matched
prefixToken
- The prefix of the token that matched
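The three highlighting hooks (addNonMatch, addWholeMatch, addPrefixMatch) cooperate to assemble one highlighted string. The standard-library sketch below shows one way that assembly could look. It rests on assumptions that are not taken from this page: the bold-tag markup, splitting on single spaces, and lower-casing as the only "analysis"; HighlightSketch is a hypothetical class, and its hook signatures are simplified (the analyzed argument is omitted).

```java
import java.util.Locale;
import java.util.Set;

class HighlightSketch {
    static void addNonMatch(StringBuilder sb, String text) {
        sb.append(text);
    }

    static void addWholeMatch(StringBuilder sb, String surface) {
        sb.append("<b>").append(surface).append("</b>"); // markup is an assumption
    }

    static void addPrefixMatch(StringBuilder sb, String surface, String prefixToken) {
        if (prefixToken.length() >= surface.length()) {
            addWholeMatch(sb, surface); // the prefix covers the whole word
            return;
        }
        sb.append("<b>").append(surface, 0, prefixToken.length()).append("</b>")
          .append(surface, prefixToken.length(), surface.length());
    }

    /** prefixToken may be null when the last typed token was finished. */
    static String highlight(String text, Set<String> matchedTokens, String prefixToken) {
        StringBuilder sb = new StringBuilder();
        String[] words = text.split(" ");
        for (int i = 0; i < words.length; i++) {
            if (i > 0) addNonMatch(sb, " ");
            String analyzed = words[i].toLowerCase(Locale.ROOT);
            if (matchedTokens.contains(analyzed)) {
                addWholeMatch(sb, words[i]);
            } else if (prefixToken != null && analyzed.startsWith(prefixToken)) {
                addPrefixMatch(sb, words[i], prefixToken);
            } else {
                addNonMatch(sb, words[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("A Penny Saved", Set.of("a"), "pen"));
    }
}
```

A subclass that wants different markup would only need to change the two match hooks; the walk over the text stays the same.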
-
store
public boolean store(DataOutput in) throws java.io.IOException
Description copied from class: Lookup
Persist the constructed lookup data to a directory. Optional operation.
- Specified by:
store in class Lookup
- Parameters:
in
- DataOutput to write the data to.
- Returns:
- true if successful, false if unsuccessful or not supported.
- Throws:
java.io.IOException
- when fatal IO error occurs.
-
load
public boolean load(DataInput out) throws java.io.IOException
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
- See Also:
Accountables
-
-