Uses of Class org.apache.lucene.analysis.Tokenizer

Packages that use Tokenizer:

org.apache.lucene.analysis - Text analysis.
org.apache.lucene.analysis.cn.smart - Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.core - Basic, general-purpose analysis components.
org.apache.lucene.analysis.icu.segmentation - Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
org.apache.lucene.analysis.ja - Analyzer for Japanese.
org.apache.lucene.analysis.ko - Analyzer for Korean.
org.apache.lucene.analysis.ngram - Character n-gram tokenizers and filters.
org.apache.lucene.analysis.path - Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern - Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.standard - Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
org.apache.lucene.analysis.th - Analyzer for Thai.
org.apache.lucene.analysis.util - Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia - Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis

Constructors in org.apache.lucene.analysis with parameters of type Tokenizer:

TokenStreamComponents(Tokenizer tokenizer) - Creates a new Analyzer.TokenStreamComponents from a Tokenizer.
TokenStreamComponents(Tokenizer tokenizer, TokenStream result) - Creates a new Analyzer.TokenStreamComponents instance.
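These constructors are what a custom Analyzer returns from createComponents. The sketch below is a hypothetical example, assuming lucene-core and lucene-analyzers-common are on the classpath; note that in releases before 7.0, LowerCaseFilter lives in org.apache.lucene.analysis.core rather than org.apache.lucene.analysis, so the import may need adjusting.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TokenStreamComponentsSketch {

  // Hypothetical analyzer: whitespace tokenization followed by lowercasing.
  static final Analyzer ANALYZER = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
      Tokenizer source = new WhitespaceTokenizer();      // the Tokenizer parameter
      TokenStream result = new LowerCaseFilter(source);  // optional filter chain
      return new TokenStreamComponents(source, result);  // two-argument constructor
    }
  };

  static List<String> analyze(String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    try (TokenStream ts = ANALYZER.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset(); // mandatory before the first incrementToken()
      while (ts.incrementToken()) {
        tokens.add(term.toString());
      }
      ts.end();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(analyze("Hello Lucene World")); // [hello, lucene, world]
  }
}
```

The one-argument constructor is equivalent to passing the tokenizer as both source and result, i.e. a chain with no filters.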
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart

Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart:

class HMMChineseTokenizer - Tokenizer for Chinese or mixed Chinese-English text.

Methods in org.apache.lucene.analysis.cn.smart that return Tokenizer:

Tokenizer HMMChineseTokenizerFactory.create(AttributeFactory factory)
Uses of Tokenizer in org.apache.lucene.analysis.core

Subclasses of Tokenizer in org.apache.lucene.analysis.core:

class KeywordTokenizer - Emits the entire input as a single token.
class LetterTokenizer - A tokenizer that divides text at non-letters.
class UnicodeWhitespaceTokenizer - A tokenizer that divides text at Unicode whitespace.
class WhitespaceTokenizer - A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).

Methods in org.apache.lucene.analysis.core that return Tokenizer:

Tokenizer WhitespaceTokenizerFactory.create(AttributeFactory factory)
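As a sketch of how these Tokenizer subclasses are driven directly (assuming lucene-analyzers-common on the classpath): a Tokenizer receives its input via setReader and must be reset before the first incrementToken call.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class WhitespaceTokenizerSketch {
  static List<String> tokenize(String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    try (WhitespaceTokenizer tokenizer = new WhitespaceTokenizer()) {
      tokenizer.setReader(new StringReader(text)); // supply input before reset()
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString());
      }
      tokenizer.end(); // finalizes end-of-stream attributes such as offsets
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokenize("foo  bar\tbaz")); // [foo, bar, baz]
  }
}
```

The same reset/incrementToken/end contract applies to every Tokenizer subclass listed on this page.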
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation

Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation:

class ICUTokenizer - Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Uses of Tokenizer in org.apache.lucene.analysis.ja

Subclasses of Tokenizer in org.apache.lucene.analysis.ja:

class JapaneseTokenizer - Tokenizer for Japanese that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ko

Subclasses of Tokenizer in org.apache.lucene.analysis.ko:

class KoreanTokenizer - Tokenizer for Korean that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram

Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

class EdgeNGramTokenizer - Tokenizes the input from an edge into n-grams of given size(s).
class NGramTokenizer - Tokenizes the input into n-grams of the given size(s).

Methods in org.apache.lucene.analysis.ngram that return Tokenizer:

Tokenizer EdgeNGramTokenizerFactory.create(AttributeFactory factory)
Tokenizer NGramTokenizerFactory.create(AttributeFactory factory)
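A minimal sketch of NGramTokenizer, assuming lucene-analyzers-common on the classpath. The two-int constructor sets the minimum and maximum gram length; using the same value for both emits grams of exactly that length.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NGramSketch {
  static List<String> bigrams(String text) throws IOException {
    List<String> grams = new ArrayList<>();
    // minGram = maxGram = 2: emit character bigrams only.
    try (NGramTokenizer tokenizer = new NGramTokenizer(2, 2)) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        grams.add(term.toString());
      }
      tokenizer.end();
    }
    return grams;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(bigrams("abcd")); // [ab, bc, cd]
  }
}
```

EdgeNGramTokenizer takes the same (minGram, maxGram) pair but only emits grams anchored at the start of the input.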
Uses of Tokenizer in org.apache.lucene.analysis.path

Subclasses of Tokenizer in org.apache.lucene.analysis.path:

class PathHierarchyTokenizer - Tokenizer for path-like hierarchies.
class ReversePathHierarchyTokenizer - Tokenizer for domain-like hierarchies.

Methods in org.apache.lucene.analysis.path that return Tokenizer:

Tokenizer PathHierarchyTokenizerFactory.create(AttributeFactory factory)
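PathHierarchyTokenizer emits one token per ancestor of the path, which makes prefix queries on hierarchies cheap. A sketch with the no-argument constructor (delimiter '/'), assuming lucene-analyzers-common on the classpath:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PathHierarchySketch {
  static List<String> ancestors(String path) throws IOException {
    List<String> tokens = new ArrayList<>();
    try (PathHierarchyTokenizer tokenizer = new PathHierarchyTokenizer()) {
      tokenizer.setReader(new StringReader(path));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString()); // each token is a progressively longer prefix
      }
      tokenizer.end();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(ancestors("/usr/local/bin"));
    // [/usr, /usr/local, /usr/local/bin]
  }
}
```

ReversePathHierarchyTokenizer does the mirror image, emitting progressively shorter suffixes, which suits domain names like www.example.com.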
Uses of Tokenizer in org.apache.lucene.analysis.pattern

Subclasses of Tokenizer in org.apache.lucene.analysis.pattern:

class PatternTokenizer - This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
class SimplePatternSplitTokenizer
class SimplePatternTokenizer
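PatternTokenizer can be used in two modes: with group = -1 the pattern matches the delimiters and the text between matches becomes the tokens (split semantics); with group >= 0 the pattern matches the tokens themselves. A hedged sketch of split mode, assuming lucene-analyzers-common on the classpath:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.pattern.PatternTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PatternSketch {
  static List<String> splitOnCommas(String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    // group = -1: the regex matches the separators; the text between
    // matches is emitted as the tokens.
    try (PatternTokenizer tokenizer =
             new PatternTokenizer(Pattern.compile("[,\\s]+"), -1)) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString());
      }
      tokenizer.end();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(splitOnCommas("a, b,c")); // [a, b, c]
  }
}
```

The Simple* variants cover the same two modes with a restricted regex syntax that can be compiled into a deterministic automaton, trading expressiveness for speed.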
Uses of Tokenizer in org.apache.lucene.analysis.standard

Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

class ClassicTokenizer - A grammar-based tokenizer constructed with JFlex.
class StandardTokenizer - A grammar-based tokenizer constructed with JFlex.
class UAX29URLEmailTokenizer - Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
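StandardTokenizer is the usual default choice; per UAX #29 word-break rules it splits at punctuation and hyphens and drops them. A sketch assuming lucene-core on the classpath:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardTokenizerSketch {
  static List<String> tokenize(String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    try (StandardTokenizer tokenizer = new StandardTokenizer()) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString());
      }
      tokenizer.end();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    // Hyphen and trailing period are word-break points and are discarded.
    System.out.println(tokenize("The quick-brown fox."));
    // [The, quick, brown, fox]
  }
}
```

Because StandardTokenizer would also break a URL or email address apart at its punctuation, UAX29URLEmailTokenizer exists to keep those as single tokens.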
Uses of Tokenizer in org.apache.lucene.analysis.th

Subclasses of Tokenizer in org.apache.lucene.analysis.th:

class ThaiTokenizer - Tokenizer that uses a BreakIterator to tokenize Thai text.

Methods in org.apache.lucene.analysis.th that return Tokenizer:

Tokenizer ThaiTokenizerFactory.create(AttributeFactory factory)
Uses of Tokenizer in org.apache.lucene.analysis.util

Subclasses of Tokenizer in org.apache.lucene.analysis.util:

class CharTokenizer - An abstract base class for simple, character-oriented tokenizers.
class SegmentingTokenizerBase - Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.

Methods in org.apache.lucene.analysis.util that return Tokenizer:

Tokenizer TokenizerFactory.create() - Creates a TokenStream of the specified input using the default attribute factory.
abstract Tokenizer TokenizerFactory.create(AttributeFactory factory) - Creates a TokenStream of the specified input using the given AttributeFactory.
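CharTokenizer is the simplest extension point on this page: a subclass only decides which code points belong inside a token. A sketch with an anonymous subclass, assuming a Lucene release where CharTokenizer still lives in org.apache.lucene.analysis.util (it moved to org.apache.lucene.analysis in later versions):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharTokenizer;

public class CharTokenizerSketch {
  static List<String> tokenize(String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    // Anonymous subclass: letters and digits are token characters,
    // so the tokenizer splits at everything else (here, space and '_').
    try (Tokenizer tokenizer = new CharTokenizer() {
      @Override
      protected boolean isTokenChar(int c) {
        return Character.isLetterOrDigit(c);
      }
    }) {
      tokenizer.setReader(new StringReader(text));
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString());
      }
      tokenizer.end();
    }
    return tokens;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokenize("Alpha42 beta_7")); // [Alpha42, beta, 7]
  }
}
```

The TokenizerFactory.create methods listed above are how configuration-driven code (Solr schemas, for example) obtains these tokenizers; factories can also be looked up by SPI name via TokenizerFactory.forName.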
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia

Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia:

class WikipediaTokenizer - Extension of StandardTokenizer that is aware of Wikipedia syntax.