weka.classifiers.bayes
Class ComplementNaiveBayes

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.bayes.ComplementNaiveBayes
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

public class ComplementNaiveBayes
extends Classifier
implements OptionHandler, WeightedInstancesHandler, TechnicalInformationHandler

Class for building and using a Complement class Naive Bayes classifier.

For more information, see:

Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger: Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In: ICML, 616-623, 2003.

P.S.: TF, IDF and length normalization transforms, as described in the paper, can be performed through weka.filters.unsupervised.StringToWordVector.

BibTeX:

 @inproceedings{Rennie2003,
    author = {Jason D. Rennie and Lawrence Shih and Jaime Teevan and David R. Karger},
    booktitle = {ICML},
    pages = {616-623},
    publisher = {AAAI Press},
    title = {Tackling the Poor Assumptions of Naive Bayes Text Classifiers},
    year = {2003}
 }
 

Valid options are:

 -N
  Normalize the word weights for each class
 
 -S
  Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
 

Version:
$Revision: 5516 $
Author:
Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
ComplementNaiveBayes()
           
 
Method Summary
 void buildClassifier(Instances instances)
          Generates the classifier.
 double classifyInstance(Instance instance)
          Classifies a given instance.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 boolean getNormalizeWordWeights()
          Returns true if the word weights for each class are to be normalized
 java.lang.String[] getOptions()
          Gets the current settings of the classifier.
 java.lang.String getRevision()
          Returns the revision string.
 double getSmoothingParameter()
          Gets the smoothing value to be used to avoid zero WordGivenClass probabilities.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this classifier
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String normalizeWordWeightsTipText()
          Returns the tip text for this property
 void setNormalizeWordWeights(boolean doNormalize)
          Sets whether the word weights for each class should be normalized
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 void setSmoothingParameter(double val)
          Sets the smoothing value used to avoid zero WordGivenClass probabilities
 java.lang.String smoothingParameterTipText()
          Returns the tip text for this property
 java.lang.String toString()
          Prints out the internal model built by the classifier.
 
Methods inherited from class weka.classifiers.Classifier
debugTipText, distributionForInstance, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ComplementNaiveBayes

public ComplementNaiveBayes()
Method Detail

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class Classifier
Returns:
an enumeration of all the available options.

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class Classifier
Returns:
an array of strings suitable for passing to setOptions

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -N
  Normalize the word weights for each class
 
 -S
  Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
 

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class Classifier
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported

getNormalizeWordWeights

public boolean getNormalizeWordWeights()
Returns true if the word weights for each class are to be normalized

Returns:
true if the word weights are normalized

setNormalizeWordWeights

public void setNormalizeWordWeights(boolean doNormalize)
Sets whether the word weights for each class should be normalized

Parameters:
doNormalize - whether the word weights are to be normalized
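
As a rough illustration of what per-class weight normalization means in the cited paper (Rennie et al., 2003), the sketch below divides each word weight of one class by the sum of the absolute weights of that class. The class name `CnbNormalizeSketch` and the method `normalize` are hypothetical, and it is an assumption that this class applies exactly this normalization internally.

```java
// Hypothetical sketch, not part of Weka: normalizes the word weights of a
// single class so that their absolute values sum to 1, as described for
// weight normalization in Rennie et al. (2003).
class CnbNormalizeSketch {

    /** Returns a new array with each weight divided by the sum of |weights|. */
    static double[] normalize(double[] classWeights) {
        double sumAbs = 0.0;
        for (double v : classWeights) {
            sumAbs += Math.abs(v);
        }
        // Nothing to scale if every weight is zero; return an unchanged copy.
        if (sumAbs == 0.0) {
            return classWeights.clone();
        }
        double[] out = new double[classWeights.length];
        for (int i = 0; i < classWeights.length; i++) {
            out[i] = classWeights[i] / sumAbs;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] normalized = normalize(new double[] {-1.0, -3.0});
        System.out.println(normalized[0] + " " + normalized[1]);
    }
}
```

Normalizing each class separately keeps a class with many large-magnitude weights from dominating the comparison between classes.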

normalizeWordWeightsTipText

public java.lang.String normalizeWordWeightsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

getSmoothingParameter

public double getSmoothingParameter()
Gets the smoothing value to be used to avoid zero WordGivenClass probabilities.

Returns:
the smoothing value

setSmoothingParameter

public void setSmoothingParameter(double val)
Sets the smoothing value used to avoid zero WordGivenClass probabilities

Parameters:
val - the new smoothing value
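
To show why a smoothing value is needed, the sketch below computes complement word weights following the formula in the cited paper: for class c, the weight of word i is the log of (count of word i outside class c + s) divided by (total word count outside class c + s times the vocabulary size). The names `CnbSmoothingSketch`, `complementWeights`, `counts` and `s` are hypothetical, and the exact computation inside this class may differ.

```java
// Hypothetical sketch, not part of Weka: complement-class word weights with
// additive smoothing, following the formula in Rennie et al. (2003).
class CnbSmoothingSketch {

    /**
     * counts[c][i] is the total frequency of word i in training documents of class c.
     * Returns w[c][i] = log((complementCount_i + s) / (complementTotal + s * numWords)).
     */
    static double[][] complementWeights(double[][] counts, double s) {
        int numClasses = counts.length;
        int numWords = counts[0].length;

        // Totals over all classes, so complement counts are "all minus class c".
        double[] wordTotals = new double[numWords];
        double grandTotal = 0.0;
        for (double[] row : counts) {
            for (int i = 0; i < numWords; i++) {
                wordTotals[i] += row[i];
                grandTotal += row[i];
            }
        }

        double[][] w = new double[numClasses][numWords];
        for (int c = 0; c < numClasses; c++) {
            double classTotal = 0.0;
            for (double v : counts[c]) {
                classTotal += v;
            }
            double denom = (grandTotal - classTotal) + s * numWords;
            for (int i = 0; i < numWords; i++) {
                double num = (wordTotals[i] - counts[c][i]) + s;
                w[c][i] = Math.log(num / denom);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] counts = {{3.0, 0.0}, {0.0, 2.0}};
        System.out.println(complementWeights(counts, 1.0)[0][0]); // finite
        System.out.println(complementWeights(counts, 0.0)[0][0]); // log(0)
    }
}
```

With `s` set to 0, a word that never occurs outside a class yields log(0), which is negative infinity; a positive smoothing value keeps every weight finite.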

smoothingParameterTipText

public java.lang.String smoothingParameterTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier

Returns:
a description of the classifier suitable for displaying in the explorer/experimenter gui

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class Classifier
Returns:
the capabilities of this classifier
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances instances)
                     throws java.lang.Exception
Generates the classifier.

Specified by:
buildClassifier in class Classifier
Parameters:
instances - set of instances serving as training data
Throws:
java.lang.Exception - if the classifier has not been built successfully

classifyInstance

public double classifyInstance(Instance instance)
                        throws java.lang.Exception
Classifies a given instance.

The classification rule is:
MinC(forAllWords(ti*Wci))
where
ti is the frequency of word i in the given instance,
Wci is the weight of word i in class c,
and MinC selects the class c for which the sum over all words is smallest.

For more information, see section 4.4 of the paper mentioned above in the classifier's description.
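
The rule above can be sketched in plain Java as follows. This is a minimal illustration, not the class's actual implementation: the class name `CnbRuleSketch`, the method `classify`, and the dense arrays `t` and `w` are hypothetical stand-ins for the instance's word frequencies and the learned word weights.

```java
// Hypothetical sketch, not part of Weka: picks the class index c that
// minimizes the sum over all words i of t[i] * w[c][i].
class CnbRuleSketch {

    /**
     * t[i]    is the frequency of word i in the instance.
     * w[c][i] is the weight of word i in class c.
     * Returns the index of the class with the smallest weighted sum.
     */
    static int classify(double[] t, double[][] w) {
        int best = -1;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int c = 0; c < w.length; c++) {
            double score = 0.0;
            for (int i = 0; i < t.length; i++) {
                score += t[i] * w[c][i];
            }
            if (score < bestScore) {
                bestScore = score;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] t = {2.0, 0.0, 1.0};
        double[][] w = {
            {-0.5, -2.0, -1.0},  // class 0: sum = -2.0
            {-1.5, -0.3, -0.2}   // class 1: sum = -3.2
        };
        System.out.println(classify(t, w)); // prints 1
    }
}
```

The minimum is taken because the weights describe how well a word fits the complement of each class, so the class whose complement matches the instance worst is chosen.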

Overrides:
classifyInstance in class Classifier
Parameters:
instance - the instance to classify
Returns:
the index of the class to which the instance most likely belongs.
Throws:
java.lang.Exception - if the classifier has not been built yet.

toString

public java.lang.String toString()
Prints out the internal model built by the classifier. In this case it prints out the word weights calculated when building the classifier.

Overrides:
toString in class java.lang.Object

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - the options