Paul Götze edited this page Dec 17, 2017 · 3 revisions

Weka's classification and regression algorithms can be found in the Weka::Classifiers namespace.

The classifier classes are organised in submodules such as Weka::Classifiers::Trees, which holds the RandomForest class used in the examples below.


Getting information about a classifier

To get a description of a classifier class and its available options, use the class methods .description and .options:

puts Weka::Classifiers::Trees::RandomForest.description
# Class for constructing a forest of random trees.
# For more information see:
# Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.

puts Weka::Classifiers::Trees::RandomForest.options
# -I <number of trees>  Number of trees to build.
#   (default 100)
# -K <number of features> Number of features to consider (<1=int(log_2(#predictors)+1)).
#   (default 0)
# ...

The default options that are used for a classifier can be displayed with:

Weka::Classifiers::Trees::RandomForest.default_options
# => "-I 100 -K 0 -S 1 -num-slots 1"
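The option string above is a flat list of flags and values. As a minimal sketch in plain Ruby, it can be split into a Hash for easier inspection. The helper parse_option_string is hypothetical and not part of the weka gem, and it assumes every flag is followed by exactly one value, which does not hold for boolean switches.

```ruby
# Hypothetical helper, not part of the weka gem: turn an option string
# into a Hash of flag => value. Assumes each flag is followed by one value.
def parse_option_string(option_string)
  option_string.split.each_slice(2).to_h
end

defaults = parse_option_string('-I 100 -K 0 -S 1 -num-slots 1')

puts defaults['-I'] # value of the "number of trees" option
puts defaults['-K'] # value of the "number of features" option
```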

Creating a new classifier

To build a new classifier model based on training instances, you can use the following syntax:

instances = Weka::Core::Instances.from_arff('weather.arff')
instances.class_attribute = :play

classifier = Weka::Classifiers::Trees::RandomForest.new
classifier.use_options('-I 200 -K 5')
classifier.train_with_instances(instances)

You can also build a classifier by using the block syntax:

classifier = Weka::Classifiers::Trees::RandomForest.build do
  use_options '-I 200 -K 5'
  train_with_instances instances
end

Evaluating a classifier model

You can evaluate the trained classifier using cross-validation:

# default number of folds is 3
evaluation = classifier.cross_validate

# with a custom number of folds
evaluation = classifier.cross_validate(folds: 10)
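To illustrate what the folds parameter controls, here is a minimal sketch of k-fold splitting on plain indices. It is not Weka's implementation: the helper fold_indices is hypothetical, and Weka's own fold construction may shuffle or stratify the data, which this sketch does not do.

```ruby
# Sketch of k-fold splitting on plain indices (not Weka's implementation).
# Each fold serves once as test set; the remaining indices form the training set.
def fold_indices(instance_count, folds)
  indices = (0...instance_count).to_a
  # Round-robin assignment keeps the fold sizes within one of each other.
  test_sets = indices.group_by { |index| index % folds }.values

  test_sets.map do |test|
    { train: indices - test, test: test }
  end
end

fold_indices(14, 3).each_with_index do |fold, number|
  puts "fold #{number + 1}: train on #{fold[:train].size}, test on #{fold[:test].size}"
end
```

Every instance is used for testing exactly once, so the evaluation statistics cover the whole data set even though each model only sees part of it during training.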

The cross-validation returns a Weka::Classifiers::Evaluation object which can be used to get details about the accuracy of the trained classification model:

puts evaluation.summary
# Correctly Classified Instances          10               71.4286 %
# Incorrectly Classified Instances         4               28.5714 %
# Kappa statistic                          0.3778
# Mean absolute error                      0.4098
# Root mean squared error                  0.4657
# Relative absolute error                 87.4588 %
# Root relative squared error             96.2945 %
# Coverage of cases (0.95 level)         100      %
# Mean rel. region size (0.95 level)      96.4286 %
# Total Number of Instances               14

The evaluation holds detailed information about a number of different measures of interest, like the precision and recall, the FP/FN/TP/TN rates, the F-Measure and the areas under the PRC and ROC curves.
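As a rough illustration of how such measures relate to each other, the following plain-Ruby sketch derives them from confusion counts. The four counts are an assumption: they are chosen to be consistent with the summary above (10 correct, 4 incorrect, 14 in total) and are not read from a Weka::Classifiers::Evaluation object.

```ruby
# Hypothetical confusion counts for the "yes" class, chosen to match the
# summary above; they are not taken from an actual evaluation object.
tp = 7.0 # "yes" predicted as "yes"
fn = 2.0 # "yes" predicted as "no"
fp = 2.0 # "no" predicted as "yes"
tn = 3.0 # "no" predicted as "no"
total = tp + fn + fp + tn

accuracy  = (tp + tn) / total
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)

# Cohen's kappa compares the observed agreement with the agreement
# expected by chance from the row and column totals.
chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (total * total)
kappa  = (accuracy - chance) / (1 - chance)

puts format('accuracy:  %.4f %%', accuracy * 100) # 71.4286 %
puts format('kappa:     %.4f', kappa)             # 0.3778
puts format('precision: %.4f', precision)
puts format('recall:    %.4f', recall)
puts format('F-measure: %.4f', f_measure)
```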

If your trained classifier should be evaluated against a set of test instances, you can use evaluate:

test_instances = Weka::Core::Instances.from_arff('test_data.arff')
test_instances.class_attribute = :play

evaluation = classifier.evaluate(test_instances)

Classifying new data

Each classifier implements either a classify method or a distribution_for method, or both.

The classify method takes a Weka::Core::DenseInstance, an Array or a Hash of values as argument and returns the predicted class value:

instances = Weka::Core::Instances.from_arff('unclassified_data.arff')

# with an instance as argument
instances.map do |instance|
  classifier.classify(instance)
end
# => ['no', 'yes', 'yes', ...]

# with an Array of values as argument
classifier.classify([:sunny, 80, 80, :FALSE, '?'])
# => 'yes'

# with a Hash of the values as argument
classifier.classify({ outlook: :sunny, temperature: 80, humidity: 80, windy: :FALSE, play: '?'})
# => 'yes'

The distribution_for method accepts the same argument types (a Weka::Core::DenseInstance, an Array or a Hash of values) and returns a hash with the distribution per class value:

instances = Weka::Core::Instances.from_arff('unclassified_data.arff')

# with an instance as argument
classifier.distribution_for(instances.first)
# => { "yes" => 0.26, "no" => 0.74 }

# with an Array of values as argument
classifier.distribution_for [:sunny, 80, 80, :FALSE, '?']
# => { "yes" => 0.62, "no" => 0.38 }

# with a Hash of the values as argument
classifier.distribution_for({ outlook: :sunny, temperature: 80, humidity: 80, windy: :FALSE, play: '?' })
# => { "yes" => 0.62, "no" => 0.38 }
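A distribution hash can be reduced to a single prediction by picking the class value with the highest probability. The sketch below does this in plain Ruby on a hash copied from the example above. It only shows how the two result shapes relate; it makes no claim about how classify is implemented internally.

```ruby
# A hash in the shape returned above; the values are copied from the example.
distribution = { 'yes' => 0.62, 'no' => 0.38 }

# Pick the class value with the highest probability.
predicted_class, probability = distribution.max_by { |_class_value, value| value }

puts "#{predicted_class} (#{probability})" # prints "yes (0.62)"
```

Keeping the full distribution is useful when a prediction should only be accepted above a confidence threshold.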