Classifiers
- Getting information about a classifier
- Creating a new classifier
- Evaluating a classifier model
- Classifying new data
Weka's classification and regression algorithms can be found in the Weka::Classifiers namespace.
The classifier classes are organised in the following submodules:
Weka::Classifiers::Bayes
Weka::Classifiers::Functions
Weka::Classifiers::Lazy
Weka::Classifiers::Meta
Weka::Classifiers::Rules
Weka::Classifiers::Trees
To get a description of a classifier class and its available options, use the class methods .description and .options on the classifier:
puts Weka::Classifiers::Trees::RandomForest.description
# Class for constructing a forest of random trees.
# For more information see:
# Leo Breiman (2001). Random Forests. Machine Learning. 45(1):5-32.
puts Weka::Classifiers::Trees::RandomForest.options
# -I <number of trees> Number of trees to build.
# (default 100)
# -K <number of features> Number of features to consider (<1=int(log_2(#predictors)+1)).
# (default 0)
# ...
The default options that are used for a classifier can be displayed with:
Weka::Classifiers::Trees::RandomForest.default_options
# => "-I 100 -K 0 -S 1 -num-slots 1"
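To inspect or compare individual settings, an options string like the one above can be split into flag/value pairs with plain Ruby. The following is a minimal sketch and not part of the gem: the helper name parse_options is hypothetical, and it assumes that every flag is followed by exactly one value, which does not hold for boolean switches.

```ruby
# Hypothetical helper (not part of weka-jruby): turns an options string
# into a Hash of flag => value. Assumes each flag has exactly one value.
def parse_options(options_string)
  options_string.split.each_slice(2).to_h
end

defaults = parse_options('-I 100 -K 0 -S 1 -num-slots 1')
custom   = parse_options('-I 200 -K 5')

# Hash#merge keeps all default flags and replaces the values of the
# flags that also appear in the custom options.
p defaults.merge(custom)
```

The values stay strings here; convert them with Integer() or Float() if you need numbers.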
To build a new classifier model from training instances, you can use the following syntax:
instances = Weka::Core::Instances.from_arff('weather.arff')
instances.class_attribute = :play
classifier = Weka::Classifiers::Trees::RandomForest.new
classifier.use_options('-I 200 -K 5')
classifier.train_with_instances(instances)
You can also build a classifier by using the block syntax:
classifier = Weka::Classifiers::Trees::RandomForest.build do
  use_options '-I 200 -K 5'
  train_with_instances instances
end
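A block syntax of this kind is commonly implemented in Ruby with instance_eval. The sketch below shows the general pattern using a hypothetical SketchClassifier class. It illustrates the idea under that assumption and is not the gem's actual code.

```ruby
# Hypothetical stand-in class, only to illustrate the block syntax;
# it is not the gem's implementation.
class SketchClassifier
  attr_reader :options, :training_data

  # Creates an instance and evaluates the block in its context, so that
  # method calls inside the block are sent to the new instance.
  def self.build(&block)
    instance = new
    instance.instance_eval(&block) if block
    instance
  end

  def use_options(options)
    @options = options
  end

  def train_with_instances(instances)
    @training_data = instances
  end
end

instances = %w[row1 row2] # placeholder for real training instances

classifier = SketchClassifier.build do
  use_options '-I 200 -K 5'
  train_with_instances instances # outer local variables stay visible
end

puts classifier.options
```

Because the block is a closure, local variables such as instances remain accessible inside it, while instance variables of the calling object do not.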
You can evaluate the trained classifier using cross-validation:
# default number of folds is 3
evaluation = classifier.cross_validate
# with a custom number of folds
evaluation = classifier.cross_validate(folds: 10)
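To illustrate what the number of folds controls, here is a rough sketch of a k-fold split in plain Ruby. The helper names fold_indices and train_test_splits are hypothetical, and the round-robin assignment is a simplification: Weka's own splitting may also randomise and stratify the data.

```ruby
# Splits the indices 0...size into `folds` groups of near-equal size.
# Hypothetical helper for illustration only.
def fold_indices(size, folds)
  (0...size).group_by { |index| index % folds }.values
end

# Each fold serves once as the test set; the rest is used for training.
def train_test_splits(size, folds)
  all = (0...size).to_a
  fold_indices(size, folds).map do |test|
    { train: all - test, test: test }
  end
end

train_test_splits(14, 3).each_with_index do |split, i|
  puts "fold #{i + 1}: train on #{split[:train].size}, test on #{split[:test].size}"
end
# fold 1: train on 9, test on 5
# fold 2: train on 9, test on 5
# fold 3: train on 10, test on 4
```

Every instance is used for testing exactly once, so the evaluation covers the whole data set.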
Cross-validation returns a Weka::Classifiers::Evaluation object, which can be used to get details about the accuracy of the trained classification model:
puts evaluation.summary
#
# Correctly Classified Instances 10 71.4286 %
# Incorrectly Classified Instances 4 28.5714 %
# Kappa statistic 0.3778
# Mean absolute error 0.4098
# Root mean squared error 0.4657
# Relative absolute error 87.4588 %
# Root relative squared error 96.2945 %
# Coverage of cases (0.95 level) 100 %
# Mean rel. region size (0.95 level) 96.4286 %
# Total Number of Instances 14
The evaluation holds detailed information about a number of different measures of interest, such as precision and recall, the FP/FN/TP/TN rates, the F-measure and the areas under the PRC and ROC curves.
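The percentages in the summary follow directly from the instance counts. As a small worked example in plain Ruby (the helper name percentage is hypothetical and not part of the gem):

```ruby
# Share of `part` in `total` as a percentage, rounded to four decimals.
def percentage(part, total)
  (part.to_f / total * 100).round(4)
end

# Counts taken from the summary above.
correct   = 10
incorrect = 4
total     = correct + incorrect

puts percentage(correct, total)   # 71.4286
puts percentage(incorrect, total) # 28.5714
```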
To evaluate your trained classifier against a set of test instances, you can use evaluate:
test_instances = Weka::Core::Instances.from_arff('test_data.arff')
test_instances.class_attribute = :play
evaluation = classifier.evaluate(test_instances)
Each classifier implements either a classify method or a distribution_for method, or both.
The classify method takes a Weka::Core::DenseInstance, an Array or a Hash of values as its argument and returns the predicted class value:
instances = Weka::Core::Instances.from_arff('unclassified_data.arff')
# with an instance as argument
instances.map do |instance|
  classifier.classify(instance)
end
# => ['no', 'yes', 'yes', ...]
# with an Array of values as argument
classifier.classify([:sunny, 80, 80, :FALSE, '?'])
# => 'yes'
# with a Hash of the values as argument
classifier.classify({ outlook: :sunny, temperature: 80, humidity: 80, windy: :FALSE, play: '?'})
# => 'yes'
The distribution_for method also takes a Weka::Core::DenseInstance, an Array or a Hash of values as its argument and returns a hash with the probability distribution over the class values:
instances = Weka::Core::Instances.from_arff('unclassified_data.arff')
# with an instance as argument
classifier.distribution_for(instances.first)
# => { "yes" => 0.26, "no" => 0.74 }
# with an Array of values as argument
classifier.distribution_for [:sunny, 80, 80, :FALSE, '?']
# => { "yes" => 0.62, "no" => 0.38 }
# with a Hash of the values as argument
classifier.distribution_for({ outlook: :sunny, temperature: 80, humidity: 80, windy: :FALSE, play: '?' })
# => { "yes" => 0.62, "no" => 0.38 }
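If you only have a distribution hash and want a single label, you can pick the class value with the highest probability. This is a minimal sketch in plain Ruby. The helper name most_likely_class is hypothetical, and the assumption that this matches what classify returns is not stated on this page.

```ruby
# Hypothetical helper: returns the class value with the highest probability.
def most_likely_class(distribution)
  distribution.max_by { |_class_value, probability| probability }.first
end

# Distributions taken from the examples above.
puts most_likely_class({ 'yes' => 0.62, 'no' => 0.38 }) # yes
puts most_likely_class({ 'yes' => 0.26, 'no' => 0.74 }) # no
```

If two class values share the highest probability, max_by returns the first of them in the hash's order.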