public class InformationGain extends Function
valueType for the algorithm. The index of the data column that the classification is being performed for is of type Integer. It is stored in dataset.getParams() and can be set and retrieved using AiHeuristicConst.COLINDEX as the key.
str1a str2a str3a
str1b str2b str3a
str1a str2c str3b
str1b str2c str3a
The tabular dataset can be created by adding columns of type MetricValue to a MetricDataset and passing this as the parameter. For example:
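//list that collects every column of the table (declaration assumed, it is not shown in the original example)
ArrayList allColumns = new ArrayList();
//values of a single column, one entry per row
ArrayList dataPoints = new ArrayList();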
dataPoints.add("Sunny");
dataPoints.add("Sunny");
dataPoints.add("Overcast");
dataPoints.add("Rain");
dataPoints.add("Rain");
MetricValue metricValue = MetricValue.toValue("Outlook", valueType, dataPoints);
allColumns.add(metricValue);
...
//repeat for the other data columns
...
MetricDataset evalDataset = MetricDataset.toDataset(null, valueType, allColumns);
//add column to evaluate for
evalDataset.getParams().put(AiHeuristicConst.COLINDEX, Integer.valueOf(4));
...
//initialise and run the algorithm
InformationGain infoGain = new InformationGain(valueType);
ArrayList resultList = (ArrayList)infoGain.evaluate(evalDataset).getValue();
The algorithm automatically counts the number of matching entries for each column variable and uses these counts to determine the best entropy result. The information gain ordering is then returned as an ArrayList of MetricValue, where the name is the column variable name and the value is the Double evaluation. Algorithms in the ai_heuristic package try to maximise, so a larger final value means a better split.
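The counting and entropy steps described above can be sketched with plain JDK collections. This is a minimal illustration of the standard ID3 information gain calculation, not the implementation used by InformationGain. The class name InfoGainSketch, its method names, and the toy rows are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: plain-JDK information gain, not the library's implementation. */
final class InfoGainSketch {

    /** Shannon entropy (base 2) of the label frequencies in the given list. */
    static double entropy(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double result = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.size();
            result -= p * (Math.log(p) / Math.log(2));
        }
        return result;
    }

    /** Information gain from splitting the decision labels on one attribute column. */
    static double informationGain(List<String> attribute, List<String> decision) {
        // Group the decision labels by the attribute value found on the same row.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (int row = 0; row < attribute.size(); row++) {
            groups.computeIfAbsent(attribute.get(row), k -> new ArrayList<>())
                  .add(decision.get(row));
        }
        // Weighted average entropy of the groups after the split.
        double remainder = 0.0;
        for (List<String> group : groups.values()) {
            remainder += ((double) group.size() / decision.size()) * entropy(group);
        }
        return entropy(decision) - remainder;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two attribute columns and one decision column.
        List<String> outlook = Arrays.asList("Sunny", "Sunny", "Overcast", "Rain", "Rain");
        List<String> windy   = Arrays.asList("No", "Yes", "No", "No", "Yes");
        List<String> play    = Arrays.asList("No", "No", "Yes", "Yes", "No");

        System.out.println("Outlook gain: " + informationGain(outlook, play));
        System.out.println("Windy gain:   " + informationGain(windy, play));
    }
}
```

On this toy data the Outlook column yields the larger gain (roughly 0.57 against 0.42 for Windy), so under the maximising convention it would be the preferred split.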
Some help from the tutorial by Mohammad A Rahman at http://www.codeproject.com/Articles/259241/ID3-Decision-Tree-Algorithm-Part-1#
Fields inherited from class Function: config, mathCompare, valueType
Constructor and Description |
---|
InformationGain(): Create a new instance of InformationGain. |
InformationGain(java.lang.String thisValueType): Create a new instance of InformationGain. |
Modifier and Type | Method and Description |
---|---|
ReplySet | evaluate(MetricDataset dataset): Evaluate the resulting IG after splitting the datasets over each attribute (column variable), apart from the decision attribute or column that the IG is being calculated for. |
protected void | initialise(): Initialise the function values, setting the config parameters or other. |
Methods inherited from class Function: checkValueType, createFunction, createFunction, createFunction, getConfigParams, innerObject, isLegalNumber, setConfigParams, setEvaluator, setValueType
public InformationGain() throws java.lang.Exception
Throws: java.lang.Exception - any error.
public InformationGain(java.lang.String thisValueType) throws java.lang.Exception
Parameters: thisValueType - the type of object being evaluated. Can be null if set later or not used.
Throws: java.lang.Exception - any error.
protected void initialise()
initialise in class Function
public ReplySet evaluate(MetricDataset dataset) throws java.lang.Exception
evaluate in interface FunctionDef
evaluate in class Function
Parameters: dataset - tabular list of values. This should store a list of MetricValue entities that each store the set of values for a single attribute column. Each MetricValue in the list should be assigned the entity name. In addition, the config of the dataset needs to set the data column that the classification is being performed for. This is of type Integer and can be set and retrieved using AiHeuristicConst.COLINDEX as the key.
Returns: a ReplySet whose value is an ArrayList of MetricValue entities. Each entry is assigned the attribute entity name and the value is the information gain, as a Double. For the decision column, the value is null, with only the entity name.
Throws: java.lang.Exception - any error.
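Because the decision column is returned with a null value, code that picks the best split from the result list needs to skip that entry. The following is a minimal sketch of that selection step. It uses java.util.Map.Entry as a stand-in for a name and Double pair, since the MetricValue accessors are not shown here. The class BestSplitSketch and the gain figures are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative only: Map.Entry stands in for a (name, Double gain) result entry. */
final class BestSplitSketch {

    /** Returns the name with the largest non-null gain, or null if there is none. */
    static String bestAttribute(List<Map.Entry<String, Double>> results) {
        String bestName = null;
        double bestGain = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> entry : results) {
            Double gain = entry.getValue();
            if (gain == null) {
                continue; // the decision column carries a name but no gain
            }
            if (gain > bestGain) {
                bestGain = gain;
                bestName = entry.getKey();
            }
        }
        return bestName;
    }

    public static void main(String[] args) {
        // Hypothetical result list: two attributes plus the decision column.
        List<Map.Entry<String, Double>> results = new ArrayList<>();
        results.add(new SimpleEntry<String, Double>("Outlook", 0.57));
        results.add(new SimpleEntry<String, Double>("Windy", 0.42));
        results.add(new SimpleEntry<String, Double>("Play", null));

        System.out.println("Best split: " + bestAttribute(results));
    }
}
```

With real results, the same loop would read the name and value from each MetricValue in place of the Map.Entry calls.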