public class NGram extends FunctionMetric
n terms. The n value is not calculated automatically by the algorithm, but can be set through the
initialisation config parameters, using AiHeuristicConst.NGN as the key, or by calling setN().
The default implementation can be used simply by creating an instance with the empty constructor,
calling the setN(n) method and then calling the evaluate(String compOne, String compTwo) method.
When comparing strings or characters through the evaluate method, the algorithm creates n-grams of
the specified size and measures how many are the same in both of the string parameters.
This is the similarity measure and is the only result that this algorithm can produce.
The desired result is a larger value, but you can use the isBetter() method to
determine which value is better.
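The comparison described above can be illustrated with a minimal, self-contained sketch. It is not this library's implementation: the class name NGramSketch, the default gram size of 2 and the normalisation by the larger n-gram count are all assumptions made for illustration. Only the setN and evaluate(String, String) method names mirror the documented API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for illustration only; not the library's NGram class. */
class NGramSketch {
    private int n = 2; // assumed default gram size

    /** Set the gram size, mirroring the documented setN(int) call. */
    public void setN(int thisN) {
        if (thisN < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = thisN;
    }

    /** Split a string into its overlapping substrings of length n. */
    List<String> grams(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= s.length(); i++) {
            out.add(s.substring(i, i + n));
        }
        return out;
    }

    /**
     * Count the n-grams shared by both strings and divide by the larger
     * n-gram count (an assumed normalisation), giving a value in [0, 1].
     */
    public double evaluate(String compOne, String compTwo) {
        List<String> a = grams(compOne);
        List<String> b = grams(compTwo);
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        int total = Math.max(a.size(), b.size());
        int shared = 0;
        for (String g : a) {
            // Remove the match so each n-gram in b is counted at most once.
            if (b.remove(g)) {
                shared++;
            }
        }
        return (double) shared / total;
    }
}
```

With n set to 2, "night" and "nacht" each produce four bigrams and share only "ht", so this sketch returns 0.25. Identical strings return 1.0.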
As the classes have been made generic, it is also possible to attempt an N-Gram evaluation
over other data types. For this, you might be required to add the evaluation function that
evaluates the data comparison. You would then use the evaluate(Object parameter) method
to attempt the comparison. This function can take either a String[2] as input,
holding the two strings to compare, or a MetricCompare object where
the two objects to evaluate are the two datasets. The algorithm also uses some hard-coded
characters (%, 0) and so might not work with complex objects, for example.
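A generic entry point that accepts a String[2] could be sketched as below. This is an assumption about the general shape only: ObjectEvaluateSketch and PairScorer are hypothetical names, the scoring function is supplied by the caller, and the MetricCompare case is left out because its API is not shown here.

```java
/** Hypothetical sketch of type-checking a generic Object parameter. */
class ObjectEvaluateSketch {

    /** Hypothetical caller-supplied function that scores two strings. */
    interface PairScorer {
        double score(String one, String two);
    }

    /** Accept a String[2] and pass its two elements to the scorer. */
    static double evaluate(Object parameter, PairScorer scorer) {
        if (parameter instanceof String[]) {
            String[] pair = (String[]) parameter;
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected exactly two strings");
            }
            return scorer.score(pair[0], pair[1]);
        }
        // Any other type is rejected in this sketch.
        String type = (parameter == null) ? "null" : parameter.getClass().getName();
        throw new IllegalArgumentException("Unsupported parameter type: " + type);
    }
}
```

Rejecting unknown types early keeps the failure visible to the caller, which matters given that complex objects might not be handled.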
This is a modified version of the code at the following address, but uses the same algorithm: http://blogs.ucl.ac.uk/chime/2010/06/28/java-example-code-of-commonString-similarity-algorithms-used-in-data-mining/
config, mathCompare, valueType
Constructor and Description |
---|
NGram() Create a new instance of NGram. |
NGram(java.lang.String thisValueType, java.util.HashMap thisConfig) Create a new instance of NGram. |
Modifier and Type | Method and Description |
---|---|
ReplySet | evaluate(MetricDataset ds1, MetricDataset ds2) Evaluate the comparison of the two vectors of data and return the result. |
ReplySet | evaluate(java.lang.Object parameter) Return a value based on the function evaluation. |
double | evaluate(java.lang.String compOne, java.lang.String compTwo) Return a value based on the function evaluation. |
protected void | initialise() Initialise the function values, setting the config parameters or other. |
boolean | isBetter(java.lang.String valueType, java.lang.Object value1, java.lang.Object value2) Return true if value2 is better than value1, as determined by the measurements of this evaluation function. |
FunctionMetric | newInstance() Create and return a new instance of the function, initialised with this function's value type valueType and math evaluator mathCompare. |
void | setN(int thisN) Set the n value. |
boolean | sib() Return true if a smaller distance between the two vectors is better. |
evaluateCompare, lib
checkValueType, createFunction, createFunction, createFunction, evaluate, getConfigParams, innerObject, isLegalNumber, setConfigParams, setEvaluator, setValueType
public NGram() throws java.lang.Exception
java.lang.Exception - any error.
public NGram(java.lang.String thisValueType, java.util.HashMap thisConfig) throws java.lang.Exception
thisValueType - the type of object being evaluated.
thisConfig - list of initialisation function-specific parameters.
java.lang.Exception - any error.
protected void initialise() throws java.lang.Exception
initialise in class Function
java.lang.Exception - any error.
public void setN(int thisN)
thisN - the gram size value.
public ReplySet evaluate(java.lang.Object parameter) throws java.lang.Exception
parameter - the value to pass through the function. This should be an ArrayList with two elements. The first element can be either a String[2] or a MetricCompare, with only one data object in each of the data lists. The second element can be an Integer with the AiHeuristicConst.NGN value.
Returns: Double value.
java.lang.Exception - any error.
public ReplySet evaluate(MetricDataset ds1, MetricDataset ds2) throws java.lang.Exception
evaluate in interface FunctionMetricDef
evaluate in class FunctionMetric
ds1 - first value dataset.
ds2 - second value dataset.
java.lang.Exception - any error.
public double evaluate(java.lang.String compOne, java.lang.String compTwo) throws java.lang.Exception
compOne - the first comparison string.
compTwo - the second comparison string.
java.lang.Exception - any error.
public boolean isBetter(java.lang.String valueType, java.lang.Object value1, java.lang.Object value2) throws java.lang.Exception
isBetter in class FunctionMetric
valueType - the java type of the values to be evaluated.
value1 - the first value type.
value2 - the second value type.
java.lang.Exception - any error.
public boolean sib()
sib in class FunctionMetric
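The relationship between sib() and isBetter() can be sketched as follows. This is a simplified illustration, not the library's code: BetterSketch is a hypothetical class, the values are plain doubles rather than the documented (String, Object, Object) parameters, and returning false from sib() is an assumption based on the result being a similarity measure.

```java
/** Hypothetical sketch of how a "smaller is better" flag can drive isBetter. */
class BetterSketch {

    /** Assumed: for a similarity score, a smaller value is not better. */
    static boolean sib() {
        return false;
    }

    /** Return true if value2 is better than value1 in the sib() direction. */
    static boolean isBetter(double value1, double value2) {
        return sib() ? value2 < value1 : value2 > value1;
    }
}
```

Under these assumptions, isBetter(0.25, 0.75) is true and isBetter(0.75, 0.25) is false.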
public FunctionMetric newInstance() throws java.lang.Exception
Create and return a new instance of the function, initialised with this function's value type valueType and math evaluator mathCompare.
newInstance in class FunctionMetric
java.lang.Exception - any error.