public class BowHeuristic
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected java.util.HashMap<java.lang.String,java.lang.Integer> | termCount - Count for every term in any document. |
protected java.util.ArrayList<java.lang.String> | termList - List of all terms in all documents. |
Constructor and Description |
---|
BowHeuristic() - Create a new instance of BowHeuristic. |
Modifier and Type | Method and Description |
---|---|
java.util.ArrayList<BagOfWords> | createBoW(java.util.ArrayList<java.lang.String> filePaths, boolean wordStem) - Create a bag-of-words for each file. |
BagOfWords | createBoW(java.lang.String theText, boolean wordStem) - Create a bag-of-words for the input text sequence. |
protected java.util.HashMap<java.lang.String,java.lang.Integer> | getTermCount(java.util.ArrayList<BagOfWords> bowList) - Count all terms over all bags of words. |
java.util.ArrayList<java.lang.String> | sortTermOrder(java.util.HashMap<java.lang.String,java.lang.Float> termScore) - Sort the terms in descending order of the term score. |
java.util.ArrayList<java.lang.String> | sortTopTermOrder(int termNumber, java.util.HashMap<java.lang.String,java.lang.Float> termScore) - Sort the terms by score and keep at most termNumber top-scoring terms. |
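To illustrate what "create a bag-of-words for the input text sequence" can mean, here is a minimal sketch. It is not the BowHeuristic implementation: the class name BagOfWordsSketch, the method countTerms, the tokenization rule (lowercase, split on non-letters), and the use of a plain Map in place of BagOfWords are all assumptions. Stemming (the wordStem flag) is left out because the stemmer used by the library is not documented here.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a bag-of-words builder; not part of the documented API.
class BagOfWordsSketch {

    // Lowercase the text, split on runs of non-letter characters,
    // and count how often each token occurs.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {the=2, cat=2, saw=1, other=1}
        System.out.println(countTerms("The cat saw the other cat."));
    }
}
```

A LinkedHashMap is used only so that the printed order follows first occurrence; a HashMap would hold the same counts.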
protected java.util.ArrayList<java.lang.String> termList
protected java.util.HashMap<java.lang.String,java.lang.Integer> termCount
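The termCount field holds one count per term over the whole collection. The sketch below shows one way such a total could be accumulated. It assumes each bag can be viewed as a term-to-count map and that counts are summed across bags; the BagOfWords internals and the exact counting rule used by getTermCount are not documented here, so TermCountSketch and totalTermCount are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; each bag is modelled as a term -> count map (assumption).
class TermCountSketch {

    // Sum the per-bag counts of every term into one collection-wide map.
    static HashMap<String, Integer> totalTermCount(List<Map<String, Integer>> bags) {
        HashMap<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> bag : bags) {
            for (Map.Entry<String, Integer> entry : bag.entrySet()) {
                total.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> bags = List.of(
                Map.of("cat", 2, "dog", 1),
                Map.of("cat", 1, "fish", 3));
        HashMap<String, Integer> total = totalTermCount(bags);
        System.out.println(total.get("cat"));  // prints 3
        System.out.println(total.get("fish")); // prints 3
        System.out.println(total.size());      // prints 3
    }
}
```

Under the same assumption, the key set of the resulting map would give a list of all terms, comparable to termList.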
public java.util.ArrayList<BagOfWords> createBoW(java.util.ArrayList<java.lang.String> filePaths, boolean wordStem) throws java.lang.Exception
Parameters:
filePaths - list of files to read.
wordStem - if true, use word stemming.
Throws:
java.lang.Exception - any error.

public BagOfWords createBoW(java.lang.String theText, boolean wordStem) throws java.lang.Exception
Parameters:
theText - the text to read.
wordStem - if true, use word stemming.
Throws:
java.lang.Exception - any error.

protected java.util.HashMap<java.lang.String,java.lang.Integer> getTermCount(java.util.ArrayList<BagOfWords> bowList) throws java.lang.Exception
Parameters:
bowList - list of bags of words to process, each of type BagOfWords.
Throws:
java.lang.Exception - any error.

public java.util.ArrayList<java.lang.String> sortTermOrder(java.util.HashMap<java.lang.String,java.lang.Float> termScore) throws java.lang.Exception
Parameters:
termScore - a score (floating point value) for each term. Key is the term name, value is the score.
Throws:
java.lang.Exception - any error.

public java.util.ArrayList<java.lang.String> sortTopTermOrder(int termNumber, java.util.HashMap<java.lang.String,java.lang.Float> termScore) throws java.lang.Exception
Parameters:
termNumber - the maximum number of terms to add to the sorted list.
termScore - a score (float value) for each term. Key is the term name, value is the score.
Throws:
java.lang.Exception - any error.
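
The two sorting methods can be pictured with the following minimal sketch. It mirrors the documented signatures but is not the library code: the class TermSortSketch is hypothetical, ties are broken alphabetically purely to make the output deterministic, and a termNumber below zero is treated as zero. None of these details are confirmed by the documentation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of score-based term ordering; not the BowHeuristic source.
class TermSortSketch {

    // Return all terms ordered from highest score to lowest.
    static ArrayList<String> sortTermOrder(HashMap<String, Float> termScore) {
        ArrayList<Map.Entry<String, Float>> entries = new ArrayList<>(termScore.entrySet());
        entries.sort((a, b) -> {
            int byScore = Float.compare(b.getValue(), a.getValue()); // descending score
            // Assumption: alphabetical order for equal scores.
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        ArrayList<String> terms = new ArrayList<>();
        for (Map.Entry<String, Float> entry : entries) {
            terms.add(entry.getKey());
        }
        return terms;
    }

    // Return at most termNumber of the highest-scoring terms.
    static ArrayList<String> sortTopTermOrder(int termNumber, HashMap<String, Float> termScore) {
        ArrayList<String> sorted = sortTermOrder(termScore);
        int limit = Math.min(Math.max(termNumber, 0), sorted.size());
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        HashMap<String, Float> scores = new HashMap<>();
        scores.put("apple", 0.5f);
        scores.put("berry", 2.0f);
        scores.put("cherry", 1.25f);
        System.out.println(sortTermOrder(scores));       // prints [berry, cherry, apple]
        System.out.println(sortTopTermOrder(2, scores)); // prints [berry, cherry]
    }
}
```

Building the top-N list on top of the full sort keeps the sketch short; for very large vocabularies a bounded priority queue would avoid sorting every term.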