public class BagOfWords
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected java.util.HashMap<java.lang.String,java.lang.Integer> | bagOfWords - Stores the bag of words. |
protected java.lang.String | name - Unique name or ID. |
protected java.util.ArrayList<java.lang.String> | wordOrder - Word ordering from most to least counts. |
Constructor and Description |
---|
BagOfWords() - Create a new instance of BagOfWords. |
Modifier and Type | Method and Description |
---|---|
void | createBagOfWords(java.lang.String theText) - Parse the given text to create the bag of words. |
void | createBagOfWords(java.lang.String theText, boolean wordStem) - Parse the given text to create the bag of words, optionally applying word stemming. |
BagOfWords | difference(BagOfWords thisBagOfWords) - Create a new bag of words that is the difference between this one and the one passed in. |
float | dotproduct(BagOfWords thisBagOfWords) - Calculate the dot product of this bag-of-words with the one passed in. |
java.util.HashMap<java.lang.String,java.lang.Integer> | getBagOfWords() - Get a copy of the bag of words structure. |
java.lang.String | getName() - Get the name or ID of this bag-of-words. |
int | getTotalWordCount() - Get the total count of all instances of every word. |
java.util.ArrayList<java.lang.String> | getWordOrder() - Get the word list ordered from most to least counts. |
BagOfWords | intersection(BagOfWords thisBagOfWords) - Create a new bag of words that is the intersection of this one with the one passed in. |
float | magnitude(BagOfWords thisBagOfWords) - Calculate the magnitude value of this bag-of-words with the one passed in. |
protected void | orderBagOfWords() - Create an ordered list of words, from most to least counts, for the bag of words. |
void | removeWords(java.util.ArrayList<java.lang.String> toRemove) - Remove the listed words from both BOW structures, bagOfWords and wordOrder. |
boolean | sameWordList(BagOfWords compareWith) - Return true if this bag-of-words has the same word list as the one passed in. |
void | setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords) - Set the bag of words for this class to process. |
void | setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords, java.util.ArrayList<java.lang.String> thisWordOrder) - Set the bag of words and its word ordering for this class to process. |
void | setName(java.lang.String thisName) - Set the name or ID of this bag-of-words. |
BagOfWords | subtract(BagOfWords thisBagOfWords) - Create a new bag of words by subtracting the bag passed in from this one. |
org.licas_xml.abs.Element | toXml() - Convert this bag of words into an XML format. |
BagOfWords | union(BagOfWords thisBagOfWords) - Create a new bag of words that is the union of this one with the one passed in. |
protected java.lang.String name
protected java.util.HashMap<java.lang.String,java.lang.Integer> bagOfWords
protected java.util.ArrayList<java.lang.String> wordOrder
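The three fields suggest a map of per-word counts alongside a list of the same words sorted by count. The following is a minimal sketch of that structure using only java.util, not the library's own code. The class name WordCountSketch, both method names, the tokenization (lower-casing and splitting on non-letters), and the alphabetical tie-break are all assumptions made for illustration; this page does not document how the real parser or ordering behaves.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Locale;

// Hypothetical stand-in for illustration; not the library's BagOfWords class.
final class WordCountSketch {

    // Count occurrences of each word. Lower-casing and splitting on
    // non-letters are assumptions; the real parser may differ.
    static HashMap<String, Integer> countWords(String text) {
        HashMap<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Order words from most to least frequent.
    // Ties are broken alphabetically (an assumption).
    static ArrayList<String> orderWords(HashMap<String, Integer> counts) {
        ArrayList<String> order = new ArrayList<>(counts.keySet());
        order.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return order;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> counts = countWords("the cat saw the dog and the bird");
        System.out.println(orderWords(counts)); // prints [the, and, bird, cat, dog, saw]
    }
}
```

Keeping the ordering in a separate list means the count map can stay a plain HashMap, at the cost of re-sorting whenever the counts change.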
public void createBagOfWords(java.lang.String theText) throws java.lang.Exception
    Parameters:
        theText - the text sequence to parse.
    Throws:
        java.lang.Exception - any error.
public void createBagOfWords(java.lang.String theText, boolean wordStem) throws java.lang.Exception
    Parameters:
        theText - the text sequence to parse.
        wordStem - if true, re-parse using word stemming; if false, leave the original words.
    Throws:
        java.lang.Exception - any error.
protected void orderBagOfWords() throws java.lang.Exception
    Throws:
        java.lang.Exception - any error.
public void removeWords(java.util.ArrayList<java.lang.String> toRemove)
    Remove the listed words from both BOW structures, bagOfWords and wordOrder.
    Parameters:
        toRemove - list of words to remove.
public boolean sameWordList(BagOfWords compareWith)
    Parameters:
        compareWith - the bag-of-words to compare with.
public BagOfWords subtract(BagOfWords thisBagOfWords) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words to subtract from this one.
    Throws:
        java.lang.Exception - any error.
public BagOfWords difference(BagOfWords thisBagOfWords) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words to take the difference with.
    Throws:
        java.lang.Exception - any error.
public BagOfWords intersection(BagOfWords thisBagOfWords) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words to intersect with.
    Throws:
        java.lang.Exception - any error.
public BagOfWords union(BagOfWords thisBagOfWords) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words to combine with.
    Throws:
        java.lang.Exception - any error.
public float dotproduct(BagOfWords thisBagOfWords)
    Parameters:
        thisBagOfWords - the bag of words to take the dot product with.
public float magnitude(BagOfWords thisBagOfWords)
    Parameters:
        thisBagOfWords - the bag of words to combine with.
public int getTotalWordCount()
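This page does not say how the set-style and vector-style methods combine counts, so the sketch below shows only one plausible reading on plain count maps. It assumes that intersection keeps the smaller count of each shared word, that union adds counts together, and that the dot product sums the products of counts over shared words. The class BagMathSketch and its method names are hypothetical. The subtract, difference and magnitude methods are left out because the page does not say how subtract and difference differ, nor what magnitude computes against a second bag.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helpers for illustration; not the library's implementation.
final class BagMathSketch {

    // Words present in both bags, keeping the smaller count (assumed semantics).
    static HashMap<String, Integer> intersection(Map<String, Integer> a, Map<String, Integer> b) {
        HashMap<String, Integer> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                result.put(e.getKey(), Math.min(e.getValue(), other));
            }
        }
        return result;
    }

    // Words present in either bag, with counts added together (assumed semantics).
    static HashMap<String, Integer> union(Map<String, Integer> a, Map<String, Integer> b) {
        HashMap<String, Integer> result = new HashMap<>(a);
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            result.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return result;
    }

    // Sum of count products over the words the two bags share.
    static float dotProduct(Map<String, Integer> a, Map<String, Integer> b) {
        float sum = 0f;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }

    // Sum of all counts in one bag.
    static int totalWordCount(Map<String, Integer> bag) {
        int total = 0;
        for (int count : bag.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> a = new HashMap<>();
        a.put("cat", 2);
        a.put("dog", 1);
        HashMap<String, Integer> b = new HashMap<>();
        b.put("cat", 3);
        b.put("bird", 1);

        System.out.println(dotProduct(a, b));   // prints 6.0 (only "cat" is shared: 2 * 3)
        System.out.println(totalWordCount(a));  // prints 3
    }
}
```

Words missing from either bag contribute zero to the dot product, so iterating over one map and looking up the other is enough.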
public void setName(java.lang.String thisName)
    Parameters:
        thisName - the bag-of-words name.
public java.lang.String getName()
public void setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words structure.
    Throws:
        java.lang.Exception - any error.
public void setBagOfWords(java.util.HashMap<java.lang.String,java.lang.Integer> thisBagOfWords, java.util.ArrayList<java.lang.String> thisWordOrder) throws java.lang.Exception
    Parameters:
        thisBagOfWords - the bag of words structure.
        thisWordOrder - word ordering for the bag-of-words.
    Throws:
        java.lang.Exception - any error.
public java.util.HashMap<java.lang.String,java.lang.Integer> getBagOfWords()
public java.util.ArrayList<java.lang.String> getWordOrder()
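The method summary describes getBagOfWords() as returning a copy of the structure. The sketch below shows what copy semantics could look like, assuming a shallow copy of the map. The class CopySketch and its add method are hypothetical, and the page does not state whether getWordOrder() also returns a copy.

```java
import java.util.HashMap;

// Hypothetical holder for illustration; not the library's BagOfWords class.
final class CopySketch {
    private final HashMap<String, Integer> bagOfWords = new HashMap<>();

    // Record one more occurrence of the word.
    void add(String word) {
        bagOfWords.merge(word, 1, Integer::sum);
    }

    // Return a new map so callers cannot change the internal counts.
    // String keys and Integer values are immutable, so a shallow copy suffices.
    HashMap<String, Integer> getBagOfWords() {
        return new HashMap<>(bagOfWords);
    }

    public static void main(String[] args) {
        CopySketch bag = new CopySketch();
        bag.add("cat");
        HashMap<String, Integer> copy = bag.getBagOfWords();
        copy.put("cat", 99);                                 // modifies only the copy
        System.out.println(bag.getBagOfWords().get("cat"));  // prints 1
    }
}
```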
public org.licas_xml.abs.Element toXml() throws java.lang.Exception
    Throws:
        java.lang.Exception - any error.