Store k-gram frequency counts in hash tables
#include <kgramFreqs.h>
◆ kgramFreqs() [1/3]
kgramFreqs::kgramFreqs(size_t N) [inline]
Constructor with empty dictionary.
- Parameters
-
N | Positive integer. Maximum order of k-grams to be considered. |
Constructs a kgramFreqs object of order N with an empty dictionary.
◆ kgramFreqs() [2/3]
kgramFreqs::kgramFreqs(size_t N, const std::vector<std::string> & dict) [inline]
Constructor with predefined dictionary.
- Parameters
-
N | Positive integer. Maximum order of k-grams to be considered. |
dict | Vector of strings. The words to be included in the dictionary. |
◆ kgramFreqs() [3/3]
kgramFreqs::kgramFreqs(size_t N, const Dictionary & dict) [inline]
Constructor with predefined dictionary.
- Parameters
-
N | Positive integer. Maximum order of k-grams to be considered. |
dict | A Dictionary object. |
◆ dict_contains()
bool kgramFreqs::dict_contains(std::string word) const [inline]
Check if a word is found in the dictionary.
- Parameters
-
word | A string. Word to be queried. |
- Returns
- true if 'word' is in the dictionary, false otherwise.
◆ N()
size_t kgramFreqs::N() const [inline]
Maximum order of k-grams.
- Returns
- A positive integer N, the maximum order of k-grams for which frequency counts can be stored.
◆ process_sentences()
void kgramFreqs::process_sentences(const std::vector<std::string> & sentences, bool fixed_dictionary = false)
Store k-gram counts from a list of sentences.
- Parameters
-
sentences | Vector of strings. A list of sentences from which to extract k-gram counts. |
fixed_dictionary | true or false. If true, any new word not appearing in the dictionary encountered during processing is replaced by an Unknown-Word token. Otherwise, new words are added to the dictionary. |
Each entry of 'sentences' is considered a single sentence. For each sentence, anything separated by one or more space characters is considered a word.
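The counting behaviour described above can be sketched with a small stand-alone class. This is a hypothetical illustration, not the kgramFreqs implementation: the name ToyKgramCounter, its members, and the "<UNK>" placeholder are assumptions made for the example. It also skips any Begin-Of-Sentence or End-Of-Sentence padding and splits on any whitespace rather than on space characters only.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical illustration only; not the kgramFreqs implementation.
struct ToyKgramCounter {
    std::size_t N;                                   // maximum k-gram order
    std::unordered_set<std::string> dict;            // known words
    std::unordered_map<std::string, double> counts;  // k-gram -> count

    explicit ToyKgramCounter(std::size_t n) : N(n) {}

    // Split on runs of whitespace, so repeated spaces never yield empty words.
    static std::vector<std::string> split(const std::string& sentence) {
        std::istringstream in(sentence);
        std::vector<std::string> words;
        for (std::string w; in >> w;) words.push_back(w);
        return words;
    }

    void process_sentences(const std::vector<std::string>& sentences,
                           bool fixed_dictionary = false) {
        for (const std::string& s : sentences) {
            std::vector<std::string> words = split(s);
            for (std::string& w : words) {
                if (dict.count(w)) continue;
                if (fixed_dictionary) w = "<UNK>";  // assumed placeholder token
                else dict.insert(w);
            }
            // Count every k-gram of order 1..N, keyed by its space-joined words.
            for (std::size_t i = 0; i < words.size(); ++i) {
                std::string key;
                for (std::size_t k = 0; k < N && i + k < words.size(); ++k) {
                    if (k > 0) key += ' ';
                    key += words[i + k];
                    counts[key] += 1.0;
                }
            }
        }
    }
};
```

With N = 2, processing "i love you" and "i  love  pizza" in this sketch gives "i love" a count of 2. A later call with fixed_dictionary set to true leaves the dictionary unchanged and counts unseen words under the placeholder.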
◆ query()
double kgramFreqs::query(std::string kgram) const
Retrieve counts for a given k-gram.
- Parameters
-
kgram | A string. The k-gram to be queried. |
- Returns
- A non-negative count, returned as a double. Number of occurrences of 'kgram' in the text data processed so far.
query() considers anything delimited by one or more space characters as a word. Thus, for instance, the calls
query("i love you")
or
query(" i love you ")
or
query("  i   love   you  ")
would all produce the same result.
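One way to see why these calls are equivalent is to reduce each string to a canonical form before lookup. The helper below is a minimal sketch under that assumption. normalize_kgram is a hypothetical name, not part of kgramFreqs, and the library may handle its input differently internally. Note that stream extraction splits on any whitespace, not only on space characters.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of kgramFreqs: collapse runs of whitespace
// and trim both ends, so equivalent k-gram strings share one canonical form.
std::string normalize_kgram(const std::string& kgram) {
    std::istringstream in(kgram);
    std::string word, out;
    while (in >> word) {
        if (!out.empty()) out += ' ';  // single space between consecutive words
        out += word;
    }
    return out;
}
```

All three strings used in the calls above normalize to "i love you" under this sketch, so a lookup keyed on the normalized form cannot tell them apart.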
◆ V()
size_t kgramFreqs::V() const [inline]
Dictionary size.
- Returns
- A non-negative integer V. Size of the dictionary, excluding the Begin-Of-Sentence, End-Of-Sentence and Unknown-Word tokens.