kgrams
0.1.0
Word dictionary for language models.
#include <Dictionary.h>
Public Member Functions
Dictionary ()
Default constructor.
Dictionary (const std::vector< std::string > &dict)
Initialize Dictionary from a list of words.
bool | contains (std::string word) const
Check if a word is contained in the Dictionary.
void | insert (std::string word)
Insert a word in the Dictionary.
std::string | word (std::string index) const
Return the word corresponding to a given word index.
std::string | index (std::string word) const
Return the index corresponding to a given word.
size_t | length () const
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
size_t | size () const
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
std::pair< size_t, std::string > | kgram_code (std::string kgram) const
Extract k-gram code from a string.
Word dictionary for language models.
This class has two main purposes: (i) to store a list of "known" words to be used within a language model, and (ii) to convert word and k-gram tokens to and from their codes (strings of integers). The codes are used in the internal implementation of the kgramFreqs class.
Dictionary::Dictionary () | inline
Default constructor.
Only special tokens (BOS, EOS, UNK) are included in the dictionary.
Dictionary::Dictionary (const std::vector< std::string > &dict) | inline
Initialize Dictionary from a list of words.
dict | A vector of strings. List of words to be included in the dictionary. |
In addition to the words explicitly included, the constructor also adds the special tokens (BOS, EOS, UNK) to the dictionary.
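The behaviour described for the two constructors and for the size accessors can be sketched with a small stand-in class. This is only an illustration under stated assumptions, not the kgrams implementation: the name `ToyDictionary`, the token spellings `<BOS>`, `<EOS>`, `<UNK>`, and the choice to have `contains` report the special tokens as present are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in, for illustration only; not the kgrams implementation.
// The spellings of the special tokens below are assumptions.
const std::string kBos = "<BOS>";
const std::string kEos = "<EOS>";
const std::string kUnk = "<UNK>";

class ToyDictionary {
public:
    // Default constructor: only the special tokens are present.
    ToyDictionary() : tokens_{kBos, kEos, kUnk} {}

    // Word-list constructor: the special tokens plus the given words.
    explicit ToyDictionary(const std::vector<std::string>& dict)
        : ToyDictionary() {
        for (const auto& w : dict) insert(w);
    }

    // Assumption: special tokens count as "contained".
    bool contains(const std::string& word) const {
        return tokens_.count(word) > 0;
    }

    void insert(const std::string& word) {
        // Count a word only the first time it is seen.
        if (tokens_.insert(word).second) ++n_words_;
    }

    // Number of regular words; the three special tokens are not counted.
    std::size_t size() const {
        assert(tokens_.size() == n_words_ + 3);
        return n_words_;
    }

private:
    std::unordered_set<std::string> tokens_;
    std::size_t n_words_ = 0;
};
```

With this sketch, a default-constructed `ToyDictionary` has size 0 even though it holds three tokens, and constructing it from `{"a", "b", "a"}` gives size 2.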
bool Dictionary::contains (std::string word) const | inline
Check if a word is contained in the Dictionary.
word | A string. |
std::string Dictionary::index (std::string word) const | inline
Return the index corresponding to a given word.
word | A string. |
void Dictionary::insert (std::string word) | inline
Insert a word in the Dictionary.
word | A string. |
std::pair< size_t, std::string > Dictionary::kgram_code (std::string kgram) const | inline
Extract k-gram code from a string.
kgram | A string. |
Automatically handles leading, trailing, and repeated spaces, and recognizes the EOS token.
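One way such whitespace handling can work is sketched below. It is a hedged illustration, not the library's code: the free function `toy_kgram_code`, the lookup-map parameter, and the `unk_index` default of `"0"` are hypothetical. The sketch further assumes that the first element of the returned pair is the number of words in the k-gram, that the second is the space-separated list of word indices, and that the EOS token is just another entry in the lookup map.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical sketch; not the kgrams implementation.
// Assumptions: words are separated by ' ', the code is the space-separated
// list of word indices, and unknown words map to `unk_index`.
std::pair<std::size_t, std::string> toy_kgram_code(
    const std::string& kgram,
    const std::unordered_map<std::string, std::string>& word_to_index,
    const std::string& unk_index = "0")
{
    // An index containing a space would make the code ambiguous.
    assert(unk_index.find(' ') == std::string::npos);

    std::pair<std::size_t, std::string> res{0, ""};
    std::size_t start = 0;
    // Skip any run of spaces, then read up to the next space.
    while ((start = kgram.find_first_not_of(' ', start)) != std::string::npos) {
        const std::size_t end = kgram.find(' ', start);
        const std::string word = kgram.substr(start, end - start);
        const auto it = word_to_index.find(word);
        if (res.first > 0) res.second += ' ';  // separator between indices
        res.second += (it == word_to_index.end()) ? unk_index : it->second;
        ++res.first;
        if (end == std::string::npos) break;  // last word reached
        start = end;
    }
    return res;
}
```

Because leading, trailing, and repeated spaces are skipped before each word is read, `"  the   cat "` and `"the cat"` produce the same pair in this sketch.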
size_t Dictionary::length () const | inline
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
size_t Dictionary::size () const | inline
Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).
std::string Dictionary::word (std::string index) const | inline
Return the word corresponding to a given word index.
index | A string. |
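Since both `word` and `index` take and return strings, the two lookups can be pictured as a pair of inverse maps. The sketch below is a minimal illustration under assumptions and not the kgrams implementation: the class name `ToyWordIndex`, the numbering of words from `"1"` in insertion order, and the use of `"0"` and `<UNK>` for unknown entries are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in; not the kgrams implementation.
// Assumptions: indices are decimal strings assigned in insertion order
// starting at "1"; "0" and "<UNK>" stand for the unknown word.
class ToyWordIndex {
public:
    void insert(const std::string& word) {
        if (word_to_index_.count(word) > 0) return;  // already known
        const std::string index = std::to_string(word_to_index_.size() + 1);
        word_to_index_.emplace(word, index);
        index_to_word_.emplace(index, word);
        // The two maps must stay exact inverses of each other.
        assert(word_to_index_.size() == index_to_word_.size());
    }

    // Word -> index; unknown words map to "0".
    std::string index(const std::string& word) const {
        const auto it = word_to_index_.find(word);
        return it == word_to_index_.end() ? std::string("0") : it->second;
    }

    // Index -> word; unknown indices map to "<UNK>".
    std::string word(const std::string& index) const {
        const auto it = index_to_word_.find(index);
        return it == index_to_word_.end() ? std::string("<UNK>") : it->second;
    }

private:
    std::unordered_map<std::string, std::string> word_to_index_;
    std::unordered_map<std::string, std::string> index_to_word_;
};
```

In this sketch a known word survives the round trip `word(index(w))`, while an unknown word comes back as `<UNK>`.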