kgrams  0.1.0
Dictionary Class Reference

Word dictionary for language models.

#include <Dictionary.h>

[Inheritance diagram for Dictionary]

Public Member Functions

 Dictionary ()
 Default constructor.

 Dictionary (const std::vector< std::string > &dict)
 Initialize Dictionary from list of words.

bool contains (std::string word) const
 Check if a word is contained in the Dictionary.

void insert (std::string word)
 Insert a word in the Dictionary.

std::string word (std::string index) const
 Return the word corresponding to a given word index.

std::string index (std::string word) const
 Return the index corresponding to a given word.

size_t length () const
 Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).

size_t size () const
 Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).

std::pair< size_t, std::string > kgram_code (std::string kgram) const
 Extract k-gram code from a string.
 

Detailed Description

Word dictionary for language models.

This class has two main purposes: (i) to store a list of "known" words to be used within a language model, and (ii) to convert between word and k-gram tokens on one side and word and k-gram codes (strings of integers) on the other. The codes are used in the internal implementation of the kgramFreqs class.
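As a rough illustration of purpose (ii), the sketch below maps words to codes and back using two hash maps. It is not the kgrams implementation: the class name `ToyDictionary`, the `"<UNK>"` placeholder, the code `"0"` for unknown words, and the numbering of words from 1 in insertion order are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for Dictionary; names and numbering are assumptions.
class ToyDictionary {
    std::unordered_map<std::string, std::string> word_to_code_;
    std::unordered_map<std::string, std::string> code_to_word_;
    std::size_t n_words_ = 0;
public:
    // Assign the next integer code (stored as a string) to a new word.
    void insert(const std::string& word) {
        if (word_to_code_.count(word)) return;  // already known, keep old code
        std::string code = std::to_string(++n_words_);
        word_to_code_[word] = code;
        code_to_word_[code] = word;
    }
    // Word -> code; unknown words fall back to "0" (assumed UNK code).
    std::string index(const std::string& word) const {
        auto it = word_to_code_.find(word);
        return it != word_to_code_.end() ? it->second : "0";
    }
    // Code -> word; unknown codes fall back to "<UNK>" (placeholder spelling).
    std::string word(const std::string& code) const {
        auto it = code_to_word_.find(code);
        return it != code_to_word_.end() ? it->second : "<UNK>";
    }
};
```

Keeping both directions in separate maps makes each lookup a single hash probe, at the cost of storing every word twice.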

Constructor & Destructor Documentation

◆ Dictionary() [1/2]

Dictionary::Dictionary ( )
inline

Default constructor.

Only special tokens (BOS, EOS, UNK) are included in the dictionary.

◆ Dictionary() [2/2]

Dictionary::Dictionary ( const std::vector< std::string > &  dict)
inline

Initialize Dictionary from list of words.

Parameters
dict    A vector of strings. List of words to be included in the dictionary.

In addition to the words explicitly included, the constructor also adds the special tokens (BOS, EOS, UNK) to the dictionary.
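The relationship between the two constructors and the reported size can be sketched as follows, assuming the special tokens are stored apart from ordinary words so that they never count toward the size. The class `ToyWordSet`, the token spellings `"<BOS>"`, `"<EOS>"`, `"<UNK>"`, and the choice to have `contains` return true for special tokens are hypothetical and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in; the token spellings below are placeholders, not the
// library's actual special tokens.
class ToyWordSet {
    std::unordered_set<std::string> special_{"<BOS>", "<EOS>", "<UNK>"};
    std::unordered_set<std::string> words_;
public:
    // Default: only the special tokens are present.
    ToyWordSet() = default;
    // From a word list: the special tokens are still present implicitly.
    explicit ToyWordSet(const std::vector<std::string>& dict) {
        for (const auto& w : dict) insert(w);
    }
    // Ordinary words only; duplicates and special tokens are ignored.
    void insert(const std::string& w) {
        if (!special_.count(w)) words_.insert(w);
    }
    bool contains(const std::string& w) const {
        return special_.count(w) > 0 || words_.count(w) > 0;
    }
    // Size excludes the three special tokens.
    std::size_t size() const { return words_.size(); }
};
```

Under these assumptions a default-constructed object reports size 0 while still recognizing the special tokens, and repeated words in the input list are counted once.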

Member Function Documentation

◆ contains()

bool Dictionary::contains ( std::string  word) const
inline

Check if a word is contained in the Dictionary.

Parameters
word    A string.
Returns
true if the word is contained in the Dictionary, false otherwise.

◆ index()

std::string Dictionary::index ( std::string  word) const
inline

Return the index corresponding to a given word.

Parameters
word    A string.
Returns
A string, index corresponding to 'word'.

◆ insert()

void Dictionary::insert ( std::string  word)
inline

Insert a word in the Dictionary.

Parameters
word    A string.

◆ kgram_code()

std::pair<size_t, std::string> Dictionary::kgram_code ( std::string  kgram) const
inline

Extract k-gram code from a string.

Parameters
kgram    A string.
Returns
A pair of a positive integer and a string. The integer is the order of the input k-gram (i.e. 'k'), while the string is its code, obtained by pasting the individual word codes separated by a space.

Leading, trailing and repeated spaces are handled automatically, and the EOS token is recognized.
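A minimal sketch of the behaviour described above, written as a free function over a plain word-to-code map rather than as the library's `kgram_code()`. The name `toy_kgram_code`, the `unk_code` parameter and its default of `"0"` for unknown words are assumptions for illustration. Here the EOS token is simply another entry in the map.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper, not the library's kgram_code(): split on whitespace,
// look up each word's code, and join the codes with single spaces.
std::pair<std::size_t, std::string> toy_kgram_code(
    const std::unordered_map<std::string, std::string>& word_to_code,
    const std::string& kgram,
    const std::string& unk_code = "0")  // assumed code for unknown words
{
    std::istringstream in(kgram);
    std::string word, code;
    std::size_t k = 0;
    // operator>> skips leading, trailing and repeated whitespace.
    while (in >> word) {
        auto it = word_to_code.find(word);
        if (k > 0) code += ' ';
        code += (it != word_to_code.end()) ? it->second : unk_code;
        ++k;
    }
    return {k, code};  // (order of the k-gram, space-separated codes)
}
```

Reading tokens with a string stream normalizes spacing without any manual trimming, so `"  the   cat "` and `"the cat"` produce the same code.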

◆ length()

size_t Dictionary::length ( ) const
inline

Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).

Returns
A non-negative integer. Size of the dictionary.

◆ size()

size_t Dictionary::size ( ) const
inline

Return size of the dictionary, excluding the special tokens (BOS, EOS, UNK).

Returns
A non-negative integer. Size of the dictionary.

◆ word()

std::string Dictionary::word ( std::string  index) const
inline

Return the word corresponding to a given word index.

Parameters
index    A string.
Returns
A string, word corresponding to 'index'.
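One way an index-to-word lookup of this kind might be used is to turn a space-separated k-gram code back into readable text. The sketch below does this over a plain code-to-word map. The function `toy_decode`, the `unk_word` parameter and the `"<UNK>"` spelling are hypothetical and are not part of the documented interface.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: turn a space-separated code string back into words by
// looking up each code, as repeated calls to a word() lookup might be used.
std::string toy_decode(
    const std::unordered_map<std::string, std::string>& code_to_word,
    const std::string& codes,
    const std::string& unk_word = "<UNK>")  // placeholder token spelling
{
    std::istringstream in(codes);
    std::string code, out;
    while (in >> code) {
        auto it = code_to_word.find(code);
        if (!out.empty()) out += ' ';
        out += (it != code_to_word.end()) ? it->second : unk_word;
    }
    return out;
}
```

In this sketch, codes missing from the map are rendered with the placeholder instead of raising an error. This is a design choice of the example, not documented behaviour.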

The documentation for this class was generated from the following file: