kgrams  0.1.0
kgramFreqs Class Reference

Store k-gram frequency counts in hash tables

#include <kgramFreqs.h>

[Inheritance diagram for kgramFreqs]

Public Member Functions

 kgramFreqs (size_t N)
 Constructor with empty dictionary.
 
 kgramFreqs (size_t N, const std::vector< std::string > &dict)
 Constructor with predefined dictionary.
 
 kgramFreqs (size_t N, const Dictionary &dict)
 Constructor with predefined dictionary.
 
void process_sentences (const std::vector< std::string > &, bool fixed_dictionary=false)
 Store k-gram counts from a list of sentences.
 
double query (std::string) const
 Retrieve counts for a given k-gram.
 
bool dict_contains (std::string word) const
 Check if a word is found in the dictionary.
 
size_t N () const
 Maximum order of k-grams.
 
size_t V () const
 Dictionary size.
 
const Dictionary & dictionary () const
 Return constant reference to Dictionary.
 

Detailed Description

Store k-gram frequency counts in hash tables

Constructor & Destructor Documentation

◆ kgramFreqs() [1/3]

kgramFreqs::kgramFreqs ( size_t  N)
inline

Constructor with empty dictionary.

Parameters
N  Positive integer. Maximum order of k-grams to be considered.

Constructs a kgramFreqs object of order N with an empty dictionary.

◆ kgramFreqs() [2/3]

kgramFreqs::kgramFreqs ( size_t  N,
const std::vector< std::string > &  dict 
)
inline

Constructor with predefined dictionary.

Parameters
N  Positive integer. Maximum order of k-grams to be considered.
dict  A list of strings (words) to be included in the dictionary.

◆ kgramFreqs() [3/3]

kgramFreqs::kgramFreqs ( size_t  N,
const Dictionary &  dict 
)
inline

Constructor with predefined dictionary.

Parameters
N  Positive integer. Maximum order of k-grams to be considered.
dict  A Dictionary.

Member Function Documentation

◆ dict_contains()

bool kgramFreqs::dict_contains ( std::string  word) const
inline

Check if a word is found in the dictionary.

Parameters
word  A string. Word to be queried.
Returns
true or false.

◆ N()

size_t kgramFreqs::N ( ) const
inline

Maximum order of k-grams.

Returns
A positive integer N, the maximum order of k-grams for which frequency counts can be stored.

◆ process_sentences()

void kgramFreqs::process_sentences ( const std::vector< std::string > &  sentences,
bool  fixed_dictionary = false 
)

Store k-gram counts from a list of sentences.

Parameters
sentences  Vector of strings. A list of sentences from which to extract k-gram counts.
fixed_dictionary  true or false. If true, any word encountered during processing that does not appear in the dictionary is replaced by an Unknown-Word token. Otherwise, new words are added to the dictionary.

Each entry of 'sentences' is considered a single sentence. For each sentence, anything separated by one or more space characters is considered a word.
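
The processing rule above can be illustrated with a minimal, self-contained sketch. This is not the kgramFreqs implementation: the type ToyKgramCounter, its members, and the "<UNK>" spelling of the Unknown-Word token are hypothetical names chosen for illustration. The sketch also assumes that no Begin-Of-Sentence or End-Of-Sentence padding is applied, and it splits on any whitespace (via operator>>), which may differ from the library's behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for illustration only; NOT the kgramFreqs class.
struct ToyKgramCounter {
    std::size_t N;                                   // maximum k-gram order
    std::unordered_set<std::string> dict;            // known words
    std::unordered_map<std::string, double> counts;  // "w1 w2 ... wk" -> count

    explicit ToyKgramCounter(std::size_t order) : N(order) {
        assert(order > 0);  // N is documented as a positive integer
    }

    // Split on runs of whitespace; leading and trailing whitespace is ignored.
    static std::vector<std::string> split(const std::string& s) {
        std::istringstream in(s);
        std::vector<std::string> words;
        for (std::string w; in >> w;) words.push_back(w);
        return words;
    }

    void process_sentences(const std::vector<std::string>& sentences,
                           bool fixed_dictionary = false) {
        for (const std::string& sentence : sentences) {
            std::vector<std::string> words = split(sentence);
            for (std::string& w : words) {
                if (dict.count(w)) continue;        // known word: keep as is
                if (fixed_dictionary) w = "<UNK>";  // assumed token spelling
                else dict.insert(w);                // grow the dictionary
            }
            // Count every k-gram of order 1..N that starts at position i.
            for (std::size_t i = 0; i < words.size(); ++i) {
                std::string key;
                for (std::size_t k = 0; k < N && i + k < words.size(); ++k) {
                    if (k > 0) key += ' ';
                    key += words[i + k];
                    counts[key] += 1.0;
                }
            }
        }
    }
};
```

With N = 2, processing "i love you" in this sketch stores counts for "i", "i love", "love", "love you" and "you", but nothing for the 3-gram "i love you".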

◆ query()

double kgramFreqs::query ( std::string  kgram) const

Retrieve counts for a given k-gram.

Parameters
kgram  A string. The k-gram to be queried.
Returns
The number of occurrences of 'kgram' in the text data processed so far, returned as a double.

query() considers anything delimited by one or more space characters as a word. Thus, for instance, the calls

query("i love you") 

or

query(" i love you ") 

or

query("  i    love  you   ") 

would all produce the same result.
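
One way to picture this equivalence is to reduce each query string to a canonical form before lookup. The helper below is a hypothetical sketch, not a function of kgramFreqs: the name normalize_kgram is an assumption, and it treats any whitespace (not only the space character) as a delimiter.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration; not part of the kgramFreqs API.
// Collapses runs of whitespace to single spaces and trims both ends.
std::string normalize_kgram(const std::string& kgram) {
    std::istringstream in(kgram);
    std::string word, key;
    while (in >> word) {           // operator>> skips leading whitespace
        if (!key.empty()) key += ' ';
        key += word;
    }
    return key;
}
```

Under this sketch, the three strings "i love you", " i love you " and "  i    love  you   " all normalize to the same key, so a lookup keyed on the normalized string cannot distinguish them.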

◆ V()

size_t kgramFreqs::V ( ) const
inline

Dictionary size.

Returns
A positive integer V. Size of the dictionary, excluding the Begin-Of-Sentence, End-Of-Sentence and Unknown-Word tokens.

The documentation for this class was generated from the following files: