Methodology

DFC

Document Frequency Factor – a multiplier applied to the Number of Documents variable in the TF-cIDF term-weighting model. If DFC = 0, the model reduces to plain TF; if DFC = 1, it is the typical TF-IDF.
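
The exact TF-cIDF formula is not given in this section, so the sketch below does not try to reproduce it. It only illustrates the two endpoints described above, using a hypothetical linear interpolation between plain TF and TF-IDF. The function name `tf_cidf_weight`, its parameters and the natural-log IDF are all assumptions made for illustration.

```python
import math


def tf_cidf_weight(tf, n_docs, doc_freq, dfc):
    """Illustrative term weight that interpolates between TF and TF-IDF.

    ASSUMPTION: this interpolation is hypothetical and is not the
    TF-cIDF formula from the source; it only matches the endpoints
    dfc = 0 (plain TF) and dfc = 1 (typical TF-IDF).
    """
    if doc_freq <= 0 or doc_freq > n_docs:
        raise ValueError("doc_freq must be in the range 1..n_docs")
    idf = math.log(n_docs / doc_freq)  # natural-log IDF (assumed variant)
    return tf * ((1.0 - dfc) + dfc * idf)


# A term occurring 3 times in a document and in 10 of 100 documents.
weight_tf_only = tf_cidf_weight(tf=3, n_docs=100, doc_freq=10, dfc=0.0)
weight_tf_idf = tf_cidf_weight(tf=3, n_docs=100, doc_freq=10, dfc=1.0)
```

With DFC = 0 the weight equals the raw term frequency, and with DFC = 1 it equals the term frequency multiplied by log(N / df).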

Gold Standard

Ground truth, truth table or gold standard refers to a set of predefined correct results used for evaluation purposes, usually in evaluation scenarios involving Precision, Recall and F1 score.
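
As a minimal sketch of how a gold standard is used, the following computes Precision, Recall and F1 score by comparing a set of predicted items against a set of gold-standard items. The function name `precision_recall_f1` and the example labels are hypothetical; set-based matching is an assumption and may differ from the matching used in a given evaluation.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted items against gold-standard items (both sets)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four gold items, three predictions, two correct.
gold_items = {"a", "b", "c", "d"}
predicted_items = {"b", "c", "e"}
precision, recall, f1 = precision_recall_f1(predicted_items, gold_items)
```

Here Precision is 2/3 (two of three predictions are correct), Recall is 1/2 (two of four gold items are found), and F1 is their harmonic mean, 4/7.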

Perplexity

[Topic Model] Perplexity is a standard performance measure used to evaluate models of text data. It measures a model’s ability to generalise and predict new documents: the perplexity is an indication of the number of equally likely words that can occur at an arbitrary position in a document. A lower perplexity therefore indicates better generalisation. We calculate…
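
Independently of the specific calculation used in this work, the general definition of perplexity can be sketched as the exponential of the negative mean log-likelihood per token. The function name `perplexity` and the input format (a list of natural-log token probabilities assigned by some model) are assumptions made for illustration.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    ASSUMPTION: the log probabilities come from some fitted model
    evaluated on held-out text; no specific model is implied.
    """
    if not token_log_probs:
        raise ValueError("at least one token is required")
    mean_log_prob = sum(token_log_probs) / len(token_log_probs)
    return math.exp(-mean_log_prob)


# If every token is assigned probability 1/8, the model behaves as if it
# were choosing among 8 equally likely words at each position.
uniform_perplexity = perplexity([math.log(1 / 8)] * 5)
```

The uniform case gives a perplexity of 8, which matches the reading of perplexity as the number of equally likely words at an arbitrary position.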