Perplexity – imbVeles

In topic modelling, perplexity is a standard performance measure used to evaluate models of text data. It measures a model's ability to generalise and predict new documents: the perplexity indicates the number of equally likely words that could occur at an arbitrary position in a document. A lower perplexity therefore indicates better generalisation. We calculate perplexity on the test set $C^*$, which contains $M^*$ documents, as follows:

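$\mathrm{perplexity}(C^*) = \exp\left\{ -\frac{\sum_{d=1}^{M^*} \log p(w_d)}{\sum_{d=1}^{M^*} N_d} \right\}$

where $w_d$ denotes the words of test document $d$ and $N_d$ is the number of words in that document.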
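A minimal Python sketch of the formula, assuming the model already supplies the natural-log likelihood log p(w_d) for each test document. Obtaining that value (for a topic model it usually requires approximate inference) is outside this sketch. The function name `perplexity` and its arguments are illustrative assumptions, not part of imbVeles or any particular library.

```python
import math
from typing import Sequence


def perplexity(log_likelihoods: Sequence[float], doc_lengths: Sequence[int]) -> float:
    """Illustrative helper (hypothetical name): exp(-sum_d log p(w_d) / sum_d N_d).

    log_likelihoods[d] is the natural-log likelihood log p(w_d) that the model
    assigns to test document d; doc_lengths[d] is N_d, its number of words.
    """
    if len(log_likelihoods) != len(doc_lengths):
        raise ValueError("need exactly one log-likelihood per document")
    total_words = sum(doc_lengths)
    if total_words <= 0:
        raise ValueError("the test set must contain at least one word")
    # Negative average log-likelihood per word, exponentiated.
    return math.exp(-sum(log_likelihoods) / total_words)


# Sanity check of the "equally likely words" reading: a model that spreads
# probability uniformly over a vocabulary of V words gives every word
# probability 1/V, so log p(w_d) = N_d * log(1/V) and the perplexity equals V.
V = 1000
lengths = [120, 80, 300]
uniform_ll = [n * math.log(1.0 / V) for n in lengths]
print(perplexity(uniform_ll, lengths))  # approximately 1000
```

The uniform-model check shows why a lower value means better generalisation: a model that predicts held-out words better than chance concentrates probability on fewer candidates, so its perplexity drops below the vocabulary size.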

A. De Waal, E. Barnard, Evaluating topic models with stability, 19th Annu. Symp. Pattern Recognit. Assoc. South Africa. (2008) 79–84. http://researchspace.csir.co.za/dspace/handle/10204/3016.