Classification

Bag of words

The bag-of-words representation encodes a document as a vector of word counts, and a corpus as a document-term matrix. It discards word order and keeps only how often each word occurs in each document.
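
A minimal sketch in Python, assuming a toy corpus and simple whitespace tokenisation:

    from collections import Counter

    corpus = ["the cat sat on the mat", "the dog ate the bone"]
    tokenised = [doc.split() for doc in corpus]
    vocab = sorted({w for doc in tokenised for w in doc})

    # Each row is one document; each column holds the count of one vocabulary word.
    matrix = [[Counter(doc)[w] for w in vocab] for doc in tokenised]
    for row in matrix:
        print(row)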

Correlated Topic Model (CTM)

The Correlated Topic Model (CTM) [2] extends LDA by replacing the Dirichlet prior over topic proportions with a logistic-normal prior, which explicitly models correlation patterns between topics through a Gaussian covariance matrix.

[1] J. He, Z. Hu, T. Berg-Kirkpatrick, Y. Huang, E.P. Xing, Efficient Correlated Topic Modeling with Topic Embedding, Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD '17), 2017, pp. 225–233. doi:10.1145/3097983.3098074.…
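
The distinguishing step is the draw of per-document topic proportions: a Gaussian sample pushed through a softmax, so that the covariance matrix encodes topic correlations. A minimal sketch, with illustrative (not fitted) values for mu and Sigma:

    import numpy as np

    rng = np.random.default_rng(0)
    K = 3                                    # number of topics (assumed)
    mu = np.zeros(K)
    Sigma = np.array([[1.0, 0.8, 0.0],       # topics 0 and 1 positively correlated
                      [0.8, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])

    eta = rng.multivariate_normal(mu, Sigma)
    theta = np.exp(eta) / np.exp(eta).sum()  # softmax -> topic proportions
    print(theta)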

Hungarian method

The Hungarian method is a topic alignment method. Given two sets of topics and a weighted bipartite graph between them, it searches for the maximum-weight matching, i.e., the set of edges that touches each topic in the two sets exactly once such that the sum of edge weights is maximized [1].

[1] A. De Waal, E. Barnard, Evaluating topic models with stability, 19th Annu. Symp. Pattern Recognit.…
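
A minimal sketch using SciPy's implementation of the Hungarian algorithm; the similarity matrix is an illustrative stand-in for, e.g., cosine similarities between topic-word distributions from two runs:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    similarity = np.array([[0.9, 0.1, 0.3],
                           [0.2, 0.8, 0.1],
                           [0.4, 0.2, 0.7]])

    # linear_sum_assignment minimises cost, so negate to maximise total similarity.
    rows, cols = linear_sum_assignment(-similarity)
    for i, j in zip(rows, cols):
        print(f"topic {i} in run A <-> topic {j} in run B (weight {similarity[i, j]})")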

Latent Dirichlet Allocation

LDA is a generative probabilistic topic model. It represents each document as a random mixture over a latent space of topics, where each topic is characterized by a distribution over a dictionary of words. LDA and its extensions are ineffective on short documents (texts). The issues come from: ineffective word relation induction…
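
A minimal sketch of fitting LDA with scikit-learn; the corpus and the number of topics are illustrative assumptions:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    corpus = ["apples oranges fruit", "fruit juice apples",
              "python code bug", "debug python code"]
    X = CountVectorizer().fit_transform(corpus)   # bag-of-words counts

    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
    print(lda.transform(X))                       # per-document topic mixtures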

Latent Semantic Indexing/Analysis

The Vector Space Model (VSM), a document representation method, does not capture the semantic relations between terms. The LSI method overcomes this limitation of the VSM: it applies a particular matrix decomposition technique called Singular Value Decomposition (SVD) to the document-term matrix.
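
A minimal sketch of LSI via truncated SVD of the document-term matrix, again on an assumed toy corpus:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    corpus = ["car engine wheels", "engine oil car",
              "cats purr", "cats purr fur"]
    X = CountVectorizer().fit_transform(corpus)

    # Keep only the top-k singular vectors; documents become k-dimensional
    # vectors in a latent space where semantically related terms cluster.
    lsi = TruncatedSVD(n_components=2, random_state=0)
    print(lsi.fit_transform(X))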

Perplexity

[Topic Model] Perplexity is a standard performance measure used to evaluate models of text data. It measures a model's ability to generalise to and predict new documents: the perplexity indicates the number of equally likely words that could occur at an arbitrary position in a document. A lower perplexity therefore indicates better generalisation. We calculate…
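
As a reference point, the standard held-out definition used in the topic-modelling literature, for a test set of M documents where w_d denotes the words of document d and N_d its length:

    \mathrm{perplexity}(D_{\mathrm{test}})
      = \exp\!\left( -\frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}
                           {\sum_{d=1}^{M} N_d} \right)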