Text document representation models – literature review
The three most popular document representation methods (currently) are:

1. Vector Space Model (VSM)
   - TF-IDF & cosine similarity
   - Latent Semantic Indexing (LSI), also called Latent Semantic Analysis (LSA)
   - Semantic Similarity Retrieval Model (SSRM)
2. Probabilistic topic model
   - Probabilistic Latent Semantic Indexing (PLSI)
   - Latent Dirichlet Allocation
3. Statistical language model
   - n-gram language models
   - Bag of words
   - Set…
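As an illustration of the first family (Vector Space Model with TF-IDF weighting and cosine similarity), here is a minimal sketch in plain Python. It is not taken from the reviewed literature. The function names `tfidf_vectors` and `cosine_similarity`, the whitespace tokenizer, and the particular weighting variant (relative term frequency times log(N / df)) are assumptions chosen for brevity; other TF-IDF variants are in common use.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    Assumed weighting: (term count / document length) * log(N / df).
    """
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # An all-zero vector has no direction.
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat lay on the rug",
        "stock markets fell sharply",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # shares terms: positive score
    print(cosine_similarity(vecs[0], vecs[2]))  # no shared terms: zero
```

Note that with this weighting a term occurring in every document receives weight zero, so it does not contribute to the similarity score.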





