Text Corpus

In linguistics, a corpus (plural corpora), or text corpus, is a large, structured set of texts, nowadays usually stored and processed electronically. Corpora are used for statistical analysis and hypothesis testing, such as checking occurrences or validating linguistic rules within a specific language territory.
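As a rough illustration of "checking occurrences", the sketch below counts term occurrences and document frequencies over a tiny in-memory corpus. The corpus contents, the tokenizer, and the function names are assumptions made for this example; a real corpus would be far larger and would normally use a proper linguistic tokenizer.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def term_counts(corpus: dict[str, str]) -> Counter:
    # Total number of occurrences of each token across all documents.
    counts: Counter = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts


def document_frequency(corpus: dict[str, str], term: str) -> int:
    # Number of documents in which the term occurs at least once.
    term = term.lower()
    return sum(1 for text in corpus.values() if term in tokenize(text))


# Toy corpus, invented purely for illustration.
toy_corpus = {
    "a.txt": "The cat sat on the mat.",
    "b.txt": "The dog chased the cat.",
    "c.txt": "Dogs and cats are common pets.",
}

counts = term_counts(toy_corpus)
print(counts["the"], counts["cat"], document_frequency(toy_corpus, "cat"))
```

Note that this tokenizer treats "cat" and "cats" as different terms; grouping them would require stemming or lemmatization.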

Topic Significance

Let d be a document and Zj a topic in the user interest model. Given the patterns of Zj matched to document d, indexed k = 1,…, nj, and the corresponding frequencies of those matched patterns within Zj, the topic significance of Zj to d is defined as: [1] Y. Gao, Y. Xu, Y. Li, Pattern-Based Topic Models…


TRE

TRE is a lightweight, robust, efficient, portable, and POSIX-compliant regexp matching library. Key features include the agrep command-line tool for approximate regexp matching in the style of grep, an approximate matching library API, portability, wide character and multibyte character support, binary pattern and data support, complete thread safety, consistently efficient matching, low memory…
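To make "approximate matching" concrete, here is a minimal sketch of the idea for a literal pattern, using the classic edit-distance dynamic program (Sellers' algorithm). It is not TRE's API or its matching algorithm, and it handles plain strings rather than regular expressions; the function names `approx_find` and `approx_grep` are hypothetical.

```python
def approx_find(pattern: str, text: str, max_errors: int) -> list[tuple[int, int]]:
    """Return (end_index, errors) for each position in text where some
    substring ending there is within max_errors edits of pattern."""
    m = len(pattern)
    # prev[i]: fewest edits to match pattern[:i] against a substring of the
    # text ending at the previous position. Before any text is read, matching
    # pattern[:i] costs i edits.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text):
        cur = [0] * (m + 1)  # cur[0] = 0: a match may start at any position
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                cur[i - 1] + 1,      # pattern character missing from the text
            )
        if cur[m] <= max_errors:
            hits.append((j + 1, cur[m]))
        prev = cur
    return hits


def approx_grep(lines: list[str], pattern: str, max_errors: int) -> list[str]:
    # Keep the lines that contain at least one approximate match.
    return [line for line in lines if approx_find(pattern, line, max_errors)]


print(approx_grep(["regxp matching", "plain text"], "regexp", 1))
```

With `max_errors` set to 0 this reduces to exact substring search; raising it admits matches with that many insertions, deletions, or substitutions.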


Refers to the HTML-tag-induced term weight factor, applied during Lemma Table construction (Industry Term Model, imbWBI).
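The general idea of a tag-induced weight factor can be sketched as follows: a term occurrence counts for more when it appears inside a tag considered important. This is an illustration only and not imbWBI's implementation. The weights in `TAG_WEIGHTS`, the class name, and the rule of taking the largest factor among the enclosing tags are all assumptions.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weight factors, chosen only for illustration.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


class WeightedTermCounter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.open_tags: list[str] = []
        self.weights: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the most recent matching open tag, tolerating sloppy HTML.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i] == tag:
                del self.open_tags[i:]
                break

    def handle_data(self, data):
        # Assumed rule: use the largest factor among the enclosing tags.
        factor = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.open_tags),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"\w+", data.lower()):
            self.weights[term] += factor


counter = WeightedTermCounter()
counter.feed("<title>Corpus tools</title><p>corpus <b>search</b></p>")
print(dict(counter.weights))
```

Here "corpus" accumulates weight from both the title and the paragraph, while "search" receives the bold factor because it exceeds the paragraph default.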