Glossary

Text Corpus

In linguistics, a corpus (plural corpora) or text corpus is a large and structured set of texts, nowadays usually stored and processed electronically. Corpora are used for statistical analysis and hypothesis testing, for checking occurrences, and for validating linguistic rules within a specific language territory.
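
As a minimal illustration of "checking occurrences", the sketch below counts word-token frequencies across a tiny in-memory corpus. The two-document corpus, the `count_occurrences` helper, and the naive lowercase-letters tokenizer are all hypothetical. Real corpus work would normally use language-specific tokenization and lemmatization.

```python
import re
from collections import Counter


def count_occurrences(corpus):
    """Count how often each lowercase word token occurs across all texts."""
    counts = Counter()
    for text in corpus:
        # Naive tokenizer (hypothetical): runs of ASCII letters only, not language-aware.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


# Hypothetical two-document corpus.
corpus = [
    "The corpus is a structured set of texts.",
    "A text corpus supports statistical analysis of texts.",
]
counts = count_occurrences(corpus)
print(counts["texts"])   # 2
print(counts["corpus"])  # 2
```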

Tll_max

By Goran Grubic

Time limit per iteration (inactivity time limit), expressed in minutes.
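
The entry gives only the meaning and the unit, so the following is a minimal sketch under stated assumptions. It treats Tll_max as a number of minutes and checks whether the time since the last recorded activity in an iteration exceeds it. The function name `inactivity_limit_exceeded` and its parameters are hypothetical and are not part of the imbVeles API.

```python
from datetime import datetime, timedelta


def inactivity_limit_exceeded(last_activity, now, tll_max_minutes):
    """Return True when no activity has been recorded for longer than tll_max_minutes.

    Hypothetical helper: it shows one way an inactivity limit expressed in
    minutes could be checked within a single iteration.
    """
    return now - last_activity > timedelta(minutes=tll_max_minutes)


start = datetime(2020, 1, 1, 12, 0)
print(inactivity_limit_exceeded(start, start + timedelta(minutes=5), 10))   # False
print(inactivity_limit_exceeded(start, start + timedelta(minutes=15), 10))  # True
```

The comparison is strict, so an iteration that has been idle for exactly Tll_max minutes is not yet considered to have exceeded the limit.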

Topic Significance

Let $d$ be a document and $Z_j$ a topic in the user interest model. Given the patterns of $Z_j$ matched to document $d$ (indexed $k = 1, \dots, n_j$) and their corresponding frequencies within $Z_j$, the topic significance of $Z_j$ to $d$ is defined as in [1].

[1] Y. Gao, Y. Xu, Y. Li, Pattern-Based Topic Models…

TRE

TRE is a lightweight, robust, efficient, portable, and POSIX-compliant regexp matching library. Key features include the agrep command-line tool for approximate regexp matching in the style of grep, an approximate matching library API, portability, wide character and multibyte character support, binary pattern and data support, complete thread safety, consistently efficient matching, low memory…

TW

Refers to the HTML tag-induced term weight factor, applied during Lemma Table construction (Industry Term Model, imbWBI).
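
A minimal sketch of how a tag-induced weight factor could enter a lemma table: each occurrence of a lemma adds the weight of the HTML tag it appears in, rather than a flat count of 1. The tag-to-weight mapping, the default weight, and the `build_lemma_table` helper are hypothetical illustrations, not the weights or code used by the imbWBI Industry Term Model.

```python
from collections import defaultdict

# Hypothetical TW factors per HTML tag; the actual imbWBI values are not given here.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5, "p": 1.0}
DEFAULT_WEIGHT = 1.0


def build_lemma_table(occurrences):
    """Sum tag-weighted occurrences per lemma.

    occurrences: iterable of (lemma, enclosing_tag) pairs, assumed to come
    from an HTML parser plus a lemmatizer (neither is shown here).
    """
    table = defaultdict(float)
    for lemma, tag in occurrences:
        # Each occurrence contributes the TW factor of its enclosing tag;
        # tags missing from TAG_WEIGHTS fall back to DEFAULT_WEIGHT.
        table[lemma] += TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)
    return dict(table)


occurrences = [
    ("steel", "title"),
    ("steel", "p"),
    ("pipe", "h1"),
    ("pipe", "span"),  # not in TAG_WEIGHTS, so DEFAULT_WEIGHT is used
]
print(build_lemma_table(occurrences))  # {'steel': 4.0, 'pipe': 3.0}
```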

Visual Importance

imbVeles

Web Exploration, Load and Extraction Subsystem