A .grf file is a text file that contains presentation information in addition to information representing the contents of the boxes and the transitions of the graph. A .grf file begins with the following lines:

#Unigraph
SIZE 1313 950
FONT Times New Roman: 12
OFONT Times New Roman:B 12
BCOLOR 16777215
FCOLOR 0
ACOLOR 12632256
…

Bag of words

The bag-of-words representation encodes documents as a matrix of word counts, typically with one row per document and one column per vocabulary word. It neglects word order and stores only the word counts in each document.
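
A minimal sketch of the idea in Python, assuming whitespace tokenization and lowercasing (both are simplifications chosen for illustration; the function name `bag_of_words` is hypothetical):

```python
from collections import Counter


def bag_of_words(documents):
    """Return (vocabulary, matrix) where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i]. Word order is discarded."""
    # Assumption: naive tokenization by lowercasing and splitting on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    # One column per distinct word, in alphabetical order.
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix


docs = ["the cat sat", "the dog sat on the cat"]
vocab, matrix = bag_of_words(docs)
print(vocab)
for row in matrix:
    print(row)
```

Two documents with the same words in a different order produce identical rows, which is exactly the information this representation gives up.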


Business Entity Classification system: an implementation of the Industry Term Model (business category / industry description model) that classifies business entities by processing web site content. Part of imbWBI.


CasSys lets users create Unitex cascades of transducers and opens new opportunities to work on natural language with Finite State Graphs. A cascade of transducers applies several FSGraphs (also called automata or transducers), one after the other, to a text: each graph modifies the text, and the changes can be useful for further processing…
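
The cascade principle can be illustrated with a short Python sketch. This is not the CasSys or Unitex API: regular-expression substitutions stand in for the graphs, and the names `make_stage` and `apply_cascade`, the patterns, and the `<person>`/`<org>` markup are all assumptions made for the example.

```python
import re


def make_stage(pattern, replacement):
    """Build one stage of the cascade: a function that rewrites a text.
    A regex substitution is used here as a stand-in for a transducer."""
    compiled = re.compile(pattern)

    def stage(text):
        return compiled.sub(replacement, text)

    return stage


def apply_cascade(text, stages):
    """Apply the stages one after the other; each sees the previous output."""
    for stage in stages:
        text = stage(text)
    return text


cascade = [
    # Stage 1 (hypothetical): mark a title followed by a capitalized name.
    make_stage(r"\b(?:Mr|Mrs|Dr)\. [A-Z][a-z]+", r"<person>\g<0></person>"),
    # Stage 2 (hypothetical): relies on the markup produced by stage 1.
    make_stage(r"(<person>[^<]*</person> works at )([A-Z][a-z]+)",
               r"\1<org>\2</org>"),
]

text = "Dr. Smith works at Acme."
print(apply_cascade(text, cascade))
```

The second stage matches only because the first stage has already inserted its markup, which shows why the order of the graphs in a cascade matters.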

Composite Text Density

Hybrid Text Density, Composite Text Density and Text Density are measures proposed in the context of noise removal and information extraction. With the textual information, we propose two measures for the evaluation of the textual importance of tags in web pages: Text Density and Composite Text Density. Once an HTML document is parsed and represented by a…