Category: imbWEM

The Diversity Module

The Diversity Module inherits from the Frontier Ranking Module base class and sorts targets by the estimated semantic difference (the complement of semantic similarity) between the Target and the already crawled content. The crawled content is represented by two collections: the Target Tokens Repository (TTR), which is a domain-level term frequency table aggregating TSTs…
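The idea of ranking by the complement of similarity can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the text above: cosine similarity over term frequencies stands in for whatever similarity measure the module actually uses, only the TTR is modelled (the second collection is not), and the names `cosine_similarity`, `semantic_difference` and `rank_by_diversity` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors.

    Assumption: cosine is used here only for illustration; the actual
    similarity measure of the module is not stated in the text.
    """
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector shares nothing with the other one
    return dot / (norm_a * norm_b)


def semantic_difference(target_terms: Counter, ttr: Counter) -> float:
    """Semantic difference as the complementary value of similarity."""
    return 1.0 - cosine_similarity(target_terms, ttr)


def rank_by_diversity(targets: Dict[str, Counter], ttr: Counter) -> List[str]:
    """Sort target identifiers so the most different targets come first."""
    return sorted(
        targets,
        key=lambda name: semantic_difference(targets[name], ttr),
        reverse=True,
    )


# Hypothetical data: a tiny TTR and two candidate targets.
ttr = Counter({"crawler": 3, "frontier": 2})
targets = {
    "a": Counter({"crawler": 3, "frontier": 2}),  # same terms as the TTR
    "b": Counter({"pricing": 1}),                 # no terms in common
}
ranking = rank_by_diversity(targets, ttr)
```

In this toy case the target that shares no terms with the TTR is ranked first, because its difference score is the highest.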

The Template Module

The heart of this module is the procedure of page decomposition and detection of the semantic role of each extracted content block. This is the only module in the stack that evaluates links using only information that is immutable across the DLC process iterations. Furthermore, the alternative ranking implementation assumes that higher position in the navigation menu hierarchy…