Web Classification Project

(draft version – full version is coming soon)

The web classification mechanism of imbWBI (Web Business Intelligence libraries of the imbVeles Framework) is called “Industry Term Model” (ITM), because it describes categories (industries) as semantic clouds of lemma terms extracted from the web sites of the training set. The ITM performs single-label, multi-class classification of web sites after it has been trained on a list of domains separated into their corresponding categories.
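To make the idea of a category as a "cloud of lemma terms" more concrete, here is a minimal Python sketch. It is not the imbWBI implementation: the function names (term_cloud, cosine, classify), the use of plain term frequencies with cosine similarity, and the toy term lists are all assumptions chosen for illustration. Only the category names are taken from this page.

```python
import math
from collections import Counter


def term_cloud(documents):
    """Build a normalized term-frequency cloud from lists of lemma terms."""
    counts = Counter(term for doc in documents for term in doc)
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(site_terms, category_clouds):
    """Single-label decision: return the one category with the most similar cloud."""
    site_cloud = term_cloud([site_terms])
    return max(category_clouds, key=lambda c: cosine(site_cloud, category_clouds[c]))


# Invented toy training data: each category holds the lemma terms of its sites.
training = {
    "heating": [["boiler", "radiator", "heat"], ["heat", "pump", "boiler"]],
    "furniture": [["chair", "table", "wood"], ["sofa", "table", "chair"]],
}
clouds = {name: term_cloud(docs) for name, docs in training.items()}

print(classify(["boiler", "heat", "install"], clouds))  # prints "heating"
```

Each web site receives exactly one label (single-label), chosen among several categories (multi-class). The weighting and similarity measure that the real ITM uses may differ from this sketch.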

Here we cover the basic operations you can perform from the imbWBI Console Tool application. The ITM is supported in the imbWBI Console Tool as an imbACE Console plugin named “itm”, so all ITM-related commands have the prefix “itm.”.

Creating a new ITM project

Let’s create a new Industry Term Model project:

// this creates a new project called "test"
itm.Open "test";

// here we save the newly created project, so that the folder structure and all configuration files are created in the imbWBI\projects\itmPlugin\test subfolder
itm.Save;

// here we shut down the console application
Quit;

Now, examine the folder [location of the imbWBI Console Tool]\projects\itmPlugin\test
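If you prefer to inspect the generated structure from a script rather than a file manager, the following Python sketch lists everything below a project folder. The helper name list_tree is hypothetical, and the relative path used at the bottom assumes the script is started from the Console Tool's own folder. Which files actually appear there depends on the tool and is not shown here.

```python
from pathlib import Path


def list_tree(root):
    """Return the relative paths of all entries under root, folders marked with '/'."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        entries.append(rel + "/" if path.is_dir() else rel)
    return entries


if __name__ == "__main__":
    # Assumption: the current directory is the location of the imbWBI Console Tool.
    project_dir = Path("projects") / "itmPlugin" / "test"
    if project_dir.is_dir():
        for entry in list_tree(project_dir):
            print(entry)
    else:
        print(f"Folder not found: {project_dir}")
```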

 

Constructing the Mining Context for the categories

// We are opening a project named [itm01]
itm.Open "itm01";

// Requesting a crawl of the [constructions] category, telling the system not to clear the existing MC repository, to run in debug mode, and to execute the crawl script right after the imbWEM script is generated.
itm.CrawlScript name="constructions";clearRepo=false;debug=true;autorun=true;

itm.CrawlScript name="cooling";clearRepo=false;debug=true;autorun=true;

itm.CrawlScript name="energetics";clearRepo=false;debug=true;autorun=true;

// Here we use the implicit syntax of ACE Script, leaving the other parameters of the method at their default values
itm.CrawlScript "heating";

// Again, we use the implicit syntax of ACE Script, but this time we specify values for all parameters of the method
itm.CrawlScript "furniture";false;false;true;

// Saving the project
itm.Save;
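When a project has many categories, typing one itm.CrawlScript line per category becomes repetitive. The Python sketch below generates the explicit-syntax lines shown above from a list of category names. The helper names crawl_script_line and build_script are hypothetical, and the Python default values simply mirror the values used in the example (clearRepo=false, debug=true, autorun=true); they are not necessarily the defaults of the itm.CrawlScript command itself.

```python
def crawl_script_line(name, clear_repo=False, debug=True, autorun=True):
    """Format one explicit-syntax itm.CrawlScript command for a category."""
    def flag(value):
        return "true" if value else "false"

    return (
        f'itm.CrawlScript name="{name}";clearRepo={flag(clear_repo)};'
        f"debug={flag(debug)};autorun={flag(autorun)};"
    )


def build_script(project, categories):
    """Open the project, request a crawl for each category, then save."""
    lines = [f'itm.Open "{project}";']
    lines += [crawl_script_line(category) for category in categories]
    lines.append("itm.Save;")
    return "\n".join(lines)


print(build_script("itm01", ["constructions", "cooling", "energetics"]))
```

The sketch only prints the command text; it does not talk to the console application.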

 

The crawl script for the imbWEM plugin is generated and executed automatically:

// This is auto-generated script to build MC Repository for Industry Term Model Project
// Date 12/31/2017
// Defining job
wem.Job "MCRepo for constructions";"Building MCRepo for ITMP itm01";true;"";1;
// Loading web domains
wem.SampleFile "G:\imbWBI_Test\projects\imbWBIToolState_jobs\imbWBIToolState\constructions_crawl.txt",false,"Domains of constructions",true,0,-1,True;
// Creates new instance of built-in crawler
wem.Crawler classname="SM_LTS";LT_t=1;I_max=50;PL_max=15;PS_c=10;instanceNameSufix="_MC";primLanguage="serbian";secLanguage="english";
// Configuring Crawl Job Engine
wem.CrawlJobEngineSettings TC_max=2;Tdl_max=20;Tll_max=50;Tcjl_max=120;
// Opens new session with the Index Engine
wem.OpenSession experimentSession="itm01_constructions";IndexID="itm01";useJobSettings=false;crawlFolderNameTemplate="*";
// Opens new session with the Mining Context manager
mcm.Open repo="constructions_const"; log_msg="MCRepo construction for constructions"; debug=True;
// Adds plugin
wem.plugin plugin_classname="reportPlugIn_CrawlToMC";
// Runs the crawl job
wem.Run;
// Closes the currently opened Mining Context session
mcm.Close log_msg="Ending MCRepo construction for constructions"; doReport=true; debug=True;

 

Performing an experiment

 

Performing series of template-composite experiments

 

Generating secondary reports

 


Check the imbWBI API Documentation:

http://doc.veles.rs/

Attachments

  • imbWBI Console Reference
    Autogenerated help document for the imbWBI Console Tool