Industry Term Model is the working title for the Web Classification algorithm, and it also refers to a particular namespace within imbWBI. The namespace contains a few classes that connect parts of the imbWBI.Core, imbNLP.PartOfSpeech and imbWEM.Core libraries to classify business entities (more precisely, their web sites) using natural language processing, ontology construction and ...

The following commands generate crawl scripts for the categories specified within the IndustryTermModel project named “itm01”:
    // loads the project "itm01"
    itm.Open "itm01";
    // here we use explicit ACE syntax
    itm.CrawlScript name="constructions";clearRepo=false;debug=true;autorun=true;
    itm.CrawlScript name="cooling";clearRepo=false;debug=true;autorun=true;
    itm.CrawlScript name="energetics";clearRepo=false;debug=true;autorun=true;
    // then a bit of implicit :)
    itm.CrawlScript "heating";
    itm.CrawlScript "furniture";
    // saves the project, which is actually not needed, but why not
    itm.Save;
    // quits the console
    Quit
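The explicit form of the calls above follows a regular pattern, so for a long category list the lines can be produced mechanically. The following is a minimal Python sketch that only formats text in the shape shown in the listing; it is not part of imbWBI or imbACE, and the helper names `crawl_script_command` and `build_console_script` are hypothetical. It assumes category names contain no double quotes.

```python
def crawl_script_command(name, clear_repo=False, debug=True, autorun=True):
    """Format one explicit-syntax itm.CrawlScript line for a category."""
    def flag(value):
        # The sample listing writes booleans in lowercase.
        return "true" if value else "false"

    return (
        f'itm.CrawlScript name="{name}";'
        f"clearRepo={flag(clear_repo)};debug={flag(debug)};autorun={flag(autorun)};"
    )


def build_console_script(project, categories):
    """Assemble open / CrawlScript / save / quit lines as plain text."""
    lines = [f'itm.Open "{project}";']
    lines += [crawl_script_command(category) for category in categories]
    lines += ["itm.Save;", "Quit"]
    return "\n".join(lines)


script = build_console_script("itm01", ["constructions", "cooling", "energetics"])
print(script)
```

The resulting text would still have to be pasted into, or loaded by, the console; how scripts are loaded is not covered by this sketch.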
Each CrawlScript call generates a crawling script that calls the imbWEM plugin. The generated scripts are saved in the imbWBIToolState directory. If you set autorun=true, the generated script is executed as well.
The content of one such crawling script:
    // This is auto-generated script to build MC Repository for Industry Term Model Project
    // Date 12/31/2017
    // Defining job
    wem.Job "MCRepo for constructions";"Building MCRepo for ITMP itm01";true;"";1;
    // Loading web domains
    wem.SampleFile "G:\imbWBI_Test\projects\imbWBIToolState_jobs\imbWBIToolState\constructions_crawl.txt",false,"Domains of constructions",true,0,-1,True;
    // Creates new instance of built-in crawler
    wem.Crawler classname="SM_LTS";LT_t=1;I_max=50;PL_max=15;PS_c=10;instanceNameSufix="_MC";primLanguage="serbian";secLanguage="english";
    // Configuring Crawl Job Engine
    wem.CrawlJobEngineSettings TC_max=2;Tdl_max=20;Tll_max=50;Tcjl_max=120;
    // Opens new session with the Index Engine
    wem.OpenSession experimentSession="itm01_constructions";IndexID="itm01";useJobSettings=false;crawlFolderNameTemplate="*";
    // Opens new session with the Mining Context manager
    mcm.Open repo="constructions_const"; log_msg="MCRepo construction for constructions"; debug=True;
    // Adds plugin
    wem.plugin plugin_classname="reportPlugIn_CrawlToMC";
    // Runs the crawl job
    wem.Run;
    // Closes the currently opened Mining Context session
    mcm.Close log_msg="Ending MCRepo construction for constructions"; doReport=true; debug=True;
A text file with the crawl targets will appear in the same location.
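The named-parameter commands in the generated script (for example wem.Crawler or wem.CrawlJobEngineSettings) are written as semicolon-separated key=value pairs. To inspect or compare such settings outside the console, a line can be split into a dictionary. This is a rough Python sketch, not the ACE parser itself: the function name `parse_ace_params` is hypothetical, and it assumes values contain no semicolons or equals signs and that the command name is separated from the parameters by a single space. Positional commands such as wem.Job are not handled.

```python
def parse_ace_params(command):
    """Split 'prefix.Command key=value;key=value;' into (command, params)."""
    head, _, rest = command.strip().partition(" ")
    params = {}
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece after the trailing semicolon
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {part!r}")
        # Values are kept as strings; surrounding double quotes are dropped.
        params[key.strip()] = value.strip().strip('"')
    return head, params


line = "wem.CrawlJobEngineSettings TC_max=2;Tdl_max=20;Tll_max=50;Tcjl_max=120;"
name, settings = parse_ace_params(line)
print(name, settings)
```

All values stay strings here, so numeric or boolean settings would need to be converted by the caller.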