ACE Script example

In this post I’ll briefly show an example of ACE Script, executed with the R&D imbWEM console [analyticConsole] instance.

The script creates a Mining Context repository by crawling the Sc list of web sites. It uses the SM-LS crawler, which by default looks for content in the Serbian language. The Mining Context is named “dev”, and the result of the crawl will appear in the [application path]\MCRepo\dev folder once the script has finished executing.

The first block of the script turns off all reporting options – effectively overriding settings we have in the configuration file, in order to speed up the execution:

// Changing global settings on reporting
setGlobal path="directReportEngine.DR_ReportModules";value="false";
setGlobal path="directReportEngine.DR_ReportTimeline";value="false";
setGlobal path="directReportEngine.DR_ReportIterationTerms";value="false";
setGlobal path="directReportEngine.DR_ReportIterationUrls";value="false";
setGlobal path="directReportEngine.DR_ReportDomainTerms";value="false";
setGlobal path="directReportEngine.DR_ReportDomainPages";value="false";
setGlobal path="directReportEngine.doTF_IDF_crawlReports";value="false";
setGlobal path="directReportEngine.doPublishPerformance";value="false";
setGlobal path="directReportEngine.doPublishExperimentSessionTable";value="false";
setGlobal path="directReportEngine.doPublishIndexPerformanceTable";value="false";
setGlobal path="directReportEngine.doIterationReport";value="false";
setGlobal path="directReportEngine.doDomainReport";value="false";
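
Since these twelve lines differ only in the flag name, the block can also be generated instead of typed by hand. Below is a minimal Python sketch that does this. It is not part of imbWEM: the helper names set_global_line and reporting_block are hypothetical, and only the flag names and the line format are copied from the script above.

```python
# Hypothetical helper, not part of imbWEM: builds the setGlobal lines
# for the directReportEngine flags listed in the script above.
REPORT_FLAGS = [
    "DR_ReportModules",
    "DR_ReportTimeline",
    "DR_ReportIterationTerms",
    "DR_ReportIterationUrls",
    "DR_ReportDomainTerms",
    "DR_ReportDomainPages",
    "doTF_IDF_crawlReports",
    "doPublishPerformance",
    "doPublishExperimentSessionTable",
    "doPublishIndexPerformanceTable",
    "doIterationReport",
    "doDomainReport",
]


def set_global_line(engine, flag, value):
    """Format one setGlobal command the way it is written in this post."""
    return 'setGlobal path="{}.{}";value="{}";'.format(
        engine, flag, str(value).lower()
    )


def reporting_block(value=False):
    """Return one setGlobal line per reporting flag, all set to `value`."""
    return [set_global_line("directReportEngine", f, value) for f in REPORT_FLAGS]


# Print the block so it can be pasted into an ACE Script file.
print("\n".join(reporting_block()))
```

Calling reporting_block(True) would produce the same lines with the value set to "true", which is handy when you want the reports back for a single run.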

Then we turn off a few other unwanted options in the Index Engine and the Crawl Job Engine:

// some global settings on index engine
setGlobal path="indexEngine.doIndexPublishAndBackupOnOpenSession";value="false";
setGlobal path="indexEngine.doIndexFullTrustMode";value="false";
// some global settings on the crawler engine
setGlobal path="crawlerJobEngine.doRandomizeSampleTake";value="false";
setGlobal path="crawlerJobEngine.doRandomizeSampleOrder";value="false";

Now, let’s start some more specific operations.
In the block below we define the name and description of our analyticJob, so that the reports and logs contain this information and the results are easier to interpret later. We also define the seed URLs for the crawler to start from:

// Defining a new imbWEM AnalyticJob
Job "MC Construction";"";true;"";1;

// Loads the sample list from the text file and defines the name for the sample
importSample filename="S_C.txt";"Subset_C";fileHasPriority=false;limit=10;skip=0;

The next step is to create a crawler instance and to set up the execution parameters of the Crawl Job Engine. In this case we choose the built-in modular crawler SM-LS, which has two modules (of the frontier layer type): the Language module and the Structure module. With the CJEngineSetup call we set 4 parallel threads and place some execution time limits: 25 minutes per domain-level crawl (DLC), a 5-minute iteration timeout and 100 minutes for the complete crawl job.

// Creates a new instance of the built-in crawler
Crawler classname="SMLS";LT_t=1;I_max=1000;PL_max=50;PS_c=15;instanceNameSufix="_MC";

// Sets up the Crawl Job Engine variables:
CJEngineSetup TC_max=4;Tdl_max=25;Tll_max=5;Tcjl=100;
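
To make the shape of these command lines a bit more explicit, here is a small Python sketch that splits one line into its command name and parameters. It is only an illustration based on the lines shown in this post, not the actual imbWEM parser: the function name parse_command is hypothetical, and the rules it applies (command name up to the first space, parameters separated by ";", named parameters written as key=value) are assumptions inferred from the examples here.

```python
def parse_command(line):
    """Split one ACE Script line into (command, named, positional).

    Assumptions (inferred from the lines in this post, not from the
    imbWEM documentation): the command name ends at the first space,
    parameters are separated by ';', and named parameters use key=value.
    Quoted values that contain ';' or '=' are not handled.
    """
    command, _, rest = line.strip().partition(" ")
    named, positional = {}, []
    for part in rest.split(";"):
        part = part.strip()
        if not part:
            continue  # skip the empty piece left by the trailing ';'
        if "=" in part and not part.startswith('"'):
            key, value = part.split("=", 1)
            named[key.strip()] = value.strip().strip('"')
        else:
            positional.append(part.strip('"'))
    return command, named, positional


# The CJEngineSetup line from the script above.
command, named, positional = parse_command(
    "CJEngineSetup TC_max=4;Tdl_max=25;Tll_max=5;Tcjl=100;"
)
limits = {key: int(value) for key, value in named.items()}
print(command, limits)
```

Going by the values, TC_max is the thread count, Tdl_max the per-DLC limit, Tll_max the iteration timeout and Tcjl the limit for the whole crawl job, with the three time limits given in minutes.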

Then we have to open the Mining Context repository, add the plugin that will feed content into the repository after each completed DLC, and finally start the crawl job.

// Opens a new session with the Mining Context manager
MCManager.Open repo="dev"; log_msg="Initial debug session"; debug=false;

// Adds the plugin
plugin plugin_className="reportPlugIn_CrawlToMC";

// Runs the crawl job

In the final lines we close the Mining Context repository session properly, so that the summarizing procedures can be performed, and then we close the console application.

// Closes the currently opened Mining Context session
MCManager.Close log_msg="Ending debug session"; doReport=true; debug=false;

// Closes the console application

This example was written for version 0.1.* of the imbWEM framework, so keep in mind that some things might have changed since this post was created.

Download the complete ACE Script here.
