imbWBI.ConsoleTool Console Reference (v0.3.1)

020 Plugin: wem

Class imbWEM.Core.consolePlugin.crawlJobPlugin

imbWBI.ConsoleTool Console.crawlJobPlugin (imbWBI: Web Business Intelligence libraries of the imbVeles Framework)

This is an imbACE advanced console plugin for crawlJobPlugin.

1 wem.Crawler

Defines a new instance of the specified crawler. LT_t defines the link take per iteration, I_max is the iteration limit, PL_max defines the maximum number of page loads, and PS_c is the count of pages selected at the end.

The new crawler is attached to the AnalyticJobRecord and set as the current crawler at the state level.
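To make the interaction of the three limits concrete, here is a minimal Python sketch under stated assumptions: the frontier is a fixed list (no link discovery), LT_t is read as the number of links taken per iteration, and the loop stops at whichever of I_max or PL_max is reached first. The function name simulate_crawl is hypothetical, and PS_c selection is not modeled.

```python
from collections import deque


def simulate_crawl(frontier, lt_t=1, i_max=100, pl_max=50):
    """Hypothetical sketch of how LT_t, I_max and PL_max could bound a crawl.

    Returns the links that would be loaded, in load order.
    """
    queue = deque(frontier)
    loaded = []
    iteration = 0
    # Stop when the frontier is empty, I_max iterations ran, or PL_max pages loaded.
    while queue and iteration < i_max and len(loaded) < pl_max:
        iteration += 1
        # Take up to LT_t links this iteration, never exceeding PL_max in total.
        take = min(lt_t, pl_max - len(loaded), len(queue))
        for _ in range(take):
            loaded.append(queue.popleft())
    return loaded
```

For example, with 10 links, LT_t=3 and I_max=2, only 6 pages are loaded because the iteration limit is hit first.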

Command arguments:

ID Name Type Default Comment
01 classname String SM_LTS Name of the crawler class
02 LT_t Int32 1 Load take – number of parallel loads
03 I_max Int32 100 Iteration number limit
04 PL_max Int32 50 Page Loads limit
05 instanceNameSufix String Crawler name suffix
06 primLanguage basicLanguageEnum serbian Primary language. Values: serbian,italian,english,german,russian,slovenian,unknown,serbianCyr,french,catalan,danish,dutch,swedish,spanish,portugese,polish,czech,slovak,croatian,bosnian,bulgarian,macedonian,ukrainian,moldavian,romanian,greek,hungarian,hebrew,mandarin,arabic,albanian,turkish,hindi,persian,uzbek,armenian,mongolian,finnish,norwegianNB,norwegianNN,icelandic,lithuanian,latvian,estonian,vietnamese
07 secLanguage basicLanguageEnum english Secondary language. Values: serbian,italian,english,german,russian,slovenian,unknown,serbianCyr,french,catalan,danish,dutch,swedish,spanish,portugese,polish,czech,slovak,croatian,bosnian,bulgarian,macedonian,ukrainian,moldavian,romanian,greek,hungarian,hebrew,mandarin,arabic,albanian,turkish,hindi,persian,uzbek,armenian,mongolian,finnish,norwegianNB,norwegianNN,icelandic,lithuanian,latvian,estonian,vietnamese

Example :

wem.Crawler classname="SM_LTS";LT_t=1;I_max=100;PL_max=50;instanceNameSufix="";primLanguage=serbian;secLanguage=english;
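The examples use a name=value; argument syntax. The following Python sketch renders a command line in that form, assuming strings are double-quoted, booleans are written True/False, enum values (such as languages) are written bare, and numbers are written as-is. format_command and the Language enum are hypothetical helpers, not part of imbWBI, and embedded double quotes are not escaped.

```python
from enum import Enum


class Language(Enum):
    # Hypothetical subset of basicLanguageEnum values, enough for the example.
    serbian = "serbian"
    english = "english"


def format_command(name, **args):
    """Render a console command line in the name=value; form of the examples."""
    parts = []
    for key, value in args.items():  # keyword order is preserved
        if isinstance(value, bool):  # checked before int: bool subclasses int
            rendered = "True" if value else "False"
        elif isinstance(value, Enum):
            rendered = value.name  # enum values are written bare
        elif isinstance(value, str):
            rendered = f'"{value}"'  # strings are double-quoted
        else:
            rendered = str(value)  # numbers as-is
        parts.append(f"{key}={rendered};")
    return f"{name} {''.join(parts)}"
```

Called with the defaults from the argument table, it reproduces the example line above.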

2 wem.CrawlJobEngineSettings [CJES]

Crawl Job Engine controls the parallel execution of the Crawl Job.
Tdl_max (time limit for one domain level crawl, DLC, in minutes) defines the maximum number of minutes per domain level crawl, Tll_max (time limit per iteration, i.e. the inactivity time limit in minutes) applies per single link load, and TC_max (number of parallel DLC threads) defines the number of parallel domain loads.

This command sets the most important parameters of the Crawl Job execution. For Tdl_max and Tll_max, the value -1 means the limit is off; for TC_max, the value -1 means automatic management.
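Below is a small Python sketch of how the -1 conventions could be interpreted. It assumes a limit counts as exceeded only once the elapsed time is strictly greater than the limit, and, as a loudly labeled assumption, maps TC_max=-1 ("auto management") to the machine's CPU count; the engine's real auto-management policy is not documented here. Both function names are hypothetical.

```python
import os


def limit_exceeded(elapsed_minutes: float, limit_minutes: int) -> bool:
    """Tdl_max / Tll_max style check: -1 switches the limit off."""
    if limit_minutes == -1:
        return False  # limit is off
    # Assumption: the limit is exceeded only when it is strictly passed.
    return elapsed_minutes > limit_minutes


def resolve_thread_count(tc_max: int) -> int:
    """TC_max style value: -1 requests automatic management."""
    if tc_max == -1:
        # Assumption: a stand-in policy for "auto", not the engine's real one.
        return os.cpu_count() or 1
    return tc_max
```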

Command arguments:

ID Name Type Default Comment
01 TC_max Int32 8 Maximum number of parallel DLCs executing at the same moment (number of parallel DLC threads)
02 Tdl_max Int32 50 Maximum minutes allowed for a single DLC to run (time limit for one domain level crawl)
03 Tll_max Int32 20 Maximum minutes a single iteration of a DLC may take before the DLC is terminated (per-iteration inactivity time limit)
04 Tcjl_max Int32 100 Maximum minutes for the complete Crawl Job execution

Example :

wem.CrawlJobEngineSettings TC_max=8;Tdl_max=50;Tll_max=20;Tcjl_max=100;

3 wem.Job

AnalyticJob declares one experimental run; this is the first command to call in scripts with experiment definitions.

Creates a new instance of ActivityJob and assigns it to the current state.
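Since wem.Job must come first in an experiment script, a script can be checked with a simple lint. This Python sketch is hypothetical (job_declared_first is not part of the console); it only looks at lines starting with "wem." and reports whether the first such command is wem.Job.

```python
def job_declared_first(script_text: str) -> bool:
    """Hypothetical lint: True if wem.Job is the first wem.* command in the script."""
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("wem."):
            continue  # skip blank lines and anything that is not a wem command
        command = stripped.split(None, 1)[0]  # command name before the arguments
        return command == "wem.Job"
    return False  # no wem commands at all
```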

Command arguments:

ID Name Type Default Comment
01 jobName String job Name of the Job to define
02 jobDesc String Description for the job
03 defaultStage Boolean True If true, it prepares the default crawler stage in which the crawl is executed. Values: True,False
04 stampPrefix String Prefix for the timestamp
05 stampCount Int32 1 Stamp version count

Example :

wem.Job jobName="job";jobDesc="";defaultStage=True;stampPrefix="";stampCount=1;

4 wem.OpenSession

Selects and preloads the local index and experiment session information. The useJobSettings option ignores the other parameters and uses the Job definition instead.

It sets the report output information and creates or loads the local index.
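The useJobSettings rule (other parameters are ignored) amounts to choosing one settings source wholesale rather than merging fields. A minimal Python sketch, with hypothetical SessionSettings and effective_session names:

```python
from dataclasses import dataclass


@dataclass
class SessionSettings:
    # Field names mirror the wem.OpenSession arguments (hypothetical class).
    experiment_session: str = ""
    index_id: str = ""
    crawl_folder_name_template: str = "*"


def effective_session(passed: SessionSettings,
                      job: SessionSettings,
                      use_job_settings: bool = False) -> SessionSettings:
    """Return the settings OpenSession would act on.

    With use_job_settings=True the passed arguments are ignored entirely and
    the Job definition is used; no per-field merging takes place.
    """
    return job if use_job_settings else passed
```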

Alias: OpenSession

Command arguments:

ID Name Type Default Comment
01 experimentSession String
02 IndexID String
03 useJobSettings Boolean False Values: True,False
04 crawlFolderNameTemplate String *

Example :

wem.OpenSession experimentSession="";IndexID="";useJobSettings=False;crawlFolderNameTemplate="*";

5 wem.Plugin

Allows additional customization of the execution through a crawling plugin.

It creates an instance of the specified plugin and adds it to the proper collection.
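One way to picture "create an instance of the specified plugin and add it to the proper collection" is a name-to-class registry. In this Python sketch, CrawlingPlugin, LinkFilterPlugin, PLUGIN_TYPES and add_plugin are all hypothetical; the real console resolves class names in its own way, and a single list stands in for the proper collection.

```python
class CrawlingPlugin:
    """Hypothetical base class for crawling plugins."""


class LinkFilterPlugin(CrawlingPlugin):
    """Hypothetical example plugin."""


# Registry mapping a class name (as given in plugin_classname) to its class.
PLUGIN_TYPES = {cls.__name__: cls for cls in (LinkFilterPlugin,)}


def add_plugin(classname: str, collection: list) -> CrawlingPlugin:
    """Instantiate the plugin registered under classname and append it."""
    try:
        plugin_type = PLUGIN_TYPES[classname]
    except KeyError:
        raise ValueError(f"unknown plugin class: {classname}") from None
    plugin = plugin_type()
    collection.append(plugin)
    return plugin
```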

Command arguments:

ID Name Type Default Comment
01 plugin_classname String * Proper name of the crawling plugin class

Example :

wem.Plugin plugin_classname="*";

6 wem.Run [R]

Runs the current crawl job

Starts crawl execution

Example :

wem.Run 

7 wem.SampleFile

Imports a sample from a text file.

Loads the file and adds the domain URLs from it to the context’s sample list.

Command arguments:

ID Name Type Default Comment
01 path String * Path to the file with samples; if *, a dialog opens to select the file
02 inWorkspace Boolean True If true, the file path is interpreted as relative to the console workspace. Values: True,False
03 sampleName String Name of the sample list; if empty, the current sample list name is not changed
04 replace Boolean False If true, it replaces any existing samples in the list. Values: True,False
05 skip Int32 0 Number of entries to skip from the imported file
06 limit Int32 -1 If set above 0, it limits the total number of domains imported
07 debug Boolean True If true, it reports on link preprocessing. Values: True,False
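The skip, limit and replace arguments combine as follows in this minimal Python sketch. It assumes the file has already been read into lines, that blank lines are not counted as entries, and that skip is applied before limit; link preprocessing (reported when debug is true) and the inWorkspace path handling are not modeled. import_samples is a hypothetical name.

```python
def import_samples(lines, existing, skip=0, limit=-1, replace=False):
    """Sketch of SampleFile's skip/limit/replace semantics on already-read lines."""
    # Assumption: blank lines are ignored and do not count toward skip/limit.
    entries = [line.strip() for line in lines if line.strip()]
    entries = entries[skip:]  # drop the first `skip` entries
    if limit > 0:
        entries = entries[:limit]  # only a positive limit caps the import
    # replace=True discards the current list; otherwise new entries are appended.
    result = [] if replace else list(existing)
    result.extend(entries)
    return result
```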

Example :

wem.SampleFile path="*";inWorkspace=True;sampleName="";replace=False;skip=0;limit=-1;debug=True;
