020 Plugin: wem
Class imbWEM.Core.consolePlugin.crawlJobPlugin
imbWBIWeb Business Intelligence libraries of imbVeles Framework..ConsoleTool Console.crawlJobPlugin
This is an imbACE advanced console plugin for crawlJobPlugin.
1 wem.Crawler
Defines a new instance of the specified crawler. LT_t defines the link take per iteration, I_max is the iteration limit, PL_max defines the maximum number of page loads, and PS_c is the count of selected pages at the end.
The new crawler is attached to the AnalyticJobRecord and set as current on the state level.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | classname | String | SM_LTS | Name of the crawler class |
02 | LT_t | Int32 | 1 | Load take – number of parallel loads |
03 | I_max | Int32 | 100 | Iteration number limit |
04 | PL_max | Int32 | 50 | Page Loads limit |
05 | instanceNameSufix | String | | Crawler name suffix |
06 | primLanguage | basicLanguageEnum | serbian | Primary language. Values: serbian,italian,english,german,russian,slovenian,unknown,serbianCyr,french,catalan,danish,dutch,swedish,spanish,portugese,polish,czech,slovak,croatian,bosnian,bulgarian,macedonian,ukrainian,moldavian,romanian,greek,hungarian,hebrew,mandarin,arabic,albanian,turkish,hindi,persian,uzbek,armenian,mongolian,finnish,norwegianNB,norwegianNN,icelandic,lithuanian,latvian,estonian,vietnamese |
07 | secLanguage | basicLanguageEnum | english | Secondary language. Values: serbian,italian,english,german,russian,slovenian,unknown,serbianCyr,french,catalan,danish,dutch,swedish,spanish,portugese,polish,czech,slovak,croatian,bosnian,bulgarian,macedonian,ukrainian,moldavian,romanian,greek,hungarian,hebrew,mandarin,arabic,albanian,turkish,hindi,persian,uzbek,armenian,mongolian,finnish,norwegianNB,norwegianNN,icelandic,lithuanian,latvian,estonian,vietnamese |
Example :
wem.Crawler classname="SM_LTS";LT_t=1;I_max=100;PL_max=50;instanceNameSufix="";primLanguage=serbian;secLanguage=english;
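The argument syntax used in the example line (name=value pairs, each terminated by a semicolon, with strings in double quotes and enum values bare) can be reproduced with a small helper when scripts are generated programmatically. This is a minimal Python sketch, not part of the plugin: `format_command` and its `bare` parameter are hypothetical names, and the quoting rules are inferred only from the example lines in this reference.

```python
def format_command(name, args, bare=()):
    """Build a console command line in the name=value; style shown above.

    Hypothetical helper: quoting rules are inferred from the documented
    examples, not taken from the console's own parser.
    """
    parts = []
    for key, value in args.items():
        if isinstance(value, bool):
            # Booleans appear as True / False in the examples.
            text = "True" if value else "False"
        elif isinstance(value, str) and key not in bare:
            # Plain strings are wrapped in double quotes.
            text = '"' + value + '"'
        else:
            # Integers and enum values (listed in `bare`) are written as-is.
            text = str(value)
        parts.append(f"{key}={text};")
    if not parts:
        return name
    return name + " " + "".join(parts)


crawler_line = format_command(
    "wem.Crawler",
    {
        "classname": "SM_LTS",
        "LT_t": 1,
        "I_max": 100,
        "PL_max": 50,
        "instanceNameSufix": "",
        "primLanguage": "serbian",
        "secLanguage": "english",
    },
    bare=("primLanguage", "secLanguage"),
)
print(crawler_line)
```

With the default values from the table, the helper reproduces the example line above character for character.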
2 wem.CrawlJobEngineSettings [CJES]
Crawl Job Engine controls the parallel execution of the Crawl Job.
Tdl_max (time limit for one domain level crawl (DLC), in minutes) defines the maximum minutes per one domain level crawl, Tll_max (time limit per iteration, inactivity time limit in minutes) applies per single link load, and TC_max (allowed number of parallel DLC threads) defines the number of parallel domain loads.
This command sets the most important parameters of the Crawl Job execution. For Tdl_max and Tll_max, the value -1 means the limit is off; for TC_max, the value -1 means auto management.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | TC_max | Int32 | 8 | Maximum number of parallel DLCs executing at the same moment |
02 | Tdl_max | Int32 | 50 | Maximum minutes allowed for a single DLC to run |
03 | Tll_max | Int32 | 20 | Maximum minutes of a single iteration allowed for a DLC before its termination |
04 | Tcjl_max | Int32 | 100 | Maximum minutes for the complete Crawl Job execution |
Example :
wem.CrawlJobEngineSettings TC_max=8;Tdl_max=50;Tll_max=20;Tcjl_max=100;
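The -1 conventions described above can be illustrated with a short Python sketch. It only models the documented meaning of the sentinel values; the function names are hypothetical, the strict "greater than" comparison is an assumption, and nothing here reflects how the engine itself enforces the limits.

```python
LIMIT_OFF = -1      # documented sentinel for the time limits: limit is off
AUTO_THREADS = -1   # documented sentinel for the thread count: auto management


def limit_exceeded(elapsed_minutes, limit_minutes):
    """Return True when a time limit is active and has been passed."""
    if limit_minutes == LIMIT_OFF:
        return False  # a disabled limit can never be exceeded
    # Assumption: the limit counts as exceeded only when strictly passed.
    return elapsed_minutes > limit_minutes


def thread_mode(tc_max):
    """Describe how the parallel thread setting would be read."""
    if tc_max == AUTO_THREADS:
        return "auto"
    return f"fixed:{tc_max}"


print(limit_exceeded(55, 50))   # default domain level limit of 50 minutes
print(limit_exceeded(55, -1))   # limit switched off
print(thread_mode(8))
print(thread_mode(-1))
```

Whether -1 also disables the overall Tcjl_max limit is not stated in this reference, so the sketch leaves that parameter out.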
3 wem.Job
AnaliticJob declares one experimental run; this is the first command to call in scripts with experiment definitions.
Creates a new instance of ActivityJog and assigns it to the current state.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | jobName | String | job | Name of the Job to define |
02 | jobDesc | String | | Description for the job |
03 | defaultStage | Boolean | True | If true, it will prepare the default crawler stage to execute the crawl in. Values: True,False |
04 | stampPrefix | String | | Prefix at timestamp |
05 | stampCount | Int32 | 1 | Stamp version count |
Example :
wem.Job jobName="job";jobDesc="";defaultStage=True;stampPrefix="";stampCount=1;
4 wem.OpenSession
Selects and preloads the local index and Experiment session information. The useJobSettings option ignores the other parameters and uses the Job definition.
It sets the report output information and creates or loads the local index.
Alias: OpenSession
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | experimentSession | String | | |
02 | IndexID | String | | |
03 | useJobSettings | Boolean | False | Values: True,False |
04 | crawlFolderNameTemplate | String | * | |
Example :
wem.OpenSession experimentSession="";IndexID="";useJobSettings=False;crawlFolderNameTemplate="*";
5 wem.Plugin
Allows additional customization of the execution through a crawling plugin.
It creates an instance of the specified plugin and adds it to the proper collection.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | plugin_classname | String | * | Proper name of the crawling plugin class |
Example :
wem.Plugin plugin_classname="*";
6 wem.Run [R]
Runs the current crawl job
Starts crawl execution
Example :
wem.Run
7 wem.SampleFile
Imports a sample from a text file.
Loads the file and adds the domain URLs from it to the context’s sample list.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | path | String | * | Path to the file with samples; if *, it opens a dialog to select the file |
02 | inWorkspace | Boolean | True | If true, the file path is interpreted as relative to the console workspace. Values: True,False |
03 | sampleName | String | | Name of the sample list; if empty, the current sample list name is not changed |
04 | replace | Boolean | False | If set to true, it replaces any existing samples in the list. Values: True,False |
05 | skip | Int32 | 0 | Number of entries to skip from the imported file |
06 | limit | Int32 | -1 | If set above 0, it limits the total number of domains imported |
07 | debug | Boolean | True | If true, it reports on link preprocessing. Values: True,False |
Example :
wem.SampleFile path="*";inWorkspace=True;sampleName="";replace=False;skip=0;limit=-1;debug=True;
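How replace, skip and limit interact can be shown with a small Python sketch. It is an illustration only: `import_sample` is a hypothetical function, and it assumes one domain per line, that blank lines are ignored, and that skip is applied before limit. None of these details are confirmed by this reference.

```python
def import_sample(lines, current, replace=False, skip=0, limit=-1):
    """Model of the documented replace / skip / limit arguments.

    Assumptions (not confirmed by the reference): one domain per line,
    blank lines are ignored, and skip is applied before limit.
    """
    entries = [line.strip() for line in lines if line.strip()]
    entries = entries[skip:]          # drop the first `skip` entries
    if limit > 0:                     # values of 0 or below mean no limit
        entries = entries[:limit]
    # replace=True discards the existing sample list before importing
    result = [] if replace else list(current)
    result.extend(entries)
    return result


file_lines = ["a.example", "b.example", "", "c.example", "d.example"]
existing = ["x.example"]

appended = import_sample(file_lines, existing, skip=1, limit=2)
replaced = import_sample(file_lines, existing, replace=True, skip=1, limit=2)
print(appended)
print(replaced)
```

With skip=1 and limit=2, the first entry of the file is dropped and the next two are imported; the existing entry survives only when replace is left at False.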