017 Plugin: exp
Class imbWBI.IndustryTermModel.consolePlugin.becExperimentPlugin (imbWBI: Web Business Intelligence libraries of imbVeles Framework)
imbWBI.ConsoleTool Console.becExperimentPlugin
Plugin for the BEC (Business Entity Classification) system, an implementation of the Industry Term Model (a business category / industry description model) that classifies business entities by processing web site content. Part of imbWBI.... Experiment configuration and creation
1 exp.AddFeatureDimension
Defines new feature dimension
It creates a new feature dimension and optionally clears the existing ones
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | type | FeatureVectorDimensionType | similarityFunction | Dimension type. Values: directTermWeight, similarityFunction, topicWeight |
02 | function | String | CosineSimilarityFunction | Name of the function, if it is required by dimension type (e.g. similarity with class dimensions) |
03 | clearExisting | Boolean | False | Removes any existing feature dimension from the experiment settings. Values: True, False |
Example :
exp.AddFeatureDimension type=similarityFunction;function="CosineSimilarityFunction";clearExisting=False;
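The example lines in this reference all share one shape: a plugin prefix, a command name, then `name=value;` pairs. A minimal Python sketch of how such a line could be split into its parts is given below. The function `parse_command` is hypothetical and is not part of the console; it also assumes that quoted values never contain a semicolon.

```python
def parse_command(line):
    """Split 'prefix.Command key=value;key2="text";' into its parts.

    Hypothetical helper, not the console's own parser.
    Assumes quoted values contain no ';' characters.
    """
    head, _, arg_text = line.strip().partition(" ")
    prefix, _, command = head.partition(".")
    args = {}
    for pair in arg_text.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece after the trailing ';'
        key, _, value = pair.partition("=")
        args[key.strip()] = value.strip().strip('"')
    return prefix, command, args


prefix, command, args = parse_command(
    'exp.AddFeatureDimension type=similarityFunction;'
    'function="CosineSimilarityFunction";clearExisting=False;'
)
print(prefix, command, args)
```

All values are returned as strings; converting them to the types listed in the argument tables (Boolean, Int32, Double) is left out of the sketch.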
2 exp.Dataset
Configures dataset to be used for the experiment
Set path, filter and selection options
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | path | String | G:\imbWBI\datasets\7sectors | Disk path pointing to the root folder of the dataset (WebKB format) |
02 | pageLimit | Int32 | 1 | Minimum number of pages that a document set (website) must have in order to be accepted for the experiment |
03 | filterEmpty | Boolean | True | Filters out empty documents from the dataset. Values: True, False |
Example :
exp.Dataset path="G:\imbWBI\datasets\7sectors";pageLimit=1;filterEmpty=True;
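The effect of `pageLimit` and `filterEmpty` can be illustrated with a short Python sketch. It assumes that empty documents are dropped before the page count is checked; the reference does not state the order, and `filter_dataset` is a hypothetical name.

```python
def filter_dataset(sites, page_limit=1, filter_empty=True):
    """Keep websites that have at least page_limit pages.

    sites maps a domain name to a list of page texts.
    Assumption: empty pages are removed before counting.
    """
    accepted = {}
    for domain, pages in sites.items():
        if filter_empty:
            pages = [page for page in pages if page.strip()]
        if len(pages) >= page_limit:
            accepted[domain] = pages
    return accepted


sites = {
    "a.example": ["steel pipes", ""],
    "b.example": ["", "   "],
    "c.example": ["valves", "pumps"],
}
print(filter_dataset(sites, page_limit=2))
```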
3 exp.FeatureFilter
Configures feature selection filter
It will set feature count limit and function
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | function | String | IDFElement | Name of the global function that will rank the features |
02 | limit | Int32 | 4000 | Number of features to be adopted |
03 | TDP | TDPFactor | chi | TDP factor to be applied (when used with Collection basedGlobal element). Values: none, idf, idf_prob, chi, ig, gr, or, rf |
04 | IDFc | IDFComputation | logPlus | Inverse Document Frequency computation variation. Values: logPlus, modified, DF |
Example :
exp.FeatureFilter function="IDFElement";limit=4000;TDP=chi;IDFc=logPlus;
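As an illustration of rank-and-cut feature selection, the Python sketch below scores each term with a plain `log(N / df)` inverse document frequency and keeps the top `limit` terms. This formula is an assumption chosen for the example; the reference does not define the `logPlus`, `modified` or `DF` variants, and `select_features` is a hypothetical name.

```python
import math


def select_features(document_frequency, total_documents, limit=4000):
    """Rank terms by plain IDF and keep the `limit` best ones.

    Assumption: score = log(N / df); the console's IDF variants
    are not reproduced here.
    """
    scores = {
        term: math.log(total_documents / df)
        for term, df in document_frequency.items()
        if df > 0
    }
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:limit]


print(select_features({"the": 100, "steel": 5, "valve": 10}, 100, limit=2))
```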
4 exp.GlobalIGMWeight
Configures a global function based on Gravity moment
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | l | Double | 7 | Lambda factor of IGM |
02 | weight | Double | 1 | Weight associated with the function |
03 | removeExisting | Boolean | False | Whether any existing global factor should be removed. Values: True, False |
Example :
exp.GlobalIGMWeight l=7;weight=1;removeExisting=False;
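The reference does not give the formula behind this command. The Python sketch below assumes the commonly published inverse gravity moment definition, in which a term's per-class frequencies are sorted in descending order, `igm = f_1 / sum(f_r * r)`, and the global factor is `1 + lambda * igm`. Treat both the formula and the name `igm_weight` as assumptions.

```python
def igm_weight(class_frequencies, lam=7.0):
    """Global factor 1 + lam * igm for one term.

    class_frequencies: occurrences of the term in each class.
    Assumption: igm = f_1 / sum(f_r * r) with frequencies ranked
    in descending order (r = 1 for the largest).
    """
    ranked = sorted(class_frequencies, reverse=True)
    denominator = sum(f * rank for rank, f in enumerate(ranked, start=1))
    if denominator == 0:
        return 1.0  # term never occurs: igm is taken as 0 in this sketch
    return 1.0 + lam * ranked[0] / denominator


print(igm_weight([10, 0, 0]))   # term concentrated in one class
print(igm_weight([5, 5, 5]))    # term spread evenly over three classes
```

Under this definition a term concentrated in a single class receives a larger factor than a term spread evenly across classes.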
5 exp.GlobalTDPWeight
Configures a global function based on Term Discrimination Power
It will add the specified global factor and optionally remove any existing global factors
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | factor | TDPFactor | chi | Which factor should be added. Values: none, idf, idf_prob, chi, ig, gr, or, rf |
02 | weight | Double | 1 | Weight associated with the function |
03 | removeExisting | Boolean | False | Whether any existing global factor should be removed. Values: True, False |
Example :
exp.GlobalTDPWeight factor=chi;weight=1;removeExisting=False;
6 exp.GlobalWeight
Configures a global function in the feature weighting model – supports: ICF, ICSd, IDF, IGM, mIDF
It will add the specified global factor and optionally remove any existing global factors
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | function | String | IDFElement | Name of the function element |
02 | weight | Double | 1 | Weight associated with the function |
03 | IDF | IDFComputation | logPlus | How IDF should be computed. Values: logPlus, modified, DF |
04 | removeExisting | Boolean | False | Whether any existing global factor should be removed. Values: True, False |
Example :
exp.GlobalWeight function="IDFElement";weight=1;IDF=logPlus;removeExisting=False;
7 exp.kNN
Sets multi-class k-nearest neighbours classifier
Removes any existing classifier and sets k-NN with specified settings
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | distance | DistanceFunctionType | SquareEuclidean | Distance function to be used with k-NN classifier. Values: SquareEuclidean, Euclidean, Cosine, Jaccard, Hamming, Dice |
02 | k | Int32 | 5 | K parameter – number of neighbours to vote for class membership |
Example :
exp.kNN distance=SquareEuclidean;k=5;
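For readers unfamiliar with the classifier, here is a minimal Python sketch of k-nearest-neighbour voting with the squared Euclidean distance. It is a generic illustration, not the plugin's implementation; `knn_predict` and `square_euclidean` are hypothetical names.

```python
from collections import Counter


def square_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(training, query, k=5, distance=square_euclidean):
    """Return the majority label among the k training vectors nearest to query.

    training is a list of (vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.0), "retail"), ((0.0, 1.0), "retail"), ((1.0, 0.0), "retail"),
    ((5.0, 5.0), "mining"), ((5.0, 6.0), "mining"),
]
print(knn_predict(training, (0.2, 0.1), k=3))
```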
8 exp.Load
Loads an experiment setup from the file
It will try to load the specified file
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | filename | String | bec_exp | name for the setup |
Example :
exp.Load filename="bec_exp";
9 exp.LocalWeight
Configures local function in the feature weighting model
It will set computation and normalization options for feature weighting
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | computation | TFComputation | normal | Values: normal, squareRooted, glasgow, modifiedTF |
02 | normalization | TFNormalization | divisionByMaxTF | Values: divisionByMaxTF, squareRootOfSquareSum |
Example :
exp.LocalWeight computation=normal;normalization=divisionByMaxTF;
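The option names suggest how the local (term frequency) weight is derived. The Python sketch below covers only the `normal` and `squareRooted` computations with both normalizations, and the mapping from option names to formulas is an assumption inferred from those names. The `glasgow` and `modifiedTF` variants are not defined in this reference and are left out.

```python
import math


def local_weights(term_counts, computation="normal",
                  normalization="divisionByMaxTF"):
    """Normalized term-frequency weights for one document.

    Assumed meaning of the options (inferred from their names):
      normal                -> raw count
      squareRooted          -> sqrt(count)
      divisionByMaxTF       -> divide by the largest value
      squareRootOfSquareSum -> divide by the Euclidean norm
    """
    if computation == "normal":
        tf = {t: float(c) for t, c in term_counts.items()}
    elif computation == "squareRooted":
        tf = {t: math.sqrt(c) for t, c in term_counts.items()}
    else:
        raise ValueError("computation not covered by this sketch")

    if not tf:
        return {}
    if normalization == "divisionByMaxTF":
        divisor = max(tf.values())
    elif normalization == "squareRootOfSquareSum":
        divisor = math.sqrt(sum(v * v for v in tf.values()))
    else:
        raise ValueError("unknown normalization")

    if divisor == 0:
        return {t: 0.0 for t in tf}
    return {t: v / divisor for t, v in tf.items()}


print(local_weights({"steel": 4, "pipe": 2}))
```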
10 exp.mSVM
Sets multi-class Support Vector Machine classifier
Removes any existing classifier and sets mSVM with specified settings
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | loss | Loss | L2 | Loss function to set. Values: L1, L2 |
02 | model | mSVMModels | linear | Model to be used with SVM classifier. Values: linear, gaussian |
Example :
exp.mSVM loss=L2;model=linear;
11 exp.PageFilter
Ranks and selects top-n documents from a document set
It will set DocumentFilter function of EntityPlaneMethod
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | function | String | DocumentEntropyFunction | Name of filter function class |
02 | limit | Int32 | -2 | Number of top n pages to select; -2 will leave existing settings |
03 | debug | Boolean | True | Values: True, False |
Example :
exp.PageFilter function="DocumentEntropyFunction";limit=-2;debug=True;
12 exp.ParallelExecution
Sets allowed number of parallel threads
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | n | Int32 | 5 | Number of parallel threads |
Example :
exp.ParallelExecution n=5;
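The reference does not say which units of work are run in parallel. As a generic illustration of capping the number of worker threads, the Python sketch below runs a hypothetical per-item function over a list with at most `n` threads; `run_parallel` is not part of the console.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(items, work, n=5):
    """Apply work() to every item using at most n threads.

    Results are returned in the same order as the input items.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(work, items))


print(run_parallel([1, 2, 3, 4], lambda fold: fold * fold, n=2))
```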
13 exp.RenderInstruction
Instructs HTML to text extraction engine (EntityPlaneMethod) to produce text from xpath
It will add specified instruction to the rendering instruction set, and optionally remove all existing instructions before it.
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | name | String | ::BODYTEXT:: | Instruction name: a human-readable descriptive name or a special instruction name like ::BODYTEXT:: |
02 | xpath | String | | XPath associated with the instruction, selects nodes to be rendered into text |
03 | weight | Double | 1 | Weight factor of the instruction, i.e. number of times the content should be repeated (boosting TF) |
04 | remove | Boolean | False | If true it will remove any existing instruction in the set. Values: True, False |
Example :
exp.RenderInstruction name="::BODYTEXT::";xpath="";weight=1;remove=False;
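The `weight` argument is described as a repetition count that boosts term frequency. The Python sketch below shows that effect by adding each token's count multiplied by the instruction weight. It assumes a fractional weight simply scales the count, which the reference does not confirm, and `weighted_term_counts` is a hypothetical name.

```python
from collections import Counter


def weighted_term_counts(fragments):
    """Accumulate term counts from (text, weight) pairs.

    Each pair stands for the text selected by one rendering
    instruction. Assumption: a weight of 2 counts every token
    twice, as if the text had been repeated.
    """
    counts = Counter()
    for text, weight in fragments:
        for token in text.lower().split():
            counts[token] += weight
    return counts


# Title text weighted 2, body text weighted 1.
print(weighted_term_counts([("Steel pipes", 2), ("pipes and valves", 1)]))
```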
14 exp.Save
Saves the experiment setup
Destination folder is command console workspace folder
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | filename | String | bec_exp | name for the setup |
Example :
exp.Save filename="bec_exp";
15 exp.SignatureSuffix
Custom appendix to configuration signature
Sets the custom signature suffix, to be added at the end of signature used for experiment run name
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | suffix | String | word | textual suffix |
Example :
exp.SignatureSuffix suffix="word";
16 exp.Validation
Configures k-fold cross validation or single-fold validation
Updates active instance of the configuration
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | K | Int32 | 1 | Number of folds, if 1 (or 0) it will go into single-fold mode |
02 | TestFolds | Int32 | 1 | Number of folds to be used as test folds, usually 1 |
03 | Randomize | Boolean | True | Whether the content of the folds should be randomized. Values: True, False |
04 | LimitExecution | Int32 | -1 | When above 0, only specified number of folds will be executed |
Example :
exp.Validation K=1;TestFolds=1;Randomize=True;LimitExecution=-1;
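How `K`, `TestFolds`, `Randomize` and `LimitExecution` could interact is sketched below in Python. This is a generic k-fold split under stated assumptions, not the plugin's code: items are dealt round-robin into folds, each run uses `test_folds` consecutive folds for testing, and single-fold mode (K of 0 or 1) is not covered because the reference does not define it. The name `k_fold_runs` and the `seed` parameter are hypothetical.

```python
import random


def k_fold_runs(items, k, test_folds=1, randomize=True,
                limit_execution=-1, seed=0):
    """Return a list of (train, test) splits for k-fold cross validation."""
    if k < 2:
        raise ValueError("single-fold mode is not covered by this sketch")
    items = list(items)
    if randomize:
        random.Random(seed).shuffle(items)

    # Deal items round-robin into k folds.
    folds = [items[i::k] for i in range(k)]

    runs = []
    for start in range(k):
        test_ids = {(start + offset) % k for offset in range(test_folds)}
        test = [x for i in sorted(test_ids) for x in folds[i]]
        train = [x for i in range(k) if i not in test_ids for x in folds[i]]
        runs.append((train, test))

    # A positive limit_execution keeps only the first runs.
    if limit_execution > 0:
        runs = runs[:limit_execution]
    return runs


for train, test in k_fold_runs(range(10), k=5, randomize=False):
    print(test, train)
```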
018 Plugin: wds
Class imbWEM.Core.consolePlugin.webDatasetPlugin
imbWBI.ConsoleTool Console.webDatasetPlugin
This is imbACE advanced console plugin for webDatasetPlugin
1 wds.ExtractDomainList
Extracting sample list for crawl from existing data set
It will load dataset specified and extract domain list from it
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | DataSet | String | word | Path to dataset |
02 | Output | String | | Path where domain list should be saved |
03 | debug | Boolean | True | Values: True, False |
Example :
wds.ExtractDomainList DataSet="word";Output="";debug=True;
2 wds.ExtractURLsFromDataset
Extracts all crawled urls from the dataset
It creates single txt file with list of all URLs crawled by the dataset
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | runName | String | word | Name of the report folder |
02 | datasetPath | String | | Path to the dataset, for reporting on a dataset other than the currently loaded one |
03 | debug | Boolean | True | Values: True, False |
Example :
wds.ExtractURLsFromDataset runName="word";datasetPath="";debug=True;
3 wds.GetDomains
It will execute subset compilation and set result as active sample list
It will query domains from the dataset source, using subset compilation specified
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | subsetCompilation | String | ODPBusinessDistantTopics | name of the subset compilation to activate |
02 | saveFile | Boolean | True | If true it will save result to the sample list file. Values: True, False |
03 | construct | Boolean | True | If true it will prepare output WebDocumentCategory directory to store crawled content. Values: True, False |
04 | limit | Int32 | -1 | Upper limit for crawl size |
Example :
wds.GetDomains subsetCompilation="ODPBusinessDistantTopics";saveFile=True;construct=True;limit=-1;
4 wds.InitDatasets
Performs initialization of the main dataset sources
It will connect and check state of WebKB and ODP datasources
Example :
wds.InitDatasets
5 wds.LoadDomainCategory
Loads WebDomainCategory tree from specified path
It will search the specified path and load hierarchical domain list
Command arguments:
ID | Name | Type | Default | Comment |
---|---|---|---|---|
01 | path | String | word | WebDomainCategory root folder to load from |
Example :
wds.LoadDomainCategory path="word";
6 wds.Test
It will run several diagnostic procedures
Example :
wds.Test