imbWBI.ConsoleTool Console Reference (v0.3.1)

017 Plugin: exp

Class imbWBI.IndustryTermModel.consolePlugin.becExperimentPlugin

imbWBI.ConsoleTool Console.becExperimentPlugin

Plugin for BEC (Business Entity Classification system), an implementation of the Industry Term Model (business category / industry description model) for classifying business entities by processing web site content. Part of imbWBI (Web Business Intelligence libraries of the imbVeles Framework). Experiment configuration and creation.

1 exp.AddFeatureDimension

Defines a new feature dimension

It creates a new feature dimension and, optionally, clears the existing ones

Command arguments:

ID Name Type Default Comment
01 type FeatureVectorDimensionType similarityFunction Dimension type. Values: directTermWeight, similarityFunction, topicWeight
02 function String CosineSimilarityFunction Name of the function, if the dimension type requires one (e.g. similarity with class dimensions)
03 clearExisting Boolean False Removes any existing feature dimensions from the experiment settings. Values: True, False

Example :

exp.AddFeatureDimension type=similarityFunction;function="CosineSimilarityFunction";clearExisting=False;
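
With the default type=similarityFunction and function="CosineSimilarityFunction", each class presumably contributes one dimension holding the similarity between a document and that class. The Python sketch below illustrates this reading with cosine similarity over sparse term-weight dictionaries; the data layout, the function names and the one-dimension-per-class interpretation are assumptions, not the plugin's own code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts term -> weight)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_features(document, class_vectors):
    """Hypothetical: one feature per class, in alphabetical class order."""
    return [cosine_similarity(document, class_vectors[name]) for name in sorted(class_vectors)]


classes = {
    "bakery": {"bread": 1.0, "pastry": 2.0},
    "metalwork": {"steel": 2.0, "welding": 1.0},
}
page = {"bread": 1.0, "pastry": 1.0}
print(similarity_features(page, classes))  # first value is 3 / sqrt(10), second is 0.0
```

A document that shares no terms with a class simply gets 0.0 in that class's dimension.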

2 exp.Dataset

Configures the dataset to be used for the experiment

Sets the path, filter and selection options

Command arguments:

ID Name Type Default Comment
01 path String G:\imbWBI\datasets\7sectors Disk drive path pointing to the root folder of the dataset (WebKB format)
02 pageLimit Int32 1 Minimum number of pages a document set (website) must have to be accepted for the experiment
03 filterEmpty Boolean True Filters empty documents out of the dataset. Values: True, False

Example :

exp.Dataset path="G:\imbWBI\datasets\7sectors";pageLimit=1;filterEmpty=True;
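
The effect of pageLimit and filterEmpty can be pictured with this minimal Python sketch. It assumes a dataset held as a dictionary from website name to a list of page texts, and it assumes that empty pages are removed before the page count is checked; neither detail is confirmed by this reference, and filter_dataset is a hypothetical name.

```python
def filter_dataset(dataset, page_limit=1, filter_empty=True):
    """Hypothetical sketch of the pageLimit / filterEmpty semantics.

    dataset maps a website (document set) name to a list of page texts.
    """
    result = {}
    for site, pages in dataset.items():
        if filter_empty:
            # drop pages that contain nothing but whitespace
            pages = [p for p in pages if p.strip()]
        # keep the website only if it still has at least page_limit pages
        if len(pages) >= page_limit:
            result[site] = pages
    return result


data = {"a.com": ["home", "   "], "b.com": [""], "c.com": ["p1", "p2"]}
print(filter_dataset(data))  # {'a.com': ['home'], 'c.com': ['p1', 'p2']}
```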

3 exp.FeatureFilter

Configures the feature selection filter

It sets the feature count limit and the ranking function

Command arguments:

ID Name Type Default Comment
01 function String IDFElement Name of the global function that will rank the features
02 limit Int32 4000 Number of features to keep
03 TDP TDPFactor chi TDP factor to apply (when used with a collection-based global element). Values: none, idf, idf_prob, chi, ig, gr, or, rf
04 IDFc IDFComputation logPlus Inverse Document Frequency computation variant. Values: logPlus, modified, DF

Example :

exp.FeatureFilter function="IDFElement";limit=4000;TDP=chi;IDFc=logPlus;
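
Feature selection by a global ranking function amounts to scoring every term and keeping the limit best. The sketch below ranks terms by plain IDF, log(N / df), highest first; it ignores the TDP factor and does not claim to reproduce the logPlus variant or the actual ranking inside IDFElement. select_features is a hypothetical name.

```python
import math
from collections import Counter


def select_features(documents, limit):
    """Rank terms by IDF = log(N / df) and keep the `limit` highest-ranked ones.

    documents is a list of token lists.
    """
    n_docs = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term at most once per document
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    # highest IDF first; ties broken alphabetically for a stable result
    ranked = sorted(idf, key=lambda term: (-idf[term], term))
    return ranked[:limit]


docs = [["steel", "weld"], ["steel", "bread"], ["steel", "weld"]]
print(select_features(docs, 2))  # ['bread', 'weld']
```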

4 exp.GlobalIGMWeight

Configures a global function based on the Inverse Gravity Moment (IGM)

It will add an IGM-based global factor and, optionally, remove any existing global factors

Command arguments:

ID Name Type Default Comment
01 l Double 7 Lambda (λ) coefficient of IGM
02 weight Double 1 Weight associated with the function
03 removeExisting Boolean False Whether any existing global factors should be removed. Values: True, False

Example :

exp.GlobalIGMWeight l=7;weight=1;removeExisting=False;
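
IGM usually refers to the inverse gravity moment of the TF-IGM weighting scheme: for a term whose per-class document frequencies are sorted in descending order f_1 >= f_2 >= ... >= f_m, igm(t) = f_1 / sum over r of (f_r * r), and the global factor is 1 + λ * igm(t). The Python sketch below computes that factor, assuming the l argument is the λ coefficient; how the plugin combines the result with weight is not documented here.

```python
def igm(class_frequencies):
    """Inverse gravity moment of a term from its per-class document frequencies."""
    f = sorted(class_frequencies, reverse=True)  # f[0] is the largest class frequency
    denominator = sum(freq * rank for rank, freq in enumerate(f, start=1))
    # a term that never occurs (or an empty list) gets 0.0
    return f[0] / denominator if denominator else 0.0


def igm_global_weight(class_frequencies, lam=7.0):
    """Global factor 1 + lambda * igm(t), as in TF-IGM."""
    return 1.0 + lam * igm(class_frequencies)


print(igm_global_weight([5, 0, 0]))  # 8.0: term concentrated in one class
print(igm_global_weight([2, 2, 2]))  # much lower: term spread evenly over classes
```

A term concentrated in a single class reaches the maximum igm of 1, while an evenly spread term is pushed towards the minimum.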

5 exp.GlobalTDPWeight

Configures a global function based on Term Discrimination Power

It will add the specified global factor and, optionally, remove any existing global factors

Command arguments:

ID Name Type Default Comment
01 factor TDPFactor chi Factor to be added. Values: none, idf, idf_prob, chi, ig, gr, or, rf
02 weight Double 1 Weight associated with the function
03 removeExisting Boolean False Whether any existing global factors should be removed. Values: True, False

Example :

exp.GlobalTDPWeight factor=chi;weight=1;removeExisting=False;
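
Of the listed TDP factors, chi is most likely the chi-square statistic between a term and a class. The sketch below computes it from the usual 2x2 contingency table of document counts; the argument names are illustrative, and how the plugin aggregates per-class scores into one global factor (for example by maximum or average) is not covered.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic of a term/class 2x2 contingency table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class without the term
    d: documents outside the class without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - c * b) ** 2 / denominator


print(chi_square(10, 10, 10, 10))  # 0.0: term independent of the class
print(chi_square(10, 0, 0, 10))    # 20.0: term occurs only in the class
```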

6 exp.GlobalWeight

Configures a global function in the feature weighting model. Supported functions: ICF, ICSd, IDF, IGM, mIDF

It will add the specified global factor and, optionally, remove any existing global factors

Command arguments:

ID Name Type Default Comment
01 function String IDFElement Name of the function element
02 weight Double 1 Weight associated with the function
03 IDF IDFComputation logPlus How IDF should be computed. Values: logPlus, modified, DF
04 removeExisting Boolean False Whether any existing global factors should be removed. Values: True, False

Example :

exp.GlobalWeight function="IDFElement";weight=1;IDF=logPlus;removeExisting=False;
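
All examples in this reference share one syntax: the command name, a space, then name=value; pairs in which String values are double-quoted, enum values are bare, numbers are written as-is and Booleans are True or False. That rule is inferred from the examples rather than from a specification. The hypothetical Python helper below generates such lines, for instance to script a batch of experiment setups; Raw marks enum values that must stay unquoted.

```python
class Raw(str):
    """Marks an enum-style value (e.g. logPlus) that must be written without quotes."""


def format_value(value):
    if isinstance(value, Raw):
        return str(value)
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, (int, float)):
        return str(value)
    return f'"{value}"'  # plain strings are double-quoted


def format_command(name, **arguments):
    """Build a console line of the form: name key=value;key=value;"""
    return name + " " + "".join(f"{key}={format_value(value)};" for key, value in arguments.items())


print(format_command("exp.GlobalWeight", function="IDFElement", weight=1,
                     IDF=Raw("logPlus"), removeExisting=False))
```

Keyword arguments keep their order, so the generated line lists the arguments in the order they were passed.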

7 exp.kNN

Sets a multi-class k-nearest neighbours (k-NN) classifier

Removes any existing classifier and sets k-NN with the specified settings

Command arguments:

ID Name Type Default Comment
01 distance DistanceFunctionType SquareEuclidean Distance function used by the k-NN classifier. Values: SquareEuclidean, Euclidean, Cosine, Jaccard, Hamming, Dice
02 k Int32 5 K parameter: number of neighbours that vote for class membership

Example :

exp.kNN distance=SquareEuclidean;k=5;
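
A minimal k-NN classifier with the default SquareEuclidean distance and k=5 majority voting looks like this in Python. It is a generic illustration of the technique, not the plugin's implementation (which may rely on a library classifier); ties are broken in favour of the label met first among the nearest neighbours, an arbitrary choice of this sketch.

```python
from collections import Counter


def square_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(sample, training, k=5):
    """Majority vote among the k training vectors closest to sample.

    training is a list of (vector, label) pairs.
    """
    nearest = sorted(training, key=lambda pair: square_euclidean(sample, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, i.e. nearest neighbour first
    return votes.most_common(1)[0][0]


training = [
    ([0.0, 0.0], "bakery"), ([0.0, 1.0], "bakery"), ([1.0, 0.0], "bakery"),
    ([9.0, 9.0], "metalwork"), ([9.0, 10.0], "metalwork"), ([10.0, 9.0], "metalwork"),
]
print(knn_classify([0.5, 0.5], training, k=5))  # bakery (3 votes against 2)
```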

8 exp.Load

Loads an experiment setup from a file

It will try to load the specified file

Command arguments:

ID Name Type Default Comment
01 filename String bec_exp Name of the setup to load

Example :

exp.Load filename="bec_exp";

9 exp.LocalWeight

Configures the local function in the feature weighting model

It sets the computation and normalization options for local (term frequency) weighting

Command arguments:

ID Name Type Default Comment
01 computation TFComputation normal Term frequency (TF) computation variant. Values: normal, squareRooted, glasgow, modifiedTF
02 normalization TFNormalization divisionByMaxTF TF normalization method. Values: divisionByMaxTF, squareRootOfSquareSum

Example :

exp.LocalWeight computation=normal;normalization=divisionByMaxTF;
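
The TF options read naturally as follows: normal uses the raw count, squareRooted its square root, divisionByMaxTF divides by the largest value in the document and squareRootOfSquareSum divides by the Euclidean norm. These formulas are inferred from the option names and are assumptions; glasgow and modifiedTF are deliberately left out of the Python sketch below, and local_weights is a hypothetical name.

```python
import math


def local_weights(term_counts, computation="normal", normalization="divisionByMaxTF"):
    """Sketch of local (TF) weighting for one document; term_counts maps term -> raw count."""
    if not term_counts:
        return {}
    if computation == "normal":
        tf = {t: float(c) for t, c in term_counts.items()}  # raw term frequency
    elif computation == "squareRooted":
        tf = {t: math.sqrt(c) for t, c in term_counts.items()}  # dampened frequency
    else:
        raise ValueError(f"computation {computation!r} is not covered by this sketch")

    if normalization == "divisionByMaxTF":
        denominator = max(tf.values())  # most frequent term gets weight 1.0
    elif normalization == "squareRootOfSquareSum":
        denominator = math.sqrt(sum(v * v for v in tf.values()))  # unit-length (L2) vector
    else:
        raise ValueError(f"normalization {normalization!r} is not covered by this sketch")

    if denominator == 0.0:
        return tf  # all counts were zero; nothing to normalize
    return {t: v / denominator for t, v in tf.items()}


print(local_weights({"steel": 4, "weld": 2}))  # {'steel': 1.0, 'weld': 0.5}
```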

10 exp.mSVM

Sets a multi-class Support Vector Machine (mSVM) classifier

Removes any existing classifier and sets mSVM with the specified settings

Command arguments:

ID Name Type Default Comment
01 loss Loss L2 Loss function to use. Values: L1, L2
02 model mSVMModels linear Model to be used with the SVM classifier. Values: linear, gaussian

Example :

exp.mSVM loss=L2;model=linear;
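
The L1 and L2 options conventionally correspond to the hinge loss and the squared hinge loss of a linear SVM. The sketch below shows both for a single sample with label y in {-1, +1}; this mapping is an assumption based on common usage, and the multi-class strategy of mSVM is not modelled.

```python
def linear_score(w, x, b=0.0):
    """Decision value w.x + b of a linear SVM."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(y, score):
    """L1 (hinge) loss for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def squared_hinge_loss(y, score):
    """L2 (squared hinge) loss: penalizes margin violations quadratically."""
    return hinge_loss(y, score) ** 2


score = linear_score([1.0, -2.0], [3.0, 1.0], b=0.5)  # 3 - 2 + 0.5 = 1.5
print(hinge_loss(+1, score), hinge_loss(-1, score), squared_hinge_loss(-1, score))  # 0.0 2.5 6.25
```

The squared variant is smooth at the margin but punishes large violations more heavily, which is the usual trade-off between the two settings.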

11 exp.PageFilter

Ranks the documents of a document set and selects the top n

It sets the DocumentFilter function of the EntityPlaneMethod

Command arguments:

ID Name Type Default Comment
01 function String DocumentEntropyFunction Name of the filter function class
02 limit Int32 -2 Number of top-ranked pages to select; -2 leaves the existing setting unchanged
03 debug Boolean True Values: True, False

Example :

exp.PageFilter function="DocumentEntropyFunction";limit=-2;debug=True;
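
The name DocumentEntropyFunction suggests that pages are ranked by an entropy score. The Python sketch below assumes Shannon entropy of the page's whitespace-separated token distribution and keeps the limit highest-scoring pages; both the scoring and the tokenization are assumptions. In this sketch a non-positive limit simply returns all pages, whereas in the plugin -2 means the existing setting is kept.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy (bits) of the token distribution of one page."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in Counter(tokens).values())


def select_top_pages(pages, limit):
    """Keep the `limit` pages with the highest entropy (stable for ties)."""
    if limit <= 0:
        return list(pages)  # this sketch treats a non-positive limit as "no limit"
    return sorted(pages, key=shannon_entropy, reverse=True)[:limit]


pages = ["a a a a", "a b c d", "a a b b"]
print(select_top_pages(pages, 2))  # ['a b c d', 'a a b b']
```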

12 exp.ParallelExecution

Sets the allowed number of parallel threads

Command arguments:

ID Name Type Default Comment
01 n Int32 5 Number of parallel threads

Example :

exp.ParallelExecution n=5;
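
Limiting work to n parallel threads is the same idea as a bounded worker pool. This Python sketch runs a list of tasks with at most n threads and also reports the peak number running at once; what the plugin actually parallelizes is not stated here, and run_with_limit is a hypothetical name.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def run_with_limit(tasks, n=5):
    """Run zero-argument callables on at most n threads.

    Returns (results in task order, peak number of tasks running at once).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def tracked(task):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return task()
        finally:
            with lock:
                active -= 1

    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(tracked, tasks))  # map preserves task order
    return results, peak


def slow_square(i):
    time.sleep(0.01)  # simulate a unit of work
    return i * i


results, peak = run_with_limit([partial(slow_square, i) for i in range(10)], n=3)
print(results, peak)  # peak never exceeds 3
```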

13 exp.RenderInstruction

Instructs the HTML-to-text extraction engine (EntityPlaneMethod) to produce text from the nodes selected by an XPath expression

It will add specified instruction to the rendering instruction set, and optionally remove all existing instructions before it.

Command arguments:

ID Name Type Default Comment
01 name String ::BODYTEXT:: Instruction name: either a human-readable descriptive name or a special instruction name such as ::BODYTEXT::
02 xpath String "" XPath expression associated with the instruction; selects the nodes to be rendered into text
03 weight Double 1 Weight factor of the instruction, i.e. the number of times the content is repeated (boosting TF)
04 remove Boolean False If True, removes all existing instructions in the set. Values: True, False

Example :

exp.RenderInstruction name="::BODYTEXT::";xpath="";weight=1;remove=False;
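
The xpath and weight arguments can be illustrated with Python's standard xml.etree.ElementTree, which supports a limited XPath subset on well-formed markup (real-world HTML generally needs a tolerant parser). The sketch below extracts the text of matching nodes and repeats it weight times to boost term frequency; special names such as ::BODYTEXT:: are not modelled, fractional weights are truncated, and render is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def render(markup, xpath, weight=1.0):
    """Text of all nodes matching xpath, repeated int(weight) times (TF boosting)."""
    root = ET.fromstring(markup)
    pieces = []
    for node in root.findall(xpath):
        # join all nested text and collapse runs of whitespace
        text = " ".join("".join(node.itertext()).split())
        if text:
            pieces.append(text)
    text = " ".join(pieces)
    if not text:
        return ""
    return " ".join([text] * int(weight))


page = "<html><body><h1>Steel  Works</h1><p>We <b>weld</b> steel.</p></body></html>"
print(render(page, ".//h1", weight=2))  # Steel Works Steel Works
```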

14 exp.Save

Saves the experiment setup

The destination folder is the console workspace folder

Command arguments:

ID Name Type Default Comment
01 filename String bec_exp Name of the setup to save

Example :

exp.Save filename="bec_exp";

15 exp.SignatureSuffix

Custom suffix appended to the configuration signature

Sets a custom suffix that is appended to the end of the signature used as the experiment run name

Command arguments:

ID Name Type Default Comment
01 suffix String word Textual suffix

Example :

exp.SignatureSuffix suffix="word";

16 exp.Validation

Configures k-fold cross-validation or single-fold validation

Updates the active instance of the configuration

Command arguments:

ID Name Type Default Comment
01 K Int32 1 Number of folds; 1 (or 0) switches to single-fold mode
02 TestFolds Int32 1 Number of folds used as test folds, usually 1
03 Randomize Boolean True Whether the content of the folds should be randomized. Values: True, False
04 LimitExecution Int32 -1 When above 0, only the specified number of folds is executed

Example :

exp.Validation K=1;TestFolds=1;Randomize=True;LimitExecution=-1;
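
A generic k-fold split with the four parameters above might look like this in Python. Rotating consecutive folds as test folds, the round-robin assignment of items to folds and the seed argument are assumptions of this sketch; single-fold mode (K of 1 or 0) is not modelled and raises an error here.

```python
import random


def make_folds(items, k, test_folds=1, randomize=True, limit_execution=-1, seed=None):
    """Return a list of (train, test) splits for k-fold cross-validation."""
    if k < 2:
        raise ValueError("single-fold mode (K <= 1) is not modelled in this sketch")
    if not 1 <= test_folds < k:
        raise ValueError("test_folds must be between 1 and k - 1")
    items = list(items)
    if randomize:
        random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]  # round-robin assignment to k folds

    splits = []
    for start in range(k):
        # each run uses test_folds consecutive folds (wrapping around) as test data
        test_idx = {(start + j) % k for j in range(test_folds)}
        test = [x for i in sorted(test_idx) for x in folds[i]]
        train = [x for i in range(k) if i not in test_idx for x in folds[i]]
        splits.append((train, test))

    if limit_execution > 0:
        splits = splits[:limit_execution]  # execute only the first folds
    return splits


for train, test in make_folds(range(10), k=5, randomize=False):
    print(test)  # [0, 5], then [1, 6], ...
```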

018 Plugin: wds

Class imbWEM.Core.consolePlugin.webDatasetPlugin

imbWBI.ConsoleTool Console.webDatasetPlugin

This is the imbACE advanced console plugin webDatasetPlugin

1 wds.ExtractDomainList

Extracts a sample list for crawling from an existing dataset

It loads the specified dataset and extracts the domain list from it

Command arguments:

ID Name Type Default Comment
01 DataSet String word Path to the dataset
02 Output String "" Path where the domain list should be saved
03 debug Boolean True Values: True, False

Example :

wds.ExtractDomainList DataSet="word";Output="";debug=True;
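
Extracting a domain list essentially reduces the crawled URLs to their unique host names. The Python sketch below does that with urllib.parse; it takes a plain list of URLs because the on-disk dataset layout is not described here, and it keeps prefixes such as www. as they are. extract_domains is a hypothetical name.

```python
from urllib.parse import urlparse


def extract_domains(urls):
    """Return the sorted unique host names found in a list of URLs."""
    domains = set()
    for url in urls:
        host = urlparse(url.strip()).hostname  # lower-cased, port removed
        if host:  # skips strings without a network location
            domains.add(host)
    return sorted(domains)


crawled = [
    "http://www.Example.com/about",
    "https://www.example.com:8080/contact",
    "http://shop.example.org/",
]
print(extract_domains(crawled))  # ['shop.example.org', 'www.example.com']
```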

2 wds.ExtractURLsFromDataset

Extracts all crawled URLs from the dataset

It creates a single .txt file listing all URLs crawled in the dataset

Command arguments:

ID Name Type Default Comment
01 runName String word Name of the report folder
02 datasetPath String "" Path to the dataset to report on, when it differs from the currently loaded one
03 debug Boolean True Values: True, False

Example :

wds.ExtractURLsFromDataset runName="word";datasetPath="";debug=True;

3 wds.GetDomains

Executes a subset compilation and sets the result as the active sample list

It queries domains from the dataset source using the specified subset compilation

Command arguments:

ID Name Type Default Comment
01 subsetCompilation String ODPBusinessDistantTopics Name of the subset compilation to activate
02 saveFile Boolean True If True, saves the result to the sample list file. Values: True, False
03 construct Boolean True If True, prepares the output WebDocumentCategory directory to store crawled content. Values: True, False
04 limit Int32 -1 Upper limit on the crawl size

Example :

wds.GetDomains subsetCompilation="ODPBusinessDistantTopics";saveFile=True;construct=True;limit=-1;

4 wds.InitDatasets

Initializes the main dataset sources

It connects to the WebKB and ODP data sources and checks their state

Example :

wds.InitDatasets 

5 wds.LoadDomainCategory

Loads the WebDomainCategory tree from the specified path

It searches the specified path and loads the hierarchical domain list

Command arguments:

ID Name Type Default Comment
01 path String word WebDomainCategory root folder to load from

Example :

wds.LoadDomainCategory path="word";

6 wds.Test

Runs several diagnostic procedures

Example :

wds.Test 
