Working with IndustryTermModel plugin

Project files

From the imbWBI Console Tool (imbWBI: Web Business Intelligence libraries of the imbVeles Framework) you can load or create a project and execute experiments. To change the project configuration, or to define your own categories, sample sets and other parameters of a web classification project, you have to edit the project's XML and text files.

Project files are located at:

  • [imbWBI Console Tool path]
    • projects
      • itmPlugin
        • [name of the project]

Default project name is [itm01].
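
The folder layout above can be expressed as a small path helper. This is only a sketch: the function name `project_dir` and the install location "imbWBI-console" are hypothetical, and it assumes the default project folder is literally named itm01 (without the brackets).

```python
from pathlib import Path


def project_dir(console_tool_path, project_name="itm01"):
    """Return <console tool path>/projects/itmPlugin/<project name>.

    Assumption: the default project folder is named "itm01".
    """
    return Path(console_tool_path) / "projects" / "itmPlugin" / project_name


# "imbWBI-console" is a placeholder, not a real default install location.
print(project_dir("imbWBI-console"))
```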

ITM Project Folder: files in the itmPlugin's project directory

Filename                      Description
chunkComposer.xml             Configuration of the Chunk Composer (phrase extraction unit)
crawlSetup.xml                General configuration of inner crawling component
DocumentSetClasses.txt        List of Categories, used in the Business Entities Classification project
general_inclusiveTerms.txt    Specification of default term qualifications, for Inclusive term evaluation criterion
general_irrelevantTerms.txt   Specification of default term qualifications, the terms that are irrelevant for all Categories
industryTermModelProject.xml  The main project configuration file, controlling main aspects of experiment execution and reporting process
nlpRepoProcessSetup.xml       Content pipeline processing setup - controls how the content from MC Repository is selected and decomposed
CompositeTermplates           Directory with parts of experiment specification, that can be used for template-based experiment execution
[Category name]               Directory with specification files for a category with [Category name]
experiment[####].xml          Saved experiment specification, file numbers are automatically added.

ITM Project Category files: files in the itmPlugin project's category subdirectory for [Category name]

Filename                        Description
[Category name]_truthTable.txt  Specification of Category term qualifications, used by reporting plugin to help Term Categorization evaluation process by creating preliminary evaluation table
industryClassModel.xml          Holds basic description of the Category model
WebSiteSample.txt               List of web domains that are included in the Category
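
To inspect which domains belong to which category, the two list files can be read with a few lines of Python. This is a sketch under stated assumptions, not a description of how imbWBI parses its files: it assumes both DocumentSetClasses.txt and WebSiteSample.txt hold one entry per line, and that each category subdirectory is named exactly as the category appears in DocumentSetClasses.txt. The function names are hypothetical.

```python
from pathlib import Path


def read_list_file(path):
    """Return the non-empty, stripped lines of a plain-text list file.

    Assumption: one entry per line; the real imbWBI format may differ.
    """
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_category_domains(project_dir):
    """Map each category name to the entries of its WebSiteSample.txt.

    Assumption: the category folder name equals the category name
    listed in DocumentSetClasses.txt.
    """
    project_dir = Path(project_dir)
    categories = read_list_file(project_dir / "DocumentSetClasses.txt")
    return {
        name: read_list_file(project_dir / name / "WebSiteSample.txt")
        for name in categories
    }


# Usage (the path is a placeholder):
# domains = load_category_domains("projects/itmPlugin/itm01")
```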

itm.ExperimentRange

Creates a range of experiments with a single specified option for each of TW, TC and RX, and ranges of values for LPF, DFC and STX. In the context of the Industry Term Model (imbWBI), the abbreviations mean:

  • TW: HTML Tag induced term weight factor, applied during Lemma Table construction
  • TC: Term Category induced term weight factor
  • RX: Continual Filter (Weight Reduction), part of the Semantic Cloud Constructor
  • LPF: Low Pass Filter, a Cloud Matrix mechanism that reduces Term Weight or removes the term completely from the Semantic Cloud of a Category, based on Cloud Frequency (CF), the number of Categories having the same term in the Cloud
  • DFC: Document Frequency Factor, a multiplier applied to the Number of Documents variable in the TF-cIDF term weight model. If DFC = 0, it is the TF model; if DFC = 1, it is typical TF-IDF
  • STX: Semantic Term Expansion, the number of steps that a lemma (term) will be expanded using the Semantic Cloud; part of the Cosine SSRM semantic similarity computation model

Example:

itm.ExperimentRange TW="std"; TC="std"; RX="div"; DFC="2,3.5,5"; LPF="4,3,2"; IDFOn=true; fve="CSSRM"; shell="k2"; stxStart=3; stxEnd=8; StrictPOS=false; Report=false;

Performs 9 experiment batches that share the specified TW, TC and RX configuration triple and combine the DFC values 2, 3.5 and 5 with the LPF values 4, 3 and 2. Each batch contains FVE models with STX = 3, 4, 5, 6 and 7. (FVE is the Feature Vector Extraction model, which may have one or more Feature Vector Providers; a provider is an implementation wrapper for a particular document semantic similarity computation model.) Other parameters are set by the Composite Template files “CSSRM” and “k2”.
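
The batch count can be checked with a short enumeration. This sketch is not the tool's implementation: the function name `experiment_range` is hypothetical, and it assumes stxEnd is exclusive, which matches the STX values 3 to 7 listed above for stxStart=3 and stxEnd=8.

```python
from itertools import product


def experiment_range(dfc_values, lpf_values, stx_start, stx_end):
    """Enumerate (DFC, LPF) batches, each with the same list of STX values.

    Assumption: stx_end is exclusive, so 3..8 yields STX = 3, 4, 5, 6, 7.
    """
    stx_values = list(range(stx_start, stx_end))
    return [
        {"DFC": dfc, "LPF": lpf, "STX": stx_values}
        for dfc, lpf in product(dfc_values, lpf_values)
    ]


batches = experiment_range([2, 3.5, 5], [4, 3, 2], stx_start=3, stx_end=8)
print(len(batches))       # 9
print(batches[0]["STX"])  # [3, 4, 5, 6, 7]
```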

 

itm.ExperimentMatrix

Creates 27 experiment batches by combining all 3 TW, TC and RX value sets, and applies the specified Composite Template files, DFC value and STX value range to each of them.

itm.ExperimentMatrix LPF=6;DFC=5;IDFOn=true;fve="CSSRM";shell="k2";stxStart=2;stxEnd=3;StrictPOS=false;Report=true;

The experiments will use the CSSRM and k2 template segments (Composite Templates) with LPF=6, DFC=5 and STX=2, and report files will be created for each experiment.
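
The count of 27 follows from taking every combination of three options for each of TW, TC and RX (3 × 3 × 3). The sketch below illustrates this; the option labels are placeholders rather than the tool's actual option names, the function name is hypothetical, and stxEnd is again assumed to be exclusive.

```python
from itertools import product


def experiment_matrix(tw_options, tc_options, rx_options, fixed):
    """Combine every TW, TC and RX option; 'fixed' is copied into each batch."""
    return [
        {"TW": tw, "TC": tc, "RX": rx, **fixed}
        for tw, tc, rx in product(tw_options, tc_options, rx_options)
    ]


# Placeholder option labels; the real option names are defined by the tool.
tw_options = ["tw_a", "tw_b", "tw_c"]
tc_options = ["tc_a", "tc_b", "tc_c"]
rx_options = ["rx_a", "rx_b", "rx_c"]

# Values taken from the example command; range(2, 3) assumes stxEnd is exclusive.
fixed = {"LPF": 6, "DFC": 5, "STX": list(range(2, 3))}

matrix = experiment_matrix(tw_options, tc_options, rx_options, fixed)
print(len(matrix))  # 27
```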

Attachments

  • imbWBI Console Tool v0.3.1
    2nd release of the imbWBI Experimentation Console Tool
    File size: 101 MB Downloads: 403
  • imbWBI Console Reference
    imbWBI Console Tool autogenerated help document
    File size: 147 KB Downloads: 753