Project files
From the imbWBI Console Tool you can load or create a project and execute an experiment. To change the project configuration, or to define your own categories, sample sets and other parameters of a web classification project, you have to edit the project's XML and text files.
Project files are located at:
- [imbWBI Console Tool path]
  - projects
    - itmPlugin
      - [name of the project]
The default project name is [itm01].
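
As a minimal sketch, the location of a project directory can be derived from the Console Tool path. It assumes the nesting is [Console Tool path] / projects / itmPlugin / [name of the project], and that the default project name is `itm01` without the brackets. The helper names `project_dir` and `category_dir` are hypothetical and not part of imbWBI.

```python
from pathlib import Path


def project_dir(console_tool_path, project_name="itm01"):
    """Return the assumed project directory below the Console Tool path."""
    # Assumed layout: <tool path>/projects/itmPlugin/<project name>
    return Path(console_tool_path) / "projects" / "itmPlugin" / project_name


def category_dir(console_tool_path, category_name, project_name="itm01"):
    """Return the assumed directory holding the files of one Category."""
    return project_dir(console_tool_path, project_name) / category_name


if __name__ == "__main__":
    # "imbWBI" and "Hotels" are placeholder values for illustration only.
    print(project_dir("imbWBI"))
    print(category_dir("imbWBI", "Hotels"))
```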
Filename | Description |
---|---|
chunkComposer.xml | Configuration of the Chunk Composer [phrase extraction unit] |
crawlSetup.xml | General configuration of inner crawling component |
DocumentSetClasses.txt | List of Categories, used in the Business Entities Classification project |
general_inclusiveTerms.txt | Specification of default term qualifications, for Inclusive term evaluation criterion |
general_irrelevantTerms.txt | Specification of default term qualifications, the terms that are irrelevant for all Categories |
industryTermModelProject.xml | The main project configuration file, controlling main aspects of experiment execution and reporting process |
nlpRepoProcessSetup.xml | Content pipeline processing setup - controls how the content from MC Repository is selected and decomposed |
CompositeTermplates | Directory with parts of experiment specification, that can be used for template-based experiment execution |
[Category name] | Directory with specification files for a category with [Category name] |
experiment[####].xml | Saved experiment specification, file numbers are automatically added. |

Files in each [Category name] directory:

Filename | Description |
---|---|
[Category name]_truthTable.txt | Specification of Category term qualifications, used by reporting plugin to help Term Categorization evaluation process by creating preliminary evaluation table |
industryClassModel.xml | Holds basic description of the Category model |
WebSiteSample.txt | List of web domains that are included in the Category |
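
Before running an experiment, it can help to check that the files listed in the tables above are present. The sketch below only uses the file names from the tables. The function names and the category name `Hotels` are hypothetical, and the check is not something the Console Tool itself is known to perform.

```python
from pathlib import Path

# File names taken from the project-level table.
PROJECT_FILES = [
    "chunkComposer.xml",
    "crawlSetup.xml",
    "DocumentSetClasses.txt",
    "general_inclusiveTerms.txt",
    "general_irrelevantTerms.txt",
    "industryTermModelProject.xml",
    "nlpRepoProcessSetup.xml",
]


def category_files(category_name):
    """File names expected inside one Category directory."""
    return [
        f"{category_name}_truthTable.txt",
        "industryClassModel.xml",
        "WebSiteSample.txt",
    ]


def missing_files(project_dir, category_names):
    """Return the relative paths of expected files that do not exist."""
    project_dir = Path(project_dir)
    missing = [n for n in PROJECT_FILES if not (project_dir / n).is_file()]
    for category in category_names:
        for name in category_files(category):
            if not (project_dir / category / name).is_file():
                missing.append(f"{category}/{name}")
    return missing


if __name__ == "__main__":
    # Placeholder path and category name, for illustration only.
    print(missing_files("projects/itmPlugin/itm01", ["Hotels"]))
```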
itm.ExperimentRange
Creates a range of experiments, with specified (single) TW, TC and RX options, and range values for LPF, DFC and STX, where:
- TW refers to the HTML Tag induced term weight factor, applied during Lemma Table construction (Industry Term Model, imbWBI)
- TC refers to the Term Category induced term weight factor, in context of the Industry Term Model (imbWBI)
- RX refers to Continual Filter - Weight Reduction, in context of the Semantic Cloud Constructor, Industry Term Model (imbWBI)
- LPF is the Cloud Matrix mechanism: Low Pass Filter, which reduces Term Weight or removes the term completely from the Semantic Cloud of a Category, based on Cloud Frequency (CF), the number of Categories having the same term in the Cloud
- DFC is the Document Frequency Factor, a multiplier that is applied to the Number of Documents variable in the TF-cIDF term weight model. If DFC = 0, it is the TF model; if DFC = 1, it is typical TF-IDF
- STX is Semantic Term Expansion, the number of steps that a lemma (term) will be expanded using the Semantic Cloud, part of the Cosine SSRM computation model (semantic similarity computation) applied in the Industry Term Model (imbWBI)
Example:
itm.ExperimentRange TW="std"; TC="std"; RX="div"; DFC="2,3.5,5"; LPF="4,3,2"; IDFOn=true; fve="CSSRM"; shell="k2"; stxStart=3; stxEnd=8; StrictPOS=false; Report=false;
Performs 9 experiment batches that share the specified TW, TC and RX configuration triple, combining DFC values 2, 3.5 and 5 with LPF values 4, 3 and 2. Each batch contains FVE models with STX = 3, 4, 5, 6 and 7. FVE refers to the Feature Vector Extraction model, which may have one or more Feature Vector Providers, each an implementation wrapper for a particular document semantic similarity computation model. Other parameters are set by the Composite Template files “CSSRM” and “k2”.
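
The arithmetic behind the 9 batches can be sketched as follows. This is an illustration only, not the tool's implementation: `parse_args` and `experiment_range` are hypothetical helpers, and treating `stxEnd` as an exclusive upper bound is an assumption inferred from the example, where `stxStart=3; stxEnd=8` yields STX values 3 to 7.

```python
from itertools import product


def parse_args(arg_text):
    """Parse 'key=value;' pairs as written in the example command."""
    args = {}
    for part in arg_text.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        args[key.strip()] = value.strip().strip('"')
    return args


def experiment_range(args):
    """Build one batch description per (DFC, LPF) combination."""
    dfc_values = [float(v) for v in args["DFC"].split(",")]
    lpf_values = [int(v) for v in args["LPF"].split(",")]
    # Assumption: stxEnd is exclusive, as the example suggests.
    stx_values = list(range(int(args["stxStart"]), int(args["stxEnd"])))
    return [
        {"TW": args["TW"], "TC": args["TC"], "RX": args["RX"],
         "DFC": dfc, "LPF": lpf, "STX": stx_values}
        for dfc, lpf in product(dfc_values, lpf_values)
    ]


example = ('TW="std"; TC="std"; RX="div"; DFC="2,3.5,5"; LPF="4,3,2"; '
           'IDFOn=true; fve="CSSRM"; shell="k2"; stxStart=3; stxEnd=8; '
           'StrictPOS=false; Report=false;')
batches = experiment_range(parse_args(example))
print(len(batches))       # prints 9 (3 DFC values x 3 LPF values)
print(batches[0]["STX"])  # prints [3, 4, 5, 6, 7]
```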
itm.ExperimentMatrix
Creates 27 experiment batches by combining all 3 TW, TC and RX value sets, and applying the specified Composite Template files, DFC value and STX value range.
itm.ExperimentMatrix LPF=6;DFC=5;IDFOn=true;fve="CSSRM";shell="k2";stxStart=2;stxEnd=3;StrictPOS=false;Report=true;
The experiments will use the CSSRM and k2 template segments (Composite Template) with LPF=6, DFC=5 and STX=2, and report files will be created for each experiment.
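
The count of 27 batches follows from 3 x 3 x 3 combinations of the TW, TC and RX value sets. The sketch below illustrates only that combinatorics. The value-set names are placeholders, since the actual sets are not listed here, and `experiment_matrix` is a hypothetical helper that again assumes `stxEnd` is exclusive.

```python
from itertools import product

# Placeholder names: the actual TW, TC and RX value sets are not listed here.
TW_SETS = ["tw_1", "tw_2", "tw_3"]
TC_SETS = ["tc_1", "tc_2", "tc_3"]
RX_SETS = ["rx_1", "rx_2", "rx_3"]


def experiment_matrix(lpf, dfc, stx_start, stx_end):
    """Build one batch description per (TW, TC, RX) combination."""
    # Assumption: stx_end is exclusive, so 2..3 gives STX=2 only.
    stx_values = list(range(stx_start, stx_end))
    return [
        {"TW": tw, "TC": tc, "RX": rx,
         "LPF": lpf, "DFC": dfc, "STX": stx_values}
        for tw, tc, rx in product(TW_SETS, TC_SETS, RX_SETS)
    ]


matrix = experiment_matrix(lpf=6, dfc=5, stx_start=2, stx_end=3)
print(len(matrix))       # prints 27
print(matrix[0]["STX"])  # prints [2]
```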