imbWBI Console Tool (version 0.4, 3rd alpha release)
- Unpack the archive and run the .msi Windows Installer package: imbWBI Console Tool v0.4
imbWBI Console Tool (version 0.3.1, 2nd alpha release)
- Unpack the archive and run the .msi Windows Installer package: imbWBI Console Tool 0.3.1
imbWBI Console Tool (version 0.1.2, first alpha release)
- Unpack the archive and run the .msi Windows Installer package: imbWBI Console Tool v0.1.2
Installation path
We recommend installing the software on a short path outside OS-restricted folders (e.g. D:\imbWBI), for several reasons:
- All ACE Console Applications deploy content after installation and regularly create auxiliary report files, logs and cache files. If you install the application in an OS-restricted location (such as Program Files), you will have to run the program in Administrator mode, which is not a smart idea: this is a pre-alpha release and many things could go wrong.
- Keep the installation path short: depending on the experiment being performed and the reporting options used, generated file paths may exceed the Windows maximum path length (260 characters by default), causing a crash or, at best, incomplete reporting.
- During imbWEM operations (web crawling), the BrightStarDB triplestore engine is used intensively. For performance reasons, it is better to install the program on a drive other than the primary OS drive.
- imbNLP and imbWBI both use morphosyntactic resources, which are usually very large text files. In our experiments with the Web Classification algorithm, loading the SrLex 1.2 dictionary alone usually took 2-3 minutes. Here again, if feasible, keep the OS and the imbWBI Console Tool on two separate HDD/SSD units.
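To judge whether a chosen installation folder leaves enough headroom, the short Python sketch below joins the folder with a few relative report paths and flags any result longer than the classic Windows limit (MAX_PATH, 260 characters including the terminating NUL). The report file names are hypothetical placeholders, since the actual folder layout the tool generates is not documented here; `ntpath` is used so the check joins paths the Windows way on any OS.

```python
import ntpath

# Classic Windows MAX_PATH is 260 characters including the terminating NUL,
# so a usable path string can be at most 259 characters long.
MAX_USABLE = 259


def paths_over_limit(install_dir, relative_paths):
    """Return (full_path, length) pairs that would exceed the classic Windows limit."""
    over = []
    for rel in relative_paths:
        # ntpath joins with backslashes, mimicking Windows behaviour everywhere
        full = ntpath.join(install_dir, rel)
        if len(full) > MAX_USABLE:
            over.append((full, len(full)))
    return over


if __name__ == "__main__":
    # Hypothetical deep report path; the real report layout may differ.
    report = "reports\\" + "experiment_" * 20 + "\\summary.xlsx"
    for folder in ("D:\\imbWBI",
                   "C:\\Users\\someone\\AppData\\Local\\Programs\\imbWBI Console Tool"):
        print(folder, "->", paths_over_limit(folder, [report]) or "OK")
```

With the same hypothetical report path, the short D:\imbWBI folder stays under the limit, while a deep per-user folder under AppData pushes it well past 259 characters.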
Recommended hardware configuration
- 10-fold cross-validation and multivariate practical evaluation of the system consume a lot of computing power and RAM. Loading the Serbian morphosyntactic dictionary alone makes the imbWBI Console Tool allocate 3-4 GB of RAM. Crawling experiments were also RAM- and CPU-intensive. Most imbWBI, imbNLP and imbWEM operations are implemented in parallel, using multi-threading to cut execution times, which keeps all CPU cores busy during experiments. Therefore, we recommend at least 20 GB of RAM, a 64-bit quad-core CPU and, preferably, running the program from a dedicated SSD.
Software requirements
- imbVeles Framework is developed for .NET Framework 4.0, so it should run even on Windows XP SP3 (the oldest XP service pack supported by .NET Framework 4.0). So far, it has not been tested on any Linux distribution, but in theory it should run under Mono, since .NET Framework 4.0 lies well within the compatibility level Mono has already achieved (currently up to .NET 4.7).
- To perform imbWEM experiments (Semantic Inner Crawling), you need the BrightStarDB triplestore, accessible via HTTP or running as a Windows service (http://brightstardb.com/).
- A MySQL database is supported as storage for System Knowledge (Named Entity Tables, special Data Mining POS tag tables, TLD index, index of world countries, languages…), but the default configuration of the system does not require it. If you do use MySQL, keep in mind that, by default, the imbACE subsystem looks for the server on the local host (127.0.0.1), since WAMP was used during development (http://www.wampserver.com/en/).
- Excel files are generated with open-source libraries, so no installation of Microsoft Office (or OpenOffice/LibreOffice) is required for proper reporting.
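If you opt for MySQL, a quick way to confirm that a server is listening where imbACE expects it by default is a plain TCP probe of the local host. This Python sketch assumes MySQL's standard port 3306 (the port imbACE actually uses is not stated here) and only shows that something accepts connections on that port, not that it is a correctly configured MySQL instance.

```python
import socket


def mysql_reachable(host="127.0.0.1", port=3306, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Assumes the MySQL default port 3306; this only checks that a listener
    exists, it does not authenticate or speak the MySQL protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


if __name__ == "__main__":
    print("MySQL on 127.0.0.1:3306 reachable:", mysql_reachable())
```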
Included resources
The installation bundles linguistic resources and various data sets used by different components of the system.
- The Top Level Domain index is an XML file containing a comprehensive relational table that links TLDs with their global WHOIS servers, national and international sub-TLDs, and basic national identification (ISO codes). The TLD index is critical for the proper operation of the imbWEM crawlers and, consequently, of the imbWBI Web Classification algorithm.
- Another important installer option to consider is “Lexical Resources: Standard set of Hunspell dictionaries”.
- Hunspell dictionaries are used by many components of imbNLP, imbWEM and imbWBI. imbWEM and imbNLP provide several algorithms for detecting the language of text content; the most precise one combines a given number of different Hunspell dictionaries to filter out false positives. Working with a larger number of dictionaries proved critical, especially when languages from the same group had to be distinguished (e.g. Slovenian vs. Serbian vs. Russian).
- Besides Hunspell monolingual dictionaries, the current version of imbNLP supports Apertium bilingual dictionaries, the decompressed Unitex/GramLab inflectional lexicon format (tag-set) and the MULTEXT-East morphosyntactic tag-set format used by SrLex 1.2. The last of these, SrLex 1.2, is bundled with the installer, since we use this resource in our recent research on Web Classification.
- The MULTEXT-East tag-set interpretation is specified in an external Excel spreadsheet that you can modify to make the system compatible with the language and tag-set format of your interest. The imbNLP morphosyntactic framework is designed to provide a common foundation for the different enumeration schemes and systematizations used by lexical resources across the globe. Such a goal seems impossible to achieve completely at any reasonable level of abstraction, but we will continue development toward greater compatibility across tag-set formats. If the current scope of the imbNLP framework fits your needs, consider modifying the Excel spreadsheet template delivered with this installation.
- To reproduce our research on Web Classification of manufacturing business entities, the best way is to install all optional resources shipped with the Windows Installer. You will also need the additional resources given below as separate downloads. Detailed instructions on reproducing our experiments are given here.
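The exact way imbWEM queries the TLD index is not described here, but the core task of mapping a host name to its TLD entry, where a national sub-TLD such as co.uk must win over uk, is commonly solved with longest-suffix matching. The Python sketch below illustrates that technique on a hypothetical in-memory excerpt that keeps only the suffix and an ISO 3166-1 country code; the real XML index carries more columns (e.g. WHOIS servers) and its schema is not reproduced.

```python
# Hypothetical excerpt of a TLD index: suffix -> ISO 3166-1 alpha-2 code.
# The real index is an XML table with more columns; this is illustration only.
TLD_INDEX = {
    "rs": "RS",
    "co.rs": "RS",
    "uk": "GB",
    "co.uk": "GB",
    "de": "DE",
    "com": None,  # generic TLD without national identification
}


def resolve_tld(hostname, index=TLD_INDEX):
    """Return (matched_suffix, iso_code) using longest-suffix matching."""
    labels = hostname.lower().rstrip(".").split(".")
    # Start with the longest proper suffix, so "co.uk" is tried before "uk".
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in index:
            return suffix, index[suffix]
    return None, None


if __name__ == "__main__":
    for host in ("www.example.co.uk", "firma.co.rs", "shop.example.com"):
        print(host, "->", resolve_tld(host))
```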
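As a rough illustration of why combining several dictionaries helps against false positives, the following Python sketch counts only the words accepted by exactly one dictionary: forms shared by closely related languages carry no evidence and are ignored. This is a simplification built on assumptions, not the imbNLP algorithm: plain word sets stand in for Hunspell dictionaries (no affix rules), and the `min_share` threshold and the toy vocabularies are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy vocabularies standing in for Hunspell dictionaries.
TOY_DICTIONARIES = {
    "sr": {"i", "preduzeće", "proizvodi", "alat", "od", "čelika"},
    "sl": {"in", "podjetje", "izdeluje", "orodje", "iz", "jekla", "od"},
    "ru": {"и", "предприятие", "производит", "инструмент", "из", "стали"},
}


def detect_language(text, dictionaries, min_share=0.5):
    """Return the best language code, or None if the evidence is too weak."""
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for token in tokens:
        langs = [lang for lang, words in dictionaries.items() if token in words]
        # A word accepted by several dictionaries does not discriminate
        # between them, so it is skipped to avoid false positives.
        if len(langs) == 1:
            hits[langs[0]] += 1
    if not hits:
        return None
    lang, count = hits.most_common(1)[0]
    # Require the winner to hold a clear share of the discriminative tokens.
    return lang if count / sum(hits.values()) >= min_share else None


if __name__ == "__main__":
    print(detect_language("Preduzeće proizvodi alat od čelika.", TOY_DICTIONARIES))
```

Note how "od" is valid in both the Serbian and the Slovenian toy lists and is therefore ignored; the more dictionaries take part, the more such shared forms are recognized as non-discriminative.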
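MULTEXT-East morphosyntactic descriptions (MSDs) are positional: the first character gives the part of speech and each following character encodes one attribute, with "-" marking an attribute that does not apply. The Python sketch below decodes noun MSDs such as Ncmsn with a small lookup table, roughly the kind of mapping the bundled spreadsheet expresses; the table is a hypothetical subset covering only four noun attributes, and neither the spreadsheet's real layout nor the imbNLP API is reproduced.

```python
# Hypothetical subset of a positional interpretation table (nouns only).
NOUN_POSITIONS = [
    ("Type", {"c": "common", "p": "proper"}),
    ("Gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("Number", {"s": "singular", "p": "plural"}),
    ("Case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "v": "vocative", "l": "locative",
              "i": "instrumental"}),
]
CATEGORIES = {"N": ("Noun", NOUN_POSITIONS)}


def decode_msd(msd):
    """Decode a positional MSD into a feature dictionary.

    Raises KeyError for categories or codes missing from the subset table;
    positions beyond the table (e.g. animacy) are silently ignored by zip().
    """
    category, positions = CATEGORIES[msd[0]]
    features = {"Category": category}
    for (attribute, values), code in zip(positions, msd[1:]):
        if code == "-":  # attribute not applicable for this word form
            continue
        features[attribute] = values[code]
    return features


if __name__ == "__main__":
    print(decode_msd("Ncmsn"))
```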
More documentation and tutorials on imbWBI, imbNLP and imbWEM are in preparation. Meanwhile, it is a good idea to read about the basics of the imbVeles Framework, particularly the imbACE Console Application framework: the concepts of ACE Consoles, ACE Console Plugins, Workspaces and ACE Script (the automation and console/command-line instruction language).