Guide: reproducing the Web Classification Research

Installing the Console Tool

  • Please review the general recommendations on hardware and software requirements [here]
  • Download all archives attached to this page (the list is below the article)
  • Unzip the imbWBI Console Tool v0.3.1 archive and run the .msi Windows Installer Package to install the imbWBI Console Tool
  • Select all features offered by the Setup Wizard, as these resources are required by the Web Classification system (internally named: imbWBI.IndustryTermModel)

All features selected, and non-default path set

  • Once the installation has completed successfully, the imbWBI Console Tool icon should appear in the Start Menu

imbWBI Console Tool in the Start Menu (if the installation succeeded)

  • Run it, and you should get the screen shown below:

imbWBI Console Tool, first run: after the application boot sequence (which finishes with the line “All index page filenames count: 23”), the script named “autoexec.ace” is executed automatically. Because this is the first run after installation, the application had to generate the default “autoexec.ace” script. The default “autoexec.ace” contains two comment lines and calls the imbACE Command Console command “help” without an argument, so the application asks the user to pick a value for the “Help option” argument.

After installation, only one sub-directory, called resources, exists at [imbWBI Install Path]. On the first boot of the Console Tool, the initial directory sub-tree is created automatically, which makes the rest of the experimental setup easier. That was the only purpose of running the Console, so you may close it now.

Auto-created folders in the directory where the Console Tool is installed
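
If you prefer to check the result of the first boot without opening a file manager, the following minimal Python sketch lists the sub-directories found at the install path and confirms that resources is among them. The function names and the example path are illustrative assumptions and are not part of imbWBI. The names of the auto-created folders are not listed in this guide, so the sketch only reports what it finds.

```python
from pathlib import Path


def list_subdirectories(install_path):
    """Return the sorted names of the immediate sub-directories of install_path."""
    root = Path(install_path)
    if not root.is_dir():
        raise FileNotFoundError(f"Install path not found: {root}")
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())


def has_resources_directory(install_path):
    """True if the 'resources' sub-directory (created by the installer) exists."""
    return "resources" in list_subdirectories(install_path)


if __name__ == "__main__":
    # Hypothetical location: replace with your own [imbWBI Install Path].
    example_path = r"C:\imbWBI"
    if Path(example_path).is_dir():
        print(list_subdirectories(example_path))
```

Before the first boot the list should contain only resources; after it, the additional auto-created folders should appear as well.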


Deployment of the project and MC Repository

The next step is to deploy a clone of our imbWBI.IndustryTermModel project. The project contains the category definitions, the list of web sites in each category (the sample set), and other settings common to all reported experiments.

Remark: You might have noticed in the reports published in the data.mendeley.com repository that our project name was [itm01]. Since [itm01] is the default name for every new project, we renamed the clone (supplied in the archive below) to [wcr], so that you don’t get confused if you accidentally create a new project, which would have a very different configuration than ours.

Original project [itm01] folder structure graph (PDF version)


Now, unzip the content of the Main Repository archive

  • to the directory: [imbWBI Installation path]\resources\MCRepo

For the component evaluation and system configuration experiments, a limited version of the repository is used: the Limited MC Repository.
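
The repository deployment can also be scripted. The sketch below is a minimal Python example that extracts a downloaded archive into resources\MCRepo under the install path. The function name, the archive file name and the install path are assumptions made for illustration. The internal layout of the archive is not described here, so the sketch extracts it as-is, and you should check whether the archive already contains a top-level folder.

```python
import zipfile
from pathlib import Path


def deploy_mc_repository(archive_path, install_path):
    """Extract the archive into <install_path>/resources/MCRepo.

    Returns the target directory. The directory is created if missing.
    """
    target = Path(install_path) / "resources" / "MCRepo"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    return target


if __name__ == "__main__":
    # Hypothetical file name and path: use your own download and install path.
    archive = Path("MainRepository.zip")
    if archive.is_file():
        print(deploy_mc_repository(archive, r"C:\imbWBI"))
```

The same call works for the Limited MC Repository archive; only the archive path changes.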

Remark: By deploying the very same web content (the MC Repository) used in our experiments, we ensure that you can get exactly the same results as we did. Alternatively, you could perform the complete process from the beginning, starting with web content retrieval (crawling). If you take that path, skip the MC Repository deployment, as you will build a new one with the web crawler.

To read next:

Mendeley Data data sets:

Grubić, Goran (2018), “imbWBI: Classification of Business Entities on Multilingual Web: configuration optimization, auxiliary experimental reports and other resources”, Mendeley Data, v1 – http://dx.doi.org/10.17632/mg98ypgc8s.1

Grubić, Goran (2018), “imbWBI: Classification of Business Entities on Multilingual Web – The Main Results”, Mendeley Data, v1 – http://dx.doi.org/10.17632/8x9n2mn7h4.1

(this is a draft version – to be updated soon)

 

Attachments

  • imbWBI Console Reference
    imbWBI Console Tool autogenerated help document
    File size: 147 KB Downloads: 1033
  • Original project [itm01] folder structure graph - PDF
    Original project [itm01] folder structure graph - PDF
    File size: 126 KB Downloads: 529
  • Business Entities Classification - Project clone
    Updated version of the imbWBI Console Application - IndustryTermModel project clone
    File size: 3 MB Downloads: 445
  • Limited MC Repository
    Limited edition of the MC Repository, used in research on web classification - during component evaluation and system configuration optimization
    File size: 36 MB Downloads: 384
  • Main Repository
    Main edition of the MC Repository, used in research on web classification - for system evaluation
    File size: 48 MB Downloads: 387
  • imbWBI Console Tool v0.3.1
    2nd release of the imbWBI Experimentation Console Tool
    File size: 101 MB Downloads: 510