NLP & IE

MTE

MULTEXT-East: a multilingual dataset of language resources for language engineering research, focused on the morphosyntactic level of linguistic description. The MULTEXT-East dataset includes the EAGLES-based morphosyntactic specifications, morphosyntactic lexicons, and annotated multilingual corpora. The parallel corpus, the novel “1984” by George Orwell, is sentence-aligned and contains hand-validated morphosyntactic descriptions and lemmas. The resources are…

MWU

Multi-Word Units: multi-word units (MWUs) encompass a range of hard-to-define and controversial linguistic objects (cf. [52], [18]). Their numerous linguistic and pragmatic definitions ([5], [22], [65], [4], [36], [3], [86], [37], [13]) invoke three major points: they are composed of two or more words; they show some degree of morphological, distributional or…