NLP & IE

POS

Part-of-speech (POS) information is very frequently used to provide linguistic information to NER and CR, in the form of features in statistical approaches.
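As a rough illustration of POS tags used as features, the sketch below turns an already tagged sentence into one feature dictionary per token, the kind of input a statistical NER model could consume. The function names, the feature keys, the "BOS"/"EOS" boundary markers and the Penn Treebank style tags are assumptions made for this example; they do not come from any particular toolkit.

```python
def token_features(tagged, i):
    """Build a feature dict for token i of a (word, pos) sequence."""
    word, pos = tagged[i]
    # Neighbouring POS tags give the model local syntactic context.
    prev_pos = tagged[i - 1][1] if i > 0 else "BOS"
    next_pos = tagged[i + 1][1] if i < len(tagged) - 1 else "EOS"
    return {
        "word.lower": word.lower(),
        "is_title": word.istitle(),
        "pos": pos,
        "prev_pos": prev_pos,
        "next_pos": next_pos,
    }


def sentence_features(tagged):
    """Return one feature dict per token in the tagged sentence."""
    return [token_features(tagged, i) for i in range(len(tagged))]


# Hypothetical pre-tagged sentence (tags assumed, not produced by a tagger here).
SENT = [("Marie", "NNP"), ("visited", "VBD"), ("Paris", "NNP")]

if __name__ == "__main__":
    for feats in sentence_features(SENT):
        print(feats)
```

The POS tags are supplied by hand here; in practice they would come from a tagger run before feature extraction.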

Schematron

In markup languages, Schematron is a rule-based validation language for making assertions about the presence or absence of patterns in XML trees. It is a structural schema language expressed in XML using a small number of elements and XPath. In a typical implementation, the Schematron schema XML is processed into normal XSLT code for deployment…
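The idea of asserting that a pattern is present in an XML tree can be sketched in a few lines of Python. This is not Schematron and does not use a Schematron processor: it is a hand-rolled analogue built on the standard library, with each rule reduced to a context path, a test path and a message. The rule format, the function name and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (context path, required child path, failure message).
RULES = [
    (".//book", "title", "a book must have a title"),
    (".//book", "author", "a book must have at least one author"),
]


def validate(xml_text, rules=RULES):
    """Return the messages of all assertions that fail on the document."""
    root = ET.fromstring(xml_text)
    failures = []
    for context, test, message in rules:
        # Every node matching the context must satisfy the test.
        for node in root.findall(context):
            if node.find(test) is None:
                failures.append(message)
    return failures


DOC = """<library>
  <book><title>A</title><author>X</author></book>
  <book><title>B</title></book>
</library>"""

if __name__ == "__main__":
    print(validate(DOC))
```

A real Schematron schema expresses the same kind of rule in XML, with full XPath expressions for the context and the test.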

Sequences Corpus

We call a sequences corpus, or qualified corpus, a list of sequences of one or more words that we want to be recognized by a single local grammar graph. This sequences corpus is stored in a single file, which must be in one of the following formats: raw text files in which sequences are…

TEI Lite

TEI Lite was the name adopted for what the TEI editors originally conceived of as a simple demonstration of how the TEI (Text Encoding Initiative) encoding scheme might be adopted to meet 90% of the needs of 90% of the TEI user community. In retrospect, it was predictable that many people should imagine TEI Lite…