Statistical Natural Language Processing

Lecture Master

Content

Humanity generates exabytes of data every year. Most of this data is available in some form of natural language, in particular text. Hence, the inclusion of textual data sources is of growing importance in large-scale data-driven applications. A popular application scenario is personal assistants (Siri, Google Home, Cortana, etc.), which rely partly on Web pages to extract or select answers to user questions. However, processing large amounts of text in a semantically sound manner turns out to be rather difficult for machines. The goal of this lecture is to provide students with insights into approaches, based mostly on probabilistic models, that aim to facilitate the implementation of pipelines for processing natural language text. The lecture is structured as follows:

Finite-state automata
Language models
Spell checkers
Deduplication
Classification
Hidden Markov models
Grammar and semantics
Parsing natural language
Word sense disambiguation
Distributional semantics

Course in PAUL

L.079.05702 Statistical Natural Language Processing (in English)

Contact

Michael Röder