Humanity generates exabytes of data every year. Most of this data is available in some form of natural language, in particular text. Hence, the inclusion of textual data sources is of growing importance in large-scale data-driven applications. Personal assistants (Siri, Google Home, Cortana, etc.) are a popular application scenario; they rely partly on Web pages to extract or select answers to user questions. Processing large amounts of text in a semantically sound manner, however, turns out to be rather difficult for machines. The goal of this lecture is to provide students with insights into approaches, based mostly on probabilistic models, that aim to facilitate the implementation of pipelines for processing natural language text. The lecture is structured as follows:
L.079.05702 Statistical Natural Language Processing (in English)