Introduction to NLP#
📜 Course Description#
This course covers the basics of natural language processing (NLP), including text pre-processing, part-of-speech tagging, parsing, and semantic analysis. It also covers advanced topics such as word embeddings. Upon completion of this course, students will be able to apply these techniques to real-world data sets.
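To give a flavor of the kind of pipeline the course covers, the snippet below sketches a toy text pre-processing step (lowercasing, regex tokenization, and stopword removal) in plain Python. The function name, stopword list, and regex are illustrative assumptions, not part of the course materials; real coursework would use a library such as NLTK instead.

```python
import re

def preprocess(text, stopwords=frozenset({"the", "a", "an", "of", "and"})):
    """Toy pre-processing pipeline: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())  # crude regex tokenizer
    return [t for t in tokens if t not in stopwords]

print(preprocess("The parsing of a sentence and its analysis"))
# → ['parsing', 'sentence', 'its', 'analysis']
```

Even this tiny sketch shows the design questions the course takes up in depth: what counts as a token, which words to discard, and how such choices affect downstream tasks.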
♾️ Learning Goals#
- Be able to describe key concepts, models, and challenges in Natural Language Processing
- Be able to describe, implement, and apply a variety of fundamental algorithms in Natural Language Processing
- Be able to describe and evaluate more complex software systems for various Natural Language Processing tasks
- Be able to describe current approaches, datasets, and systems for various Natural Language Processing tasks
📚 Textbook#
Jurafsky, D., & Martin, J. H. (2019). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (3rd ed.). Upper Saddle River, NJ: Pearson Education.
Reference Books#
- Strunk Jr., W., & White, E. B. (2000). The elements of style (4th ed.). New York: Allyn and Bacon.
- Gelbukh, A. (Ed.). (2018). Computational linguistics and intelligent text processing: 18th international conference, CICLing 2017, New Delhi, India, February 19-25, 2017, revised selected papers (Vol. 10573). Cham: Springer International Publishing.
- Mitchell, T. M. (1997). Machine learning (1st ed.). New York: McGraw-Hill Science/Engineering/Math.
- Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python. Sebastopol, CA: O'Reilly Media. (The NLTK Book)
🏆 Grading#
- Participation: 10%
- Midterm: 30%
- Term Project: 60%
🧠 Term Project#
For the term project, students will choose a real-world data set and build a natural language processing system that can perform some task on the data set.
The project will be presented in the form of a poster at the end of the semester.
📒 Lecture Notes#
The lecture notes for this course are available at the following link:
https://entelecheia.github.io/ekorpkit-book/docs/lectures/intro_nlp
🎲 The Whole Game#
Harvard Professor David Perkins’s book, Making Learning Whole, popularized the idea of “teaching the whole game.”
We don’t require kids to memorize all the rules of baseball and understand all the technical details before we let them play the game.
Rather, they start playing with just a general sense of the game, and then gradually learn more rules and details as time goes on.
This course takes this approach to deep learning.
Most courses on deep learning focus only on what the network “is” and how it works.
This course is different: instead of teaching just the network, we show how to use it to solve problems.
We start by teaching a complete, working, very usable deep learning network built with simple, expressive tools, and then show how to use it to solve real-world problems.
This approach has several advantages:
- It makes deep learning more accessible and understandable. Students can see how deep learning can be used in practice, and they can immediately start using it to solve their own problems.
- It helps students learn the whole game of machine learning, not just deep learning. In addition to showing how to use a state-of-the-art deep learning network, we also teach important concepts such as data preprocessing, model evaluation, and deployment.
- It gives students a strong foundation for further study. Because the course covers both the theory and practice of deep learning, students will be well prepared for more advanced courses on the subject.
🗓️ Table of Contents#
- Introduction
- Getting started with ekorpkit
- Research Applications
- Language Models
- Topic Modeling
  - Topic Models
  - Topic Coherence Measures
- Sentiment Analysis
- Tokenization
- Word Segmentation and Association
- Vector Semantics and Representation
- Word Embeddings
- Lab 1: Preparing Wikipedia Corpora
- Lab 2: EDA on Corpora