Course Description
Written text usually consists of multiple sentences; fully understanding a text as a whole requires information that cannot be obtained by considering each sentence individually. In this seminar course, we look at discourse processing: how references to entities, and relationships between clauses and sentences (e.g., cause, result, elaboration), contribute to the local coherence of the text. We will study three aspects of the field: frameworks, corpora, and computational models (e.g., coreference resolution and discourse parsing). We will also discuss discourse processing in the context of several Natural Language Processing tasks, such as summarization, question answering, and sentiment analysis.
Prerequisites
Graduate standing. LIN 353C, CS 388, CS 395T, or equivalent prior exposure to Computational Linguistics/Natural Language Processing.
Texts
There will be no textbook for this seminar; reading material will consist of the technical papers discussed in each meeting. That said, if you would like overview texts on discourse processing, check out Chapters 15 and 16 of Jacob Eisenstein’s notes, and/or Discourse Processing by Manfred Stede.
Organization and Content
- Research seminar: we will discuss a mix of theoretical and computational papers in discourse and pragmatics. Topics and readings will be given on the schedule page. For each paper, we will:
- Give an initial summary of the paper;
- Discuss comments regarding the paper, which may include strengths, weaknesses, and technical questions.
- Course project: each student will be involved in one course project throughout the semester (individually or in teams of 2). For the first half of the semester, explore potential topics and/or conduct preliminary experiments, and settle on a project. For the second half of the semester, work on the project. Specific deliverables include:
- Midterm presentation (10 mins) and report (2 pages).
- Final presentation (15 mins) and report (8 pages).
- Assignments: assignments will be hands-on experiments using tools in computational linguistics; teams of 2 are allowed. Assignment topics include:
- Running discourse parsers and coreference resolution systems;
- Learning to set up annotation tasks with tools such as Mechanical Turk and brat.
Grading Policy
- Class participation (10%)
- Two assignments (15% each, 30% total)
- Class project (60% total):
- Midterm report 15%
- Midterm presentation 5%
- Final report 30%
- Final presentation 10%
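As a rough illustration of how the weights above combine, here is a minimal sketch. The component names and the `course_score` helper are hypothetical, and it assumes every component is scored out of 100; it is not an official grade calculator.

```python
# Weights (in percent) taken from the grading policy above.
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "participation": 10,
    "assignment_1": 15,
    "assignment_2": 15,
    "midterm_report": 15,
    "midterm_presentation": 5,
    "final_report": 30,
    "final_presentation": 10,
}


def course_score(scores: dict[str, float]) -> float:
    """Return the weighted course score, assuming each component is out of 100."""
    # Sanity check: the listed weights should cover the whole grade.
    assert sum(WEIGHTS.values()) == 100
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
```

For example, a student who scores 80 on every component would end up with 80 overall under this sketch.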
Extension Policy
If you turn in your assignment late, expect points to be deducted. Extensions will be considered on a case-by-case basis. If you anticipate that you will need an extension for some assignment, let me know in advance.
By default, 5 points (out of 100) will be deducted for lateness, plus an additional 1 point for every 24-hour period beyond the first 2 that the assignment is late. The maximum extension penalty is 40 points if the assignment is handed in before the last day of class. Resubmissions of assignments are allowed; the extension penalty applies to post-deadline resubmissions. Note that there are always some points to be had, even if you turn in your assignment late. So if you are wondering whether you should still turn in a late assignment, the answer is always yes.
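The default penalty can be sketched as a small function. This is one reading of the policy, not an official calculator: it assumes that a partially elapsed 24-hour period counts as a full period, that the first two periods incur only the base deduction, and that the assignment is handed in before the last day of class so the 40-point cap applies. The function and constant names are made up for illustration.

```python
import math

BASE_PENALTY = 5        # flat deduction for any late submission
FREE_PERIODS = 2        # 24-hour periods covered by the base deduction alone
PER_PERIOD_PENALTY = 1  # extra points per additional 24-hour period
MAX_PENALTY = 40        # cap, if handed in before the last day of class


def late_penalty(hours_late: float) -> int:
    """Points deducted (out of 100) for an assignment that is hours_late late."""
    if hours_late <= 0:
        return 0  # on time: no deduction
    # Assumption: a started 24-hour period counts as a full period.
    periods_late = math.ceil(hours_late / 24)
    extra = max(0, periods_late - FREE_PERIODS) * PER_PERIOD_PENALTY
    return min(MAX_PENALTY, BASE_PENALTY + extra)
```

Under these assumptions, an assignment that is 10 days late loses 5 + 8 = 13 points, so a late submission can still earn most of the credit.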
Academic Dishonesty Policy
You are encouraged to discuss assignments with classmates, but all written work must be your own. Students caught cheating will automatically fail the course. If in doubt, ask the instructor.
Schedule
- Week 1
- 8/30: Course introduction; Plan for the semester.
- Week 2: background
- 9/4
- 9/6
- Discourse from a cognitive perspective: Sanders and Spooren, Discourse and Text Structure
- Week 3: topic segmentation
- 9/11
- 9/13
- Related/interesting papers:
- Week 4: coreference resolution
- 9/18
- 9/20
- Related/interesting papers:
- Week 5: bridging/centering
- Week 6: local coherence
- Week 7: RST
- 10/9
- 10/11
- Background/reference papers:
- Week 8: Project presentations
- 10/16
- 10/18
- 10/19: Midterm report due at 11:59pm
- Week 9: RST
- Week 10: RST application/PDTB
- Week 11: PDTB
- 11/6
- 11/8
- Related papers/additional readings:
- Week 12: PDTB and applications
- 11/13
- 11/15
- Additional readings:
- Week 13: Argumentation
- 11/20
- 11/22: Thanksgiving break
- Additional readings:
- Week 14: Project presentations/Argumentation
- 11/27
- 11/29
- Additional readings:
- Week 15: Project presentations
- 12/10 (last day of class): Final reports due at 11:59pm