Curricular information is subject to change
Upon successful completion of the course, students will be able to:
1. Understand fundamental issues in (quantitative) text analysis, such as inter-coder agreement, reliability, and validation.
2. Convert texts into quantitative matrices of features, and then analyse those features using statistical methods.
3. Use human coding of texts to train supervised classifiers.
4. Apply these methods to a text corpus to address a substantive research question.
5. Critically evaluate (social science) research that uses automated text analysis methods.
Statistical software and programming using R and RMarkdown; assumptions and workflow of quantitative text analysis approaches; tokenisation and document-feature matrix; dictionaries and sentiment analysis; describing and comparing texts; human coding and document classification; supervised and unsupervised scaling; multilingual text analysis; topic models; speech recognition; word embeddings
Student Effort Type | Hours |
---|---|
Lectures | 24 |
Autonomous Student Learning | 226 |
Total | 250 |
NOTE: Prior familiarity with the statistical programming language R (or Python) is a prerequisite for this course due to its direct relevance to the content and assignments. Below are some reasons why prior experience with R or Python is crucial for students to follow the course and apply the methods effectively:
– Implementation of Text Analysis Methods: The course centres on implementing text analysis techniques, and R is widely used for this purpose. R provides a comprehensive set of packages designed specifically for text processing, natural language processing (NLP), and sentiment analysis. Students with prior experience in R will be able to navigate and use these tools efficiently, which will allow them to implement the methods covered in the course effectively.
– Course Content Alignment: The course content, lectures, and materials are designed with a focus on R-based implementation. The examples, code snippets, and demonstrations provided throughout the course will be predominantly in R. Some of the advanced methods are implemented in Python, but a good understanding of R will make it much easier to write and run code in Python. Without prior familiarity, students may struggle to comprehend and replicate these examples, hindering their understanding of the core concepts and methodologies.
– Homework Assignments and Research Papers: The assignments and research papers in this course will require students to apply the text analysis methods discussed in class to real-world data. Students without prior experience in R may find it challenging to write code that preprocesses large text corpora and visualises results, and to interpret the findings. A lack of proficiency in R could prevent them from completing assignments accurately and efficiently.
Resit In | Terminal Exam |
---|---|
Summer | No |
• Feedback individually to students, post-assessment
Feedback will be provided to students within 20 working days of the deadline for the assignment in accordance with university policy.