COMP30780 Data Science in Practice

Academic Year 2020/2021

This is an intensive 15-credit, project-based data science module. It is designed for Computer Science students on the Data Science pathway who are not participating in the in-trimester internship programme.

In this module, students will have an opportunity to put their data science skills into practice by implementing a substantial data science project of their own design. Typically this will involve working with one or more datasets to investigate a meaningful set of research questions that can be answered from the data. Students are encouraged, within reason, to follow their own interests and expertise when selecting a suitable project topic and dataset.

The project is a collaborative one and will typically involve groups of two students working together on all aspects of the project, from data capture and preparation through to analysis and the presentation of results. The work will be carried out in Python using Jupyter notebooks and the Python 'data stack', including Pandas, NumPy, Matplotlib, etc. Each group will collaborate to produce a final project report and presentation, along with the products of their analysis (completed Jupyter notebooks, supplementary code, validated datasets, etc.).

Prior to the start of the module, students will be expected to assemble into suitable groups and to develop initial ideas for their proposed project. This will typically take place during the first week of March, one week prior to the module bootcamp.

The module will begin with an intensive "data science bootcamp" in which students will be taken through an example data analysis project. This will help students to refresh their Python knowledge and to learn how it can be applied to common data science scenarios and tasks. At the conclusion of the bootcamp, student groups will make a formal presentation of their project proposal.

The practical work for this module will be supported by intensive weekly laboratory sessions in which students will present their progress and engage in relevant problem-based Q&A designed to help them achieve their goals. Each week, groups will be expected to prepare short status updates: on Mondays to discuss their plans for the week, and on Fridays to report what they have achieved. Students will also participate in regular code reviews to help them learn about coding best practice.

During the module, regular guest lectures will be hosted to explore topical aspects of data science in practice. Writing and presentation workshops will also support students in the latter stages of the project as they prepare their final report and project presentation.

This is a technically challenging module that requires students to have mastered the basics of Python and its data science packages such as Pandas and Matplotlib.


Curricular information is subject to change

Learning Outcomes:

Key learning outcomes include:

1) The key learning objective of this module is for students to gain real-world experience in developing their own substantive data science project, from start to finish.

2) They will further develop their technical skills in Python and its data-stack as they work to capture, clean, analyse and visualise their datasets and insights.

3) They will gain valuable experience in designing a data science project: formulating an appropriate set of hypotheses; identifying and harnessing suitable data sources; applying statistical and data science best practices to explore their research questions using this data; and validating and presenting data science results.

4) The module will also provide students with an opportunity to further their collaboration and project management skills in the context of a data science task.

5) Students will also learn about the importance of reproducibility in science and put concrete ideas into practice in their project work.

6) Students will gain valuable experience in "data carpentry" and learn about best practices when it comes to data science workflows (scripting, version control, testing).

Student Effort Hours:

Student Effort Type    Hours
Lectures               25
Practical              275
Total                  300

Approaches to Teaching and Learning:
This module includes a mix of approaches to teaching and learning, including:
- Traditional lectures.
- In-class Q&A, code reviews, discussions.
- Group work, project management, status reporting etc.
- In-class presentations.
- Producing the final project deliverables.

 
Requirements, Exclusions and Recommendations
Learning Requirements:

Students must be competent in Python and its core data science packages such as Pandas and Matplotlib.

Students should also have a basic foundation in statistics and hypothesis testing.

Learning Recommendations:

Some knowledge of machine learning will be useful.


Module Requisites and Incompatibles
Co-requisite:
COMP30750 - Information Visualisation - DS, COMP30760 - Data Science in Python - DS, COMP30770 - Programming for Big Data


 
Assessment Strategy  
Description: Group Project: Data Science Project
Timing: Throughout the Trimester
Open Book Exam: n/a
Component Scale: Graded
Must Pass Component: No
% of Final Grade: 100


Carry forward of passed components
No
 
Remediation Type: In-Module Resit
Remediation Timing: Prior to relevant Programme Exam Board
Please see Student Jargon Buster for more information about remediation types and timing. 
Feedback Strategy/Strategies

• Feedback individually to students, post-assessment

How will my Feedback be Delivered?

Not yet recorded.

Recommended Reading:

Python for Data Analysis by Wes McKinney (O'Reilly), or an equivalent book on Python and Pandas.