Project Description
Data Mining Group Project
​
The project is an opportunity to explore a machine-learning problem of your choice. The idea is to carry out an end-to-end machine-learning project on a real-world data set. You will go through the entire machine-learning life cycle: data collection, cleaning, exploratory data analysis, applying machine-learning models, and evaluating those models.
You will work in teams of three to four people. Please follow the key dates and deliverables described below. Submit your interim reports to D2L and post them on your team web page.
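As a rough illustration of the life cycle described above, the following minimal Python sketch walks through cleaning, exploratory analysis, modeling, and evaluation on a tiny made-up data set. Everything in it is an assumption for illustration only: the data, the function names, and the nearest-centroid model are hypothetical, and it uses only the standard library. Your project should use a real-world data set and the tools your team chooses.

```python
import random
import statistics

# Hypothetical toy data: (hours_studied, hours_slept, passed).
# None marks a missing value that the cleaning step must handle.
RAW_ROWS = [
    (1.0, 5.0, 0), (2.0, 6.0, 0), (2.5, None, 0), (3.0, 5.5, 0),
    (3.5, 7.0, 0), (6.0, 7.0, 1), (7.0, 8.0, 1), (7.5, 6.5, 1),
    (8.0, None, 1), (9.0, 8.0, 1), (1.5, 6.5, 0), (8.5, 7.5, 1),
]


def clean(rows):
    """Cleaning: drop every row that contains a missing value."""
    return [row for row in rows if None not in row]


def class_means(rows):
    """EDA: mean of each feature per class label (last column is the label)."""
    n_features = len(rows[0]) - 1
    summary = {}
    for label in sorted({row[-1] for row in rows}):
        members = [row for row in rows if row[-1] == label]
        summary[label] = tuple(
            statistics.mean(row[i] for row in members) for i in range(n_features)
        )
    return summary


def split(rows, test_fraction=0.3, seed=0):
    """Shuffle with a fixed seed and hold out a fraction of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit(train_rows):
    """Modeling: a nearest-centroid classifier is just the per-class means."""
    return class_means(train_rows)


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda label: sum(
            (a - b) ** 2 for a, b in zip(features, centroids[label])
        ),
    )


def accuracy(centroids, rows):
    """Evaluation: fraction of rows whose predicted label matches the true one."""
    correct = sum(predict(centroids, row[:-1]) == row[-1] for row in rows)
    return correct / len(rows)


if __name__ == "__main__":
    data = clean(RAW_ROWS)
    print("rows after cleaning:", len(data))
    print("per-class feature means:", class_means(data))
    train, test = split(data)
    model = fit(train)
    print("held-out accuracy:", accuracy(model, test))
```

In a real project each function would be replaced by something more substantial (for example, a data-loading script, plots for the exploratory analysis, and one or more library models), but the order of the steps and the use of held-out data for evaluation carry over.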
​
Project Proposal (Due: March 21)
​
Your project proposal (1-2 pages) should include:
- Project title, team name, team information, and a public web page for your project.
- Project description: a description of the problem, why it is interesting, and who benefits.
- Project goals: what you plan to achieve and how you will evaluate your model.
- Data set: a description of the data set and how you are collecting the data. You may benefit from looking at the data set repositories posted under the “Useful Links” section of the course web page.
- Tools: the specific tools and packages that you plan to use.
- Literature review: collect 2-5 papers or resources related to your problem or area and summarize them. Search for related papers and industry white papers by typing your keywords into your favorite search engine, and check the ACM (https://dl.acm.org/) and IEEE (https://www.computer.org/csdl) digital libraries.
​
Midway Report and Short Presentation (Due: April 10)
​
This should be a short report (3-4 pages) that serves as a checkpoint and helps you make sure you are on the right track. Your report should highlight what you have accomplished so far and your plans for the rest of the semester. Make sure to update your team web page to reflect your progress, for example by adding your midway report and a link to your GitHub page.
Your midway report should serve as a template for your final report, and you may build on your project proposal. The report should have a paper title, team information, an abstract, an introduction (which typically includes the problem statement and motivation), related work, data set and features, methods (models), results and discussion, and conclusions. Some of these sections will be incomplete, and that is expected.
Be sure to include each team member’s contribution in the appendix of the report.
Upload 3-5 slides describing your progress to the shared Google Slides deck linked below.
https://docs.google.com/presentation/d/1wlUzfdUcQ6iYfhmdfIDGV-5hZ72U7w3E3unnAMIm6Ng/edit?usp=sharing
You will present a quick summary (3-5 minutes) on April 12th.
Project Presentations (May 1 and May 3)
Final Report (Due: May 8)