Title: Data Science Classification Project
Objective: The goal of this project is to perform data modeling using a classification approach. The tasks will involve retrieving and preparing data, exploring the dataset, and building and comparing classification models.
Dataset Selection:
Choose the BuddyMove Data Set from the UCI Machine Learning Repository. More details about this dataset can be found here: Dataset: onlineuseradvertisement dataset
Dataset
1. Download the dataset
2. Import the required libraries. Convert the dataset into CSV format if it is not already in that format.
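The loading step might look like the minimal sketch below. It assumes pandas is available. The column names, the sample rows, and the helper load_dataset are made up for illustration and are not taken from the actual dataset.

```python
import io

import pandas as pd

# Stand-in for the downloaded file: a few made-up rows so the sketch runs
# without the real dataset. Column names and values are illustrative only.
SAMPLE_CSV = """user_id,feature_a,feature_b,label
U1,2,77,low
U2,2,62,high
U3,5,50,low
"""


def load_dataset(source):
    """Read a CSV from a file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# With the real download, pass the path of the CSV file instead of StringIO.
df = load_dataset(io.StringIO(SAMPLE_CSV))

print(df.shape)   # number of rows and columns
print(df.dtypes)  # inferred type of each column
```

If the download is a spreadsheet rather than a CSV, one option is to read it with pandas.read_excel and write it back out with DataFrame.to_csv before loading it as above.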
Task 1: Data Preparation
Goal Setting: Clearly state the goal of the project.
Data Pre-processing: Thoroughly pre-process the dataset. Describe each attribute and the steps taken to prepare the data for analysis. Provide a detailed explanation of the dataset.
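As one possible illustration of the pre-processing step, the sketch below removes duplicate rows, drops an identifier column, and fills numeric gaps with the column median. The data, the column names, and the helper preprocess are assumptions made for the example. The right choices (for instance, median imputation versus dropping rows) depend on the actual dataset and should be justified in the report.

```python
import numpy as np
import pandas as pd

# Made-up rows containing one missing value and one duplicated row.
raw = pd.DataFrame({
    "user_id": ["U1", "U2", "U2", "U3", "U4"],
    "feature_a": [2.0, 3.0, 3.0, np.nan, 5.0],
    "feature_b": [77, 62, 62, 50, 68],
    "label": ["low", "high", "high", "low", "high"],
})


def preprocess(df):
    """Drop duplicate rows and the identifier, then impute numeric gaps."""
    out = df.drop_duplicates().copy()
    # An identifier carries no predictive information, so it is removed.
    out = out.drop(columns=["user_id"])
    # Fill missing numeric values with each column's median.
    numeric_cols = out.select_dtypes(include="number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out.reset_index(drop=True)


print(raw.isna().sum())  # missing values per column before cleaning
clean = preprocess(raw)
print(clean)
```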
Task 2: Data Exploration
Column Exploration: Explore each column (or at least 10 columns if there are more than 10) using descriptive statistics and appropriate graphs. For each column, include:
The method used to explore the column (e.g., graph type).
Observations and insights from the exploration.
Carefully formatted graphs with appropriate labels, titles, legends, and colors.
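A single-column exploration along these lines could be sketched as follows, assuming pandas and matplotlib. The column name feature_a, its values, and the helper explore_numeric are hypothetical. A histogram suits a numeric column, while a categorical column would more likely call for a bar chart of value counts.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for one hypothetical numeric column.
df = pd.DataFrame({"feature_a": [2, 2, 5, 3, 4, 8, 3, 6]})


def explore_numeric(df, column):
    """Return descriptive statistics and a labeled histogram for one column."""
    summary = df[column].describe()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.hist(df[column], bins=5, color="steelblue", edgecolor="black",
            label=column)
    ax.set_title(f"Distribution of {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Count")
    ax.legend()
    return summary, fig


summary, fig = explore_numeric(df, "feature_a")
print(summary)
```

The returned figure can be written to disk with fig.savefig for inclusion in the report.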
Relationship Exploration: Explore relationships between pairs of attributes (at least 10 pairs). For each pair:
Generate a visualization graph.
State the hypothesis being investigated.
Discuss any interesting relationships or lack thereof observed in the visualization.
Include representative graphs showing significant information.
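For a pair of numeric attributes, one way to approach this is a scatter plot annotated with the Pearson correlation, as in the sketch below. The two columns, their values, and the helper plot_pair are made up, and the hypothesis being illustrated is simply that feature_a and feature_b are positively related. Other pairings (numeric versus categorical, for example) would call for different plots, such as box plots.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for two hypothetical numeric attributes.
df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5, 6],
    "feature_b": [2, 4, 5, 4, 5, 7],
})


def plot_pair(df, x, y):
    """Scatter-plot two columns and return their Pearson correlation."""
    r = df[x].corr(df[y])  # Series.corr uses Pearson correlation by default
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(df[x], df[y], color="darkorange", label=f"r = {r:.2f}")
    ax.set_title(f"{y} versus {x}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return r, fig


r, fig = plot_pair(df, "feature_a", "feature_b")
print(f"Pearson r between feature_a and feature_b: {r:.3f}")
```

A correlation near zero is also a finding worth reporting, since the task asks for the lack of a relationship to be discussed as well.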
Task 3: Data Modeling
Classification Models: Treat the data as a classification task and build two models using the following steps:
Feature Selection: Select appropriate features for the models.
Model Building: Use Decision Tree and K-Nearest Neighbors (KNN) classifiers from sklearn.
Training and Evaluation: Train and evaluate the models, selecting appropriate parameter values and justifying the choices.
Comparison: Compare the two models, report the results of the comparison, and recommend the better model.
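The modeling steps above can be sketched as follows, assuming scikit-learn. The data here is synthetic (generated with make_classification) and stands in for the prepared dataset. The parameter grids, the 75/25 split, and the use of accuracy are illustrative assumptions rather than required settings, and feature selection is not shown. KNN is wrapped in a pipeline with StandardScaler because it is distance-based and therefore sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared features and class labels.
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Each candidate pairs an estimator with an illustrative parameter grid.
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [2, 4, 6, None], "min_samples_leaf": [1, 5, 10]},
    ),
    "knn": (
        Pipeline([("scale", StandardScaler()),
                  ("knn", KNeighborsClassifier())]),
        {"knn__n_neighbors": [3, 5, 7, 9],
         "knn__weights": ["uniform", "distance"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validation on the training split selects the parameters.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        "test_accuracy": accuracy_score(y_test, search.predict(X_test)),
    }
    print(name, results[name])

# Pick the model with the higher cross-validated accuracy; the held-out
# test accuracy is reported separately as an unbiased check.
best = max(results, key=lambda n: results[n]["cv_accuracy"])
print("Recommended model:", best)
```

Selecting parameters by cross-validation on the training split, and keeping the test split for the final comparison only, gives a justification for the chosen values that does not depend on the test data. On an imbalanced dataset, a metric other than accuracy may be more appropriate.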
Deliverables:
Detailed explanation of the dataset and preprocessing steps.
Exploratory analysis with graphs and insights.
Model building and evaluation process with justification of parameter choices.
Comparison of the models with a recommendation on which model to use.