Understanding the basics of classification

Introduction to Logistic Regression

Understanding the logit function

Coefficients in logistic regression

Concept of maximum log-likelihood

Performance metrics such as the confusion matrix, recall, accuracy, precision, F1-score, ROC, and AUC

Importing the dataset and required libraries.

Performing basic Exploratory Data Analysis (EDA).

Data inspection and cleaning

Using the statsmodels and sklearn libraries to build the model

Splitting the dataset into train and test sets using sklearn.

Training a model using classification techniques such as Logistic Regression.

Making predictions using the trained model.

Handling the unbalanced data using various methods.

Performing feature selection with multiple methods

Saving the best model in pickle format for future use.

**Business Objective**

Predicting a qualitative response for an observation is called classifying that observation, since it assigns the observation to a category or class. Logistic Regression is a supervised classification algorithm used to predict a dependent variable that is categorical or discrete. It models the probability of a class by passing a linear combination of the features through the sigmoid function.
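To make the role of the sigmoid concrete, here is a minimal sketch using only the Python standard library. The intercept and coefficient values, and the choice of customer support calls as the single feature, are hypothetical and purely illustrative; they are not fitted to the churn dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Inverse of the sigmoid: the log-odds of a probability p."""
    return math.log(p / (1.0 - p))


# Hypothetical coefficients for a one-feature model (illustrative only).
INTERCEPT = -2.0
COEF_SUPPORT_CALLS = 0.5


def churn_probability(support_calls: float) -> float:
    """Linear score passed through the sigmoid gives P(churn = 1)."""
    z = INTERCEPT + COEF_SUPPORT_CALLS * support_calls
    return sigmoid(z)


def classify(support_calls: float, threshold: float = 0.5) -> int:
    """Assign class 1 (churn) when the probability reaches the threshold."""
    return int(churn_probability(support_calls) >= threshold)
```

With these assumed values, each additional support call adds 0.5 to the log-odds of churn, which is how logistic regression coefficients are usually read.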

Churned customers are those who have decided to end their relationship with their existing company. This case study works with a churn dataset.

XYZ is a service company that offers customers a one-year subscription plan for its product. The company wants to know whether customers will renew the subscription for the coming year.

**Data Description**

The CSV file contains around 2,000 rows and 16 columns.

**Features:**

- Year
- Customer_id - unique ID of the customer
- Phone_no - customer phone number
- Gender - Male/Female
- Age
- No of days subscribed - the number of days since the subscription started
- Multi-screen - whether the customer has a single-screen or multiple-screen subscription
- Mail subscription - whether the customer receives mails
- Weekly mins watched - number of minutes watched weekly
- Minimum daily mins - minimum minutes watched daily
- Maximum daily mins - maximum minutes watched daily
- Weekly nights max mins - number of minutes watched at night time
- Videos watched - total number of videos watched
- Maximum_days_inactive - days since inactive
- Customer support calls - number of customer support calls
- Churn - whether the customer churned
  - 1 - Yes
  - 0 - No
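As a rough illustration of how a few of these features might be cleaned and encoded before modeling, here is a small pandas sketch. The rows are made up, and the column identifiers (`gender`, `multi_screen`, `mail_subscribed`, and so on) and the `yes`/`no` labels are assumptions; the actual CSV may use different names and values.

```python
import pandas as pd

# Hypothetical sample rows; names and category labels are assumptions,
# not the real contents of the churn CSV.
df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", None],
    "multi_screen": ["no", "yes", "no", "no"],
    "mail_subscribed": ["yes", "no", "no", "yes"],
    "customer_support_calls": [1, 5, 0, 2],
    "churn": [0, 1, 0, 0],
})


def encode_binary_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with a missing gender, then map two-level categories to 0/1."""
    out = frame.dropna(subset=["gender"]).copy()
    out["gender"] = out["gender"].map({"Male": 1, "Female": 0})
    for col in ["multi_screen", "mail_subscribed"]:
        out[col] = out[col].map({"yes": 1, "no": 0})
    return out


encoded = encode_binary_columns(df)

# Share of churned customers: a quick check for class imbalance.
churn_rate = encoded["churn"].mean()
```

Checking the churn rate early is useful because a heavily skewed target affects which metrics are meaningful later on.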

**Aim**

Build a logistic regression model on the given dataset to determine whether a customer will churn.

**Tech stack**

- Language - Python
- Libraries - numpy, pandas, matplotlib, seaborn, sklearn, pickle, imblearn, statsmodels

**Approach**

- Importing the required libraries and reading the dataset
- Inspecting and cleaning up the data
- Performing data encoding on categorical variables
- Exploratory Data Analysis (EDA)
  - Data visualization
- Feature Engineering
  - Dropping unwanted columns
- Model Building (statsmodels)
  - Using the statsmodels library
- Model Building (sklearn)
  - Performing the train-test split
  - Logistic Regression model
- Model Validation (predictions)
  - Accuracy score
  - Confusion matrix
  - ROC and AUC
  - Recall score
  - Precision score
  - F1-score
- Handling the unbalanced data
  - With balanced weights
  - Random weights
  - Adjusting imbalanced data
  - Using SMOTE
- Feature Selection
  - Barrier threshold selection
  - RFE method
- Saving the model as a pickle file
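The sklearn part of the approach above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the data is a synthetic stand-in generated with `make_classification` (the roughly 87/13 class ratio is an arbitrary choice), and the hyperparameters (`max_iter=1000`, five selected features, a 20% test split) are likewise assumptions. SMOTE from imblearn and threshold-based feature selection are not shown.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the churn data (assumption).
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.87, 0.13], random_state=42)

# Stratify so train and test keep the same churn ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)


def evaluate(model, X_eval, y_eval):
    """Collect the validation metrics listed in the approach."""
    pred = model.predict(X_eval)
    proba = model.predict_proba(X_eval)[:, 1]  # P(class 1)
    return {
        "accuracy": accuracy_score(y_eval, pred),
        "precision": precision_score(y_eval, pred, zero_division=0),
        "recall": recall_score(y_eval, pred),
        "f1": f1_score(y_eval, pred),
        "roc_auc": roc_auc_score(y_eval, proba),
        "confusion_matrix": confusion_matrix(y_eval, pred),
    }


# Plain model versus one that reweights classes to offset the imbalance.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_train, y_train)

baseline_scores = evaluate(baseline, X_test, y_test)
balanced_scores = evaluate(balanced, X_test, y_test)

# Recursive feature elimination: keep the 5 features ranked most useful.
selector = RFE(LogisticRegression(max_iter=1000, class_weight="balanced"),
               n_features_to_select=5).fit(X_train, y_train)
selected_mask = selector.support_

# Pickle round-trip in memory; pickle.dump with an open file works the
# same way when the model should be written to disk.
model_bytes = pickle.dumps(balanced)
restored = pickle.loads(model_bytes)
```

Comparing `baseline_scores` with `balanced_scores` shows how class weighting shifts the trade-off between precision and recall on the minority class, which is why accuracy alone is a weak guide on unbalanced data.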

Personal Introduction - 01m

The basics of classification - 04m

Intuition behind logistic regression - 05m

Understanding logit function Part 1 - 06m

Understanding logit function Part 2 - 06m

Coefficients in logistic regression - 03m

Maximum likelihood - 03m

Project workflow - 01m

Introduction to r-squared value Part 1 - 05m

Introduction to r-squared value Part 2 - 03m

Understanding p-values - 02m

Data inspection and cleaning - 04m

Encoding categorical variables - 04m

Exploratory data analysis (EDA) - 07m

Running the logistic regression with statsmodels - 06m

Model evaluation - Confusion matrix - 08m

Model evaluation - ROC and AUC - 04m

Running the model with sklearn - 04m

Evaluating the performance metrics - 03m

Dealing with class imbalance - 07m

Feature selection - 04m

Save and load the model - 01m