Telecom Customer Churn Prediction

The GitHub repository for this project can be viewed here.

This project applies supervised machine learning techniques to analyze and predict customer churn in the telecom sector. Using the Telco Customer Churn dataset, the goal was to identify key drivers of churn and build models capable of predicting whether a customer is likely to leave.

📌 Note: This is a guided learning project included in my portfolio to demonstrate my ability to execute an end-to-end data science pipeline.


🧠 Models Trained

  • Logistic Regression
  • K-Nearest Neighbors (KNN)
  • Support Vector Classifier (SVC)
  • Decision Tree Classifier
  • Random Forest Classifier
  • AdaBoost Classifier
  • Gradient Boosting Classifier
  • Voting Classifier (ensemble of selected models)
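
As a rough illustration of how the models listed above could be trained and compared side by side, here is a minimal sketch. It uses synthetic data from `make_classification` as a stand-in for the preprocessed churn features, and the choice of members for the voting ensemble is an assumption, not the selection used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed churn data (assumption):
# roughly 27% positive class to mimic the imbalance of a churn label.
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.73, 0.27], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC(random_state=42)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Soft-voting ensemble; the three members below are a hypothetical selection.
models["Voting"] = VotingClassifier(
    estimators=[
        ("lr", models["Logistic Regression"]),
        ("rf", models["Random Forest"]),
        ("gb", models["Gradient Boosting"]),
    ],
    voting="soft",
)

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} accuracy = {acc:.3f}")
```

The scores printed by this sketch come from synthetic data and are not comparable to the results reported for the real dataset.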

🏆 Best Performer

  • Gradient Boosting Classifier
    • Accuracy: ~81%
    • ROC AUC: ~0.84

📂 Files in the Repository

File | Description
TelcoCustomerChurn.ipynb | Full notebook containing EDA, preprocessing, and model evaluation
WA_Fn-UseC_-Telco-Customer-Churn.csv | Original dataset sourced from Kaggle
README.md | Full project documentation

📈 Key Insights

  • Churn Rate: ~26.6%
  • High Risk Groups: Month-to-month customers, electronic check users, new customers, and fiber optic users
  • Tenure Matters: Long-term customers are significantly less likely to churn
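
Insights like these typically come from simple group-wise aggregations. The sketch below shows the idea with pandas on a tiny hand-made table. The column names (`Contract`, `tenure`, `Churn`) are assumed to match the dataset, and the six rows are invented purely for illustration.

```python
import pandas as pd

# Toy rows for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Two year",
                 "One year", "Month-to-month", "Two year"],
    "tenure": [1, 5, 60, 24, 2, 70],
    "Churn": ["Yes", "Yes", "No", "No", "No", "No"],
})

# Encode the label as 0/1 so that a mean becomes a churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

overall_rate = df["churned"].mean()                          # share of churners
rate_by_contract = df.groupby("Contract")["churned"].mean()  # rate per contract type
mean_tenure = df.groupby("Churn")["tenure"].mean()           # tenure of churners vs. stayers

print(f"Overall churn rate: {overall_rate:.1%}")
print(rate_by_contract)
print(mean_tenure)
```

The same `groupby` pattern would apply to other categorical columns, such as payment method or internet service type.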

🧪 Evaluation Metrics Used

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • ROC AUC
  • Confusion Matrix
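
For reference, all of the metrics above are available in `sklearn.metrics`. The following sketch computes them on a small hand-made set of labels and predicted probabilities; the values and the 0.5 decision threshold are assumptions for illustration, not outputs of the project's models.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

# Hypothetical ground truth (1 = churned) and predicted churn probabilities.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.4, 0.2, 0.6, 0.8, 0.3, 0.9, 0.2]

# Turn probabilities into hard labels with a 0.5 threshold (assumption).
y_pred = [int(p >= 0.5) for p in y_prob]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC AUC is computed from the probabilities, not the thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_prob),
}

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
print(cm)
```

Because churners are the minority class, recall and ROC AUC tend to be more informative than accuracy alone.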

🧰 Tools & Technologies

  • Python 3.8+
  • Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
  • Jupyter Notebook

🧺 Future Improvements

  • Hyperparameter tuning with GridSearchCV
  • Feature importance visualization
  • Integration with customer support or feedback data
  • Deploy as a churn risk prediction tool using Streamlit or Flask
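
The first two items above could look roughly like the sketch below: a small `GridSearchCV` run over a Gradient Boosting classifier, followed by a ranking of its feature importances. The data is synthetic and the parameter grid is a hypothetical starting point, not a tuned configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed churn features (assumption).
X, y = make_classification(n_samples=300, n_features=8,
                           n_informative=4, random_state=0)

# Deliberately small, hypothetical grid so the search finishes quickly.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # optimize the ranking metric rather than accuracy
    cv=3,
)
search.fit(X, y)

best_params = search.best_params_
print("Best parameters:", best_params)
print(f"Best cross-validated ROC AUC: {search.best_score_:.3f}")

# Impurity-based importances of the refit best model, highest first.
importances = search.best_estimator_.feature_importances_
ranked = sorted(enumerate(importances), key=lambda t: t[1], reverse=True)
for idx, value in ranked:
    print(f"feature_{idx}: {value:.3f}")
```

The ranked importances could then be passed to a matplotlib or seaborn bar plot for the visualization mentioned above.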