Telecom Customer Churn Prediction
The GitHub repository for this project can be viewed here.
This project applies supervised machine learning techniques to analyze and predict customer churn in the telecom sector. Using the Telco Customer Churn dataset, the goal was to identify key drivers of churn and build models capable of predicting whether a customer is likely to leave.
Note: This is a guided learning project included in my portfolio to demonstrate my ability to execute an end-to-end data science pipeline.
Models Trained
- Logistic Regression
- K-Nearest Neighbors (KNN)
- Support Vector Classifier (SVC)
- Decision Tree Classifier
- Random Forest Classifier
- AdaBoost Classifier
- Gradient Boosting Classifier
- Voting Classifier (ensemble of selected models)
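As a rough illustration of how the models listed above can be fitted side by side with scikit-learn, here is a minimal sketch. It runs on synthetic data from `make_classification` rather than the Telco dataset, uses default hyperparameters, and the three estimators fed into the voting ensemble are a hypothetical choice, since the members actually selected in the notebook are not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed features (NOT the real Telco data).
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC(probability=True, random_state=42)),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Assumption: an arbitrary trio of members, combined by averaging probabilities.
models["Voting"] = VotingClassifier(
    estimators=[
        ("lr", models["Logistic Regression"]),
        ("gb", models["Gradient Boosting"]),
        ("ada", models["AdaBoost"]),
    ],
    voting="soft",
)

# Fit each model and record its accuracy on the held-out split.
accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)

for name, acc in sorted(accuracy.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {acc:.3f}")
```

The scores printed by this sketch come from synthetic data and say nothing about the results reported for the real dataset.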
Best Performer
- Gradient Boosting Classifier
- Accuracy: ~81%
- ROC AUC: ~0.84
Files in the Repository
File | Description |
---|---|
TelcoCustomerChurn.ipynb | Full notebook containing EDA, preprocessing, and model evaluation |
WA_Fn-UseC_-Telco-Customer-Churn.csv | Original dataset sourced from Kaggle |
README.md | Full project documentation |
Key Insights
- Churn Rate: ~26.6%
- High Risk Groups: Month-to-month customers, electronic check users, new customers, and fiber optic users
- Tenure Matters: Long-term customers are significantly less likely to churn
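Figures like the ones above are typically obtained by comparing churn rates across groups. The sketch below shows one way to do that with pandas. The rows are toy values invented for illustration, not records from the dataset, and the column names `Contract`, `tenure`, and `Churn` are assumed to match the CSV.

```python
import pandas as pd

# Toy rows for illustration only; NOT taken from the real dataset.
df = pd.DataFrame(
    {
        "Contract": [
            "Month-to-month", "Month-to-month", "One year",
            "Two year", "Month-to-month", "Two year",
        ],
        "tenure": [1, 5, 30, 60, 2, 48],
        "Churn": ["Yes", "No", "No", "No", "Yes", "No"],
    }
)

# Boolean flag: True where the customer churned.
churned = df["Churn"].eq("Yes")

# The mean of a boolean Series is the share of True values.
overall_rate = churned.mean()
rate_by_contract = churned.groupby(df["Contract"]).mean()

# Bucket tenure (in months) to compare new and long-term customers.
tenure_band = pd.cut(
    df["tenure"],
    bins=[0, 12, 24, 72],
    labels=["0-12", "13-24", "25-72"],
    include_lowest=True,
)
rate_by_tenure = churned.groupby(tenure_band, observed=True).mean()

print(f"Overall churn rate: {overall_rate:.1%}")
print(rate_by_contract)
print(rate_by_tenure)
```

The same pattern would apply to other categorical columns, such as payment method or internet service type.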
Evaluation Metrics Used
- Accuracy
- Precision
- Recall
- F1 Score
- ROC AUC
- Confusion Matrix
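To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1 from the four cells of a binary confusion matrix, treating churn as the positive class. The helper name `classification_metrics` and the example counts are hypothetical and are not results from this project. ROC AUC is left out because it depends on predicted scores rather than on a single confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts.

    tp: churners predicted as churners
    fp: non-churners predicted as churners
    fn: churners predicted as non-churners
    tn: non-churners predicted as non-churners
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=60, fp=20, fn=40, tn=280)
for name, value in metrics.items():
    print(f"{name:10s} {value:.3f}")
```

With an imbalanced target such as churn, accuracy alone can look good even when recall on the churn class is weak, which is one reason to report several metrics together.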
Tools & Technologies
- Python 3.8+
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn
- Jupyter Notebook
Future Improvements
- Hyperparameter tuning with GridSearchCV
- Feature importance visualization
- Integration with customer support or feedback data
- Deploy as a churn risk prediction tool using Streamlit or Flask
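As a starting point for the first item above, here is a minimal sketch of tuning a Gradient Boosting Classifier with `GridSearchCV`. It uses synthetic data, and the parameter grid, the `roc_auc` scoring choice, and the 3-fold setting are illustrative assumptions rather than settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the preprocessed features (NOT the real Telco data).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Assumed grid; a real search would likely cover wider ranges.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}

# Every combination is scored by 3-fold cross-validated ROC AUC.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print(f"Best CV ROC AUC: {search.best_score_:.3f}")
```

After fitting, `search.best_estimator_` holds a model refitted on all the data passed to `fit`, so a separate held-out test set is still needed for a final evaluation.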