🏥 Diabetes Prediction

Multi-Model ML Pipeline with Automated CI/CD

Models Trained: 4
Dataset Size: 442 samples
Features: 10
Best Model: XGBoost

📊 Model Performance Comparison

Side-by-side comparison of 4 machine learning models on the diabetes progression prediction task

🤖 Models Compared

Linear Regression

Simple baseline that assumes a linear relationship between the features and the target

Ridge Regression

Linear regression with an L2 penalty on the coefficients to reduce overfitting

Random Forest

Ensemble of 100 decision trees

XGBoost 🏆

Gradient-boosted decision trees; typically the best performer in this comparison
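The four-way comparison above can be sketched roughly as follows. This is a minimal illustration, not the project's actual training script: the function names `build_models` and `compare_models`, the 80/20 split, the seed, `alpha=1.0`, and the choice of R² as the metric are all assumptions. If the `xgboost` package is not installed, the sketch falls back to scikit-learn's `GradientBoostingRegressor` as a clearly labeled stand-in.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

try:
    from xgboost import XGBRegressor
except ImportError:  # xgboost is an optional dependency in this sketch
    XGBRegressor = None


def build_models(seed=42):
    """Return the four candidate regressors (hyperparameters are assumptions)."""
    models = {
        "Linear Regression": LinearRegression(),
        "Ridge Regression": Ridge(alpha=1.0),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(n_estimators=100, random_state=seed)
    else:
        # Stand-in only: a different gradient boosting implementation, not XGBoost.
        models["Gradient Boosting (stand-in)"] = GradientBoostingRegressor(random_state=seed)
    return models


def compare_models(test_size=0.2, seed=42):
    """Fit every model on the same split and return {model name: test R^2}."""
    X, y = load_diabetes(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores
```

Calling `compare_models()` returns one score per model. With only 442 samples, the ranking can change with the split, so a single run should be read as indicative rather than conclusive.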

✨ Features

Automated Training: All models are retrained automatically on every push to the main branch
Best Model Selection: The pipeline selects the best-performing model
CI/CD Pipeline: Full automation with GitHub Actions
Docker Ready: Containerized for easy deployment
Flask API: REST API for predictions
Comprehensive Tests: 7 automated tests ensure quality
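A prediction endpoint of the kind listed above might look like the sketch below. The route `/predict`, the `features` JSON key, and the `create_app` factory are hypothetical names chosen for illustration; the project's real API may differ. The only assumption about the model is that it exposes a scikit-learn style `predict()` method.

```python
from flask import Flask, jsonify, request

N_FEATURES = 10  # the dataset has ten baseline variables


def create_app(model):
    """Build a Flask app around any object with a scikit-learn style predict()."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True)
        features = payload.get("features") if isinstance(payload, dict) else None
        if not isinstance(features, list) or len(features) != N_FEATURES:
            message = f"expected a list of {N_FEATURES} numbers under 'features'"
            return jsonify(error=message), 400
        try:
            row = [float(value) for value in features]
        except (TypeError, ValueError):
            return jsonify(error="features must be numeric"), 400
        # predict() takes a 2-D input (one row here) and returns one value per row.
        prediction = float(model.predict([row])[0])
        return jsonify(prediction=prediction)

    return app
```

Passing the model into a factory keeps the web layer independent of how the best model is trained or loaded, which also makes the endpoint easy to exercise with Flask's test client.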

🔄 CI/CD Pipeline

Every push to the main branch automatically:

  1. Loads the diabetes dataset
  2. Trains all 4 models
  3. Runs comprehensive tests
  4. Generates performance visualizations
  5. Selects the best model
  6. Deploys updated dashboard
  7. Sends notifications (if enabled)

✅ Tests Passing | 🚀 Auto-Deploy | 📊 Live Updates
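The "selects the best model" step in the pipeline above could be as small as the sketch below. It assumes each model has been reduced to a single score where higher is better (for example R²). The helper names `select_best` and `summary_json` and the summary field names are hypothetical, not taken from the project.

```python
import json


def select_best(scores):
    """Return (name, score) for the highest-scoring model; higher is assumed better."""
    if not scores:
        raise ValueError("no model scores to compare")
    best_name = max(scores, key=scores.get)
    return best_name, scores[best_name]


def summary_json(scores):
    """Serialize a small summary that a dashboard build step could read."""
    best_name, best_score = select_best(scores)
    summary = {
        "models_trained": len(scores),
        "best_model": best_name,
        "best_score": best_score,
        "scores": scores,
    }
    return json.dumps(summary, indent=2)
```

If the pipeline ranks models by an error metric such as RMSE instead, `max` would need to become `min`; the sketch deliberately leaves that choice explicit.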

📚 Dataset Information

Diabetes Dataset (scikit-learn)

Ten baseline variables: age, sex, BMI, average blood pressure, and six blood serum measurements

Target: Quantitative measure of disease progression one year after baseline
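For reference, the dataset described above can be inspected with scikit-learn's `load_diabetes`. The helper name `describe_diabetes` is an assumption made for this sketch. In scikit-learn's copy the six serum measurements appear as columns `s1` to `s6`, and by default the feature columns are already mean-centered and scaled.

```python
from sklearn.datasets import load_diabetes


def describe_diabetes():
    """Load the scikit-learn diabetes dataset and summarize its shape and target."""
    data = load_diabetes()
    return {
        "n_samples": data.data.shape[0],
        "n_features": data.data.shape[1],
        "feature_names": list(data.feature_names),
        "target_min": float(data.target.min()),
        "target_max": float(data.target.max()),
    }
```

Because the target is a continuous progression score rather than a yes/no diagnosis, the models on this page are regressors, and metrics such as R² or RMSE apply rather than accuracy.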