
// Project_03 — Machine Learning

AI Football Match Predictor

A Python ML pipeline that ingests historical football match data, engineers meaningful features, and trains a classifier to predict Win / Draw / Loss outcomes — complete with per-team win probability outputs.

Python scikit-learn pandas Streamlit Machine Learning Data Science Sports Analytics

About the Project

This project was born from a personal challenge: could I build a machine learning model that accurately predicts football match outcomes using real historical data? Using Python and scikit-learn, I built a complete ML data pipeline, plus an interactive Streamlit web dashboard, that transforms raw CSV match statistics into probabilities for every possible outcome (Win, Draw, Loss).

The model learns from historical league fixtures, identifying patterns in team statistics such as recent form, goals scored/conceded, home and away performance, head-to-head records, and other derived features. This was my first hands-on machine learning project and a significant milestone in my journey into data science and sports analytics. It showcased the power of feature engineering and the importance of proper model evaluation.
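As an illustration of the "recent form" feature mentioned above, here is a minimal pandas sketch. It is not the project's actual code: the column names (`date`, `team`, `points`) and the helper `add_rolling_form` are hypothetical, and the data is made up. The key detail is the `shift(1)`, which ensures each row only sees results from earlier matches, so the feature does not leak the outcome it is meant to predict.

```python
import pandas as pd


def add_rolling_form(matches: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Add a 'form' column: mean points over each team's previous `window` matches.

    Assumes one row per team per match with hypothetical columns
    date, team, points (3 = win, 1 = draw, 0 = loss).
    """
    out = matches.sort_values(["team", "date"]).copy()
    out["form"] = out.groupby("team")["points"].transform(
        # shift(1) drops the current match so only past results are averaged
        lambda s: s.shift(1).rolling(window, min_periods=1).mean()
    )
    return out


# Made-up demo data: three matches for team A, two for team B
demo = pd.DataFrame({
    "date": pd.to_datetime(
        ["2023-08-01", "2023-08-08", "2023-08-15", "2023-08-01", "2023-08-08"]
    ),
    "team": ["A", "A", "A", "B", "B"],
    "points": [3, 1, 0, 0, 3],
})
result = add_rolling_form(demo, window=2)
print(result)
```

A team's first match has no history, so its `form` value is missing (NaN) and needs to be filled or dropped before training.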

App Overview

Football Predictor Pro (football-predictor-pro, Streamlit): full walkthrough
Predictions Dashboard: upload a CSV for instant fixture predictions
Model Metrics: classification report and accuracy scores

What It Does

  • Reads and parses historical match data from CSV files using pandas
  • Cleans and preprocesses data, handling missing values and data type conversions
  • Engineers powerful features: rolling form windows, home/away win rates, goal averages, head-to-head statistics
  • Trains a classification model (Random Forest or Logistic Regression) via scikit-learn
  • Outputs Win / Draw / Loss probabilities for any given fixture
  • Evaluates model performance using cross-validation, precision, recall, and F1-scores
  • Serializes the trained model to disk for efficient reuse without retraining
  • Provides an interactive Streamlit dashboard for local data processing and real-time prediction visualization
  • Provides comprehensive classification reports and confusion matrices for model analysis
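The train, evaluate, and serialize steps listed above can be sketched as follows. This is an assumption-laden outline rather than the project's implementation: the feature names are hypothetical, the data is random (so the scores carry no meaning), and Random Forest is just one of the two model types mentioned.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the engineered feature table (hypothetical columns)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "home_form": rng.uniform(0, 3, 300),
    "away_form": rng.uniform(0, 3, 300),
    "home_goal_avg": rng.uniform(0, 3, 300),
    "away_goal_avg": rng.uniform(0, 3, 300),
})
y = pd.Series(rng.choice(["Win", "Draw", "Loss"], size=300), name="result")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)

# 5-fold cross-validated accuracy on the training portion
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Fit once on the full training set, then evaluate on held-out fixtures
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, zero_division=0)
matrix = confusion_matrix(y_test, y_pred, labels=model.classes_)
print(report)

# Serialize the fitted model and load it back without retraining
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "match_model.joblib"
    joblib.dump(model, path)
    restored = joblib.load(path)
```

With real fixtures, a chronological split (training on earlier seasons, testing on later ones) is usually a safer choice than the random split used here, since it avoids evaluating on matches that precede the training data.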

Predicted Probabilities

Home | Away | Win % | Draw % | Loss %
Team A | Team B | 62% | 21% | 17%
Team C | Team D | 34% | 38% | 28%
Team E | Team F | 19% | 27% | 54%
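A table like the one above can be produced from a scikit-learn classifier's `predict_proba` output. The sketch below assumes a Logistic Regression model trained on random placeholder data; the feature values and fixture names are illustrative only. The column order returned by `predict_proba` follows `model.classes_` (sorted alphabetically for string labels), so the columns are labeled from that attribute before being reordered for display.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Placeholder training data; a real pipeline would use engineered features
rng = np.random.default_rng(1)
X_train = rng.normal(size=(150, 2))
y_train = rng.choice(["Win", "Draw", "Loss"], size=150)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Upcoming fixtures with placeholder feature rows (one row per fixture)
fixtures = pd.DataFrame({"home": ["Team A", "Team C"], "away": ["Team B", "Team D"]})
X_new = rng.normal(size=(2, 2))

# Label probability columns using model.classes_, then reorder for display
proba = pd.DataFrame(model.predict_proba(X_new), columns=model.classes_)
table = pd.concat(
    [fixtures, (proba[["Win", "Draw", "Loss"]] * 100).round(1)], axis=1
)
print(table)
```

Each row of probabilities sums to 1 before rounding, which matches the rows in the table above adding up to 100%.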

What I Learned

As someone relatively new to Python, this project was a breakthrough moment in my learning journey. Completed in less than one month, it shows how effective prompt engineering and AI tools like Claude can accelerate development. Rather than spending months learning Python fundamentals in isolation, I was able to tackle a real machine learning project by asking the right questions and iterating with AI assistance.

Through this project, I gained hands-on exposure to pandas for data manipulation, scikit-learn for model building, and the complete ML workflow. I learned that understanding your data and asking precise technical questions is often more valuable than prior expertise. This experience reinforced that day-by-day learning combined with intelligent tool use can produce tangible results faster than traditional approaches. It's a testament to how prompt engineering and AI collaboration can bridge knowledge gaps and enable rapid skill development.