Coffee Sales Data Analysis & Forecasting

Data Analyst / Machine Learning Engineer · 2025 · 3 weeks · 1 person

Built an end-to-end analytics and forecasting system that identified top-performing products and seasonal trends, and reached 94.1% forecast accuracy over the final week of data with a Random Forest model.

Overview

An exploratory data analysis and machine learning project that examines coffee sales behavior and revenue trends, and forecasts future sales from historical transaction data.

Problem

The business lacked clear insights into sales patterns, high-performing products, and future demand, making inventory planning and promotional decisions inefficient.

Constraints

  • Limited to a single CSV dataset with no external context
  • Incomplete card/payment information
  • No direct customer identifiers for segmentation
  • Time-series data with strong seasonal and weekly patterns

Approach

Cleaned and engineered features from raw transaction data, performed exploratory data analysis to uncover trends, and evaluated multiple machine learning models to forecast future sales.
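The cleaning and aggregation step might look roughly like the sketch below. The write-up does not name the CSV columns, so `datetime` and `money` are hypothetical column names, and `daily_sales` is an illustrative helper rather than the project's actual code.

```python
import pandas as pd


def daily_sales(transactions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions into one row per calendar day.

    Assumes hypothetical columns `datetime` (timestamp) and `money` (price).
    """
    df = transactions.copy()
    # Unparseable timestamps become NaT and are dropped with other missing rows.
    df["datetime"] = pd.to_datetime(df["datetime"], errors="coerce")
    df = df.dropna(subset=["datetime", "money"])

    # One row per day; days without transactions get 0 sales and 0 revenue.
    daily = (
        df.set_index("datetime")
        .resample("D")["money"]
        .agg(["count", "sum"])
        .rename(columns={"count": "sales", "sum": "revenue"})
    )

    # Calendar features for weekly and seasonal patterns.
    daily["day_of_week"] = daily.index.dayofweek  # Monday = 0
    daily["month"] = daily.index.month
    return daily
```

Resampling to a daily frequency keeps empty days in the series, which matters later when lagged and rolling features are computed over fixed windows.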

Key Decisions

Use rolling averages as core predictive features

Reasoning:

Recent sales momentum proved to be the strongest indicator of future demand.

Alternatives considered:
  • Using only raw daily sales
  • Relying solely on calendar-based features
  • Manual seasonal decomposition
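A minimal sketch of rolling-average features, assuming a daily sales series indexed by date. The window sizes and the helper name `add_rolling_features` are illustrative assumptions; the write-up does not state which windows were used. The series is shifted by one day before averaging so that each row only sees past sales.

```python
import pandas as pd


def add_rolling_features(daily: pd.Series, windows=(3, 7)) -> pd.DataFrame:
    """Build lag and rolling-mean features from a daily sales series."""
    features = pd.DataFrame({"sales": daily})

    # Shift by one day so the current day's value never leaks into its own features.
    past = daily.shift(1)
    features["lag_1"] = past
    for w in windows:
        features[f"rolling_mean_{w}"] = past.rolling(w).mean()

    # The first rows lack a full window of history; drop them.
    return features.dropna()
```

Without the shift, the rolling mean for a given day would include that day's own sales, and the model would look better in evaluation than it could in real use.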

Select Random Forest as the final model

Reasoning:

It outperformed linear models by capturing non-linear weekly and seasonal patterns.

Alternatives considered:
  • Linear Regression
  • Ridge Regression
  • Gradient Boosting
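One way to compare the four candidates is to hold out the most recent days and score each model on them. The sketch below assumes a 7-day holdout and default-like hyperparameters; neither is stated in the write-up, and `compare_models` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error


def compare_models(X, y, test_size=7, seed=0):
    """Fit each candidate on the earlier rows and return MAE on the last rows."""
    # Chronological split: never shuffle time-ordered data.
    X_train, X_test = X[:-test_size], X[-test_size:]
    y_train, y_test = y[:-test_size], y[-test_size:]

    # Hyperparameters here are illustrative assumptions.
    models = {
        "linear": LinearRegression(),
        "ridge": Ridge(alpha=1.0),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores
```

Scoring every model on the same held-out window keeps the comparison fair; which model wins depends on the data, so the ranking should be read from the returned scores rather than assumed.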

Tech Stack

  • Python
  • pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-learn
  • Jupyter Notebook

Result & Impact

  • Forecast accuracy (last week): 94.1%
  • R² score: 0.56
  • Mean absolute error: 2.95 sales
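The three metrics can be computed as in the sketch below. The write-up does not define "forecast accuracy", so this example assumes one common reading: 100 × (1 − total absolute error / total actual sales). That definition, and the helper name `forecast_report`, are assumptions rather than the project's confirmed method.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score


def forecast_report(y_true, y_pred):
    """Return MAE, R², and an assumed percentage-accuracy figure."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Assumed accuracy definition: 1 minus weighted absolute percentage error.
    wape = np.abs(y_true - y_pred).sum() / np.abs(y_true).sum()

    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "accuracy_pct": 100 * (1 - wape),
    }
```

The metrics answer different questions: percentage accuracy compares errors to the overall sales level, while R² measures how much of the day-to-day variation the model explains, so a high accuracy figure and a moderate R² can coexist.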

Provided actionable insights into product performance and seasonal demand, enabling more informed inventory planning and marketing decisions.

Learnings

  • Feature engineering had a larger impact than model complexity in this forecasting task
  • Non-linear models captured the weekly and seasonal sales patterns better than linear ones
  • Visualization is critical for communicating analytical insights to non-technical stakeholders