Coffee Sales Data Analysis & Forecasting
Built an end-to-end analytics and forecasting system that identified top-performing products and seasonal trends, and achieved 94.1% short-term forecast accuracy with a Random Forest model.
Overview
An exploratory data analysis and machine learning project focused on understanding coffee sales behavior, revenue trends, and predicting future sales using historical transaction data.
Problem
The business lacked clear insights into sales patterns, high-performing products, and future demand, making inventory planning and promotional decisions inefficient.
Constraints
- Limited to a single CSV dataset with no external context
- Incomplete card/payment information
- No direct customer identifiers for segmentation
- Time-series data with strong seasonal and weekly patterns
Approach
Cleaned and engineered features from raw transaction data, performed exploratory data analysis to uncover trends, and evaluated multiple machine learning models to forecast future sales.
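As a rough illustration of the cleaning and aggregation step, the sketch below rolls raw transactions up to one row per day and adds calendar features. The column names (`date`, `product`, `amount`) and the helper `daily_sales` are hypothetical, since the original CSV schema is not shown here.

```python
import pandas as pd

def daily_sales(transactions: pd.DataFrame) -> pd.DataFrame:
    """Aggregate raw transactions to one row per calendar day.

    Assumes hypothetical columns: 'date', 'product', 'amount'.
    """
    df = transactions.copy()
    df["date"] = pd.to_datetime(df["date"])

    # Drop the time of day so every transaction maps to its calendar date.
    day = df["date"].dt.normalize()
    daily = df.groupby(day).agg(
        sales=("product", "count"),   # number of items sold that day
        revenue=("amount", "sum"),    # total revenue that day
    )

    # Reindex so days without transactions show up as zeros instead of gaps.
    full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
    daily = daily.reindex(full_range, fill_value=0)
    daily.index.name = "date"

    # Simple calendar features for weekly and seasonal patterns.
    daily["day_of_week"] = daily.index.dayofweek  # Monday = 0
    daily["month"] = daily.index.month
    return daily

sample = pd.DataFrame({
    "date": ["2024-03-01 08:15", "2024-03-01 09:40", "2024-03-03 10:05"],
    "product": ["Latte", "Espresso", "Latte"],
    "amount": [3.5, 2.5, 3.5],
})
daily = daily_sales(sample)
print(daily)
```

Filling missing days with zeros matters for the later rolling features: without it, a 7-row window would silently span more than seven calendar days.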
Key Decisions
Use rolling averages as core predictive features
Recent sales momentum proved to be the strongest indicator of future demand.
Alternatives considered:
- Using only raw daily sales
- Relying solely on calendar-based features
- Manual seasonal decomposition
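A minimal sketch of how rolling-average features might be built from a daily series. The window sizes (3 and 7 days), the column names, and the helper `add_rolling_features` are assumptions for illustration, not the project's actual configuration. The series is shifted by one day before averaging so that each row only uses sales from earlier days.

```python
import pandas as pd

def add_rolling_features(daily: pd.DataFrame, windows=(3, 7)) -> pd.DataFrame:
    """Add lagged rolling means of the 'sales' column (hypothetical name)."""
    out = daily.copy()

    # shift(1) excludes the current day, so a feature never sees its own target.
    past_sales = out["sales"].shift(1)
    out["sales_lag_1"] = past_sales
    for window in windows:
        out[f"sales_ma_{window}"] = past_sales.rolling(window=window).mean()

    # The first rows lack enough history for the longest window; drop them.
    return out.dropna()

daily = pd.DataFrame(
    {"sales": [10, 12, 8, 14, 16, 9, 11, 13, 15, 10]},
    index=pd.date_range("2024-03-01", periods=10, freq="D"),
)
features = add_rolling_features(daily)
print(features)
```

Skipping the shift is a common source of leakage: the rolling mean would then include the very value the model is asked to predict.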
Select Random Forest as the final model
Random Forest outperformed the linear models by capturing non-linear weekly and seasonal patterns.
Alternatives considered:
- Linear Regression
- Ridge Regression
- Gradient Boosting
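The comparison could be set up roughly as follows. This sketch runs on synthetic data, uses default hyperparameters, and holds out the final seven days as the test set; all of these are assumptions, and the scores it prints say nothing about the project's real results. The helper `compare_models` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error

def compare_models(X: pd.DataFrame, y: pd.Series, test_days: int = 7) -> dict:
    """Fit each candidate on the earlier days and score it on the last ones."""
    # Chronological split: shuffling would let the model peek at the future.
    X_train, X_test = X.iloc[:-test_days], X.iloc[-test_days:]
    y_train, y_test = y.iloc[:-test_days], y.iloc[-test_days:]

    models = {
        "Linear Regression": LinearRegression(),
        "Ridge Regression": Ridge(alpha=1.0),
        "Gradient Boosting": GradientBoostingRegressor(random_state=0),
        "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores

# Synthetic daily sales with a weekly cycle, for demonstration only.
rng = np.random.default_rng(0)
days = pd.date_range("2024-01-01", periods=120, freq="D")
day_of_week = days.dayofweek.to_numpy()
sales = 20 + 5 * np.sin(2 * np.pi * day_of_week / 7) + rng.normal(0, 2, len(days))

frame = pd.DataFrame({"sales": sales, "day_of_week": day_of_week}, index=days)
frame["sales_ma_7"] = frame["sales"].shift(1).rolling(7).mean()
frame = frame.dropna()

scores = compare_models(frame[["day_of_week", "sales_ma_7"]], frame["sales"])
for name, mae in scores.items():
    print(f"{name}: MAE = {mae:.2f}")
```

With a single seven-day holdout the ranking can be noisy; scikit-learn's `TimeSeriesSplit` is one way to average over several chronological folds.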
Tech Stack
- Python
- pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn
- Jupyter Notebook
Result & Impact
- Forecast accuracy (last week): 94.1%
- R² score: 0.56
- Mean absolute error: 2.95 sales
Provided actionable insights into product performance and seasonal demand, enabling more informed inventory planning and marketing decisions.
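For reference, the three reported metrics could be computed along these lines. The definition of "forecast accuracy" is not stated in this write-up; the sketch assumes it means 100 × (1 − total absolute error / total actual sales), which is only one plausible reading. The helper `forecast_metrics` and the sample numbers are hypothetical.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

def forecast_metrics(y_true, y_pred) -> dict:
    """Return MAE, R², and an assumed percentage-accuracy figure."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    mae = mean_absolute_error(y_true, y_pred)
    r2 = r2_score(y_true, y_pred)

    # Assumed definition: 100 * (1 - sum|error| / sum(actual)).
    # The project may have used a different formula.
    accuracy = 100 * (1 - np.abs(y_true - y_pred).sum() / y_true.sum())
    return {"mae": mae, "r2": r2, "accuracy_pct": accuracy}

# Toy numbers chosen so the results are easy to check by hand.
metrics = forecast_metrics([10, 12, 8, 10], [9, 12, 10, 11])
print(metrics)  # mae = 1.0, r2 = 0.25, accuracy_pct = 90.0
```

Under this definition a high percentage accuracy and a moderate R² are compatible: the first measures error relative to the overall sales level, while R² measures how much day-to-day variation the model explains beyond predicting the mean.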
Learnings
- Feature engineering had a larger impact than model complexity in this time-series forecasting task
- Non-linear models captured the weekly and seasonal sales patterns better than linear ones
- Visualization is critical for communicating analytical insights to non-technical stakeholders