Activity 33
Ensemble Forecasting
Ensemble forecasting combines predictions from multiple models to average out individual biases and reduce variance. This approach is particularly useful when data may exhibit regime uncertainty or when different models capture complementary features of the series.
Key Equation:
\[ \hat{y}_t^{\text{Ensemble}} = \frac{1}{m}\sum_{i=1}^{m} \hat{y}_t^{(i)} \]
where \(m\) is the number of models and \(\hat{y}_t^{(i)}\) is the forecast from model \(i\) at time \(t\).
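The equation can be illustrated directly in base R. The forecast values below are made up purely for illustration:

```r
# Illustrative only: three hypothetical models' point forecasts for the same horizon
yhat1 <- c(100, 102, 104)   # model 1
yhat2 <- c( 98, 101, 105)   # model 2
yhat3 <- c(102, 103, 103)   # model 3

# Simple average ensemble: mean across models at each time t
ensemble <- rowMeans(cbind(yhat1, yhat2, yhat3))
ensemble  # 100 102 104
```

Each ensemble value is just the equally weighted mean of the three models' forecasts at that step.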
Australian Cement Production Example
We use the aus_production dataset focusing on Australian cement production. The code below loads the data, builds two models (ETS and ARIMA), and forms a simple average ensemble.
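A self-contained base-R sketch of the same workflow may help fix the idea. Since aus_production ships with the fpp3 packages, the built-in AirPassengers series stands in for cement production here, HoltWinters() stands in for ETS(), and the ARIMA orders are illustrative, not tuned:

```r
# Base-R analogue: fit an exponential-smoothing model and an ARIMA model,
# then average their point forecasts (assumption: AirPassengers as stand-in data)
y <- log(AirPassengers)

fit_ets   <- HoltWinters(y)                       # additive exponential smoothing
fit_arima <- arima(y, order = c(1, 1, 1),
                   seasonal = list(order = c(0, 1, 1), period = 12))

h <- 12
fc_ets   <- as.numeric(predict(fit_ets, n.ahead = h))
fc_arima <- as.numeric(predict(fit_arima, n.ahead = h)$pred)

# Simple average ensemble of the two models
fc_ensemble <- (fc_ets + fc_arima) / 2
```

In the fable workflow the same averaging is done on the mable's forecast distributions, but the arithmetic on the point forecasts is identical.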
Exploring Ensemble Forecasts with Simulations
The generate() function is useful for simulating multiple future scenarios (Monte Carlo simulations). This can help explore forecast uncertainty and the range of possible outcomes.
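As a base-R sketch of what generate() does under the hood, the snippet below simulates future sample paths from a fitted AR(1) by drawing innovations from the estimated error distribution. The model and series choices are illustrative only:

```r
# Monte Carlo simulation of future paths (assumption: AR(1) on seasonal
# differences of the built-in AirPassengers series, purely for illustration)
set.seed(1)
y   <- as.numeric(log(AirPassengers))
d   <- diff(y, 12)                               # seasonal differences
fit <- arima(d, order = c(1, 0, 0))
phi <- coef(fit)["ar1"]; mu <- coef(fit)["intercept"]; s <- sqrt(fit$sigma2)

h <- 12; n_paths <- 100
paths <- replicate(n_paths, {
  x <- numeric(h); last <- tail(d, 1)
  for (t in 1:h) {
    x[t] <- mu + phi * (last - mu) + rnorm(1, 0, s)  # one-step simulation
    last <- x[t]
  }
  x
})

# Range of outcomes at each horizon: 5% and 95% quantiles across the 100 paths
apply(paths, 1, quantile, probs = c(0.05, 0.95))
</code>
```

The spread of the quantiles widens with the horizon, which is exactly the forecast uncertainty that generate() lets you visualise.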
Further Exploration Prompts:
- Experiment with different ensemble weights (e.g., weighted average instead of simple average).
- Compare forecast accuracy using measures like MAE or CRPS by splitting your data into training and test sets.
- Use residual analysis (e.g., with gg_tsresiduals()) to check for model adequacy.
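The first two prompts above can be sketched together in base R: hold out the last year of a series, forecast it with two models, and compare MAE for the individual models and for simple versus weighted ensembles. The series, model orders, and the 0.7/0.3 weights are all illustrative assumptions:

```r
# Train/test split on a built-in series (stand-in for the activity's data)
y     <- log(AirPassengers)
train <- window(y, end = c(1959, 12))
test  <- as.numeric(window(y, start = c(1960, 1)))

fc1 <- as.numeric(predict(HoltWinters(train), n.ahead = 12))
fc2 <- as.numeric(predict(arima(train, order = c(1, 1, 1),
                                seasonal = list(order = c(0, 1, 1), period = 12)),
                          n.ahead = 12)$pred)

mae <- function(f) mean(abs(test - f))
mae(fc1)                        # model 1 alone
mae(fc2)                        # model 2 alone
mae((fc1 + fc2) / 2)            # simple average ensemble
mae(0.7 * fc1 + 0.3 * fc2)      # weighted ensemble (weights are illustrative)
```

Trying several weight vectors on the test set is a quick way to see whether an uneven weighting beats the simple average for your data.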
Lab Activity: Sydney Tourism Ensemble Models
This guided activity explores forecasting for Sydney tourism using the tourism dataset. Follow these steps and review the prompts to deepen your understanding:
Data Preparation & Model Building:
Filter the tourism data for Sydney (holiday purpose) and build three models: two ETS variants and an ARIMA model.
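The three-model setup can be mimicked in base R, since the tourism tsibble also requires fpp3. Two HoltWinters() variants, with and without a trend component, play the role of the two ETS variants (cf. ETS(A,A,A) vs. ETS(A,N,A)); the series and ARIMA orders are stand-ins:

```r
# Three candidate models on a built-in seasonal series (illustrative stand-in)
y <- log(AirPassengers)

fit_no_trend   <- HoltWinters(y, beta = FALSE)   # level + seasonal only
fit_with_trend <- HoltWinters(y)                 # level + trend + seasonal
fit_arima      <- arima(y, order = c(1, 1, 1),
                        seasonal = list(order = c(0, 1, 1), period = 12))
```

In the fable version, the three specifications would sit side by side as columns of one mable, which makes the comparison step below straightforward.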
Model Comparison:
Compare the information criteria (e.g., AICc, BIC) and residual variance to decide on the best model.
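For arima() fits, base R exposes these quantities directly; AIC() and BIC() are generic, and the residual variance is stored on the fit. The two candidate specifications below are illustrative:

```r
# Information-criterion and residual-variance comparison of two ARIMA fits
y  <- log(AirPassengers)
m1 <- arima(y, order = c(1, 1, 1),
            seasonal = list(order = c(0, 1, 1), period = 12))
m2 <- arima(y, order = c(0, 1, 1),
            seasonal = list(order = c(0, 1, 1), period = 12))

c(AIC = AIC(m1), BIC = BIC(m1))
c(AIC = AIC(m2), BIC = BIC(m2))   # lower values indicate a better trade-off
c(m1$sigma2, m2$sigma2)           # residual variances
```

Note that base R's AIC() is not the small-sample-corrected AICc reported by fable, though the ranking of models usually agrees when the sample is not tiny.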
Simulating Future Scenarios:
Use the generate() function to simulate 100 future paths over 12 months. Examine the range of forecasts.
Further Exploration:
- Prompt 1: How do the different model specifications (ETS_ANA vs. ETS_AAA) impact the forecast uncertainty?
The ETS_AAA model (additive trend) shows wider forecast intervals than ETS_ANA (no trend) because trend uncertainty compounds over time. Averaging the two in the ensemble moderates this, giving intervals between the two extremes.
- Prompt 2: Try adjusting the forecast horizon or the number of simulation paths; what changes do you observe in the distribution of outcomes?
Increasing the horizon amplifies divergence in simulated paths due to accumulating errors. More paths (e.g., times = 500) better approximate the forecast distribution but don’t fundamentally alter its spread.
- Prompt 3: Experiment with combining models using different weights and compare the ensemble’s performance with that of individual models.
- Prompt 4: Perform a residual analysis on each model to detect potential structural breaks or model inadequacies.
Need to check for:
- Autocorrelation (significant spikes in ACF → poor fit)
- Non-normality (histogram skew → invalid prediction intervals)
- Heteroscedasticity (changing variance → consider transformations)
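The three checks above can be run in base R as an analogue of gg_tsresiduals(). The fitted model is an illustrative ARIMA on a built-in series:

```r
# Residual diagnostics on a fitted model (illustrative stand-in series)
y   <- log(AirPassengers)
fit <- arima(y, order = c(1, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))
res <- residuals(fit)

# 1. Autocorrelation: Ljung-Box test (small p-value suggests a poor fit);
#    fitdf = 3 accounts for the three estimated ARMA coefficients
Box.test(res, lag = 24, type = "Ljung-Box", fitdf = 3)

# 2. Non-normality: Shapiro-Wilk test on the residuals
shapiro.test(as.numeric(res))

# 3. Heteroscedasticity: crude check comparing variance in the two halves
n <- length(res)
var(res[1:(n %/% 2)]); var(res[(n %/% 2 + 1):n])
```

gg_tsresiduals() shows the same information graphically (residual plot, ACF, histogram), which is usually easier to read than the raw test statistics.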