# 1. Get NYSE Composite index data (volume) from tidyquant and convert to tsibble
<- tq_get("^NYA", from = "2018-01-01") %>%
nyse as_tsibble(index = date) %>%
fill_gaps() %>%
mutate(adjusted = imputeTS::na_ma(adjusted, k = 21))
# Plot adjusted series
%>%
nyse autoplot(adjusted) +
labs(title = "NYSE Composite Index Volume", y = "Volume", x = "Date")
Activity13
Seasonal Modeling
Seasonality = Repeating patterns with fixed known periods (e.g., daily, weekly, yearly cycles). Differentiated from cycles by strict periodicity.
1. Fourier Spectral Analysis
Frequency domain decomposition using basis functions:
\[S(t) = \sum_{k=1}^K [a_k\cos(2\pi f_k t) + b_k\sin(2\pi f_k t)]\]
Critical parameters:
- Nyquist frequency: Max detectable frequency = \(f_{\text{max}} = \frac{1}{2\Delta t}\)
- Frequency resolution: \(\Delta f = \frac{1}{N\Delta t}\)
- Number of harmonics (K): Balances complexity vs. overfitting
Harmonics represent pairs of sine/cosine waves that collectively approximate complex seasonal patterns. Think of them as “building blocks” for seasonal shapes. We can capture periodic patterns in time series data using Fourier terms and periodic regressions. Fourier terms in regression models:
\[y_t = \beta_0 + \sum_{k=1}^K \beta_k\cos(2\pi kt/m) + \gamma_k\sin(2\pi kt/m) + \epsilon_t\] where \(m\) = seasonal period
Hybrid Approach: Combines Fourier terms (deterministic seasonality) with ARIMA (stochastic patterns):
\[y_t = \underbrace{\sum_{k=1}^K [\alpha_k\sin(2\pi kt/m) + \beta_k\cos(2\pi kt/m)]}_{\text{Fourier terms}} + \underbrace{\text{ARIMA}(p,d,q)(P,D,Q)_m}_{\text{Seasonal ARIMA}} + \epsilon_t\]
Datasets & Models
Finance Example (Intraday)
- Dataset:
nyse
(Stock Volume) - Model: \[ y_t = \beta_0 + \sum_{k=1}^2\Big[\alpha_k\sin\Big(\frac{2\pi kt}{78}\Big) + \beta_k\cos\Big(\frac{2\pi kt}{78}\Big)\Big] \]
- Activity: Optimize Fourier terms via cross-validation.
- Dataset:
We retrieve real NYSE Composite index data (ticker ^NYA
), extract the adjusted price, and apply Fourier regression to capture seasonal patterns.
Fit seasonal models using Fourier terms (using 1 and 2 harmonics) with cross-validation for model selection.
- K=1 → \(\cos(2\pi t/52)\), \(\sin(2\pi t/52)\)
- K=2 → Adds \(\cos(4\pi t/52)\), \(\sin(4\pi t/52)\)
# 2. Define seasonal period (business days/year ≈ 252)
<- 30 # Annual seasonality in trading days
period
# 3. K Optimization via AICc
<- map_dfr(1:10, function(k) {
aic_results <- nyse %>%
fit model(
ARIMA(
~ fourier(K = k, period = period) +
adjusted pdq(0:5, 0:2, 0:5) + PDQ(0:2, 0:1, 0:2)
)
)
glance(fit) %>%
mutate(K = k)
})
<- aic_results %>%
optimal_k filter(AICc == min(AICc)) %>%
pull(K)
# 4. Final Hybrid Model
<- nyse %>%
final_model model(
Optimal = ARIMA(
~ fourier(period = period, K = optimal_k)
adjusted
) )
# 5. Cross-validation with optimal K
<- nyse %>%
cv_results stretch_tsibble(.init = 3*period, .step = 21) %>% # 3-year window, monthly steps
model(
Optimal = ARIMA(
~ pdq() + PDQ() +
adjusted fourier(period = period, K = optimal_k)
)%>%
) forecast(h = 21) %>% # 1-month ahead
accuracy(nyse)
# 6. Diagnostic Visualization
%>%
final_model gg_tsresiduals() +
labs(title = paste("Residuals for K =", optimal_k))
%>%
final_model forecast(h = period) %>% # 1-year forecast
autoplot(nyse %>% filter(year(date) >= 2022)) +
labs(title = "NYSE Adjusted Price Forecast with Optimized Fourier-SARIMA")
Lab Activities
Task: For the following COVID data, compute daily new confirmed cases, and try fitting the periodic regression with Fourier terms with different seasonal periods (e.g., 7-day vs. 14-day cycles) and compare the residual diagnostics.
References
Guidotti, E., Ardia, D., (2020), “COVID-19 Data Hub”, Journal of Open Source Software 5(51):2376, doi: 10.21105/joss.02376