Why Machine Learning is more Practical than Econometrics in the Real World

Adrian Antico

August 18, 2019

Motivation

I’ve read several studies and articles that claim Econometric models are still superior to machine learning when it comes to forecasting. In the article, “Statistical and Machine Learning forecasting methods: Concerns and ways forward”, the author mentions that,

“After comparing the post-sample accuracy of popular ML methods with that of eight traditional statistical ones, we found that the former are dominated across both accuracy measures used and for all forecasting horizons examined.”

In many business environments a data scientist is responsible for generating hundreds or thousands (possibly more) forecasts for an entire company, opposed to a single series forecast. While it appears that Econometric methods are better at forecasting a single series (which I generally agree with), how do they compare at forecasting multiple series, which is likely a more common requirement in the real world? Some other things to consider when digesting the takeaways from that study:

Do the ML models benefit from building a single model to forecast all series at once, which most time series models cannot do?
What are the run-time differences with both approaches?
The author in the linked article above states that the Econometrics models outperform machine learning models across all forecast horizons but is that really the case?

Approach

In this article, I am going to show you an experiment I ran that compares machine learning models and Econometrics models for time series forecasting on an entire company’s set of stores and departments.

Before I kick this off, I have to mention that I’ve come across several articles that describe how one can utilize ML for forecasting (typically with deep learning models) but I haven’t seen any that truly gives ML the best chance at outperforming traditional Econometric models. On top of that, I also haven’t seen too many legitimate attempts to showcase the best that Econometric models can do either. That’s where this article and evaluation differ. The suite of functions I tested are near-fully optimized versions of both ML models and Econometric models (list of models and tuning details are below). The functions come from the R open source package RemixAutoML, which is a suite of functions for automated machine learning (AutoML), automated forecasting, automated anomaly detection, automated recommender systems, automated feature engineering, and more. I provided the R script at the bottom of this article so you can replicate this experiment. You can also utilize the functions in Python via the r2py package and Julia via the RCall package.

The Data

The data I’m utilizing comes from Kaggle — weekly Walmart sales by store and department. I’m only using the store and department combinations that have complete data to minimize the noise added to the experiment, which leaves me with a total of 2,660 individual store and department time series. Each store & dept combo has 143 records of weekly sales. I also removed the “IsHoliday” column that was provided.

Preview of Walmart Store Sales Kaggle Data Set

The Experiment

Given the comments from the article linked above, I wanted to test out several forecast horizons. The performance for all models are compared on n-step ahead forecasts, for n = {1,5,10,20,30}, with distinct model builds used for each n-step forecast test. For each run, I have 2,660 evaluation time series for comparison, represented by each store and department combination. In the Results section you can find the individual results for each of those runs.

The Models

In the experiment I used the AutoTS() function for testing out Econometric models and I used the RemixAutoML CARMA suite (Calendar-Auto-Regressive-Moving-Average) for testing out Machine Learning. The AutoTS() function tests out every model from the list below in in several ways (similar to grid tuning in ML). The ML suite contains 4 different tree-based algorithms. As a side note, the Econometric models all come from the forecast package in R. You can see a detailed breakdown of how each model is optimized below the Results section in this article.

Econometrics Models used in AutoTS()

DSHW — Double-Seasonal Holt-Winters
ARIMA — Autoregressive, integrated, moving average
ARFIMA — Fractionally differenced ARIMA
ETS — Exponential Smoothing State-Space Model
NN — Feed-forward neural network with a single hidden layer and lagged inputs. I’m counting this towards Econometrics because it came from the Forecast package in R which is an Econometrics package along with the fact that it’s not as customizable as a TensorFlow or PyTorch model. (Besides, I’ve seen authors state that linear regression is machine learning which would imply that all the Econometrics methods are Machine Learning but I don’t want to debate that here). If you want to consider the NN as a Machine Learning model, just factor that into the results data below.
TBATS (Exponential smoothing state-space model with Box-Cox transformation, ARMA errors, Trend and Seasonal components)
TSLM — time series linear model with trend and seasonal components

Example Plot from AutoTS(): Single Store Forecast with 80% & 95% Prediction Intervals

Machine Learning Algorithms

AutoCatBoostCARMA() — CatBoost
AutoXGBoostCARMA() — XGBoost
AutoH2oGBMCARMA() — H2O Gradient Boosting Machine
AutoH2oDRFCARMA() — H2O Distributed Random Forest

Example Plot from AutoCatBoostCARMA(): Aggregated Forecast

Results

The table outputs below shows the ranks of 11 models (7 Econometric and 4 Machine Learning) when it comes to lowest mean absolute error (MAE) for every single store and department combination (2,660 individual time series) across five different forecast horizons.

For example, in the 1-step ahead forecast table below, NN was the most accurate model on 666 of the 2,660 time series. TBATS was the most accurate 414 times out of the 2,660.

Still looking at the 1-step ahead forecast table below, the NN was the second most accurate on 397 out of 2,660 time series. TBATS was the second most accurate on 406 out of the 2,660 time series. TBATS ranked last place (11th) 14 times.

Winner in Single Model Accuracy — TBATS is the winner of the competition (Econometrics model) with a mean rank of 1.6 across all five forecast horizons.
Runner-Up in Single Model Accuracy — Catboost is the runner up of the competition (Machine Learning model) with a mean rank of 3.6 across all five forecast horizons.
Winner in Run time — ML is winner: For a single run (there were 5 total, 1 for each forecast horizon) the Econometrics automated forecasting took an average of 33 hours! to run while the automated ML models took an average of 3.5 hours, where each run included a grid tune of 6 comparisons, (1 hour for CatBoost, 1 hour for XGBoost, 30 minutes for H2O GBM and 1 hour for H2O Distributed Random Forest).
Winner in Shorter-Horizon Accuracy — The Econometrics models dominate on shorter forecast horizons.
Winner in Longer Horizon Accuracy — The Machine Learning models dominate on longer forecast horizons.

Aggregate Summaries

Ranks: Long Term = {20,30} Period-Ahead & Short Term = {1,5,10} Period-Ahead

The histograms below were derived from selecting the best Econometrics models for each individual store and department time series (essentially the ensemble results) and the best Machine Learning models for each individual store and department time series (ensemble). You can see that as the forecast horizon grows, the Machine Learning models catch up and overcome (slightly) the Econometrics models. With the shorter forecast horizon, the Econometrics models outperform the Machine Learning models by a larger amount than the converse.

Forecast MAE by Econometrics(Blue) and Machine Learning(Gold): 1-Period, 5-Period, 20-Period, 30-Period

Individual Forecast Horizon Summaries by Model

1- Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

5- Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

10- Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

20- Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

30- Period Ahead Forecast Model Counts by Rank (based on lowest MAE)

Conclusion

While the short term horizon forecasts are more accurate via the Econometrics models I tend to have a greater need for longer term forecasts for planning purposes and the Machine Learning models exceed the Econometrics in that category. On top of that, the run-time is a pretty significant factor for me.

If your business needs are the opposite, the Econometrics models are probably your best bet, assuming the run times are not a concern.

If I had enough resources available I’d run both functions and utilize the individual models that performed best for each series, which means I’d be utilizing all 11 models.

Algorithm Tuning Details

Econometrics Model Details:

Each of the individual Econometrics models in AutoTS() are optimized based on the following treatments.

Global Optimizations (applies to all models):

A) Optimal Box-Cox Transformations are used in every run where data is strictly positive. The optimal transformation could be no transformation (artifact of Box-Cox).

B) Four different treatments are tested for each model:

user-specified time frequency + no outlier smoothing & no imputation
model-based time frequency + no outlier smoothing & no imputation
user-specified time frequency + outlier smoothing & imputation
model-based time frequency + outlier smoothing & imputation

The treatment of outlier smoothing and imputation sometimes has a beneficial effect on forecasts; sometimes it doesn’t. You really need to test out both to see what generates more accurate predictions out-of-sample. Same goes with manually defining the frequency of the data. If you have daily data, you specify “day” in the AutoTS arguments. Alternatively, if specified, spectral analysis is done to find the frequency of the data based on the dominant trend and seasonality. Sometimes this approach works better, sometimes it doesn’t. That’s why I test all the combinations for each model.

Individual Model Optimizations:

C) For the ARIMA and ARFIMA, I used up to 25 lags and moving averages, algorithmically determined how many differences and seasonal differences to use, and up to a single difference and seasonal difference can be used, all determined in the stepwise procedure (all combinations can be tested and run in parallel but it’s too time consuming for my patience).

D) For the Double Seasonal Holt-Winters model, alpha, beta, gamma, omega, and phi are determined using least-squares and the forecasts are adjusted using an AR(1) model for the errors.

E) The Exponential Smoothing State-Space model runs through an automatic selection of the error type, trend type, and season type, with the options being “none”, “additive”, and “multiplicative”, along with testing of damped vs. non-damped trend (either additive or multiplicative). Alpha, beta, and phi are estimated.

F) The Neural Network is set up to test out every combination of lags and seasonal lags (25 lags, 1 seasonal lag) and the version with the best holdout score is selected.

G) The TBATS model utilizes 25 lags and moving averages for the errors, damped trend vs. non-damped trend are tested, trend vs. non-trend are also tested, and the model utilizes parallel processing.

H) The TSLM model utilizes simple time trend and season depending on the frequency of the data.

Machine Learning Model Details:

The CARMA suite utilizes several features to ensure proper models are built to generate the best possible out-of-sample forecasts.

A) Feature engineering: I use a time trend, calendar variables, holiday counts, and 25 lags and moving averages along with 51, 52, and 53-week lags and moving averages (all specified as arguments in the CARMA function suite). Internally, the CARMA functions utilize several RemixAutoML functions, all written using data.table for fast and memory efficient processing:

DT_GDL_Feature_Engineering() — creates lags and moving average features by grouping variables (also creates lags and moving averages off of time between records)
Scoring_GDL_Feature_Engineering() — creates lags and moving average features for a single record by grouping variables (along with the time between features)
CreateCalendarVariables() — creates numeric features identifying various time units based on date columns
CreateHolidayFeatures() — creates count features based on the specified holiday groups you want to track and the date columns you supply

B) Optimal transformations: the target variable along with the associated lags and moving average features were transformed. This is really useful for regression models with categorical features that have associated target values that significantly differ from each other. The transformation options that are tested (using a Pearson test for normality) include:

YeoJohnson, Box-Cox, Arcsinh, Identity,
arcsin(sqrt(x)), logit(x) — for proportion data, not used in experiment

The functions used to create the transformations throughout the process and then back-transform them after the forecasts have been generated come from RemixAutoML :

AutoTransformationCreate()
AutoTransformationScore()

C) Models: there are four CARMA functions and each use a different algorithm for the model fitting. The models used to fit the time series data come from RemixAutoML and include:

AutoCatBoostRegression()
AutoXGBoostRegression()
AutoH2oDRFRegression()
AutoH2oGBMRegression()

You can view all of the 21 process steps in those functions on my GitHub page README under the section titled, “Supervised Learning Models” in the “Regression” sub-section (you can also view the source code directly of course).

D) GPU: With the CatBoost and XGBoost functions, you can build the models utilizing GPU (I ran them with a GeForce 1080ti) which results in an average 10x speedup in model training time (compared to running on CPU with 8 threads). I should also note, the lags and moving average features by store and department and pretty intensive to compute and are built with data.table exclusively which means that if you have a CPU with a lot of threads then those calculations will be faster as data.table is parallelized.

E) One model for all series: I built the forecasts for all the store and department combinations with a single model by simply specifying c(“Store”,”Dept”) in the GroupVariables argument, which provides superior results compared to building a single model for each series. The group variables are used as categorical features and do not require one-hot-encoding before hand as CatBoost and H2O handle those internally. The AutoXGBoostCARMA() version utilizes the DummifyDT() function from RemixAutoML to handle the categorical features.

F) The max number of trees used for each model was (early stopping is used internally):

AutoCatBoostCARMA() = 1,500
AutoXGBoostCARMA() = 5,000
AutoH2oDRFCARMA() = 2,000
AutoH2oGBMCARMA() = 2,000

G) Grid tuning: I ran a 6 model random hyper-parameter grid tune for each algorithm. Essentially, a baseline model is built and then 5 other models are built and compared with the lowest MAE model being selected. This is all done internally in the CARMA function suite.

H) Data partitioning: for creating the training, validation, and test data, the CARMA functions utilize the RemixAutoML::AutoDataPartition()function and utilizes the “timeseries” option for the PartitionTypeargument which ensures that the train data reflects the furthest data points back in time, followed by the validation data, and then the test data which is the most recent data points in time. For the experiment, I used 10/143 as the percent holdout for validation data. The test data varied by which n-step ahead holdout was being tested, and the remaining data went to the training set.

I) Forecasting: Once the regression model is built, the forecast process replicates an ARIMA process. First, a single step-ahead forecast is made. Next, the lags and moving average features are updated, making use of the predicted values from the previous step. Next, the other features are updated (trend, calendar, holiday). Then the next forecast step is made; rinse and repeat for remaining forecasting steps. This process utilizes the RemixAutoML functions:

AutoCatBoostScoring()
AutoXGBoostScoring()
AutoH2oMLScoring()

Contact

If anyone is interested in testing out other models, utilizing different data sets, or just need to set up automated forecasts for their company, contact me on LinkedIn.

P.S.

I have plans to continue enhancing and adding capabilities to the automated time series functions discussed above. For example, I plan to:

Test Fourier features for both AutoTS() and the CARMA suite
Test other ML algorithms
Ensemble methods for combining forecasts
Add Croston Econometric model for intermittent demand forecasting
Create machine learning based intermittent demand forecasting models (in a similar fashion to the Croston method) utilizing RemixAutoML Generalized Hurdle Models

R Script

Code to reproduce: https://gist.github.com/AdrianAntico/8e1bbf63f26835756348d7c67a930227

library(RemixAutoML)
library(data.table)
###########################################
# Prepare data for AutoTS()----
###########################################
# Load Walmart Data ----
# link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/
data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
# Subset for Stores / Departments with Full Series Available: (143 time points each)----
data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL]
# Subset Columns (remove IsHoliday column)----
keep <- c("Store","Dept","Date","Weekly_Sales")
data <- data[, ..keep]
# Group Concatenation----
data[, GroupVar := do.call(paste, c(.SD, sep = " ")), .SDcols = c("Store","Dept")]
data[, c("Store","Dept") := NULL]
# Grab Unique List of GroupVar----
StoreDept <- unique(data[["GroupVar"]])
###########################################
# AutoTS() Builds----
###########################################
for(z in c(1,5,10,20,30)) {
  TimerList <- list()
  OutputList <- list()
  l <- 0
  for(i in StoreDept) {
    l <- l + 1
    temp <- data[GroupVar == eval(i)]
    temp[, GroupVar := NULL]
    TimerList[[i]] <- system.time(
      OutputList[[i]] <- tryCatch({
        RemixAutoML::AutoTS(
          temp,
          TargetName       = "Weekly_Sales",
          DateName         = "Date",
          FCPeriods        = 1,
          HoldOutPeriods   = z,
          EvaluationMetric = "MAPE",
          TimeUnit         = "week",
          Lags             = 25,
          SLags            = 1,
          NumCores         = 4,
          SkipModels       = NULL,
          StepWise         = TRUE,
          TSClean          = TRUE,
          ModelFreq        = TRUE,
          PrintUpdates     = FALSE)},
        error = function(x) "Error in AutoTS run"))
    print(l)
  }
  
  # Save Results When Done and Pull Them in After AutoCatBoostCARMA() Run----
  save(TimerList, file = paste0(getwd(),"/TimerList_FC_",z,"_.R"))
  save(OutputList, file = paste0(getwd(),"/OutputList_FC_",z,".R"))
  rm(OutputList, TimerList)
}
###########################################
# Prepare data for AutoCatBoostCARMA()----
###########################################
# Load Walmart Data----
# link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/
data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
# Subset for Stores / Departments With Full Series (143 time points each)----
data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL]
# Subset Columns (remove IsHoliday column)----
keep <- c("Store","Dept","Date","Weekly_Sales")
data <- data[, ..keep]
# Build AutoCatBoostCARMA Models----
for(z in c(1,5,10,20,30)) {
  CatBoostResults <- RemixAutoML::AutoCatBoostCARMA(
    data,
    TargetColumnName = "Weekly_Sales",
    DateColumnName = "Date",
    GroupVariables = c("Store","Dept"),
    FC_Periods = 10,
    TimeUnit = "week",
    TargetTransformation = TRUE,
    Lags = c(1:25,51,52,53),
    MA_Periods = c(1:25,51,52,53),
    CalendarVariables = TRUE,
    TimeTrendVariable = TRUE, 
    HolidayVariable = TRUE,
    DataTruncate = FALSE,
    SplitRatios = c(1 - (30+z)/143, 30/143, z/143),
    TaskType = "GPU",
    EvalMetric = "RMSE",
    GridTune = FALSE,
    GridEvalMetric = "r2",
    ModelCount = 2,
    NTrees = 1500,
    PartitionType = "timeseries",
    Timer = TRUE)
  
  # Output----
  CatBoostResults$TimeSeriesPlot
  CatBoost_Results <- CatBoostResults$ModelInformation$EvaluationMetricsByGroup
  data.table::fwrite(CatBoost_Results, paste0(getwd(),"/CatBoost_Results_",z,".csv"))
  rm(CatBoost_Results,CatBoostResults)  
}
###########################################
# Prepare data for AutoXGBoostCARMA()----
###########################################
# Load Walmart Data ----
# link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/
data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
# Subset for Stores / Departments With Full Series (143 time points each)----
data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL]
# Subset Columns (remove IsHoliday column)----
keep <- c("Store","Dept","Date","Weekly_Sales")
data <- data[, ..keep]
for(z in c(1,5,10,20,30)) {
  XGBoostResults <- RemixAutoML::AutoXGBoostCARMA(
    data,
    TargetColumnName = "Weekly_Sales",
    DateColumnName = "Date",
    GroupVariables = c("Store","Dept"),
    FC_Periods = 2,
    TimeUnit = "week",
    TargetTransformation = TRUE,
    Lags = c(1:25, 51, 52, 53),
    MA_Periods = c(1:25, 51, 52, 53),
    CalendarVariables = TRUE,
    HolidayVariable = TRUE,
    TimeTrendVariable = TRUE,
    DataTruncate = FALSE,
    SplitRatios = c(1 - (30+z)/143, 30/143, z/143),
    TreeMethod = "hist",
    EvalMetric = "MAE",
    GridTune = FALSE,
    GridEvalMetric = "mae",
    ModelCount = 1,
    NTrees = 5000,
    PartitionType = "timeseries",
    Timer = TRUE)
  
  XGBoostResults$TimeSeriesPlot
  XGBoost_Results <- XGBoostResults$ModelInformation$EvaluationMetricsByGroup
  data.table::fwrite(XGBoost_Results, paste0(getwd(),"/XGBoost_Results",z,".csv"))
  rm(XGBoost_Results)
}
###########################################
# Prepare data for AutoH2oDRFCARMA()----
###########################################
# Load Walmart Data ----
# link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/
data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
# Subset for Stores / Departments With Full Series (143 time points each)----
data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL]
# Subset Columns (remove IsHoliday column)----
keep <- c("Store","Dept","Date","Weekly_Sales")
data <- data[, ..keep]
for(z in c(1,5,10,20,30)) {
  H2oDRFResults <- AutoH2oDRFCARMA(
    data,
    TargetColumnName = "Weekly_Sales",
    DateColumnName = "Date",
    GroupVariables = c("Store","Dept"),
    FC_Periods = 2,
    TimeUnit = "week",
    TargetTransformation = TRUE,
    Lags = c(1:5, 51,52,53),
    MA_Periods = c(1:5, 51,52,53),
    CalendarVariables = TRUE,
    HolidayVariable = TRUE,
    TimeTrendVariable = TRUE,
    DataTruncate = FALSE,
    SplitRatios = c(1 - (30+z)/143, 30/143, z/143),
    EvalMetric = "MAE",
    GridTune = FALSE,
    ModelCount = 1,
    NTrees = 2000,
    PartitionType = "timeseries",
    MaxMem = "28G",
    NThreads = 8,
    Timer = TRUE)
  
  # Plot aggregate sales forecast (Stores and Departments rolled up into Total)----
  H2oDRFResults$TimeSeriesPlot
  H2oDRF_Results <- H2oDRFResults$ModelInformation$EvaluationMetricsByGroup
  data.table::fwrite(H2oDRF_Results, paste0(getwd(),"/H2oDRF_Results",z,".csv"))
  rm(H2oDRF_Results)  
}
###########################################
# Prepare data for AutoH2OGBMCARMA()----
###########################################
# Load Walmart Data ----
# link to manually download file: https://remixinstitute.app.box.com/v/walmart-store-sales-data/
data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
# Subset for Stores / Departments With Full Series (143 time points each)----
data <- data[, Counts := .N, by = c("Store","Dept")][Counts == 143][, Counts := NULL]
# Subset Columns (remove IsHoliday column)----
keep <- c("Store","Dept","Date","Weekly_Sales")
data <- data[, ..keep]
for(z in c(1,5,10,20,30)) {
  H2oGBMResults <- AutoH2oGBMCARMA(
    data,
    TargetColumnName = "Weekly_Sales",
    DateColumnName = "Date",
    GroupVariables = c("Store","Dept"),
    FC_Periods = 2,
    TimeUnit = "week",
    TargetTransformation = TRUE,
    Lags = c(1:5, 51,52,53),
    MA_Periods = c(1:5, 51,52,53),
    CalendarVariables = TRUE,
    HolidayVariable = TRUE,
    TimeTrendVariable = TRUE,
    DataTruncate = FALSE,
    SplitRatios = c(1 - (30+z)/143, 30/143, z/143),
    EvalMetric = "MAE",
    GridTune = FALSE,
    ModelCount = 1,
    NTrees = 2000,
    PartitionType = "timeseries",
    MaxMem = "28G",
    NThreads = 8,
    Timer = TRUE)
  
  # Plot aggregate sales forecast (Stores and Departments rolled up into Total)----
  H2oGBMResults$TimeSeriesPlot
  H2oGBM_Results <- H2oGBMResults$ModelInformation$EvaluationMetricsByGroup
  data.table::fwrite(H2oGBM_Results, paste0(getwd(),"/H2oGBM_Results",z,".csv"))
  rm(H2oGBM_Results)
}
##################################################
# AutoTS() and AutoCatBoostCARMA() Comparison----
##################################################
# Gather results----
for(i in c(1,5,10,20,30)) {
  load(paste0("C:/Users/aantico/Desktop/Work/Remix/RemixAutoML/TimerList_",i,"_.R"))  
  load(paste0("C:/Users/aantico/Desktop/Work/Remix/RemixAutoML/OutputList_",i,"_.R"))    
  
  # Assemble TS Data
  TimeList <- names(TimerList)
  results <- list()
  for(j in 1:2660) {
    results[[j]] <- cbind(
      StoreDept = TimeList[j],
      tryCatch({OutputList[[j]]$EvaluationMetrics[, .(ModelName,MAE)][
        , ModelName := gsub("_.*","",ModelName)
        ][
          , ID := 1:.N, by = "ModelName"
          ][
            ID == 1
            ][
              , ID := NULL
              ]},
        error = function(x) return(
          data.table::data.table(
            ModelName = "NONE",
            MAE = NA))))
  }
  
  # AutoTS() Results----
  Results <- data.table::rbindlist(results)
  
  # Remove ModelName == NONE
  Results <- Results[ModelName != "NONE"]
  
  # Average out values: one per store and dept so straight avg works----
  Results <- Results[, .(MAE = mean(MAE, na.rm = TRUE)), by = c("StoreDept","ModelName")]
  
  # Group Concatenation----
  Results[, c("Store","Dept") := data.table::tstrsplit(StoreDept, " ")][, StoreDept := NULL]
  data.table::setcolorder(Results, c(3,4,1,2))
  
  ##################################
  # Machine Learning Results----
  ##################################
  
  # Load up CatBoost Results----
  CatBoost_Results <- data.table::fread(paste0(getwd(),"/CatBoost_Results_",i,".csv"))
  CatBoost_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)]
  data.table::setnames(CatBoost_Results, "MAE_Metric", "MAE")
  CatBoost_Results[, ModelName := "CatBoost"]
  data.table::setcolorder(CatBoost_Results, c(1,2,4,3))
  
  # Load up XGBoost Results----
  XGBoost_Results <- data.table::fread(paste0(getwd(),"/XGBoost_Results",i,".csv"))
  XGBoost_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)]
  data.table::setnames(XGBoost_Results, "MAE_Metric", "MAE")
  XGBoost_Results[, ModelName := "XGBoost"]
  data.table::setcolorder(XGBoost_Results, c(1,2,4,3))
  
  # Load up H2oDRF Results----
  H2oDRF_Results <- data.table::fread(paste0(getwd(),"/H2oDRF_Results",i,".csv"))
  H2oDRF_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)]
  data.table::setnames(H2oDRF_Results, "MAE_Metric", "MAE")
  H2oDRF_Results[, ModelName := "H2oDRF"]
  data.table::setcolorder(H2oDRF_Results, c(1,2,4,3))
  
  # Load up H2oGBM Results----
  H2oGBM_Results <- data.table::fread(paste0(getwd(),"/H2oGBM_Results",i,".csv"))
  H2oGBM_Results[, ':=' (MAPE_Metric = NULL, MSE_Metric = NULL, R2_Metric = NULL)]
  data.table::setnames(H2oGBM_Results, "MAE_Metric", "MAE")
  H2oGBM_Results[, ModelName := "H2oGBM"]
  data.table::setcolorder(H2oGBM_Results, c(1,2,4,3))
  
  ##################################
  # Combine Data----
  ##################################
  
  # Stack Files----
  ModelDataEval <- data.table::rbindlist(
    list(Results, CatBoost_Results, XGBoost_Results, H2oGBM_Results, H2oDRF_Results))
  data.table::setorderv(ModelDataEval, cols = c("Store","Dept","MAE"))
  
  # Add rank----
  ModelDataEval[, Rank := 1:.N, by = c("Store","Dept")]
  
  # Get Frequencies----
  RankResults <- ModelDataEval[, .(Counts = .N), by = c("ModelName","Rank")]
  data.table::setorderv(RankResults, c("Rank", "Counts"), order = c(1,-1))
  
  # Final table----
  FinalResultsTable <- data.table::dcast(RankResults, formula = ModelName ~ Rank, value.var = "Counts")
  data.table::setorderv(FinalResultsTable, "1", -1, na.last = TRUE)
  
  # Rename Columns----
  for(k in 2:ncol(FinalResultsTable)) {
    data.table::setnames(FinalResultsTable, 
                         old = names(FinalResultsTable)[k],
                         new = paste0("Rank_",names(FinalResultsTable)[k]))
  }
  
  # Print
  print(i)
  print(knitr::kable(FinalResultsTable))
}

2 Steps to Sharpen Wisdom and Discernment

There is a barrier to discernment when seeking wisdom and knowledge. Many people

AI-Powered Virtual Assistants: Beyond Siri and Alexa

“We can only see a short distance ahead, but we can see plenty

Breaking Barriers: Low-Code/No-Code and AI in Application Development

“Technology is nothing. What is important is that you have a faith in

Name*

Email*

Email won't be displayed in comment.

Enter your LinkedIn URL so readers can connect

Website

Name*

Email*

Email won't be displayed in comment.

Enter your LinkedIn URL so readers can connect

Website

39 Comments

Inline Feedbacks

View all comments

Onno

6 years ago

Dear Adrian,

I think it would be interesting to look at testing whether the slight differences for the longest forecast horizon are actually statistically significant, for example via inclusion rates in a Model Confidence Set.

“Ocularmetrics”-wise I would guess that the short-term forecast performance of the Econometric models are significantly better than the ML models but that the reverse does not hold for the long-term forecasts? What do you think?

Douglas Pestana - @DougVegas

Admin

Reply to Onno

6 years ago

Hi Onno, yes that would be interesting. In Adrian’s article, I think he mentioned that the differences in accuracy weren’t tested to be statistically significantly different, just that they were lower overall. When it comes down to planning purposes at an enterprise, such as inventory or demand forecasting, decision-makers will likely default to methodology with lowest error rate.

Also, in the Conclusion section of his experiment, he mentions that the econometric time series models perform more accurately in shorter time horizons, and ML models perform more accurately over longer time horizons.

Robert Young

6 years ago

The problem with using either common econometric analysis or machine learning in the realm of human events is either they are right only 1) by accident or 2) the data generating process in question has reached a period of stasis. Data from the natural sciences follows God’s Laws, and as such is predictable if one knows enough about those laws, has sufficient data, and has sufficient compute power. You see the results every day on your TeeVee weather report. Even then, low probability events will toss it all into a cocked hat, on occasion.

OTOH, predicting future economic data (macro- or micro-) is controlled by future, but preceding, events. That is to say: events drive the data. Future data is driven by future (lagged by some amount) events, not past data. Modulo stasis, of course. Thus we see the current angst caused by, finally, a yield curve inversion. At the micro- level, as with this Walmart data, any number of events (for which we don’t have data today) will determine the post-event data. The least costly method is to simply get a straight edge and eyeball the line through the data points. You’ve got just as much chance of being correct as any high-compute analysis. How, for example, will this data series account for the Trade War started by Trump? Is it in these historic data? How much numerical weight should be given? And so forth.

The Great Recession, a macro- event, was caused by a few micro- events not represented, at their outsets, by any data. A few in particular:
1 – CountryWide Mortgage instituting lax ARM loan requirements
2 – commercial banks adopting same to ‘remain competitive’
3 – securitization of bundles of said loans
4 – adoption of a copula formula in said process (see Felix Salmon’s Wired article for details)
5 – Blythe Masters creation of the CDS as ‘protection’ for that entire edifice, which turned a large part of financial markets into a betting parlor

Now, at the end (2006-ish) before the Crash, a few analysts simply tried to eyeball a line through house prices and saw the hockey stick pattern emerging. Such highly non-linear behavior simply couldn’t be sustained, but it was in the best interest of the various parties (banks, builders, investment banks, etc.), in the short term to continue this experiment in musical chairs. A process which can’t be sustained won’t be, no matter what the models say.

Finally, the macro- economy affects the micro- (aka, Walmarts here) far more than micro-analysts are generally willing to admit.

Douglas Pestana - @DougVegas

Admin

Reply to Robert Young

6 years ago

Hi Robert, thank you for the well thought out response. As you mention, harder to make accurate predictions over longer term horizons as macroeconomic and macro-enterprise behavior is a function of several smaller events. For economists, I’m sure using econometric time series forecasts over a shorter time horizon is more prudent given the constant variation of events. Then as new data comes in, you can re-fit/re-train the models. I’d say for large enterprises, especially for those making forecasts for hundreds or thousands of stores or SKUs, that may be slightly impractical as they need to make 6-12 month planning decisions to ensure they meet demand. They may even have shorter leeway for re-adjusting forecasts, procurements, and operational decisions which means they may not have time to wait around for an exhaustive macro and micro-enterprise level economic analysis.

Eric

6 years ago

With this data set, I think using a grouped (not ensembling) time series formulation of the econometrics models would have been the more appropriate comparison. Grouping accounts for correlations and interactions between series and is often (always?) superior to forecasting individually. The forecast package handles grouped/hierarchical time series and I believe [fable] is now able to parallelize the modeling.

Douglas Pestana - @DougVegas

Admin

Reply to Eric

6 years ago

Hi Eric, we’ll definitely check out the fable package if it supports parallelization. RemixAutoML is built for speed and efficiency so we’ll explore ways of leveraging grouped time series via the forecast package (which powers AutoTS) and fable.

Regarding whether grouping is superior, it would be interesting to test if it produces higher accuracy/lower error forecasts versus individual forecasts, but it’s best for us to not make that assumption. Do you have any blogs or studies that an experiment conducted like this that show that effect?

Eric

Reply to Douglas Pestana - @DougVegas

6 years ago

Here’s the section in Hyndman’s book, https://otexts.com/fpp2/reconciliation.html with the paper linked at the bottom which is where I think I got this from (been a little while). Also, some slides from recent conference, https://slides.mitchelloharawild.com/user2019/#58

Adrian Antico

Author

Reply to Eric

6 years ago

Thanks again for the comment Eric. For this particular experiment, I wanted to see how the performances differed based on forecasting each store / dept combination separately (bottom up approach). An alternative experiment certainly can be done using the other grouping methods (top down or middle out).

Eric

Reply to Adrian Antico

6 years ago

I see now. This is bottom-up. I had forgotten about that bottom-up, top-down, and middle-out were grouped forecasting approaches. My comment only applies to Optimal Reconciliation approach described by Hyndman.

Adrian Antico

Author

Reply to Eric

6 years ago

No worries Eric. You at least got me thinking about whether the package needs some of that functionality. Let me know if you’re interested in helping out.

Tom

6 years ago

Great topic. However, it’s hard to generalize based on a single example. Your prediction task is about aggregate data, and your ‘econometric’ models are more ‘metric’ than ‘econ’. Let’s say you don’t work for kaggle, but you actually work for walmart. Then you have access to receipt-level data. Via credit cards, you can probably even track it down to an individual household. Then you can think about econ models with micro foundation. You can use SKU level pricing data and fit an actual demand model. You can introduce reasonable constraints on the model. If that’s too hard, you can probably get your hands on some panel data and at least fit a BLP model.

Adrian Antico

Author

Reply to Tom

6 years ago

Thanks for your comment Tom. I’d love to try out your idea – if you could help me find some transactional level data I’d love to showcase some of my other automated machine learning functions that would be perfect for that type of model!

Tom

Reply to Adrian Antico

6 years ago

Just google ‘dunnhumby source files’. Not sure how well you can impute prices for weeks without purchases, though. It’s difficult to find publicly available data like this (as you probably have noticed).

Adrian Antico

Author

Reply to Tom

6 years ago

Thanks for that Tom! I’ll check it out along with other sources. I’ll be eventually posting some articles to show how to utilize the other automated ML functions from the package.

Eric

6 years ago

Looks like you’re using MAPE as an evaluation metric for model selection of some of the econometrics models. May not be a problem here since the data is weekly sales, but that metric is known to have issues with data around zero. https://www.statworx.com/de/blog/what-the-mape-is-falsely-blamed-for-its-true-weaknesses-and-better-alternatives/

Adrian Antico

Author

Reply to Eric

6 years ago

Thanks for your comment Eric. You can select different evaluation metrics with all the functions used in the blog.

M. Edward (Ed) Borasky

6 years ago

Wait – are you seriously comparing run times for machine learning on an NVidia 1080 GPU with econometric models running on a CPU?

Adrian Antico

Author

Reply to M. Edward (Ed) Borasky

6 years ago

The AutoCatBoostCARMA() and AutoXGBoostCARMA() functions utilized GPU but I did mention that it provided about a 10X speedup. Regardless, I don’t know of any econometric time series models that can run on GPU at this point. If I find any you can bet I’ll be incorporating them into the package. At this point, it’s CPU for AutoTS() but I thought the readers should be aware of the run times in case they want to run the script from the blog or incorporate the functions into their workflows.

Fred

6 years ago

Hi, would you be able to update the article with a table of the average evaluation metric (MAE or MAPE across all stores and departments) from each model, for each (h)? The rank hides the magnitude of accuracy, and the log(MAE) visualization appears to be missing h=10.

TIA!

Adrian Antico

Author

Reply to Fred

6 years ago

Hi Fred. I set up a github repository with the results (link below). Be sure to let me know if you find something interesting! The folder contains the MAE metrics for the AutoTS by store, dept, and forecast horizon. The AutoCARMA files have a few extra evaluation metrics (along with MAE).
https://github.com/AdrianAntico/ForecastTests

Fred

Reply to Adrian Antico

6 years ago

Thanks, this is very useful for recreating! I look forward to getting to this!

Tommaso Gennari

6 years ago

Impressive work. Thank you for sharing!

Adrian Antico

Author

Reply to Tommaso Gennari

6 years ago

Thanks!

shafi

6 years ago

Hi Adrian Antico,

For some reason, I am not able to download the date in your link data <- data.table::fread("https://remixinstitute.box.com/shared/static/9kzyttje3kd7l41y1e14to0akwl9vuje.csv", header = T, stringsAsFactors = FALSE)
I shall appreciate your prompt response.

Best Regards,
Shafi

Douglas Pestana - @DougVegas

Admin

Reply to shafi

6 years ago

Hi Shafi,

Could you try re-running the data import code? It should work now.

-Douglas

shafi

Reply to Douglas Pestana - @DougVegas

6 years ago

Hi Douglas,

Thanks a lot! it worked now.

shafi

Reply to Douglas Pestana - @DougVegas

6 years ago

Hi Douglas,

I am using AutoXGBoostCARMA() from your codes https://gist.github.com/AdrianAntico/8e1bbf63f26835756348d7c67a930227
But every I run these codes, my Rstudio crushes. Though, I am using on window 10 but I have installed Rstudio in Ubuntu, It works fine for Xgboost h2O and AutoML of H2O, I can run 1000 models easily. Please guide me on this.

Best Regards,
Shafi

Adrian Antico

Author

Reply to shafi

6 years ago

Hi shafi, you’re likely running out of memory. If you open up your task panel and observe that while you kick off the process you will know if that is the case. If it is a memory issue you can always subset the data for a reduced number of stores and / or departments and feed that need data set into the models.

shafi

Reply to Adrian Antico

6 years ago

Hi Adrian,

Thanks for the prompt response! It worked as I reduced the number of count.

One another trivial question,XGBoostResults[“TimeSeriesPlot”] gives both comparison of actual and Forecast. But in AutoTS it does not. In your note on this link https://github.com/AdrianAntico/RemixAutoML/issues/10
plot = xx$TimeSeriesPlot , then plot = plot + ggplot2 (a bit basic question:How can we add the actual data series for a comparison?

Adrian Antico

Author

Reply to shafi

6 years ago

Thanks for your question about adding in the predicted values from AutoTS to the forecast plot. That can be done, however, not sure when I will get around to it. One reason that I opted not to do that originally (for AutoTS that is) is due to the nature of the data that I use to build the models. Some of them are alterations of the original series (smoothing / outlier dampening) and the models will fit those altered values. When you see that plot you may be tempted to think the model isn’t fitting correctly when in reality the models are determined based on out of sample performance on unaltered data. I think you make a valid case for me to add it in but I’ll have to think about adding a note to the plots in case an altered data set was used.

shafi

Reply to Adrian Antico

6 years ago

Thanks Adrian, it would be to have more option. it will serve the purpose of different analysis. I run 256 SARIMA model and selected the best model it worked very well with Macro economic data of Canada. so, I wanted to compare it with your AutoTS to how it performs. I also worked very well too with a few codes. Thanks for the wonderful idea!

shafi

Reply to Adrian Antico

6 years ago

This is in continuation of previous note.

ForecastData <- xx$Forecast
ModelEval <- xx$EvaluationMetrics
WinningModel <- xx$TimeSeriesModel$fitted
y <- xx$TimeSeriesModel$x

autoplot(y, series="Data") +
autolayer(WinningModel, series="TBATA")
autolayer(ForecastData$Forecast_ARIMA_ModelFreq, series="Prediction") +
xlab("Year") + ylab("GDP") + ggtitle("GDP") + theme_bw()+theme(legend.title = element_blank(),legend.position = "bottom")

I hope you will have a better/short solution

shafi

Reply to Adrian Antico

6 years ago

Hello Adrian and Douglas,

I am using AutoDataParition function for times series split, my data is monthly starts from 2004 Feb to 2019 March. I want to split it into train, valid and test. I want something like this train (2004 to 2015), validate(2016 to 2017) test (2018 to 2019) As you mentioned “With “time” you will have data sets generated so that the training data contains the earliest records in time, validation data the second earliest, test data the third earliest, etc. But in my case it is not working I don’t know where is the problem.

There are two issues one if I used TimeColumnName = date I get the error message [Error in TimeColumnName %chin% names(data) :
x is type ‘closure’ (must be ‘character’ or NULL)] My date column has this format 2004-02-04

Second, If I make TimeColumnName = NULL, train has mix of different date to end of 2019. Similarly, valid and test has also a mix of date. I get the same as we do split with H2O Flow. A about h2o split I posted the problem but not solved(https://stackoverflow.com/questions/57585109/h2o-flow-date-issue-and-split/57599747#57599747).

So I was happy to see AutoDataPartition, if it work for me.

part <- AutoDataPartition(data, NumDataSets = 3, Ratios = c(0.7, 0.2, 0.1),
PartitionType = "time", StratifyColumnNames = NULL,
StratifyNumericTarget = NULL, StratTargetPrecision = 3,
TimeColumnName = date)

train <- part$TrainData
Valid <- part$ValidationData
test <- part$TestData

shafi

Reply to Adrian Antico

6 years ago

Sorry for putting one more question.

I have data of monthly GDP and 25 predictors all data is numeric. As an initial test, I am using your example codes. When using this line of codes XGBoostResults$TimeSeriesPlot, I am not able to get predicted values for in sample (Though, I am getting out of sample predicted).

Please note that I also tested with the single GDP as well by removing all the predictors. But the problem is still there.

I get the warning message,

Warning messages:
1: Removed 10 rows containing missing values (geom_path).
2: Removed 182 rows containing missing values (geom_path).

In your codes for Walmart sales it worked fine. Here are the codes,

XGBoostResults <- RemixAutoML::AutoXGBoostCARMA(

data,

TargetColumnName = "gdpm",

DateColumnName = "date",

GroupVariables = NULL,

FC_Periods = 10,

TimeUnit = "month",

TargetTransformation = TRUE,

Lags = c(1:5),

MA_Periods = c(1:5),

CalendarVariables = TRUE,

HolidayVariable = TRUE,

TimeTrendVariable = TRUE,

DataTruncate = FALSE,

SplitRatios = c(0.70,0.20,0.10),

TreeMethod = "hist",

EvalMetric = "RMSE",

GridTune = FALSE,

GridEvalMetric = "mse",

ModelCount = 1,

NTrees = 5000,

PartitionType = "time",

Timer = TRUE)

XGBoostResults$TimeSeriesPlot

shafi

6 years ago

The above mentioned problem is mainly due to GroupVariables = NULL then we can’t see the actual vs predicted after clicking XGBoostResults$TimeSeriesPlot. But the issue how to select the group variables when I have 26 numerical data columns. When I put variables to GroupVariables = c(……………..). Then I can get the predicted vs Actual however, sequence of date is very distorted. (date get mixed)

Adrian Antico

Author

Reply to shafi

6 years ago

The CARMA suite of models are not set up to utilize other predictors. If you want to utilize other predictors, you’ll have to use the Auto_Regression() functions.

shafi

Reply to Adrian Antico

6 years ago

Okey thanks for the response.

Adrian Antico

Author

Reply to shafi

6 years ago

No problem. I designed the CARMA functions to replicate time series models that only utilize features about the target variable, calendar based variables, and time trend variables.

shafi

Reply to Adrian Antico

6 years ago

Do you think, can we use Google Trend(GT) data with CARMA. It has both time trend and calender based variables. Is this right?. I am just thinking how can we put into a group as you did with Sales data of Walmart.

Get free weekly access to business AI reports, trends, and courses.

Why Machine Learning is more Practical than Econometrics in the Real World

Adrian Antico

Motivation

Approach

The Data

The Experiment

The Models

Econometrics Models used in AutoTS()

Machine Learning Algorithms

Results

Aggregate Summaries

Individual Forecast Horizon Summaries by Model

Conclusion

Algorithm Tuning Details

Econometrics Model Details:

Global Optimizations (applies to all models):

Individual Model Optimizations:

Machine Learning Model Details:

Contact

P.S.

R Script

Related Posts

2 Steps to Sharpen Wisdom and Discernment

AI-Powered Virtual Assistants: Beyond Siri and Alexa

Breaking Barriers: Low-Code/No-Code and AI in Application Development