SherbertTiny2366 t1_jaebzb0 wrote on February 28, 2023 at 8:43 PM

Reply to comment by CyberPun-K in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar

As I understand it, that is also the advantage of Fugue. From their webpage:
> FugueSQL is designed for heavy SQL users to extend the boundaries of traditional SQL workflows. FugueSQL allows the expression of logic for end-to-end distributed computing workflows. It can also be combined with Python code to use custom functions alongside the SQL commands. It provides a unified interface, allowing the same SQL code to run on Pandas, Dask, and Spark.

https://github.com/fugue-project/fugue

SherbertTiny2366 t1_j01t4du wrote on December 13, 2022 at 1:45 PM

Reply to comment by -Rizhiy- in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar

Imagine this toy example. You have 5 series that are very sparse, as is often the case in retail. For example, series 1 has sales on Mondays and zeros on all other days, series 2 only on Tuesdays, series 3 only on Wednesdays, and so on. For each individual series, a forecast close to 0 would be more or less accurate. However, when you add all the predictions up, the total will be way below the true aggregate value.
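To put numbers on the toy example, here is a minimal sketch in plain Python. The sale size of 10 units, the 5-week window of weekdays, and the all-zero forecast are assumptions picked for illustration, not figures from any real dataset.

```python
# Toy setup (all values are illustrative assumptions):
# 5 series over 25 weekdays; series i sells 10 units only on weekday i.
n_series = 5
n_days = 25          # 5 weeks of Monday..Friday
sale = 10.0

actuals = [
    [sale if day % 5 == i else 0.0 for day in range(n_days)]
    for i in range(n_series)
]

# A "close to 0" forecast for every series; here exactly 0 for simplicity.
forecasts = [[0.0] * n_days for _ in range(n_series)]


def mae(y, y_hat):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


# Error of each bottom-level series taken on its own.
bottom_mae = [mae(a, f) for a, f in zip(actuals, forecasts)]

# Aggregate level: sum the series (and their forecasts) day by day.
total_actual = [sum(s[d] for s in actuals) for d in range(n_days)]
total_forecast = [sum(f[d] for f in forecasts) for d in range(n_days)]
total_mae = mae(total_actual, total_forecast)

print("bottom-level MAE per series:", bottom_mae)
print("aggregate MAE:", total_mae)
```

Under these assumptions each individual series has an MAE of 2 units, while the aggregate sells 10 units every day and is forecast at 0, so it is missed completely.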

SherbertTiny2366 t1_izy50ew wrote on December 12, 2022 at 6:38 PM

Reply to comment by Zealousideal-Card637 in [Discussion] Amazon's AutoML vs. open source statistical methods by fedegarzar

For hierarchical and sparse data, it is quite common to see models achieve good accuracy at the bottom levels but perform very badly at higher aggregation levels. This happens because the models systematically under- or over-predict, so the bottom-level errors add up instead of canceling out.

SherbertTiny2366 t1_iyq6prw wrote on December 3, 2022 at 8:01 AM

Reply to comment by TheBrain85 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar

There is no overlap at all. It’s a completely new dataset. There might be similarities, in the sense that both contain time series or share certain frequencies, but in no way could this be called “training on the test set.”

SherbertTiny2366 t1_iynjxon wrote on December 2, 2022 at 6:47 PM

Reply to comment by TheBrain85 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar

How is it biased to try good-performing ensembles in another data set?

And how is that overfitting?

Furthermore, just because the names of the data sets begin with "M" does not mean that they "have significant overlap and similarity."

SherbertTiny2366 t1_iyj00d2 wrote on December 1, 2022 at 7:28 PM

Reply to comment by cristianic18 in [R] Statistical vs Deep Learning forecasting methods by fedegarzar

>This ensemble is formed by averaging four statistical models: AutoARIMA, ETS, CES and DynamicOptimizedTheta. This combination won sixth place and was the simplest ensemble among the top 10 performers in the M4 competition.
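The averaging step described in that quote can be sketched as follows. This is only an illustration of a simple (unweighted) mean across models: the forecast numbers are made up, and in practice each list would come from fitting the corresponding model.

```python
# Hypothetical point forecasts for a 3-step horizon from the four models
# named in the quote. The numbers are invented for illustration only.
model_forecasts = {
    "AutoARIMA": [102.0, 104.0, 106.0],
    "ETS": [100.0, 101.0, 102.0],
    "CES": [98.0, 99.0, 100.0],
    "DynamicOptimizedTheta": [104.0, 104.0, 104.0],
}

horizon = 3

# Simple ensemble: average the four forecasts at each horizon step.
ensemble = [
    sum(f[h] for f in model_forecasts.values()) / len(model_forecasts)
    for h in range(horizon)
]

print(ensemble)
```

With these made-up inputs the ensemble forecast is 101, 102, 103. Nothing is learned in the combination step, which is why the quote can call it the simplest ensemble.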

SherbertTiny2366 t1_it8kold wrote on October 21, 2022 at 6:43 PM

Reply to comment by tblume1992 in [D] Python's library to multivariate time series forecasting: Sktime, modeltime, darts. by popcornn1

Here is the repo for hierarchical methods: https://github.com/nixtla/hierarchicalforecast/