Downtown_Finance_661 t1_j96nj5k wrote
Reply to comment by Mnbvcx0001 in [D] Simple Questions Thread by AutoModerator
ML is a mathematical discipline, so you have to read books to dive into it. Collaboration becomes possible after you become useful. Try "Grokking Deep Learning" for a simple introduction to neural networks. Also check the classical ML tasks in regression/classification/trees and drill them. This is hard work which cannot be substituted by being part of some community.
Update: Before that, you'd better learn the basics of the Python programming language. Find lectures with homework assignments that are not connected with ML itself (16 hours + 40 hours will be enough).
Downtown_Finance_661 t1_j93g1nw wrote
Reply to [D] Simple Questions Thread by AutoModerator
I want to thank the community for this opportunity to ask a simple time series question. Please don't reply "jump out the window" (it is bad advice from a statistical PoV since I'm on the second floor).
I'm new to time series in particular and to ML in general. I have tried an ARDL model with no seasonal part and no exogenous variables (from statsmodels.tsa.api import ARDL). I'm working with a very small dataset of 16 points (see Appendix 1) with a strong trend component.
This TS is stationary according to the adfuller test, in spite of the fact that it is clearly non-stationary by simple criteria like "the moving average has to be roughly constant". I'm not sure this test is even applicable to such a small number of points; a sketch of the check is below.
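(A minimal sketch of that check, assuming ts holds the Appendix 1 series:)

from statsmodels.tsa.stattools import adfuller

# ADF null hypothesis: the series has a unit root (is non-stationary)
stat, pvalue = adfuller(ts)[:2]
print(f'ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}')
# A small p-value suggests stationarity, but with only 16 points the
# test has very little power, so the result should be taken with caution.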
Imagine I want to forecast the next nine(sic!) points and I have no idea how to choose the best number of lags. Hence I fit my model for several different nlags on the TS[:-9] dataset (train set) and choose the best lag by comparing MAE/MSE/R2 on the TS[-9:] dataset (test set); a sketch of this procedure is below. The best lag is lags = 1.
In spite of all the ugliness of the idea of forecasting 9 points while having only 16-9=7 points, the prediction plot fits the test data plot well. This result convinced me to go further (contrary to common mathematical sense).
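(A minimal sketch of the lag selection, assuming ts holds the Appendix 1 values:)

import numpy as np
from statsmodels.tsa.api import ARDL
from sklearn.metrics import mean_absolute_error

train, test = ts[:-9], ts[-9:]
for nlags in range(1, 4):  # only a few lags fit into 7 training points
    mfitted = ARDL(np.asarray(train), lags=nlags).fit()
    # out-of-sample forecast for the 9 held-out points
    pred = mfitted.predict(start=len(train), end=len(ts) - 1)
    print(nlags, mean_absolute_error(test, pred))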
Now I have to decide:
(1) to use the model above (trained on the TS[:-9] set) to predict the TS[16:26] values, for which I have a very good R2 on nine predictions,
(2) or to refit the lags = 1 model on all my points (TS[:]), but without the chance to test it on nine predictions.
And I have no idea how to choose the best option, so I decided to study the convergence of the model's coefficients (m.params). My plan is to fit nine models on the nine sets TS[:-9], TS[:-8], TS[:-7], ..., TS[:] (note that TS[:-0] would be an empty slice in Python, so the last set is just TS[:]) and to check whether a and b in the consecutive models y(t) = a*y(t-1) + b tend to converge to two constants a_lim, b_lim; see the sketch below. They do not. Not even close to convergence. They look random... This is the end, I don't know how to choose.
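(The sweep itself, sketched assuming ts is the Appendix 1 series; the window arithmetic avoids the empty TS[:-0] slice:)

import numpy as np
from statsmodels.tsa.api import ARDL

for k in range(9, -1, -1):
    window = np.asarray(ts[:len(ts) - k])   # TS[:-9], TS[:-8], ..., TS[:]
    params = ARDL(window, lags=1).fit().params
    print(len(window), params)              # [b, a] for y(t) = a*y(t-1) + b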
My very last idea was to freeze b = constant for all nine models and retest the convergence of a under this restriction, but I see no such option in ARDL (and, to be honest, I have no idea how to program an ARDL-like function by myself, even for lag=1).
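(For what it's worth, a frozen intercept reduces the fit to one-parameter least squares: with b fixed, minimizing the squared error of y(t) = a*y(t-1) + b gives the closed form a = sum((y_t - b)*y_{t-1}) / sum(y_{t-1}^2). A minimal sketch, with the fixed b below a pure placeholder:)

import numpy as np

def fit_a_with_fixed_b(y, b):
    y = np.asarray(y, dtype=float)
    y_prev, y_curr = y[:-1], y[1:]
    return np.sum((y_curr - b) * y_prev) / np.sum(y_prev ** 2)

b = 0.0  # hypothetical frozen intercept, to be chosen by hand
for k in range(7, len(ts) + 1):
    print(k, fit_a_with_fixed_b(ts[:k], b))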
My question is: any ideas about what I can and should do?
Btw, in Appendix 2 I have tried to study the coefficients' convergence for the function
f[i] = 1.01*f[i-1] + 0.01 + random noise
and I see some problems with convergence even in this scenario.
Appendix 1: Demographic data (actual figures)
year
2006-01-01 87287
2007-01-01 86649
2008-01-01 86036
2009-01-01 85394
2010-01-01 84845
2011-01-01 84542
2012-01-01 84034
2013-01-01 83881
2014-01-01 83414
2015-01-01 83035
2016-01-01 82656
2017-01-01 82280
2018-01-01 81654
2019-01-01 81745
2020-01-01 81614
2021-01-01 81367
Name: num_of_citizens, dtype: int64
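(For reproducibility, the series above can be rebuilt as, e.g.:)

import pandas as pd

ts = pd.Series(
    [87287, 86649, 86036, 85394, 84845, 84542, 84034, 83881,
     83414, 83035, 82656, 82280, 81654, 81745, 81614, 81367],
    index=pd.date_range('2006-01-01', periods=16, freq='YS'),
    name='num_of_citizens',
)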
Appendix 2: convergence in a toy model task
import pandas as pd
import numpy as np
from statsmodels.tsa.api import ARDL
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# generate data: f[i] = 1.01*f[i-1] + 0.01
f = [1, 1]
for i in range(2, 2000):
    f.append(1.01 * f[i-1] + 0.01)
print(len(f))
df = pd.DataFrame({'fib_num': f})
df.head(10)
#df.plot(subplots=True, layout=(1,1), legend=True, figsize=(7,7))

# add gaussian noise scaled to 0.01% of the data range
std = (max(f) - min(f)) * 0.0001
f_noise = [x + np.random.normal(loc=0, scale=std) for x in f]
print(f'Max = {max(f_noise)}, Min = {min(f_noise)}')
df_noise = pd.DataFrame({'fib_num_noise': f_noise})
#df_noise.plot(subplots=True, layout=(1,1), legend=True, figsize=(5,5))
df = df_noise.rename(columns={'fib_num_noise': 'fib_num'})

# fit ARDL(1) on expanding windows, store coefficients and quality metrics
fib_par = {}
r2s = []
mae = []
rmse = []
for k in range(15, df.shape[0]):
    partial_set = np.asarray(df['fib_num'][0:k])
    m = ARDL(partial_set, lags=1)
    mfitted = m.fit()
    partial_set_pred = mfitted.predict(start=0, end=k-1)[2:]  # drop warm-up values
    r2s.append(r2_score(partial_set[2:], partial_set_pred))
    mae.append(mean_absolute_error(partial_set[2:], partial_set_pred))
    rmse.append(np.sqrt(mean_squared_error(partial_set[2:], partial_set_pred)))
    fib_par[k] = mfitted.params

# print one of the last coefficient vectors in the dict:
print(fib_par[df.shape[0] - 20])

# this is the plot for 'a' (Y = a*Y + b); change to != 1 to see the plot for 'b'
for v in range(len(fib_par[15])):
    if v != 0:
        pd.Series([x[v] for x in fib_par.values()]).rename(v).plot(
            legend=True, figsize=(25, 7), title='Model coeffs')

edf = pd.DataFrame({'r2score': r2s, 'mae': mae, 'rmse': rmse}).iloc[:200]
edf.plot(legend=True, figsize=(15, 7), subplots=True, layout=(3, 1), title='Model quality params')
Downtown_Finance_661 t1_janm2nt wrote
Reply to comment by M_Alani in [D] Are Genetic Algorithms Dead? by TobusFire
Fun story! How did you choose the hyper-parameters for the models? Did you just iterate over them in for-loops?