Submitted by whatweshouldcallyou t3_ztjxbn in dataisbeautiful
whatweshouldcallyou OP t1_j1dzhe6 wrote
Reply to comment by porsche_radish in [OC] Yeah Science! Scientific Output vs. National Wealth by whatweshouldcallyou
Used GAM since the relationship is nonlinear.
porsche_radish t1_j1e7lqs wrote
This looks like too many degrees of freedom.
How much better does this perform over linear?
whatweshouldcallyou OP t1_j1e9c0l wrote
Linear model without region effects: R² ~ 50%
Linear model with region effects: ~ 63%
With random slope: ~ 70%
Polynomial model with random intercept: ~ 71%
Polynomial model with random intercept & random slope: ~ 74%
So...maybe worth it.
porsche_radish t1_j1eksq5 wrote
I meant vanilla linear regression. GAMs are beyond overkill here.
These lines imply really bizarre relationships (the blue line dipping near France, the red line flattening out right below 5) that aren’t in these data.
whatweshouldcallyou OP t1_j1elaii wrote
Yeah, I do agree that there is overfitting, both for Europe around the point you specify and for the Americas. Oceania and Asia fit quite nicely.
Javerlin t1_j1h77pd wrote
They fit quite nicely to your eye, sure, but look at how few data points you have at higher GDP in those continents. This is just bad practice.
Javerlin t1_j1h72f1 wrote
Just because your R-squared is better does not make your prediction better. You should check with cross-validation.
Adding more freedom to your trend lines makes them fit your current data better, sure. But by that logic, just “joining the dots” fits best of all.
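To make that concrete, here is a rough sketch of the check I mean. It uses made-up data (a straight line plus noise), not the data from the plot, and the helper names (`fit_line`, `fit_dots`, `loo_cv_mse`) are just mine for illustration. It compares a plain linear fit against literally joining the dots, first on the training data and then with leave-one-out cross-validation.

```python
import bisect
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def fit_dots(xs, ys):
    """'Join the dots': piecewise-linear interpolation through every point.
    Assumes xs is sorted with distinct values; outside the range it
    returns the nearest endpoint's y."""
    def predict(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        j = bisect.bisect_left(xs, x)          # index of right-hand neighbour
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return (1 - t) * ys[j - 1] + t * ys[j]
    return predict

def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loo_cv_mse(fit, xs, ys):
    """Leave-one-out CV: refit without point i, then predict point i."""
    total = 0.0
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (model(xs[i]) - ys[i]) ** 2
    return total / len(xs)

# Synthetic data (an assumption for illustration): truly linear plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(400)]
ys = [1.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

in_line = mse(fit_line(xs, ys), xs, ys)
in_dots = mse(fit_dots(xs, ys), xs, ys)
cv_line = loo_cv_mse(fit_line, xs, ys)
cv_dots = loo_cv_mse(fit_dots, xs, ys)

print(f"in-sample MSE  line: {in_line:.3f}  dots: {in_dots:.3f}")
print(f"LOO-CV MSE     line: {cv_line:.3f}  dots: {cv_dots:.3f}")
```

On the training data, joining the dots has zero error by construction, so it "wins" on any in-sample measure like R-squared. Under cross-validation it should come out worse than the straight line, because it is chasing the noise in the neighbouring points. The same kind of comparison applies to a GAM versus a linear fit, though how it turns out on the real data is something you would have to run.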