Submitted by justundertheblack t3_114elpg in MachineLearning
[removed]
Thanks for this, man.
Btw, this is a school project, so we have to train our own model, and we already have the dataset for it. Do you know any good ones?
Ah I see. In that case something like this should give you a good direction.
This seems good, I'll look into it.
I for sure will 👍
Do you know how to validate a pricing signal, run a backtest, and optimize a portfolio? The NLP/ML part might be the easy one.
Naah, I don't. Can you point me towards some resources?
https://www.investopedia.com/terms/b/backtesting.asp
https://en.m.wikipedia.org/wiki/Modern_portfolio_theory
In a nutshell:
Markets are not stationary environments, so you have to expect and mitigate drift. This has implications for the evaluation methodology, and it favors time series models that can be calibrated with fewer data points.
A strategy to make money in the markets allocates capital across multiple financial instruments using multiple signals. The value of a signal is therefore the predictive advantage it provides when stacked on top of other commonly used signals. If the predictive capability of news sentiment is easily replicated by a linear combination of cheaply available signals, then it's not worth much.
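To make the two points above concrete, here is a minimal sketch of one way to check whether a sentiment signal adds anything on top of cheaper signals. It refits a plain linear model on a rolling window (a crude way to cope with drift) and compares out-of-sample error with and without the candidate signal. Everything in it is an assumption for illustration: the data is synthetic, the function names are made up, and the "baseline" columns are only stand-ins for whatever commonly used signals you actually have.

```python
import numpy as np


def walk_forward_splits(n_samples, train_size, test_size):
    """Yield (train_idx, test_idx) pairs; the test block always follows the train block."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size  # roll the window forward in time


def out_of_sample_mse(X, y, train_size=250, test_size=50):
    """Refit a linear model on each rolling window and score it on the next block."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    errors = []
    for tr, te in walk_forward_splits(len(y), train_size, test_size):
        coef, *_ = np.linalg.lstsq(X1[tr], y[tr], rcond=None)
        errors.append((y[te] - X1[te] @ coef) ** 2)
    return float(np.mean(np.concatenate(errors)))


def incremental_value(baseline, candidate, y, **kw):
    """Relative drop in out-of-sample MSE from stacking the candidate on the baseline."""
    mse_base = out_of_sample_mse(baseline, y, **kw)
    mse_full = out_of_sample_mse(np.column_stack([baseline, candidate]), y, **kw)
    return 1.0 - mse_full / mse_base


# Synthetic data, for illustration only (not real market data).
rng = np.random.default_rng(0)
n = 1000
baseline = rng.normal(size=(n, 2))  # stand-ins for two cheap, commonly used signals
hidden = rng.normal(size=n)         # information the baseline does not contain
returns = (0.5 * baseline[:, 0] - 0.3 * baseline[:, 1]
           + 0.4 * hidden + rng.normal(scale=0.5, size=n))

# Case 1: "sentiment" that is just a linear combination of the cheap signals.
redundant_sentiment = 0.7 * baseline[:, 0] + 0.2 * baseline[:, 1]
# Case 2: "sentiment" that is a noisy view of the hidden information.
useful_sentiment = hidden + rng.normal(scale=0.5, size=n)

gain_redundant = incremental_value(baseline, redundant_sentiment, returns)
gain_useful = incremental_value(baseline, useful_sentiment, returns)
print("gain from redundant signal:", gain_redundant)
print("gain from useful signal:", gain_useful)
```

The redundant signal should show essentially no gain, while the one carrying new information should show a clearly positive one. On real data you would swap the synthetic arrays for your own returns and signals, and the linear model is only the simplest possible choice here.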
Trueee, I've heard that such models need to be tuned regularly. I'll definitely look into it.
It sounds like a school assignment
Naah, it's a college project 😂
Maybe I didn't describe it well enough.
The best way to find a topic and related code is to search for it on Google Scholar.
E.g. https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=nlp+stock+market+trading+code+github&btnG=
https://github.com/jinanzou/astock
These are the two results I found. Happy learning.
Thanks a lot, man. I was just looking into FinBERT myself too, haha.
Help me out if you know anything, though.
No_Dust_9578 t1_j8vv85i wrote
A few things. Don't build a model from scratch; use a pre-trained one. There are plenty on Hugging Face. Later on, if you have your own data, you can use it to fine-tune those models to better suit your task. This is a general approach for ML applications where data isn't available or there isn't enough of it. Side note, speaking from experience: the large sentiment models out there do perform well, but some of them were trained on large sentiment datasets that contain inconsistencies. For instance, I once had to manually validate performance on my data and noticed that the pre-trained models labeled the following sentence as POSITIVE, even though a human would not read it as positive: "oh yay, I love cold food...". So be careful and set up some sanity checks. Don't assume the predictions are accurate.
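A sanity check like the one described can be as simple as a short list of hand-labelled sentences that you run through the model before trusting it. The sketch below is only an illustration under stated assumptions: `run_sanity_checks` and `keyword_stub` are hypothetical names, the probe sentences other than the quoted one are made up, and the stub is a toy stand-in so the example runs without downloading any model.

```python
# Hand-labelled probes: (text, label a human would assign).
# The first is the sarcastic sentence quoted above; the others are invented examples.
PROBES = [
    ("oh yay, I love cold food...", "NEGATIVE"),
    ("The company beat earnings expectations.", "POSITIVE"),
    ("Shares plunged after the profit warning.", "NEGATIVE"),
]


def run_sanity_checks(predict, probes):
    """Return (text, expected, got) for every probe where the model disagrees with the human label."""
    failures = []
    for text, expected in probes:
        got = predict(text)
        if got != expected:
            failures.append((text, expected, got))
    return failures


def keyword_stub(text):
    """Toy stand-in for a pre-trained model: POSITIVE if any upbeat keyword appears."""
    upbeat = ("love", "beat", "yay")
    return "POSITIVE" if any(word in text.lower() for word in upbeat) else "NEGATIVE"


failures = run_sanity_checks(keyword_stub, PROBES)
for text, expected, got in failures:
    print(f"MISMATCH: {text!r} expected {expected}, got {got}")
```

To check a real model, pass a function that wraps it instead of `keyword_stub`. With the Hugging Face `transformers` library, for example, `pipeline("sentiment-analysis")` returns a list of dicts with `label` and `score` keys, so the wrapper would return the `label` of the first result. Label names vary by model, so the expected labels in the probe list may need adjusting.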