pornthrowaway42069l
pornthrowaway42069l t1_jdn6noe wrote
Reply to [N] GPT-4 has 1 trillion parameters by mrx-ai
Not going to deny that GPT-4 looks impressive, but they could set up 10 bajillion-quadrillion parameters and the question remains: do they have the data to effectively utilize all of them? Maybe it's time to start looking into decreasing the number of parameters and making more efficient use of the data.
pornthrowaway42069l t1_iy8srkr wrote
Reply to comment by Sadness24_7 in Keras metrics and losses by Sadness24_7
Ah, I see. During training, the loss and metrics you see are actually running averages over the epoch, not the exact losses/metrics at that point. I can't find the documentation right now, but I know I've seen it there. What this means is that the losses/metrics printed during training aren't a good gauge to compare against, since they fold in earlier batches that were computed with older weights.
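If you want the exact per-epoch numbers, a callback along these lines should do it (untested sketch, assuming an in-memory x_train/y_train and a tf.keras model):

```python
import tensorflow as tf

class ExactTrainLoss(tf.keras.callbacks.Callback):
    """Re-evaluates the training set at epoch end, so the reported loss
    reflects the final weights instead of a running average."""

    def __init__(self, x_train, y_train):
        super().__init__()
        self.x_train, self.y_train = x_train, y_train

    def on_epoch_end(self, epoch, logs=None):
        exact = self.model.evaluate(self.x_train, self.y_train, verbose=0)
        print(f"epoch {epoch}: exact train loss = {exact}")
```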
pornthrowaway42069l t1_ixq1g4e wrote
Reply to comment by Sadness24_7 in Keras metrics and losses by Sadness24_7
Write a custom function and use it as a metric? Not sure what you mean by "what the training method says", but I think default metrics get summed just like the losses.
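For example, something like this (rough sketch - the metric itself is just a made-up illustration):

```python
import tensorflow as tf

def mean_abs_error_pct(y_true, y_pred):
    # Hypothetical custom metric: absolute error as a fraction of the
    # true value, with a small epsilon to dodge division by zero.
    return tf.reduce_mean(tf.abs(y_true - y_pred) / (tf.abs(y_true) + 1e-7))

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse", metrics=[mean_abs_error_pct])
```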
pornthrowaway42069l t1_ixm7q3s wrote
Reply to Keras metrics and losses by Sadness24_7
You can specify several losses, or have a multi-output model with a single loss - in both cases Keras will combine them into one value (I think it's unweighted by default, and you can specify the weights, but I don't remember 100%).
You can't really have 3 separate loss values for a single network - backprop needs one scalar to work with. The best you can do is write a custom loss function that mixes them in a way that makes sense for your problem (you still need to return a single value at the end), or provide loss weights (you'd need to look up the API docs for that).
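Rough sketch of the weighted version (the architecture and the 0.7/0.3 weights here are made up for illustration):

```python
import tensorflow as tf

# Two-output model with per-output losses and weights.
# Keras folds these into one scalar: total = 0.7*mse + 0.3*bce.
inputs = tf.keras.Input(shape=(16,))
h = tf.keras.layers.Dense(32, activation="relu")(inputs)
reg_out = tf.keras.layers.Dense(1, name="reg")(h)
clf_out = tf.keras.layers.Dense(1, activation="sigmoid", name="clf")(h)
model = tf.keras.Model(inputs, [reg_out, clf_out])

model.compile(
    optimizer="adam",
    loss={"reg": "mse", "clf": "binary_crossentropy"},
    loss_weights={"reg": 0.7, "clf": 0.3},
)
```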
pornthrowaway42069l t1_iwmr91a wrote
Reply to comment by Constant-Cranberry29 in How to normalize data which contain positive and negative numbers into 0 and 1 by Constant-Cranberry29
Emmm... a 2-parameter Box-Cox transform? If that doesn't work either, maybe the problem is with something else - between neglog and 2-parameter Box-Cox you should get decent normalization, I feel.
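Roughly like this (sketch - the shift is arbitrary, just enough to make everything strictly positive):

```python
import numpy as np
from scipy import stats

# 2-parameter Box-Cox: shift the data so it's strictly positive,
# then apply the ordinary 1-parameter transform.
x = np.array([-3.2, -0.5, 0.0, 1.7, 8.4])
shift = 1e-6 - x.min()              # second parameter
y, lmbda = stats.boxcox(x + shift)  # lambda fitted by maximum likelihood
```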
pornthrowaway42069l t1_iwlbxq2 wrote
Reply to How to normalize data which contain positive and negative numbers into 0 and 1 by Constant-Cranberry29
You can try neglog:
x > 0: log(x)
x < 0: -log(-x)
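In numpy, something like this (sketch - I put a +1 inside the log so it's defined at zero and doesn't blow up near it, which is a common variant of this transform):

```python
import numpy as np

def neglog(x):
    # Signed log: symmetric around 0, finite everywhere, keeps the sign.
    return np.sign(x) * np.log1p(np.abs(x))
```

If you strictly need [0, 1], you can min-max scale the transformed values afterwards.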
pornthrowaway42069l t1_itsbufj wrote
Reply to Binary segmentation with imbalanced data by jantonio78
I'd try some baseline/simpler models on the same data and see how they perform. Maybe the model just can't do any better - that's always a good thing to check before panicking.
You can also try K-means or DBSCAN or something like that and see if you can get 2 clusters out of them - i.e., whether those algorithms can segment your data better than your network. If so, maybe the network is set up incorrectly somehow; if not, maybe something funky is happening to your data in the pipeline.
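Quick sketch of that clustering sanity check (the random array is a stand-in for your real per-pixel features):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for your real per-pixel features, shape (n_pixels, n_features).
X = np.random.rand(1000, 3)

# Two clusters as a crude "segmentation"; compare against your network's masks.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```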
pornthrowaway42069l t1_itrxwks wrote
Try predicting on a generated dataset, or one of the generic benchmark datasets - this will show whether VGG16 is the culprit or it's a broader pattern. GET MORE DATA FOR THE GODS OF DATA
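For instance, something like this as a sanity check (untested sketch, using CIFAR-10 just as a convenient generic dataset):

```python
import tensorflow as tf

# Sanity-check a pretrained VGG16 on standard data instead of your own.
model = tf.keras.applications.VGG16(weights="imagenet")
(x_train, _), _ = tf.keras.datasets.cifar10.load_data()

x = tf.image.resize(x_train[:8].astype("float32"), (224, 224))
x = tf.keras.applications.vgg16.preprocess_input(x)

preds = model.predict(x)
print(tf.keras.applications.vgg16.decode_predictions(preds, top=1))
```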
pornthrowaway42069l t1_issyo30 wrote
I had a similar experience at some big companies.
Bombed the leetcode, but found an opportunity to showcase my (fairly cool) project code during the technical interview. When I asked the interviewer questions, he confused feature importance with feature selection, couldn't answer a question about a baseline model (they had a black box without one), and fumbled a bunch of other things. When I said "I kind of prepared more for pandas + SQL", he said "We expect you to know those things". So I guess they expect me to know pandas and SQL, just not Python for crappy leetcode questions.
The truth is, most companies/ML departments have no idea what they want or what they should be doing. Good luck to that head of ML - honestly, I'm glad I wasn't selected; with interview and ML skills that "great", it was a bullet dodged.
pornthrowaway42069l t1_isldxay wrote
One of the reasons, besides those mentioned in other comments, is that sometimes the test set is just easier to solve than the train set. Not saying that's your crux, but it might be worth trying a few different splits.
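Something along these lines (sketch - X and y are placeholders for your data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)  # placeholders

# Re-split with a few seeds; if the train/test gap jumps around a lot,
# the "test set happens to be easier" theory gains weight.
for seed in (0, 1, 2):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # fit your model on (X_tr, y_tr) and compare train vs. test scores here
```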
pornthrowaway42069l t1_irh69qh wrote
Reply to comment by DeepThoughtsBubble in [D] Attending EMNLP 2022 in person? by DeepThoughtsBubble
There's a difference between "Presented paper X at event Y" and just "Published a highly reviewed paper".
It a) gives you networking opportunities, b) shows you take your stuff seriously, and c) shows that you have oral presentation skills - meaning you can use words in a not totally idiotic manner to convey your thoughts, which is always a plus in a technical field.
pornthrowaway42069l t1_irgt36a wrote
On one hand, a big presentation like this will look wonderful on a resume.
On the other hand, you will spend money and potentially do worse on a final.
IMO, if money is no concern, give the talk - I personally feel it would be worth more than a course later in life. If money is tight, or you're not sure, then it's alright to stay home - you already have a flex (a highly rated paper), so it's not the end of the world if you skip it.
pornthrowaway42069l t1_jdnmf0j wrote
Reply to comment by currentscurrents in [N] GPT-4 has 1 trillion parameters by mrx-ai
I'm confused - how is that different from what I said? Maybe I worded my response poorly, but I meant that we should focus on smaller models rather than those gigantic ones.