Submitted by AutoModerator t3_10cn8pw in MachineLearning

Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead!

Thread will stay alive until next one so keep posting after the date in the title.

Thanks to everyone for answering questions in the previous thread!

23

Comments

T1fa_nug t1_j4idpif wrote

Hello guys, I'm new to machine learning and I wanted to know: are an i5 8th gen and a 1060 6 GB paired with 16 GB of RAM enough for any type of work that could come my way?

1

akacukiii t1_j4ixh08 wrote

Hi. I'm an international grad student in the US and am looking for an internship for the summer. Please, if you have some tips, or if you care to have a look at my profile, just let me know. Thank you!

3

CaptainD5 t1_j4kc10r wrote

Hello! I have a question. Would it be possible to create a NN that replicates the behaviour of Prophet? I don't want to build one; I just want to understand, from a theoretical point of view, what the most similar way to do it would be (optimizing a function that takes seasonality into account and provides an open-ended 'regression' to predict new values based only on dates). Thanks in advance!

1

RuhRohCarChase t1_j4q2e7k wrote

Hi everyone! This is not a technical question, but does anyone know how to find the accepted papers list for AAAI23? (or a reliable way for any ML/AI conferences)

I work in an academic research unit and finding any accepted papers list is a mess, unless it’s readily available from a conference or on open review! I catalogue all our papers by funding sources, individual projects, authors, conferences, and about 10 other data points. Any advice is greatly appreciated! Have an awesome day everyone!

1

all_is_love6667 t1_j4q5kiv wrote

Can ChatGPT understand science? I heard it was given science papers, but can it help scientists in their work? Can it give scientific hints?

1

ChangingHats t1_j4r2hxx wrote

I am trying to utilize tensorflow's MultiHeadAttention to do regression on time series data for forecasting of a `(batch, horizon, features)` tensor.

During training, I have `inputs ~> (1, 10, 1)` and `targets ~> (1, 10, 1)`. `targets` is a horizon-shifted version of `inputs`.

During inference, `targets` is just a zeros tensor of the same shape.

What's the best way to run attention such that the output utilizes all timesteps in `inputs` as well as each subsequent timestep of the resulting attention output, instead of ONLY the timesteps of the inputs?

Another problem I see is that attention is run between Q and K, and during inference, Q = K, so that will affect the output differently, no?
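
Not the poster's code, but a minimal sketch (assuming TF >= 2.10 for `use_causal_mask`) of one way to wire this up with the shapes described above: cross-attention from the (zeroed) targets onto the inputs, then causal self-attention over the result so each step also sees earlier attention outputs:

    import tensorflow as tf

    batch, horizon, features = 1, 10, 1
    inputs = tf.random.normal((batch, horizon, features))   # past observations
    targets = tf.zeros((batch, horizon, features))           # placeholder at inference time

    # cross-attention: query = targets (what we want to fill in), key/value = inputs
    cross_attn = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
    context = cross_attn(query=targets, value=inputs, use_causal_mask=True)

    # causal self-attention so step t also attends to the outputs at steps <= t
    self_attn = tf.keras.layers.MultiHeadAttention(num_heads=2, key_dim=8)
    refined = self_attn(query=context, value=context, use_causal_mask=True)

    print(refined.shape)  # (1, 10, 1)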

1

trnka t1_j4r661s wrote

Think about it more like autocomplete. It's able to complete thoughts coherently enough to fool some people, when provided enough input to complete from. It's often incorrect with very technical facts though.

It's really about how you make use of it. In scientific work, you could present your idea and ask for pros and cons of the idea, or to write a story about how the idea might fail horribly. That can be useful at times. Or to explain basic ideas from other fields.

It's kinda like posing a question to Reddit except that ChatGPT generally isn't mean.

There are other approaches like Elicit or Consensus that use LLMs more for literature review which is probably more helpful.

1

inquisitor49 t1_j4tgazw wrote

In transformers, a positional embedding is added to a word embedding. Why does this not mess up the word embedding, such as changing the embedding to another word?

1

Iljaaaa t1_j4uub0z wrote

I have an autoencoder input of 100x21. The 21 columns are PC scores, the 100 rows are observations. The importance of the columns degrades as the column number increases. The first column is the most important for the data variance, the last column is the least important. To be able to reconstruct the data back from PCA the first columns need to be as correct as possible.

I have tried searching whether I can adjust weights or something else of the autoencoder layers to include this importance of the columns, but I have not found it.

In other words, I want errors in the first (e.g. 5) columns to be punished more harshly than errors in the last (e.g. 5) columns.

I would be grateful if someone could point me in the right direction!

2

LetGoAndBeReal t1_j4vz8hv wrote

Companies can fine-tune top-performing LLMs to condition the LLM's output, but not to embody the knowledge contained in proprietary data. The current best approach for incorporating this custom knowledge is data-augmented generation techniques and technologies such as what LangChain offers.

I am trying to decide whether to invest time building expertise in these techniques and technologies. I may not wish to do so if the ability to properly add custom knowledge to the LLMs themselves will arrive in short order.

I would like to know from those steeped in LLM R&D how soon such capabilities might be expected. Is this the right place to ask?

1

mildresponse t1_j4xhvkg wrote

Are there any easy and straightforward methods for moving ML models across different frameworks? Does it come down to just manually translating the parameters?

For instance, I am looking at a transformer model in PyTorch, whose parameters are stored within a series of nested objects of various types in an OrderedDict. I would like to extract all of these parameter tensors for use in a similar architecture constructed in Tensorflow or JAX. The naive method of manually collecting the parameters themselves into a new dict seems tedious. And if the target is something like Haiku in JAX, the corresponding model will initialize its parameters into a new nested dict with some default naming structure, which will then have to be connected to the interim dict created from PyTorch. Are there any better ways of moving the parameters or models around?

1

mildresponse t1_j4xjmvw wrote

My interpretation is that the words should have different embedding values when they have different positions (context) in the input. Without a positional embedding, the learned word embeddings will be forced into some kind of positional average. The positional offsets give the model more flexibility to resolve differently in different contexts.

Because the embeddings are high dimensional vectors of floats, I'd guess the risk of degeneracy (i.e. that the embeddings could start to overlap with one another) is virtually 0.
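
A tiny PyTorch sketch of that addition (the sizes are made up): the same token id produces a different combined vector at each position, and with high-dimensional float vectors two different words essentially never end up identical.

    import torch

    vocab_size, max_len, d_model = 100, 16, 8
    tok_emb = torch.nn.Embedding(vocab_size, d_model)
    pos_emb = torch.nn.Embedding(max_len, d_model)

    token_ids = torch.tensor([[5, 7, 5]])                     # same word at positions 0 and 2
    positions = torch.arange(token_ids.shape[1]).unsqueeze(0)

    x = tok_emb(token_ids) + pos_emb(positions)               # shape (1, 3, 8)
    print(torch.equal(x[0, 0], x[0, 2]))                      # False: same word, different vectors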

1

Agitated-Purpose-171 t1_j4z7iz5 wrote

Hi everybody, I have a question about VLAD from reading this paper (Aggregating local descriptors into a compact image representation), published at CVPR.

My question is why VLAD works.

Aggregating local descriptors into a compact image representation paper links:

https://lear.inrialpes.fr/pubs/2010/JDSP10/jegou_compactimagerepresentation.pdf

In this paper, there is a method called VLAD that can turn local features (N*D dimension) into a global feature (k*D dimension).

Below is my understanding of the operations of VLAD, step by step.

=> input: N*D dimension local feature.

(i) use k-means to find the k clusters and the central feature for each cluster.

(ii) for each cluster find a residual sum.

V = summation of ( each local feature in the cluster minus the central feature).

V = sum (Xi - C)

V: residual sum of the cluster

X: local feature in the cluster

C: Central feature of the cluster

(iii) concatenate the residual sum then get the global feature.

global feature = [V1,V2,....Vk]

(V1 is the residual sum of cluster 1, V2 is the residual sum of cluster 2... and so on.)

=> output: k*D dimension global feature.
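
For concreteness, a rough NumPy sketch of those steps (toy data and my own variable names, not the paper's code):

    import numpy as np
    from sklearn.cluster import KMeans

    D, k = 16, 8
    train_pool = np.random.randn(5000, D)        # descriptors from many training images
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(train_pool)

    X = np.random.randn(300, D)                  # N local descriptors of one *new* image
    assign = km.predict(X)                       # nearest (fixed) centroid per descriptor

    V = np.zeros((k, D))
    for i in range(k):
        members = X[assign == i]
        if len(members):
            V[i] = (members - km.cluster_centers_[i]).sum(axis=0)

    global_feature = V.reshape(-1)               # k*D-dimensional image descriptor
    # The centroids were learned on a separate training pool, not on this image's
    # own descriptors, so each cluster's residual sum is generally non-zero.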

My question is why the residual sum of each cluster is "not" zero.

Since the central feature of each cluster found by k-means is the average of the local features in that cluster.

The central feature of cluster 1 = average of the local feature in cluster 1.

C1 = (X1 + X2 + X3 + ...+ Xm) / m

The residual sum of cluster 1 = (X1-C1) + (X2-C1) + (X3-C1) + ... + (Xm-C1) = V1

Based on the above equation, I think the residual sum of each cluster is zero. So the global feature will be a zero matrix = [V1, V2,..., Vk] = [zero vector, zero vector, ..., zero vector].

The only reason that comes to my mind is that the k-means iterations might not have fully converged, so the central feature of each cluster is not exactly equal to the average of the local features in the cluster. Am I right?

Could anybody let me know why the residual sum is not a zero vector? Thanks a lot.

1

TastyOs t1_j5129q7 wrote

I assume you're doing something like minimizing MSE between the inputs and reconstructions. Instead of calculating MSE over all 21 columns, split it into two parts: an MSE for the important columns and an MSE for the unimportant columns, then weight the important MSE higher than the unimportant one.

So something like

loss = 0.9 * MSE_important + 0.1 * MSE_unimportant
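
A minimal sketch of that loss, assuming PyTorch and that the first 5 columns are the "important" ones (the indices and the 0.9/0.1 weights are just placeholders to tune):

    import torch

    def weighted_mse(recon, target, n_important=5, w_imp=0.9, w_rest=0.1):
        mse_imp = torch.mean((recon[:, :n_important] - target[:, :n_important]) ** 2)
        mse_rest = torch.mean((recon[:, n_important:] - target[:, n_important:]) ** 2)
        return w_imp * mse_imp + w_rest * mse_rest

    # usage inside the training loop:
    # loss = weighted_mse(autoencoder(x), x)

    # or weight every column individually, e.g. decaying with the PC index:
    # col_weights = torch.linspace(1.0, 0.1, 21)
    # loss = torch.mean(col_weights * (recon - target) ** 2)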

2

retarded_user t1_j518o13 wrote

Should the learning rate be changed to a smaller value (such as 1e-4) when working with scaled data (range [0,1] or [-1,1])?

I'm using Adam with Keras/TensorFlow.

1

lukaszluk t1_j51m1sf wrote

Hello!
Does anyone know of a dataset with 2-D floor plan images with labeled furniture?
Couldn't find anything interesting (bad quality or very few examples).
Some of the places I tried:
  • SESYD - OK quality dataset (but few examples)
  • HouseExpo - JSON datasets - the quality is good, but no labeled furniture
  • FloorPlanCAD dataset - the quality of the data is low
  • Furnishing dataset - does not contain whole rooms, only furniture
  • SFPI dataset ("Towards Robust Object Detection in Floor Plan Images: A Data Augmentation Approach") - 10k images (this could be a good dataset if the quality is good; still downloading, though)
Any other datasets I should check out?

1

Seankala t1_j549ygz wrote

Are there any Slack channels or Discord Servers for ML practitioners to talk about stuff?

2

stardust-sandwich t1_j54em1w wrote

I want to pull data from an API (done) and use NLP to categorize that information. Then, with those results, push it into a webpage or GUI tool where it will highlight the text and ask "is this correct?", so I can use the GUI to "teach" the model how to classify text.

e.g.

Category 1 - words 1, words 2, words 3 and similar

Category 2 - word 4, words 5, words 6 and so on

Then it will go and try that, come back and ask me to tune it again, rinse and repeat. Once the model is trained, I then want to use it later in a different script: point a news article at it, for example, and it will spit out the data I need.

How can I achieve this, please? What are the best tools and services to get this done? Ideally open source if possible; if not, I'm happy to use a commercial service if it's cheap, as this is just a personal project of mine.

Thanks in advance.

1

unsteadytrauma t1_j54nqh3 wrote

Is it possible to run a model like GPT-2 or GPT-J on my own computer and use it to rewrite/rephrase and summarize text? Or would that require too many resources for a personal computer? I'm a noob.

1

jfacowns t1_j550f70 wrote

XGBoost Question around One-Hot Encoding & Get_Dummies in Python

I am working on building a model for NHL (hockey) games and have a spreadsheet with a ton of advanced stats from teams, dates they played and so on.

All of my data in this spreadsheet is categorized as a float. I am trying to add in a few columns of categorical data, as I feel it could help the model.

The categorical columns have data that determines if the home team or the away team is playing on back to back days.

What I am trying to determine here is whether one-hot encoding is the best approach for this, or if I'm misunderstanding how it works as a whole.

Here is some code

NHLData = pd.read_excel('C:\\Temp\\NHL_ModelBuilder.xlsx')

# drop the label/identifier columns before building the feature set
NHLData.drop(['HomeTeam', 'AwayTeam', 'Result'], axis=1, inplace=True)

# one-hot encode the back-to-back indicator columns
NHLData = pd.get_dummies(NHLData, columns=['B2B_Home', 'B2B_Away'])

Does this make sense? Am i on the right track here?

If I do NHLData.head() I can see the one-hot encoded columns, but when I check NHLData.dtypes I see this:

B2B_Home_0              uint8
B2B_Home_1              uint8
B2B_Away_0              uint8
B2B_Away_1              uint8

Should these not be objects?

1

arararagi_vamp t1_j557ewd wrote

I have built a simple CNN which is able to detect circles on a white background with noise using PyTorch.

Now I wish to extend my network to return the centers of the circles as coordinates. The problem is that each example contains a variable number of circles, meaning I would need a variable number of labels per example. In a CNN, however, the number of outputs remains constant.

How do I work around this problem?

1

icedrift t1_j571qce wrote

I'm pretty sure GPT-J 6B requires a minimum of 24 GB of VRAM, so you would need something like a 3090 to run it locally. That said, I think you're better off hosting it on something like Colab or Paperspace.

1

Capable_Difference39 t1_j592g8e wrote

Hi all, can anyone please let me know what certifications or courses I can take to move into the AI/ML field? I am already working as a software engineer and have working knowledge of C#.

1

morecoffeemore t1_j595omy wrote

Dumb question, but how do I know ChatGPT is not just copy/pasting from the web?

Tried ChatGPT for the first time. Seems cool. Dumb question, but how do I know it's not just copy/pasting something a person wrote on the web?

I asked it for a recommendation for speakers and it gave a good reply. It seems to me it could've just done a web search and then copied what someone wrote on the web as a reply.

Is there a way to test/use ChatGPT to prove to myself that it's not just copying and pasting from the web?

1

UnderstandingDry1256 t1_j5c0y0o wrote

What are the training strategies used for GPT models? Are transformer blocks or layers trained independently? Are they trained using some subset of data and then fine-tuned?

I would appreciate any references or details :)

2

FlyingTwentyFour t1_j5dthif wrote

What course would be a good way to start learning NLP? I'm a beginner in ML but want to learn about NLP.

2

stanteal t1_j5f0jaw wrote

As you have said, you would need a variable number of outputs, which is not feasible with a plain CNN head. However, you could divide the image into a grid and, for each grid cell, predict the probability that a circle center falls within it, along with its x and y offsets. Not sure if there are better resources available, but it might be worth looking at how YOLO or YOLOv2 implemented their outputs.
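
A rough PyTorch sketch of that kind of output head (the grid comes from the feature map's spatial size; channel counts are arbitrary placeholders):

    import torch
    import torch.nn as nn

    class CircleHead(nn.Module):
        """For each cell of the feature-map grid, predict (objectness, dx, dy)."""
        def __init__(self, in_channels=64):
            super().__init__()
            self.pred = nn.Conv2d(in_channels, 3, kernel_size=1)

        def forward(self, feats):                    # feats: (B, C, S, S)
            out = self.pred(feats)                   # (B, 3, S, S)
            p = torch.sigmoid(out[:, 0])             # probability a center lies in the cell
            offsets = torch.sigmoid(out[:, 1:])      # (dx, dy) of the center within the cell
            return p, offsets

    # training: e.g. binary cross-entropy on p over all cells, plus an L1/L2 loss on
    # the offsets only for cells that actually contain a circle center.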

1

serverrack3349b t1_j5fc250 wrote

In a sense it is just copying and pasting from the web, just in a different order, but I get that that is not your question. Something I would try is using online plagiarism-checking sites to see if there is an exact copy of your text online. If there is, then you should be able to either attribute it to the right person or rewrite it a bit so it is not plagiarism.

1

evys_garden t1_j5fpwmw wrote

I'm currently reading Interpretable Machine Learning by Christoph Molnar and am confused with section 3.4: Evaluation of Interpretability.

I don't quite get human-level evaluation (simple task). The example is: show a user different explanations and the user chooses the best one, and I don't know what that means. Can someone enlighten me?

1

doIneedtohaveone1 t1_j5fqzkf wrote

Does anyone know how to solve the PDE for it in Python? Any kind of reference material would be appreciated!

It's been long since I came across any PDEs and have forgotten everything related to it.

1

kannkeinMathe t1_j5gxi7i wrote

Hey,
I want to build a chatbot for a domain-specific purpose, for example to talk with a person about their mental state and depression. For that I would like to train the bot on texts from the domain.
So my question is: how should I start?
What approach would you use? Would you use an intent-based solution?
What are the standard models for chatbots - BERT?
Is it even possible to fine-tune models with large text corpora? If yes, how?
Thank you, guys

1

Z1ndabad t1_j5hbncl wrote

Hey guys, new to ML and can't seem to wrap my head around the concept. I want to make a used car price prediction model using a large dataset, and most of the tutorials I watch just use the linear regression library. However, can you use neural networks instead, e.g. trained with Levenberg-Marquardt?

1

Lamos21 t1_j5j74g9 wrote

Hi. I'm looking to create a custom dataset for pose estimation. Are there any free annotation tools suitable to annotate objects (meaning not human) so that I can create a custom dataset? Thanks

1

Numerous-Carrot3910 t1_j5jhhkg wrote

Hi, I’m trying to build a model with a large number of categorical predictor variables that each have a large number of internal categories. Implementing OHE leads to a higher dimensional dataset than I want to work with. Does anyone have advice for dealing with this other than using subject matter expertise or iteration to perform feature selection? Thanks!

1

iLIVECSUI_741 t1_j5jmlzh wrote

Hi, I wonder how to decide *when* it is OK to submit your work to top conferences. For example, I have a model related to biological data mining; I know KDD is coming soon, but I do not like that conference and I would like to wait for NeurIPS. However, I am not sure if I will be scooped during this long period. Thanks for your help!

1

billbobby21 t1_j5jnvmh wrote

If you spend money training a model using OpenAI's API, for example, do you actually own the model? As in, let's say you train it so that it gets really good at writing short stories about animals. Would you then actually own that model and have the rights to use and/or license it to others? Or would OpenAI also be able to improve their own models using the model that you created?

Basically I'm wondering what stops the company you are using to create a model from just stealing your creation.

2

trnka t1_j5k34hg wrote

I can't comment on OpenAI specifically, but in general the API's terms of service spell out what they can and can't do with the model and/or data fed through it.

1

trnka t1_j5k4ldr wrote

It depends on the data and the problems you're having with high-dimensional data.

  • If the variables are phrases like "acute sinusitis, site not specified" you could use a one hot encoding of ngrams that appear in them.
  • If you have many rare values, you can just retain the top K values per feature.
  • If those don't work, the hashing trick is another great thing to try (see the sketch below). It's just not easily interpretable.
  • If there's any internal structure to the categories, like if they're hierarchical in some way, you can cut them off at a higher level in the hierarchy
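
A quick scikit-learn sketch of the top-K and hashing ideas (the column name and sizes are made up; `max_categories` needs scikit-learn >= 1.1):

    from sklearn.preprocessing import OneHotEncoder
    from sklearn.feature_extraction import FeatureHasher

    rows = [{"diagnosis": "acute sinusitis, site not specified"},
            {"diagnosis": "essential hypertension"},
            {"diagnosis": "acute sinusitis, site not specified"}]

    # top-K values per feature: rare categories get lumped into one "infrequent" column
    ohe = OneHotEncoder(max_categories=10, handle_unknown="infrequent_if_exist")
    X_top_k = ohe.fit_transform([[r["diagnosis"]] for r in rows])

    # hashing trick: fixed output width no matter how many distinct categories exist
    hasher = FeatureHasher(n_features=256, input_type="dict")
    X_hashed = hasher.fit_transform(rows)
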
2

trnka t1_j5k5ndr wrote

Yeah you can use a neural network instead of linear regression if you'd like. I usually start with linear regression though, especially regularized, because it usually generalizes well and I don't need to worry about overfitting so much.

Once you're confident that you have a working linear regression model then it can be good to develop the neural network and use the linear regression model as something to compare to. I'd also suggest a "dumb" model like predicting the average car price as another point of comparison, just to be sure the model is actually learning something.
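
A small scikit-learn sketch of that progression (the car features and prices below are synthetic, just to show the comparison):

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 5))                  # e.g. age, mileage, engine size, ...
    y = 20000 - 1500 * X[:, 0] - 800 * X[:, 1] + rng.normal(scale=500, size=2000)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, model in [("mean baseline", DummyRegressor(strategy="mean")),
                        ("ridge regression", Ridge(alpha=1.0)),
                        ("small MLP", MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0))]:
        model.fit(X_tr, y_tr)
        print(name, mean_absolute_error(y_te, model.predict(X_te)))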

I'm not familiar with the Levenberg–Marquardt algorithm so I can't comment on that. From the Wikipedia page it sounds like a second-order method, and those can be used if the data set is small but they're uncommon for larger data. Typically with a neural network we'd use an optimizer like plain stochastic gradient descent or a variation like Adam.

1

trnka t1_j5k77wb wrote

The difference from application-level evaluation is a bit vague in that text. I'll use a medical example that I'm more familiar with - predicting the diagnosis from text input.

Application-level evaluation: If the output is a diagnosis code and explanation, I might measure how often doctors accept the recommended diagnosis and read the explanation without checking more information from the patient. And I'd probably want a medical quality evaluation as well, to penalize any biasing influence of the model.

Non-expert evaluation: With the same model, I might compare 2-3 different models and possibly a random baseline model. I'd ask people like myself with some exposure to medicine which explanation is best for a particular case and I could compare against random.

That said, I'm not used to seeing non-experts used as evaluators, though it makes some sense in the early stages when explanations are still poor.

I'm more used to seeing the distinction between real and artificial evaluation. I included that in my example above -- "real" would be when we're asking users to accomplish some task that relies on explanation and we're measuring task success. "Artificial" is more just asking for an opinion about the explanation but the evaluators won't be as critical as they would be in a task-based evaluation.

Hope this helps! I'm not an expert in explainability though I've done some work with it in production in healthcare tech.

1

Numerous-Carrot3910 t1_j5ka168 wrote

Thanks for your response! Even with retaining the top K values of each feature, there are still a large number of features to consider. I haven’t tried the hashing trick, so I will look into that

1

trnka t1_j5kksex wrote

Hmm, you might also try feature selection. I'm not sure what you mean by not iterating, unless you mean recursive feature elimination? There are a lot of really fast correlation functions you can try for feature selection -- scikit-learn has some popular options. They run very quickly, and if you have lots of data you can probably do the feature selection part on a random subset of the training data.

Also, you could do things like dimensionality reduction learned from a subset of the training data, whether PCA or a NN approach.
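
A minimal scikit-learn sketch of both ideas (the shapes and the binary target are made up): univariate feature selection fit on a random subset, and PCA fit the same way.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.decomposition import PCA

    X = np.random.randint(0, 2, size=(10000, 1000))   # e.g. one-hot encoded features
    y = np.random.randint(0, 2, size=10000)           # hypothetical binary target

    subset = np.random.choice(len(X), size=2000, replace=False)

    selector = SelectKBest(chi2, k=100).fit(X[subset], y[subset])
    X_selected = selector.transform(X)                # keep the 100 best columns

    pca = PCA(n_components=50).fit(X[subset])
    X_reduced = pca.transform(X)                      # alternative: dense 50-dim features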

1

PulPol_2000 t1_j5kwl2u wrote

I have a project that uses ARCore and the Google ML Kit to recognize vehicles from a video feed, and besides recognizing the objects, it needs to know the distance of each object from the camera. I'm lost on how I would integrate the distance measurement into the objects detected by ML Kit. Sorry for the lack of knowledge, as I've only just entered the ML community. Thanks in advance!

1

Zyj t1_j5l7oog wrote

When I use 2 RTX 3090s with an NVLink bridge plugged into PCIe 3.0 x8 slots each, instead of PCIe 4.0 x16 slots, what kind of performance hit will I get?

1

kernel_KP t1_j5lpxnn wrote

I have an unlabelled dataset containing a lot of audio files, and for each file I have computed the chromagram. I would need some advice on implementing a reasonably efficient neural network to cluster these audio files based on their chromagrams. Consider the data to be already correctly pre-processed, so the chromagrams all have the same size. Thanks a lot!

1

Oceanboi t1_j5n6p7b wrote

My advice is to proceed. It's cool to know the math underneath, but just go implement stuff, dude; if it doesn't work you can always remote into or rent a GPU. What I did for my thesis was google tutorials and re-implement them using my own dataset. Through all the bugs and the elbow grease, you will know enough to at least speak the language. Just do it and don't procrastinate with these types of posts (I do this too sometimes).

EDIT: a lot can be done on Colab these days regarding neural networks and Hugging Face. Google the Hugging Face documentation! I implemented a Hugging Face transformer model to do audio classification (and I'm a total noob, I just copied a tutorial). It was a total misuse of the model and accuracy was bad, but at least I learned, and given a real problem I could at least find my way forward.

1

trnka t1_j5nukd2 wrote

I'm not sure what you mean by applying a NN to linear regression.

I'll try wording it differently. Sometimes a NN can outperform linear regression on regression problems, like in the example if there's a nonlinear relationship between some features and car price. But neural networks are also prone to over-fitting so I recommend against having a NN as one's first attempt to model some data. I recommend starting simple and trying complex models when it gets difficult to improve results in simple models.

I didn't say this before but another benefit of starting simple is that linear regression is usually much faster than neural networks, so you can iterate faster and try out more ideas quickly.

2

Cyclone4096 t1_j5owtmi wrote

I don't have much background in ML. I want to build a fairly small neural network that has only one input, which comes from time series data, and gives only one output for that data. My loss function aggregates the entire time series output into a single scalar value. I'm using PyTorch, and when I call .backward() on the loss it takes a long time (understandably). Is there an easier way to do this than computing backward gradients of a loss that is itself the result of hundreds of millions of values? Note that the neural network itself is tiny, maybe fewer than 100 weights, but my issue is that I don't have any golden target; I want to minimize a complex function calculated from the entire time series output.

1

Jack3602 t1_j5qcwv2 wrote

What would you recommend as a good resource for learning AI/ML? I have some knowledge of web dev and know C/C++. I finished The Odin Project foundations and am currently on Full Stack JavaScript, but I got a bit curious about machine learning and would like to get my feet wet. Is there any good resource to start with?
I don't really care for Udemy courses and watching a lot of videos, because I've tried that for web dev and it just feels like tutorial hell, but I loved The Odin Project and reading tutorials/documentation and doing exercises/projects, because I actually learn a lot that way. I've seen websites like mlcourse.ai and kaggle.com but still haven't tried them. What is your opinion on them, maybe a comparison to theodinproject.com, and would you recommend something else?

1

Cyclone4096 t1_j5qi39j wrote

Sure! So this is for audio signal processing. There is an amplifier that takes an audio signal and a volume as input. However, higher volume causes white noise, so I want the volume to stay low whenever possible and boost the signal by multiplying the input instead. But of course the multiplication won't work if the input to the amplifier is already high. Switching the amplifier volume too often is not good either, as that would cause pop/click noise. So I'm designing a small neural network that takes the audio signal as input and outputs the amplifier volume. The way I went about it: I modeled the amplifier and all the noise associated with it using tensor math. Then I took the amplifier output minus the original input and did MSE on that. Note that the audio signals are pretty long, so the filter+MSE is a pretty massive expression. It seems to be working somewhat, but I'm not sure if there is an easier way to do this…

1

DeXma00 t1_j5qppol wrote

I have a small image dataset labeled in CVAT. Now I need to export it and train a network with PyTorch Lightning. How can I do that? I'm a complete noob at this, but I need it for the next phase of a project I'm working on.

Any help is really appreciated!

1

Great-Ad8037 t1_j5qtayg wrote

Can you change the title/abstract of CVPR 23 submissions during/after the rebuttal phase? Some reviewers have trouble with our title and think we should change it. Can we commit to doing that in our rebuttal response?

1

Perfect_Finance7314 t1_j5t45nz wrote

Hello, I have been generating images with StyleGAN2-ADA-PyTorch in Google Colab and I have my generated images in Google Drive. I am struggling to find which seed number an image corresponds to. Can someone please help me figure out how to find the seed number for a specific image?

Thanks a lot!

1

RealKillering t1_j5vu1t5 wrote

I just started working with Google Colab. I am still learning and just used CIFAR-10 for the first time. I switched to Colab Pro and also switched the GPU class to Premium.

The thing is, the training seems to take just as long as with the free GPU. What am I doing wrong?

2

catndante t1_j5x6k5e wrote

Hi, I have a simple question about the DDPM model. I'm not so sure, but I think I read a post saying that when T=1000, using 1,000 separate models would perform better but is computationally too redundant, so DDPM uses the same model for every step t. Is this argument correct? If centers with huge compute did this, would the performance be better?

2

marcelomedre t1_j5xxv7t wrote

Hi, I have a question about k-means. I have a data frame with 100 variables after removing low-variance and highly correlated ones. I know that the data must be normalized for k-means, especially to remove the range dependency, but I am facing a problem: if I normalize my data, the algorithm does not properly separate the clusters. I have 3 variable ranges in my data:

  • 0-10^4;
  • -10^3 - 10^3;
  • 0 - 10^3

I have at least 5 very specific clusters that I can characterize when I don't scale the data, but I am not comfortable with this procedure.

I couldn't find a reasonable explanation for why the algorithm performs better on the non-scaled data than on the scaled data.

1

randomrushgirl t1_j5yjww3 wrote

Hey! I had a very similar doubt and was hoping you could provide some insight. I came across the CLIP Guided Diffusion Colab notebook by Katherine Crowson. It's really cool and I've played with it a little.
I want to know if I can generate the same image over and over again. I've tried setting the seed, but I'm new to this, so can someone give me some intuition or links to some related work in this area? Any help would be appreciated.

1

trnka t1_j5z5e39 wrote

I've seen that before when the large-range features were the most important ones for the clusters I wanted. It was essentially doing feature weighting, but implicitly, through the scales.
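
If it helps, a tiny sketch of making that weighting explicit after scaling (the weights and which columns to emphasize are arbitrary):

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans

    X = np.random.randn(1000, 100)                  # stand-in for the 100 variables
    X_scaled = StandardScaler().fit_transform(X)

    weights = np.ones(100)
    weights[:10] = 5.0                              # e.g. emphasize 10 chosen features

    labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled * weights)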

2

answersareallyouneed t1_j5zux2o wrote

Looking at an ML Engineer role with the following qualifications:

"Strong experience in the area of developing machine learning training framework, or hardware acceleration of machine learning tasks"

"Familiar with hardware architecture, cache utilization, data streaming model"

Any recommendations for books/resources/courses in this area? How does one begin to develop these skills?

1

TopCryptographer402 t1_j609ecj wrote

Does anyone have resources on how to create a simple time series transformer for a classification task? I've been trying to build one for over a month now but I haven't had any luck. I'm trying to predict a binary outcome (0 or 1) for the next 100 time steps.

2

InsidiousApe t1_j634sxw wrote

I enjoy that this is the simple questions thread. :)

Let me ask something much simpler, although in three parts. I am a web developer with no ML experience, but with a specific project in mind. I'd like to understand the process a touch better in order to help me find a programmer to work alongside (paid of course).

(1) Provided the information is easily found via an API, for instance, what is the ingestion process like, time-wise, for very large amounts of information? I realize that depends on the size of the data, but are there other things going on that take time in that process?

(2) In order to program a system to look for correlations in data where no one may have seen them before, what is the process used to do this? This is what I'm truly looking to do once that information is taken in. For example, a ton of (HIPAA-compliant) medical information is taken in and I'm looking to build a system that can look for commonalities among people with a thyroid tumor. Obviously there would be tons of tweaking to those results, but what is the process that allows this to happen?

1

WarProfessional3278 t1_j649od6 wrote

Does anyone know of any good AI-generated text detectors? I know there's GPTZero but it's not very good in my experience.

My research has led me to Hive AI, but I'm sure there are better alternatives out there that don't claim such good results (99.9% accuracy) while still producing a lot of false positives in my tests.

1

trnka t1_j6583q3 wrote

If you're ingesting from an API, typically the limiting factor is the number of API calls or network round trips. So if there's a "search" API or anything similar that returns paginated data that'll speed it up a LOT.

If you need to traverse the API to crawl data, that'll slow it down a lot. Like say if there's a "game" endpoint, a "player" endpoint, a "map" endpoint, etc.

If you're working with image data, fetching the images is usually a separate step that can be slow.

After that, if you can fit it in RAM you're good. If you can fit it on one disk, there are decent libraries in each ML framework to efficiently load from disk in batches, and you can probably optimize the disk loading too.

----

What you're describing is usually called exploratory data analysis but it depends on the general direction you want to go in. If you're trying to identify people with thyroid cancer earlier, for example, you might want to compare the data of recently-diagnosed people to similar people that have been tested and found not to have thyroid cancer. Personally, in that situation I like to just train a logistic regression model to predict that from various patient properties then check if it's predictive on a held-out data sample. If it's predictive I'll then look at the coefficients of the features to understand what's going on, then work to improve the features.

Another simple thing you can do, if the data is small enough and tabular rather than text/image/video/audio is to load it up in Pandas and run .corr then check correlations with the column you care about (has_thyroid_cancer).
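
A small pandas sketch of that check (the columns and values are invented):

    import pandas as pd

    df = pd.DataFrame({
        "age": [34, 61, 45, 29],
        "tsh_level": [2.1, 7.8, 3.0, 1.9],
        "has_thyroid_cancer": [0, 1, 0, 0],
    })

    # correlation of every column with the outcome of interest, strongest first
    print(df.corr()["has_thyroid_cancer"].sort_values(ascending=False))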

Hope this helps! Happy to follow up too.

2

NormalManufacturer61 t1_j66u0fb wrote

I am a non-data scientist interested in a layman's-to-introductory-level book/primer on ML/AI, specifically on the principles and mechanics of the topic(s). Any recommendations?

1

golongandprosper t1_j67azrb wrote

I wouldn't think so. The code for the video is digital, and patterns can be detected from the rendered frames, while a monitor displays data converted to analog light patterns. The only reason for a monitor is if the detector is a camera in front of the monitor sensing light patterns; then it would convert back to digital patterns similar to the original code. That may be useful for interacting with the analog world and accounting for the way light reflects in an analog space, but I think that's future tech, or maybe automated cars. You'd hope they've done some control/experiment to account for lighting changes like this.

1

yauangon t1_j67hram wrote

I'm trying to improve a CNN encoder used as a feature extractor for an AMT (automatic music transcription) model. As the model must be small and fast (for mobile deployment), we are limited to about 3-6 layers of 1D CNN. I want to improve the encoder with residual blocks (as in ResNet), but my question is: I don't know whether residual blocks would benefit such a shallow CNN architecture. Thanks, everyone :D
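
For reference, a minimal PyTorch sketch of the kind of residual 1D conv block in question (channel count and kernel width are placeholders):

    import torch
    import torch.nn as nn

    class ResBlock1D(nn.Module):
        def __init__(self, channels=64, kernel_size=3):
            super().__init__()
            pad = kernel_size // 2
            self.body = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size, padding=pad),
                nn.BatchNorm1d(channels),
                nn.ReLU(),
                nn.Conv1d(channels, channels, kernel_size, padding=pad),
                nn.BatchNorm1d(channels),
            )
            self.act = nn.ReLU()

        def forward(self, x):                    # x: (batch, channels, time)
            return self.act(self.body(x) + x)    # skip connection around the convs

    x = torch.randn(8, 64, 1000)                 # toy batch of audio features
    print(ResBlock1D()(x).shape)                 # torch.Size([8, 64, 1000])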

1

DCBAtrader t1_j68h2fr wrote

Basic question on regression/AutoML (PyCaret mainly).

When do p-values versus error metrics (MAE, MSE, R-squared) matter?

My previous model-building approach (multivariate regression) was to first try various combinations of variables in OLS until all the variables were statistically significant, and then use AutoML (PyCaret) to build models and judge them by MAE, MSE, or R-squared, using proper cross-validation train/test splits of course.

I'm wondering if that first step is needed, or whether I can just run the entire dataset through PyCaret and judge a model based on said metrics (MAE, MSE, R-squared)?

My gut says that the simpler model with statistically significant variables should perform better, but maybe I can just look at the best error metric?

1

bridgeton_man t1_j68njfs wrote

Question about goodness of fit.

For regressions, R-squared and adjusted R-squared are typically considered the primary goodness-of-fit measures.

But in many supervised machine-learning models, RMSE is the main measure that I keep running across. For example, decision tree models that I create in R using rpart report RMSE.

So, my question is how to compare the predictive accuracy of OLS regression models that report R-squared to equivalent rpart regression trees that report RMSE.
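
One concrete way to compare them, since both metrics are computed from the same residuals: score both models on the same held-out data, and convert between the two with R^2 = 1 - MSE / Var(y). A minimal sketch (the numbers are made up):

    import numpy as np
    from sklearn.metrics import mean_squared_error, r2_score

    y_true = np.array([3.0, 5.0, 7.0, 9.0])     # held-out targets (made up)
    y_pred = np.array([2.5, 5.5, 6.5, 9.5])     # predictions from either model

    mse = mean_squared_error(y_true, y_pred)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_true, y_pred)

    # the same quantity recovered from the MSE/RMSE side
    r2_from_mse = 1 - mse / y_true.var()
    print(rmse, r2, r2_from_mse)                # 0.5, 0.95, 0.95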

1

ant9zzzzzzzzzz t1_j6a37a1 wrote

Is there research about the order of training examples, or about running epochs on batches of data rather than on the full training set at a time?

I was thinking about how people learn better when we focus on one problem at a time until grokking it, rather than randomly learning things across different domains.

I am thinking of, for example, training some epochs on one label type, then another, rather than mixing all the data in the same epoch.

This is also related to stateful retraining, like one probably does professionally: you have an existing model checkpoint and retrain on new data. How does that compare to retraining on all the data from scratch?

1

Anvilondre t1_j6a8v34 wrote

Probably not. The idea of ResNets is to mitigate the vanishing gradients that normally occur in very deep networks. In my experience it can often hurt rather than help, but you can try DenseNets instead.

2

Anvilondre t1_j6aa0er wrote

Honestly I don't think transformers are worth it for any kind of TS or tabular data (and there's research showing that). But if you really want to try, I had good success with this library. It makes it essentially a few-liner to run tons of transformer and other architectures on any kind of tabular data. You may also want to check out the Hugging Face model repo for quick solutions.

1

eltorrido23 t1_j6c4bwq wrote

I'm currently starting to pick up ML, coming from a quant-focused social science background. I am wondering what I am allowed to do in EDA (on the whole dataset) and what not, to avoid "data leakage" or information gain that might eventually ruin my predictive model. Specifically, I am wondering about running linear regressions in the data inspection phase (as this is what I would often do in my previous work, which was more about hypothesis testing and not prediction-oriented). From what I read and understand, one shouldn't really do that, because too much information might be obtained, which might lead me to change my model in a way that ruins predictive power. However, in the course I am doing (Jose Portilla's DS Masterclass) they regularly look at correlations before separating train/test samples. But essentially linear regressions are also just (multiple/corrected) correlations, so I am a bit confused about where to draw the line in EDA. Thanks!

1

trnka t1_j6ce4td wrote

I try not to think of it as right and wrong, but more about risk. If you have a big data set and do EDA over the full thing before splitting testing data, and intend to build a model, then yes you're learning a little about the test data but it probably won't bias your findings.

If you have a small data set and do EDA over the full thing, there's more risk of it being affected by the not-yet-held-out data.

In real-world problems though, ideally you're getting more data over time so your testing data will change and it won't be as risky.

1

trnka t1_j6d5fbk wrote

I think most people split by participant. I don't remember if there's a name for that, sorry! Hopefully someone else will chime in.

If you have data from multiple hospitals or facilities, it's also common to split by that because there can be hospital-specific things in the data and you really want your evaluation to estimate the quality of the model for patients not in your data at hospitals not in your data.

1

EscanorFTW t1_j6d9ee8 wrote

What are some good places to start if you are just getting into ML/AI? Please share useful links/resources.

1