knowledgebass t1_iu0qcqe wrote

Might I suggest reducing your dataset sizes by about an order of magnitude, to ~10k records or even less? Fewer than 1,000 records would be ideal.

If it is a learning environment, you want a pretty quick turnaround on training models, say less than 30 seconds. Of course, training can take much (much) longer on actual production systems with huge training sets, but it is going to be frustrating for students if they have to wait minutes for their models to train. I'd test this beforehand and make sure they won't get bogged down by it. (There are plenty of small ML datasets out there that are still interesting and instructive.)
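As a rough sanity check of the "under 30 seconds" budget, here's a minimal sketch using scikit-learn's built-in digits dataset (~1,800 records, which is in the small range suggested above); the dataset choice and model are just illustrative assumptions, not anything from the thread:

```python
import time

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# digits has ~1,800 records -- a classroom-friendly size.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Time the fit to confirm it stays well under the 30-second budget.
start = time.perf_counter()
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
elapsed = time.perf_counter() - start

print(f"trained in {elapsed:.2f}s, test accuracy {model.score(X_test, y_test):.3f}")
```

On a typical laptop this trains in a fraction of a second, so students get near-instant feedback while still working with a real dataset.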

jshkk OP t1_iu0qluv wrote

Oh, I only threw 100K out there as an extreme/max; whatever I actually do would be teensy!