
chief167 t1_istrptp wrote

As someone who sometimes has to hire people, perhaps this is the issue:

Imagine how difficult it is for big companies to get an MLOps framework going, with all the red tape and scattered IT systems. It was very painful where I work. In the end we got something working using a Python platform that really needs you to use pandas- and sklearn-type interfaces.

Let's hypothetically say you are a great data scientist using R, or SAS, or MATLAB, or ... If I don't have a lot of options, I'd hire you and put you on a training program for our framework. But if I have multiple decent candidates, and some don't require retraining, yeah, I'm gonna pick one of them. I am not spending 2 months trying to get compliance and cybersec to approve your Docker container with R code in it if I can have a similar model in our pre-approved workflow.

14

SkinnyJoshPeck t1_isu0ui7 wrote

I hear ya; I think the point is less about proficiency and more about mastery -- in my case, I was marked down heavily since I didn't use iloc. Something like

df[df.col < 10]
vs
df[df.iloc[:, 0] < 10]

because I guess it makes it more clear to the reader, and it protects the code from explicit column names; the fact that I didn't use it made me seem like I didn't know pandas well.

To your point, though, I see the importance of the infrastructure. In this case, it was for an ML scientist role where I wouldn't actually be doing any of the MLOps, just designing and tuning the models.

16

phb07jm t1_isudd5v wrote

Can someone please explain why the second is preferable? I would always do the first because it's more likely that the position of a column will change than the name.

21

silvershadow t1_isurezr wrote

Change the iloc to a loc and then I would maybe see the argument.

Assigning through .iloc or .loc is guaranteed to modify the original data frame, while chained [] indexing can in some cases operate on a copy. Pandas makes no promises about which one you get.

So depending on what the full expression was, the criticism of using [] indexing could make sense. You’d need to see the full context of what OP was writing though.

From the sounds of what they wrote though, this is not the thinking the interviewer was following.
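
For anyone following along, here is a minimal sketch of the copy-versus-original point above. The frame and the column names `a` and `b` are made up for illustration, and the behavior shown is what I'd expect for a boolean mask on recent pandas versions, not a guarantee for every expression.

```python
import warnings

import pandas as pd

# Hypothetical toy frame; "a" and "b" are illustrative column names.
df = pd.DataFrame({"a": [1, 50, 3], "b": [0, 0, 0]})

# Chained indexing: df[mask] builds a new object, and the assignment
# lands on that temporary object instead of df. pandas warns about this,
# so the warning is silenced here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[df["a"] < 10]["b"] = 99

after_chained = df["b"].tolist()  # df is unchanged

# A single .loc call selects rows and column in one step and writes to df.
df.loc[df["a"] < 10, "b"] = 99

after_loc = df["b"].tolist()  # rows where a < 10 are updated
```

Note this only matters for assignment. For a plain read like the filter OP described, both spellings give you the same rows.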

11

chief167 t1_isuculx wrote

Ok yeah well that's stupid. Because I am actually in favour of column names instead of indexes. Indexes are a pain in the ass when your incoming dataframe changes: it creates an implicit dependency.
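
A rough sketch of that implicit dependency, assuming a made-up two-column frame (`col` and `other` are illustrative names, not from OP's interview). Both filters agree until the upstream column order changes, at which point the positional one silently filters on the wrong column.

```python
import pandas as pd

# Hypothetical data; "col" happens to be the first column here.
df = pd.DataFrame({"col": [5, 20, 7], "other": [100, 1, 200]})

by_name = df[df["col"] < 10]
by_position = df[df.iloc[:, 0] < 10]
same_before = by_name.equals(by_position)

# Simulate the incoming dataframe changing: same data, columns reordered.
reordered = df[["other", "col"]]

# The name-based filter still tests "col".
rows_by_name = reordered[reordered["col"] < 10].index.tolist()

# The positional filter now tests "other" without raising any error.
rows_by_position = reordered[reordered.iloc[:, 0] < 10].index.tolist()
```

After the reorder, the two filters return different rows, and nothing in the positional version tells you that happened.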

But your last line is my point. You shouldn't be concerned about MLOps stuff, but if your model is already in the right framework, it saves soooo much time.

11

monkeyunited t1_isufjc7 wrote

That’s dumb and violates the “explicit is better than implicit” rule.

8