Hola!

I am working on comparing some models, few of which have been implemented in PyTorch and the rest of them in Tensorflow (some in 1.x and others in 2.x versions). I know if they are implemented well, one should be able to simply compare their graphs/performances regardless of the platform. But often there might be some subtle differences in the implementations (within the platforms themselves and the way model code utilizes it) that can make it painful to trust the training. Some models are from official sources so I'd rather not verify much of their code before using them. Of course, I don't want to implement all of them into a single platform unless I must.

If you have come across such a problem, how have you dealt with it? Are there certain tests you would conduct to ensure the loss curves can be compared? How would you go about this issue other than finding someone else's implementation of say, a TF model in PyTorch, and verifying it?

Sincerely, A man in crisis.

Comments

You must log in or register to comment.

sugar_scoot t1_jc5qogu wrote on March 14, 2023 at 6:13 AM

#2,226,185

What's the purpose of your study?

cthorrez t1_jc5s8ag wrote on March 14, 2023 at 6:34 AM

#2,226,282

Basically I would just make sure the metrics being compared are computed the same way. Same numerator and denominator like summing vs averaging, over the batch vs epoch. If the datasets are the same and the type of metric you are computing is the same it's comparable.

The implementation details just become part of the comparison.

Fapaak t1_jc5yuuj wrote on March 14, 2023 at 8:08 AM

#2,226,609

Replying to sugar_scoot (#2,226,185)

Sounds like a bachelor's thesis to me at least

chaotycmunkey OP t1_jc6fkn7 wrote on March 14, 2023 at 11:51 AM

#2,227,417

Replying to sugar_scoot (#2,226,185)

Writing my first paper. I have my own model that I'm going to compare against the SOTA on some RL tasks.

BellRock99 t1_jc7fsag wrote on March 14, 2023 at 4:23 PM

#2,229,318

Trust the implementation, or simply just use the metrics presented in their papers on the standard datasets. The latter is more correct in my opinion, since even your implementation could be wrong.

chaotycmunkey OP t1_jc7ki7b wrote on March 14, 2023 at 4:53 PM

#2,229,598

Replying to BellRock99 (#2,229,318)

My goal is to test their model on a new dataset that I believe will fail to perform on. And my proposed model is supposed to be an improvement. As such, I have to run their model.

BellRock99 t1_jc7oki4 wrote on March 14, 2023 at 5:19 PM

#2,229,848

Replying to chaotycmunkey (#2,229,598)

You could try asking their implementation