I have been trying to find a nice tech stack I like for designing and running machine learning models, and currently I'm trying out mlflow, hydra, and optuna.

However, hydra has several limitations that are annoying enough to make me reconsider my choice. The most problematic is that a multirun cannot group parameters together: hydra only supports trying all combinations of parameters, as described in https://github.com/facebookresearch/hydra/issues/1258, and changing that does not seem to be a priority for hydra. Furthermore, hydra's optuna integration does not allow early pruning of bad runs, which is not a deal breaker but would definitely be a nice-to-have feature.
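To make the difference concrete, here is a minimal plain-Python sketch of the two sweep modes. It does not use hydra at all, and the parameter names and values are made up for illustration: the first function builds every combination (what a grid multirun does), the second pairs values position by position (the grouped behaviour I am missing).

```python
from itertools import product

# Hypothetical search space; names and values are illustrative only.
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 64, 128]


def combinatoric_runs(lrs, bss):
    """Pair every learning rate with every batch size (full grid)."""
    return [{"lr": lr, "batch_size": bs} for lr, bs in product(lrs, bss)]


def grouped_runs(lrs, bss):
    """Pair the i-th learning rate only with the i-th batch size."""
    if len(lrs) != len(bss):
        raise ValueError("grouped parameters must have the same length")
    return [{"lr": lr, "batch_size": bs} for lr, bs in zip(lrs, bss)]


grid = combinatoric_runs(learning_rates, batch_sizes)
grouped = grouped_runs(learning_rates, batch_sizes)
print(len(grid), len(grouped))  # 9 3
```

With three values per parameter, the grid yields 9 runs while the grouped sweep yields only the 3 I actually want.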

What I do like about hydra is its ability to combine yaml configs using defaults. Does anyone have good alternatives, or suggestions for how to fix this or what to switch to?

Comments

RicketyCricket t1_je50mbi wrote on March 29, 2023 at 1:44 PM

https://github.com/fidelity/spock

_Arsenie_Boca_ t1_je5d04j wrote on March 29, 2023 at 3:09 PM

Looks interesting, a bit more lightweight than hydra. But it also lacks a lot of cool features, like composing multiple yaml configs.

RicketyCricket t1_je5j2n9 wrote on March 29, 2023 at 3:48 PM

Most of the cool stuff is buried in the docs under advanced features :-)

https://fidelity.github.io/spock/advanced_features/Composition

(full transparency I'm the author/maintainer/core-developer. I know the docs need a re-org to surface more of the useful features)

RicketyCricket t1_je5jchr wrote on March 29, 2023 at 3:50 PM

This being my favorite hidden one:

https://fidelity.github.io/spock/advanced_features/Evolve#maintaining-cli-and-python-api-configuration-parity

RicketyCricket t1_je5kgy4 wrote on March 29, 2023 at 3:57 PM

second favorite:

https://fidelity.github.io/spock/advanced_features/Post-Hooks

Basically lets you do any validation necessary on your configs. Spock provides some basics (greater than, within bounds, etc) but it's totally up to the user via any simple asserts or validation functions a user wants to write.
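The general idea of validating a config right after it is built can be sketched with standard-library dataclasses. To be clear, this is not Spock's API; it is only an illustration of the pattern, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:  # hypothetical config class, not part of any library
    lr: float
    dropout: float

    def __post_init__(self):
        # Runs right after the generated __init__, so bad configs fail fast.
        if self.lr <= 0:
            raise ValueError(f"lr must be > 0, got {self.lr}")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {self.dropout}")


ok = TrainConfig(lr=1e-3, dropout=0.1)

try:
    TrainConfig(lr=1e-3, dropout=1.5)
    rejected = False
except ValueError:
    rejected = True
```

The out-of-bounds config is rejected at construction time, before any training code sees it.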

_Arsenie_Boca_ t1_je6ayl7 wrote on March 29, 2023 at 6:45 PM

Thanks, looks like your library isn't far behind hydra in terms of functionality. Will definitely look into it more closely the next time I set up a project.

What would you say are the pros and cons between hydra and spock?

RicketyCricket t1_je9lp7z wrote on March 30, 2023 at 12:39 PM

Mainly that Spock is much lighter weight and really focuses on just configuration management and statefulness. Hydra has all these crazy bells and whistles (Ray integration etc.) that could be useful for certain things but kinda starts meandering from the original purpose of configuration management imo. Hydra is great and if it works for you then use it. We built Spock internally when I was at Fidelity because Hydra didn’t exist… it just so happens that FB/Meta was doing the same thing at the same time, so both libraries end up covering a very similar usage space.

_Arsenie_Boca_ t1_je9n0ea wrote on March 30, 2023 at 12:50 PM

Thanks, I basically use only the config part of hydra and am regularly annoyed that it's so clunky, so spock might be a good alternative. Gonna check it out :)

alyflex OP t1_je5y5gh wrote on March 29, 2023 at 5:24 PM

> https://github.com/fidelity/spock

This looks quite promising, and I like the Post-Hooks you linked below, but I do not see any way of running a series of experiments in a non-combinatoric way. There is an Optuna API (though I can't tell whether early pruning is supported in it), but I don't see any way of grouping parameters for a set of experiments.

DigThatData t1_je5kfoc wrote on March 29, 2023 at 3:57 PM

go closer to the metal and use omegaconf directly.

RicketyCricket t1_je5ky7l wrote on March 29, 2023 at 4:00 PM

As the developer of Spock (posted in another comment) -- OmegaConf is also an awesome choice and super useful. I'd suggest checking it out too!

You can go even closer to metal and use the attrs library as well (https://www.attrs.org/en/stable/)

alyflex OP t1_je5yia4 wrote on March 29, 2023 at 5:26 PM

That is certainly an option I was considering, but then I would have to make my own job planner / multirunner. I have actually already done that for my current project, but the whole point of this refactoring was to move away from my own custom functions and use more standardized methods.

DigThatData t1_je600q2 wrote on March 29, 2023 at 5:35 PM

i misunderstood, i thought you were looking for an alternative config component. if you're looking for an alternative for managing hyperparameter search jobs, consider https://docs.ray.io/en/latest/tune/index.html . I think hydra actually might even integrate with ray.

Jean-Porte t1_je53acs wrote on March 29, 2023 at 2:04 PM

This is much lighter: a pure-python config flow manager I made, where you can chain experiment classes by adding them (xp1() + xp2()): https://github.com/sileod/xpflow

fmindme t1_je5ty5k wrote on March 29, 2023 at 4:58 PM

I'm also using OmegaConf. It's a great lib: full of features, not opinionated, perfect for MLOps!

arnowaczynski t1_je6flvk wrote on March 29, 2023 at 7:15 PM

dataclasses from python standard library + dacite.from_dict
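A minimal sketch of that pattern, for anyone who has not seen it. To keep it runnable with only the standard library, the `from_dict` below is a tiny hand-written stand-in, not dacite itself; the config classes and field names are hypothetical. dacite's own `from_dict(data_class=..., data=...)` covers the same use case with far more type handling.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import get_type_hints


@dataclass
class OptimizerConfig:  # hypothetical nested config
    name: str
    lr: float


@dataclass
class ExperimentConfig:  # hypothetical top-level config
    seed: int
    optimizer: OptimizerConfig


def from_dict(data_class, data):
    """Simplified stand-in for dacite.from_dict: dicts -> nested dataclasses."""
    hints = get_type_hints(data_class)
    kwargs = {}
    for f in fields(data_class):
        if f.name not in data:
            continue  # fall back to the field default, if there is one
        value = data[f.name]
        # Recurse when the annotated type is itself a dataclass.
        if is_dataclass(hints[f.name]) and isinstance(value, dict):
            value = from_dict(hints[f.name], value)
        kwargs[f.name] = value
    return data_class(**kwargs)


# In practice `raw` would come from yaml.safe_load or json.load.
raw = {"seed": 0, "optimizer": {"name": "adam", "lr": 1e-3}}
cfg = from_dict(ExperimentConfig, raw)
print(cfg.optimizer.name)  # adam
```

The payoff is typed attribute access (`cfg.optimizer.lr`) instead of nested dictionary lookups, with no framework involved.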

timo_kk t1_jea88ah wrote on March 30, 2023 at 3:29 PM

Pyrallis quite nicely builds on top of that to support e.g. command line arguments.

I'm quite happy with it.

pseeth t1_je8mold wrote on March 30, 2023 at 5:19 AM

I have a lightweight package that I use that has all the main things I wanted from hydra or gin-config. It's here and it's pretty tiny in terms of lines of code: https://github.com/pseeth/argbind

fnordstar t1_je8rhex wrote on March 30, 2023 at 6:18 AM

Isn't just using python flexible enough for you?