Zermelane t1_iw1nx4n wrote

High-confidence predictions:

  • GPT-4, unless it comes out this year. The rumors about its capabilities and architecture have been so all over the place that I have no idea what to expect of it, but the part I'm confident about is that it's coming.

  • Publicly available text-to-image models conditioned on a good text encoder's embeddings rather than CLIP's (or, following eDiff-I's example, not only CLIP's). We will collectively realize just how frustratingly vague and gist-based our current text-to-image models have been. (A minimal sketch of this kind of conditioning follows this list.)

  • H100s go brrr: 2-3x cost decreases on workloads A100s were already good at, more if you can make use of features like fp8, with matching improvements in AI services.

  • Some crazy thing will happen in BioML that nobody will be able to agree on whether it's a huge breakthrough or an insignificant increment.
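
To make the text-encoder point above concrete, here's a minimal sketch of what "conditioning on a good text encoder's embeddings" can look like: image features cross-attend over per-token T5 embeddings instead of a single pooled CLIP vector. This is my own illustration, not any model's actual code; the U-Net is stubbed out as a random feature map, and `t5-small` plus all the dimensions are arbitrary demo choices.

```python
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
enc = T5EncoderModel.from_pretrained("t5-small").eval()

class CrossAttnBlock(nn.Module):
    """Image tokens attend over the full sequence of text-token embeddings."""
    def __init__(self, img_dim: int, txt_dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(img_dim, heads, kdim=txt_dim,
                                          vdim=txt_dim, batch_first=True)
        self.norm = nn.LayerNorm(img_dim)

    def forward(self, img_tokens, txt_tokens):
        out, _ = self.attn(img_tokens, txt_tokens, txt_tokens)
        return self.norm(img_tokens + out)

with torch.no_grad():
    ids = tok(["a corgi wearing a tiny wizard hat"], return_tensors="pt")
    txt = enc(**ids).last_hidden_state        # (1, seq_len, 512) for t5-small

img_tokens = torch.randn(1, 64, 320)          # stand-in for a U-Net feature map
block = CrossAttnBlock(img_dim=320, txt_dim=512)
print(block(img_tokens, txt).shape)           # torch.Size([1, 64, 320])
```

The point of keeping every token embedding (rather than one pooled vector) is that the image model can attend to "wizard hat" separately from "corgi" instead of getting the whole prompt as one blurry gist.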

...And some spicy low-confidence ones:

  • Some cool architectural improvement to diffusion models turns out to work really well and makes them significantly cheaper, though I don't know what it will be. Pyramidal diffusion? Maybe someone figures out how to get StyleGAN3's equivariances into a U-Net? Maybe some trick that's particularly good for video?

  • Someone figures out how to get text-to-image models to competently use reference images when drawing, without textual inversion's crazy overfitting problems. (A toy illustration of textual inversion's setup appears after this list.)

  • One of the big labs gets an LLM to usefully critique and correct its own chain-of-thought reasoning, bumping MMLU results by some scary number in the 5-10% range. (Bonus points if they also apply it to codegen; a sketch of the loop follows this list.)

  • Someone trains a TTS model on T5 embeddings, and suddenly it just gets emotional prosody right because it finally has some idea of what it's saying. (Sketch below.)
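
On the reference-images point: here's a toy illustration of the textual-inversion setup, where everything is frozen except one new token embedding. It's deliberately simplified (the "denoiser" is a random stand-in module, the "images" are random tensors), but it shows why overfitting is so easy: a handful of references, many optimization steps, one trainable vector.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Frozen stand-in "denoiser": takes [token embedding, noisy image] -> noise guess.
denoiser = nn.Sequential(
    nn.Linear(768 + 16, 256), nn.ReLU(), nn.Linear(256, 16),
)
for p in denoiser.parameters():
    p.requires_grad_(False)

v_star = nn.Parameter(torch.randn(768) * 0.02)   # the only trainable weights
refs = torch.randn(4, 16)                        # stand-in "reference images"
opt = torch.optim.Adam([v_star], lr=1e-2)

for step in range(500):
    noise = torch.randn_like(refs)
    cond = v_star.expand(refs.size(0), -1)       # same vector for every image
    pred = denoiser(torch.cat([cond, refs + noise], dim=-1))
    loss = (pred - noise).pow(2).mean()          # crude diffusion-style loss
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.4f}")  # one 768-dim vector fit to 4 images
```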
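
And the self-critique loop I have in mind for the LLM prediction, as a hedged sketch. `generate` is a hypothetical stand-in for whatever completion API you have; the prompts are placeholders, and the structure (answer, critique, revise until the critic finds no mistakes) is the point.

```python
from typing import Callable

def solve_with_self_critique(question: str,
                             generate: Callable[[str], str],
                             max_rounds: int = 3) -> str:
    """Answer, then repeatedly critique and revise the chain of thought."""
    answer = generate(f"Q: {question}\nThink step by step, then answer.")
    for _ in range(max_rounds):
        critique = generate(
            f"Q: {question}\nProposed reasoning:\n{answer}\n"
            "List any mistakes in the reasoning, or say NO MISTAKES."
        )
        if "NO MISTAKES" in critique:
            break  # the critic is satisfied; keep the current answer
        answer = generate(
            f"Q: {question}\nPrevious attempt:\n{answer}\n"
            f"Critique:\n{critique}\n"
            "Write a corrected step-by-step answer."
        )
    return answer
```

An MMLU-style eval would then just call this once per question instead of sampling a single chain of thought.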
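
Finally, the TTS idea: swap bare phoneme or character embeddings for contextual T5 embeddings, so the acoustic model conditions on meaning. `ProsodyDecoder` here is a hypothetical stub standing in for any attention-based acoustic model; only the conditioning pathway uses a real API (`T5EncoderModel` from transformers).

```python
import torch
import torch.nn as nn
from transformers import T5EncoderModel, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
enc = T5EncoderModel.from_pretrained("t5-small").eval()

class ProsodyDecoder(nn.Module):
    """Stub acoustic model: attends over text embeddings, emits mel frames."""
    def __init__(self, txt_dim: int = 512, mel_bins: int = 80, frames: int = 100):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(frames, txt_dim))
        self.attn = nn.MultiheadAttention(txt_dim, 4, batch_first=True)
        self.to_mel = nn.Linear(txt_dim, mel_bins)

    def forward(self, txt):                   # txt: (batch, seq, txt_dim)
        q = self.queries.expand(txt.size(0), -1, -1)
        out, _ = self.attn(q, txt, txt)
        return self.to_mel(out)               # (batch, frames, mel_bins)

with torch.no_grad():
    ids = tok(["you're WHAT?"], return_tensors="pt")
    txt = enc(**ids).last_hidden_state        # contextual, not per-character
    mel = ProsodyDecoder()(txt)
print(mel.shape)                              # torch.Size([1, 100, 80])
```

The hope is that contextual embeddings make "you're WHAT?" and "you're what." condition the decoder differently, which character-level inputs fundamentally can't.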
