Submitted by SleekEagle t3_11hayge in MachineLearning
Comments
hoshitoshi t1_jasnojt wrote
This will be very useful for brain researchers (and, scarily, for interrogators). I wonder if you could do a similar thing and use an LLM to extract text/language from MRI scans to read what a person is actually thinking.
helliun t1_jasuo76 wrote
Wow the implications here are kinda insane
currentscurrents t1_jasxijr wrote
I'm a wee bit cautious.
Their test set is a set of patients, not images, so their MRI->latent space model has seen every one of the 10,000 images in the dataset. Couldn't it simply have learned to classify them? Previous work has very successfully classified objects based on brain activity.
How much information are they actually getting out of the brain? They're using StableDiffusion to create the images, which has a lot of world knowledge about images pretrained into it. I wish there was a way to measure how much of the output image is coming from the MRI scan vs from StableDiffusion's world knowledge.
Zestyclose-Debt-4712 t1_jasycf3 wrote
Does this research make any real sense? Creating a low-resolution image from brain activity has been done before and is amazing. But using a pretrained denoising network on the noisy image will just add details that have nothing to do with the brain activity, just like those AI "enlarge/zoom" models imagine/add details that were never in the original picture.
Or am I missing something here, and they address the issue?
SleekEagle OP t1_jaszawj wrote
It looks like, rather than conditioning on text, they condition on the fMRI, but it's unclear to me exactly how they map between the two or why this would even work without finetuning. TBH I haven't had time to read the paper so I don't know the details, but I figured I'd drop the paper in case anyone was interested!
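If I had to guess at the mechanics, something like the toy sketch below seems plausible: learn a mapping from fMRI voxels into the same shape as Stable Diffusion's text-conditioning embeddings and feed that in where the text embedding would normally go. To be clear, this is purely my speculation, not the paper's method, and every name and dimension here (voxel count, `FMRIToConditioning`, etc.) is made up for illustration.

```python
# Hypothetical sketch: project fMRI voxels into the shape of CLIP text
# embeddings (77 tokens x 768 dims for SD v1.x) so they could stand in for
# the usual text conditioning. Sizes and names are assumptions, not the paper's.
import torch
import torch.nn as nn

N_VOXELS = 1000              # made-up voxel count; real masked scans have far more
N_TOKENS, EMB_DIM = 77, 768  # text-conditioning shape used by SD v1.x

class FMRIToConditioning(nn.Module):
    def __init__(self):
        super().__init__()
        # a simple linear map, in the spirit of the "linear models" the paper mentions
        self.proj = nn.Linear(N_VOXELS, N_TOKENS * EMB_DIM)

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (batch, N_VOXELS) -> conditioning: (batch, 77, 768)
        return self.proj(voxels).view(-1, N_TOKENS, EMB_DIM)

fmri = torch.randn(1, N_VOXELS)        # stand-in for a real scan
cond = FMRIToConditioning()(fmri)      # would replace the text embedding
print(cond.shape)                      # torch.Size([1, 77, 768])
```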
Visible-Moment-8974 t1_jath1zi wrote
Exciting! One day we will be able to visualize what's going on in this amazing creature: https://www.youtube.com/watch?v=0vKCLJZbytU&ab_channel=NatureonPBS
A_HumblePotato t1_jati59p wrote
Looks interesting, but as another user pointed out, not particularly novel (aside from the decoder model being used). One thing I wish these studies did is test their models on subjects that weren't used for training, to see if these methods generalize across people (or at least do few-shot training/testing on new subjects). I do actually like the idea of using latent diffusion models for these tasks, since long-term our brains do not store perfect reconstructions of images.
OrangeYouGlad100 t1_jatsw72 wrote
> so their MRI->latent space model has seen every one of the 10,000 images in the dataset
Are you sure about that? I wasn't able to understand their test method from the paper, but it sounds like they held out some images from training
OrangeYouGlad100 t1_jatt83m wrote
This is what they wrote:
"For a subset of those trials (N=2,770 trials), 982 images were viewed by all four subjects. Those trials were used as the test dataset, while the remaining trials (N=24,980) were used as the training dataset."
That makes it sound like 982 images were not used for training
currentscurrents t1_jatvmtm wrote
You're right, I misread it. I thought they held out 4 patients for tests. But upon rereading, their dataset only had 4 patients total and they held out the set of images that were seen by all of them.
>NSD provides data acquired from a 7-Tesla fMRI scanner over 30–40 sessions during which each subject viewed three repetitions of 10,000 images. We analyzed data for four of the eight subjects who completed all imaging sessions (subj01, subj02, subj05, and subj07).
...
>We used 27,750 trials from NSD for each subject (2,250 trials out of the total 30,000 trials were not publicly released by NSD). For a subset of those trials (N=2,770 trials), 982 images were viewed by all four subjects. Those trials were used as the test dataset, while the remaining trials (N=24,980) were used as the training dataset.
4 patients is small by ML standards, but with medical data you gotta make do with what you can get.
I think my second question is still valid though. How much of the image comes from the brain data vs from the StableDiffusion pretraining? Pretraining isn't inherently bad - and if your dataset is 4 patients, you're gonna need it - but it makes the results hard to interpret.
karius85 t1_jav4q78 wrote
Any relation to the team in Kyoto that popped up in the Vsauce Mind Field episode a couple of years back?
Coyote-Sweaty t1_jav4ul6 wrote
Actually, it's from 2022.
LeanderKu t1_javeqbn wrote
They probably don’t generalize. I bet they tried it
Accomplished-Fly-96 t1_jaz8gc8 wrote
In Sections 3.3 and 3.4, the authors mention linear models for the mapping between the text embeddings and the fMRI. I looked at their repository, but it does not have any code yet. Does anyone have a better idea about these linear models the authors talk about?
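Not an answer from their repo, but my best guess is that "linear models" means something like ridge regression fit per subject from voxel responses to each embedding dimension, which is common in fMRI encoding/decoding work. A rough sketch of that idea (all sizes, the regularization strength, and the synthetic data are made up for illustration, not taken from the paper):

```python
# Hypothetical sketch of a linear mapping from fMRI voxels to an embedding
# vector, in the spirit of the "linear models" mentioned -- not the authors' code.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_voxels, emb_dim = 2000, 500, 768   # illustrative sizes only

X_train = rng.standard_normal((n_train, n_voxels))  # fMRI features per trial
Y_train = rng.standard_normal((n_train, emb_dim))   # target embeddings per trial

# One ridge regression fit jointly over all embedding dimensions
model = Ridge(alpha=1000.0)  # heavy regularization is typical with noisy fMRI features
model.fit(X_train, Y_train)

X_test = rng.standard_normal((10, n_voxels))
Y_pred = model.predict(X_test)   # (10, 768) predicted embeddings to condition generation on
print(Y_pred.shape)
```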
chungexcy t1_jaz8tm4 wrote
>Figure 3 shows the results of visual reconstruction for one subject (subj01). We generated five images for each test image and selected the generated images with highest PSMs.
Something is not quite right. When they select the generated image, they use the PSM score to pick the best of 5. To calculate the PSM, I believe you need the original image (target, ground truth). It's like the LDM gives you five choices, you use your target to pick the most similar one, and then claim that this one is similar to your target?
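Concretely, my reading of that selection step looks like the sketch below (using a crude cosine-similarity stand-in, since I'm not sure exactly which PSM implementation they use; the images are random placeholders):

```python
# Sketch of the selection step as I understand it: generate 5 candidates,
# score each against the GROUND-TRUTH image with some perceptual similarity
# metric (here a crude cosine-similarity stand-in), and keep the best one.
import numpy as np

rng = np.random.default_rng(0)
ground_truth = rng.random((64, 64, 3))                     # the test image the subject saw
candidates = [rng.random((64, 64, 3)) for _ in range(5)]   # 5 LDM samples

def similarity(a, b):
    # stand-in for the paper's PSM; any perceptual metric plays the same role here
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [similarity(img, ground_truth) for img in candidates]
best = candidates[int(np.argmax(scores))]   # the selection itself uses the target image
print(scores, int(np.argmax(scores)))
```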
Mysterious-Career236 t1_jb57u9u wrote
I also bet the scope didn't allow them to try hard enough. It is possible you could generalise a little if you trained the model on numerous people and not only a handful.
oneandonly13579 t1_jasn3ro wrote
brain2img 2023, cool