dgrsmith t1_j8n4kwe wrote

From a cognitive point of view, humans and animals have modules that they rely on for certain tasks. In human neuropsych assessment, the combined function of these modules gives you a score for general intelligence, with each module contributing toward the whole. Having a “module” removed or changed for one reason or another will sometimes cause localized task failures (e.g., neurodegenerative disease or brain injury) or an atypical approach to tasks (e.g., atypical brain development). Maybe we can think of specific cognitive functions as API calls to modules in this “tool use” paradigm? This is likely not an original thought, so if anyone has references or has heard of this idea, please let me know!
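To make the analogy concrete, here's a toy sketch (every name here is hypothetical, not from any real framework) of an agent whose cognitive functions are dispatched as module/tool calls, where "lesioning" one module causes a localized task failure while everything else keeps working:

```python
# Toy sketch of "cognitive functions as tool/API calls."
# Agent, Module names, and tasks are all made up for illustration.

from typing import Callable, Optional


class Agent:
    def __init__(self) -> None:
        # Each cognitive function is just a callable "tool" the agent can invoke.
        self.modules: dict[str, Optional[Callable[[str], str]]] = {
            "language": lambda task: f"parsed: {task}",
            "memory": lambda task: f"recalled: {task}",
            "spatial": lambda task: f"located: {task}",
        }

    def lesion(self, name: str) -> None:
        """Simulate a removed/damaged module (e.g., injury or disease)."""
        self.modules[name] = None

    def perform(self, module: str, task: str) -> str:
        fn = self.modules.get(module)
        if fn is None:
            # Localized failure: only tasks routed through this module break.
            return f"task failure: {module!r} module unavailable"
        return fn(task)


agent = Agent()
agent.lesion("spatial")
print(agent.perform("language", "name this object"))  # still works
print(agent.perform("spatial", "navigate the maze"))  # localized failure
```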

1

dgrsmith t1_j6xffed wrote

>If you are in a position that you have access to a database at work, I strongly suggest that you give it a try. It's surprisingly good.

I'll give it a try with synthetic data! Maybe I'll be surprised by how little finessing it takes. I assume it's gonna take quite a bit to make it work, but I'll give it a shot!
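Roughly this kind of experiment is what I have in mind (the table, prompt, and candidate SQL are all made up; you'd send the prompt to whichever model/client you actually use):

```python
# Sketch: build a synthetic table, then check whether model-written SQL
# matches a known ground truth. Schema and codes are invented.

import sqlite3

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame({
    "patient_id": range(1, 101),
    "age": rng.integers(18, 90, 100),
    "dx_code": rng.choice(["E11.9", "I10", "J45.909"], 100),
})

con = sqlite3.connect(":memory:")
df.to_sql("encounters", con, index=False)

# The prompt you'd hand to GPT, via whatever API/client you use:
prompt = """Schema: encounters(patient_id INT, age INT, dx_code TEXT)
dx_code holds ICD-10 codes. Write SQLite SQL to count patients per dx_code."""

# Pretend the model returned this; since the data is synthetic,
# we can validate its output against the answer we already know.
candidate_sql = "SELECT dx_code, COUNT(*) AS n FROM encounters GROUP BY dx_code;"
print(pd.read_sql(candidate_sql, con))
```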

1

dgrsmith t1_j6xe2bu wrote

Totally agree. The company's CTO or business users need to buy into this in order to allow resource allocation. It's promising; it just requires a hell of a lot of "human in the loop" at the moment to finesse the data to the point where the AI can produce reliable results from hidden concepts and constructs in raw tables. I think the current assumption is that your data is already finessed enough for GPT to take over and produce reliable, clean results. Those 10 data people will certainly be supported by data cleaning staff. That's where the effort should go anyhow. No data scientist likes spending 80% of their time cleaning and prepping data, but that's where we are now.
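For anyone who hasn't lived it, this is the un-glamorous prep I mean (paths and column names are hypothetical), all of which has to happen before raw tables are safe for an AI to query:

```python
# Sketch of routine human-in-the-loop prep before any GPT querying.
import pandas as pd

raw = pd.read_csv("raw_export.csv")  # placeholder path

clean = (
    raw
    .drop_duplicates(subset=["record_id"])
    .assign(
        # Coerce dirty strings to proper types; bad values become NaN/NaT.
        visit_date=lambda d: pd.to_datetime(d["visit_date"], errors="coerce"),
        age=lambda d: pd.to_numeric(d["age"], errors="coerce"),
    )
    # Dropping unparseable dates is a judgment call a human has to make.
    .dropna(subset=["visit_date"])
)

# Only now is this a table you'd let GPT write queries against.
clean.to_csv("encounters_clean.csv", index=False)
```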

7

dgrsmith t1_j6xa0mw wrote

That’s the thing though: you need to know what you’re looking for before the database can give you useful data. AI can guess, sure, but you won’t be able to trust the results unless you’re familiar with the database, and ensure the AI is as well. I agree it’s not a deal-breaker; it’s just a case of considerable resource reallocation.

In your example, too, there’s an implicit assumption that gender is an easy construct to define, but that may not be the case. Are we talking sex at birth? Sex at point of observation? Identifying gender? Constructs require a lot of data understanding and finessing, in a way that end users won’t be able to pull off without a human directing the AI somehow by providing data availability and documentation. Once those human data-prep processes are done, yes, you want your end users to be able to ask questions of the data readily. But this requires a fair bit of human anticipation as to what should be available to the AI given end-user business needs.
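Concretely, the kind of human-authored documentation I mean looks something like this (field names and codes are hypothetical); without it, "gender" is ambiguous across at least three constructs:

```python
# Sketch of a column-level data dictionary a human writes so the model
# knows WHICH construct a question like "break down by gender" refers to.

COLUMN_DOCS = {
    "sex_at_birth": {
        "table": "demographics",
        "definition": "Sex recorded on the birth certificate",
        "values": {"M": "male", "F": "female", "U": "unknown"},
    },
    "gender_identity": {
        "table": "demographics",
        "definition": "Self-reported gender identity at most recent visit",
        "values": {"M": "man", "F": "woman", "NB": "non-binary", "U": "unknown"},
    },
}


def schema_context() -> str:
    """Render the dictionary into plain text to prepend to the model's prompt."""
    lines = []
    for col, doc in COLUMN_DOCS.items():
        lines.append(f"{doc['table']}.{col}: {doc['definition']} ({doc['values']})")
    return "\n".join(lines)


# An end user asking "break results down by gender" only gets a trustworthy
# answer if context like this tells the model which construct to use.
print(schema_context())
```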

1

dgrsmith t1_j6wpjae wrote

This was discussed over on r/datascience too. We’d love it if it worked out of the box, but telling the tool what each table does and what each of its columns means requires a level of documentation that most companies don’t have reliably, nor would it be standardized enough to let a model such as GPT generalize. In a perfect world, metadata is available and data governance is a significant focus. Often, companies don’t have time to focus on these tasks because they require considerable work. Additionally, even with the many efforts to standardize, the underlying concepts sometimes need a lot of human intervention before they can be pushed into models.
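The gap is easy to see in code: the database itself can tell you names and types, but not what anything means. A quick sketch (table name made up):

```python
# The schema metadata you can pull automatically vs. what the tool needs.
import sqlite3

con = sqlite3.connect("warehouse.db")  # placeholder path
cols = con.execute("PRAGMA table_info(fct_orders)").fetchall()

# PRAGMA table_info gives (cid, name, type, notnull, dflt_value, pk) --
# i.e., names and types only. The semantics are exactly what's missing.
for _, name, dtype, *_ in cols:
    print(f"{name} ({dtype}): meaning = ???")

# What GPT actually needs per column, and what a human has to write:
#   ord_stat_cd (TEXT): order status; 'C'=cancelled, 'S'=shipped,
#       'P'=pending; rows before 2019 use a different legacy coding.
```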

With this in mind, the title should read, “GPT tool that lets you connect to unrealistically well documented databases, and ask questions in text.”

This may be a factor in convincing a company’s CTO that they need to let us focus on documentation, but right now, governance and metadata are far from priorities for analytics teams.

8