Submitted by Exnur0 t3_zwi4jx in MachineLearning
In the wake of all the questions and worries about models that can generate content nearing (or in some cases exceeding) the quality of content made by humans, there are a couple of mechanisms companies should provide alongside their models. The two vary in feasibility, but both are pretty doable, at least for the models we've seen so far.
- A hashing-based system to check whether a given piece of content was generated by the model. This can be accomplished by hashing every output of the model and storing the hashes. If it doesn't pose some sort of security risk for the generator, the system could also report the date of generation.
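As a minimal sketch of what such a registry could look like (all names here are hypothetical, and a real deployment would need to decide how aggressively to normalize outputs before hashing so trivial edits don't defeat the lookup):

```python
import hashlib
from typing import Optional


def normalize(text: str) -> str:
    # Collapse whitespace and lowercase so trivial edits don't change the hash.
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    # SHA-256 of the normalized output; the choice of hash is an assumption.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


class GenerationRegistry:
    """Hypothetical server-side store of the hash of every generated output."""

    def __init__(self):
        self._seen = {}  # maps fingerprint -> generation date

    def record(self, output: str, date: str) -> None:
        # Called by the generator every time it produces an output.
        self._seen[fingerprint(output)] = date

    def lookup(self, content: str) -> Optional[str]:
        # Returns the generation date if this exact content was generated.
        return self._seen.get(fingerprint(content))
```

Note this only catches exact (modulo normalization) matches; any substantive edit defeats it, which is exactly why the second, model-based mechanism is also needed.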
- A model for discriminating whether a given piece of content was generated by the model, similar to this model for GPT-2. This is necessary in addition to the simpler hashing mechanism, since a piece of media may be only partially generated. It would be imperfect, of course, but if nothing else, we should press companies hard enough that they feel obligated to give it a dedicated try.
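One cheap way to stretch the hashing approach toward partially generated content (still far short of a learned discriminator, and the window size here is an arbitrary assumption) is to hash overlapping word windows at generation time and score how many of a suspect document's windows are already known:

```python
import hashlib


def window_hashes(text: str, n: int = 8) -> set:
    # Hash every overlapping n-word window of the text.
    words = text.lower().split()
    return {
        hashlib.sha256(" ".join(words[i:i + n]).encode("utf-8")).hexdigest()
        for i in range(max(len(words) - n + 1, 1))
    }


def overlap_score(document: str, known: set, n: int = 8) -> float:
    # Fraction of the document's windows that match known generated windows.
    doc = window_hashes(document, n)
    return len(doc & known) / len(doc)
```

A document that merely quotes a generated passage would still score above zero, so any threshold on this score is a policy decision, not something the sketch settles.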
These mechanisms need real support - an API for developers and a UI for less sophisticated users. They should have decent latency and hopefully be provided for free at some level of usage - though I understand the compute required could be enormous.
Curious what others think here :)
dojoteef t1_j1uy04f wrote
Very interesting idea. It could easily be applied to images since digital watermarks already exist. Not sure how feasible it is for AI generated text.
Tbh, I imagine it behooves companies to do this so they are less likely to train on media (text, images, audio, etc.) produced by a model. The more ubiquitous AI generation becomes, the bigger an issue this poses. Currently the problem is likely quite minimal and probably just injects a small bit of noise into training (and the knowledge distillation effect could even slightly improve training efficiency).
Though I guess a new data cleaning step could be to run a classification model that flags media likely to be AI generated, even if that would be less efficient than checking a hash produced at the time of generation.
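That cleaning step might look something like the sketch below, where `looks_generated` stands in for whatever trained detector is available (a hypothetical interface returning a probability in [0, 1]; nothing here is a real library API):

```python
from typing import Callable, Iterable, Iterator


def clean_corpus(
    examples: Iterable[str],
    looks_generated: Callable[[str], float],
    threshold: float = 0.9,
) -> Iterator[str]:
    # Keep only examples the detector does not flag as likely AI generated.
    # `looks_generated` is assumed to return P(AI generated) for a text.
    for text in examples:
        if looks_generated(text) < threshold:
            yield text
```

The threshold trades off losing human-written data against keeping model output in the training set; the 0.9 default is purely illustrative.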