TiredOldCrow t1_iuzp1y3 wrote on November 4, 2022 at 5:00 AM

I appreciate that the legendary "fast inverse square root" code from Quake 3 gets produced verbatim, comments and all, if you start with "float Q_rsqrt".

float Q_rsqrt( float number )
{
	long i;
	float x2, y;
	const float threehalfs = 1.5F;

	x2 = number * 0.5F;
	y  = number;
	i  = * ( long * ) &y;                       // evil floating point bit level hacking
	i  = 0x5f3759df - ( i >> 1 );               // what the fuck? 
	y  = * ( float * ) &i;
	y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//	y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

	return y;
}
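
Side note for anyone tempted to paste the above: the pointer casts break strict aliasing, and `long` is 64 bits on most 64-bit Unix-like targets, so the verbatim version is not something to rely on today. Here is a minimal sketch of the same trick using `memcpy` and `uint32_t`. It assumes `float` is a 32-bit IEEE 754 value, and the name `q_rsqrt_portable` is made up for illustration; it is not from the Quake III source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 binary32 value; the magic
   constant only makes sense for that layout. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Hypothetical name, chosen for this sketch only. */
float q_rsqrt_portable(float number)
{
    const float threehalfs = 1.5f;
    float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);            /* copy the float's bits into an integer */
    i = 0x5f3759dfu - (i >> 1);          /* same magic constant as the original */
    memcpy(&y, &i, sizeof y);            /* copy the bits back into a float */
    y = y * (threehalfs - (x2 * y * y)); /* one Newton-Raphson step */

    return y;
}
```

Calling `q_rsqrt_portable(4.0f)` should give a value close to 0.5, within a fraction of a percent after the single Newton-Raphson step. It is still only an approximation, meant for positive, finite inputs.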

I'm interested in how practical it will be for a motivated attacker to poison a code generation model with vulnerable code. I'm also curious to what extent these models produce code that only works with outdated and vulnerable dependencies -- a problem you'll also run into if you naively copy old StackOverflow posts. I've recently been working on threat models in natural language generation, but it seems like threat models in code generation are also going to be interesting.

Edit: Not John Carmack!

ClearlyCylindrical t1_iv0d7hp wrote on November 4, 2022 at 10:36 AM

The Q_rsqrt code being produced verbatim is probably due to identical copies of it appearing in many places in the training data.

dojoteef t1_iv0hfoe wrote on November 4, 2022 at 11:25 AM

Slightly off-topic: I'm a huge John Carmack fan, but he isn't the author of that code. It's just part of engine code that his company released for the game Quake 3 Arena. For details, check out:

https://www.beyond3d.com/content/articles/8/

TiredOldCrow t1_iv0s1bg wrote on November 4, 2022 at 1:02 PM

Great read, thanks for that. Updated the comment.