Submitted by evanthebouncy t3_zxef0f in MachineLearning
Foundation models can generate realistic images from prompts, but do these models understand their own drawings? Generating SVG (Scalable Vector Graphics) gives us a unique opportunity to ask this question. SVG is programmatic, consisting of primitives like circles, rectangles, and lines. Therefore, the model must schematically decompose the target object into meaningful parts, approximate each part using simple shapes, and then arrange the parts together in a meaningful way.
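To make the "decompose, approximate, arrange" idea concrete, here's a minimal sketch (my own illustration, not code from the post) that builds an SVG drawing the way the experiment expects the model to: a stick figure assembled from a circle (head), a rectangle (torso), and lines (limbs). All part names and coordinates are illustrative assumptions.

```python
def make_svg(parts, width=100, height=160):
    """Wrap a list of SVG shape elements in an <svg> document string."""
    body = "\n  ".join(parts)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n  {body}\n</svg>')

# Decompose "person" into parts, approximate each with a simple shape,
# then arrange the shapes so they connect sensibly.
stick_figure = make_svg([
    '<circle cx="50" cy="25" r="15" fill="none" stroke="black"/>',              # head
    '<rect x="40" y="40" width="20" height="50" fill="none" stroke="black"/>',  # torso
    '<line x1="40" y1="50" x2="15" y2="75" stroke="black"/>',                   # left arm
    '<line x1="60" y1="50" x2="85" y2="75" stroke="black"/>',                   # right arm
    '<line x1="45" y1="90" x2="35" y2="140" stroke="black"/>',                  # left leg
    '<line x1="55" y1="90" x2="65" y2="140" stroke="black"/>',                  # right leg
])
print(stick_figure)
```

The interesting failure mode the experiment probes is the last step: a model can emit valid shapes for each part yet place them with coordinates that don't line up into a coherent whole.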
Check out the blog (5-minute read) for the full report: https://medium.com/p/74ec9ca106b4
tl;dr:
GPT can symbolically decompose an object into parts, is okay at approximating the parts using SVG, is bad at putting the parts together, and is Egyptian.
Happy to take comments and Q&A here :D
--evan
Shir_man t1_j21rb06 wrote
I did this too some time ago. Cool experiment, and I enjoyed your results.