Submitted by WobblySilicon t3_zz0tua in MachineLearning
Mefaso t1_j29980m wrote
> I found that the text-to-video problem is being actively researched and may not require as much compute as bare language models
There are always opportunities for research with little compute. Usually this means your research has to avoid training new models, or at least avoid training from scratch.
However, text-to-video models are typically very compute-intensive.
Complete-Maximum-633 t1_j2ab7zy wrote
Anything with “video” is going to be costly.
WobblySilicon OP t1_j2d1n2x wrote
The question is how much it costs. Can it be done with one GPU, or do I need a swarm of them?
Complete-Maximum-633 t1_j2drquz wrote
Impossible to answer without more context.
WobblySilicon OP t1_j2ffcey wrote
Sure! Sir!
In the coming months I will be working on the text-to-video problem. After a literature review, I got the impression that it might be compute-intensive, as in requiring a cluster of GPUs to train the models. So I asked whether it could be done with a mediocre GPU such as a 3080. I haven't really thought about which models I would use or the general architecture yet. I just wanted an answer, because I don't wish to take up this topic and then get stuck due to compute issues.
[deleted] t1_j2a02a8 wrote
[deleted]
WobblySilicon OP t1_j2a0oop wrote
I do have access to an A6000 for a few days. Other resources (with less memory) are available through the university as well. By compute-expensive I mean whole clusters of GPUs...
I have difficulty wrapping my head around the text-to-video problem (particularly the newer models with many smaller components). Are there any suggestions or resources for getting acquainted with this new task? I have read recent research papers, but it seems hard to find an area where improvement could be made by technical customization of base models. Do you have any tips on this?
Finally, if I can't work on text-to-video, then my other option would be deepfake detection. Can you comment on the merits or demerits of choosing this topic for my study? Both topics are very new to me. I have exposure to intermediate vision-based problems and feel confident enough to try these out. Right now it just feels like I am out of ideas for any tinkering with the base models.