SaltyStackSmasher OP t1_jal0r2l wrote on March 2, 2023 at 4:25 AM

Reply to comment by CMUOresama in [D] backprop through beam sampling ? by SaltyStackSmasher

Thanks a lot for this. I'll definitely take a look.

SaltyStackSmasher OP t1_jagisv7 wrote on March 1, 2023 at 7:06 AM

Reply to comment by cnapun in [D] backprop through beam sampling ? by SaltyStackSmasher

Thanks for the response. My main concern with beam sampling and backprop is that the context for the 2nd token includes the 1st token. I believe this wouldn't necessarily matter in the RNN case, since only the hidden state is propagated forward. In transformers, we have to redo the forward pass over the whole prefix for the 2nd token onwards, and these subsequent forward passes don't share any state with each other, so I'm a bit confused about how exactly the gradients would flow.
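To make the question concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not from the thread: `toy_logits` is a hypothetical stand-in for a transformer that re-reads the whole prefix on every call, `theta` is a single made-up shared parameter, greedy selection (beam width 1, no sampling) is swapped in for beam sampling to keep things deterministic, and finite differences stand in for backprop. Under those assumptions it suggests that the chosen tokens behave as constants (a small nudge to `theta` does not change them), while each per-position forward pass still contributes its own gradient through the log-probability it assigns, and those contributions add up over the shared parameter.

```python
import math

VOCAB = 3  # hypothetical toy vocabulary size


def toy_logits(prefix, theta):
    """Hypothetical stand-in for a transformer: re-reads the whole prefix each call."""
    context = sum(prefix)
    return [theta * (v + 1) + 0.1 * context * v for v in range(VOCAB)]


def log_softmax(xs):
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def decode(theta, steps=3):
    """Greedy decoding (beam width 1, no sampling) to keep the sketch deterministic."""
    prefix = []
    for _ in range(steps):
        log_probs = log_softmax(toy_logits(prefix, theta))
        # Discrete choice: the selected index is piecewise constant in theta.
        prefix.append(max(range(VOCAB), key=lambda v: log_probs[v]))
    return prefix


def step_log_prob(tokens, t, theta):
    """Log-prob of tokens[t] given the fixed prefix tokens[:t] (a separate forward pass)."""
    return log_softmax(toy_logits(tokens[:t], theta))[tokens[t]]


def finite_diff(f, theta, eps=1e-5):
    """Central finite difference, used here in place of backprop."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


theta = 1.0
tokens = decode(theta)  # treated as constants from here on

# Gradient of each position's log-prob w.r.t. the shared parameter.
per_step_grads = [
    finite_diff(lambda th, t=t: step_log_prob(tokens, t, th), theta)
    for t in range(len(tokens))
]
# Gradient of the whole sequence's log-prob.
total_grad = finite_diff(
    lambda th: sum(step_log_prob(tokens, t, th) for t in range(len(tokens))),
    theta,
)

print("tokens:", tokens)
print("tokens unchanged under a small nudge:", decode(theta + 1e-5) == tokens)
print("per-step gradients:", per_step_grads)
print("total gradient:", total_grad)
```

In this toy setup the separate forward passes are linked only through the shared `theta`, so the total gradient is the sum of the per-step ones. Whether that is the right way to train through beam sampling in a real model is a separate question.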

Please let me know if I wasn't clear in explaining my problem. Thanks again for your response :)