
skelly0311 t1_is8d9xa wrote

ELECTRA, a transformer variant of BERT, replaces masked language modeling in pre-training with "replaced token detection": a small generator fills in masked positions and ELECTRA learns to spot which tokens were replaced. The setup resembles a GAN, though the generator is actually trained with maximum likelihood rather than adversarially. This gets rid of the [MASK]-token pretrain/finetune discrepancy found in transformers such as BERT and RoBERTa.

https://arxiv.org/abs/2003.10555
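The data flow can be sketched in a few lines. This is a toy illustration (random sampling stands in for the small generator MLM, and the vocabulary/sentence are made up), not the paper's implementation:

```python
import random

random.seed(0)

# Hypothetical toy vocabulary and sentence.
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# 1. Mask a subset of positions (ELECTRA masks ~15%).
masked_pos = {2}

# 2. A small "generator" proposes plausible fill-ins for the masked slots.
#    Here we just sample from the vocab; in ELECTRA it's a small masked LM.
corrupted = [
    random.choice(vocab) if i in masked_pos else tok
    for i, tok in enumerate(tokens)
]

# 3. The "discriminator" (ELECTRA itself) never sees a [MASK] token; it
#    labels every position as original vs. replaced.
labels = [
    "replaced" if corrupted[i] != tokens[i] else "original"
    for i in range(len(tokens))
]

print(list(zip(corrupted, labels)))
```

Because the corrupted input contains only real tokens, there is no [MASK] symbol at pre-training time that would be absent at fine-tuning time.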
