https://ai.googleblog.com/2020/01/reformer-efficient-transformer.html

Google has introduced Reformer, a Transformer model designed to handle context windows of up to 1 million words on a single accelerator using only 16 GB of memory.
It combines two key techniques to address the attention and memory-allocation problems that limit the Transformer's application to long context windows: Reformer uses locality-sensitive hashing (LSH) to reduce the complexity of attending over long sequences, and reversible residual layers to use the available memory more efficiently.
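To make the two ideas concrete, here is a minimal NumPy sketch, not the actual Reformer/Trax implementation: `lsh_buckets` assigns sequence positions to hash buckets via random rotations so attention only needs to run within a bucket, and `ReversibleBlock` shows a residual layer whose inputs can be recomputed from its outputs instead of being stored. The names `lsh_buckets`, `ReversibleBlock`, `f`, and `g` are illustrative assumptions.

```python
import numpy as np

def lsh_buckets(vectors, n_buckets, seed=0):
    """Angular LSH: nearby vectors (high cosine similarity) tend to share a bucket,
    so attention can be restricted to positions in the same bucket."""
    rng = np.random.default_rng(seed)
    d = vectors.shape[-1]
    # Project onto n_buckets/2 random directions; sign patterns define the bucket.
    rotations = rng.normal(size=(d, n_buckets // 2))
    rotated = vectors @ rotations                      # (seq_len, n_buckets/2)
    # Concatenate with the negation so argmax picks one of n_buckets directions.
    rotated = np.concatenate([rotated, -rotated], axis=-1)
    return np.argmax(rotated, axis=-1)                 # bucket id per position


class ReversibleBlock:
    """Reversible residual layer: activations are recoverable from the outputs,
    so they need not be kept in memory during the forward pass."""

    def __init__(self, f, g):
        self.f, self.g = f, g                          # e.g. attention and feed-forward

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Recompute the inputs exactly, reversing the forward pass.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2
```

In this sketch, calling `inverse` on the outputs of `forward` returns the original `(x1, x2)`, which is what lets a reversible network recompute activations during backpropagation instead of storing one copy per layer.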