We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e., on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model, UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny while being 15x smaller (1.2 MB) in size.
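To make the core idea concrete, the sketch below shows one way an input embedding table can be replaced by an on-the-fly computation: instead of storing a full vocab_size x hidden_dim lookup table, each token's embedding is derived from a much smaller shared parameter bank. This is a minimal illustrative sketch assuming a hashed-bucket scheme; the class name, hash constants, and hyperparameters are our own assumptions, and the actual EELBERT embedding computation may differ.

```python
import torch
import torch.nn as nn

class OnTheFlyEmbedding(nn.Module):
    """Illustrative dynamic embedding layer (an assumption, not the
    exact EELBERT computation): embeddings are computed on the fly
    from a small bucket bank instead of a full lookup table."""

    def __init__(self, num_buckets: int = 4096, hidden_dim: int = 128,
                 num_hashes: int = 4):
        super().__init__()
        # Small shared bank replaces the full vocab_size x hidden_dim
        # table; num_buckets << vocab_size, so parameters shrink.
        self.bank = nn.Parameter(torch.randn(num_buckets, hidden_dim) * 0.02)
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # Map each token id through several cheap hash functions and
        # average the selected bucket vectors (constants are arbitrary).
        embs = torch.zeros(*input_ids.shape, self.bank.shape[1],
                           device=input_ids.device)
        for seed in range(self.num_hashes):
            buckets = (input_ids * 2654435761 + seed * 40503) % self.num_buckets
            embs = embs + self.bank[buckets]
        return embs / self.num_hashes

# Hypothetical usage: swap the layer into a BERT-style model, e.g.
#   model.embeddings.word_embeddings = OnTheFlyEmbedding(4096, 128)
```

The size saving comes from the bank being orders of magnitude smaller than a 30k-token embedding table, at the cost of computing (rather than looking up) each embedding at run time.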