Sentence generation

This subproject contains several utilities for sentence generation.

How it works

Transformer models fundamentally operate on batched sequences. Because BLOOM-like models use relative position embeddings, it is possible to use a fixed-size generation buffer with a queue-like FIFO context update: the past key-value pairs and attention masks are shifted from right to left so that new tokens can always be generated within a fixed-length context. We first initialize the context tensors with a fixed max_length and then overwrite them with the newly requested input prompts. Most of the generation time is spent in the recursive (auto-regressive) token generation loop. Our framework simply predicts the next token and pushes it onto the right end of the context. Because all of the required tensor space is allocated up front, no additional memory is consumed during generation.
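The sketch below illustrates the idea of the fixed-length FIFO context update under stated assumptions; it is not the project's actual code, and the buffer names, shapes, and the push_step helper are hypothetical.

```python
# Minimal sketch of a fixed-length, FIFO-style context update for a decoder
# with relative position embeddings (e.g. ALiBi in BLOOM-like models).
# All tensor names and shapes here are illustrative assumptions.
import torch

MAX_LENGTH = 8          # fixed generation buffer length (assumed)
NUM_HEADS, HEAD_DIM = 2, 4

# Pre-allocate the context buffers once; decoding never allocates more memory.
past_keys = torch.zeros(1, NUM_HEADS, MAX_LENGTH, HEAD_DIM)
past_values = torch.zeros(1, NUM_HEADS, MAX_LENGTH, HEAD_DIM)
attention_mask = torch.zeros(1, MAX_LENGTH, dtype=torch.long)

def push_step(new_key, new_value):
    """Shift the cached context one slot to the left and write the newest
    key/value pair (and its mask bit) into the right-most slot."""
    past_keys[:, :, :-1] = past_keys[:, :, 1:].clone()
    past_values[:, :, :-1] = past_values[:, :, 1:].clone()
    attention_mask[:, :-1] = attention_mask[:, 1:].clone()

    past_keys[:, :, -1] = new_key
    past_values[:, :, -1] = new_value
    attention_mask[:, -1] = 1

# One decoding step would compute the new key/value from the model, then:
push_step(torch.randn(1, NUM_HEADS, HEAD_DIM),
          torch.randn(1, NUM_HEADS, HEAD_DIM))
```

Because the buffers keep the same shape on every step, this works only when the attention mechanism does not depend on absolute positions, which is why the FIFO trick is tied to relative position embeddings.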

Experimental project

This subproject is intended for experiments; as described above, the code can be applied to our serving service.