Adaptive Draft-Verification for Efficient Large Language Model Decoding
We introduce an LLM decoding acceleration method that requires no fine-tuning. Our approach uses an adaptive draft-verification process that evolves over time to improve efficiency: a tri-gram matrix-based representation of the LLM dynamically approximates its output distribution, allowing the draft model to adjust to changing token probabilities during decoding.
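The sketch below illustrates one way such an adaptive draft-verification loop could work; it is a minimal illustration, not the paper's exact algorithm. The `llm_next_token_probs` function, the greedy tri-gram drafting, and the simple acceptance rule are all assumptions introduced for this example.

```python
# Minimal sketch of adaptive draft-verification decoding with a tri-gram draft model.
# Assumption: `llm_next_token_probs(tokens)` returns the target LLM's next-token
# distribution as a dict {token: probability}; it stands in for a real LLM call.
from collections import defaultdict


class TrigramDraftModel:
    """Draft model: next-token counts conditioned on the last two tokens."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, tokens):
        # Adapt to the evolving output distribution as tokens are verified.
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            self.counts[(a, b)][c] += 1

    def draft(self, context, k):
        # Greedily propose up to k tokens from the tri-gram table.
        proposal, ctx = [], tuple(context[-2:])
        for _ in range(k):
            nxt = self.counts.get(ctx)
            if not nxt:
                break
            token = max(nxt, key=nxt.get)
            proposal.append(token)
            ctx = (ctx[1], token)
        return proposal


def decode(llm_next_token_probs, prompt_tokens, max_new_tokens=64, k=4):
    draft_model = TrigramDraftModel()
    tokens = list(prompt_tokens)
    draft_model.update(tokens)

    while len(tokens) - len(prompt_tokens) < max_new_tokens:
        proposal = draft_model.draft(tokens, k)
        accepted = []
        for token in proposal:
            # Verify each drafted token against the LLM's own distribution.
            probs = llm_next_token_probs(tokens + accepted)
            # Illustrative acceptance rule (an assumption, not the paper's criterion).
            if probs.get(token, 0.0) >= 0.5 * max(probs.values()):
                accepted.append(token)
            else:
                break
        if not accepted:
            # Fall back to one ordinary LLM step when no draft token is accepted.
            probs = llm_next_token_probs(tokens)
            accepted = [max(probs, key=probs.get)]
        tokens.extend(accepted)
        # Update the tri-gram table with newly verified tokens so the draft adapts.
        draft_model.update(tokens[-(len(accepted) + 2):])
    return tokens
```

Because the tri-gram table is updated from verified output, the draft model tracks the LLM's current behavior without any fine-tuning, which is the core of the adaptive draft-verification idea described above.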