Show HN: Tiny-vLLM – High-performance LLM inference engine in C++ and CUDA
Tiny-vLLM is an open-source, high-performance inference engine for large language models (LLMs), implemented in C++ and CUDA with a focus on inference efficiency.
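The post does not describe Tiny-vLLM's internals. The vLLM project it is named after is best known for PagedAttention, which stores the KV cache in fixed-size blocks that are allocated on demand rather than reserved per sequence up front. The C++ sketch below shows only that block bookkeeping, under stated assumptions. The class `PagedKVCacheManager`, its methods, and the flat slot layout are hypothetical and are not taken from Tiny-vLLM. The CUDA attention kernels that would consume the slots are omitted.

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Hypothetical sketch of a paged KV-cache block manager (not Tiny-vLLM's code).
// The KV cache is split into fixed-size physical blocks; each sequence owns a
// "block table" that maps its logical blocks to physical block ids.
class PagedKVCacheManager {
 public:
  PagedKVCacheManager(int num_blocks, int block_size) : block_size_(block_size) {
    if (num_blocks <= 0 || block_size <= 0) {
      throw std::invalid_argument("num_blocks and block_size must be positive");
    }
    // Push in reverse so that pop_back() hands out blocks 0, 1, 2, ...
    for (int b = num_blocks - 1; b >= 0; --b) free_blocks_.push_back(b);
  }

  // Reserve a KV slot for one new token of `seq_id`.
  // Returns false if a new block is needed but none are free.
  bool append_token(int seq_id) {
    Seq& s = seqs_[seq_id];
    if (s.num_tokens % block_size_ == 0) {  // no block yet, or last block is full
      if (free_blocks_.empty()) return false;
      s.block_table.push_back(free_blocks_.back());
      free_blocks_.pop_back();
    }
    ++s.num_tokens;
    return true;
  }

  // Flat index into the KV buffer for token `pos` of `seq_id`.
  int slot(int seq_id, int pos) const {
    const Seq& s = seqs_.at(seq_id);
    if (pos < 0 || pos >= s.num_tokens) throw std::out_of_range("bad token position");
    return s.block_table[pos / block_size_] * block_size_ + pos % block_size_;
  }

  // Return all blocks of a finished sequence to the free list.
  void free_sequence(int seq_id) {
    auto it = seqs_.find(seq_id);
    if (it == seqs_.end()) return;
    for (int b : it->second.block_table) free_blocks_.push_back(b);
    seqs_.erase(it);
  }

  std::size_t num_free_blocks() const { return free_blocks_.size(); }

 private:
  struct Seq {
    std::vector<int> block_table;  // logical block i -> physical block id
    int num_tokens = 0;
  };
  int block_size_;
  std::vector<int> free_blocks_;
  std::unordered_map<int, Seq> seqs_;
};
```

In this sketch, an attention kernel would read the key and value for token `pos` at `slot(seq_id, pos)`. Because memory is claimed one block at a time and returned when a sequence finishes, each sequence wastes at most one partially filled block.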