
a16z and Lightspeed back vLLM team’s $150M AI inference startup

Inferact, an AI startup founded by a group of AI researchers and open-source software developers, has raised $150 million in a seed funding round, valuing the company at $800 million.

The round was led by Andreessen Horowitz and Lightspeed, with participation from Databricks’ venture arm, the UC Berkeley Chancellor’s Fund, and other backers.

Founded by the core team behind vLLM

Inferact is built by the core team behind vLLM – Simon Mo, Woosuk Kwon, Kaichao You, and Roger Wang.

vLLM, which stands for virtual large language model, is an open-source library for LLM inference and serving, maintained by the vLLM community.

vLLM focuses on inference, the stage where trained AI models generate responses in real-world applications. As AI models become more reliable and capable, inference is emerging as the new bottleneck. Applications now require models to run longer, handle more tokens, and serve thousands of users simultaneously. That puts pressure on memory, hardware, and performance.

vLLM addresses this by optimising how models use memory and compute.

One of its core features, PagedAttention, reduces memory waste by storing key-value cache data in fixed-size blocks of GPU memory rather than one contiguous allocation, much as an operating system pages virtual memory. The software also supports techniques such as quantisation, which shrinks model size, and speculative decoding, which lets a model generate multiple tokens per step, speeding up response times.
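To make this concrete, here is a minimal sketch of vLLM’s offline Python API; the model name, prompts, and sampling settings are illustrative placeholders, not details from the article:

```python
# A minimal sketch of vLLM's offline inference API. The model name,
# prompts, and sampling settings below are illustrative placeholders.
from vllm import LLM, SamplingParams

# The engine allocates the key-value cache in paged blocks internally
# (PagedAttention); callers only submit prompts and read completions.
# A quantised checkpoint can be loaded by additionally passing e.g.
# quantization="awq" to the LLM constructor.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

sampling = SamplingParams(temperature=0.7, max_tokens=128)

prompts = [
    "Explain what inference means for a language model.",
    "Why does the key-value cache use so much memory?",
]

# All prompts are served in one continuously batched pass.
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```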

vLLM has more than 2,000 contributors and over 50 core developers, and it is used by large companies like Meta and Google.

Making inference cheaper and faster

“Our mission is to grow vLLM as the world’s AI inference engine and accelerate AI progress by making inference cheaper and faster,” the company says in a blog post.

Inferact has two main goals. First, it aims to support the vLLM project by providing financial and developer resources to help it grow, especially as new model architectures, hardware, and larger models emerge.

Second, Inferact plans to develop a next-generation commercial inference engine, refining what it calls the “universal inference layer”, while collaborating with existing providers rather than competing with them.

In a blog post, co-founder Woosuk Kwon said the goal is to make AI serving simple, so teams no longer need large infrastructure groups to deploy models at scale.

The company is expected to offer a serverless version of vLLM and add features such as observability, troubleshooting, and disaster recovery, likely running on Kubernetes.
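For context on what simple serving looks like today, here is a sketch of querying vLLM’s existing OpenAI-compatible server with the standard openai client; the endpoint, key, and model name are placeholders, and this illustrates the current open-source project rather than Inferact’s planned offering:

```python
# Sketch: talking to a vLLM OpenAI-compatible server (for example one
# started with `vllm serve <model>`, which listens on port 8000 by
# default). Endpoint, key, and model name are placeholders; this shows
# today's open-source vLLM, not Inferact's planned serverless product.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Summarise what vLLM does."}],
)
print(response.choices[0].message.content)
```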

At the same time, Inferact says it will keep improving the open-source vLLM project. That includes adding support for new model architectures, more hardware platforms, and larger, multi-node deployments.
