OctoAI wants to make private AI model deployments easier with OctoStack

OctoAI (formerly OctoML) today announced the launch of OctoStack, its new end-to-end solution for deploying generative AI models in an enterprise’s private cloud, whether on-premises or in a virtual private cloud from a leading provider, including AWS, Google Cloud and Microsoft Azure, as well as CoreWeave, Lambda Labs, Snowflake and others.

In its early days, OctoAI focused almost exclusively on optimizing models to run more efficiently. Building on the Apache TVM machine learning compiler framework, the company later launched its TVM-as-a-Service platform and, over time, expanded it into a full model-serving offering that combined its optimization tools with a DevOps platform. With the rise of generative AI, the team then launched the fully managed OctoAI platform to help its users manage and fine-tune existing models. OctoStack, at its core, is this OctoAI platform, but for private deployments.


Today, Luis Ceze, CEO and co-founder of OctoAI, told me that the company has more than 25,000 developers on the platform and hundreds of paying customers in production. Many of these companies, Ceze said, are GenAI-native companies. The market for traditional businesses looking to adopt generative AI is significantly larger, however, so it’s perhaps no surprise that OctoAI is now going after them too with OctoStack.

“One thing that has become clear is that, as the enterprise market moves from last year’s experimentation to deployment, first of all, they are looking around because they are nervous about sending data through an API,” Ceze said. “Second: many of them have also committed their own compute budget, so why am I going to buy an API when I already have my own compute? And third, no matter what certifications you get and how big your name is, they feel like their AI is as valuable as their data, and they don’t want to send it out. So there is a very clear need within the enterprise to control the deployment.”

Ceze noted that the team has been building the architecture to offer both its SaaS and hosted platform for some time. And while the SaaS platform is optimized for Nvidia hardware, OctoStack can support a much wider range of hardware, including AMD GPUs and AWS’s Inferentia accelerator, which makes the optimization challenge harder but also plays to OctoAI’s strengths.

Deploying OctoStack should be simple for most businesses, as OctoAI delivers the platform as ready-to-use containers with their associated Helm charts. For developers, the API remains the same whether they are targeting the SaaS product or OctoAI in their private cloud.
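In practice, that API parity means application code only needs to swap a base URL to move between the hosted service and a private OctoStack install. The sketch below illustrates the idea; the endpoint path, hostnames and payload fields are assumptions for illustration, not OctoAI’s documented API.

```python
import json
from urllib.parse import urljoin

def build_completion_request(base_url: str, model: str, prompt: str) -> dict:
    """Build the same chat-completion request against either backend;
    only the base URL differs between SaaS and a private deployment."""
    return {
        "url": urljoin(base_url, "/v1/chat/completions"),
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

# Same call shape, different deployment target (hostnames are hypothetical):
saas = build_completion_request("https://saas.example.com",
                                "mixtral-8x7b", "Hello")
private = build_completion_request("https://octostack.internal.example.com",
                                   "mixtral-8x7b", "Hello")
```

Because only the URL changes, code written against the hosted platform can, in principle, be pointed at an in-VPC deployment without other modifications.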

The canonical enterprise use case remains using text summarization and RAG to allow users to chat with their internal documents, but some enterprises are also fine-tuning these models on their internal codebases to run their own code-generation models (similar to what GitHub now offers Copilot Enterprise users).
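The RAG pattern mentioned above boils down to retrieving the most relevant internal document for a question and prepending it to the model prompt. Here is a deliberately minimal sketch of that flow; the word-overlap scoring stands in for the embedding-based retrieval a production system would use, and none of this reflects OctoAI’s implementation.

```python
from collections import Counter

def score(query: str, doc: str) -> int:
    """Crude relevance score: count of words shared between query and doc."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def retrieve(query: str, docs: list[str]) -> str:
    """Return the single most relevant document for the query."""
    return max(docs, key=lambda d: score(query, d))

def build_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved context into the prompt sent to the model."""
    context = retrieve(query, docs)
    return f"Answer using this document:\n{context}\n\nQuestion: {query}"

# Toy internal "knowledge base":
docs = [
    "Expense reports are due on the last Friday of each month.",
    "The VPN requires multi-factor authentication for all employees.",
]
prompt = build_prompt("When are expense reports due?", docs)
```

The point of the pattern is that the model never needs to be trained on the documents: relevant context is fetched at query time, which is exactly why keeping both retrieval and inference inside a private environment matters to these enterprises.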

For many companies, being able to do this in a secure environment and strictly under their control is what now allows them to put these technologies into production for their employees and customers.

“For our performance- and security-sensitive use case, it is imperative that the models that process call data run in an environment that provides flexibility, scalability and security,” said Joshua Kennedy-White, CRO at Apate AI. “OctoStack allows us to easily and efficiently run the custom models we need, in the environments we choose, and deliver the scale our customers need.”

