Hallucination-free AI is easy, but it ain’t cheap!

Introduction

In our recent announcement (Safety RAG: Improving AI Safety by Extending AI’s Data Reach) on the Stardog main blog, Evren Sirin and I explained how Stardog Voicebox is the fast, safe, 100% hallucination-free AI data assistant. Apparently, if you engineer an AI system that uses LLMs but is free of hallucinations, people want to know how you did it. Fair enough.

Our customers didn’t ask for this explanation since they’re too busy enjoying its benefits by getting new insights. Nope, it was researchers and other practitioners, largely on LinkedIn—which maybe says something but what?—who were agitated about our claims.

We intentionally set out to build a hallucination-free system and that’s what Voicebox is. We achieve the only enterprise-acceptable level of AI safety, in the same way any engineering and product org achieves anything at all. We intended it.

In particular, hallucination-free GenAI systems are a function of holistic system design, not of a particular LLM or prompt strategy or quantization method. You can’t bolt AI safety on at the end. The 1.0 release is too late.

But that all raises the question: what did hallucination-free GenAI cost us?

User Focus

The first cost of building a hallucination-free system is understanding who your user is and what they need from an AI system. If my user needs a virtual life coach, then a skosh of AI hallucination may well distinguish emotionally flat responses from those that resonate.

Stardog Voicebox users are knowledge workers, mostly but not exclusively in regulated industries, who attack high-stakes use cases that are data-bound. When we say hallucinations are bad for business, we mean they’re bad for Stardog’s ideal customer. We know this because we’ve listened to and have served these users from day one.

Intentional System Design

The second cost was to engage in intentional system design, not only of Voicebox itself in, say, the UX, but also in every aspect of the system, including—

the initial focus on knowledge graph question answering as the primary service in Voicebox; we focused the 1.0 release on end-user value delivery from nearly day one. We’re bringing other services to Voicebox including—

data modeling
data mapping for federated graph streams
business rules

the avoidance of RAG since RAG trusts LLMs to tell people facts about their org, and that’s not safe!

the choice of LLMs: we sidestepped the OpenAI trap, not because OpenAI isn’t amazing. It’s amazing! But we knew that our SOC 2 compliance hurdles in regulated industries would make it very hard to send our customers’ data to some other org for processing god-knows-where.

That’s not to say that Voicebox isn’t interoperable with OpenAI; it is. But there are limitations to using an LLM service as opposed to running an LLM “locally”, that is, where we can inspect and manipulate the logits and residual stream, fine-tune on customer data, serve thousands of LoRA adapters in a multi-tenant architecture, etc.

the design of our underlying platform and Voicebox-specific APIs—a public preview of which is coming soon, so you can build amazing Voicebox-like apps using our platform—to make sure that hallucinations didn’t creep in by some back door.

the choice to go further than anyone else in the knowledge graph space has gone by building Stardog Karaoke, a turnkey on-premises appliance to bring Stardog Voicebox to the hybrid data cloud of our regulated customers.

finally, the implementation, which is nearly complete, of a private GPU compute facility physically adjacent to AWS us-east-1 to deliver world-class UX to Voicebox users by processing their LLM traffic with our GPU facility—we will be talking publicly about this soon, too.

Pride

We went through a bunch of hard moments when we realized that our expertise (knowledge graph, neuro-symbolic AI, graph database systems) was adjacent to LLMs but also pretty different; this meant hiring some new engineers (looking at you, Tapan & Himanshu!) onto the team and trusting them to build critical systems. ✅

We also had to acknowledge that knowledge graph aims over the years have been validated by the emergence of LLMs, but pre-GNN techniques have been completely repudiated. That stung a bit, but no matter; if you’re focused on user value, algorithms can come and go.

Time to Market

Every time we saw another startup announce their 1.0 launch, invariably based on RAG, we had a twinge of…something. Let’s call it envy. We wanted to go fast and ship now. But taking the longer road to deliver a fast, accurate AI data assistant that’s 100% hallucination-free took time, not least since we weren’t in a position to fork some Py RAG lib on GitHub and go.

In the end it took about 13 months to launch Voicebox 1.0 and while that’s a long time, it’s a testament to Stardog Eng that it was only 13 months.

Was the Juice Worth the Squeeze?

Absolutely. The first time I heard a prospect say “what, this means we can trust what GenAI is telling us as we work on regulatory matters”, I knew the cost we’d paid to build Stardog Voicebox was worth it.

Hallucination-free AI is easy, but it ain’t cheap!