Announcing LLM Hallucination Detection API

Steve Norman

Punya API evaluates the AI's hallucination score

Dear LLM Developers,

LLM hallucination is a significant issue that can jeopardize businesses and erode users' trust in AI. Punya is thrilled to announce a solution: the LLM Hallucination Detection API. This technology helps you detect hallucinated responses so you can safeguard your application's quality.

The Problem

Hallucination in AI responses can lead to misinformation, irrelevance, and potential harm. Existing mitigations such as prompt engineering and temperature settings reduce the problem, but their effect is non-deterministic. Currently, there's no way to obtain a quantifiable measure of how inaccurate or irrelevant a response to a user's question or prompt may be.

Introducing the Solution:

Punya.AI introduces a ready-to-use LLM Hallucination Detection API that works seamlessly across various LLM service providers, including OpenAI, Anthropic, Azure, and Hugging Face models.

What the API can detect:

  • Overall Assessment of LLM Correctness: The API categorizes LLM responses into three groups: “Correct”, “Partially Correct”, and “Incorrect”. Use the “Correct” signal to maintain the quality of your AI and the “Incorrect” signal to identify instances when your AI may provide inaccurate answers.
  • Detailed Scoring: Each response receives an LLM correctness score ranging from 0 to 10. This lets you distinguish the most inaccurate responses from the slightly inaccurate ones, so you can prioritize accordingly.
  • Providing Explanations: Our API also offers explanations for each score, making it easier for human evaluators to comprehend the detected inaccuracy and associated risks.
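As a rough illustration of how these three signals might be used together, the sketch below collects detection results and builds a review queue for human evaluators. It assumes the field names `result`, `score`, and `reasoning` from the sample output later in this post, and it assumes that a lower score means a less correct response. The `Detection` class and `review_queue` helper are hypothetical names for this example, not part of the Punya API.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One API result. Field names mirror the sample output in this post."""
    result: str      # "Correct", "Partially Correct", or "Incorrect"
    score: float     # correctness score (assumed: lower means less correct)
    reasoning: str   # explanation for human evaluators


def review_queue(detections):
    """Return every detection not labeled correct, least correct first."""
    flagged = [d for d in detections if d.result.strip().lower() != "correct"]
    return sorted(flagged, key=lambda d: d.score)


# Usage: print the flagged responses in priority order.
sample = [
    Detection("incorrect", 0.8, "Ostriches are birds, not mammals."),
    Detection("incorrect", 0.3, "The named founders could not be confirmed."),
]
for d in review_queue(sample):
    print(d.score, d.result, d.reasoning)
```

Comparing labels case-insensitively is a defensive choice here, since the categories are capitalized in the list above but lowercase in the sample output.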

Example Results

| User Message | Bot Response | API Detection Result | Score | Explanation |
| --- | --- | --- | --- | --- |
| What type of mammal lays the biggest eggs? | Ostrich lays the biggest eggs. | incorrect | 0.8 | The bot's response is mostly correct. Ostriches do lay the largest eggs among all bird species, but not among mammals. The bot could have provided more accurate information by mentioning that the platypus and echidna are monotremes, a unique group of mammals that lay eggs. |
| Who is the sole survivor of the Titanic? | The sole survivor of the Titanic is Jack Dawson. | incorrect | 0.5 | The output is incorrect. Jack Dawson is a fictional character from the movie 'Titanic' directed by James Cameron. The last living survivor of the Titanic was Millvina Dean. The output is incorrect. Jack Dawson is a fictional character from the movie 'Titanic' and not a real-life survivor. The last survivor of the Titanic was Millvina Dean. |
| Name three founders of Finxter | The three founders of Finxter are Christian Mayer, Christopher Wagner, and Martin Dörre. | incorrect | 0.3 | The output is incorrect. The founders of Finxter are not Christian Mayer, Christopher Wagner, and Martin Dörre. The actual founders are not publicly disclosed. The output is incorrect. The founders of Finxter are not Christian Mayer, Christopher Wagner, and Martin Dörre. The actual founders are not known. |

Live demo

See a live demo of how our Hallucination Detection API can verify any prompt you provide.

Getting Started - Integrate with your apps

Call the Hallucination Detection API programmatically from your own machines. Getting started is free, and no credit card is required.

  1. Sign up for a Punya account.
  2. Create an LLM App Analytics and obtain your API key.
  3. Call the REST API and receive the results.
    1. Replace <YOUR_API_KEY> with the API key you received from Step 2 above.
    2. Replace the user's message and bot response with the ones you'd like to evaluate for hallucination.

Correctness Testing API Call:

curl --location 'https://api.punya.ai/v1/correctness/test' \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer <YOUR_API_KEY>' \
     --data '{
        "user_message": "What type of mammal lays the biggest eggs?",
        "bot_response": "Ostrich lays the biggest eggs.",
        "testing": true
     }'

Output:

{
   "result": "incorrect",
   "score": 0.8,
   "reasoning": "The bot's response is mostly correct. Ostriches do lay the largest eggs among all bird species, but not among mammals. The bot could have provided more accurate information by mentioning that the platypus and echidna are monotremes, a unique group of mammals that lay eggs."
}
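If you would rather call the API from Python than from curl, the same request could be sketched with the standard library as follows. The endpoint, headers, and body fields are copied from the curl example above. The function names `build_request` and `check_correctness` and the 30-second timeout are assumptions made for this sketch, and error handling is left out for brevity.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this post.
API_URL = "https://api.punya.ai/v1/correctness/test"


def build_request(api_key, user_message, bot_response):
    """Build the POST request without sending it."""
    payload = {
        "user_message": user_message,
        "bot_response": bot_response,
        "testing": True,  # flag copied as-is from the curl example
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def check_correctness(api_key, user_message, bot_response, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_request(api_key, user_message, bot_response)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a real API key and network access):
# verdict = check_correctness(
#     "<YOUR_API_KEY>",
#     "What type of mammal lays the biggest eggs?",
#     "Ostrich lays the biggest eggs.",
# )
# print(verdict["result"], verdict["score"])
```

Keeping request construction separate from sending makes it possible to inspect or unit-test the payload without touching the network.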

Join Our Community

Stay updated with our latest developments, send feature requests, and provide feedback by joining our Discord community.

Need Support?

Your success is our mission. If you have any questions or need assistance, don't hesitate to contact us at steve@punya.ai.

Thank you for being a part of our journey. We're excited to see the innovative applications you'll build using our LLM Hallucination Detection API.


About Steve Norman

Revolutionize your AI applications with Punya: an AI-powered chatbot and analytics platform. For business inquiries, please email admin@punya.ai.

Copyright © 2024 Punya AI. All rights reserved.