Sent over HTTPS to this app’s /api/verify only — not stored on the server. Keys are not saved unless you click Save for future (stored in this browser’s localStorage). For CI/workflows use headers X-LLM-API-Key + X-LLM-Provider.
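A minimal sketch of that CI/workflow path follows. Only the two header names are documented above; the deployment URL and JSON body field are placeholders, so check functions/api/verify.js for the exact request schema.

import os
import requests

# Hedged example: send text to /api/verify with per-request provider headers.
# The body shape ({"text": ...}) is an assumption, not taken from the repo.
resp = requests.post(
    "https://your-deployment.pages.dev/api/verify",   # replace with your URL
    headers={
        "X-LLM-API-Key": os.environ["LLM_API_KEY"],    # documented header
        "X-LLM-Provider": "anthropic",                 # documented header
    },
    json={"text": "The Eiffel Tower was completed in 1889."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # verdict payload; exact fields depend on functions/api/verify.js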
Multi-source RAG
Cross-check against multiple knowledge bases
Chain-of-thought verify
Step-by-step logical consistency
Adversarial probe
Counter-evidence stress test
Confidence calibration
Uncertainty per sub-claim
Multi-source RAG
Wikipedia, web, arXiv
Citation verifier v2
CrossRef, courts, PubMed
Code safety v2
AST, URLs, packages
Medical guard v2
Evidence & disclaimers
Policy grounding v2
Vs. source documents
Calibration
Per-claim uncertainty
Medical
Diagnoses, dosing
Legal
Cases, citations
Technical
Code, packages
Policy
T&Cs, internal docs
Historical
Events, facts
Predictive
Forecasts
Choose a mode and run verification
How this demo runs: /api/verify calls your chosen LLM using server-side keys only (never exposed to the browser).
Priority for Auto: Anthropic → Gemini → OpenAI → Groq. Configure at least one key (Cloudflare Secrets or .dev.vars).
file:// cannot load this page’s API — use npm run dev or your deployed URL.
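For reference, the Auto fallback order can be written as a small selection helper. This is an illustrative sketch, not the code in functions/lib/providers.js, and the environment-variable names are assumptions; match them to your own secrets.

import os

# Auto priority: Anthropic → Gemini → OpenAI → Groq (first provider with a key wins).
PRIORITY = ["anthropic", "gemini", "openai", "groq"]
KEY_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",   # variable names assumed
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}

def pick_provider() -> str:
    """Return the first provider in priority order that has a configured key."""
    for name in PRIORITY:
        if os.environ.get(KEY_VARS[name]):
            return name
    raise RuntimeError("No provider key configured (Cloudflare Secrets or .dev.vars)")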
How the agent prevents hallucinations
A multi-layer pipeline intercepts AI outputs, decomposes them into atomic claims, verifies each claim via retrieval and optional domain modules (v2), applies a verifier LLM with calibration and severity labels (v3), and returns a calibrated confidence score with citations or corrections. Beyond verification: constrained decoding, uncertainty elicitation, and self-consistency sampling reduce bad outputs upstream.
This deployed demo: the Live demo tab runs the verifier LLM through /api/verify (a Cloudflare Pages Function) using whichever providers you configured; the citation and code-scanner modules are described here as product concepts until they are wired to real APIs in code.
Claim decomposer
Layer 1 — parse & split
Splits complex text into atomic factual claims
Detects verifiable vs. opinion statements
Extracts named entities, dates, quantities
Assigns verification priority per claim
RAG retriever
Layer 2 — evidence fetch
Queries Wikipedia, arXiv, web search in parallel
Embeds claims → semantic nearest-neighbor search
Supports custom private knowledge bases
Source reliability scoring (e.g. PageRank-style)
Verifier LLM
Layer 3 — judge
Structured chain-of-thought per claim
Entailment: evidence supports or contradicts
Confidence score with calibration
Backends: GPT-4, Claude, Gemini, Ollama, …
Report generator
Layer 4 — output
Verdict with grounded corrections
Inline citation links for claims
JSON, HTML, Markdown, webhooks
Audit trail + feedback for retraining
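Putting the four layers together, the sketch below shows one way the pipeline could be orchestrated. Every class and function name is hypothetical (this repository ships only the Cloudflare Functions demo) and the layer bodies are stubs; it is meant to make the data flow concrete rather than document a shipped API.

from dataclasses import dataclass, field

@dataclass
class Claim:
    text: str            # one atomic factual claim (Layer 1)
    verifiable: bool     # factual statement vs. opinion
    priority: int = 0    # verification priority

@dataclass
class Verdict:
    claim: Claim
    label: str                                       # "supported" | "contradicted" | "uncertain"
    confidence: float                                # calibrated 0..1
    citations: list[str] = field(default_factory=list)

def decompose(text: str) -> list[Claim]:
    """Layer 1: split text into atomic claims (LLM- or rule-based splitter)."""
    ...

def retrieve(claim: Claim) -> list[str]:
    """Layer 2: query Wikipedia / web search / arXiv in parallel, rank by source reliability."""
    ...

def judge(claim: Claim, evidence: list[str]) -> Verdict:
    """Layer 3: verifier LLM runs entailment and returns a calibrated confidence."""
    ...

def verify(text: str) -> dict:
    """Layer 4: aggregate per-claim verdicts into one report."""
    verdicts = [judge(c, retrieve(c)) for c in decompose(text) if c.verifiable]
    overall = min((v.confidence for v in verdicts), default=1.0)
    return {"overall_confidence": overall, "claims": [vars(v) for v in verdicts]}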
Prevention strategies beyond verification
Constrained decoding
Force citations via logit bias or tool calls.
Uncertainty elicitation
Ask the model to surface uncertainty before asserting facts.
Self-consistency
Sample multiple answers; vote on the majority (sketched below).
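Of the three strategies, self-consistency is the easiest to show in a few lines. The sketch below assumes a generic ask_llm(prompt, temperature) callable, which is a placeholder for whatever client you use.

from collections import Counter

def self_consistent_answer(ask_llm, prompt: str, n: int = 5) -> tuple[str, float]:
    """Sample n completions and return (majority answer, agreement ratio)."""
    samples = [ask_llm(prompt, temperature=0.7).strip().lower() for _ in range(n)]
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / n

# Usage: answer, agreement = self_consistent_answer(my_client, "Is X true? Answer yes or no.")
# Low agreement (say < 0.6) is itself a signal to route the claim to the verifier pipeline.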
v2 — four domain-specific modules on the base pipeline
Each module is independently loadable. After claim decomposition they can run in parallel and merge into one verdict with per-module evidence trails. In the Live demo, turning modules on shapes the LLM prompt (simulated cross-checks); production-grade hooks to CourtListener, CrossRef, etc. belong in application code—same architecture as below.
Citation verifier v2
Legal, academic, scientific
Resolves case names via CourtListener & Google Scholar APIs
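As a hedged sketch of the CourtListener half of that lookup: the endpoint path and parameters below should be checked against CourtListener's current REST API docs (a token may be needed for higher rate limits), and the Google Scholar fallback is left out.

import requests

def case_exists(case_name: str) -> bool:
    """Return True if CourtListener's search finds at least one matching opinion."""
    resp = requests.get(
        "https://www.courtlistener.com/api/rest/v4/search/",  # verify version/path
        params={"q": case_name, "type": "o"},                 # "o" = opinions
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("count", 0) > 0

# A citation verifier would flag any case the model cites that returns False here,
# then try a second source before labeling it fabricated.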
Hallucination types (fabricated sources, fictional history, policy invention, etc.) appear in the Coverage tab with example mappings.
Incident categories & module mapping below extend to the regulated sectors in the next card. Coverage = intent to detect before users rely on output. Percentages are engineering targets for a full retrieval-backed stack, not guarantees or certifications. Severity defaults (P0–P2) reflect typical harm if a false claim slips through.
Critical sectors requiring hallucination testing
These sectors share near‑zero tolerance for silent fabrication: liability, licensing, fines, or physical harm dominate outcomes.
The Live demo on this site runs verifier JSON through /api/verify only — it does not by itself satisfy EU AI Act, FDA, banking, or court evidentiary rules; production systems still need grounded retrieval, logging, human‑in‑the‑loop sign‑off, and your compliance review.
Legal & judiciary
Fake cases, dockets, or reasoning in filings → sanctions / disbarment risk. Architecture maps to: Citation verifier + audit trail in report generator · typical tier P0
Healthcare & life sciences
Misdiagnosis, dosing, contraindications in AI outputs → patient harm; regulated-as-device paths expect reliability & human review. Maps to: Medical guard (text) + severity P0 · multimodal clinical imaging remains a model-layer gap (see partial row below).
Financial services
Bad numbers, “insights,” transfers, or filings grounded in fiction → liability & regulatory breach. Maps to: Policy grounding on approved disclosures + citation/numeric cross-checks + calibration · often P1
Manufacturing & critical infrastructure
Subtly wrong repair, safety, or control guidance → downtime or injury. Maps to: Code / link scanner + procedural RAG against SOPs & OEM docs · treat as high‑stakes technical · P0–P1 depending on consequence.
Government & law enforcement
Essential services, immigration, policing — often classified as high‑risk under frameworks like the EU AI Act: documentation, logging, oversight. Maps to: audit-oriented report output + policy/legal grounding; deployment must meet jurisdictional rules beyond this repo.
Customer service (regulated)
Binding promises (refunds, coverage) invented by bots → direct liability. Maps to: Policy grounding (Air Canada–style checks) · P1
Why these sectors are especially “in need”
Liability: Unlike casual chat, a single wrong factual or policy assertion can mean fines in the millions, loss of license, or irreversible harm — there is little room for “mostly right.”
Audit trails: Regulators and plaintiffs increasingly ask how a conclusion was reached; hallucination testing & structured verdict JSON support documentation for compliance and litigation defence (when wired into your logging stack).
Trust gap: After operators see confident lies from AI, adoption of automation stalls; repeatable verification is needed to recover confidence and productivity.
Medical guard on text; vision errors occur inside the model — not fixable by post-hoc text alone
P0 · Partial
55%
Code errors & broken links (fraud system bugs)
Code scanner → AST vulns, 404 URLs, fake packages
P2 · Covered
82%
Healthcare image inference (the multimodal gap) needs model-level calibration — ensemble uncertainty, Platt scaling, and mandatory human-in-the-loop review. Treat that as a roadmap item for multimodal plugins, not something a text-only guard fully solves.
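Platt scaling, mentioned above, fits a one-feature logistic regression on held-out examples so raw confidence scores map to calibrated probabilities. The sketch below uses scikit-learn purely as an illustration of the technique.

import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_platt(raw_scores: np.ndarray, correct: np.ndarray) -> LogisticRegression:
    """raw_scores: confidences in [0, 1]; correct: 1 if the prediction was right, else 0."""
    return LogisticRegression().fit(raw_scores.reshape(-1, 1), correct)

def calibrate(model: LogisticRegression, raw_scores: np.ndarray) -> np.ndarray:
    """Return calibrated P(correct) for new raw scores."""
    return model.predict_proba(raw_scores.reshape(-1, 1))[:, 1]

# On over-confident models, a raw score of 0.9 may calibrate down to ~0.7 once the
# held-out accuracy at that confidence level is taken into account.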
Common hallucination types → module
Source fabrication
Fake news, quotes, papers, legal cites
Citation verifier
Logical / arithmetic
Wrong math or contradictions
Chain-of-thought
Fictional history
“Moonlight Treaty of 1854,” invented events
Multi-source RAG
Policy invention
Refunds or terms not in your docs
Policy grounding
Medical misinformation
Unsupported dosing or diagnosis in text
Medical guard
Code & link errors
404s, fake npm/PyPI names, risky code
Code scanner
“Know-it-all” tone
Authoritative phrasing, weak evidence
Calibration
Incorrect predictions
Markets, weather, outcomes stated as certain
Calibration
Image misclassification
CV sees objects that are not there
Model layer · roadmap
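The mapping above can also be kept as plain data, so new hallucination types route to modules without code changes. The dictionary below simply restates the table; the key and module names are illustrative.

# Hallucination type → responsible module (restating the table above).
# "model_layer" marks failures that post-hoc text verification cannot fix.
ROUTING = {
    "source_fabrication":      "citation_verifier",
    "logical_arithmetic":      "chain_of_thought",
    "fictional_history":       "multi_source_rag",
    "policy_invention":        "policy_grounding",
    "medical_misinformation":  "medical_guard",
    "code_link_errors":        "code_scanner",
    "know_it_all_tone":        "calibration",
    "incorrect_predictions":   "calibration",
    "image_misclassification": "model_layer",  # roadmap: needs fixes inside the model
}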
Honest ceiling. This agent is a post-hoc verification layer: it runs after the LLM produces text and before the user sees it. Unknown-unknown fabrications in domains with no reachable ground truth may still land as “uncertain.” Training-time gaps and multimodal failures need prevention inside the model (RLHF, constitutional AI, RAG in the loop, calibrated vision heads).
Deployment options
This repo ships a production UI on Cloudflare Pages: public/index.html + functions/api/*.js + wrangler.toml — see README (npm run dev / npm run deploy). The cards below are additional ways you might package the same ideas (PyPI, Docker, etc.).
Click a card to copy a starter prompt for an AI assistant or internal docs — not a guarantee that a package name exists on a registry. Reserve names when you publish.
PyPI package roadmap
Future pip install … — name TBD when published
Docker + REST API
FastAPI server, OpenAI-compatible verify route
Hugging Face Spaces
Free hosted Gradio demo for discovery
Vercel / Railway
Serverless-style deploy + API keys in env
npm package
npm install hallucination-guard — TS-first SDK
GitHub Action
CI/CD gate on LLM outputs in PRs
Python / YAML below — honest labeling:
These blocks describe a target Python API and CI shape for contributors — they are not importable from this repository today (only functions/*.js + public/ ship). Publishing PyPI/npm/GitHub Actions under those names requires your release process.
from hallucination_guard import GuardAgent
from hallucination_guard.modules import (
    CitationVerifier, CodeScanner, MedicalGuard, PolicyGrounding
)

agent = GuardAgent(
    verifier="claude-sonnet-4",
    modules=[
        CitationVerifier(sources=["courtlistener", "crossref", "pubmed"]),
        CodeScanner(run_url_check=True, check_packages=True),
        MedicalGuard(require_rct_evidence=True, add_disclaimer=True),
        PolicyGrounding(docs=["./policies/refund_policy.pdf"]),
    ],
    strategy="parallel",  # all modules run concurrently
)

result = agent.verify(text=llm_output)

print(result.verdict)         # trusted | warning | hallucination
print(result.module_reports)  # per-module evidence breakdown
print(result.corrections)     # grounded corrections with citations
print(result.safe_rewrite)    # corrected version of the text
from hallucination_guard.middleware import HallucinationGuardMiddleware
app.add_middleware(
    HallucinationGuardMiddleware,
    agent=agent,
    block_on=["hallucination"],  # auto-block hallucinations
    warn_on=["warning"],         # pass warnings with header
    safe_rewrite=True,           # replace with corrected text
)
# Replace `your-org/.../action` when you publish — placeholder name only:
- uses: your-org/hallucination-guard-action@v1
  with:
    text-file: outputs/llm_response.txt
    modules: citation,code,policy
    policy-docs: docs/policies/
    fail-on: hallucination
    # One or more, matching your guard backend:
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    # gemini-api-key / openai-api-key / groq-api-key if applicable
from hallucination_guard import GuardAgent
agent = GuardAgent(
    verifier="claude-sonnet-4",  # or "gpt-4o", "ollama/mistral"
    strategy="multi_source",
    sources=["wikipedia", "web_search", "arxiv"],
)

result = agent.verify(
    text="The Great Wall is visible from the moon."
)

print(result.verdict)         # → "hallucination"
print(result.confidence)      # → 0.97
print(result.corrections[0])  # → "NASA confirms it is not visible..."
Repo layout — today vs roadmap
ai-hallucination-guard/              # this repository
├── functions/                       # Cloudflare Pages Functions
│   ├── api/
│   │   ├── config.js                # GET which LLM keys are set
│   │   └── verify.js                # POST proxy to LLMs
│   └── lib/
│       └── providers.js             # Anthropic / Gemini / OpenAI / Groq
├── public/
│   ├── index.html                   # Live demo + all tabs
│   └── _headers
├── wrangler.toml
├── package.json
├── README.md
├── LICENSE
│
├── # Future optional Python package (not generated yet):
├── hallucination_guard/             # planned core package
├── api/                             # FastAPI server
└── pyproject.toml