AI Hallucination Agent — unified open-source demo combining strategies, verification modules, and severity triage.

API keys you enter are sent over HTTPS to this app's /api/verify only and are not stored on the server. Keys are saved only if you click "Save for future" (this browser's localStorage). For CI/workflows, send the headers X-LLM-API-Key and X-LLM-Provider.

Verification strategies:
• Multi-source RAG: cross-check against multiple knowledge bases
• Chain-of-thought verify: step-by-step logical consistency
• Adversarial probe: counter-evidence stress test
• Confidence calibration: uncertainty per sub-claim
Verification modules:
• Multi-source RAG: Wikipedia, web, arXiv
• Citation verifier (v2): CrossRef, courts, PubMed
• Code safety (v2): AST, URLs, packages
• Medical guard (v2): evidence & disclaimers
• Policy grounding (v2): vs. source documents
• Calibration: per-claim uncertainty
Risk domains:
• Medical: diagnoses, dosing
• Legal: cases, citations
• Technical: code, packages
• Policy: T&Cs, internal docs
• Historical: events, facts
• Predictive: forecasts
How this demo runs: /api/verify calls your chosen LLM using server-side keys only (never exposed to the browser). Auto-provider priority: Anthropic → Gemini → OpenAI → Groq. Configure at least one key (Cloudflare Secrets or .dev.vars). Opening the page via file:// cannot reach this API; use npm run dev or your deployed URL.
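For CI or scripts, a minimal sketch of calling the same endpoint with the X-LLM-* headers noted above; the JSON body shape here is an assumption, so check functions/api/verify.js for the actual contract:

import requests  # pip install requests

resp = requests.post(
    "https://your-deployment.pages.dev/api/verify",  # placeholder URL
    headers={
        "X-LLM-API-Key": "sk-...",      # your provider key
        "X-LLM-Provider": "anthropic",  # or gemini / openai / groq
    },
    json={"text": "Claim to verify goes here."},  # assumed body shape
    timeout=60,
)
print(resp.json())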
How the agent prevents hallucinations
A multi-layer pipeline intercepts AI outputs, decomposes them into atomic claims, verifies each claim via retrieval and optional domain modules (v2), applies a verifier LLM with calibration and severity labels (v3), and returns a calibrated confidence score with citations or corrections. Beyond verification: constrained decoding, uncertainty elicitation, and self-consistency sampling reduce bad outputs upstream.
This deployed demo: the Live demo tab runs the verifier LLM through /api/verify (Cloudflare Function) using whichever providers you configured — citation/CodeScanner-style modules here are described as product concepts until wired to real APIs in code.
Claim decomposer (Layer 1 — parse & split):
• Splits complex text into atomic factual claims
• Detects verifiable vs. opinion statements
• Extracts named entities, dates, quantities
• Assigns verification priority per claim
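A minimal sketch of Layer 1 as a prompt-based decomposer; call_llm is a placeholder for whatever provider client you use, and the JSON schema is illustrative:

import json

DECOMPOSE_PROMPT = (
    "Split the text below into atomic factual claims. Return a JSON list of "
    "objects with keys: claim, verifiable (bool), entities, priority.\n\nText:\n"
)

def decompose(text: str, call_llm) -> list[dict]:
    raw = call_llm(DECOMPOSE_PROMPT + text)  # call_llm: your provider client
    return json.loads(raw)                   # validate the schema in production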
RAG retriever (Layer 2 — evidence fetch):
• Queries Wikipedia, arXiv, web search in parallel
• Embeds claims → semantic nearest-neighbor search
• Supports custom private knowledge bases
• Source reliability scoring (e.g. PageRank-style)
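A sketch of the parallel fetch, assuming one callable per source; the fetch functions are placeholders for real Wikipedia/arXiv/web clients, and embedding-based ranking is left to application code:

from concurrent.futures import ThreadPoolExecutor

def retrieve(claim: str, sources: dict, k: int = 5) -> list[dict]:
    # sources maps a name ("wikipedia", "arxiv", ...) to a callable that
    # takes the claim and returns a list of evidence strings.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fn, claim) for name, fn in sources.items()}
        hits = [{"source": name, "text": doc}
                for name, fut in futures.items()
                for doc in fut.result()]
    return hits[:k]  # rank by embedding similarity before truncating in practice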
Verifier LLM (Layer 3 — judge):
• Structured chain-of-thought per claim
• Entailment: evidence supports or contradicts
• Confidence score with calibration
• Backends: GPT-4, Claude, Gemini, Ollama, …
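A sketch of the per-claim judgment, assuming any JSON-capable backend; the prompt wording and output keys are illustrative, not a fixed contract:

import json

JUDGE_PROMPT = (
    "Given the claim and evidence below, reply with JSON keys: verdict "
    "(supported | contradicted | unverifiable), confidence (0-1), reasoning.\n\n"
)

def judge(claim: str, evidence: list[str], call_llm) -> dict:
    prompt = JUDGE_PROMPT + f"Claim: {claim}\nEvidence:\n" + "\n".join(evidence)
    return json.loads(call_llm(prompt))  # call_llm: your provider client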
Report generator (Layer 4 — output):
• Verdict with grounded corrections
• Inline citation links for claims
• JSON, HTML, Markdown, webhooks
• Audit trail + feedback for retraining
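For orientation, an illustrative verdict payload; the real schema would live in the report generator, which this repo does not ship yet:

report = {
    "verdict": "warning",     # trusted | warning | hallucination
    "confidence": 0.74,       # calibrated, not raw model confidence
    "claims": [
        {
            "claim": "...",
            "verdict": "contradicted",
            "citations": ["https://example.org/evidence"],
            "correction": "...",
            "severity": "P2",
        },
    ],
}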
Prevention strategies beyond verification
• Constrained decoding: force citations via logit bias or tool calls.
• Uncertainty elicitation: ask the model to surface uncertainty before asserting facts.
• Self-consistency: sample multiple answers; vote on the majority (sketch below).
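A minimal self-consistency sketch: sample several answers at temperature > 0 and keep the majority (call_llm is again a placeholder):

from collections import Counter

def self_consistent(prompt: str, call_llm, n: int = 5) -> str:
    answers = [call_llm(prompt, temperature=0.8) for _ in range(n)]
    best, _count = Counter(answers).most_common(1)[0]
    return best  # exact-match voting; cluster semantically similar answers in practice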
v2 — four domain-specific modules on the base pipeline
Each module is independently loadable. After claim decomposition they can run in parallel and merge into one verdict with per-module evidence trails. In the Live demo, turning modules on shapes the LLM prompt (simulated cross-checks); production-grade hooks to CourtListener, CrossRef, etc. belong in application code—same architecture as below.
Citation verifier v2 (legal, academic, scientific):
• Resolves case names via CourtListener & Google Scholar APIs
• Verifies DOIs and ISBNs via CrossRef and OpenAlex
• Checks PubMed IDs and arXiv IDs for existence
• Flags hallucinated docket numbers (e.g. fake WL citations)
• Returns the real case summary if found, a correction if not
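The DOI check reduces to one HTTP call, since the public CrossRef REST API returns 404 for unknown DOIs; a minimal sketch:

import requests

def doi_exists(doi: str) -> bool:
    # CrossRef's public API: /works/<doi> is 200 for real DOIs, 404 otherwise
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    return r.status_code == 200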
Code safety scanner v2 (static analysis + URL validation):
• AST parse: injection, XSS, unsafe eval patterns
• HTTP HEAD validates URLs (404 / redirect)
• Bandit & Semgrep (Python); ESLint security (JS)
• Dependency checks: package exists on PyPI/npm
• CVE lookup for referenced library versions
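Two of these checks need no special tooling; a sketch against public endpoints (HTTP HEAD for link rot, PyPI's JSON API for package existence):

import requests

def url_alive(url: str) -> bool:
    # HEAD catches 404s and follows redirects without downloading bodies
    r = requests.head(url, allow_redirects=True, timeout=10)
    return r.status_code < 400

def pypi_package_exists(name: str) -> bool:
    # https://pypi.org/pypi/<name>/json returns 404 for unknown packages
    r = requests.get(f"https://pypi.org/pypi/{name}/json", timeout=10)
    return r.status_code == 200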
Medical guard v2 (clinical evidence grounding):
• Maps diagnoses and drugs to ICD-11 and RxNorm
• Searches PubMed for supporting or contradicting RCTs
• Cross-checks WHO & FDA sources for interactions
• Flags high-confidence claims with thin evidence
• Appends a mandatory “consult a clinician” disclaimer to output
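A sketch of two of these steps, assuming NCBI's public E-utilities for the PubMed lookup; a real module would map terms through ICD-11/RxNorm before querying:

import requests

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_hits(query: str) -> int:
    # Naive term search; production code should build MeSH-aware queries
    r = requests.get(EUTILS, params={"db": "pubmed", "term": query,
                                     "retmode": "json"}, timeout=10)
    return int(r.json()["esearchresult"]["count"])

def with_disclaimer(text: str) -> str:
    return text + "\n\nThis is not medical advice. Consult a clinician."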
Policy grounding v2 (document & policy adherence):
• Ingests company policies as a grounding KB
• Semantic similarity: claim vs. source paragraph
• Detects promises not in policy (Air Canada scenario)
• Sources: PDF, DOCX, Notion, Confluence, …
• Returns the conflicting policy section as evidence
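A similarity sketch using sentence-transformers (an assumed dependency, not shipped in this repo); the 0.6 threshold is illustrative:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def grounded(claim: str, policy_paragraphs: list[str], threshold: float = 0.6):
    claim_vec = model.encode(claim, convert_to_tensor=True)
    para_vecs = model.encode(policy_paragraphs, convert_to_tensor=True)
    scores = util.cos_sim(claim_vec, para_vecs)[0]
    best = int(scores.argmax())
    # Return the closest policy section as evidence either way
    return float(scores[best]) >= threshold, policy_paragraphs[best]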
Severity tiers (v3)
• P0 · Extreme risk — blocker. Life, liberty, legal integrity — block output, pager / human review, full audit trail. Examples: medical misdiagnosis, lethal dosage errors, fabricated precedents in filings, emergency misinformation.
• P1 · High risk — escalate. Financial loss, liability, reputational damage — flag, correct, notify an operator, audit. Examples: binding policy promises, high-stakes demo errors, security backdoors in code, fake citations in reports.
• P2 · Moderate risk — correct. Accuracy and credibility — inline correction, uncertainty badge, “verify this” prompt. Examples: non-existent citations, false biographies, broken links / 404s, arithmetic errors.
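The routing itself is small enough to express as data; a sketch that fails closed on unknown tiers (field names are illustrative):

TRIAGE = {
    "P0": {"action": "block",   "notify": "pager",    "audit": True},
    "P1": {"action": "flag",    "notify": "operator", "audit": True},
    "P2": {"action": "correct", "notify": None,       "audit": False},
}

def route(severity: str) -> dict:
    # Unknown tiers fall back to the strictest handling (fail closed)
    return TRIAGE.get(severity, TRIAGE["P0"])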
Hallucination types (fabricated sources, fictional history, policy invention, etc.) appear in the Coverage tab with example mappings.

Incident categories & module mapping below extend to the regulated sectors in the next card. Coverage = intent to detect before users rely on output. Percentages are engineering targets for a full retrieval-backed stack, not guarantees or certifications. Severity defaults (P0–P2) reflect typical harm if a false claim slips through.

Critical sectors requiring hallucination testing

These sectors share near‑zero tolerance for silent fabrication: liability, licensing, fines, or physical harm dominate outcomes. The Live demo on this site runs verifier JSON through /api/verify only — it does not by itself satisfy EU AI Act, FDA, banking, or court evidentiary rules; production systems still need grounded retrieval, logging, human‑in‑the‑loop sign‑off, and your compliance review.

Legal & judiciary
Fake cases, dockets, or reasoning in filings → sanctions / disbarment risk. Architecture maps to: Citation verifier + audit trail in report generator · typical tier P0
Healthcare & life sciences
Misdiagnosis, dosing, contraindications in AI outputs → patient harm; regulated-as-device paths expect reliability & human review. Maps to: Medical guard (text) + severity P0 · multimodal clinical imaging remains a model-layer gap (see partial row below).
Financial services
Bad numbers, “insights,” transfers, or filings grounded in fiction → liability & regulatory breach. Maps to: Policy grounding on approved disclosures + citation/numeric cross-checks + calibration · often P1
Manufacturing & critical infrastructure
Subtly wrong repair, safety, or control guidance → downtime or injury. Maps to: Code / link scanner + procedural RAG against SOPs & OEM docs · treat as high‑stakes technical · P0/P1 depending on consequence.
Government & law enforcement
Essential services, immigration, policing — often classified as high‑risk under frameworks like the EU AI Act: documentation, logging, oversight. Maps to: audit-oriented report output + policy/legal grounding; deployment must meet jurisdictional rules beyond this repo.
Customer service (regulated)
Binding promises (refunds, coverage) invented by bots → direct liability. Maps to: Policy grounding (Air Canada–style checks) · P1
Why these sectors are especially “in need”
  • Liability: Unlike casual chat, a single wrong factual or policy assertion can mean fines in the millions, loss of license, or irreversible harm — there is little room for “mostly right.”
  • Audit trails: Regulators and plaintiffs increasingly ask how a conclusion was reached; hallucination testing & structured verdict JSON support documentation for compliance and litigation defence (when wired into your logging stack).
  • Trust gap: After operators see confident lies from AI, adoption of automation stalls; repeatable verification is needed to recover confidence and productivity.
Headline incidents → module mapping
• Legal fabrications (Mata v. Avianca) → Citation verifier: CourtListener + docket / Westlaw-style checks · P0 · Covered · 95% target
• Customer service misinformation (Air Canada) → Policy grounding: claims not in source documents · P1 · Covered · 92% target
• Academic citation fabrication (fake DOIs) → Citation verifier: CrossRef DOI + PubMed + OpenAlex · P1 · Covered · 97% target
• Historical / factual inaccuracies (Google Bard / Webb) → Multi-source RAG: Wikipedia, news archives, NASA · P1 · Covered · 88% target
• Healthcare misidentification (skin lesion) → Medical guard on text; vision errors occur inside the model — not fixable by post-hoc text checks alone · P0 · Partial · 55% target
• Code errors & broken links (fraud system bugs) → Code scanner: AST vulns, 404 URLs, fake packages · P2 · Covered · 82% target
Healthcare image inference (the multimodal gap) needs model-level calibration — ensemble uncertainty, Platt scaling, and mandatory human-in-the-loop review. Treat that as a roadmap item for multimodal plugins, not something a text-only guard fully solves.
Common hallucination types → module
• Source fabrication (fake news, quotes, papers, legal cites) → Citation verifier
• Logical / arithmetic (wrong math or contradictions) → Chain-of-thought
• Fictional history (“Moonlight Treaty of 1854,” invented events) → Multi-source RAG
• Policy invention (refunds or terms not in your docs) → Policy grounding
• Medical misinformation (unsupported dosing or diagnosis in text) → Medical guard
• Code & link errors (404s, fake npm/PyPI names, risky code) → Code scanner
• “Know-it-all” tone (authoritative phrasing, weak evidence) → Calibration
• Incorrect predictions (markets, weather, outcomes stated as certain) → Calibration
• Image misclassification (CV sees objects that are not there) → Model layer · roadmap
Honest ceiling. This agent is a post-hoc verification layer: it runs after the LLM produces text and before the user sees it. Unknown-unknown fabrications in domains with no reachable ground truth may still land as “uncertain.” Training-time gaps and multimodal failures need prevention inside the model (RLHF, constitutional AI, RAG in the loop, calibrated vision heads).
Deployment options
This repo ships a production UI on Cloudflare Pages: public/index.html + functions/api/*.js + wrangler.toml — see the README (npm run dev / npm run deploy). The cards below are additional ways you might package the same ideas (PyPI, Docker, etc.).

Click a card to copy a starter prompt for an AI assistant or internal docs — not a guarantee that a package name exists on a registry. Reserve names when you publish.

• PyPI package roadmap: future pip install … — name TBD when published
• Docker + REST API: FastAPI server, OpenAI-compatible verify route
• Hugging Face Spaces: free hosted Gradio demo for discovery
• Vercel / Railway: serverless-style deploy + API keys in env
• npm package: npm install hallucination-guard — TS-first SDK
• GitHub Action: CI/CD gate on LLM outputs in PRs
Python / YAML below — honest labeling: These blocks describe a target Python API and CI shape for contributors — they are not importable from this repository today (only functions/*.js + public/ ship). Publishing PyPI/npm/GitHub Actions under those names requires your release process.
from hallucination_guard import GuardAgent
from hallucination_guard.modules import (
    CitationVerifier, CodeScanner, MedicalGuard, PolicyGrounding
)

agent = GuardAgent(
    verifier="claude-sonnet-4",
    modules=[
        CitationVerifier(sources=["courtlistener", "crossref", "pubmed"]),
        CodeScanner(run_url_check=True, check_packages=True),
        MedicalGuard(require_rct_evidence=True, add_disclaimer=True),
        PolicyGrounding(docs=["./policies/refund_policy.pdf"]),
    ],
    strategy="parallel",  # all modules run concurrently
)

result = agent.verify(text=llm_output)
print(result.verdict)         # trusted | warning | hallucination
print(result.module_reports)  # per-module evidence breakdown
print(result.corrections)     # grounded corrections with citations
print(result.safe_rewrite)    # corrected version of the text
from hallucination_guard.middleware import HallucinationGuardMiddleware

app.add_middleware(
    HallucinationGuardMiddleware,
    agent=agent,
    block_on=["hallucination"],  # auto-block hallucinations
    warn_on=["warning"],         # pass warnings with header
    safe_rewrite=True,           # replace with corrected text
)
# Replace `your-org/.../action` when you publish — placeholder name only:
- uses: your-org/hallucination-guard-action@v1
  with:
    text-file: outputs/llm_response.txt
    modules: citation,code,policy
    policy-docs: docs/policies/
    fail-on: hallucination
    # One or more, matching your guard backend:
    anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
    # gemini-api-key / openai-api-key / groq-api-key if applicable
from hallucination_guard import GuardAgent

agent = GuardAgent(
    verifier="claude-sonnet-4",  # or "gpt-4o", "ollama/mistral"
    strategy="multi_source",
    sources=["wikipedia", "web_search", "arxiv"],
)

result = agent.verify(
    text="The Great Wall is visible from the moon."
)
print(result.verdict)         # → "hallucination"
print(result.confidence)      # → 0.97
print(result.corrections[0])  # → "NASA confirms it is not visible..."
Repo layout — today vs roadmap
ai-hallucination-guard/          # this repository
├── functions/                   # Cloudflare Pages Functions
│   ├── api/
│   │   ├── config.js            # GET which LLM keys are set
│   │   └── verify.js            # POST proxy to LLMs
│   └── lib/
│       └── providers.js         # Anthropic / Gemini / OpenAI / Groq
├── public/
│   ├── index.html               # Live demo + all tabs
│   └── _headers
├── wrangler.toml
├── package.json
├── README.md
├── LICENSE
│
├── # Future optional Python package (not generated yet):
├── hallucination_guard/         # planned core package
├── api/                         # FastAPI server
└── pyproject.toml
At a glance:
• 4 verification strategies
• MIT license — commercial OK
• 6+ LLM backends / modules
• Custom knowledge bases
Ways to contribute
• New strategy plugins: domain verifiers (medical, legal, code)
• LLM backend adapters: Ollama, Gemini, Mistral, Cohere, local
• Benchmark datasets: labeled hallucination examples
• Framework integrations: LangChain, LlamaIndex, Haystack, CrewAI
Open-source growth checklist
• README with badges, demo GIF, quick start
• CONTRIBUTING.md — style, PR process, issue templates
• GitHub Discussions for Q&A and roadmap
• PyPI + npm releases, semver, changelogs
• Hugging Face Spaces live demo
• Launch posts (HN, Reddit r/LocalLLaMA, etc.)
• Docker image on GHCR for self-hosting