AstraGuard inspects prompts and agent actions for injection attempts, jailbreaks, and tool abuse, returning a risk score and an allow / review / block verdict that your app enforces in the request path.
Detector findings are fused via weighted noisy-OR into a single risk score. Each finding tells you which family fired and why.
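A weighted noisy-OR treats each finding as independent evidence. It computes the probability that every finding is a false alarm and takes the complement: risk = 1 − ∏ (1 − w·c), where c is a finding's confidence and w its family's weight. The sketch below is a minimal illustration under those assumptions. The Finding type, the family keys, and the weight values are hypothetical and are not AstraGuard's internals.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Finding:
    family: str        # attack family that fired (hypothetical key)
    confidence: float  # detector confidence in [0, 1]
    reason: str        # why it fired


# Hypothetical per-family weights in [0, 1]; not AstraGuard's published values.
FAMILY_WEIGHTS = {
    "instruction_override": 0.9,
    "jailbreak_persona": 0.8,
    "prompt_leak": 0.7,
}
DEFAULT_WEIGHT = 0.5  # used for families missing from the table


def noisy_or_risk(findings):
    """Weighted noisy-OR: risk = 1 - prod(1 - w_family * confidence)."""
    # Probability that every finding is a false alarm, assuming independence.
    all_benign = prod(
        1.0 - FAMILY_WEIGHTS.get(f.family, DEFAULT_WEIGHT) * f.confidence
        for f in findings
    )
    return 1.0 - all_benign  # no findings -> prod() == 1 -> risk 0.0
```

With this fusion, a single strong finding dominates the score. Each additional finding pushes the risk toward 1 but never past it, and the score never drops when evidence is added.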
58 categorized patterns across 11 attack families (among them instruction overrides, jailbreak personas, prompt leaks, delimiter smuggling, encoded payloads, social engineering, context hijacking, goal manipulation, agent redirection, and instruction conflicts), backed by an ML classifier that catches paraphrased attacks the regex patterns miss.
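The sketch below shows how categorized patterns map a match to a family and a reason. It is a minimal illustration. The three regexes and family names are hypothetical examples, not AstraGuard's actual rule set.

```python
import re

# Hypothetical (family, pattern, reason) entries; not AstraGuard's 58 patterns.
PATTERNS = [
    ("instruction_override",
     re.compile(r"\b(ignore|disregard)\s+(all\s+)?(previous|prior|above)\s+instructions\b", re.I),
     "asks the model to discard earlier instructions"),
    ("prompt_leak",
     re.compile(r"\b(reveal|print|show)\s+(your\s+)?(system\s+prompt|hidden\s+instructions)\b", re.I),
     "requests the system prompt"),
    ("delimiter_smuggling",
     re.compile(r"<\|?(system|im_start|im_end)\|?>", re.I),
     "injects chat-template delimiters"),
]


def match_patterns(text):
    """Return (family, reason) for every pattern that matches the text."""
    return [(family, reason) for family, pattern, reason in PATTERNS if pattern.search(text)]
```

Literal phrasing is caught. A paraphrase such as "Forget everything you were told earlier." matches nothing, and the description above assigns that gap to the ML classifier.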
Detects unauthorized tool calls and loop patterns in autonomous agents, using a configurable tool allowlist and sliding-window loop detection over tool-call sequences.
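Below is a minimal sketch of an allowlist check combined with a sliding-window loop counter. It assumes that a "loop" means the same tool called with identical arguments several times within the last N calls. The class name, finding labels, and default window sizes are hypothetical.

```python
from collections import deque


class ToolCallMonitor:
    """Flags tool calls outside an allowlist and identical calls repeated within a sliding window."""

    def __init__(self, allowlist, window_size=6, max_repeats=3):
        self.allowlist = set(allowlist)
        self.window = deque(maxlen=window_size)  # oldest call drops off automatically
        self.max_repeats = max_repeats

    def check(self, tool, args):
        findings = []
        if tool not in self.allowlist:
            findings.append(("unauthorized_tool", f"{tool!r} is not on the allowlist"))
        # Call signature; sorting by key makes argument order irrelevant.
        signature = (tool, tuple(sorted(args.items())))
        self.window.append(signature)
        repeats = self.window.count(signature)
        if repeats >= self.max_repeats:
            findings.append(
                ("tool_loop", f"{tool!r} called {repeats}x with identical args in the last {len(self.window)} calls")
            )
        return findings
```

Counting exact repeats is the simplest form of loop detection. Catching alternating cycles (A, B, A, B) would require comparing subsequences within the window, not just single calls.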
Scans retrieved content (tool outputs, RAG documents, fetched URLs) for embedded instructions that target the agent: HTML-comment smuggling, exfiltration commands, and user-flipping attacks. Chat-only scanners miss this class of attack.
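The next sketch targets HTML-comment smuggling. Text hidden in comments is invisible to a human reading the rendered page, but an agent consuming raw HTML still sees it. The code extracts comments from fetched content and checks them for instruction-like or exfiltration cues. Both cue regexes and the finding labels are hypothetical and far simpler than a production scanner.

```python
import re

HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)

# Hypothetical cues for text aimed at the agent rather than the human reader.
DIRECTIVE_CUES = [
    ("instruction_override", re.compile(r"\bignore\b.{0,40}\binstructions\b", re.I | re.S)),
    ("exfiltration", re.compile(r"\b(send|post|upload|forward)\b.{0,60}\bhttps?://", re.I | re.S)),
]


def scan_retrieved(content):
    """Return (channel, family, comment_text) for each cue found inside an HTML comment."""
    findings = []
    for comment in HTML_COMMENT.findall(content):
        for family, cue in DIRECTIVE_CUES:
            if cue.search(comment):
                findings.append(("html_comment_smuggling", family, comment.strip()))
    return findings
```

The sketch inspects only comments. Visible text in the same document would need to be passed through the same checks used for prompts.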
Not just a moderation pass — a structured security gate in your request path.
Add a security gate in front of your LLM-powered features. Get structured findings you can route into your existing SIEM, ticketing, or IR workflow. Download per-scan reports for audit trails.
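Below is a sketch of turning a scan result into a single JSON line, the shape most log shippers and SIEM ingest pipelines accept. The input arguments and every output field name are hypothetical, not AstraGuard's report schema.

```python
import json
from datetime import datetime, timezone


def to_siem_event(scan_id, verdict, risk_score, findings):
    """Flatten a scan result into one JSON line (field names are hypothetical)."""
    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": "astraguard",
        "scan_id": scan_id,
        "verdict": verdict,
        "risk_score": round(risk_score, 3),
        "families": sorted({f["family"] for f in findings}),  # for quick SIEM filtering
        "findings": findings,
    }
    return json.dumps(event, sort_keys=True)  # single line, no embedded newlines
```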
Drop a single API call into your LangChain, LlamaIndex, or custom agent pipeline. Block obvious attacks before they reach the model; flag borderline cases for human review.
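Enforcing the verdict in a pipeline can look like the following sketch. It assumes the scan call returns a dict with "verdict" and "risk_score" fields; that shape, and the scan and call_model callables, are placeholders for your HTTP client and model call.

```python
def enforce(prompt, scan, call_model, review_queue):
    """Gate a model call on the scanner's verdict; unknown verdicts fail closed."""
    result = scan(prompt)  # assumed shape: {"risk_score": float, "verdict": "allow" | "review" | "block"}
    verdict = result.get("verdict")
    if verdict == "allow":
        return call_model(prompt)
    if verdict == "review":
        # Park borderline requests for a human instead of calling the model.
        review_queue.append((prompt, result.get("risk_score")))
        return None
    if verdict == "block":
        raise PermissionError(f"blocked by security gate (risk_score={result.get('risk_score')})")
    # A missing field or an unexpected value is treated as a block.
    raise PermissionError(f"unrecognized verdict {verdict!r}; failing closed")
```

Failing closed means a malformed response or a new verdict value never lets a prompt slip through silently. Whether "review" holds the request or lets it proceed with a flag is a product decision. This sketch holds it.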
No SDK required. Drop the snippet into any language that speaks HTTP. Examples below.
Also available: POST /v1/agents/events · GET /v1/risk/{id} · GET /v1/scan/report. See the full OpenAPI explorer.
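As a standard-library illustration of the HTTP-only approach, the sketch below builds a GET /v1/risk/{id} request without sending it. The base URL is a placeholder, and the bearer-token Authorization header is an assumption. Check the OpenAPI explorer for the real host and auth scheme.

```python
import urllib.parse
import urllib.request

# Placeholder host: an assumption, not AstraGuard's real endpoint.
BASE_URL = "https://api.example.com"


def build_risk_request(risk_id, api_key, base_url=BASE_URL):
    """Build (but do not send) a GET /v1/risk/{id} request; the auth scheme is assumed."""
    path = "/v1/risk/" + urllib.parse.quote(risk_id, safe="")  # escape "/" and other reserved chars
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        headers={"Authorization": f"Bearer {api_key}", "Accept": "application/json"},
        method="GET",
    )
```

To send the request, pass it to urllib.request.urlopen(req, timeout=5) and decode the body with json.load on the response.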