Introducing OpenClaw Guardrails - Runtime Safety for AI Agents

AI agents are shipping fast. MCP skills, tool-use chains, and autonomous workflows are moving from demos to production. But the more autonomy you hand to an agent, the more damage a bad action can cause — a destructive shell command, an unauthorized network call, or a file-system write to the wrong path.

Today we're launching OpenClaw Guardrails, a new set of Vettly API endpoints that sit in the runtime path of your AI agents. Every action the agent wants to take passes through Vettly first. If the action violates your policy, it gets blocked before it executes — not after.

Why AI Agent Guardrails Matter

Content moderation for user-generated text is a well-understood problem. AI agents, though, don't just generate text; they take actions. They run shell commands, read and write files, make HTTP requests, and access environment variables. A single unvetted MCP skill can escalate from "helpful assistant" to "production outage" in one function call.

Existing guardrail approaches fall into two buckets: prompt-level filters that only see text, and post-hoc logging that records what happened after the damage is done. Neither is sufficient. You need in-path authorization — a gate that evaluates every action against your policy in real time and returns allow, flag, or block before the action runs.

How It Works

OpenClaw Guardrails has three core capabilities:

1. Skill Vetting

Before an agent installs an MCP skill, send its permission manifest to Vettly. The API checks the requested permissions against your policy and returns allow or block. Dangerous permission combinations (e.g., file-system write + network access) can be auto-blocked.
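To make the idea of a permission check concrete, here is a minimal local sketch. The policy shape (`allowed`, `forbiddenCombos`), the `vetManifest` function, and the `net.fetch` permission name are assumptions invented for this example; they are not Vettly's policy schema, and the real evaluation happens on the API side.

```typescript
type VetResult = { action: 'allow' | 'block'; reasons: string[] };

// Hypothetical policy shape, for illustration only.
interface SkillPolicy {
  allowed: string[];           // permissions a skill may request
  forbiddenCombos: string[][]; // permissions that are blocked when requested together
}

function vetManifest(permissions: string[], policy: SkillPolicy): VetResult {
  const requested = new Set(permissions);
  const reasons: string[] = [];

  // Any permission outside the allowlist is a reason to block.
  for (const p of permissions) {
    if (!policy.allowed.includes(p)) {
      reasons.push(`permission not allowed: ${p}`);
    }
  }

  // A combination counts only when every member is requested.
  for (const combo of policy.forbiddenCombos) {
    if (combo.every((p) => requested.has(p))) {
      reasons.push(`dangerous combination: ${combo.join(' + ')}`);
    }
  }

  return { action: reasons.length === 0 ? 'allow' : 'block', reasons };
}

// Example policy: write and network access are each fine alone, but not together.
const policy: SkillPolicy = {
  allowed: ['fs.read', 'fs.write', 'net.fetch'],
  forbiddenCombos: [['fs.write', 'net.fetch']],
};

console.log(vetManifest(['fs.read', 'fs.write'], policy));
console.log(vetManifest(['fs.write', 'net.fetch'], policy));
```

The second call is blocked even though both permissions are individually allowed, which is the case a plain allowlist would miss.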

2. Action Authorization

At runtime, every action the agent wants to take — shell commands, file operations, network calls, environment variable access — is sent to the authorization endpoint. The API evaluates the action against blocked patterns, permission scopes, and risk thresholds, then returns a decision.
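One plausible way to order those three checks is sketched below: hard pattern blocks first, then scope, then risk thresholds. Everything here is hypothetical, including the `ActionPolicy` shape, the `risk` score (assumed to come from the caller or a classifier), and the threshold names. Treat it as a reading aid, not a description of how the Vettly API evaluates actions internally.

```typescript
type Decision = 'allow' | 'flag' | 'block';
type ActionType = 'shell' | 'file' | 'network' | 'env';

interface Action {
  type: ActionType;
  target: string; // command line, path, host, or variable name
  risk: number;   // hypothetical score between 0 and 1
}

// Hypothetical policy shape, for illustration only.
interface ActionPolicy {
  blockedPatterns: RegExp[];
  allowedTypes: ActionType[];
  flagAt: number;  // risk at or above this is flagged
  blockAt: number; // risk at or above this is blocked
}

function evaluate(action: Action, policy: ActionPolicy): { decision: Decision; reasons: string[] } {
  // 1. Blocked patterns win over everything else.
  const hit = policy.blockedPatterns.find((re) => re.test(action.target));
  if (hit) {
    return { decision: 'block', reasons: [`matched blocked pattern: ${hit.source}`] };
  }

  // 2. Scope: the agent must be permitted to perform this type of action at all.
  if (!policy.allowedTypes.includes(action.type)) {
    return { decision: 'block', reasons: [`action type out of scope: ${action.type}`] };
  }

  // 3. Risk thresholds choose between allow, flag, and block.
  if (action.risk >= policy.blockAt) {
    return { decision: 'block', reasons: ['risk at or above block threshold'] };
  }
  if (action.risk >= policy.flagAt) {
    return { decision: 'flag', reasons: ['risk at or above flag threshold'] };
  }
  return { decision: 'allow', reasons: [] };
}

const examplePolicy: ActionPolicy = {
  blockedPatterns: [/\brm\s+-rf\b/],
  allowedTypes: ['shell', 'file', 'network'],
  flagAt: 0.4,
  blockAt: 0.8,
};

console.log(evaluate({ type: 'shell', target: 'ls -la', risk: 0.1 }, examplePolicy));
console.log(evaluate({ type: 'file', target: '/tmp/report.csv', risk: 0.5 }, examplePolicy));
```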

3. Metrics & Policy Management

Track how many actions are allowed, flagged, and blocked over time. See which action patterns trigger the most blocks. Roll back to a previous policy version if a new policy is too restrictive or too permissive.

Fail-Closed by Default

Most runtime authorization systems fail open — if the guardrail service is down or slow, the action goes through anyway. This defeats the purpose. OpenClaw Guardrails is fail-closed: if Vettly can't be reached or returns an error, the action is blocked. Your agents stop rather than run unsupervised.

This is a deliberate trade-off: a brief pause is almost always cheaper than an irreversible action. For latency-sensitive workflows, you can configure per-action timeout thresholds and fallback behaviors in your policy, but the default is always deny.
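If you want the same deny-by-default behavior in your own wrapper code, a sketch might look like the following. `failClosed` is a hypothetical helper, not an SDK function, and the client-side timeout here is an assumption for the example; it is separate from the per-action thresholds you configure in a policy.

```typescript
type Decision = 'allow' | 'flag' | 'block';

// Wraps any authorization call so that errors and timeouts resolve to 'block'.
async function failClosed(
  authorize: () => Promise<Decision>,
  timeoutMs: number,
): Promise<Decision> {
  // The Promise executor runs synchronously, so `timer` is set before use.
  let timer!: ReturnType<typeof setTimeout>;
  const timeout = new Promise<Decision>((resolve) => {
    timer = setTimeout(() => resolve('block'), timeoutMs);
  });

  try {
    // Whichever settles first wins; a slow service loses to the timer.
    return await Promise.race([authorize(), timeout]);
  } catch {
    // Unreachable service, error response, or thrown exception: deny.
    return 'block';
  } finally {
    clearTimeout(timer);
  }
}
```

A caller would pass a function that performs the real authorization request, for example `failClosed(() => askPolicyService(action), 500)`, where `askPolicyService` stands in for whatever call your integration makes.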

Step 1: Vet Skills Before Install

When your agent framework discovers a new MCP skill, pass its permission manifest through the skill-vetting endpoint before installing it. This catches overly broad permissions early.

vet-skill.ts (Node.js)

import { Vettly } from '@vettly/sdk';

const client = new Vettly('vettly_live_...');

// Vet an MCP skill before the agent installs it
const result = await client.openclaw.guardrails.vetSkill({
  skill: {
    name: 'file-manager',
    permissions: ['fs.read', 'fs.write', 'fs.delete'],
    source: 'https://registry.example.com/file-manager',
  },
  policy: 'production',
});

if (result.action === 'block') {
  console.log('Skill rejected:', result.reasons);
  // Do not install - skill exceeds policy permissions
}

Step 2: Authorize Actions at Runtime

Wrap every agent action in an authorization call. The response includes a decision (allow, flag, or block), reasons, and a decisionId for your audit trail.

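authorize-action.ts (Node.js)

// Reuses the `client` created in Step 1.
// executeCommand and notifyHuman are your own application functions.

// The action the agent wants to take
const action = {
  type: 'shell',
  command: 'rm -rf /tmp/workspace/build',
};

// In-path authorization before the agent executes an action
const auth = await client.openclaw.guardrails.authorizeAction({
  agentId: 'agent-abc-123',
  action,
  context: {
    sessionId: 'sess-xyz',
    user: 'deploy-bot',
  },
  policy: 'production',
});

switch (auth.decision) {
  case 'allow':
    await executeCommand(action);
    break;
  case 'flag':
    // Hold for human review; the action does not run
    await notifyHuman(auth);
    break;
  case 'block':
    throw new Error(`Blocked: ${auth.reasons.join(', ')}`);
  default:
    // Treat any unexpected decision as a block
    throw new Error(`Unexpected decision: ${auth.decision}`);
}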
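For the audit trail, one option is to write a structured line per decision, keyed by `decisionId`. The sketch below assumes a response with `decision`, `reasons`, and `decisionId` as described above; the `AuditEntry` fields and the `toAuditLine` helper are hypothetical names chosen for this example.

```typescript
type Decision = 'allow' | 'flag' | 'block';

// Only the three fields the text mentions; the real response may carry more.
interface AuthResponse {
  decision: Decision;
  reasons: string[];
  decisionId: string;
}

// Hypothetical audit record shape.
interface AuditEntry {
  at: string;        // ISO 8601 timestamp
  agentId: string;
  actionType: string;
  decision: Decision;
  decisionId: string;
  reasons: string[];
  executed: boolean; // true only when the action was allowed to run
}

// Builds one JSON line suitable for appending to a log file or stream.
function toAuditLine(
  agentId: string,
  actionType: string,
  auth: AuthResponse,
  now: Date = new Date(),
): string {
  const entry: AuditEntry = {
    at: now.toISOString(),
    agentId,
    actionType,
    decision: auth.decision,
    decisionId: auth.decisionId,
    reasons: auth.reasons,
    executed: auth.decision === 'allow',
  };
  return JSON.stringify(entry);
}
```

Recording `executed` alongside the decision makes it easy to confirm later that flagged and blocked actions never ran.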

Step 3: Monitor and Tune

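Use the metrics endpoint to understand how your agents interact with your policy. A spike in blocked actions usually means one of two things: the agent is attempting things it shouldn't, in which case update its prompt, or the policy needs adjusting, whether that means relaxing a rule that is too strict or tightening one that lets risky variants through. Policy versions are tracked, so you can compare metrics across revisions and roll back if needed.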

metrics.ts (Node.js)

// Pull guardrail metrics for the last 30 days
const metrics = await client.openclaw.guardrails.getMetrics({
  days: 30,
});

console.log(metrics);
// {
//   totalDecisions: 14832,
//   allowed: 13210,
//   flagged: 1204,
//   blocked: 418,
//   policyVersion: 'v3',
//   topBlockedActions: [
//     { type: 'shell', pattern: 'rm -rf *', count: 87 },
//     { type: 'network', pattern: '*.internal.corp', count: 64 },
//   ],
// }
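Raw counts are easier to reason about as rates, especially when comparing policy versions. The helpers below are a small sketch that works on the count fields shown in the sample output; `summarize` and `blockRateDelta` are hypothetical names, not SDK functions.

```typescript
interface DecisionCounts {
  totalDecisions: number;
  allowed: number;
  flagged: number;
  blocked: number;
}

// Converts counts to percentages, rounded to one decimal place.
function summarize(m: DecisionCounts): { allowed: number; flagged: number; blocked: number } {
  if (m.totalDecisions === 0) {
    return { allowed: 0, flagged: 0, blocked: 0 };
  }
  const pct = (n: number) => Math.round((n / m.totalDecisions) * 1000) / 10;
  return { allowed: pct(m.allowed), flagged: pct(m.flagged), blocked: pct(m.blocked) };
}

// Change in block rate between two snapshots, in percentage points.
function blockRateDelta(before: DecisionCounts, after: DecisionCounts): number {
  return Math.round((summarize(after).blocked - summarize(before).blocked) * 10) / 10;
}

// Counts taken from the sample output above.
const sample: DecisionCounts = {
  totalDecisions: 14832,
  allowed: 13210,
  flagged: 1204,
  blocked: 418,
};

console.log(summarize(sample));
// { allowed: 89.1, flagged: 8.1, blocked: 2.8 }
```

Comparing `blockRateDelta` across two policy versions gives a quick signal for whether a revision made the policy noticeably stricter or looser in practice.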

What You Can Control

OpenClaw Guardrails policies let you define rules for four action categories:

  • shell: Block destructive commands (rm -rf, DROP TABLE), restrict commands to allowlisted patterns, require approval for elevated access.
  • file: Restrict read/write to specific directories, block access to sensitive paths like .env, credentials.json, or private keys.
  • network: Allowlist or blocklist domains, block requests to internal services, prevent data exfiltration to unknown endpoints.
  • env: Prevent agents from reading environment variables that contain secrets, API keys, or database credentials.
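To give a feel for what rules in each category boil down to, here is a rough local sketch with one check per category. The `Policy` shape, the example patterns, and the helper names are all assumptions for illustration; they do not reflect the schema you configure in the Vettly dashboard. The file check resolves paths lexically and does not follow symlinks.

```typescript
import * as path from 'node:path';

// Hypothetical rule shapes for the four categories.
interface Policy {
  shell: { blocked: RegExp[] };
  file: { root: string; sensitive: RegExp[] };
  network: { allowedHosts: string[] };
  env: { secretNames: RegExp[] };
}

const policy: Policy = {
  shell: { blocked: [/\brm\s+-rf\b/, /\bDROP\s+TABLE\b/i] },
  file: { root: '/workspace', sensitive: [/(^|\/)\.env$/, /credentials\.json$/, /\.pem$/] },
  network: { allowedHosts: ['api.example.com'] },
  env: { secretNames: [/SECRET|TOKEN|API_KEY|PASSWORD/i] },
};

function shellAllowed(command: string): boolean {
  return !policy.shell.blocked.some((re) => re.test(command));
}

function fileAllowed(requested: string): boolean {
  // Resolve first so "../" segments cannot escape the root.
  const resolved = path.posix.resolve(policy.file.root, requested);
  const inside =
    resolved === policy.file.root || resolved.startsWith(policy.file.root + '/');
  return inside && !policy.file.sensitive.some((re) => re.test(resolved));
}

function networkAllowed(url: string): boolean {
  try {
    return policy.network.allowedHosts.includes(new URL(url).hostname);
  } catch {
    // Unparseable URL: deny rather than guess.
    return false;
  }
}

function envAllowed(name: string): boolean {
  return !policy.env.secretNames.some((re) => re.test(name));
}

console.log(fileAllowed('src/index.ts'), fileAllowed('../etc/passwd'));
```

Name-based matching for secrets is only a heuristic; a variable like `DATABASE_URL` can carry credentials without matching any of the example patterns, so an explicit allowlist is the safer design where it is practical.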

Getting Started

OpenClaw Guardrails is available today on all paid Vettly plans. The setup takes three steps:

  1. Create a policy in the Vettly dashboard. Start with the "OpenClaw Default" preset, which blocks destructive shell commands, restricts file access to the working directory, and denies access to secrets.
  2. Integrate the SDK into your agent framework. Add the skill-vetting call to your install hook and the action-authorization call to your execution loop.
  3. Monitor the dashboard. Review the metrics, tune thresholds, and iterate on your policy as your agents evolve.

Ready to add guardrails to your AI agents?

OpenClaw Guardrails is live now. Start with the default preset and ship safer agents today.