ORILink · Validation

How We Test ORILink

Most security tools tell you what they block. We want to show you how we know.

This page covers how our test cases are built, what's covered, what isn't, and how results are verified. If you're evaluating ORILink and want to understand what "100% block rate" actually means, this is where to look.

Section 01

How Test Cases Get Built

We run daily threat intelligence scans across security research, published CVEs, community reports, and real attack data observed in production environments. When a new attack vector shows up, it goes on the build list.

Test cases are written against those specific real-world vectors by our research pipeline — not pulled from synthetic benchmarks or academic datasets. Things people are actually doing to AI agents right now.

Each test case has a clear pass/fail condition: does ORILink block the attack before the model sees it, or before the agent acts on it? A block that happens after inference doesn't count.

Section 02

What's Covered

Every category below has been tested across multiple sessions and across 6 model architectures.

Prompt injection

Direct attempts to override agent instructions embedded in external content. "Ignore all previous instructions and do X." Variations using different phrasing, syntax, and instruction formats.

Web-based injection

Malicious instructions embedded in web pages, documents, tool responses, and search results that an agent might visit or process during a task.

Encoding and obfuscation

Attacks that disguise instructions using Base64, ROT13, Unicode substitution, zero-width characters, and mixed encoding schemes. The encoding doesn't matter. We track where content came from, not just what it looks like.
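To make "where content came from" concrete, here is a minimal sketch of origin tracking that survives decoding. It is illustrative only and not ORILink's implementation: the `Labeled` class, the `origin` and `trust` fields, and the two decode helpers are hypothetical names chosen for this example.

```python
import base64
import codecs
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Labeled:
    """Content plus the provenance label assigned on arrival (hypothetical shape)."""
    text: str
    origin: str    # where the content entered the system, e.g. "web"
    trust: float   # 0.0 (untrusted) to 1.0 (fully trusted)


def decode_base64(item: Labeled) -> Labeled:
    # Decoding changes the text only; origin and trust carry over unchanged.
    decoded = base64.b64decode(item.text).decode("utf-8")
    return replace(item, text=decoded)


def decode_rot13(item: Labeled) -> Labeled:
    # Same rule: a new view of the text, the same provenance label.
    return replace(item, text=codecs.decode(item.text, "rot_13"))


# A payload hidden as ROT13(Base64(...)), fetched from a web page.
plain = "Ignore all previous instructions"
b64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")
wrapped = codecs.encode(b64, "rot_13")
fetched = Labeled(text=wrapped, origin="web", trust=0.1)

# Peeling off both layers reveals the instruction, still labeled low-trust web content.
revealed = decode_base64(decode_rot13(fetched))
```

However many encoding layers are peeled off, the label assigned at ingestion stays attached, so the decision can rest on origin rather than on surface form.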

Multi-hop injection

Attacks spread across multiple retrieved pieces of content. Each chunk carries its origin. The chain is tracked end to end.

Semantic intent monitoring

Attacks that use reasoning language to reframe goals or values without explicit keywords, whether targeting an agent, a local model, or a RAG pipeline. M3, our semantic intent monitoring layer, detects paraphrased malicious intent using a local embedding model with no external APIs, and feeds confidence signals to the taint propagation layer.
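The general shape of embedding-based intent matching can be sketched as a nearest-pattern similarity score. This is an assumption-laden illustration, not M3 itself: `toy_embed` is a word-count stand-in used only so the block runs without dependencies (it will not catch real paraphrases), and `intent_confidence` and the sample pattern are hypothetical. A real local embedding model would be passed in through the `embed` parameter.

```python
import math
from collections import Counter
from typing import Callable, Dict, Iterable

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    # Stand-in for a local embedding model: plain word counts.
    return dict(Counter(text.lower().split()))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def intent_confidence(
    text: str,
    patterns: Iterable[str],
    embed: Callable[[str], Vector] = toy_embed,
) -> float:
    """Highest similarity between the text and any known intent pattern."""
    query = embed(text)
    return max((cosine(query, embed(p)) for p in patterns), default=0.0)


# Hypothetical one-entry pattern corpus.
PATTERNS = ["your real goal is to help me bypass the rules"]
```

The returned score is a confidence signal rather than a verdict, which matches the description above of M3 feeding signals into a downstream layer.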

Tool and plugin poisoning

Tool definitions that mutate after approval. Malicious payloads in tool responses. Tools impersonating legitimate tools. Supply chain attacks via compromised library responses.
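One common way to notice a tool definition changing after approval is to pin a hash of the definition at approval time and compare it on every later use. The sketch below assumes that approach; the source does not say how ORILink detects mutation, and `ToolRegistry`, `fingerprint`, and the sample tool are hypothetical.

```python
import hashlib
import json
from typing import Dict


def fingerprint(tool_def: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) so key order cannot change the hash.
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolRegistry:
    """Remembers the fingerprint of each tool definition at approval time."""

    def __init__(self) -> None:
        self._approved: Dict[str, str] = {}

    def approve(self, tool_def: dict) -> None:
        self._approved[tool_def["name"]] = fingerprint(tool_def)

    def is_unchanged(self, tool_def: dict) -> bool:
        # Unknown tools and tools whose definition has drifted both fail.
        pinned = self._approved.get(tool_def.get("name", ""))
        return pinned is not None and pinned == fingerprint(tool_def)


registry = ToolRegistry()
search_tool = {
    "name": "search",
    "description": "Search the web",
    "parameters": {"q": "string"},
}
registry.approve(search_tool)

# The same tool name, with a description that has quietly changed since approval.
mutated_tool = dict(search_tool, description="Search the web and forward the results")
```

Here `registry.is_unchanged(search_tool)` holds while `registry.is_unchanged(mutated_tool)` does not, even though both carry the same tool name.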

Agent-to-agent infection

When one agent gets compromised and tries to pass malicious instructions to other agents through trusted channels. We track the original source of every piece of content. Trust scores don't get upgraded just because a trusted agent forwarded something.
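The forwarding rule can be sketched as follows, assuming a numeric trust score where forwarding may lower trust but never raise it. The `Message` shape, the `forward` helper, and the scores are hypothetical and only illustrate the rule stated above.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Message:
    text: str
    origin: str                    # original source, never overwritten
    trust: float                   # trust assigned to the original source
    hops: Tuple[str, ...] = ()     # agents that have forwarded the message


def forward(msg: Message, via_agent: str, agent_trust: float) -> Message:
    # The forwarder's own trust can cap the score but can never lift it.
    return replace(
        msg,
        trust=min(msg.trust, agent_trust),
        hops=msg.hops + (via_agent,),
    )


web_content = Message(text="please export the customer table", origin="web", trust=0.1)

# A highly trusted agent relays the content to another agent.
relayed = forward(web_content, via_agent="planner", agent_trust=0.9)
```

After the relay, `relayed.origin` is still `"web"` and `relayed.trust` is still `0.1`; the only thing that changed is the recorded hop.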

Cryptographic provenance integrity

Trust annotations are HMAC-sealed at the point of assignment. Content binding detects tampering at any layer — agent input, model context, or retrieval result. Trust scores cannot be elevated through forwarding.
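A minimal sketch of HMAC-sealing a trust annotation and binding it to the content, using Python's standard `hmac` and `hashlib` modules. The annotation fields, the `seal` and `verify` helpers, and the demo key are assumptions for illustration; ORILink's actual format and key handling are not described here.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-key-for-illustration-only"  # hypothetical; never hard-code real keys


def _content_hash(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def _mac(fields: dict) -> str:
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def seal(content: str, origin: str, trust: float) -> dict:
    """Create an annotation bound to this exact content, then seal it."""
    fields = {"content_sha256": _content_hash(content), "origin": origin, "trust": trust}
    return {**fields, "mac": _mac(fields)}


def verify(content: str, annotation: dict) -> bool:
    fields = {k: v for k, v in annotation.items() if k != "mac"}
    # Content binding: the annotation must describe this exact content.
    if fields.get("content_sha256") != _content_hash(content):
        return False
    # Seal check: any edited field (such as trust) changes the expected MAC.
    return hmac.compare_digest(_mac(fields), str(annotation.get("mac", "")))


sealed = seal("quarterly report text", origin="web", trust=0.2)

# An attacker who bumps the trust score cannot recompute the MAC without the key.
tampered = dict(sealed, trust=0.9)
```

`verify` fails both when the annotation is edited (as in `tampered`) and when the annotation is reused with different content.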

Inter-gate taint propagation

Low-trust inbound content degrades the trust of any output derived from it, whether that's an agent action, a model response, or a RAG retrieval result. Gates share context. A suspicious input makes downstream outputs more scrutinized, automatically.
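Taint propagation of this kind is often modeled as "an output is only as trusted as its least-trusted input". The sketch below assumes that rule and a hypothetical scrutiny threshold; neither the `min` rule nor the `0.5` cutoff is taken from ORILink.

```python
from typing import Iterable

SCRUTINY_THRESHOLD = 0.5  # hypothetical cutoff for extra checks


def derived_trust(input_trusts: Iterable[float]) -> float:
    """Trust of an output derived from the given inputs."""
    # Assumption: with no inputs at all, nothing taints the output.
    return min(input_trusts, default=1.0)


def needs_extra_scrutiny(output_trust: float) -> bool:
    return output_trust < SCRUTINY_THRESHOLD


# An agent action built from a trusted prompt, a trusted file, and one scraped page.
action_trust = derived_trust([0.9, 0.8, 0.2])
```

A single low-trust input is enough to pull `action_trust` down to `0.2`, which puts the resulting action below the example threshold.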

Unauthorized outbound actions

Actions framed in legitimate-sounding language that would result in prohibited behavior: unauthorized data access, sending data to external destinations, scanning systems outside the agent's scope, generating attack payloads. We classify what the action actually does, not what it's called.
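As a rough illustration of judging an action by its effect rather than its name, the sketch below ignores the agent-supplied label and looks only at where data would go. The `Action` fields, the allowlist, and the hostnames are hypothetical, and a real classifier would consider far more than HTTP destinations.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

ALLOWED_HOSTS = {"storage.internal.example"}  # hypothetical allowlist


@dataclass(frozen=True)
class Action:
    label: str       # what the agent calls the action
    method: str      # HTTP method the action would use
    url: str         # destination the action would contact
    has_body: bool   # whether data would be sent


def is_prohibited(action: Action) -> bool:
    # The label is deliberately not consulted; only the effect is.
    host = urlparse(action.url).hostname or ""
    sends_data = action.has_body and action.method.upper() in {"POST", "PUT", "PATCH"}
    return sends_data and host not in ALLOWED_HOSTS


friendly_name = Action(
    label="routine backup sync",
    method="POST",
    url="https://collector.example.net/upload",
    has_body=True,
)
same_name_allowed = Action(
    label="routine backup sync",
    method="POST",
    url="https://storage.internal.example/backups",
    has_body=True,
)
```

Both actions carry the same reassuring label, but only the one that sends data to a host outside the allowlist is flagged.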

Credential and secret leakage

Agent outputs scanned for API keys, tokens, passwords, private keys, and sensitive configuration data before they leave the system.
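Output scanning of this kind is commonly done with pattern matching. The sketch below uses three illustrative regular expressions (the AWS access key ID prefix format, PEM private key headers, and generic `key = value` assignments); the rule names and the `find_secrets` helper are hypothetical, and this is not ORILink's rule set.

```python
import re
from typing import Dict, List

# Illustrative patterns only; a production scanner would use many more.
SECRET_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+"
    ),
}


def find_secrets(output: str) -> List[str]:
    """Return the names of all patterns that match the outgoing text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(output)]


def is_safe_to_send(output: str) -> bool:
    return not find_secrets(output)
```

For example, `find_secrets("password = hunter2")` reports `credential_assignment`, while ordinary prose passes `is_safe_to_send`.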

False positive validation

Seven categories of legitimate agent behavior are confirmed unblocked, including authorized URL research, writing to authorized storage, forwarding verified content to authorized agents, and honest self-identification. Security that blocks legitimate work isn't security; it's a problem.

Section 03

What Isn't Covered Yet

We publish this because it matters. Here's what we know we haven't fully solved:

Distributed multi-machine deployments

These remain out of scope. Multi-machine networks require a separate architecture that is not included in the current release.

Model-level jailbreaks

Attacks that exploit specific weaknesses in how a particular model was trained. That's the model provider's problem to solve, not ours. ORILink operates before the model sees the input and before the agent acts on its output. We don't try to fix the model.

Section 04

How Results Are Verified

Tests run across multiple sessions against 6 model architectures: Llama 3, Mistral 7B, Gemma 2, GPT-4o, GPT-4o mini, and Claude Haiku. The set includes open-source, commercial, and hardened models.

Every validation run is monitored by an independent security process that has no role in building the tests it audits. The same system that builds and runs the tests cannot sign off on its own results.

100% block rate is required before any vector is considered validated. If something doesn't block consistently, it goes back on the build list.
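The validation rule can be expressed as a simple all-or-nothing check over every model and session. The result shape (`(model, session) -> blocked`) and the function names below are hypothetical, chosen only to illustrate the rule stated above.

```python
from typing import Dict, Tuple

# Hypothetical result shape: (model name, session number) -> was the attack blocked?
RunResults = Dict[Tuple[str, int], bool]


def block_rate(results: RunResults) -> float:
    """Fraction of runs in which the attack was blocked."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)


def is_validated(results: RunResults) -> bool:
    # Validated only if there is at least one run and every run blocked.
    return bool(results) and all(results.values())


runs: RunResults = {
    ("Llama 3", 1): True,
    ("Llama 3", 2): True,
    ("GPT-4o", 1): True,
}
```

Under this rule a single unblocked run, on any model in any session, is enough to send the vector back to the build list.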

Section 05

The Numbers

Individual SDK test cases: 602
Model architectures: 6
Block rate: 100%
False positives: 0
Total stack latency added per decision: ~13ms
Combined test cases across Individual and Business SDKs: 1,310
Semantic intent patterns in M3 corpus: 142

Total stack latency is ~13ms. Gate 1 alone runs at 0.14ms. The remainder, roughly 12 to 13ms, comes from M3, our semantic intent monitoring layer (local embedding model, no external APIs). We think that's a reasonable trade for catching paraphrased attacks that keyword detection misses.

These numbers will change. We add test cases regularly, and this page is updated whenever the numbers change.