Full Research Dashboard — Internal, Not for Distribution

Three-Layer Trust Architecture

Composable trust for AI agents and decentralized systems. Device liveness + citation reputation + identity anchoring. Cross-model reviewed by Claude, Grok (CTO), Gemini, Codex, and Devil's Advocate.

Last updated: March 19, 2026

Executive Summary

What this is

A three-layer trust system that compounds device liveness, citation reputation, and identity anchoring into a single composable stack. Each layer adds confidence independently. The composition is novel — every component exists in production, but nobody has assembled them.

Core insight: Don't solve trust with one mechanism. Layer three independent signals that are individually weak but collectively strong. An attacker must defeat all three simultaneously.

- 3: Independent trust layers
- 3/33: Parameters that matter
- 4B+: Devices with TEE globally
- $18.4B: Behavioral biometrics market (2033)
- $52.6B: AI agent market (2030)
- 0.997: Sybil resistance stability
Architecture

The three layers

Layer 3: Identity Anchor (Shyft Trust Channels) | KYC'd human → trust anchor → agents → sub-agents | Status: DEPLOYED
Layer 2: Citation Reputation (PageRank) | Agents cite agents → graph scoring | Key params: alpha, diversity, reciprocal penalty | Status: RESEARCHED
Layer 1: Device Liveness (TEE + On-Device AI) | Sensors → Secure Enclave → signed score → ZK proof | Status: COMPONENTS EXIST
Settlement: Stable Chain (ID 988) | USDT0 payments, machine-to-machine transactions

Attack / Defense Compound Matrix

| Attack Vector | L1 Alone | L1 + L2 | L1 + L2 + L3 |
| --- | --- | --- | --- |
| Phone farm (1000 devices) | Passes (each phone is real) | Fails (no citation graph between farm phones) | Fails (1000 KYC identities is expensive) |
| Spoofed sensors | May pass | Fails (no real citation history) | Fails |
| Stolen identity | Passes | Passes initially | Detected (revocation propagates) |
| Sophisticated sybil ring | Passes | Resisted (PageRank dampens rings) | Strongly resisted (KYC + PageRank) |
| Nation-state attacker | Passes | May pass | Resisted (cost per identity scales linearly) |
Layer 1 — Deep Dive

Device Liveness

What it proves (honestly): "A real physical device with functioning sensors exists and ran a specific computation at a specific time." What it does NOT prove: that the user is trustworthy, is who they claim, or that no other device represents them. Device liveness is an admission gate, not a reputation signal.

Technical Pipeline

// Device Liveness Pipeline
Smartphone Sensors (accelerometer, gyroscope, touch, typing)
  → Trusted Execution Environment (ARM TrustZone / Apple Secure Enclave)
  → On-Device AI Model (QVAC-class, 3.8-13B params, quantized)
  → Liveness Score (computed inside TEE, never exposed raw)
  → TEE-Signed Attestation (device key proves computation integrity)
  → ZK Proof ("I hold a TEE-signed score above threshold X")
  → On-Chain Attestation (proof published, no raw data)
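The tail of the pipeline can be sketched in miniature. This is an illustrative stand-in, not the shipped implementation: an HMAC over a JSON payload substitutes for the TEE's hardware-backed signature, and a boolean threshold check substitutes for the ZK proof, but the privacy property it demonstrates is the one the design targets (the verifier learns only "score above threshold", never the raw score).

```python
import hmac, hashlib, json, time

# Stand-in for the TEE device key; in the real pipeline this would be an
# attestation key held inside the Secure Enclave / TrustZone, never exported.
DEVICE_KEY = b"demo-device-key"

def sign_score(score: int, device_id: str, ts: float) -> dict:
    """Simulate the TEE signing a liveness score (HMAC stands in for a
    hardware-backed asymmetric attestation signature)."""
    payload = json.dumps({"score": score, "device": device_id, "ts": ts},
                         sort_keys=True).encode()
    sig = hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()
    return {"payload": payload, "sig": sig}

def verify_threshold(att: dict, threshold: int) -> bool:
    """Verifier checks integrity, then learns only 'score >= threshold'.
    The raw score never leaves this function (the ZK-proof analogue)."""
    expected = hmac.new(DEVICE_KEY, att["payload"], hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, att["sig"]):
        return False  # tampered payload or wrong device key
    return json.loads(att["payload"])["score"] >= threshold

att = sign_score(score=8200, device_id="phone-01", ts=time.time())
print(verify_threshold(att, threshold=7000))  # True: passes gate, score stays hidden
```

A real deployment would replace the HMAC with Apple App Attest / Play Integrity attestation and the threshold check with a succinct ZK proof verified on-chain.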

Design Decisions

Intermittent, Not Continuous

Sample for 30 seconds every 15-30 minutes, or on trigger events. Continuous monitoring drains the battery in 2-3 hours and hits thermal throttling within 10-15 minutes; intermittent sampling costs 3-5% of daily battery.

Hardware Attestation First, AI Second

Apple App Attest / Google Play Integrity as primary. On-device AI is supplementary. Hardware attestation is harder to fake than AI inference.

Do NOT Tokenize Base Score

If device liveness has direct token value, phone farms are profitable day one (used phones: $30-80). Device liveness = free mandatory admission gate. Only L2 reputation carries economic value.

Sensor Selection: Avoid Health Data

USE: accelerometer, gyroscope, touch, typing rhythm. DO NOT USE: HealthKit/Google Fit (HIPAA), GPS (location tracking), camera/microphone (wiretap laws). Motion + interaction = least regulated.

Known Attack Vectors

| Attack | Difficulty | Mitigation |
| --- | --- | --- |
| Rooted Android + sensor injection | Easy | Hardware attestation (Play Integrity) catches root |
| Emulator with synthetic telemetry | Easy | TEE remote attestation proves real hardware |
| GAN-generated sensor patterns | Medium | Temporal consistency checks across sessions |
| Mechanical device simulator | Hard | Cross-session behavioral drift detection |
| Real phone, multiple identities | Hard | L2 + L3 handle this, not L1 |

Open Research Questions

Population-scale uniqueness: Behavioral biometrics work at bank scale (millions). Do they maintain uniqueness at billion-person scale? BioCatch says yes for their use case. No published academic evidence for general population.

White-box adversarial risk: BioCatch is proprietary. Our system would be open-source. What happens when adversaries can generate synthetic data against known model weights?

Cross-device continuity: When a user changes phones, behavioral profile changes. How to migrate identity without re-verification?

ZK over ML at scale: Current zkML handles millions of params. Proving 13B model inference in ZK is years away. Practical path: TEE-computed score + ZK proof of the score (trusts the TEE).

Tether QVAC (Enabler)

- 13B: Params on iPhone 16
- 3.8B: Params on Android flagships
- 90%: Less memory vs full precision
- Open source (BitNet LoRA)

DA Fatal Flaws Identified

#1 "Proof of phone" ≠ "proof of human" — Category error if overclaimed. Must be honest about what L1 actually proves.

#2 Phone farms destroy economics — $30-80/device. If base layer is tokenized, farms are profitable immediately.

#3 Sensor spoofing trivial on rooted Android — Xposed Framework injects arbitrary sensor data. Hardware attestation is the mitigation.

#4 Privacy-utility tradeoff — Useful score = information leakage; private score = useless. Threshold ZK proofs are the compromise.

Layer 2 — Deep Dive

Citation Reputation (PageRank)

Ablation Results (Phase 6 Research)

1000-trial Bayesian optimization (Optuna TPE sampler) across 33 parameters. Composite score: 0.779 → 0.967.

- 61.5%: Alpha (damping factor)
- 16.2%: Citation diversity
- 5.6%: Composite worst weight
- 22: Params with ZERO impact

Conclusion: 3 parameters + trust channels = sufficient. The sybil detection suite (carousel, star, chain, cluster detectors) showed zero impact because the identity layer already provides deterrence.

Production Configuration

// Validated parameters
PageRank alpha: ~0.85 // Standard — synthetic suggested 0.6 but real data showed 0.85 outperforms
Citation diversity: enabled // Entropy threshold ~0.55, penalty ~0.14
Reciprocal penalty: ~0.82 // Dampens mutual-citation gaming
Composite worst-case weight: ~0.23 // Floors catastrophic underperformance

// Everything else: monitor but don't penalize
// Collect data on carousel, star, chain, burst patterns for human review
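A minimal sketch of the validated configuration in code. This is an illustrative power-iteration PageRank, not the production scorer: the exact semantics of the reciprocal penalty (here, a flat down-weight on mutual-citation edges, normalized by out-degree so the penalty removes influence rather than renormalizing away) are an assumption.

```python
from collections import defaultdict

def trust_pagerank(edges, alpha=0.85, reciprocal_penalty=0.82, iters=50):
    """Toy citation-reputation PageRank. edges: list of (citer, cited)."""
    edge_set = set(edges)
    out = defaultdict(dict)
    nodes = set()
    for a, b in edges:
        nodes.update((a, b))
        # Mutual citations (A->B and B->A) carry reduced weight,
        # dampening reciprocal-citation gaming.
        out[a][b] = reciprocal_penalty if (b, a) in edge_set else 1.0
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1 - alpha) / n for v in nodes}
        for a, outs in out.items():
            for b, w in outs.items():
                # Divide by out-degree, not weighted total, so penalized
                # edges genuinely transfer less rank.
                new[b] += alpha * rank[a] * w / len(outs)
        rank = new
    total = sum(rank.values())          # renormalize the leaked mass
    return {v: r / total for v, r in rank.items()}

edges = [("a", "b"), ("b", "a"), ("c", "b"), ("d", "b")]
scores = trust_pagerank(edges)
print(max(scores, key=scores.get))  # "b": most independent citations
```

The citation-diversity entropy term (threshold ~0.55, penalty ~0.14) would be applied as a post-hoc multiplier on each node's score; it is omitted here for brevity.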

Proposed Experiments (Ranked by ROI)

| # | Experiment | Question Answered | Priority |
| --- | --- | --- | --- |
| 1 | Product utility threshold | Binary (trusted/untrusted) or continuous scores? If binary, scoring precision is over-engineering | Highest |
| 2 | Real data validation (Bitcoin Alpha/OTC, Elliptic++) | Does 3-param PageRank work on adversarial real-world data? | High |
| 3 | Adversarial red team | Cheapest attack that succeeds against simplified system? | Medium |
| 4 | Same-anchor citation dampening | Should intra-organization citations carry less weight? | Medium |
| 5 | Temporal burst detection | Citation volume spikes as signal (not penalty) | Low |
Layer 3 — Deep Dive

Identity Anchor (Shyft Trust Channels)

What Exists (Deployed)

Human KYC through Shyft trust anchors. Trust anchor → agents → sub-agents hierarchy. Each agent traces to a KYC'd human. Trust channel revocation propagates through hierarchy.

Live Infrastructure

Detection vs. Deterrence

Trust channels provide sybil deterrence via identity cost. KYC is expensive to repeat. Adding explicit staking bonds double-taxes legitimate users without deterring well-funded attackers. The identity layer shifts L2's role from "detect sybils" to "measure competence."

The reframe: We were building sophisticated algorithmic detection for attacks that an economic mechanism (KYC cost) already makes irrational. This was the pivotal insight from Phase 4 of the research — detection vs. deterrence are fundamentally different solutions to the same problem.
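The deterrence argument reduces to back-of-envelope arithmetic. All figures below are illustrative assumptions (the used-phone price comes from the phone-farm analysis in Layer 1; the cost of a fraudulent KYC identity is a placeholder), but they show why adding the identity layer changes the attacker's calculus by an order of magnitude.

```python
# Illustrative deterrence arithmetic -- figures are assumptions, not measured.
used_phone = 50          # $30-80 per device (phone-farm analysis, midpoint)
kyc_per_identity = 500   # assumed cost of one fraudulent KYC identity
farm_size = 1000

cost_l1_only = farm_size * used_phone                      # defeat L1 alone
cost_l1_l3 = farm_size * (used_phone + kyc_per_identity)   # defeat L1 + L3
print(cost_l1_only, cost_l1_l3)  # 50000 550000
```

Under these assumptions the identity anchor multiplies the attack budget 11x before the attacker even confronts L2's citation-graph requirements.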

Distilled Analytics Connection (2018)

Shyft formally partnered with Distilled Analytics (David Shrier, Alex Pentland MIT) in 2018. Pentland's "Social Physics": behavioral metadata predicts identity/creditworthiness without content. The intellectual foundation exists and has Shyft's name on it. BioCatch's $1.3B acquisition commercially validated this approach.

Validation

Real Data Results

Tested against Stanford SNAP's Bitcoin Alpha and OTC trust networks — real adversarial data with ground-truth labels.

- ~0.49: Spearman correlation (alpha=0.85)
- ~0.396: Spearman correlation (alpha=0.6)
- 0.997: Sybil resistance stability
- 0.4: Minimum threshold

Critical finding: Alpha=0.6 (from synthetic optimization) performed WORSE than standard PageRank (0.85) on real data. Spearman ~0.396 vs ~0.49. Synthetic optimization produced parameters that hurt real-world performance.

Validates Anti-Pattern #2 (benchmark overfitting) and #5 (synthetic bubble) from methodology framework.

Good news: Sybil resistance was excellent (0.997 stability). The architecture holds — it just needs parameter re-tuning with real data, not synthetic. Standard alpha (0.85) passes the 0.4 threshold.
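The validation check itself is simple: rank-correlate PageRank output against ground-truth trust labels. Below is a self-contained sketch with a hand-rolled Spearman correlation and toy stand-in data; the real run uses the Bitcoin Alpha / OTC networks in place of the toy lists.

```python
def spearman(xs, ys):
    """Spearman rank correlation with average ranks for ties."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        i = 0
        while i < len(order):
            j = i
            while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
                j += 1                      # group tied values
            avg = (i + j) / 2 + 1           # average rank for the tie group
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy stand-in for (PageRank score, ground-truth trust label) pairs.
pagerank_scores = [0.31, 0.12, 0.27, 0.05, 0.18]
ground_truth    = [9, 2, 7, 1, 5]
print(round(spearman(pagerank_scores, ground_truth), 3))  # 1.0 (monotone toy data)
```

On real data the kill-gate question is simply whether this value clears 0.4; alpha=0.85 does (~0.49) and alpha=0.6 does not (~0.396).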

Status: Needs Re-Tuning

Next steps: Re-run with alpha=0.85-0.95, incorporate negative ratings from the Bitcoin datasets, calibrate sybil penalties against known adversarial nodes. Elliptic++ (822K nodes) is the next validation dataset.

Methodology

Research Process

6-phase research process with multi-model adversarial debate. Each phase builds on the last, and each phase's conclusion was different from, and better than, the one before.

Phase 1: Optimize
1000-trial Optuna on 33 params. Score: 0.779 → 0.967. Quantitative baseline established.

Phase 2: Challenge
DA + Grok + Gemini + Codex challenged evaluation function, assumptions, framing.

Phase 3: Simplify
Ablation: 33 params → 3. Killed 22 zero-impact parameters. Complexity reduction.

Phase 4: Reframe
Detection → deterrence. Identity layer already solves the algorithmic problem. Question changed.

Phase 5: Map Unknowns
Population uniqueness, cross-domain applicability, device liveness — mapped what we don't know.

Phase 6: Ship & Learn
Concrete research path, decision gates, GTM. From theory to executable plan.

5-Level Questioning Framework

| Level | Question | Most Teams Stop At |
| --- | --- | --- |
| 1 | Is the optimization working? (Are numbers going up?) | ← Here |
| 2 | Is it testing the right thing? (Does the score mean anything?) | |
| 3 | Is optimization even the right approach? (Ablation first) | |
| 4 | Is the problem correctly framed? (Detection vs. deterrence) | |
| 5 | What don't we know we don't know? (Adjacent-field challenges) | ← Real insights |

Anti-Patterns Discovered

#1 Optimizing Noise — 22/33 parameters had zero impact. Optimization improved the score by fitting to noise in synthetic data.

Signal: If removing a parameter doesn't change results, you were optimizing noise.

#2 Benchmark Overfitting — 0.967 on synthetic with 0% false positives. Too good. Scores >0.95 on synthetic benchmarks should trigger suspicion, not celebration.

Signal: Near-perfect scores. Ask "why does our benchmark think this is meaningful?"

#3 Complexity Creep — 19 → 33 parameters felt like thoroughness. More knobs = more noise to overfit, more interactions to debug.

Signal: Parameter count growing without corresponding growth in distinct behaviors.

#4 Detection vs. Deterrence Confusion — Building a better mousetrap when you should make it unprofitable to be a mouse. Economic deterrence changes the game.

Signal: Designing sophisticated detection for attacks a deposit/stake mechanism would make irrational.

#5 Synthetic Bubble — Optimizing against your own assumptions is circular reasoning. Confirmed: alpha=0.6 from synthetic hurt real-world performance.

Signal: High confidence in robustness based entirely on tests designed by the same team.

Key Principles

Ablation Before Optimization

Know what matters first. If you don't know which 3 of 33 drive results, you're optimizing blind.
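The principle can be expressed as a one-screen loop: score the system with each knob disabled and keep only the knobs whose removal moves the score. The `evaluate` function and configuration below are illustrative stand-ins, not the actual Optuna objective.

```python
def ablate(evaluate, config, tol=1e-3):
    """Return the parameters whose removal changes the score by > tol."""
    base = evaluate(config)
    keep = []
    for name in config:
        trial = dict(config, **{name: 0.0})   # disable one parameter
        if abs(evaluate(trial) - base) > tol:
            keep.append(name)                  # this knob actually matters
    return keep

# Toy evaluator: only alpha and diversity move the score.
config = {"alpha": 0.85, "diversity": 0.14, "burst_penalty": 0.3}
evaluate = lambda c: 0.5 * c["alpha"] + 0.2 * c["diversity"]
print(ablate(evaluate, config))  # ['alpha', 'diversity']
```

Running this kind of loop before any optimization is what collapsed 33 parameters to 3 in Phase 3.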

Cross-Model Debate

No single AI validates its own conclusions. 2+ models adversarially. Different blind spots → better intersection.

First-Principles Escape

When optimization plateaus, question the frame. Breakthrough came from reframing, not better numbers.

Economics > Algorithms

In adversarial systems, changing incentives beats improving detection. Deterrence is structural; detection is arms race.

Existing Infrastructure Audit

Inventory what exists before building new. Shyft trust channels were the biggest defense — just unlabeled.

Product Lens

"What decisions does this score inform?" If product needs coarse ranking, sub-percentage optimization is waste.

Market Analysis

Competitive Landscape

| System | What It Does | Revenue / Traction | Limitation |
| --- | --- | --- | --- |
| BioCatch | Behavioral biometrics, 280+ banks | $160M ARR, $1.3B acquisition | Centralized, bank-only, no on-chain |
| Worldcoin | ZK proof of personhood | 12M iris scans | Requires Orbs, banned in 9 countries |
| World ID AgentKit | ZK proofs linking agents to humans | Launched Mar 17, 2026 | Walled garden (Orb required) |
| Human Passport | Multi-signal sybil resistance | 2M users, 35M credentials, $430M protected | Single-layer aggregation |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations | No KYC anchor |
| t54 Labs | AI agent trust | $5M raised (Ripple, Franklin Templeton) | No on-device AI, no full stack |
| SpruceID | Government compliance (DID) | $41.8M raised, CA DMV + DHS contracts | Enterprise compliance only |
| Civic | Government ID verification | Established | Requires government ID (KYC, not pseudonymous) |

Market Segments by Temperature

HOT: DeFi Sybil Resistance
$170M+ lost to sybil attacks. Human Passport's $430M protection shows demand. Unanimous beachhead across all models.

HOT: Enterprise AI Compliance
EU AI Act Article 12, SEC enforcement. $200K-$2M ARR per enterprise. Slower sales cycle but higher value.

WARM: AI Agent Frameworks
CrewAI, LangChain plugins. 46% CAGR. Good distribution channel but not the destination.

WARM: Financial Services
BioCatch validates the market ($1.3B). But saturated and requires compliance infrastructure.

COLD: Gaming / Social
Slower adoption cycle. Not enough pain to drive fast integration.

COLD: Cross-Marketplace Portability
Data-sharing incentives don't exist. Phantom pain — sounds good, nobody buys it.

Competitive Intelligence

Tempo / MPP Analysis

Two competing payment stacks forming right now. Neither has a reputation layer. That's our gap.

| Layer | Stack A (Stripe/Tempo) | Stack B (Coinbase/World) | Our Position |
| --- | --- | --- | --- |
| Identity | No solution | World ID (Orb biometrics) | Layer 3 (Shyft KYC) |
| Trust/Reputation | No solution | No solution | Layer 2 (PageRank) |
| Device Liveness | No solution | World ID AgentKit | Layer 1 (QVAC + TEE) |
| Payments | MPP (HTTP 402) | x402 Protocol | Not our layer |

Tempo Details

Tempo (Stripe-backed)

Payments blockchain, $500M raised, $5B valuation. 100K+ TPS target. EVM-compatible (Reth). Launched mainnet March 18, 2026. Design partners: Visa, Deutsche Bank, Shopify, Nubank, Revolut, OpenAI, Standard Chartered.

MPP (Machine Payments Protocol)

Open protocol for machine-to-machine payments. HTTP 402 status code. Three primitives: Challenges, Credentials, Receipts. Supports stablecoins, cards, Lightning. "Sessions" for micropayment streaming.

Strategic position: MPP answers "how does an agent pay?" We answer "should you trust that agent?" These are complementary. MPP's Challenges primitive could include a trust score check — before getting payment credentials, pass a reputation threshold. We become infrastructure that Tempo integrates.
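The integration point can be sketched as a gate in front of credential issuance. Everything here is hypothetical: the function names, the response shape, and the idea that an MPP Challenge could carry a `trust_threshold` field are assumptions about how such a hook might look, not MPP's actual API.

```python
def handle_payment_request(agent_id, get_trust_score, threshold=6000):
    """Gate an HTTP 402 challenge on a 0-10000 trust score (illustrative)."""
    score = get_trust_score(agent_id)
    if score < threshold:
        # Challenge fails: no payment Credentials are issued.
        return {"status": 402, "challenge": "trust_threshold",
                "required": threshold}
    # Challenge passes: issue a short-lived credential for the session.
    return {"status": 200, "credential": f"cred-{agent_id}"}

print(handle_payment_request("agent-42", lambda _: 7100)["status"])  # 200
```

The point of the sketch: the reputation check slots in before MPP's Credentials primitive, so the payment rail never has to understand how the score was computed.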

Risk: MPP's design partners include OpenAI, Deutsche Bank, Visa. If Stripe builds identity/trust into MPP directly, we compete against the platform. Mitigation: speed — integrate before they build it.

Execution

Research Roadmap

Each phase has an explicit decision gate. If the gate fails, stop or pivot. No sunk-cost continuation.

Phase 1: Validate Device Liveness

Weeks 1-4 (parallel with Phase 2)

iOS Secure Enclave PoC: accelerometer + touch data → signed score. Benchmark: latency, battery, score stability, basic spoofing resistance. Measure score variance across 10 phones.

Kill gate: Score separation between human and simulated use < 2 std dev → stop, focus on L2+L3 only.

Phase 2: Real Data Validation

Weeks 1-4 (parallel with Phase 1)

Bitcoin Alpha/OTC + Elliptic++ (822K nodes). PageRank with alpha=0.85 on real adversarial data. Red team: cheapest attack that changes rankings.

Kill gate: Ranking correlation < 0.4 → PageRank approach needs fundamental rework.

Phase 3: ZK Composition Prototype

Weeks 5-12

TEE score → ZK threshold proof → on-chain verification. Measure: proof generation time on mobile, verification gas cost, proof size.

Kill gate: Proof generation > 60s on flagship → UX unacceptable. Gas > $1/attestation on L2 → economics broken.

Phase 4: Agent Framework Plugin

Weeks 5-8 (parallel with Phase 3)

CrewAI / LangChain plugin: get_trust_score(agent_id). Open source. Measure installs, API calls, developer feedback.

Kill gate: < 50 installs in 30 days with no organic usage → demand too weak, pivot to compliance.
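A plausible shape for the plugin's single entry point, composing the three layers into the 0-10000 score the GTM section describes. The weights, helper callables, and rounding are illustrative assumptions, not shipped code: L1 acts as a hard gate, L2 and L3 blend into the score.

```python
def get_trust_score(agent_id, liveness_gate, pagerank_score, anchor_verified,
                    w_reputation=0.7, w_anchor=0.3):
    """Return a 0-10000 trust score; 0 if the device liveness gate fails."""
    if not liveness_gate(agent_id):       # L1: free mandatory admission gate
        return 0
    rep = pagerank_score(agent_id)        # L2: 0.0-1.0 citation reputation
    anchor = 1.0 if anchor_verified(agent_id) else 0.0  # L3: KYC anchor
    return int(round(10000 * (w_reputation * rep + w_anchor * anchor)))

score = get_trust_score("agent-7",
                        liveness_gate=lambda a: True,
                        pagerank_score=lambda a: 0.42,
                        anchor_verified=lambda a: True)
print(score)  # 5940
```

Making L1 a gate rather than a weighted term reflects the "do not tokenize the base score" decision: a live device earns admission, never points.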

Phase 5: Beachhead Case Study

Weeks 9-16

One production integration. Free integration for public case study. Measure fraud reduction, quality improvement, false positive rate. Publish results on-chain.

Kill gate: The "Stripe moment." Measurable value → scale. No value → hypothesis is wrong.

Grok CTO Verdict

GO, with conditions. Extend to 12 weeks with buffers. Add weekly tech reviews and contingency budget. Dedicate full-time engineer to QVAC PoC from Day 1. Push for paid pilots over free case studies. Audit team for ZK/ML expertise. Biggest risk: Layer 2 PageRank may not work on real data without major rework.

Go-to-Market

Distribution Strategy

Phase 1 (Months 1-2): Plugin Play
CrewAI / LangChain plugin. One function call, returns 0-10000. Open source, zero friction. Claims position before competitors.

Phase 2 (Months 2-3): Beachhead Integration
ONE protocol losing money to bad agents. Free integration for public case study. On-chain verifiable results as proof.

Phase 3 (Months 3-6): Premium Layer
Free: device liveness gate + basic PageRank (1000 queries/month). Paid: KYC-verified identity scoring via Shyft. The combination = moat.

Phase 4 (Month 6+): Compliance Wedge
EU AI Act Article 12 compliance artifacts. SEC/FINRA recordkeeping. Enterprise SaaS ($24K+/year).

Revenue Model

| Tier | Price | What You Get |
| --- | --- | --- |
| Free | $0 | Device liveness gate + basic PageRank score, 1000 queries/month |
| Growth | $200-2K/month | Full scoring API + citation analytics + webhook alerts |
| Enterprise | $24K+/year | Compliance exports + audit trails + SLA + custom integration |

What NOT to Build

A "trust platform" — Vision, not product. Nobody buys a platform; they buy a solution to a problem.

A token-incentivized network — Requires capital you don't have. Phone farm economics destroy token models.

Cross-marketplace reputation portability — Data-sharing incentives don't exist. Phantom pain.

Precedents That Worked

| Company | Strategy | Result |
| --- | --- | --- |
| BioCatch | Behavioral biometrics for bank fraud | $1.3B acquisition, $160M ARR |
| SpruceID | Government compliance contracts | $41.8M raised, CA DMV + DHS |
| Gitcoin Passport | Sybil resistance for grants/airdrops | 2M users, 35M credentials |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations |
| Alchemy | Free-tier infrastructure API | 70% of top Ethereum apps |
Privacy & Regulatory

The Surveillance Test

"If a hostile government controlled this system, could they use it for mass surveillance?" If yes, redesign.

| Design Decision | Passes Test? | Reason |
| --- | --- | --- |
| TEE processing, no raw data export | Yes | Data never leaves secure enclave |
| Threshold ZK proofs, no score exposure | Yes | Only "above/below threshold" is public |
| User-initiated intermittent liveness | Yes | No background surveillance |
| On-chain attestations, no central DB | Yes | No single entity holds all scores |
| Continuous background sensor collection | FAILS | Removed from design |
| Score history stored centrally | FAILS | Removed from design |

Regulatory Mapping

| Data Type | Regulation | Risk | Decision |
| --- | --- | --- | --- |
| Accelerometer / gyroscope | Low regulation (motion data) | Low | USE |
| Touch / typing patterns | BIPA (Illinois), GDPR Art. 9 | Medium | USE (on-device only, threshold output) |
| GPS / location | GDPR, CCPA, state privacy laws | High | DO NOT USE |
| Health (HealthKit/Fit) | HIPAA, GDPR Art. 9 | Very High | DO NOT USE |
| Camera / microphone | Wiretap laws, BIPA, GDPR | Very High | DO NOT USE |
Foundations

Academic & Industry References

| Source | Relevance |
| --- | --- |
| Pentland, A. "Social Physics" (MIT) | Behavioral metadata as identity/creditworthiness predictor |
| Page & Brin, "PageRank" | Citation graph authority scoring — core of Layer 2 |
| PHC Paper (OpenAI + MIT + a16z, 2024) | "Personhood Credentials" — design space validation |
| Tramèr et al. (2016) | Model stealing attacks — informs white-box adversarial risk |
| EdenDID (Springer, 2025) | Edge computing + on-chain trust |
| Buterin (2023) | "What do I think about biometric proof of personhood?" |
| Distilled Analytics + Shyft (2018) | Behavioral telemetry + blockchain identity partnership |
| BioCatch ($1.3B acquisition) | Commercial validation of behavioral biometrics at scale |

Key Market Statistics

- $18.4B: Behavioral biometrics market (2033)
- $52.6B: AI agent market (2030)
- 46%: AI agent CAGR
- 45%: Fortune 500 piloting agents
- $160M: BioCatch ARR
- 280+: Banks using BioCatch