Composable trust for AI agents and decentralized systems. Device liveness + citation reputation + identity anchoring. Cross-model reviewed by Claude, Grok (CTO), Gemini, Codex, and Devil's Advocate.
Last updated: March 19, 2026
A three-layer trust system that compounds device liveness, citation reputation, and identity anchoring into a single composable stack. Each layer adds confidence independently. The composition is novel — every component exists in production, but nobody has assembled them.
Core insight: Don't solve trust with one mechanism. Layer three independent signals that are individually weak but collectively strong. An attacker must defeat all three simultaneously.
| Attack Vector | L1 Alone | L1 + L2 | L1 + L2 + L3 |
|---|---|---|---|
| Phone farm (1000 devices) | Passes (each phone is real) | Fails (no citation graph between farm phones) | Fails (acquiring 1000 KYC'd identities is prohibitively expensive) |
| Spoofed sensors | May pass | Fails (no real citation history) | Fails |
| Stolen identity | Passes | Passes initially | Detected (revocation propagates) |
| Sophisticated sybil ring | Passes | Resisted (PageRank dampens rings) | Strongly resisted (KYC + PageRank) |
| Nation-state attacker | Passes | May pass | Resisted (cost per identity scales linearly) |
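The compounding logic in the table can be sketched numerically. A minimal model, with hypothetical per-layer evasion rates chosen only to illustrate how weak signals multiply into a strong stack:

```python
# Illustrative model of layered trust: an attacker must defeat every
# layer simultaneously, so per-layer evasion probabilities multiply.
# All numbers are hypothetical, chosen only to show the compounding.

def attack_success(evasion_probs):
    """Probability an attack passes all layers at once."""
    p = 1.0
    for prob in evasion_probs:
        p *= prob
    return p

# Each layer alone is weak (evaded 30-50% of the time), but the
# full stack is strong.
l1, l2, l3 = 0.5, 0.4, 0.3  # hypothetical per-layer evasion rates
print(round(attack_success([l1]), 2))          # 0.5  -- L1 alone
print(round(attack_success([l1, l2]), 2))      # 0.2  -- L1 + L2
print(round(attack_success([l1, l2, l3]), 2))  # 0.06 -- full stack
```

The same structure explains the table's rightmost column: even a capable attacker who reliably beats one layer rarely beats all three in the same session.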
What it proves (honestly): "A real physical device with functioning sensors exists and ran a specific computation at a specific time." What it does NOT prove: that the user is trustworthy, is who they claim, or that no other device represents them. Device liveness is an admission gate, not a reputation signal.
30 seconds every 15-30 minutes, or event-triggered. Continuous monitoring drains the battery in 2-3 hours and thermally throttles the device in 10-15 minutes; intermittent sampling costs 3-5% of battery per day.
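The intermittent cadence can be sketched as a simple scheduler. The window length and gap bounds come from the numbers above; randomizing within the gap is an illustrative design choice (an unpredictable schedule is harder for a farm to pre-stage), not a stated requirement:

```python
# Toy scheduler for intermittent liveness: a 30-second sampling
# window every 15-30 minutes. The jittered interval is an assumption
# added for illustration, not part of the original spec.
import random

WINDOW_S = 30
MIN_GAP_S, MAX_GAP_S = 15 * 60, 30 * 60

def next_window(now_s: float, rng=random) -> tuple[float, float]:
    """Return (start, end) of the next liveness sampling window."""
    start = now_s + rng.uniform(MIN_GAP_S, MAX_GAP_S)
    return start, start + WINDOW_S

start, end = next_window(0.0)
assert MIN_GAP_S <= start <= MAX_GAP_S and end - start == WINDOW_S
```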
Apple App Attest / Google Play Integrity as primary. On-device AI is supplementary. Hardware attestation is harder to fake than AI inference.
If device liveness has direct token value, phone farms are profitable day one (used phones: $30-80). Device liveness = free mandatory admission gate. Only L2 reputation carries economic value.
USE: accelerometer, gyroscope, touch, typing rhythm. DO NOT USE: HealthKit/Google Fit (HIPAA), GPS (location tracking), camera/microphone (wiretap laws). Motion + interaction = least regulated.
| Attack | Difficulty | Mitigation |
|---|---|---|
| Rooted Android + sensor injection | Easy | Hardware attestation (Play Integrity) catches root |
| Emulator with synthetic telemetry | Easy | TEE remote attestation proves real hardware |
| GAN-generated sensor patterns | Medium | Temporal consistency checks across sessions |
| Mechanical device simulator | Hard | Cross-session behavioral drift detection |
| Real phone, multiple identities | Hard | L2 + L3 handle this, not L1 |
Population-scale uniqueness: Behavioral biometrics work at bank scale (millions). Do they maintain uniqueness at billion-person scale? BioCatch says yes for their use case. No published academic evidence for general population.
White-box adversarial risk: BioCatch is proprietary. Our system would be open-source. What happens when adversaries can generate synthetic data against known model weights?
Cross-device continuity: When a user changes phones, behavioral profile changes. How to migrate identity without re-verification?
ZK over ML at scale: Current zkML handles millions of params. Proving 13B model inference in ZK is years away. Practical path: TEE-computed score + ZK proof of the score (trusts the TEE).
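The practical path above can be made concrete with a toy attestation. Here HMAC stands in for the TEE's attestation key, and the enclave emits only a signed above/below-threshold bit, never the raw score. This is a sketch of the trust model, not real attestation: production TEEs use hardware-backed asymmetric signatures (a symmetric HMAC verifier could forge tags), and the eventual goal is a ZK proof of the score.

```python
# Sketch: score computed "inside" a TEE; only a signed threshold
# verdict leaves the enclave. HMAC is a stand-in for the attestation
# key -- real deployments use asymmetric hardware-backed signing.
import hmac, hashlib

TEE_KEY = b"enclave-held-key"  # would never leave the TEE in practice

def attest_threshold(score: int, threshold: int) -> tuple[bool, bytes]:
    """Inside the TEE: emit only the boolean verdict, signed."""
    passed = score >= threshold
    msg = f"threshold={threshold};passed={passed}".encode()
    return passed, hmac.new(TEE_KEY, msg, hashlib.sha256).digest()

def verify(passed: bool, threshold: int, tag: bytes) -> bool:
    """Relying party: check the attestation without learning the score."""
    msg = f"threshold={threshold};passed={passed}".encode()
    expected = hmac.new(TEE_KEY, msg, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

passed, tag = attest_threshold(score=8700, threshold=5000)
print(passed, verify(passed, 5000, tag))  # True True
```

The key property is the information flow: the verifier learns "above 5000," not "8700," which is exactly the privacy-utility compromise the threshold-proof design targets.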
#1 "Proof of phone" ≠ "proof of human" — Category error if overclaimed. Must be honest about what L1 actually proves.
#2 Phone farms destroy economics — $30-80/device. If base layer is tokenized, farms are profitable immediately.
#3 Sensor spoofing trivial on rooted Android — Xposed Framework injects arbitrary sensor data. Hardware attestation is the mitigation.
#4 Privacy-utility tradeoff — Useful score = information leakage; private score = useless. Threshold ZK proofs are the compromise.
1000-trial Bayesian optimization (Optuna TPE sampler) across 33 parameters. Composite score: 0.779 → 0.967.
Conclusion: 3 parameters + trust channels = sufficient. The sybil detection suite (carousel, star, chain, cluster detectors) showed zero impact because the identity layer already provides deterrence.
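The ablation logic behind that conclusion can be sketched in a few lines: score the full configuration, re-score with each parameter removed, and flag any parameter whose removal leaves the score unchanged. The `evaluate` function and parameter names here are toy stand-ins, not the real suite:

```python
# Minimal ablation sketch: parameters whose removal does not move the
# score were only fitting noise. `evaluate` is a toy stand-in in which
# only three parameters actually matter.

def evaluate(params: dict) -> float:
    return (0.5 * params.get("alpha", 0)
            + 0.3 * params.get("decay", 0)
            + 0.2 * params.get("channel_weight", 0))

def ablate(params: dict, tol: float = 1e-9) -> list:
    baseline = evaluate(params)
    dead = []
    for name in params:
        trimmed = {k: v for k, v in params.items() if k != name}
        if abs(evaluate(trimmed) - baseline) < tol:
            dead.append(name)  # zero impact: this knob was noise
    return dead

config = {"alpha": 0.85, "decay": 0.5, "channel_weight": 1.0,
          "carousel_gain": 0.2, "star_gain": 0.1}  # last two unused
print(ablate(config))  # ['carousel_gain', 'star_gain']
```

Run before optimization, this check is what separates "33 tuned parameters" from "3 parameters that matter."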
| # | Experiment | Question Answered | Priority |
|---|---|---|---|
| 1 | Product utility threshold | Binary (trusted/untrusted) or continuous scores? If binary, scoring precision is over-engineering | Highest |
| 2 | Real data validation (Bitcoin Alpha/OTC, Elliptic++) | Does 3-param PageRank work on adversarial real-world data? | High |
| 3 | Adversarial red team | Cheapest attack that succeeds against simplified system? | Medium |
| 4 | Same-anchor citation dampening | Should intra-organization citations carry less weight? | Medium |
| 5 | Temporal burst detection | Citation volume spikes as signal (not penalty) | Low |
Human KYC through Shyft trust anchors. Trust anchor → agents → sub-agents hierarchy. Each agent traces to a KYC'd human. Trust channel revocation propagates through hierarchy.
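The anchor-to-agent hierarchy and revocation propagation can be sketched as a tree walk. Class and method names are illustrative, not the Shyft API:

```python
# Sketch of the trust hierarchy: every agent traces to a KYC'd
# anchor, and revoking any node invalidates its entire subtree.

class Node:
    def __init__(self, node_id, parent=None):
        self.id = node_id
        self.parent = parent
        self.children = []
        self.revoked = False
        if parent:
            parent.children.append(self)

    def revoke(self):
        """Revocation propagates down through the hierarchy."""
        self.revoked = True
        for child in self.children:
            child.revoke()

    def is_trusted(self):
        """Trusted only if neither self nor any ancestor is revoked."""
        node = self
        while node:
            if node.revoked:
                return False
            node = node.parent
        return True

anchor = Node("kyc-anchor")           # KYC'd human
agent = Node("agent-1", anchor)
sub = Node("sub-agent-1a", agent)

print(sub.is_trusted())   # True
anchor.revoke()
print(sub.is_trusted())   # False
```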
Live infrastructure: Trust channels provide sybil deterrence via identity cost. KYC is expensive to repeat. Adding explicit staking bonds would double-tax legitimate users without deterring well-funded attackers. The identity layer shifts L2's role from "detect sybils" to "measure competence."
The reframe: We were building sophisticated algorithmic detection for attacks that an economic mechanism (KYC cost) already makes irrational. This was the pivotal insight from Phase 4 of the research — detection vs. deterrence are fundamentally different solutions to the same problem.
Shyft formally partnered with Distilled Analytics (David Shrier, Alex Pentland MIT) in 2018. Pentland's "Social Physics": behavioral metadata predicts identity/creditworthiness without content. The intellectual foundation exists and has Shyft's name on it. BioCatch's $1.3B acquisition commercially validated this approach.
Tested against Stanford SNAP's Bitcoin Alpha and OTC trust networks — real adversarial data with ground-truth labels.
Critical finding: Alpha=0.6 (from synthetic optimization) performed WORSE than standard PageRank (0.85) on real data. Spearman ~0.396 vs ~0.49. Synthetic optimization produced parameters that hurt real-world performance.
Validates Anti-Pattern #2 (benchmark overfitting) and #5 (synthetic bubble) from methodology framework.
Good news: Sybil resistance was excellent (0.997 stability). The architecture holds — it just needs parameter re-tuning with real data, not synthetic. Standard alpha (0.85) passes the 0.4 threshold.
Next steps: Re-run with alpha=0.85-0.95, incorporate negative ratings from the Bitcoin datasets, calibrate sybil penalties against known adversarial nodes. Elliptic++ (822K nodes) is the next validation dataset.
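To make the alpha sensitivity concrete, here is a pure-Python power-iteration PageRank with the damping factor exposed, run at both the synthetic-tuned 0.6 and the standard 0.85. The graph is a four-node toy, not the Bitcoin Alpha data:

```python
# Power-iteration PageRank with an explicit alpha (damping) knob.
# Toy graph only -- the real validation uses Bitcoin Alpha/OTC.

def pagerank(edges, alpha=0.85, iters=100):
    nodes = {n for e in edges for n in e}
    out = {n: [dst for src, dst in edges if src == n] for n in nodes}
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = alpha * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:  # dangling node: spread its rank uniformly
                for dst in nodes:
                    nxt[dst] += alpha * rank[n] / len(nodes)
        rank = nxt
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
for a in (0.6, 0.85):
    r = pagerank(edges, alpha=a)
    print(a, {n: round(v, 3) for n, v in sorted(r.items())})
```

Lower alpha means more teleportation and flatter scores; higher alpha lets the citation structure dominate, which is the behavior the real-data re-tuning has to calibrate.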
6-phase research process with multi-model adversarial debate. Each phase builds on the previous one, and each phase's conclusion was different from, and better than, the one before it.
1000-trial Optuna on 33 params. Score: 0.779 → 0.967. Quantitative baseline established.
DA + Grok + Gemini + Codex challenged evaluation function, assumptions, framing.
Ablation: 33 params → 3. Killed 22 zero-impact parameters. Complexity reduction.
Detection → deterrence. Identity layer already solves the algorithmic problem. Question changed.
Population uniqueness, cross-domain applicability, device liveness — mapped what we don't know.
Concrete research path, decision gates, GTM. From theory to executable plan.
| Level | Question | Most Teams Stop At |
|---|---|---|
| 1 | Is the optimization working? (Are numbers going up?) | ← Here |
| 2 | Is it testing the right thing? (Does the score mean anything?) | |
| 3 | Is optimization even the right approach? (Ablation first) | |
| 4 | Is the problem correctly framed? (Detection vs. deterrence) | |
| 5 | What don't we know we don't know? (Adjacent-field challenges) | ← Real insights |
#1 Optimizing Noise — 22/33 parameters had zero impact. Optimization improved the score by fitting to noise in synthetic data.
Signal: If removing a parameter doesn't change results, you were optimizing noise.
#2 Benchmark Overfitting — 0.967 on synthetic with 0% false positives. Too good. Scores >0.95 on synthetic benchmarks should trigger suspicion, not celebration.
Signal: Near-perfect scores. Ask "why does our benchmark think this is meaningful?"
#3 Complexity Creep — 19 → 33 parameters felt like thoroughness. More knobs = more noise to overfit, more interactions to debug.
Signal: Parameter count growing without corresponding growth in distinct behaviors.
#4 Detection vs. Deterrence Confusion — Building a better mousetrap when you should make it unprofitable to be a mouse. Economic deterrence changes the game.
Signal: Designing sophisticated detection for attacks a deposit/stake mechanism would make irrational.
#5 Synthetic Bubble — Optimizing against your own assumptions is circular reasoning. Confirmed: alpha=0.6 from synthetic hurt real-world performance.
Signal: High confidence in robustness based entirely on tests designed by the same team.
Know what matters first. If you don't know which 3 of 33 drive results, you're optimizing blind.
No single AI validates its own conclusions; review with 2+ models adversarially. Different blind spots → better intersection.
When optimization plateaus, question the frame. Breakthrough came from reframing, not better numbers.
In adversarial systems, changing incentives beats improving detection. Deterrence is structural; detection is arms race.
Inventory what exists before building new. Shyft trust channels were the biggest defense — just unlabeled.
"What decisions does this score inform?" If product needs coarse ranking, sub-percentage optimization is waste.
| System | What It Does | Revenue / Traction | Limitation |
|---|---|---|---|
| BioCatch | Behavioral biometrics, 280+ banks | $160M ARR, $1.3B acquisition | Centralized, bank-only, no on-chain |
| Worldcoin | ZK proof of personhood | 12M iris scans | Requires Orbs, banned in 9 countries |
| World ID AgentKit | ZK proofs linking agents to humans | Launched Mar 17, 2026 | Walled garden (Orb required) |
| Human Passport | Multi-signal sybil resistance | 2M users, 35M credentials, $430M protected | Single-layer aggregation |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations | No KYC anchor |
| t54 Labs | AI agent trust | $5M raised (Ripple, Franklin Templeton) | No on-device AI, no full stack |
| SpruceID | Government compliance (DID) | $41.8M raised, CA DMV + DHS contracts | Enterprise compliance only |
| Civic | Government ID verification | Established | Requires government ID (KYC, not pseudonymous) |
$170M+ lost to sybil attacks. Human Passport's $430M protection shows demand. Unanimous beachhead across all models.
EU AI Act Article 12, SEC enforcement. $200K-$2M ARR per enterprise. Slower sales cycle but higher value.
CrewAI, LangChain plugins. 46% CAGR. Good distribution channel but not the destination.
BioCatch validates the market ($1.3B). But saturated and requires compliance infrastructure.
Slower adoption cycle. Not enough pain to drive fast integration.
Data-sharing incentives don't exist. Phantom pain — sounds good, nobody buys it.
Two competing payment stacks forming right now. Neither has a reputation layer. That's our gap.
| Layer | Stack A (Stripe/Tempo) | Stack B (Coinbase/World) | Our Position |
|---|---|---|---|
| Identity | No solution | World ID (Orb biometrics) | Layer 3 (Shyft KYC) |
| Trust/Reputation | No solution | No solution | Layer 2 (PageRank) |
| Device Liveness | No solution | World ID AgentKit | Layer 1 (QVAC + TEE) |
| Payments | MPP (HTTP 402) | x402 Protocol | Not our layer |
Tempo: payments blockchain, $500M raised, $5B valuation. 100K+ TPS target. EVM-compatible (Reth). Launched mainnet March 18, 2026. Design partners: Visa, Deutsche Bank, Shopify, Nubank, Revolut, OpenAI, Standard Chartered.
MPP: open protocol for machine-to-machine payments over the HTTP 402 status code. Three primitives: Challenges, Credentials, Receipts. Supports stablecoins, cards, Lightning. "Sessions" enable micropayment streaming.
Strategic position: MPP answers "how does an agent pay?" We answer "should you trust that agent?" These are complementary. MPP's Challenges primitive could include a trust score check — before getting payment credentials, pass a reputation threshold. We become infrastructure that Tempo integrates.
Risk: MPP's design partners include OpenAI, Deutsche Bank, Visa. If Stripe builds identity/trust into MPP directly, we compete against the platform. Mitigation: speed — integrate before they build it.
Each phase has an explicit decision gate. If the gate fails, stop or pivot. No sunk-cost continuation.
iOS Secure Enclave PoC: accelerometer + touch data → signed score. Benchmark: latency, battery, score stability, basic spoofing resistance. Measure score variance across 10 phones.
Bitcoin Alpha/OTC + Elliptic++ (822K nodes). PageRank with alpha=0.85 on real adversarial data. Red team: cheapest attack that changes rankings.
TEE score → ZK threshold proof → on-chain verification. Measure: proof generation time on mobile, verification gas cost, proof size.
CrewAI / LangChain plugin: get_trust_score(agent_id). Open source. Measure installs, API calls, developer feedback.
One production integration. Free integration for public case study. Measure fraud reduction, quality improvement, false positive rate. Publish results on-chain.
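The Phase 4 developer surface, `get_trust_score(agent_id)` returning 0-10000, can be sketched locally. The layer weights and signal shape are illustrative assumptions; the real call would hit the scoring API rather than compute in-process:

```python
# Sketch of the one-call integration: get_trust_score returns a
# 0-10000 composite. Weights and the signals dict are hypothetical
# stand-ins for the real scoring backend.

LAYER_WEIGHTS = {"liveness": 0.2, "reputation": 0.5, "identity": 0.3}

def get_trust_score(agent_id: str, signals: dict) -> int:
    """Combine per-layer signals (each in 0.0-1.0) into 0-10000."""
    composite = sum(LAYER_WEIGHTS[k] * signals.get(k, 0.0)
                    for k in LAYER_WEIGHTS)
    return round(composite * 10000)

# An agent that passes liveness, has mid reputation, and a KYC anchor:
score = get_trust_score("agent-42", {"liveness": 1.0,
                                     "reputation": 0.6,
                                     "identity": 1.0})
print(score)  # 8000
```

The integer 0-10000 range keeps the API friendly to on-chain consumers that avoid floating point; a binary trusted/untrusted wrapper (Experiment 1) would simply threshold this value.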
GO, with conditions. Extend to 12 weeks with buffers. Add weekly tech reviews and contingency budget. Dedicate full-time engineer to QVAC PoC from Day 1. Push for paid pilots over free case studies. Audit team for ZK/ML expertise. Biggest risk: Layer 2 PageRank may not work on real data without major rework.
CrewAI / LangChain plugin. One function call, returns 0-10000. Open source, zero friction. Claims position before competitors.
ONE protocol losing money to bad agents. Free integration for public case study. On-chain verifiable results as proof.
Free: device liveness gate + basic PageRank (1000 queries/month). Paid: KYC-verified identity scoring via Shyft. The combination = moat.
EU AI Act Article 12 compliance artifacts. SEC/FINRA recordkeeping. Enterprise SaaS ($24K+/year).
| Tier | Price | What You Get |
|---|---|---|
| Free | $0 | Device liveness gate + basic PageRank score, 1000 queries/month |
| Growth | $200-2K/month | Full scoring API + citation analytics + webhook alerts |
| Enterprise | $24K+/year | Compliance exports + audit trails + SLA + custom integration |
A "trust platform" — Vision, not product. Nobody buys a platform; they buy a solution to a problem.
A token-incentivized network — Requires capital you don't have. Phone farm economics destroy token models.
Cross-marketplace reputation portability — Data-sharing incentives don't exist. Phantom pain.
| Company | Strategy | Result |
|---|---|---|
| BioCatch | Behavioral biometrics for bank fraud | $1.3B acquisition, $160M ARR |
| SpruceID | Government compliance contracts | $41.8M raised, CA DMV + DHS |
| Gitcoin Passport | Sybil resistance for grants/airdrops | 2M users, 35M credentials |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations |
| Alchemy | Free-tier infrastructure API | 70% of top Ethereum apps |
"If a hostile government controlled this system, could they use it for mass surveillance?" If yes, redesign.
| Design Decision | Passes Test? | Reason |
|---|---|---|
| TEE processing, no raw data export | Yes | Data never leaves secure enclave |
| Threshold ZK proofs, no score exposure | Yes | Only "above/below threshold" is public |
| User-initiated intermittent liveness | Yes | No background surveillance |
| On-chain attestations, no central DB | Yes | No single entity holds all scores |
| Continuous background sensor collection | FAILS | Removed from design |
| Score history stored centrally | FAILS | Removed from design |
| Data Type | Regulation | Risk | Decision |
|---|---|---|---|
| Accelerometer / gyroscope | Low regulation (motion data) | Low | USE |
| Touch / typing patterns | BIPA (Illinois), GDPR Art. 9 | Medium | USE (on-device only, threshold output) |
| GPS / location | GDPR, CCPA, state privacy laws | High | DO NOT USE |
| Health (HealthKit/Fit) | HIPAA, GDPR Art. 9 | Very High | DO NOT USE |
| Camera / microphone | Wiretap laws, BIPA, GDPR | Very High | DO NOT USE |
| Source | Relevance |
|---|---|
| Pentland, A. "Social Physics" (MIT) | Behavioral metadata as identity/creditworthiness predictor |
| Page & Brin, "PageRank" | Citation graph authority scoring — core of Layer 2 |
| PHC Paper (OpenAI + MIT + a16z, 2024) | "Personhood Credentials" — design space validation |
| Tramèr et al. (2016) | Model stealing attacks — informs white-box adversarial risk |
| EdenDID (Springer, 2025) | Edge computing + on-chain trust |
| Buterin (2023) | "What do I think about biometric proof of personhood?" |
| Distilled Analytics + Shyft (2018) | Behavioral telemetry + blockchain identity partnership |
| BioCatch ($1.3B acquisition) | Commercial validation of behavioral biometrics at scale |