Composable trust for AI agents and decentralized systems. Device liveness + citation reputation + identity anchoring. Cross-model reviewed by Claude, Grok (CTO), Gemini, Codex, and Devil's Advocate.
Last updated: March 19, 2026
A three-layer trust system that compounds device liveness, citation reputation, and identity anchoring into a single composable stack. Each layer adds confidence independently. The composition is novel — every component exists in production, but nobody has assembled them.
Core insight: Don't solve trust with one mechanism. Layer three independent signals that are individually weak but collectively strong. An attacker must defeat all three simultaneously.
| Attack Vector | L1 Alone | L1 + L2 | L1 + L2 + L3 |
|---|---|---|---|
| Phone farm (1000 devices) | Passes (each phone is real) | Fails (no citation graph between farm phones) | Fails (acquiring 1000 KYC'd identities is prohibitively expensive) |
| Spoofed sensors | May pass | Fails (no real citation history) | Fails |
| Stolen identity | Passes | Passes initially | Detected (revocation propagates) |
| Sophisticated sybil ring | Passes | Resisted (PageRank dampens rings) | Strongly resisted (KYC + PageRank) |
| Nation-state attacker | Passes | May pass | Resisted (cost per identity scales linearly) |
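The compounding logic in the table can be sketched numerically. A minimal model, with hypothetical per-layer evasion rates chosen only to illustrate how weak signals multiply into a strong stack:

```python
# Illustrative model of layered trust: an attacker must defeat every
# layer simultaneously, so per-layer evasion probabilities multiply.
# All numbers are hypothetical, chosen only to show the compounding.

def attack_success(evasion_probs):
    """Probability an attack passes all layers at once."""
    p = 1.0
    for prob in evasion_probs:
        p *= prob
    return p

# Each layer alone is weak (evaded 30-50% of the time), but the
# full stack is strong.
l1, l2, l3 = 0.5, 0.4, 0.3  # hypothetical per-layer evasion rates
print(round(attack_success([l1]), 2))          # 0.5  -- L1 alone
print(round(attack_success([l1, l2]), 2))      # 0.2  -- L1 + L2
print(round(attack_success([l1, l2, l3]), 2))  # 0.06 -- full stack
```

The same structure explains the table's rightmost column: even a capable attacker who reliably beats one layer rarely beats all three in the same session.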
What it proves (honestly): "A real physical device with functioning sensors exists and ran a specific computation at a specific time." What it does NOT prove: that the user is trustworthy, is who they claim, or that no other device represents them. Device liveness is an admission gate, not a reputation signal.
30 seconds every 15-30 minutes, or event-triggered. Continuous monitoring drains the battery in 2-3 hours and thermally throttles the device in 10-15 minutes; intermittent sampling costs 3-5% of battery per day.
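The intermittent cadence can be sketched as a simple scheduler. The window length and gap bounds come from the numbers above; randomizing within the gap is an illustrative design choice (an unpredictable schedule is harder for a farm to pre-stage), not a stated requirement:

```python
# Toy scheduler for intermittent liveness: a 30-second sampling
# window every 15-30 minutes. The jittered interval is an assumption
# added for illustration, not part of the original spec.
import random

WINDOW_S = 30
MIN_GAP_S, MAX_GAP_S = 15 * 60, 30 * 60

def next_window(now_s: float, rng=random) -> tuple[float, float]:
    """Return (start, end) of the next liveness sampling window."""
    start = now_s + rng.uniform(MIN_GAP_S, MAX_GAP_S)
    return start, start + WINDOW_S

start, end = next_window(0.0)
assert MIN_GAP_S <= start <= MAX_GAP_S and end - start == WINDOW_S
```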
Apple App Attest / Google Play Integrity as primary. On-device AI is supplementary. Hardware attestation is harder to fake than AI inference.
If device liveness has direct token value, phone farms are profitable day one (used phones: $30-80). Device liveness = free mandatory admission gate. Only L2 reputation carries economic value.
USE: accelerometer, gyroscope, touch, typing rhythm. DO NOT USE: HealthKit/Google Fit (HIPAA), GPS (location tracking), camera/microphone (wiretap laws). Motion + interaction = least regulated.
| Attack | Difficulty | Mitigation |
|---|---|---|
| Rooted Android + sensor injection | Easy | Hardware attestation (Play Integrity) catches root |
| Emulator with synthetic telemetry | Easy | TEE remote attestation proves real hardware |
| GAN-generated sensor patterns | Medium | Temporal consistency checks across sessions |
| Mechanical device simulator | Hard | Cross-session behavioral drift detection |
| Real phone, multiple identities | Hard | L2 + L3 handle this, not L1 |
Population-scale uniqueness: Behavioral biometrics work at bank scale (millions). Do they maintain uniqueness at billion-person scale? BioCatch says yes for their use case. No published academic evidence for general population.
White-box adversarial risk: BioCatch is proprietary. Our system would be open-source. What happens when adversaries can generate synthetic data against known model weights?
Cross-device continuity: When a user changes phones, behavioral profile changes. How to migrate identity without re-verification?
ZK over ML at scale: Current zkML handles millions of params. Proving 13B model inference in ZK is years away. Practical path: TEE-computed score + ZK proof of the score (trusts the TEE).
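The practical path above can be made concrete with a toy attestation. Here HMAC stands in for the TEE's attestation key, and the enclave emits only a signed above/below-threshold bit, never the raw score. This is a sketch of the trust model, not real attestation: production TEEs use hardware-backed asymmetric signatures (a symmetric HMAC verifier could forge tags), and the eventual goal is a ZK proof of the score.

```python
# Sketch: score computed "inside" a TEE; only a signed threshold
# verdict leaves the enclave. HMAC is a stand-in for the attestation
# key -- real deployments use asymmetric hardware-backed signing.
import hmac, hashlib

TEE_KEY = b"enclave-held-key"  # would never leave the TEE in practice

def attest_threshold(score: int, threshold: int) -> tuple[bool, bytes]:
    """Inside the TEE: emit only the boolean verdict, signed."""
    passed = score >= threshold
    msg = f"threshold={threshold};passed={passed}".encode()
    return passed, hmac.new(TEE_KEY, msg, hashlib.sha256).digest()

def verify(passed: bool, threshold: int, tag: bytes) -> bool:
    """Relying party: check the attestation without learning the score."""
    msg = f"threshold={threshold};passed={passed}".encode()
    expected = hmac.new(TEE_KEY, msg, hashlib.sha256).digest()
    return hmac.compare_digest(tag, expected)

passed, tag = attest_threshold(score=8700, threshold=5000)
print(passed, verify(passed, 5000, tag))  # True True
```

The key property is the information flow: the verifier learns "above 5000," not "8700," which is exactly the privacy-utility compromise the threshold-proof design targets.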
#1 "Proof of phone" ≠ "proof of human" — Category error if overclaimed. Must be honest about what L1 actually proves.
#2 Phone farms destroy economics — $30-80/device. If base layer is tokenized, farms are profitable immediately.
#3 Sensor spoofing trivial on rooted Android — Xposed Framework injects arbitrary sensor data. Hardware attestation is the mitigation.
#4 Privacy-utility tradeoff — Useful score = information leakage; private score = useless. Threshold ZK proofs are the compromise.
1000-trial Bayesian optimization (Optuna TPE sampler) across 33 parameters. Composite score: 0.779 → 0.967.
Conclusion: 3 parameters + trust channels = sufficient. The sybil detection suite (carousel, star, chain, cluster detectors) showed zero impact because the identity layer already provides deterrence.
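The ablation logic behind that conclusion can be sketched in a few lines: score the full configuration, re-score with each parameter removed, and flag any parameter whose removal leaves the score unchanged. The `evaluate` function and parameter names here are toy stand-ins, not the real suite:

```python
# Minimal ablation sketch: parameters whose removal does not move the
# score were only fitting noise. `evaluate` is a toy stand-in in which
# only three parameters actually matter.

def evaluate(params: dict) -> float:
    return (0.5 * params.get("alpha", 0)
            + 0.3 * params.get("decay", 0)
            + 0.2 * params.get("channel_weight", 0))

def ablate(params: dict, tol: float = 1e-9) -> list:
    baseline = evaluate(params)
    dead = []
    for name in params:
        trimmed = {k: v for k, v in params.items() if k != name}
        if abs(evaluate(trimmed) - baseline) < tol:
            dead.append(name)  # zero impact: this knob was noise
    return dead

config = {"alpha": 0.85, "decay": 0.5, "channel_weight": 1.0,
          "carousel_gain": 0.2, "star_gain": 0.1}  # last two unused
print(ablate(config))  # ['carousel_gain', 'star_gain']
```

Run before optimization, this check is what separates "33 tuned parameters" from "3 parameters that matter."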
| # | Experiment | Question Answered | Priority |
|---|---|---|---|
| 1 | Product utility threshold | Binary (trusted/untrusted) or continuous scores? If binary, scoring precision is over-engineering | Highest |
| 2 | Real data validation (Bitcoin Alpha/OTC, Elliptic++) | Does 3-param PageRank work on adversarial real-world data? | High |
| 3 | Adversarial red team | Cheapest attack that succeeds against simplified system? | Medium |
| 4 | Same-anchor citation dampening | Should intra-organization citations carry less weight? | Medium |
| 5 | Temporal burst detection | Citation volume spikes as signal (not penalty) | Low |
Human KYC through Shyft trust anchors. Trust anchor → agents → sub-agents hierarchy. Each agent traces to a KYC'd human. Trust channel revocation propagates through hierarchy.
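The anchor-to-agent hierarchy and revocation propagation can be sketched as a tree walk. Class and method names are illustrative, not the Shyft API:

```python
# Sketch of the trust hierarchy: every agent traces to a KYC'd
# anchor, and revoking any node invalidates its entire subtree.

class Node:
    def __init__(self, node_id, parent=None):
        self.id = node_id
        self.parent = parent
        self.children = []
        self.revoked = False
        if parent:
            parent.children.append(self)

    def revoke(self):
        """Revocation propagates down through the hierarchy."""
        self.revoked = True
        for child in self.children:
            child.revoke()

    def is_trusted(self):
        """Trusted only if neither self nor any ancestor is revoked."""
        node = self
        while node:
            if node.revoked:
                return False
            node = node.parent
        return True

anchor = Node("kyc-anchor")           # KYC'd human
agent = Node("agent-1", anchor)
sub = Node("sub-agent-1a", agent)

print(sub.is_trusted())   # True
anchor.revoke()
print(sub.is_trusted())   # False
```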
Live infrastructure: Trust channels provide sybil deterrence via identity cost. KYC is expensive to repeat. Adding explicit staking bonds would double-tax legitimate users without deterring well-funded attackers. The identity layer shifts L2's role from "detect sybils" to "measure competence."
The reframe: We were building sophisticated algorithmic detection for attacks that an economic mechanism (KYC cost) already makes irrational. This was the pivotal insight from Phase 4 of the research — detection vs. deterrence are fundamentally different solutions to the same problem.
Shyft formally partnered with Distilled Analytics (David Shrier, Alex Pentland MIT) in 2018. Pentland's "Social Physics": behavioral metadata predicts identity/creditworthiness without content. The intellectual foundation exists and has Shyft's name on it. BioCatch's $1.3B acquisition commercially validated this approach.
Tested against Stanford SNAP's Bitcoin Alpha and OTC trust networks — real adversarial data with ground-truth labels.
Critical finding: Alpha=0.6 (from synthetic optimization) performed WORSE than standard PageRank (0.85) on real data. Spearman ~0.396 vs ~0.49. Synthetic optimization produced parameters that hurt real-world performance.
Validates Anti-Pattern #2 (benchmark overfitting) and #5 (synthetic bubble) from methodology framework.
Good news: Sybil resistance was excellent (0.997 stability). The architecture holds — it just needs parameter re-tuning with real data, not synthetic. Standard alpha (0.85) passes the 0.4 threshold.
Next steps: Re-run with alpha=0.85-0.95, incorporate negative ratings from the Bitcoin datasets, calibrate sybil penalties against known adversarial nodes. Elliptic++ (822K nodes) is the next validation dataset.
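To make the alpha sensitivity concrete, here is a pure-Python power-iteration PageRank with the damping factor exposed, run at both the synthetic-tuned 0.6 and the standard 0.85. The graph is a four-node toy, not the Bitcoin Alpha data:

```python
# Power-iteration PageRank with an explicit alpha (damping) knob.
# Toy graph only -- the real validation uses Bitcoin Alpha/OTC.

def pagerank(edges, alpha=0.85, iters=100):
    nodes = {n for e in edges for n in e}
    out = {n: [dst for src, dst in edges if src == n] for n in nodes}
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        nxt = {n: (1 - alpha) / len(nodes) for n in nodes}
        for n in nodes:
            if out[n]:
                share = alpha * rank[n] / len(out[n])
                for dst in out[n]:
                    nxt[dst] += share
            else:  # dangling node: spread its rank uniformly
                for dst in nodes:
                    nxt[dst] += alpha * rank[n] / len(nodes)
        rank = nxt
    return rank

edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
for a in (0.6, 0.85):
    r = pagerank(edges, alpha=a)
    print(a, {n: round(v, 3) for n, v in sorted(r.items())})
```

Lower alpha means more teleportation and flatter scores; higher alpha lets the citation structure dominate, which is the behavior the real-data re-tuning has to calibrate.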
6-phase research process with multi-model adversarial debate. Each phase builds on the previous one, and each phase's conclusion was different from, and better than, the one before it.
1000-trial Optuna on 33 params. Score: 0.779 → 0.967. Quantitative baseline established.
DA + Grok + Gemini + Codex challenged evaluation function, assumptions, framing.
Ablation: 33 params → 3. Killed 22 zero-impact parameters. Complexity reduction.
Detection → deterrence. Identity layer already solves the algorithmic problem. Question changed.
Population uniqueness, cross-domain applicability, device liveness — mapped what we don't know.
Concrete research path, decision gates, GTM. From theory to executable plan.
| Level | Question | Most Teams Stop At |
|---|---|---|
| 1 | Is the optimization working? (Are numbers going up?) | ← Here |
| 2 | Is it testing the right thing? (Does the score mean anything?) | |
| 3 | Is optimization even the right approach? (Ablation first) | |
| 4 | Is the problem correctly framed? (Detection vs. deterrence) | |
| 5 | What don't we know we don't know? (Adjacent-field challenges) | ← Real insights |
#1 Optimizing Noise — 22/33 parameters had zero impact. Optimization improved the score by fitting to noise in synthetic data.
Signal: If removing a parameter doesn't change results, you were optimizing noise.
#2 Benchmark Overfitting — 0.967 on synthetic with 0% false positives. Too good. Scores >0.95 on synthetic benchmarks should trigger suspicion, not celebration.
Signal: Near-perfect scores. Ask "why does our benchmark think this is meaningful?"
#3 Complexity Creep — 19 → 33 parameters felt like thoroughness. More knobs = more noise to overfit, more interactions to debug.
Signal: Parameter count growing without corresponding growth in distinct behaviors.
#4 Detection vs. Deterrence Confusion — Building a better mousetrap when you should make it unprofitable to be a mouse. Economic deterrence changes the game.
Signal: Designing sophisticated detection for attacks a deposit/stake mechanism would make irrational.
#5 Synthetic Bubble — Optimizing against your own assumptions is circular reasoning. Confirmed: alpha=0.6 from synthetic hurt real-world performance.
Signal: High confidence in robustness based entirely on tests designed by the same team.
Know what matters first. If you don't know which 3 of 33 drive results, you're optimizing blind.
No single AI validates its own conclusions; review with 2+ models adversarially. Different blind spots → better intersection.
When optimization plateaus, question the frame. Breakthrough came from reframing, not better numbers.
In adversarial systems, changing incentives beats improving detection. Deterrence is structural; detection is arms race.
Inventory what exists before building new. Shyft trust channels were the biggest defense — just unlabeled.
"What decisions does this score inform?" If product needs coarse ranking, sub-percentage optimization is waste.
| System | What It Does | Revenue / Traction | Limitation |
|---|---|---|---|
| BioCatch | Behavioral biometrics, 280+ banks | $160M ARR, $1.3B acquisition | Centralized, bank-only, no on-chain |
| Worldcoin | ZK proof of personhood | 12M iris scans | Requires Orbs, banned in 9 countries |
| World ID AgentKit | ZK proofs linking agents to humans | Launched Mar 17, 2026 | Walled garden (Orb required) |
| Human Passport | Multi-signal sybil resistance | 2M users, 35M credentials, $430M protected | Single-layer aggregation |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations | No KYC anchor |
| t54 Labs | AI agent trust | $5M raised (Ripple, Franklin Templeton) | No on-device AI, no full stack |
| SpruceID | Government compliance (DID) | $41.8M raised, CA DMV + DHS contracts | Enterprise compliance only |
| Civic | Government ID verification | Established | Requires government ID (KYC, not pseudonymous) |
$170M+ lost to sybil attacks. Human Passport's $430M protection shows demand. Unanimous beachhead across all models.
EU AI Act Article 12, SEC enforcement. $200K-$2M ARR per enterprise. Slower sales cycle but higher value.
CrewAI, LangChain plugins. 46% CAGR. Good distribution channel but not the destination.
BioCatch validates the market ($1.3B). But saturated and requires compliance infrastructure.
Slower adoption cycle. Not enough pain to drive fast integration.
Data-sharing incentives don't exist. Phantom pain — sounds good, nobody buys it.
Two competing payment stacks forming right now. Neither has a reputation layer. That's our gap.
| Layer | Stack A (Stripe/Tempo) | Stack B (Coinbase/World) | Our Position |
|---|---|---|---|
| Identity | No solution | World ID (Orb biometrics) | Layer 3 (Shyft KYC) |
| Trust/Reputation | No solution | No solution | Layer 2 (PageRank) |
| Device Liveness | No solution | World ID AgentKit | Layer 1 (QVAC + TEE) |
| Payments | MPP (HTTP 402) | x402 Protocol | Not our layer |
Tempo: payments blockchain, $500M raised, $5B valuation. 100K+ TPS target. EVM-compatible (Reth). Launched mainnet March 18, 2026. Design partners: Visa, Deutsche Bank, Shopify, Nubank, Revolut, OpenAI, Standard Chartered.
MPP: open protocol for machine-to-machine payments over the HTTP 402 status code. Three primitives: Challenges, Credentials, Receipts. Supports stablecoins, cards, Lightning. "Sessions" enable micropayment streaming.
Strategic position: MPP answers "how does an agent pay?" We answer "should you trust that agent?" These are complementary. MPP's Challenges primitive could include a trust score check — before getting payment credentials, pass a reputation threshold. We become infrastructure that Tempo integrates.
Risk: MPP's design partners include OpenAI, Deutsche Bank, Visa. If Stripe builds identity/trust into MPP directly, we compete against the platform. Mitigation: speed — integrate before they build it.
Each phase has an explicit decision gate. If the gate fails, stop or pivot. No sunk-cost continuation.
iOS Secure Enclave PoC: accelerometer + touch data → signed score. Benchmark: latency, battery, score stability, basic spoofing resistance. Measure score variance across 10 phones.
Bitcoin Alpha/OTC + Elliptic++ (822K nodes). PageRank with alpha=0.85 on real adversarial data. Red team: cheapest attack that changes rankings.
TEE score → ZK threshold proof → on-chain verification. Measure: proof generation time on mobile, verification gas cost, proof size.
CrewAI / LangChain plugin: get_trust_score(agent_id). Open source. Measure installs, API calls, developer feedback.
One production integration. Free integration for public case study. Measure fraud reduction, quality improvement, false positive rate. Publish results on-chain.
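The Phase 4 developer surface, `get_trust_score(agent_id)` returning 0-10000, can be sketched locally. The layer weights and signal shape are illustrative assumptions; the real call would hit the scoring API rather than compute in-process:

```python
# Sketch of the one-call integration: get_trust_score returns a
# 0-10000 composite. Weights and the signals dict are hypothetical
# stand-ins for the real scoring backend.

LAYER_WEIGHTS = {"liveness": 0.2, "reputation": 0.5, "identity": 0.3}

def get_trust_score(agent_id: str, signals: dict) -> int:
    """Combine per-layer signals (each in 0.0-1.0) into 0-10000."""
    composite = sum(LAYER_WEIGHTS[k] * signals.get(k, 0.0)
                    for k in LAYER_WEIGHTS)
    return round(composite * 10000)

# An agent that passes liveness, has mid reputation, and a KYC anchor:
score = get_trust_score("agent-42", {"liveness": 1.0,
                                     "reputation": 0.6,
                                     "identity": 1.0})
print(score)  # 8000
```

The integer 0-10000 range keeps the API friendly to on-chain consumers that avoid floating point; a binary trusted/untrusted wrapper (Experiment 1) would simply threshold this value.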
GO, with conditions. Extend to 12 weeks with buffers. Add weekly tech reviews and contingency budget. Dedicate full-time engineer to QVAC PoC from Day 1. Push for paid pilots over free case studies. Audit team for ZK/ML expertise. Biggest risk: Layer 2 PageRank may not work on real data without major rework.
CrewAI / LangChain plugin. One function call, returns 0-10000. Open source, zero friction. Claims position before competitors.
ONE protocol losing money to bad agents. Free integration for public case study. On-chain verifiable results as proof.
Free: device liveness gate + basic PageRank (1000 queries/month). Paid: KYC-verified identity scoring via Shyft. The combination = moat.
EU AI Act Article 12 compliance artifacts. SEC/FINRA recordkeeping. Enterprise SaaS ($24K+/year).
| Tier | Price | What You Get |
|---|---|---|
| Free | $0 | Device liveness gate + basic PageRank score, 1000 queries/month |
| Growth | $200-2K/month | Full scoring API + citation analytics + webhook alerts |
| Enterprise | $24K+/year | Compliance exports + audit trails + SLA + custom integration |
A "trust platform" — Vision, not product. Nobody buys a platform; they buy a solution to a problem.
A token-incentivized network — Requires capital you don't have. Phone farm economics destroy token models.
Cross-marketplace reputation portability — Data-sharing incentives don't exist. Phantom pain.
| Company | Strategy | Result |
|---|---|---|
| BioCatch | Behavioral biometrics for bank fraud | $1.3B acquisition, $160M ARR |
| SpruceID | Government compliance contracts | $41.8M raised, CA DMV + DHS |
| Gitcoin Passport | Sybil resistance for grants/airdrops | 2M users, 35M credentials |
| Trusta Labs | On-chain reputation scoring | 82.8M API calls, 2.5M attestations |
| Alchemy | Free-tier infrastructure API | 70% of top Ethereum apps |
"If a hostile government controlled this system, could they use it for mass surveillance?" If yes, redesign.
| Design Decision | Passes Test? | Reason |
|---|---|---|
| TEE processing, no raw data export | Yes | Data never leaves secure enclave |
| Threshold ZK proofs, no score exposure | Yes | Only "above/below threshold" is public |
| User-initiated intermittent liveness | Yes | No background surveillance |
| On-chain attestations, no central DB | Yes | No single entity holds all scores |
| Continuous background sensor collection | FAILS | Removed from design |
| Score history stored centrally | FAILS | Removed from design |
| Data Type | Regulation | Risk | Decision |
|---|---|---|---|
| Accelerometer / gyroscope | Low regulation (motion data) | Low | USE |
| Touch / typing patterns | BIPA (Illinois), GDPR Art. 9 | Medium | USE (on-device only, threshold output) |
| GPS / location | GDPR, CCPA, state privacy laws | High | DO NOT USE |
| Health (HealthKit/Fit) | HIPAA, GDPR Art. 9 | Very High | DO NOT USE |
| Camera / microphone | Wiretap laws, BIPA, GDPR | Very High | DO NOT USE |
| Source | Relevance |
|---|---|
| Pentland, A. "Social Physics" (MIT) | Behavioral metadata as identity/creditworthiness predictor |
| Page & Brin, "PageRank" | Citation graph authority scoring — core of Layer 2 |
| PHC Paper (OpenAI + MIT + a16z, 2024) | "Personhood Credentials" — design space validation |
| Tramèr et al. (2016) | Model stealing attacks — informs white-box adversarial risk |
| EdenDID (Springer, 2025) | Edge computing + on-chain trust |
| Buterin (2023) | "What do I think about biometric proof of personhood?" |
| Distilled Analytics + Shyft (2018) | Behavioral telemetry + blockchain identity partnership |
| BioCatch ($1.3B acquisition) | Commercial validation of behavioral biometrics at scale |