
Network Architecture v1 (Complete Proposal)

How the ShardKeep node network evolves with on-chain program authority: keeping the nodes, adding the trust.

April 2026

The hybrid model: The Solana program handles authority (who can do what, where shards are mapped, who is staked, subscription state). The existing node network handles coordination (real-time shard delivery, sub-second challenge timing, WSS liveness monitoring). These are complementary — we're not replacing the node network, we're giving it an immutable trust anchor.

1. How the Network Works Today

Three node types, one pipeline

Every node in the ShardKeep network — regardless of type — follows the same 6-state qualification pipeline before it can participate:

PENDING → EVALUATING → AUTHENTICATED → QUALIFIED → ACTIVE
↓
SUSPENDED
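
The pipeline above can be read as a small state machine. The sketch below is illustrative only: the forward path comes from the diagram, but the rules for entering and leaving SUSPENDED are assumptions, since this proposal does not spell them out. The names `NodeState`, `FORWARD`, and `can_transition` are hypothetical.

```python
from enum import Enum


class NodeState(Enum):
    PENDING = "pending"
    EVALUATING = "evaluating"
    AUTHENTICATED = "authenticated"
    QUALIFIED = "qualified"
    ACTIVE = "active"
    SUSPENDED = "suspended"


# Forward path, taken directly from the pipeline diagram.
FORWARD = {
    NodeState.PENDING: NodeState.EVALUATING,
    NodeState.EVALUATING: NodeState.AUTHENTICATED,
    NodeState.AUTHENTICATED: NodeState.QUALIFIED,
    NodeState.QUALIFIED: NodeState.ACTIVE,
}


def can_transition(current: NodeState, target: NodeState) -> bool:
    """Return True if the move is allowed under this sketch's assumptions."""
    # ASSUMPTION: any non-suspended state may drop to SUSPENDED.
    if target is NodeState.SUSPENDED:
        return current is not NodeState.SUSPENDED
    # ASSUMPTION: a suspended node restarts qualification from PENDING.
    if current is NodeState.SUSPENDED:
        return target is NodeState.PENDING
    # Otherwise only the single next step on the forward path is allowed.
    return FORWARD.get(current) is target
```

Under these assumptions a node cannot skip stages: `can_transition(NodeState.PENDING, NodeState.ACTIVE)` is rejected, while each single forward step is accepted.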

Operator Nodes

Role: Network validators and challenge coordinators

Run on dedicated servers (VPS/bare metal)
Route shard requests between users and vault nodes
Validate heartbeats and orchestrate epoch challenges
Bond: $500 USD equivalent in SHARDKEEP
Earnings: up to $864/month at scale
Uptime requirement: 99.5%+

Vault Nodes

Role: Store and serve encrypted shard fragments

Run on any always-on device (Pi, VPS, NAS, desktop)
Hold encrypted shard blobs (25 KB each, max capacity configurable)
Respond to store/fetch/delete/verify requests via WSS
Bond: $100 USD equivalent
Earnings: up to $180/month at scale
Uptime requirement: 99%+

XNodes

Role: Lightweight end-user contribution

Run as browser extension or lightweight desktop agent
Store small amounts of shard data (25 MB limit in browser)
Best-effort uptime (lighter penalties)
Bond: $10 USD equivalent
Earnings: up to $10.80/month at scale
Can't upgrade to vault or operator (locked type)

The coordination layer (what runs it today)

Component	What It Does	Where It Runs
Heartbeat API	Node registration, qualification pipeline, health monitoring, API key issuance	PHP on our server (`heartbeat.php`)
WSS Server	Real-time shard delivery, fetch requests, challenge-response, shard verification	Python asyncio on our server (`vault-ws.py`)
Epoch Engine	14-day epoch cycles, 30-minute blocks, challenge scheduling, block scoring	PHP + Python config (`EpochEngine.php`, `epoch-config.json`)
ShardService	Shamir splitting, node selection for shard placement, retrieval orchestration	PHP (`ShardService.php`)
MySQL Database	Node registry, shard locations, challenges, verifications, block scores	`citadel` database
XNode Agent	Installable daemon, heartbeat sender, WSS client, shard storage on disk	Python on operator's machine (`xnode.py` v0.4.3)
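
For a sense of scale, the Epoch Engine's timing works out as follows. This is a worked-arithmetic sketch, not the contents of `epoch-config.json`; the constant names and the `block_index` helper are hypothetical, and the 20-challenges-per-block figure is the one quoted in section 7.

```python
from datetime import timedelta

EPOCH_LENGTH = timedelta(days=14)
BLOCK_LENGTH = timedelta(minutes=30)
CHALLENGES_PER_BLOCK = 20  # figure quoted in section 7 of this proposal

# Dividing one timedelta by another with // yields a plain integer.
blocks_per_epoch = EPOCH_LENGTH // BLOCK_LENGTH
challenges_per_epoch = blocks_per_epoch * CHALLENGES_PER_BLOCK


def block_index(elapsed: timedelta) -> int:
    """Zero-based block number for a time offset from the epoch start."""
    if not timedelta(0) <= elapsed < EPOCH_LENGTH:
        raise ValueError("offset falls outside the epoch")
    return elapsed // BLOCK_LENGTH


print(blocks_per_epoch)      # 672
print(challenges_per_epoch)  # 13440
```

So each 14-day epoch contains 672 blocks, and at 20 challenges per block that is 13,440 challenges feeding into the block scores.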

The trust problem: Every component above runs on our infrastructure or is verified by our database. If ShardKeep is compromised, every node's qualification, every shard location, every challenge result, and every reward calculation can be manipulated. The node network WORKS, but it runs on centralized trust.

2. The Hybrid Authority Model

The principle: on-chain for AUTHORITY, off-chain for COORDINATION

Not everything belongs on a blockchain. Sub-second WSS challenge-response, real-time shard delivery, and heartbeat monitoring require the speed of traditional infrastructure. But WHO is staked, WHERE shards are mapped, WHAT tier a user has, and WHETHER a node should be slashed — those are questions of AUTHORITY that benefit from immutable, verifiable, trustless on-chain state.

Function	Today	Proposed	Why
Shard location map	MySQL	On-chain PDA	Operator blindness — we can't see where your shards are
Node registration & bond	MySQL	On-chain PDA + staked tokens	Permissionless joining, no gatekeeping, bond can't be arbitrarily seized
User subscription tier	Not built yet	On-chain PDA + escrow	Third parties can verify tier on-chain (Auth-as-a-Service)
Access control / 2FA	Not built yet	On-chain PDA	Kill switch enforced by smart contract, not server
Reward eligibility	MySQL block_scores	Merkle root on-chain	Operators independently verify their rewards
Slashing decisions	Admin manual	Program-enforced rules	Slashing criteria transparent and automatic
Heartbeat monitoring	PHP API	Hybrid	Real-time liveness needs off-chain speed; aggregate scores posted on-chain per epoch
WSS shard delivery	Python asyncio	Stays off-chain	Sub-25ms latency impossible on-chain; WSS is the right tool
Challenge-response	WSS + MySQL	Hybrid	Individual challenges stay WSS (speed); aggregate scores go on-chain (trust)
Shard verification (HMAC)	WSS	Stays off-chain	HMAC challenge needs sub-second round-trip; results feed into on-chain scoring
Node selection for shards	ShardService.php	Hybrid	Selection reads on-chain registry (who's staked) + WSS liveness (who's connected)

The rule: If it's a question of who has permission or what state is authoritative, put it on-chain. If it's a question of real-time coordination or sub-second communication, keep it off-chain. Feed off-chain results into on-chain state at epoch boundaries.

3. How Each Node Type Evolves

3.1 Operator Nodes — from database-registered to on-chain-staked

Today

Registered via heartbeat API
Qualification tracked in MySQL
Bond: conceptual (not enforced)
Earnings: not yet implemented
Slashing: admin-only manual action

Proposed

Registration = Solana transaction (creates NodeRegistry PDA)
Bond = SHARDKEEP tokens locked in program-controlled escrow
Qualification: on-chain status updated by heartbeat oracle
Earnings: claim via Merkle proof each epoch
Slashing: program-enforced rules (100% for corrupt data, 25% for downtime, etc.)
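
As an illustration of what "program-enforced" slashing could compute, here is a minimal off-chain sketch using only the percentages quoted in this proposal (100% for corrupt data, 25% for downtime, and the 25% XNode cap from section 3.3). The offense keys, the `slash_amount` function, and the assumption that only XNodes have a cap are hypothetical; the real rules would live in the on-chain program.

```python
from decimal import Decimal

# Slash fractions quoted in this proposal; other offenses are not specified.
SLASH_FRACTION = {
    "corrupt_data": Decimal("1.00"),
    "downtime": Decimal("0.25"),
}

# ASSUMPTION: only XNodes have a slash cap (25%, per section 3.3).
SLASH_CAP = {"xnode": Decimal("0.25")}


def slash_amount(bond: Decimal, node_type: str, offense: str) -> Decimal:
    """Return how much of the bond is slashed for a given offense."""
    fraction = SLASH_FRACTION[offense]
    cap = SLASH_CAP.get(node_type, Decimal("1.00"))
    # The cap limits the fraction, so an XNode never loses more than 25%.
    return bond * min(fraction, cap)
```

With a 500-unit operator bond, downtime would cost 125 units and corrupt data the full 500; a 10-unit XNode bond would lose at most 2.5 units for either offense.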

The transition:

Operator installs agent (same as today: curl ... | bash)
→ Agent generates Solana keypair (new: stored at ~/.shardkeep/operator.json)
→ Operator stakes bond via web UI or CLI: shardkeep stake --amount 500 --type operator
→ Solana program creates NodeRegistry PDA with stake
→ Agent sends heartbeats to API (same as today)
→ Heartbeat oracle posts aggregate uptime scores on-chain per epoch
→ Node's on-chain status transitions: Pending → Active
→ Node is eligible for shard assignments and reward claims

// The heartbeat API and WSS server continue running exactly as today
// The on-chain program adds trustless registration, staking, and reward claims
// Operators keep the same agent, same install process, same heartbeat protocol

Why operators care:

Bond protection: Their staked tokens are in a program-controlled PDA, not a ShardKeep-controlled wallet. We can't seize them arbitrarily.
Reward verification: They can independently verify their epoch rewards by checking the Merkle root on-chain. No "trust us, we calculated it right."
Permissionless entry: Anyone with the minimum stake can register. No gatekeeping, no approval queue.
Transparent slashing: Slashing rules are in the program source code (verifiable build). If their bond gets slashed, they can verify the reason on-chain.

3.2 Vault Nodes — blind shard storage

Today

Receive shards via WSS with full metadata (entry_id, wallet_address, shard_index)
Our database knows exactly which node holds which user's shards
Verification challenges reference specific shard_ids tied to users

Proposed

Receive shards via WSS as anonymous blobs (shard_hash only, no user metadata)
Our database stores node health data only — shard map is on-chain, encrypted
Verification challenges reference blob hashes, not user identities

The transition:

User stores a secret
→ Client encrypts + Shamir splits (same as today)
→ ShardService selects vault nodes (reads on-chain registry for staked/active + WSS status for liveness)
→ WSS delivers shards to selected nodes
→ Key change: WSS sends {shard_hash, encrypted_blob} only — no entry_id, no wallet_address, no shard_index
→ Vault node stores blob indexed by shard_hash
→ Client stores the shard map on Solana in encrypted form (as a cNFT or ShardMap PDA; only the client knows which hashes went where)
→ Server purges shard-to-node mapping from memory

// Vault node holds: {shard_hash: encrypted_blob} — nothing else
// Vault node does NOT know: who owns it, which entry, which sibling shards, where siblings are
// This is the "blind safety deposit box" model
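
To make the "blind safety deposit box" concrete, here is a minimal in-memory sketch of what a vault node's storage interface could look like once user metadata is removed. It is not the `xnode.py` implementation. The class name `BlindVaultStore` is hypothetical, and so is the assumption that `shard_hash` is the hex SHA-256 of the encrypted blob; the proposal does not specify the hash function.

```python
import hashlib
from typing import Dict, Optional


class BlindVaultStore:
    """In-memory stand-in for a vault node's blob storage."""

    def __init__(self) -> None:
        # The only state kept: shard_hash -> encrypted blob. No user fields.
        self._blobs: Dict[str, bytes] = {}

    def store(self, shard_hash: str, encrypted_blob: bytes) -> bool:
        # ASSUMPTION: shard_hash is the hex SHA-256 of the blob, which lets
        # the node reject a delivery that was corrupted in transit.
        if hashlib.sha256(encrypted_blob).hexdigest() != shard_hash:
            return False
        self._blobs[shard_hash] = encrypted_blob
        return True

    def fetch(self, shard_hash: str) -> Optional[bytes]:
        # Returns None when the node does not hold that blob.
        return self._blobs.get(shard_hash)

    def delete(self, shard_hash: str) -> bool:
        # True if a blob was actually removed.
        return self._blobs.pop(shard_hash, None) is not None
```

Nothing in this interface accepts an entry_id, wallet_address, or shard_index, so the node has no field in which to record ownership even by accident.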

Why this matters for vault operators:

Legal protection: If subpoenaed, operators can truthfully say "I hold encrypted blobs. I don't know who owns them or what's in them."
Reduced liability: No user data, no user metadata, no personal information stored.
Same operations: store, fetch-by-hash, delete-by-hash, verify-HMAC. The WSS protocol barely changes — just fewer fields per message.

3.3 XNodes — lightweight participation stays simple

Today

Browser extension or lightweight agent
25 MB storage limit
Heartbeat every 30 seconds
Bond: conceptual ($10 equivalent)
Locked type: can't upgrade to vault or operator

Proposed

Same deployment (browser extension or agent)
Same storage model
Bond staked on-chain ($10 equivalent, lighter slash cap at 25%)
Reward claims via same Merkle system
Still locked type — lightweight by design

Minimal change for XNode operators. The only new step is staking the $10 bond via a Solana transaction (can be done from the browser extension itself). Everything else — heartbeats, storage, challenges — works the same.

4. The Shard Lifecycle (Hybrid)

4.1 Store

Client: Encrypt secret with AES-256-GCM (wallet-derived key)
Client: Shamir split into k-of-n shards (per security tier)
→ Client sends encrypted shards to ShardService API

Server: ShardService.selectNodes()
    → Reads on-chain NodeRegistry PDAs for staked + active operators/vaults
    → Reads WSS liveness status (which nodes are currently connected)
    → Intersects: staked AND live AND has capacity
    → Load-balances by shard count, diversifies by IP/region
    → Returns selected node list to client

Server: WSS delivers shards (anonymous blobs: shard_hash + encrypted_data only)
→ Vault nodes ack receipt

Client: Builds shard map: {shard_hash_1: node_id_A, shard_hash_2: node_id_B, ...}
→ Encrypts shard map with wallet key
→ Stores encrypted map on Solana (ShardMap PDA via shardkeep_core program)
→ Server purges shard-to-node mapping from memory

// After this point: server knows NOTHING about shard locations
// Vault nodes know NOTHING about shard ownership
// Only the user's wallet can decrypt the on-chain shard map
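
The node-selection step in the store flow (staked AND live AND has capacity, then load-balance and diversify) can be sketched as below. This is not `ShardService.php`: the `Node` fields, the `select_nodes` signature, and the exact tie-breaking order are assumptions, and the on-chain registry and WSS liveness feed are represented as plain sets of node IDs.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Node:
    node_id: str
    region: str
    shard_count: int  # shards currently held
    capacity: int     # maximum shards the node accepts


def select_nodes(registry: List[Node], staked: Set[str],
                 live: Set[str], n: int) -> List[Node]:
    """Pick n nodes that are staked, live, and have spare capacity."""
    eligible = [
        node for node in registry
        if node.node_id in staked
        and node.node_id in live
        and node.shard_count < node.capacity
    ]
    # Load-balance: least-loaded first (node_id breaks ties deterministically).
    eligible.sort(key=lambda node: (node.shard_count, node.node_id))

    chosen: List[Node] = []
    used_regions: Set[str] = set()
    # First pass: at most one node per region, to diversify placement.
    for node in eligible:
        if len(chosen) < n and node.region not in used_regions:
            chosen.append(node)
            used_regions.add(node.region)
    # Second pass: fill any remaining slots, reusing regions if necessary.
    for node in eligible:
        if len(chosen) < n and node not in chosen:
            chosen.append(node)
    if len(chosen) < n:
        raise ValueError("not enough eligible nodes")
    return chosen
```

The two-pass approach prefers regional spread but does not fail a store just because fewer regions than shards are available; whether that trade-off is acceptable for each security tier is a design decision left open here.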

4.2 Retrieve

Client: Signs auth challenge with wallet
→ Reads ShardMap PDA from Solana (FREE — reads cost nothing)
→ Decrypts shard map locally with wallet-derived key
→ Now knows: shard_hash_1 is on node_A, shard_hash_2 is on node_B, ...

Client: Contacts vault nodes directly (or via WSS relay):
"Give me blob with hash 0xABC123"
→ Vault node returns encrypted blob (node has NO idea who is asking or why)
→ Client collects k blobs, runs Shamir reconstruction locally
→ Secret recovered. Server was not involved in locating shards.

Retrieval is server-optional. The client can read the on-chain map and contact vault nodes directly. The server's role becomes a convenience relay (routing fetch requests via WSS), not a requirement. If ShardKeep's server goes down, users can still retrieve their secrets using any Solana RPC provider + direct node connections.
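
The "collect k blobs, run Shamir reconstruction locally" step relies on the standard k-of-n property of Shamir's scheme. The sketch below demonstrates that property over a prime field using only the standard library. It is not the library ShardKeep uses: it shares a single integer rather than an encrypted byte string, and the field prime and function names are choices made for this illustration.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this

Share = Tuple[int, int]  # (x, y) point on the polynomial


def split(secret: int, k: int, n: int) -> List[Share]:
    """Split secret into n shares, any k of which recover it."""
    if not (1 <= k <= n < PRIME and 0 <= secret < PRIME):
        raise ValueError("invalid parameters")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: List[Share]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

For a 3-of-5 split, `reconstruct(split(s, 3, 5)[:3])` returns `s` for any three of the five shares, which is why the client only needs responses from k vault nodes, not all n.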

4.3 Rotate

Trigger: Client opens extension, checks rotation schedule from on-chain PDA
Guardian = weekly, Sentinel = daily, Fortress = daily

If rotation due:
→ Client decrypts current shard map (from on-chain PDA)
→ Retrieves shards from current nodes (by hash)
→ Re-splits with fresh Shamir randomness (entirely new shard set)
→ Server selects new target nodes (on-chain registry + WSS liveness)
→ WSS delivers new anonymous blobs to new nodes
→ Client updates ShardMap PDA on-chain with new encrypted map
→ Server purges mapping, old shards purged from old nodes

// Server sees shard locations for ~5-10 seconds during redistribution (ephemeral)
// OR: user delegates rotation to an authorized agent (RotationDelegate PDA)
// OR (future, DevNet R&D): blind swap via shardkeep_rotation program
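
The "is rotation due" check at the top of this flow is simple enough to sketch. The intervals come from the schedule above; the function name, the lowercase tier keys, and the idea that the last rotation time is read from the on-chain PDA as a timestamp are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Rotation schedule as stated in this section.
ROTATION_INTERVAL = {
    "guardian": timedelta(weeks=1),
    "sentinel": timedelta(days=1),
    "fortress": timedelta(days=1),
}


def rotation_due(tier: str, last_rotated: datetime, now: datetime) -> bool:
    """True once the tier's interval has fully elapsed since last rotation."""
    return now - last_rotated >= ROTATION_INTERVAL[tier]


last = datetime(2026, 4, 1, tzinfo=timezone.utc)
one_day_later = datetime(2026, 4, 2, tzinfo=timezone.utc)
print(rotation_due("sentinel", last, one_day_later))  # True
print(rotation_due("guardian", last, one_day_later))  # False
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check deterministic and easy to test.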

4.4 Verify (unchanged but enhanced)

WSS server selects shards for verification (same as today: LRU-based, up to 5 per batch)
→ Sends verify_shard with random nonce to vault node
→ Node computes HMAC(hmac_key, nonce) and responds
→ Server validates response, records pass/fail/timeout

Enhancement:
→ At epoch boundary: aggregate verification scores per node
→ Post aggregate scores on-chain (via heartbeat oracle)
→ On-chain scores feed into: reward eligibility, slashing decisions, reputation

// Individual verifications stay off-chain (sub-second timing required)
// Aggregate results go on-chain (trustless, verifiable by operators)
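
The challenge-response and epoch aggregation steps can be sketched with Python's standard `hmac` module. The proposal only specifies HMAC(hmac_key, nonce); the choice of SHA-256, hex encoding, and the pass-rate formula below are assumptions, and the function names are hypothetical rather than taken from `vault-ws.py`.

```python
import hashlib
import hmac
import secrets
from typing import List


def node_response(hmac_key: bytes, nonce: bytes) -> str:
    """What the vault node sends back for a verify_shard challenge."""
    # ASSUMPTION: HMAC-SHA256, hex-encoded.
    return hmac.new(hmac_key, nonce, hashlib.sha256).hexdigest()


def server_validates(hmac_key: bytes, nonce: bytes, response: str) -> bool:
    """Server-side check; compare_digest avoids timing side channels."""
    expected = hmac.new(hmac_key, nonce, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, response)


def epoch_pass_rate(results: List[str]) -> float:
    """Aggregate per-challenge outcomes ('pass', 'fail', 'timeout')."""
    if not results:
        return 0.0
    # ASSUMPTION: fail and timeout both count against the node equally.
    return results.count("pass") / len(results)


key = secrets.token_bytes(32)
nonce = secrets.token_bytes(16)  # fresh random nonce per challenge
print(server_validates(key, nonce, node_response(key, nonce)))  # True
```

Only the aggregate, such as the value returned by `epoch_pass_rate`, would be posted on-chain by the heartbeat oracle; the individual nonces and responses stay in the off-chain operational data.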

5. Why Operators Run Nodes

The operator incentive loop

Operator stakes bond on-chain (verifiable, can't be seized)
→ Runs node, stores shards, responds to challenges
→ Heartbeat oracle posts uptime + verification scores on-chain
→ Epoch ends: Merkle tree of rewards computed from scores
→ Operator claims reward with Merkle proof (no trust in ShardKeep)
→ Tokens arrive in operator's wallet
→ More operators → more capacity → better uptime → more users → more fees → higher rewards
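
The "claim reward with Merkle proof" step is what lets an operator check a payout without trusting ShardKeep's calculation. The sketch below shows the general technique with SHA-256. The leaf encoding, the domain-separation prefixes, and the duplicate-last-node rule for odd levels are assumptions for illustration; the actual MerkleDistributor format is not specified in this proposal.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf(operator: str, amount: int) -> bytes:
    # ASSUMPTION: leaf = H(0x00 || "operator:amount"); prefix separates
    # leaves from internal nodes.
    return _h(b"\x00" + f"{operator}:{amount}".encode())


def _parent(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaves first, root last."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:  # ASSUMPTION: odd levels duplicate the last node
            cur = cur + [cur[-1]]
        levels.append([_parent(cur[i], cur[i + 1])
                       for i in range(0, len(cur), 2)])
    return levels


def proof_for(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag marks a left-hand sibling."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf_hash: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    node = leaf_hash
    for sibling, sibling_is_left in proof:
        node = _parent(sibling, node) if sibling_is_left else _parent(node, sibling)
    return node == root
```

An operator who knows their own `(operator, amount)` pair and receives a proof only needs the root published on-chain to run `verify`; a proof for a different amount fails, which is the property the reward claim depends on.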

Earnings by node type (from tokenomics v3.1):

Node Type	Bond	Year 1 Earnings	Year 5 Earnings	Hardware Cost
Operator	$500	$86/month	$864/month	$20-50/month VPS
Vault	$100	$18/month	$180/month	$5-20/month (Pi or VPS)
XNode	$10	$1/month	$10.80/month	$0 (runs in browser)

Earnings funded by token emissions (Years 1-5) transitioning to fee recycling (Year 5+). Operators break even at Year 1 prices; profit at scale.
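
A quick check of the Year 1 figures in the table: the sketch below subtracts the quoted hardware cost range from the quoted monthly earnings. It deliberately ignores the bond, token price movement, and any slashing, none of which are modeled in the table; the variable and function names are hypothetical.

```python
from typing import Tuple

# (node type, Year 1 earnings, hardware cost low, hardware cost high), USD/month
YEAR_1 = [
    ("Operator", 86, 20, 50),
    ("Vault", 18, 5, 20),
    ("XNode", 1, 0, 0),
]


def net_range(earnings: int, cost_low: int, cost_high: int) -> Tuple[int, int]:
    """Return (worst case, best case) monthly net after hardware costs."""
    return earnings - cost_high, earnings - cost_low


for name, earnings, cost_low, cost_high in YEAR_1:
    worst, best = net_range(earnings, cost_low, cost_high)
    print(f"{name}: net ${worst} to ${best} per month")
```

On these numbers an operator nets $36 to $66 per month in Year 1, a vault node lands between a $2 loss and a $13 gain, and an XNode nets $1, so "break even" is most accurate for vault nodes at the upper end of their hardware cost range.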

Why on-chain staking changes the game:

Today: "Trust ShardKeep to track your bond and pay you." Operators must trust that we'll honor the reward schedule.
Proposed: "The program holds your bond and the Merkle root proves your reward." Operators trust math, not a company. They can verify every calculation independently. If ShardKeep disappears, the program still holds their bond and distributes rewards.

6. Why Users Benefit

User Concern	Today	With Hybrid Authority
"Can ShardKeep see my passwords?"	No (client-side encryption)	No (same + shard locations also hidden from us)
"Can ShardKeep be breached?"	Server holds shard location map — breach exposes which nodes to target	Shard map on-chain, encrypted. Breach gets nothing.
"Can ShardKeep lock me out?"	Technically yes (server controls access)	No. On-chain state is permissionless. Alternative clients work.
"Will my vault survive if ShardKeep shuts down?"	No — server coordinates everything	Yes. Shard maps on-chain, nodes keep running (staked), retrieval works peer-to-peer.
"Can I prove my subscription tier to others?"	No (our API is the only source)	Yes. On-chain Subscription PDA readable by anyone (Login with ShardKeep).
"What happens if I die?"	Vault is lost forever	Dead man's switch + beneficiary transfer on-chain. Automatic.
"If I'm breached, can I freeze everything instantly?"	Contact support, wait for response	Backup wallet signs one transaction. Instant on-chain lockdown. Nobody can override it.

7. What Changes vs What Stays

Stays exactly the same (off-chain coordination)

WSS server — real-time shard delivery, fetch, delete, challenge-response. Sub-25ms latency impossible on-chain.
Heartbeat API — nodes still send heartbeats to our endpoint every 30 seconds. The oracle aggregates and posts to chain.
HMAC shard verification — individual challenges stay WSS-based. Too fast for on-chain round-trips.
Epoch timing — 14-day epochs, 30-minute blocks, 20 challenges per block. Same config.
XNode agent install — curl ... | bash. Agent adds a Solana keypair generation step.
Client-side encryption — AES-256-GCM, wallet-derived key. Unchanged.
Shamir splitting — Same library, same thresholds (3-of-5 through 7-of-12).

Moves on-chain (authority)

Shard location map — encrypted cNFT metadata or ShardMap PDA
Node registration + bond staking — NodeRegistry PDA + escrow
Subscription tier — Subscription PDA + payment escrow
Access control + 2FA — AccessControl PDA
Aggregate epoch scores — posted by heartbeat oracle at epoch boundaries
Reward distribution — Merkle root on-chain, operators claim with proofs
Slashing — program-enforced rules, automatic execution
Estate/inheritance — dead man's switch PDA

New components (hybrid)

Heartbeat oracle — aggregates off-chain heartbeat data, posts epoch summaries on-chain. Runs as a crank service.
Node selection hybrid — ShardService reads on-chain registry (staked/active) AND WSS status (live/connected). Both must pass.
Staking CLI/UI — operators stake bonds, users subscribe, all via Solana transactions.

8. On-Chain Program Architecture

Program structure

Full PDA map and instruction set documented in On-Chain Program Authority v3. Summary:

Program	PDAs	Phase
`shardkeep_core`	VaultIndex, ShardMap, Subscription, SubscriptionEscrow, AccessControl, RotationDelegate, FeeConfig, Treasury, ProgramConfig	Phase 1
`shardkeep_nodes`	GlobalNodeConfig, NodeRegistry, NodeStakeVault	Phase 2
`shardkeep_rewards`	RewardPool, OperatorRewards, MerkleDistributor	Phase 2
`shardkeep_estate`	EstatePlan, EscrowedKey	Phase 3

Existing vault program 4hfvirYMHxW4nZSuTreWRQQD45Hfc4LKmUyy3hFYcZVP remains unchanged. The new programs are separate deployments on DevNet.

9. Migration Plan

Phase 1: Dual-authority period (no disruption)

Both systems run in parallel. Off-chain database remains authoritative. On-chain state is populated alongside it. Nothing breaks for existing nodes or users.

Weeks 1-4: Deploy shardkeep_core + shardkeep_nodes on DevNet
→ Existing heartbeat API writes to MySQL AND creates on-chain PDAs
→ Nodes don't need to update — server mirrors their state on-chain
→ Zero disruption to existing network

Weeks 4-6: Enable on-chain shard maps for new entries
→ New shard stores write map to both MySQL AND on-chain PDA
→ Retrieval reads from MySQL (primary) with on-chain (fallback/verification)
→ Existing shards unaffected

Weeks 6-8: Migrate existing shard maps to on-chain
→ Batch process: for each existing entry, encrypt map + store on-chain
→ Verify on-chain map matches MySQL
→ Once verified: flip authority to on-chain, MySQL becomes cache/fallback

Phase 2: On-chain authority (MySQL becomes cache)

→ Shard maps: on-chain is authoritative, MySQL caches for performance
→ Node registry: on-chain is authoritative for staking/status
→ Subscriptions: on-chain is authoritative
→ Heartbeats: still go to API, oracle posts summaries on-chain
→ WSS: still operational for real-time coordination
→ MySQL retains operational data only (heartbeat logs, challenge details, performance metrics)
→ Shard location data PURGED from MySQL (on-chain is the only record)

At no point does the existing network go down. The migration is additive: on-chain authority is layered ON TOP of the existing infrastructure. Once validated, authority flips from database to chain. The database becomes a performance cache, not a trust anchor.

10. The One-Liner (for each audience)

Audience	The pitch
Users	“Not even ShardKeep knows where your passwords are.”
Operators	“Your bond is in a smart contract, your rewards are in a Merkle tree. Trust math, not a company.”
Developers	“Login with ShardKeep — verify identity on-chain, no API dependency.”
Investors	“The only password manager that survives its own company shutting down.”
Competitors	“Replicate this and you've rebuilt your entire architecture. Good luck.”

This proposal integrates findings from: on-chain-shard-map v1/v2/v3, vault-security-v3, tokenomics-v3.1, addendum-revenue-streams-v1, shardkeep-branding-proposal, and a complete codebase audit of the ShardKeep node network (ShardService.php, heartbeat.php, vault-ws.py, xnode.py, EpochEngine.php, epoch-config.json, and all database schemas).

The hybrid authority model keeps what works (fast off-chain coordination) and adds what's missing (trustless on-chain authority). The node network doesn't go away — it gets a backbone made of math.