The Allowed-Server List on Solana (Proposal v1)
Make the list of real ShardKeep servers something that lives on Solana, not on any one company's server. If someone stands up a fake ShardKeep, wallets and nodes will know instantly that it isn't on the list — and refuse to talk to it. · April 2026
The short version: We publish a tamper-proof "phone book" of real ShardKeep servers on the Solana blockchain. Every wallet extension and every node checks that phone book before trusting any server. A fake server isn't in the book, so it gets ignored. This proposal also pins down the things the earlier sketch left fuzzy: who controls the program, how URLs change, and what to do if something goes wrong.
Why this matters now. Today, wallets and nodes trust ShardKeep servers because the domain name (master.shardkeep.io) is baked into their code. If a bad actor ever got control of that domain, or stood up a convincing lookalike, there's no second line of defense. We've been lucky. Moving the trust list on-chain removes "lucky" from the equation.
1. What Problem Are We Actually Solving?
The phishing scenario in plain English
- A bad actor registers shardkeeep.io (three e's, easy to miss).
- They clone our entire site pixel-for-pixel. It looks identical.
- They pay for Google ads so "shardkeep" searches put their fake site on top.
- A user clicks, connects their wallet, and signs what they think is a login.
- The fake site's JavaScript rewrites the signing message at the last moment. The user has just authorized a token transfer, not a login.
- The funds are gone. So are the vault contents, which the same signature unlocks.
Think of it like this: Your bank's website could be impersonated. If you only verify "is this the right URL?" and you misread a letter, you lose. But if your wallet had a pre-approved list of your bank's buildings that nobody but your bank itself could update, the fake branch with the slightly wrong address would be instantly rejected by your wallet — before you had a chance to click the wrong button.
What an on-chain allowed-server list buys us
- Removes ShardKeep-the-company as a trust bottleneck. Today, you trust our DNS. Tomorrow, you trust math on Solana.
- Removes DNS hijack as a full-compromise attack. Even if someone stole our domain, they can't add themselves to the on-chain list.
- Gives wallets a pre-click safety net. The wallet refuses to even populate the login flow for an unregistered server. No click-through warning for the user to ignore.
- Makes network operation survive us. If ShardKeep the company disappears, the list is still there, the servers are still there, the users still find each other.
2. What's Already Designed (and What We're Adding)
The earlier proposal already laid out the registry shape. Here's what that doc said, translated:
| Thing | What it is, in English | Status |
| shardkeep_authority | A program running on Solana (think of it as a small app living on the blockchain that only its upgrade authority can change; see Gap #1). It enforces all the rules below. | KEEP |
| AdminRegistry | The list of people allowed to approve new servers. Starts as the two of us. Changes require both of us to sign. | KEEP |
| WardenRegistry | The actual "allowed servers" list. Each entry = one Warden node: its identity, its URL, its owner wallet, its stake. | KEEP |
| 3-layer defense in the extension | (1) Check the URL is on the list. (2) Warn if a signing request comes from an unlisted URL. (3) Reject auth tokens signed by servers not on the list. | KEEP |
| DevNet admin-airdrop gate | Getting added to the list on the test network requires a real conversation with an admin. No bots, no self-serve. This is our anti-spam-node defense. | KEEP |
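To make the registry shape concrete, here is a minimal JavaScript sketch of how a client might hold a copy of WardenRegistry entries and run the Layer 1 check. The field names (nodePubkey, url, ownerWallet, stake) and the helper isRegisteredOrigin are assumptions for illustration only; the authoritative schema is the one in role-separation-v1.

```javascript
// Hypothetical client-side view of the WardenRegistry.
// Field names and values are illustrative, not the on-chain schema.
const exampleRegistry = [
  {
    nodePubkey: "WardenNodeKey1111", // placeholder, not a real key
    url: "https://master.shardkeep.io",
    ownerWallet: "OwnerWallet1111", // placeholder, not a real wallet
    stake: 0,
  },
];

// Layer 1: is the page's origin on the list?
// Origins are parsed and compared exactly, never by substring,
// so lookalike hosts and suffix tricks do not match.
function isRegisteredOrigin(registry, pageUrl) {
  let pageOrigin;
  try {
    pageOrigin = new URL(pageUrl).origin;
  } catch (e) {
    return false; // an unparseable URL is never trusted
  }
  return registry.some((entry) => new URL(entry.url).origin === pageOrigin);
}
```

Comparing parsed origins rather than raw strings is a deliberate choice here: it keeps paths and query strings out of the decision and rejects hosts that merely contain the registered name.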
What this proposal adds — the gaps the earlier doc didn't cover
Nine items. Each one is something that can hurt us if we ship without addressing it.
| # | Gap | Why it matters |
| 1 | Who can change the program itself? | If the program is upgradeable by one wallet, that wallet is the real root of trust — bigger than AdminRegistry. |
| 2 | TLS certificate pinning vs. Let's Encrypt renewals | Let's Encrypt certificates are valid for 90 days and are typically renewed about every 60. A naive pin fires a false alarm at every renewal, training users to ignore warnings. |
| 3 | Rollout order / feature flag | Flip "enforce list" before the list is populated = every user breaks instantly. (This is today's outage on steroids.) |
| 4 | update_url on a hot wallet | If the day-to-day wallet is compromised, the attacker changes the URL to their own. Once the 5-minute client cache expires, users are redirected to the attacker. |
| 5 | Origin check in iframes | Attacker embeds real ShardKeep page inside their fake page. Top frame origin is the attacker's, but naive check sees the iframe's origin. |
| 6 | "Bond = security" is a myth in v1 | The real gate is admin approval. We must not later loosen admin approval thinking bond alone protects us. |
| 7 | Program ID + RPC endpoints become new supply-chain items | These get hardcoded in the extension and agent. Compromised extension = attacker swaps these = reads attacker's fake registry. |
| 8 | Heartbeat + slashing mechanics are vague | Does "missed heartbeat" auto-slash? What's the grace window? Warden with ISP outage shouldn't lose their bond. |
| 9 | What happens if the program has a bug? | Clients can't be rolled back quickly (app-store review, agent update lag). Need a graceful fallback so outages don't cascade. |
3. Gap #1 — Who Controls the Program Itself?
The hidden root of trust
When you deploy a program on Solana, one wallet is designated the upgrade authority. That wallet can replace the program with a completely different version at any time.
Think of it like this: Imagine a law that says "only these 10 buildings are licensed restaurants." That's the Warden list. But if one person can rewrite the law itself, they could add their own building. The law-changer's identity is more important than the restaurant list.
The earlier proposal didn't say who holds upgrade authority. That's the single most important decision in this whole design. If a Warden operator's laptop is stolen and the upgrade authority is on it, the attacker doesn't need to hack the multisig — they just upload a new program that declares their fake URL a valid Warden.
Three options, ranked
| Option | What it means | Pros | Cons |
| A. Burn it | Set upgrade authority to a non-existent address. Program is immutable forever. | Highest trust. Nobody can change the program, ever. | No bug fixes. First serious bug = build new program, migrate every client. |
| B. Squads multisig with time-lock | A 3-of-5 multisig (different wallets than AdminRegistry) holds upgrade authority. Any upgrade has a 72-hour on-chain delay before it takes effect. | Bug fixes possible. Compromised signers can't stealth-upgrade; the 72h window lets everyone see it coming and scream. | Small ongoing operational burden. Need 5 hardware wallets distributed sensibly. |
| C. Single wallet | Our wallet holds upgrade authority. | Easy. | Same problem as centralized DNS, just on Solana. Defeats the point of this proposal. |
Recommendation: Option B — Squads multisig with 3-of-5 signers and a 72-hour time-lock. This is the standard posture for serious Solana programs (Jupiter, Marinade, and others use variations). Signers held on 5 separate hardware wallets in 5 separate physical locations. Timelock gives the community a chance to spot a malicious upgrade before it lands.
MainNet only. DevNet keeps a simpler single-wallet upgrade authority — it's a test network, nothing of value is there. The Squads-with-timelock posture engages at MainNet genesis.
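The rule behind Option B can be sketched in a few lines of JavaScript. This models only the policy (3 valid signatures plus a 72-hour wait); it is not how the Squads program implements it, and the real program may start the clock at a different point, such as at approval rather than at proposal. The names canExecuteUpgrade, proposedAtMs, and approvals are hypothetical.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const TIMELOCK_MS = 72 * HOUR_MS; // the 72-hour delay from Option B
const THRESHOLD = 3; // 3-of-5 signers

// proposal: { proposedAtMs, approvals: [signerId, ...] }
// signers:  the five authorized signer ids
function canExecuteUpgrade(proposal, signers, nowMs) {
  // Count each authorized signer once; ignore unknown signers and duplicates.
  const valid = new Set(proposal.approvals.filter((s) => signers.includes(s)));
  const enoughSignatures = valid.size >= THRESHOLD;
  const timelockElapsed = nowMs - proposal.proposedAtMs >= TIMELOCK_MS;
  return enoughSignatures && timelockElapsed;
}
```

Both conditions must hold at once, which is the point of the design: three compromised signers still cannot land an upgrade before the window in which everyone else can see it.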
4. Gap #2 — TLS Pinning Without Breaking Let's Encrypt
The problem
The earlier proposal said "the extension caches each Warden's TLS certificate fingerprint; warn if it changes." But every Warden uses Let's Encrypt, whose certificates are valid for only 90 days and are typically renewed about every 60. A new cert = new fingerprint = warning popup.
Think of it like this: Imagine your alarm system beeps every time the gardener comes. After a few months you stop paying attention to beeps. Then the actual burglar walks in and the alarm beeps — but you've trained yourself to ignore it.
Fix: pin the key, not the certificate
A TLS certificate is a public key plus a bunch of metadata (expiry, issuer, etc). Each renewal issues a new certificate, but the ACME client on the Warden (certbot) can be configured to reuse the same key pair. Pinning the key means the fingerprint doesn't change when the cert renews.
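# On every Warden, one-time certbot config change:
sudo certbot renew --reuse-key --force-renewal
# Verify the key stays stable across renewals.
# Hash the DER-encoded public key, not the PEM text, so the pin matches what a client can compute from the certificate:
sudo openssl x509 -in /etc/letsencrypt/live/DOMAIN/cert.pem -noout -pubkey \
  | openssl pkey -pubin -outform DER | sha256sum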
The extension pins the SHA-256 of that public key, not the cert fingerprint. Renewals become silent. A real MITM attack (which requires a new key) still trips the alarm.
Alternative: drop pinning entirely. Layers 1-3 (on-chain list + origin check + JWT verification) already cover the threat. TLS pinning is the belt on the belt-and-suspenders. Given the operational cost and the low-but-real risk of false alarms, it's legitimate to just not do it. My lean is "drop it, document that we dropped it, revisit if Layers 1-3 show gaps."
5. Gap #3 — How We Roll Out Without Breaking Everyone
The danger
If the extension ships with "enforce the on-chain list" turned on before the list is populated, every user is instantly locked out of every real server. This is today's outage multiplied by the entire user base.
Think of it like this: You install new door locks at every building in a city. Good idea. But if you hand out the new keys after you've changed the locks, everyone is locked out until the keys arrive. You want to hand out keys first, then swap the locks.
Feature-flag staged rollout
Phase 0 — Before anything user-facing
1. Deploy shardkeep_authority program to DevNet
2. Seed AdminRegistry with both admin wallets, 2-of-2 threshold
3. Seed WardenRegistry with the one live Warden (master.shardkeep.io)
4. Internal smoke-test reads from a script, confirm expected results
Phase 1 — Extension ships with the flag OFF (shadow mode)
5. Extension queries registry on every login
6. Extension compares the live URL to the registry entry
7. If mismatch: log to console + POST to an audit endpoint. DO NOT BLOCK.
8. Run this for at least 2 weeks. Collect every false-positive. Fix them all.
Phase 2 — Flip the flag, but via a server-side config
9. Extension reads enforce_registry from a signed JSON served by any registered Warden
10. Flip server-side: no extension release, no app-store delay
11. If flip goes wrong: flip it back in seconds
Phase 3 — Hardcode enforcement after stability confirmed
12. Next extension release bakes enforce_registry = true into the code
13. Server-side flag becomes a kill-switch only (for emergency disable)
Why this matters: Server-side flags mean we can turn enforcement on and off in seconds, not days. App-store releases take 3-7 days to propagate. A flag we control via a Warden means a botched rollout is fixed by us flipping one value, not by waiting for Chrome Web Store to approve a hotfix.
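The flag logic across the phases can be sketched as follows. This is a minimal illustration under stated assumptions: resolveMode and handleLogin are hypothetical names, serverFlag stands for the already-verified value of enforce_registry from the signed JSON (null when the config could not be fetched or verified), and the audit array stands in for the audit endpoint.

```javascript
// buildEnforces: true once a Phase 3 release bakes enforcement in.
// serverFlag: verified enforce_registry value, or null if unavailable.
function resolveMode(buildEnforces, serverFlag) {
  if (buildEnforces) {
    // Phase 3: the flag is a kill-switch only; an explicit false disables.
    return serverFlag === false ? "shadow" : "enforce";
  }
  // Phases 1-2: enforcement stays off unless explicitly flipped on.
  return serverFlag === true ? "enforce" : "shadow";
}

// Decide what to do with one login attempt.
function handleLogin(mode, urlIsRegistered, audit) {
  if (urlIsRegistered) return "allow";
  audit.push({ event: "registry_mismatch", mode }); // always record mismatches
  return mode === "enforce" ? "block" : "allow"; // shadow mode never blocks
}
```

Note the asymmetric defaults: before Phase 3 a missing flag means "do not block", and after Phase 3 a missing flag means "keep enforcing", so an unreachable config server cannot silently weaken a hardened release.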
6. Gap #4 — URL Changes Are Too Easy
The problem
The earlier proposal said update_url — the instruction that changes where a Warden lives — is signed by the manager wallet. That wallet is described as "hot, day-to-day." So if a manager wallet leaks, the attacker:
- Calls update_url("fake.shardkeep.attack")
- Waits up to 5 minutes (the client cache TTL)
- Every wallet in the world now connects to their URL
Think of it like this: You have a hot wallet for buying coffee and a cold wallet for your mortgage. You wouldn't let the coffee wallet sign a deed transfer. Changing the URL of your Warden is a deed-transfer-level action, not a coffee action.
Two complementary fixes
Fix A — Require operator wallet
update_url must be signed by the operator (cold) wallet, not the manager (hot) wallet. Matches the severity of the operation to the wallet tier.
Fix B — Time-delay + out-of-band alert
URL changes are queued and take effect after 24 hours. The moment the change is queued, an alert fires to Discord + email (not to the attacker's channel — to the contacts registered at bond time). The operator has 24 hours to cancel.
Recommendation: do both. Operator-signed and 24-hour delay and out-of-band alert. URL changes are rare. The friction is tiny. The protection is real.
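The combined rule (operator-signed, delayed, alerted, cancellable) can be modeled off-chain in JavaScript to check the logic before it is written as program instructions. Everything here is a sketch under assumptions: the warden object shape, the function names, and the sendAlert callback are hypothetical, and signature verification is reduced to comparing a signer id.

```javascript
const DELAY_MS = 24 * 60 * 60 * 1000; // 24-hour delay from Fix B

// Queue a URL change. Only the operator (cold) wallet may do this.
function queueUrlChange(warden, newUrl, signer, nowMs, sendAlert) {
  if (signer !== warden.operatorWallet) {
    throw new Error("update_url must be signed by the operator wallet");
  }
  warden.pending = { newUrl, effectiveAtMs: nowMs + DELAY_MS };
  // Out-of-band alert fires at queue time, not when the change lands.
  sendAlert(`URL change queued: ${warden.url} -> ${newUrl}`);
}

// The operator can cancel any time before the change takes effect.
function cancelUrlChange(warden, signer) {
  if (signer !== warden.operatorWallet) {
    throw new Error("only the operator wallet can cancel");
  }
  warden.pending = null;
}

// The URL clients should use at time nowMs.
function effectiveUrl(warden, nowMs) {
  const p = warden.pending;
  return p && nowMs >= p.effectiveAtMs ? p.newUrl : warden.url;
}
```

With this shape, a leaked manager wallet cannot queue a change at all, and a leaked operator wallet still gives the real operator a 24-hour window to cancel.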
7. Gap #5 — iframe Origin Checks
The attack
Wallet extensions have a "sign hook" that fires when any page requests a signature. The earlier proposal said "compare the page's origin to the registry." But what page's origin?
Think of it like this: Attacker builds fakeshardkeep.com. They embed the real master.shardkeep.io login page inside a tiny iframe on their page. The user sees both pages. The attacker's page has a JavaScript overlay that intercepts the wallet's response and repurposes it. The iframe is a legitimate registered origin, but the top window isn't.
Fix: the top window must be the registered one
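The extension's sign-hook must check the top window's origin, not just window.location.origin. If the sign request came from an iframe and the top window is an unregistered origin, refuse to sign. Note that reading window.top.location.origin from a cross-origin frame throws a SecurityError, so the check must treat that exception as "unregistered."
// Wrong: checks only the frame's own origin, so a registered page embedded in an attacker's page passes
if (!registeredUrls.includes(window.location.origin)) refuse();
// Right: refuse every framed context outright
if (window !== window.top) refuse();
// OR: allow frames only under a registered top window.
// A cross-origin top window makes the read below throw, which we treat as unregistered.
let topOrigin = null;
try { topOrigin = window.top.location.origin; } catch (e) { /* cross-origin top window */ }
if (!registeredUrls.includes(topOrigin)) refuse();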
Simplest: refuse all framed contexts. ShardKeep's own wallet-connect page is never embedded legitimately. A "no frames ever" rule is clean, easy to reason about, and leaves no cross-origin edge cases to get wrong.
8. Gap #6 — Admin Approval Is the Real Gate, Not Bond
What could confuse us later
The earlier proposal says new Wardens must stake a bond (in SHRD tokens on MainNet). It's tempting to then think: "If bond is locked, the Warden is trustworthy — they're risking money." That thinking is wrong in v1.
Think of it like this: A casino requires a deposit to play. That deposit discourages quitting mid-hand. It doesn't verify the player isn't counting cards. The real check on card-counting is the pit boss watching — the human decision. In v1 of this system, we're the pit boss. The bond is just the deposit.
The actual rule to write down (and not forget)
In v1.x and in every v2.x release before v2.2, the admin multisig is the Sybil defense. The bond is a lock-in mechanism and a slashing target. Removing admin approval requires re-engineering Sybil defense. Do not treat "bond raised" as equivalent to "trust established."
The permissionless voted-admission model (v2.2+) is where bond starts doing trust-work alongside DevNet tenure and peer vouches. Until then, the admin conversation is the gate.
9. Gap #7 — Program ID and RPC Endpoints Become Critical
New things to protect
Every wallet extension and every node agent will hardcode two new values:
SHARDKEEP_AUTHORITY_PROGRAM_ID — the Solana address of our program
SOLANA_RPC_ENDPOINTS — a list of Solana RPC servers the client queries to read the registry
If the extension is ever compromised at the Chrome Web Store (rare but documented, e.g. MEGA, The Great Suspender), the attacker can swap both values to point at their own fake program + fake RPC. The extension then reads an attacker-controlled "registry" that approves the attacker's URL.
Defenses
- Reproducible builds. The extension's source is public, the build is reproducible, anyone can verify the shipped .zip matches the source tag.
- Code-signed agent updates. The Python bastion/warden agent already has version-check.php. Sign the update payload with a release key. Agent verifies the signature before applying an update.
- Self-check at startup. The agent reads the program ID from its hardcoded constant AND from a version-check.php response served by any registered Warden. If they disagree, refuse to start and alert. An attacker would need to compromise both the shipped binary and a registered Warden to break this.
- Multiple RPCs with quorum. Query 3 independent Solana RPCs (ours, Helius, Triton). If they disagree about what's in the registry, something is wrong: alert, and don't trust any of them.
Bottom line: Moving to on-chain doesn't remove supply-chain risk, it redistributes it. We gain a defense against DNS hijack but we add a defense requirement against binary tampering. Code-signing + reproducible builds close that new gap.
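The multi-RPC rule from the defenses above can be sketched as a pure function over the responses. This is an illustration, not a client implementation: registryByQuorum and canonical are hypothetical names, each response is assumed to be an already-decoded array of { nodePubkey, url } entries (or null if that RPC failed), and the sketch ignores the fact that honest RPCs can briefly differ when they are at different slots.

```javascript
// Order-independent fingerprint of a registry snapshot.
function canonical(snapshot) {
  return JSON.stringify(snapshot.map((e) => `${e.nodePubkey}|${e.url}`).sort());
}

// responses: one entry per RPC, a snapshot array or null on failure.
// required:  how many RPCs must answer (defaults to all of them).
function registryByQuorum(responses, required = responses.length) {
  const answered = responses.filter((r) => r !== null);
  if (answered.length === 0 || answered.length < required) {
    return { ok: false, reason: "rpc_unavailable" };
  }
  const first = canonical(answered[0]);
  const allAgree = answered.every((r) => canonical(r) === first);
  // Any disagreement means alert and trust none of the answers.
  return allAgree
    ? { ok: true, registry: answered[0] }
    : { ok: false, reason: "rpc_disagreement" };
}
```

Requiring unanimity rather than a majority is the stricter reading of "if they disagree, something is wrong"; a majority rule would be more available but would let two colluding or compromised endpoints outvote an honest one.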
10. Gap #8 — Heartbeat and Slashing, Spelled Out
What could go wrong if this is vague
Real scenario: operator's ISP has a 6-hour outage. Warden can't heartbeat during that window. What happens?
- If the program auto-slashes after N missed heartbeats: operator loses their bond for something that wasn't their fault. Unfair. Kills operator growth.
- If "missed heartbeat" has no effect: why do we have heartbeats?
- If admin decision triggered on heartbeat gap: subjective, case-by-case. OK, but needs process.
Recommended policy
| Time since last heartbeat | Status | Consequence |
| 0-4 hours | Active (displayed as Active) | None. Normal network variation. |
| 4-24 hours | Active (displayed as Degraded) | Warning badge on the Warden's public profile. Alert to operator. Not auto-penalized. |
| 24-168 hours (1 week) | Probation | Clients prefer other Wardens. Bond locked. No slash. |
| > 168 hours | Dormant | Removed from client routing. Bond still locked. Operator can reactivate with a signed re-heartbeat within 30 days. |
| > 30 days dormant | Exited | Admin multisig confirms exit. Bond returned to operator (unbonding delay applies). |
No automatic slashing based on uptime alone. Slashing is only for provable misbehavior: running a tampered version of the software, failing cryptographic challenges, serving wrong registry data, etc. An offline node is unhelpful; it is not malicious. The network protects itself by routing around it, not by punishing it.
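The policy table maps directly onto a small status function. A sketch, with assumptions labeled: wardenStatus and its return fields are hypothetical, the table's overlapping boundaries (exactly 4 and 24 hours) are resolved toward the stricter row, and "ExitPending" is an invented name for the state where 30 dormant days have passed but the admin multisig has not yet confirmed the exit.

```javascript
// hoursSilent: hours since the Warden's last heartbeat.
function wardenStatus(hoursSilent) {
  if (hoursSilent < 4) {
    return { status: "Active", display: "Active", routable: true };
  }
  if (hoursSilent < 24) {
    return { status: "Active", display: "Degraded", routable: true };
  }
  if (hoursSilent <= 168) {
    // Still routable, but clients should prefer other Wardens.
    return { status: "Probation", display: "Probation", routable: true };
  }
  const daysDormant = (hoursSilent - 168) / 24;
  if (daysDormant <= 30) {
    return { status: "Dormant", display: "Dormant", routable: false };
  }
  // Exit itself needs admin multisig confirmation; nothing here slashes.
  return { status: "ExitPending", display: "Dormant", routable: false };
}
```

For the 6-hour ISP outage in the scenario above, this returns status "Active" with a "Degraded" badge: visible to users, alerting to the operator, and free of any penalty.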
11. Gap #9 — What If the Program Has a Bug?
Worst case scenario
We ship the program. It has a subtle bug that causes get_warden_list to return an empty list (or an error). Every extension that runs Phase 3 enforcement suddenly cannot find any valid Wardens. The entire network becomes unreachable from the client side.
The program upgrade takes 72 hours (our timelock from Gap #1). Extension releases take 3-7 days (Chrome Web Store). We cannot afford even 72 hours of downtime, let alone a week.
Fix: graceful degradation with a signed cache
- Every time the extension successfully reads the registry, it stores a signed copy locally. The signature comes from any registered Warden's node key, covering the registry snapshot + a timestamp.
- On every read, the extension tries the live registry first.
- If the live read fails (program bug, RPC outage, anything): fall back to the cached snapshot as long as the cache is less than 7 days old.
- If the cache is older than 7 days and the live registry is unreachable: surface a clear error to the user, refuse to sign (fail-closed), but give them a manual "accept this risk, connect anyway" button with a big warning.
Think of it like this: Your phone caches its contacts. If you go offline, you can still see your contacts. If you're offline for a month, your phone warns you the list might be stale. If someone adds a new contact on your laptop during that month, you won't see it until you reconnect — but you also won't be paralyzed by being offline.
A 7-day cache strikes the balance: long enough to ride out the 72-hour program-upgrade timelock and, running in parallel, a 3-7 day extension release; short enough that a genuinely removed malicious Warden is ejected from caches within a week.
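The fallback rule can be sketched as one function. Assumptions are labeled here and in the comments: readRegistry is a hypothetical name, liveRead is shown as synchronous for brevity (a real client would await an RPC call), the cached snapshot's Warden signature is assumed to have been verified before this runs, and an empty list is treated as a failed read because the worst case above is a bug that returns an empty registry.

```javascript
const MAX_CACHE_AGE_MS = 7 * 24 * 60 * 60 * 1000; // 7-day cache limit

// liveRead: function returning the registry array, or throwing on failure.
// cache:    { registry, savedAtMs } or null (signature already verified).
function readRegistry(liveRead, cache, nowMs) {
  try {
    const registry = liveRead();
    // Assumption: an empty registry is a symptom of a bug, not a valid state.
    if (!Array.isArray(registry) || registry.length === 0) {
      throw new Error("empty registry");
    }
    return { source: "live", registry, newCache: { registry, savedAtMs: nowMs } };
  } catch (e) {
    if (cache && nowMs - cache.savedAtMs < MAX_CACHE_AGE_MS) {
      return { source: "cache", registry: cache.registry, newCache: cache };
    }
    // Fail closed: the caller shows the error and the manual override.
    return { source: "none", registry: null, newCache: cache };
  }
}
```

The cache timestamp is refreshed only on a successful live read, so the 7-day clock measures time since the registry was last confirmed on-chain, not time since the cache was last used.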
12. Putting It All Together
The complete trust chain, end to end
When a user opens their wallet extension and visits a ShardKeep page:
1. Extension checks: is this frame the top window? (Gap #5)
→ If no, refuse. Attack ends.
2. Extension reads program ID from its own constant AND cross-checks against a registered Warden's version-check.php. (Gap #7)
→ If mismatch, refuse + alert. Extension may be tampered with.
3. Extension queries 3 Solana RPCs for the WardenRegistry. Quorum must agree. (Gap #7)
→ If query fails, fall back to 7-day signed cache. (Gap #9)
4. Extension checks: is the page URL in the registry? (Original Layer 1)
→ If no, refuse to populate login. Show user: "unregistered server." Attack ends.
5. User signs login. Server returns a JWT signed by the Warden's node key.
6. Extension verifies JWT signature matches the Warden's node_pubkey in the registry. (Original Layer 3)
→ If mismatch, discard JWT. Attack ends.
7. User is logged in — with four layers of verification behind them, not one.
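Steps 1 through 4 of the chain above can be sketched as a single ordered gate that runs before the login flow is populated. This is an illustration only: preLoginDecision and every ctx field are hypothetical, the inputs are assumed to have been gathered by the earlier checks (frame test, version-check.php response, RPC quorum or cache), and the JWT verification in steps 5 and 6 is left out because it happens after the user signs.

```javascript
// ctx fields (all hypothetical):
//   isTopWindow             - result of the frame check
//   hardcodedProgramId      - constant shipped in the extension
//   wardenReportedProgramId - value from a registered Warden
//   registeredOrigins       - array of origins, or null if neither the
//                             live registry nor a valid cache is available
//   pageOrigin              - origin of the page requesting login
function preLoginDecision(ctx) {
  if (!ctx.isTopWindow) {
    return { allow: false, reason: "framed_context" }; // step 1
  }
  if (ctx.hardcodedProgramId !== ctx.wardenReportedProgramId) {
    return { allow: false, reason: "program_id_mismatch" }; // step 2
  }
  if (!ctx.registeredOrigins) {
    return { allow: false, reason: "registry_unavailable" }; // step 3
  }
  if (!ctx.registeredOrigins.includes(ctx.pageOrigin)) {
    return { allow: false, reason: "unregistered_server" }; // step 4
  }
  return { allow: true, reason: "ok" };
}
```

Returning a reason with every refusal keeps the user-facing message ("unregistered server") and the audit log aligned with the step that actually stopped the flow.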
Ship order (the critical part)
- Week 0: Publish this proposal. Review. Finalize gap decisions (especially Gap #1 upgrade authority).
- Week 1: Write the shardkeep_authority program. Deploy to DevNet. Seed with master.shardkeep.io.
- Week 2: Extension ships with registry query in shadow mode (Phase 1 of Gap #3). Collect audit logs.
- Week 3-4: Fix every false-positive found in shadow mode. Build the signed-cache fallback.
- Week 5: Server-side flag to enable enforcement. Flip to ON for 10% of users. Monitor.
- Week 6: Flip to 100%. Observe.
- Week 7: Next extension release bakes enforcement in. Server flag becomes kill-switch.
- MainNet: Squads 3-of-5 multisig with 72h timelock takes upgrade authority. DevNet keeps its lighter posture.
Explicit non-goals for this proposal
- Not redesigning the registry schema. The fields in role-separation-v1 §6 are kept as-is.
- Not implementing voted admission. That's v2.2+.
- Not building the reward distribution logic. That's on-chain-shard-map-v3's territory.
- Not changing Bastion or Sentry trust model. Those connect to Wardens via WSS; their trust flows from the Warden they chose. Only Wardens need on-chain presence.
13. Decisions Needed Before Code Starts
Approve, reject, or modify each:
| # | Decision | My recommendation |
| 1 | Who holds upgrade authority on MainNet? | Squads 3-of-5 multisig with 72h timelock |
| 2 | TLS pinning in the extension? | Drop it. Rely on Layers 1-3. |
| 3 | Rollout method? | Shadow mode → server-flag → hardcoded. Never big-bang. |
| 4 | update_url wallet tier? | Operator wallet + 24h delay + out-of-band alert |
| 5 | iframe policy? | Refuse all framed contexts entirely. |
| 6 | Bond-vs-admin gating language? | Document clearly that admin approval is v1's Sybil defense. |
| 7 | Supply-chain defense for extension? | Reproducible builds + startup self-check against registered Warden |
| 8 | Heartbeat → slashing policy? | Never auto-slash on uptime. Progressive status degradation per §10. |
| 9 | Client fallback behavior? | 7-day signed cache. Fail-closed beyond that. |
Once these nine are locked, we can start writing the Anchor program with confidence that we won't need to rewrite it two months in.
Companion to: proposal-role-separation-v1 (the original AdminRegistry + WardenRegistry design).
Supersedes nothing — this doc fills in what that one left open.