Squirrel Stack Microservices: 147 Services, Zero Docs
The full Squirrel Stack microservices architecture, as drawn on a napkin during a coffee break in 2021 and never revisited. 147 services. 4 of them are doing anything. 12 of them call each other in a perfect circle. The remaining 131 are owned by an engineer who left the company in 2023.
We call this pattern a distributed monolith. Other teams call it the same thing. The difference is we say it with confidence on stage at conferences.
The Architecture (As Drawn From Memory)
                     [API Gateway]
                           |
          +----------------+----------------+
          |                |                |
     [auth-svc]       [user-svc]   [notification-svc]
          |                |                |
          +------+---------+------+---------+
                 |                |
         [acorn-allocator]  [nut-billing]
                 |                |
       [tail-twitch-tracker]      |
                 |                |
                 +-------+--------+
                         |
             [chitter-bus (RabbitMQ)]
                         |
          +--------------+--------------+
          |                             |
[128 other services]         [legacy-monolith-2009]
          |                             |
          +-------------+---------------+
                        |
                [shared Postgres]
                        |
            (single point of failure,
          we know, it's on the roadmap)
Diagram accurate as of last Tuesday at 3pm. The notification-svc has since been rewritten in Rust and forgotten.
Service Inventory (Selected Highlights)
- acorn-allocator: Distributes acorns. Sometimes. Has not been deployed since the engineer who owned it joined a stealth startup.
- nut-billing: Charges users in fractional acorns. Powered by Stripe and prayer. Currently double-charges everyone on Fridays.
- tail-twitch-tracker: Streams tail-twitch events to a Kafka topic that nobody subscribes to. Costs $4,200/month.
- legacy-monolith-2009: Runs 80% of business logic. Cannot be killed. Documented in a Confluence page that was last edited in 2014.
- burrow-balancer: Load balancer. Decides which service to send traffic to based on vibes.
- chitter-bus: Internal event bus. Frequently down. Frequently the reason production is down. We blame the network.
- squirrel-cache: Redis, but the data has feelings. TTLs are negotiable.
- + 140 other services: Names lost to time. Some may not exist. Removing them is forbidden by tradition.
The AI Services (Bolted On In Q3)
- ai-decision-orch-svc: Uses Squirrel-GPT 5 to decide which service should handle the request. 34% of the time it just calls itself recursively.
- prompt-router-svc: Routes prompts to the wrong model based on cost anxiety.
- rag-indexer-svc: Indexes Confluence pages that no longer match reality.
- hallucination-filter-svc: Blocks correct answers by mistake.
- agent-orchestrator-svc: Starts eight agents to solve one null pointer exception.
- vector-stash-svc: Stores embeddings in CachetNut, loses half of them to winter.
- context-window-exhaustion-guard: Truncates your prompt right before the important part. Every time.
- ai-vibe-check-svc: Calls Squirrel-GPT 5 with the request body and asks "does this feel enterprise-y?"
How A Single Request Actually Travels
- User clicks "Buy Acorn" on the homepage.
- Request hits the API Gateway, gets rate-limited because someone forgot to whitelist the production frontend.
- auth-svc validates the JWT, then calls user-svc to validate the user, which calls auth-svc again to validate the JWT.
- auth-svc also calls the new vibe-check-agent, which asks Squirrel-GPT 5 whether the user "feels like they should be allowed in."
- 11 more services are pinged for "feature flags" and "telemetry" and "the vibes."
- Request reaches acorn-allocator, which is down.
- chitter-bus picks up the failure and broadcasts it to 47 listeners, 46 of whom ignore it.
- The 47th listener is a Slack bot that pings the on-call engineer.
- On-call engineer (Hazel) restarts the service. Request finally succeeds. Latency: 18 seconds. Status code: 200.
- The request succeeds, but the AI agent that handled billing has invented a new currency called "NutCreds" and charged the user 3,400 of them.
- User goes elsewhere. We track this in Acornlytics as "engagement."
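The auth-svc and user-svc loop in the walkthrough above is the kind of thing a call graph check could catch before Hazel does. What follows is a minimal sketch, not anything that exists in the Squirrel Stack: the `CALLS` map is a hypothetical, hand-written subset of the real dependencies, and `find_call_cycle` is an illustrative helper that runs a plain depth-first search and returns the first loop it finds.

```python
from typing import Dict, List, Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored


def find_call_cycle(calls: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of service calls as a list of names, or None."""
    colour = {svc: WHITE for svc in calls}
    path: List[str] = []

    def visit(svc: str) -> Optional[List[str]]:
        colour[svc] = GREY
        path.append(svc)
        for callee in calls.get(svc, []):
            state = colour.get(callee, WHITE)
            if state == GREY:
                # callee is already on the current path, so the loop closes here
                return path[path.index(callee):] + [callee]
            if state == WHITE:
                found = visit(callee)
                if found:
                    return found
        path.pop()
        colour[svc] = BLACK
        return None

    for svc in list(calls):
        if colour[svc] == WHITE:
            found = visit(svc)
            if found:
                return found
    return None


# Hypothetical subset of the call graph, taken from the request walkthrough.
CALLS = {
    "api-gateway": ["auth-svc"],
    "auth-svc": ["user-svc"],
    "user-svc": ["auth-svc"],
}

cycle = find_call_cycle(CALLS)
print(" -> ".join(cycle) if cycle else "no cycles (suspicious)")
```

Run against the map above, it reports the auth-svc and user-svc loop. Run against all 147 services, it would presumably find the perfect circle of 12, assuming anyone can reconstruct the edges.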
Ownership Matrix
- auth-svc -- owned by Platform, blamed on Security.
- nut-billing -- owned by Finance, maintained by someone in DevRel.
- prompt-router-svc -- owned by AI Enablement, which is two PMs and a Figma board.
- legacy-monolith-2009 -- owned by "the business."
- acorn-allocator -- owned by Steve (no longer with the company). Maintained by Hazel under duress.
Incident Response
- Sev 1 -- Squirrel-GPT 5 summarizes the outage incorrectly in Slack.
- Sev 2 -- Acornlytics dashboard turns green because metrics stopped arriving.
- Sev 3 -- NutToggle turns off the feature after the customer has already tweeted.
- Kuk-kuk-1 -- prod is on fire. Wake Hazel.
- Kuk-kuk-4 -- Steve was mentioned in standup.
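The Sev 2 entry above describes a dashboard that reads silence as health. One common guard is to check how old the newest sample is before looking at its value. This is a sketch under assumptions: `dashboard_status`, its parameters, and the thresholds are all hypothetical, and nothing here reflects how Acornlytics actually works.

```python
from typing import Optional


def dashboard_status(
    last_sample_age_s: Optional[float],
    error_rate: Optional[float],
    max_age_s: float = 120.0,      # hypothetical staleness threshold
    max_error_rate: float = 0.01,  # hypothetical error budget
) -> str:
    """Return "green", "red", or "unknown" for one service tile."""
    # No sample at all, or a sample that is too old, is not evidence of health.
    if last_sample_age_s is None or last_sample_age_s > max_age_s:
        return "unknown"
    if error_rate is None:
        return "unknown"
    return "green" if error_rate <= max_error_rate else "red"


print(dashboard_status(last_sample_age_s=5.0, error_rate=0.0))     # fresh and healthy
print(dashboard_status(last_sample_age_s=3600.0, error_rate=0.0))  # metrics stopped arriving
print(dashboard_status(last_sample_age_s=5.0, error_rate=0.5))     # fresh and on fire
```

The design choice is the third state: a tile that can only be green or red has to guess when the data stops, and it will guess green.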
Incident Timeline: The Great Snack Outage of 2025
A single shared incident. Referenced everywhere. Postmortem was scheduled. Postmortem never happened. See also: the full archive.
- Friday, 4:47pm -- acorn-allocator returns 200 OK and allocates zero acorns.
- 47 listeners on chitter-bus see the failure. 46 ignore it.
- Bjorn the Hawk is blamed in the public RCA.
- Squirrel-GPT 5 summarizes the incident incorrectly in Slack.
- The fix: rolling back to a commit titled "wip do not deploy."
- Total duration: 17 hours. Hazel hasn't slept right since.
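The first timeline entry is a reminder that a 200 status code only says the service answered, not that it did the work. A caller could check the payload as well. The sketch below assumes a response shape that the source never specifies: `AllocationResponse`, its `acorns_allocated` field, and `allocation_succeeded` are hypothetical names for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AllocationResponse:
    """Hypothetical response shape; the real acorn-allocator payload is undocumented."""
    status_code: int
    acorns_allocated: int


def allocation_succeeded(resp: AllocationResponse, requested: int) -> bool:
    """Treat the call as successful only if the status is 200 and the acorns arrived."""
    return resp.status_code == 200 and resp.acorns_allocated >= requested > 0


friday = AllocationResponse(status_code=200, acorns_allocated=0)
print(allocation_succeeded(friday, requested=1))  # 200 OK, zero acorns
```

A check like this would have turned the Friday 4:47pm response into a visible failure at the caller, which is a different thing from anyone acting on it.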
Slack Thread Archive
@nutterz prod is down
@hazel yes
@nutterz is it bad
@hazel yes
@acornelius this is a great learning moment
@hazel restarting acorn-allocator
[reaction] :squirrel: x 47
@bjorn-the-hawk just glided through the data center, FYI
@hazel ...
Confluence Page (Stylized)
Owner: Steve (no longer with the company)
Last edited: 2018-03-22
Tags: TODO, draft, do-not-delete, important, also-todo
Body: "see Steve"
Generate a Jira Ticket
For the backlog. Will not be groomed. Status: WONT-FIX (Vibes).
Click the button. Pretend it's groomed.