🍔🧠 Why the F*ck is Discord Still Using Elixir

Happy Monday! ☀️

Welcome to the 341 new hungry minds who have joined us since last Monday!

If you aren't subscribed yet, join smart, curious, and hungry folks by subscribing here.

📚 Software Engineering Articles

Scaling Android Studio performance in massive monorepos
IC work is the new career flex for engineers
Verifiable rewards-based reinforcement learning with GRPO on SageMaker
Four subagent patterns shaping AI agent architecture in 2026
JWT security in Node.js: five critical errors explained

🗞️ Tech and AI Trends

Anthropic is now leading the AI boom's race forward
Google reveals AI helped hackers find major software vulnerabilities
Amazon employees are tokenmaxxing under AI adoption pressure

👨🏻‍💻 Coding Tip

Prevent cache stampede using probabilistic early expiration and request collapsing via background workers

Time-to-digest: 5 minutes

How Discord Built Tracing for Elixir Without Breaking Everything 🧪

Discord runs millions of concurrent guild processes using Elixir's message-passing model, enabling near-instantaneous chat experiences at scale. But when things break, engineers were flying blind—metrics and logs couldn't explain why a guild suddenly felt laggy or why sessions couldn't connect.

The challenge: Implement distributed tracing across Elixir services that communicate via message passing (not HTTP), handle million-user fanouts without exploding span volume, and deploy with zero downtime on a system processing millions of concurrent operations.

Implementation highlights:

The Envelope primitive: Wrap all inter-process messages with metadata containers that transparently attach and propagate trace context without manual encoding at every call site
Zero-downtime rollout: Support both Transport and legacy message passing simultaneously, allowing gradual migration across fleet during normal deployments
Dynamic fanout sampling: Adjust trace sampling rates based on fanout size (100% for 1 session, 0.1% for 10k+) to prevent drowning in spans during massive broadcasts
Lazy context unpacking: Read just the sampling flag byte from trace context headers before expensive full parsing, cutting gRPC handler CPU overhead by half
Forbid root spans post-fanout: Sessions can only continue existing traces, never create new ones, eliminating redundant sampling decisions and regaining 10% CPU back

Results and learnings:

Complete visibility: End-to-end traces now reveal the exact flow and timing of operations across the entire platform, from API to guild to session
Pinpointed bottlenecks: Found that sessions were spending 75% of time unpacking trace context; a single-byte optimization fixed it
Real impact quantified: Discovered guild outages caused 16-minute connection delays for users—impossible to diagnose without tracing

Discord proved that you don't need HTTP headers to do distributed tracing—just some clever engineering and a willingness to challenge assumptions. Their approach of optimizing for zero overhead first, then adding instrumentation, is the opposite of "measure first," and it paid off spectacularly.

IC work is the new career flex

The rise of the High-impact Individual Contributor (HI-C!)

Overcoming reward signal challenges: Verifiable rewards-based reinforcement learning with GRPO on SageMaker AI

In this post, you will learn how to implement reinforcement learning with verifiable rewards (RLVR) to introduce verification and transparency into reward signals to improve training performance.

Scaling developer experience: How we improved Android Studio in a large monorepo

Frustrated by 35-minute integrated development environment (IDE) syncs? In a large monorepo, slow builds were eroding productivity. Discover how we built a custom Focus plugin to slash sync times to under 2 minutes and drop memory usage from 10 GB to 2 GB. Learn how we leveraged Gradle and Bazel to reclaim developer flow without changing a single team's workflow.

Removing AI in Tech Interviews is Wrong

AI-assisted engineering is already the standard, and technical interviews should reflect how modern software is actually built.

How Agents Manage Other Agents: Four Subagents Patterns in 2026

Subagents solve context pollution, but how the main agent manages them matters more than whether they run in sync or async. Four orchestration patterns, from a simple tool call to an autonomous agent team, each with different requirements for model capability and result collection.

Monolith vs Microservices vs Modular Monoliths

Most software systems start simple. A small team, a single codebase, one database, and a few features.

22 Somewhat Random Bits of (Mostly Tech Career) Advice From Dave

Not in any order, or theme, just 22 thoughts from Dave.

ARTICLE (token-twister-tango)
JWT in Node.js: How It Works, 5 Errors That Compromise Your API, and Refresh Token with Rotation

ARTICLE (ai-shoulder-surfer)
What "Built It Solo" Actually Means When You Work With AI

ARTICLE (pocket-monster-power-up)
A Year Late, Claude Finally Beats Pokémon

ARTICLE (nitpick-ninja-moves)
What to do if you're not "detail oriented"

ARTICLE (memory-lane-memories)
Redis array type: short story of a long development

ARTICLE (brain-box-builder)
Build an AI-Powered Learning Management System That Actually Trains People

ARTICLE (code-grim-reaper)
Software Engineers are Obsolete

ESSENTIAL (attention-grabber-guide)
A deep dive into the Transformer architecture

ARTICLE (spotify-spell-caster)
Building a Natural Language Interface to the Spotify Ads API with Claude Code Plugins

ARTICLE (startup-snatch-watch)
Why are AI companies buying the teams behind your favorite dev tools?

Want to reach 200,000+ engineers?

Let’s work together! Whether it’s your product, service, or event, we’d love to help you connect with this awesome community.

WORK WITH US

⚠️ Mitchell Hashimoto Warns Against "AI Psychosis" in Software Development (4 min)

Brief: Hashimoto expresses concern that companies are caught in "AI psychosis," operating under the belief that AI agents can automatically fix bugs at scale, echoing past infrastructure lessons where over-automating resilience created hidden systemic risks that metrics failed to capture.

🖥️ Ubuntu Prioritizes Local AI Over Cloud-Centric Integration (2 min)

Brief: Canonical is shifting Ubuntu's AI strategy away from cloud-first approaches toward local inference and on-device models, offering modular design, user control, and inference snaps for simplified hardware-optimized AI installation while rejecting a global AI killswitch to maintain flexibility across different Ubuntu consumption methods.

🚀 AI is the New Netflix: Why Upload Traffic is the Real Story (4 min)

Brief: AI is becoming the killer app of the next broadband era, but unlike streaming video, it's driven by upload traffic, not downloads—cloud sync now dominates upstream data, while IoT devices, connected cars, and voice search generate massive outbound flows that are reshaping how networks must be built.

⚙️ Amazon Workers "Tokenmaxxing" to Hit AI Usage Targets (3 min)

Brief: Amazon employees are automating unnecessary tasks using the internal MeshClaw AI tool to artificially boost their token consumption and meet company AI adoption targets, creating perverse incentives despite management claims the metrics won't affect performance reviews.

This week’s tip:

Implement cache stampede prevention using probabilistic early expiration (xFetch pattern) combined with probabilistic collapsing of thundering herd requests via Kafka topics or Redis Streams. When a key nears expiration, refresh it with background workers before client demand spikes, reducing lock contention and miss-latency storms.

Wen?

High-QPS APIs with shared expensive dependencies: Probabilistic refresh (e.g., 25% of clients trigger background refresh at 75% TTL) avoids coordinated expiry spikes that spike backend load 10x.
Distributed caches with hot keys: Combine with Kafka consumer groups to load-balance refresh requests across workers; prevents single coordinator from becoming a bottleneck.
Session caches with long TTLs: Early refresh at 75% TTL reduces the window where lock-wait cascades occur; stale-while-revalidate pattern masks latency from clients.

Learn from the rejection and turn it into an opportunity!
Mary Engelbreit

That’s it for today! ☀️

Enjoyed this issue? Send it to your friends here to sign up, or share it on Twitter!

If you want to submit a section to the newsletter or tell us what you think about today’s issue, reply to this email or DM me on Twitter! 🐦

Thanks for spending part of your Monday morning with Hungry Minds.
See you in a week — Alex.

Icons by Icons8.

*I may earn a commission if you get a subscription through the links marked with “aff.” (at no extra cost to you).

🍔🧠 Why the F*ck is Discord Still Using Elixir

Keep Reading

Hungry Minds 🍔🧠