🍔🧠 How Uber Upgraded 2M Spark Jobs (Saved $MMs/Year)

PLUS: Android Coming To PCs 💻, Handling 10B Messages Daily 🚀, Load Balancer Deep Dive 🔄

Happy Monday! ☀️

Welcome to the 140 new hungry minds who have joined us since last Monday!

If you aren't subscribed yet, join smart, curious, and hungry folks by subscribing here.

📚 Software Engineering Articles

🗞️ Tech and AI Trends

👨🏻‍💻 Coding Tip

  • What is edge storage?

Time-to-digest: 5 minutes

Big thanks to our partners for keeping this newsletter free.

If you have a second, clicking the ad below helps us a ton—and who knows, you might find something you love. 💚

Stop context-switching between fragmented tools.
Miro brings your entire technical workflow into one unified visual platform:

  • Map Complex Systems: Powerful diagramming with 2500+ shapes and AI generation for architecture & flows.

  • Run Async Collaboration: Lead effective stand-ups, retros, and sprint planning with async video and tools.

  • Track Sprints Visually: Visualize dependencies, timelines, and backlogs with native Jira/Linear integration.

  • Document & Align: Create living technical docs that stay in sync with your team’s work.

  • Integrate Your Stack: Works seamlessly with GitHub, AWS, Azure DevOps, and more.

Uber operates one of the world's largest Apache Spark deployments with over 2 million applications running daily across 20,000+ scheduled workflows. Moving from Spark 2.4 to 3.3 meant upgrading every single job without breaking production systems or affecting millions of users.

The challenge: Migrate 40,000+ Spark applications without manual, job-by-job work, with no staging environment, no existing test cases, and zero tolerance for production data corruption.

Implementation highlights:

  • Smart code rewriting: Extended their open-source Polyglot Piranha tool to automatically parse ASTs and apply Spark 3 compatibility transformations across Java, Scala, and Python codebases

  • Iron Dome framework: Built custom interceptors for Spark's Catalog interface and Hadoop's File Output Committer to safely redirect production paths during shadow testing

  • Dependency chain overhaul: Systematically upgraded Python 2→3, Scala 2.11→2.12, and resolved monorepo conflicts while maintaining backward compatibility with custom shuffle manager Zeus

  • Automated shadow testing: Created Cadence-powered workflows that automatically shadow production runs, validate output data, and mark jobs ready for migration

  • Runtime path translation: Implemented filesystem-level guardrails that rewrite paths such as /db/tbl to /stgdb/tbl at runtime, so shadow runs cannot write to production data
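
To make the path-translation idea concrete, here is a minimal sketch in plain Python. It is not Uber's Iron Dome code: the "stg" prefix is inferred from the /db/tbl to /stgdb/tbl example above, and `to_shadow_path` and `ShadowWriter` are hypothetical names used only for illustration.

```python
from pathlib import PurePosixPath

# Assumption: staging databases are named "stg" + <production database name>,
# inferred from the /db/tbl -> /stgdb/tbl example.
STAGING_PREFIX = "stg"


def to_shadow_path(path: str) -> str:
    """Rewrite /<db>/<rest> to /stg<db>/<rest>; leave staging paths unchanged."""
    p = PurePosixPath(path)
    if not p.is_absolute() or len(p.parts) < 2:
        raise ValueError(f"expected an absolute path with a database directory: {path!r}")
    root, db, *rest = p.parts
    # Naive check: a production database whose name itself starts with "stg"
    # would need an explicit allow-list instead.
    if db.startswith(STAGING_PREFIX):
        return str(p)
    return str(PurePosixPath(root, STAGING_PREFIX + db, *rest))


class ShadowWriter:
    """Hypothetical interceptor: every write is redirected before it happens."""

    def __init__(self, write_fn):
        # write_fn is the real writer, a callable taking (path, data).
        self._write_fn = write_fn

    def write(self, path: str, data) -> str:
        shadow = to_shadow_path(path)
        self._write_fn(shadow, data)
        return shadow
```

Because the rewrite sits between the job and the writer, the job's own code can keep referring to production paths while its shadow run only ever touches the staging location.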

Results and learnings:

  • Massive automation win: 85% of jobs migrated within 6 months through automated tooling instead of manual developer effort

  • Serious performance gains: 60% of jobs improved by 10% or more, with a 50% overall reduction in runtime and resource usage

  • Developer time savings: Saved thousands of engineer hours while unlocking millions in infrastructure cost savings

Uber's approach proves that even the most complex infrastructure migrations can be automated when you build the right tooling first. Their framework now powers other major upgrades, showing that good automation pays compound interest.

ESSENTIAL (strategy-ninja)
Good Strategy / Bad Strategy

ESSENTIAL (rest-in-pieces)
GraphQL 101: API Approach Beyond REST

Want to reach 190,000+ engineers?

Let’s work together! Whether it’s your product, service, or event, we’d love to help you connect with this awesome community.

Brief: Apple completes transition to full in-house chip production for iPhones, marking strategic shift to optimize AI capabilities while reducing dependence on external suppliers.

Brief: Google's Director of AI and Datadog's VP of Engineering share strategic insights on AI implementation, LLMs, and observability practices in a new comprehensive guide for technical leaders on Google Cloud.

Brief: Tesla's Chief Designer suggests company is considering Cyber SUV and compact Cybertruck models, with a potential mock-up design already spotted in promotional material, despite earlier statements about not using stainless steel exoskeleton in new vehicles.

Brief: Google is reportedly developing an Android-based desktop operating system, signaling a major push to expand beyond mobile and challenge Windows and macOS in the PC market.

Brief: Perplexity introduces Email Assistant that helps with meeting scheduling, email drafting, and priority labeling for Gmail and Outlook, available exclusively to Max subscribers as part of their AI productivity suite.

This week’s tip:

Cloudflare Workers Durable Objects provide globally consistent state with SQLite-like transactions across edge locations. Each object instance gets its own isolated JavaScript context with persistent storage that automatically handles consistency and failover.

When?

  • Real-time collaborative apps: Building document editors where each document is a durable object handling concurrent edits

  • Game state management: MMO games where each room/lobby maintains a consistent state across global players

  • Rate limiting with memory: Per-user rate limiters that persist across requests and maintain state during traffic spikes
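
The rate-limiting case can be modeled in a few lines. The sketch below is plain Python, not the Cloudflare Workers API: `TokenBucket` and `RateLimiterRegistry` are hypothetical names, and the dictionary stands in for the routing that sends every request for one user ID to the same object. State here lives in memory only, whereas a real Durable Object would also persist it to its storage.

```python
import time


class TokenBucket:
    """Rate-limit state for one user (what a single object would own)."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity      # start with a full bucket
        self.updated_at = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0      # spend one token for this request
            return True
        return False


class RateLimiterRegistry:
    """Stand-in for ID-based routing: one bucket per user ID."""

    def __init__(self, capacity: float, refill_per_sec: float, clock=time.monotonic):
        self._capacity = capacity
        self._refill_per_sec = refill_per_sec
        self._clock = clock         # injectable so the limiter can be tested
        self._buckets = {}

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        bucket = self._buckets.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self._capacity, self._refill_per_sec, now)
            self._buckets[user_id] = bucket
        return bucket.allow(now)
```

Since all requests for a given user reach the same bucket, there is no cross-instance coordination to get wrong, which is the property the per-object model is meant to provide.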

If I cannot do great things, I can do small things in a great way.
Martin Luther King, Jr.

That’s it for today! ☀️

Enjoyed this issue? Send your friends here to sign up, or share it on Twitter!

If you want to submit a section to the newsletter or tell us what you think about today’s issue, reply to this email or DM me on Twitter! 🐦

Thanks for spending part of your Monday morning with Hungry Minds.
See you in a week — Alex.

Icons by Icons8.

*I may earn a commission if you get a subscription through the links marked with “aff.” (at no extra cost to you).