🍔🧠 How Dropbox Built Multimedia Search for 700M Users

PLUS: Design a Rate Limiter ⏳, What is Cache Warming ♨️, Instagram Staff Career Gold 🏆

Today’s issue of Hungry Minds is brought to you by:

Happy Monday! ☀️

Welcome to the 489 new hungry minds who have joined us since last Monday!

If you aren't subscribed yet, join smart, curious, and hungry folks by subscribing here.

This week I'm also including a special newsletter recommendation for anyone who wants to learn system design with visuals 👇️

Hello, World! System Design Newsletter: Learn system design through case studies of how big tech companies solved their own problems. Highly relevant for software engineers.

📚 Software Engineering Articles

🗞️ Tech and AI Trends

👨🏻‍💻 Coding Tip

  • Use partial indexes in PostgreSQL to optimize query performance and reduce storage

Time-to-digest: 5 minutes

Big thanks to our partners for keeping this newsletter free.

If you have a second, clicking the ad below helps us a ton—and who knows, you might find something you love. 💚

Bitbucket Pipelines is not just a tool, it’s a DevOps force multiplier.

Unlock cloud-native CI/CD with 70% reduction in operational costs, no installation or plugins to manage, and efficient autoscaling.

Let engineers focus on delivering value – not maintaining Jenkins.

Searching through thousands of unlabeled images and videos is a nightmare for knowledge workers. Dropbox tackled this by building a scalable multimedia search system that makes finding visual content as easy as finding text documents in their universal search product, Dash.

The challenge: Build a cost-effective system that can process and search through massive amounts of media files while maintaining low latency and high relevance, despite limited metadata and heavy compute requirements.

Implementation highlights:

  • Metadata-first indexing: Extract lightweight features first (file paths, EXIF data) to enable basic search with minimal overhead

  • Just-in-time previews: Generate previews on-demand rather than upfront to optimize storage and compute costs

  • Location-aware queries: Built custom geocoding logic to enable searching by location where photos were taken

  • Parallel processing: Run preview generation, ranking, and permission checks concurrently to minimize latency

  • Smart caching: Store previews for 30 days and leverage existing Dropbox infrastructure for efficiency
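To make the parallel processing point concrete, here is a minimal sketch of running preview generation, ranking, and permission checks concurrently with Python's asyncio. The function names, the toy scoring rule, and the "private/" path convention are all hypothetical stand-ins, not Dropbox's actual code; the point is only that the three steps overlap instead of running back to back.

```python
import asyncio
from typing import Optional


# Hypothetical stand-ins for the three per-result steps named in the list above.
async def generate_preview(file_id: str) -> str:
    await asyncio.sleep(0.05)  # simulate I/O-bound work
    return f"preview:{file_id}"


async def rank(file_id: str, query: str) -> float:
    await asyncio.sleep(0.05)
    # Toy relevance score: 1.0 if the query text appears in the file id.
    return 1.0 if query in file_id else 0.0


async def check_permission(file_id: str, user: str) -> bool:
    await asyncio.sleep(0.05)
    # Assumed rule for illustration: only "owner" may see private/ files.
    return not file_id.startswith("private/") or user == "owner"


async def search_result(file_id: str, query: str, user: str) -> Optional[dict]:
    # All three coroutines run concurrently, so the total wait is roughly
    # the slowest step rather than the sum of all three.
    preview, score, allowed = await asyncio.gather(
        generate_preview(file_id),
        rank(file_id, query),
        check_permission(file_id, user),
    )
    if not allowed:
        return None  # never return a result the user may not see
    return {"file": file_id, "preview": preview, "score": score}


if __name__ == "__main__":
    print(asyncio.run(search_result("photos/beach.jpg", "beach", "alice")))
```

One trade-off in this sketch: a preview is generated even when the permission check fails. That wasted work is the price of the lower latency on the common, permitted path.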

Results and learnings:

  • Scalable performance: Successfully processes 97% of media files while maintaining responsive search

  • Cost optimization: Achieved 3x storage efficiency for images and 13x for video compared to raw storage

  • Developer velocity: Parallel workflows and clear API boundaries enabled faster team execution

This solution shows how thoughtful architecture decisions around caching, parallel processing, and metadata extraction can make multimedia search both performant and cost-effective.
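As a rough illustration of the just-in-time preview and 30-day caching ideas, the sketch below renders a preview only on first request and reuses it until it expires. The `PreviewCache` class, the `render` callback, and the in-memory dictionary are assumptions made for the example; the real system builds on existing Dropbox infrastructure rather than anything like this.

```python
import time
from typing import Callable, Dict, Tuple

THIRTY_DAYS = 30 * 24 * 60 * 60  # the retention window from the article, in seconds


class PreviewCache:
    """Render previews lazily and keep each one for a fixed time-to-live."""

    def __init__(
        self,
        render: Callable[[str], bytes],
        ttl: float = THIRTY_DAYS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._render = render  # hypothetical function that builds a preview
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, file_id: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(file_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: reuse the stored preview
        preview = self._render(file_id)  # first request, or the entry expired
        self._entries[file_id] = (now, preview)
        return preview


if __name__ == "__main__":
    cache = PreviewCache(lambda file_id: b"thumb:" + file_id.encode())
    print(cache.get("beach.jpg"))  # rendered on demand
    print(cache.get("beach.jpg"))  # served from the cache
```

Files that nobody searches for never cost any preview compute or storage, which is the main appeal of generating on demand.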

ESSENTIAL (retry, retry, oops)
Retries Have an Evil Twin: Duplicates

ARTICLE (vim-wizardry unlocked)
Why I Switched to Vim Keybindings

ARTICLE (claude’s coding spree)
6 Weeks of Claude Code

Want to reach 190,000+ engineers?

Let’s work together! Whether it’s your product, service, or event, we’d love to help you connect with this awesome community.

Brief: Reddit is working on a unified search interface as part of its strategy to compete with dominant search engines, potentially leveraging its vast community-driven content for better results.

Brief: Anthropic has revoked OpenAI’s access to its Claude AI models amid an escalating legal dispute, leaving OpenAI to seek alternatives for its AI research and services.

Brief: Google introduces Gemini DeepThink, an AI model designed to evaluate multiple ideas simultaneously, enhancing reasoning and problem-solving efficiency.

Brief: OpenAI releases gpt-oss-120b and gpt-oss-20b, two open-weight models under Apache 2.0 license, designed for agentic tasks, customization, and commercial use with full chain-of-thought reasoning.

Brief: OpenAI's GPT-5 offers three model tiers (regular, mini, nano) with competitive pricing and reduced hallucinations, making it a strong contender in the AI race.

Brief: AWS adds OpenAI’s open-weight GPT-OSS-120B and GPT-OSS-20B models to Bedrock and SageMaker, enabling developers to build AI applications with full infrastructure control and advanced reasoning capabilities.

This week’s coding challenge:

This week’s tip:

Use partial indexes (indexes defined with a WHERE clause) to reduce index size and speed up queries on specific subsets of data. A PostgreSQL partial index includes only the rows that match its condition, so it stays smaller and more focused than an index over the whole table.
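Here is a small runnable sketch of the idea. To keep it self-contained it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; the `CREATE INDEX ... WHERE` statement shown is also valid PostgreSQL syntax. The `orders` table, its columns, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"
)

# Skewed data: 990 completed orders and only 10 pending ones.
rows = [
    ("pending" if i % 100 == 0 else "completed", f"2025-01-{i % 28 + 1:02d}")
    for i in range(1000)
]
conn.executemany("INSERT INTO orders (status, created_at) VALUES (?, ?)", rows)

# Partial index: only rows that satisfy the WHERE clause are indexed.
conn.execute(
    "CREATE INDEX idx_orders_pending ON orders (created_at) "
    "WHERE status = 'pending'"
)

# The query repeats the index condition as a literal so the planner can
# prove that the partial index covers every row the query needs.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT id FROM orders "
    "WHERE status = 'pending' AND created_at >= '2025-01-15'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)  # last column holds the detail text
print(plan_text)

pending_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'pending'"
).fetchone()[0]
print(pending_count)
```

The printed plan should mention `idx_orders_pending`. The index holds entries for the 10 pending rows only, not all 1,000. Note that a query filtering on a different status cannot use this index at all.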

When?

  • Large tables with skewed data: When certain status values represent a small, frequently-queried subset of all rows.

  • Time-series data management: Index recent data more aggressively while maintaining lighter indexes for historical data.

  • Multi-tenant systems: Create focused indexes for high-traffic customers while keeping the overall index footprint manageable.

“The harder you work, the harder it is to surrender.”
Vince Lombardi

That’s it for today! ☀️

Enjoyed this issue? Forward it to your friends so they can sign up here, or share it on Twitter!

If you want to submit a section to the newsletter or tell us what you think about today’s issue, reply to this email or DM me on Twitter! 🐦

Thanks for spending part of your Monday morning with Hungry Minds.
See you in a week — Alex.

Icons by Icons8.

*I may earn a commission if you get a subscription through the links marked with “aff.” (at no extra cost to you).