19 min read

By Atharva Gadekar

Server-Side Caching Best Practices

Caching is the unsung hero of modern web performance. It’s like having a photographic memory for your application - once you’ve seen something, you remember it for next time. Let’s dive deep into the caching techniques that separate mediocre apps from blazing-fast ones.

What is Caching?

At its core, caching is about avoiding redundant work. Instead of hitting your database or external APIs repeatedly for the same data, you store frequently accessed information in faster, more accessible memory layers. Think of it as having a cheat sheet for your most common operations.

The performance gains are real: a well-implemented caching strategy can reduce response times from 500ms to under 50ms. That’s not just a nice-to-have - it’s the difference between users staying or bouncing.

Why Caching Matters

  • Latency Reduction: Cut response times by 80-95%
  • Throughput Increase: Handle 10x more requests with the same infrastructure
  • Cost Optimization: Reduce database load, API calls, and compute costs
  • User Experience: Sub-100ms responses feel instant to users

1. Client-Side Caching

Client-side caching happens in the browser and is your first line of defense against slow loads. It’s like having a local copy of everything you need.

Browser Cache

Browsers automatically cache static assets based on HTTP response headers such as Cache-Control, ETag, and Last-Modified.

What it is: Browser cache stores static files (CSS, JS, images) locally on the user’s device to avoid re-downloading them on subsequent visits.

Where to use: Perfect for static assets that don’t change frequently - CSS frameworks, JavaScript libraries, images, fonts, and other media files. Set long cache times (1 year) for versioned assets and shorter times (1 hour) for frequently updated content.

Cache-Control: max-age=31536000, immutable
ETag: "33a64df551"
Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT

Real-world example: GitHub caches their CSS and JS files for a year (max-age=31536000). When you visit GitHub, your browser doesn’t re-download the same files - it uses the cached versions, making subsequent page loads nearly instant.
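
If you serve assets from Node, a rough Express sketch of emitting these headers might look like the following; the paths, durations, and app setup are illustrative rather than a prescription:

// Express sketch: long-lived headers for fingerprinted assets, short-lived for HTML
const express = require('express');
const app = express();

// Versioned assets (e.g. app.3f2a1b.css) are safe to cache for a year
app.use('/assets', express.static('public/assets', {
  maxAge: '1y',
  immutable: true,
  etag: true
}));

// HTML changes more often, so keep its cache window short
app.get('/', (req, res) => {
  res.set('Cache-Control', 'public, max-age=3600');
  res.send('<h1>Home</h1>');
});

app.listen(3000);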

Service Workers

Service workers are JavaScript files that run in the background, intercepting network requests and serving cached responses.

What it is: Service workers are background scripts that can intercept network requests, cache responses, and serve content offline. They act as a proxy between your app and the network.

Where to use: Ideal for Progressive Web Apps (PWAs), offline-first applications, and sites that need to work without internet. Use them for caching critical app resources, API responses, and enabling push notifications. They’re especially valuable for mobile apps and sites with poor connectivity.

  • Offline-first apps: Cache critical resources for offline use
  • Progressive Web Apps (PWAs): Enable app-like experiences
  • Background sync: Queue actions when offline, sync when back online

// Service worker caching strategy
self.addEventListener('fetch', event => {
  event.respondWith(
    caches.match(event.request)
      .then(response => {
        // Return cached version or fetch from network
        return response || fetch(event.request);
      })
  );
});

Pro tip: Use a “stale-while-revalidate” strategy for dynamic content - serve cached data immediately, then update in background.
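
Here's a minimal sketch of that strategy in a service worker; the cache name 'dynamic-v1' is just an example and would normally be versioned per deploy:

// Stale-while-revalidate: serve the cached copy immediately, refresh it in the background
self.addEventListener('fetch', event => {
  event.respondWith(
    caches.open('dynamic-v1').then(cache =>
      cache.match(event.request).then(cached => {
        const refresh = fetch(event.request)
          .then(response => {
            cache.put(event.request, response.clone());
            return response;
          })
          .catch(() => cached); // ignore network errors if we already have a copy
        // Serve the stale copy if present, otherwise wait for the network
        return cached || refresh;
      })
    )
  );
});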


2. Server-Side Caching

Server-side caching reduces backend load by storing computed results. It’s like having a smart assistant who remembers all your previous calculations.

Page-Level Caching

Cache entire HTML pages for static or semi-static content. Perfect for:

What it is: Page-level caching stores complete HTML pages in memory, serving them instantly without hitting the database or running server-side logic.

Where to use: Best for content-heavy sites with pages that don’t change frequently - blogs, documentation sites, product catalogs, and news articles. Avoid for personalized content or pages with user-specific data. Set cache times based on content update frequency (hours for news, days for documentation).

  • Blog posts and articles
  • Product catalog pages
  • Documentation sites

Example: Medium caches article pages for 24 hours. When you visit a popular article, you get the cached version instantly instead of waiting for database queries and template rendering.
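
There's no single blessed API for this, but as a sketch, an Express middleware that stores rendered HTML in Redis could look like the following (it assumes an ioredis client named redis; renderPostPage is a placeholder handler):

// Page-cache middleware sketch: key the full HTML response by URL
function pageCache(ttlSeconds) {
  return async (req, res, next) => {
    const key = `page:${req.originalUrl}`;
    const cached = await redis.get(key);
    if (cached) {
      return res.send(cached); // serve the stored HTML without re-rendering
    }

    // Intercept res.send so the rendered page is cached on the way out
    const originalSend = res.send.bind(res);
    res.send = body => {
      redis.setex(key, ttlSeconds, body).catch(() => {});
      return originalSend(body);
    };
    next();
  };
}

// Cache blog posts for an hour, in the spirit of the Medium example above
app.get('/posts/:slug', pageCache(3600), renderPostPage);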

Fragment Caching

Cache reusable components like headers, sidebars, or product cards. This is especially powerful for sites with consistent layouts.

What it is: Fragment caching stores individual page components (like headers, footers, sidebars) separately, allowing you to cache parts of pages that are reused across multiple pages.

Where to use: Perfect for sites with consistent layouts where the same components appear on multiple pages. Use for navigation menus, product cards, user avatars, and any reusable UI components. This is especially effective for e-commerce sites and content management systems where the same components appear across many pages.

<%# Rails fragment caching example %>
<% cache @product do %>
  <div class="product-card">
    <h2><%= @product.name %></h2>
    <p><%= @product.description %></p>
  </div>
<% end %>

Object Caching

Cache expensive computations or database query results. This is where you see the biggest performance gains.

What it is: Object caching stores the results of expensive operations (database queries, API calls, complex calculations) in memory for quick retrieval, avoiding the need to repeat the same work.

Where to use: Use for frequently accessed data that’s expensive to compute - user profiles, product details, search results, and any data that’s read more often than it’s written. This is the most impactful caching strategy for database-heavy applications and APIs with high read-to-write ratios.

# Python with Redis
import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def get_user_profile(user_id):
    cache_key = f"user_profile:{user_id}"
    
    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Fallback to database
    profile = database.query_user_profile(user_id)
    
    # Cache for 1 hour
    r.setex(cache_key, 3600, json.dumps(profile))
    return profile

3. Database Caching

Database caching reduces the most expensive operation in your stack - disk I/O. Modern databases have sophisticated caching layers, but you can optimize further.

Query Result Caching

Cache frequently executed queries, especially those with expensive joins or aggregations.

What it is: Query result caching stores the output of database queries in memory, allowing applications to skip expensive database operations for frequently requested data.

Where to use: Ideal for queries that are expensive to execute but requested frequently - user authentication data, product listings, search results, and any query with complex joins or aggregations. Use for read-heavy applications where the same data is requested multiple times within a short period.

-- Example of a query that would benefit from caching
SELECT user_id, COUNT(*) as post_count 
FROM posts 
WHERE user_id = 123 
GROUP BY user_id;

Real example: Stack Overflow caches their question lists for 5 minutes. With millions of page views daily, this saves thousands of database queries per second.

Application-Level Query Caching

Use tools like Redis or Memcached to cache query results:

What it is: Application-level query caching stores database query results in external cache stores (Redis, Memcached) that persist across application restarts and can be shared across multiple application instances.

Where to use: Essential for multi-server applications, microservices architectures, and any system where multiple application instances need to share cached data. Use for session data, user preferences, frequently accessed business objects, and any data that needs to be shared across your application cluster.

// Node.js with Redis
async function getTopPosts() {
  const cacheKey = 'top_posts';
  const cached = await redis.get(cacheKey);

  if (cached) {
    return JSON.parse(cached);
  }

  const posts = await db.query(`
    SELECT p.*, u.username
    FROM posts p
    JOIN users u ON p.user_id = u.id
    ORDER BY p.score DESC
    LIMIT 50
  `);

  // Cache for 10 minutes
  await redis.setex(cacheKey, 600, JSON.stringify(posts));
  return posts;
}

4. Application-Level Caching

This is where you cache business logic results, API responses, and computed values. It’s the most flexible caching layer.

Function Result Caching

Cache expensive function calls using decorators or memoization:

What it is: Function result caching stores the output of expensive function calls in memory, avoiding redundant computation for the same inputs. This is particularly useful for pure functions with deterministic outputs.

Where to use: Perfect for computationally expensive operations - data processing, image resizing, complex calculations, API response formatting, and any function that’s called frequently with the same parameters. Use for functions that are pure (same input always produces same output) and expensive to compute.

from functools import lru_cache, wraps
import redis
import json

redis_client = redis.Redis(host='localhost', port=6379, db=0)

# In-memory caching
@lru_cache(maxsize=128)
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

# Redis-based caching
def cache_result(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        cache_key = f"{func.__name__}:{hash(str(args) + str(kwargs))}"

        # Try cache
        result = redis_client.get(cache_key)
        if result:
            return json.loads(result)

        # Execute and cache
        result = func(*args, **kwargs)
        redis_client.setex(cache_key, 3600, json.dumps(result))
        return result
    return wrapper

@cache_result
def expensive_calculation(user_id, data):
    # Complex business logic goes here; placeholder computation for illustration
    return {"user_id": user_id, "input_size": len(data)}

API Response Caching

Cache external API calls to reduce latency and costs:

What it is: API response caching stores the results of external API calls locally, reducing the need to make repeated requests to third-party services and improving response times.

Where to use: Essential for applications that rely heavily on external APIs - weather apps, social media integrations, payment gateways, and any service that calls external APIs frequently. Use for API responses that don’t change frequently and are expensive to fetch. This is crucial for reducing API costs and improving reliability.

// Cache external API responses
async function getWeatherData(city) {
  const cacheKey = `weather:${city}`;
  
  // Check cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Fetch from API
  const response = await fetch(`https://api.weatherapi.com/v1/current.json?key=${API_KEY}&q=${city}`);
  const data = await response.json();
  
  // Cache for 30 minutes (weather doesn't change that fast)
  await redis.setex(cacheKey, 1800, JSON.stringify(data));
  
  return data;
}

5. Distributed Caching

When your app scales beyond a single server, you need distributed caching. This is where Redis, Memcached, and Hazelcast shine.

Redis Cluster

Redis clusters distribute data across multiple nodes for high availability and horizontal scaling:

What it is: Redis Cluster is a distributed caching solution that automatically shards data across multiple Redis nodes, providing high availability, fault tolerance, and horizontal scalability for large-scale applications.

Where to use: Essential for high-traffic applications that need to scale beyond a single server - social media platforms, e-commerce sites, gaming applications, and any system with millions of users. Use when you need automatic failover, data distribution across multiple servers, and the ability to add cache nodes without downtime.

// Redis cluster configuration
const Redis = require('ioredis');

const cluster = new Redis.Cluster([
  { host: 'redis-node-1', port: 6379 },
  { host: 'redis-node-2', port: 6379 },
  { host: 'redis-node-3', port: 6379 }
]);

// Automatic sharding and failover
await cluster.set('user:123', JSON.stringify(userData));
const cachedUser = JSON.parse(await cluster.get('user:123'));

Memcached

Memcached is simpler than Redis but extremely fast for key-value storage:

What it is: Memcached is a high-performance, distributed memory caching system that stores data in RAM for extremely fast access. It’s designed specifically for caching and doesn’t support persistence or complex data structures like Redis.

Where to use: Perfect for simple caching needs where you only need key-value storage - session storage, database query results, API responses, and any data that can be easily serialized. Use when you need maximum performance and don’t require persistence or complex data operations. It’s particularly effective for read-heavy workloads.

// PHP with Memcached
$memcached = new Memcached();
$memcached->addServer('localhost', 11211);

$cacheKey = 'user_profile_' . $userId;
$profile = $memcached->get($cacheKey);

if (!$profile) {
    $profile = $database->getUserProfile($userId);
    $memcached->set($cacheKey, $profile, 3600); // 1 hour TTL
}

Real-world scaling: Facebook uses Memcached to cache user sessions and frequently accessed data. Their cache hit rate is over 99%, meaning 99% of requests are served from cache instead of hitting the database.


6. Content Delivery Networks (CDNs)

CDNs are geographically distributed cache servers that bring content closer to users. They’re essential for global applications.

How CDNs Work

  1. Edge Servers: Cache content in data centers worldwide
  2. Origin Pull: Fetch from your server on cache miss
  3. Geographic Routing: Route users to nearest edge server
  4. Cache Headers: Respect your cache control headers

What it is: CDNs are geographically distributed networks of servers that cache your content closer to users, reducing latency by serving files from the nearest edge server instead of your origin server.

Where to use: Essential for any website with global users - e-commerce sites, media streaming platforms, SaaS applications, and any site serving static assets to users worldwide. Use for static content like images, CSS, JavaScript, videos, and any files that don’t change frequently. CDNs are crucial for improving user experience in regions far from your origin server.

# CDN-friendly cache headers
Cache-Control: public, max-age=31536000, immutable
Cache-Control: public, max-age=300, s-maxage=86400

CDN Strategies

Static Assets: Cache images, CSS, JS for long periods

Cache-Control: public, max-age=31536000, immutable

Dynamic Content: Cache API responses for shorter periods

Cache-Control: public, max-age=300, s-maxage=86400
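
One detail worth knowing: max-age governs browsers, while s-maxage applies to shared caches like CDN edges and overrides max-age there. A small Express sketch of the dynamic case, assuming an Express app (the route and payload are illustrative):

// Browsers revalidate after 5 minutes; the CDN edge may keep serving for a day
app.get('/api/products', (req, res) => {
  res.set('Cache-Control', 'public, max-age=300, s-maxage=86400');
  res.json({ items: [] }); // placeholder payload
});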

Real example: Netflix uses CDNs to cache video segments. When you start watching a show, the first few minutes are cached at your local CDN edge, reducing buffering time from seconds to milliseconds.


7. CPU Caching (Hardware Level)

Modern CPUs have sophisticated multi-level cache hierarchies that optimize memory access patterns.

Cache Hierarchy

  • L1 Cache: 32-64KB per core, 3-5 cycle latency
  • L2 Cache: 256KB-1MB per core, 10-20 cycle latency
  • L3 Cache: 8-32MB shared, 40-80 cycle latency
  • Main Memory: 8-64GB, 100-300 cycle latency

What it is: CPU cache hierarchy is a multi-level memory system built into processors that stores frequently accessed data in faster, smaller memory layers to minimize the time spent waiting for data from slower main memory.

Where to use: This is automatic and happens at the hardware level, but you can optimize for it by writing cache-friendly code. Use sequential memory access patterns, keep data structures compact, and avoid random memory access when possible. This is particularly important for performance-critical applications like game engines, scientific computing, and high-frequency trading systems.

Cache-Optimized Code

// Cache-friendly array traversal (row-major order)
for (int i = 0; i < rows; i++) {
    for (int j = 0; j < cols; j++) {
        matrix[i][j] = 0; // Sequential memory access
    }
}

// Cache-unfriendly traversal (column-major order over a row-major array)
for (int j = 0; j < cols; j++) {
    for (int i = 0; i < rows; i++) {
        matrix[i][j] = 0; // Strided access: jumps a whole row each step, poor locality
    }
}

Performance impact: Cache-friendly loops like the one above are commonly several times faster than the cache-unfriendly version on large matrices, and the gap widens as the data outgrows the caches.


8. Cache Invalidation Strategies

Cache invalidation is hard. Here are proven strategies to keep your cache fresh.

Time-Based Invalidation (TTL)

Set expiration times based on data volatility:

What it is: Time-based invalidation automatically removes cached items after a specified time period, ensuring that stale data doesn’t persist indefinitely and forcing fresh data to be fetched periodically.

Where to use: Use for data that has a natural expiration - session data (short TTL), product information (medium TTL), and configuration data (long TTL). This is the simplest invalidation strategy and works well for data that changes predictably or when you can tolerate some staleness in exchange for performance.

// Short TTL for volatile data
await redis.setex('user_session:123', 3600, sessionData); // 1 hour

// Long TTL for static data  
await redis.setex('product_catalog', 86400, catalogData); // 24 hours

// Very long TTL for immutable data
await redis.setex('static_config', 31536000, configData); // 1 year

Event-Based Invalidation

Invalidate cache when data changes:

What it is: Event-based invalidation removes cached items immediately when the underlying data changes, ensuring that users always see the most up-to-date information without waiting for TTL expiration.

Where to use: Essential for data that changes infrequently but needs to be immediately updated when it does change - user profiles, product details, configuration settings, and any data where accuracy is more important than performance. Use when you have a reliable way to detect data changes (database triggers, application events, webhooks).

// Invalidate user cache when profile updates
async function updateUserProfile(userId, newData) {
  await database.updateUser(userId, newData);
  
  // Invalidate related caches
  await redis.del(`user_profile:${userId}`);
  await redis.del(`user_posts:${userId}`);
  await redis.del(`user_friends:${userId}`);
}

Version-Based Invalidation

Use version numbers to invalidate entire cache groups:

What it is: Version-based invalidation uses a global version number that, when incremented, invalidates entire categories of cached data, allowing for bulk cache invalidation without tracking individual cache keys.

Where to use: Perfect for scenarios where you need to invalidate large groups of related data - application deployments, configuration changes, or when you want to clear all user-related caches at once. Use when you have many related cache entries and want to avoid the complexity of tracking individual invalidation events.

// Increment version to invalidate all user caches
await redis.set('user_cache_version', Date.now());

// Check version before serving cached data
const cacheVersion = await redis.get('user_cache_version');
const cacheKey = `user:${userId}:v${cacheVersion}`;

9. Cache Eviction Policies

When cache is full, you need smart eviction strategies.

LRU (Least Recently Used)

Evict items that haven’t been accessed recently:

What it is: LRU eviction removes the least recently accessed items from cache when storage is full, keeping the most recently used data available. It assumes that recently accessed data is more likely to be accessed again.

Where to use: Ideal for most caching scenarios where access patterns are time-based - web applications, databases, and any system where recently accessed data is likely to be accessed again soon. This is the most commonly used eviction policy because it works well for typical application access patterns.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()
    
    def get(self, key):
        if key in self.cache:
            # Move to end (most recently used)
            self.cache.move_to_end(key)
            return self.cache[key]
        return -1
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        else:
            if len(self.cache) >= self.capacity:
                # Remove least recently used
                self.cache.popitem(last=False)
        self.cache[key] = value

LFU (Least Frequently Used)

Evict items with lowest access frequency:

What it is: LFU eviction removes items that have been accessed the least number of times, keeping the most frequently accessed data in cache regardless of when it was last accessed.

Where to use: Best for scenarios with stable access patterns where certain items are consistently popular - content delivery networks, video streaming platforms, and any system where popularity is more important than recency. Use when you have items that are accessed many times over a long period.

from collections import defaultdict, OrderedDict

class LFUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = {}                       # key -> (value, frequency)
        self.freq = defaultdict(OrderedDict)  # frequency -> keys, oldest first
        self.min_freq = 0
    
    def _update_frequency(self, key, freq):
        # Move the key into the next-higher frequency bucket
        value, _ = self.cache[key]
        del self.freq[freq][key]
        if not self.freq[freq] and self.min_freq == freq:
            self.min_freq += 1
        self.freq[freq + 1][key] = None
        self.cache[key] = (value, freq + 1)
    
    def get(self, key):
        if key in self.cache:
            value, freq = self.cache[key]
            self._update_frequency(key, freq)
            return value
        return -1
    
    def put(self, key, value):
        if key in self.cache:
            freq = self.cache[key][1]
            self.cache[key] = (value, freq)
            self._update_frequency(key, freq)
            return
        if len(self.cache) >= self.capacity:
            # Evict the least frequently (then least recently) used key
            evicted, _ = self.freq[self.min_freq].popitem(last=False)
            del self.cache[evicted]
        self.cache[key] = (value, 1)
        self.freq[1][key] = None
        self.min_freq = 1

Adaptive Policies

Modern systems use adaptive policies that switch between LRU and LFU based on access patterns.


10. Advanced Caching Patterns

Write-Through Caching

Write to both cache and database simultaneously:

What it is: Write-through caching ensures data consistency by writing to both the cache and the database at the same time, guaranteeing that the cache always contains the most up-to-date data.

Where to use: Essential for applications where data consistency is critical - banking systems, e-commerce platforms, and any system where users must see the most current information. Use when you can’t afford to serve stale data and are willing to accept slightly slower write performance in exchange for consistency.

async function updateUser(userId, userData) {
  // Update both cache and database
  await Promise.all([
    redis.setex(`user:${userId}`, 3600, JSON.stringify(userData)),
    database.updateUser(userId, userData)
  ]);
}

Write-Behind Caching

Write to cache first, then asynchronously to database:

What it is: Write-behind caching writes data to the cache immediately for fast response times, then asynchronously persists it to the database in the background, potentially batching multiple writes for efficiency.

Where to use: Perfect for high-write applications where performance is more important than immediate consistency - logging systems, analytics platforms, social media feeds, and any system with frequent writes that can tolerate eventual consistency. Use when you need maximum write performance and can handle potential data loss if the cache fails before persistence.

async function updateUser(userId, userData) {
  // Update cache immediately
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(userData));
  
  // Queue database update
  await queue.add('updateUser', { userId, userData });
}
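
The queue consumer isn't shown above; assuming a Bull-style queue named queue, the background worker that eventually persists those writes might be sketched like this:

// Background worker: drain queued updates and write them to the database
queue.process('updateUser', async job => {
  const { userId, userData } = job.data;
  await database.updateUser(userId, userData);
});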

Cache-Aside Pattern

Application manages cache explicitly:

What it is: Cache-aside is a pattern where the application code explicitly manages cache operations - checking cache first, fetching from data source on miss, and updating cache with new data.

Where to use: Most common caching pattern for applications that need fine-grained control over cache behavior. Use when you need custom cache logic, want to cache only specific data, or need to implement complex invalidation strategies. This pattern gives you maximum flexibility but requires more application code to manage.

async function getUser(userId) {
  // Try cache first
  let user = await redis.get(`user:${userId}`);
  if (user) {
    return JSON.parse(user);
  }
  
  // Cache miss - fetch from database
  user = await database.getUser(userId);
  
  // Store in cache
  await redis.setex(`user:${userId}`, 3600, JSON.stringify(user));
  
  return user;
}

Read-Through Pattern

Application implements automatic data loading on cache miss:

What it is: Read-through is a caching pattern where the application automatically loads data from the data source when a cache miss occurs, transparently handling the data fetching logic within your application code.

Where to use: Ideal for applications that want to simplify cache management by centralizing data loading logic. Use when you have consistent data loading patterns and want to reduce boilerplate code in your application. This pattern works well with any cache system, though some (like Hazelcast) have native support while others (like Redis) require application-level implementation.

// Application-level read-through implementation with Redis (ioredis)
const Redis = require('ioredis');

const cache = new Redis({
  host: 'localhost',
  port: 6379,
  retryStrategy: () => null,
  lazyConnect: true
});

// Custom read-through implementation
async function getWithReadThrough(key, fetchFunction) {
  const cached = await cache.get(key);
  if (cached) {
    return JSON.parse(cached);
  }

  // Cache miss: load from the data source, then populate the cache
  const data = await fetchFunction();
  await cache.setex(key, 3600, JSON.stringify(data));
  return data;
}
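
Usage then comes down to passing a key and a loader function; database.getUser here is a placeholder for your own data-access call:

// Usage sketch: the caller supplies the cache key and the fallback loader
const profile = await getWithReadThrough(`user:${userId}`, () => database.getUser(userId));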

11. Cache Performance Monitoring

Monitor your cache effectiveness with these metrics:

Hit Rate

// Calculate cache hit rate
const hitRate = cacheHits / (cacheHits + cacheMisses);
console.log(`Cache hit rate: ${(hitRate * 100).toFixed(2)}%`);
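
If your cache is Redis, you can also lean on its built-in counters instead of tracking your own; here's a sketch with ioredis (keyspace_hits and keyspace_misses come from INFO stats):

// Derive the hit rate from Redis's own keyspace counters
const stats = await redis.info('stats');
const hits = Number(stats.match(/keyspace_hits:(\d+)/)[1]);
const misses = Number(stats.match(/keyspace_misses:(\d+)/)[1]);
console.log(`Redis hit rate: ${((hits / (hits + misses)) * 100).toFixed(2)}%`);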

Latency Reduction

// Measure performance improvement
const startTime = Date.now();
const result = await getCachedData(key);
const endTime = Date.now();
console.log(`Response time: ${endTime - startTime}ms`);

Memory Usage

// Monitor Redis memory usage
const info = await redis.info('memory');
const usedMemory = info.match(/used_memory_human:(\S+)/)[1];
console.log(`Redis memory usage: ${usedMemory}`);

12. Common Caching Pitfalls

Cache Stampede

Multiple requests miss cache simultaneously, overwhelming the backend:

// Solution: Request deduplication
const pendingRequests = new Map();

async function getData(key) {
  if (pendingRequests.has(key)) {
    return pendingRequests.get(key);
  }
  
  const promise = fetchDataFromDatabase(key);
  pendingRequests.set(key, promise);
  
  try {
    const result = await promise;
    await cache.set(key, result);
    return result;
  } finally {
    pendingRequests.delete(key);
  }
}

Cache Warming

The pitfall here is the cold cache: after a deploy or restart the cache is empty, so the first wave of requests misses everything and hammers the backend. Pre-populate the cache with frequently accessed data on startup:

// Warm cache on startup
async function warmCache() {
  const popularUsers = await database.getPopularUsers();
  
  for (const user of popularUsers) {
    await redis.setex(`user:${user.id}`, 3600, JSON.stringify(user));
  }
}

Cache Key Collisions

Use unique, descriptive cache keys:

// Good cache keys
const keys = {
  userProfile: `user:${userId}:profile`,
  userPosts: `user:${userId}:posts:${page}`,
  productDetails: `product:${productId}:details`
};

Conclusion

Caching isn’t just about speed - it’s about building resilient, scalable applications. The right caching strategy can transform a sluggish app into a lightning-fast one.

Remember these principles:

  • Cache early, cache often: Implement caching at every layer
  • Measure everything: Monitor hit rates and performance gains
  • Invalidate carefully: Choose the right invalidation strategy
  • Think globally: Use CDNs for worldwide performance
  • Scale horizontally: Distributed caching for high-traffic apps

The best caching strategy is the one you actually implement. Start simple with Redis for session storage, then layer on more sophisticated caching as your app grows. Your users (and your infrastructure costs) will thank you.


Want to dive deeper? Check out our guides on Redis optimization, CDN configuration, and distributed caching patterns.
