@Edinunzio
Last active May 8, 2025 22:00
Profiling & Optimizing S3-backed FileField Access in Django

Context

While reviewing performance on a document API endpoint, I ran a quick cProfile and spotted a serious hotspot: even small queries were clocking over 1.5 seconds, with more than 600k function calls.

Digging in, I found the root cause: the serializer read .size on each FileField, and with the S3 storage backend every such read issues an HTTP HEAD request to S3, one per file.

This behavior is invisible in dev (where files are local), but a quiet killer in production.
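
To make the trap concrete, here is a toy sketch of the delegation involved. It does not use Django or boto3: `RemoteStorage`, `StoredFile`, and `serialize` are hypothetical stand-ins, and the "HEAD request" is only a counter. The point is that an innocent-looking attribute read turns into one backend call per row.

```python
class RemoteStorage:
    """Hypothetical stand-in for an S3-style backend; performs no real network I/O."""

    def __init__(self, object_sizes):
        self._object_sizes = dict(object_sizes)
        self.head_requests = 0

    def size(self, name):
        # In production, this is where the HTTP HEAD round trip would happen.
        self.head_requests += 1
        return self._object_sizes[name]


class StoredFile:
    """Stand-in for the file object attached to a model instance."""

    def __init__(self, name, storage):
        self.name = name
        self._storage = storage

    @property
    def size(self):
        # Reads like a plain attribute, but asks the backend on every access.
        return self._storage.size(self.name)


def serialize(files):
    """Mimics a serializer that reads .size for every row."""
    return [{"name": f.name, "size": f.size} for f in files]


storage = RemoteStorage({f"doc-{i}.pdf": 1024 * (i + 1) for i in range(50)})
files = [StoredFile(f"doc-{i}.pdf", storage) for i in range(50)]
payload = serialize(files)

print(f"{len(payload)} rows serialized, {storage.head_requests} simulated HEAD requests")
```

With a local filesystem backend the same lookup is a cheap stat call, which is why nothing looks wrong in dev.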

Before Optimization

627,887 function calls, 1.556 seconds per request

Major time spent in:

  • FileField.size → S3 HEAD
  • boto3 + botocore
  • DRF to_representation

Solution

Introduced a size_bytes field on the model.

  • Cached file size at upload time (save() override).
  • Backfilled existing rows via a migration.
  • Updated DRF serializers to use size_bytes.

No more .size calls in runtime views. All file size data now lives in the DB.
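
The three steps above can be sketched in a framework-neutral way. This is not the model or migration from the actual project: it uses the standard-library `sqlite3` module in place of the Django ORM, and `CountingStorage`, `save_document`, `backfill_sizes`, and `serialize_documents` are hypothetical names chosen for illustration. It shows the size being recorded on the write path, a one-off backfill for older rows, and a read path that never touches storage.

```python
import sqlite3


class CountingStorage:
    """Hypothetical stand-in for the remote backend; counts size lookups."""

    def __init__(self):
        self.objects = {}
        self.size_calls = 0

    def save(self, name, content):
        self.objects[name] = content

    def size(self, name):
        self.size_calls += 1  # would be a network round trip in production
        return len(self.objects[name])


def save_document(conn, storage, name, content):
    """Write path: store the file and record its size in the same step."""
    storage.save(name, content)
    conn.execute(
        "INSERT INTO document (name, size_bytes) VALUES (?, ?)",
        (name, len(content)),
    )


def backfill_sizes(conn, storage):
    """One-off backfill: ask the backend once for each row missing a size."""
    rows = conn.execute(
        "SELECT id, name FROM document WHERE size_bytes IS NULL"
    ).fetchall()
    for doc_id, name in rows:
        conn.execute(
            "UPDATE document SET size_bytes = ? WHERE id = ?",
            (storage.size(name), doc_id),
        )
    return len(rows)


def serialize_documents(conn):
    """Read path: sizes come straight from the table, never from storage."""
    cursor = conn.execute("SELECT name, size_bytes FROM document ORDER BY id")
    return [{"name": name, "size_bytes": size} for name, size in cursor]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT, size_bytes INTEGER)"
)
storage = CountingStorage()

# Two legacy rows that predate the size_bytes column.
for legacy_name, legacy_content in [("old-a.pdf", b"x" * 10), ("old-b.pdf", b"y" * 20)]:
    storage.save(legacy_name, legacy_content)
    conn.execute("INSERT INTO document (name) VALUES (?)", (legacy_name,))

backfilled = backfill_sizes(conn, storage)
save_document(conn, storage, "new-c.pdf", b"z" * 30)

calls_before_read = storage.size_calls
payload = serialize_documents(conn)
calls_during_read = storage.size_calls - calls_before_read

print(f"backfilled={backfilled}, storage calls during read={calls_during_read}")
```

In a Django project the write-path function would correspond to the save() override and the backfill to a data migration; the backend is still queried once per legacy row, but only once, and outside the request path.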

After Optimization

13,061 function calls, 0.013 seconds per request

No boto3. No S3 calls. No unnecessary I/O.

Performance Comparison

| Metric | Before Optimization | After Optimization | Improvement |
| --- | --- | --- | --- |
| Total execution time | 1.556 seconds | 0.013 seconds | ~99.2% faster |
| Total function calls | 627,887 | 13,061 | ~98% fewer calls |
| Top bottlenecks | .size on S3 FileField, boto3 API calls | Simple DB access only | Eliminated external I/O |
| Key time sinks | boto3, botocore, storages/backends/s3.py:size, to_representation() | view.get(), DRF serializers | Drastically reduced |
| S3 calls | Hundreds of ms per .size call | 0 | Removed entirely |
| Network overhead | High (multiple HTTP requests to S3) | None | Removed |
| Serializer cost | ~1.67s cumulative across multiple fields | <0.005s | ~99.7% faster |
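
For reference, the percentages in the table are plain relative reductions of the measured figures. A short sketch of the arithmetic follows; `percent_reduction` is just an illustrative helper, and the serializer "after" value uses the 0.005 s upper bound from the table, so that row is a lower bound on the improvement.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


time_saved = percent_reduction(1.556, 0.013)          # seconds per request
calls_saved = percent_reduction(627_887, 13_061)      # function calls
serializer_saved = percent_reduction(1.67, 0.005)     # cumulative serializer seconds
speedup = 1.556 / 0.013                               # same result as a ratio

print(f"execution time: {time_saved:.1f}% lower (about {speedup:.0f}x)")
print(f"function calls: {calls_saved:.1f}% fewer")
print(f"serializer cost: at least {serializer_saved:.1f}% lower")
```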

Result

  • ~99.2% faster endpoint
  • ~98% fewer function calls
  • Dramatically lower latency and load
  • Eliminated S3 dependency from API runtime path

Pro-tip

If you're using S3-backed FileFields in Django, accessing .size during serialization is a hidden latency trap. Cache the size at write time: reads then cost a column lookup instead of a network round trip, and the API response no longer depends on S3 being reachable.

Profiler

import cProfile
import pstats
import io
from functools import wraps

def profile_view(func):
    """Add @profile_view to any view you want to profile."""
    @wraps(func)
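    def wrapper(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        try:
            return func(*args, **kwargs)
        finally:
            # Always stop profiling and print the report, even if the view raises.
            pr.disable()
            s = io.StringIO()
            # Sort by cumulative time and show the 100 most expensive entries.
            ps = pstats.Stats(pr, stream=s).sort_stats("cumulative")
            ps.print_stats(100)
            print(s.getvalue())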
    return wrapper
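
If you want the headline numbers (total function calls and total seconds) as values rather than printed text, for example to compare a before and after run, a variant along these lines may help. It is a sketch, not part of the original profiler: `profile_call` and `slow_sum` are illustrative names, and `total_calls` / `total_tt` are the `pstats.Stats` attributes behind the report's summary line.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, total_calls, total_seconds, report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
    stats.print_stats(top)  # only the `top` most expensive entries
    # total_calls and total_tt back the "N function calls in X seconds" line.
    return result, stats.total_calls, stats.total_tt, buffer.getvalue()


def slow_sum(n):
    """Deliberately chatty workload so the profile has something to count."""
    return sum(len(str(i)) for i in range(n))


result, calls, seconds, report = profile_call(slow_sum, 1000)
print(f"{calls} function calls in {seconds:.4f} seconds")
```

Passing a string to `print_stats` (for example `stats.print_stats("boto")`) restricts the report to entries whose file or function name matches that pattern, which is a quick way to confirm whether a given library still shows up in the hot path.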