Syncing User Media and Uploads

Learning Focus

By the end of this module, you will master the synchronization of large data directories (User-Generated Content, Media Libraries, Assets) while ensuring network stability and transfer reliability.

The Character of User Media

Unlike application code, User-Generated Content (UGC) is:

  • Stateful: It lives and grows in production.
  • Large: Often hundreds of gigabytes or even terabytes.
  • Append-Only: Files are usually added but rarely modified after upload.
  • Irreplaceable: Often the most critical data to back up.

/data/app/
├── media/              ← Main user uploads (images, PDFs)
├── assets/
│   ├── avatars/
│   ├── documents/
│   └── optimized/      ← Auto-generated thumbnails/derivatives
└── tmp/                ← Exclude from sync

Basic Sync Strategies

Resumable Progress Sync

For any transfer of large media, always use -P (equivalent to --progress --partial).

# Initial bulk transfer
rsync -avP /usr/share/media/ user@backup:/mnt/storage/media/

  • --progress shows the real-time status of each file.
  • --partial keeps a partially transferred file at the destination if the connection drops, so a later run can reuse it instead of starting over.

Incremental Backups (New Files Only)

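Since media files are rarely modified, you can have rsync skip every file that already exists at the destination, which saves I/O. The check is by name only, so a file that changed at the source, or a partial file left behind by an interrupted -P transfer, will not be updated. Use this only for content that is truly write-once.

# Quickly sync only new additions
rsync -av --ignore-existing /local/media/ user@remote:/remote/media/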
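
Before running an --ignore-existing job for real, it can help to preview which files would be sent. The following is a minimal sketch: preview_new_media is a hypothetical helper name, and the RSYNC variable is an assumption of this example (not an rsync feature) that only makes the binary overridable for testing.

```shell
# Hypothetical helper: list what --ignore-existing would transfer, without copying.
# RSYNC is an illustrative override (defaults to the real rsync binary).
preview_new_media() {
    src=$1
    dest=$2
    # -n (--dry-run) reports the planned transfers but changes nothing.
    "${RSYNC:-rsync}" -avn --ignore-existing "$src" "$dest"
}

# Example (paths are placeholders):
# preview_new_media /local/media/ user@remote:/remote/media/
```

If the preview looks right, run the same command without -n.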

Bandwidth and Network Stability

Syncing large media libraries can saturate server network bandwidth, affecting application performance.

Limiting Transfer Speed

Use --bwlimit to cap rsync's transfer rate. A plain number is read in units of 1024 bytes per second (KiB/s).

# Limit transfer to about 10 MB/s (10240 KiB/s)
rsync -av --bwlimit=10240 /local/media/ user@remote:/remote/media/
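
A scheduled job may want a tighter cap during busy hours and none overnight. The sketch below shows one way to choose the value; bwlimit_for_hour is a hypothetical helper, and the 08:00 to 22:00 peak window and the 10240 KiB/s cap are assumptions picked for illustration.

```shell
# Hypothetical helper: pick a --bwlimit value (KiB/s) from the hour of day.
# The 08:00-22:00 peak window and both limits are assumptions for illustration.
bwlimit_for_hour() {
    hour=${1#0}            # strip one leading zero ("08" becomes "8")
    hour=${hour:-0}        # a bare "0" would otherwise become empty
    if [ "$hour" -ge 8 ] && [ "$hour" -lt 22 ]; then
        echo 10240         # peak hours: about 10 MiB/s
    else
        echo 0             # off-peak: 0 tells rsync not to limit
    fi
}

# Example (paths are placeholders):
# rsync -av --bwlimit="$(bwlimit_for_hour "$(date +%H)")" /local/media/ user@remote:/remote/media/
```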

Handling Interruptions

If a 50GB transfer drops at 40GB, running the same command again (provided you used -P) skips the files that already completed and reuses the partially transferred file rather than sending it again from the start.

# Simply run the interrupted command again
rsync -avP /local/huge-library/ user@remote:/remote/huge-library/
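
On an unreliable link, re-running by hand gets tedious. Because rsync exits with a non-zero status when a transfer fails, the re-run can be automated. The following is a minimal sketch; retry is a hypothetical helper, and the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: re-run a command until it succeeds or attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (paths are placeholders): up to 5 attempts, 30 seconds apart.
# retry 5 30 rsync -avP /local/huge-library/ user@remote:/remote/huge-library/
```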

Optimization Patterns

Syncing by Sub-directory

If the root directory has millions of files, building the file list can take a long time and a lot of memory, and the run may time out or be killed with an Out-Of-Memory (OOM) error. In this case, sync sub-directories individually.

# Manual partitioning
rsync -av /media/2022/ user@remote:/media/2022/
rsync -av /media/2023/ user@remote:/media/2023/
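
Listing every sub-directory by hand does not scale, so the same partitioning can be written as a loop. The following is a minimal sketch: sync_by_subdir is a hypothetical helper, and the RSYNC variable is an assumption of this example that only makes the binary overridable for testing. Files that sit directly in the root directory are not covered by the loop and would need a separate run.

```shell
# Hypothetical helper: sync each top-level sub-directory in its own rsync run.
# RSYNC is an illustrative override (defaults to the real rsync binary).
sync_by_subdir() {
    src=${1%/}      # drop a trailing slash so paths join cleanly
    dest=${2%/}
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue       # skip if the glob matched nothing
        name=$(basename "$dir")
        "${RSYNC:-rsync}" -av "$src/$name/" "$dest/$name/" || return 1
    done
}

# Example (paths are placeholders):
# sync_by_subdir /media user@remote:/media
```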

Excluding Derivatives and Temp Data

Many applications generate thumbnails or optimized versions of original uploads. If these can be regenerated easily, exclude them to save storage.

# Excluding generated assets
rsync -av --exclude='optimized/' --exclude='cache/' --exclude='tmp/' \
    /app/media/ user@backup:/backup/media/
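
As the list of exclusions grows, it may be easier to keep the patterns in a file and pass it with --exclude-from. The sketch below writes the same three patterns to a file; the file name and location are placeholders chosen for this example.

```shell
# Write the exclusion patterns to a file, one per line.
# The file location is a placeholder chosen for this example.
exclude_file="${TMPDIR:-/tmp}/media-excludes.txt"
cat > "$exclude_file" <<'EOF'
optimized/
cache/
tmp/
EOF

# Example (paths are placeholders):
# rsync -av --exclude-from="$exclude_file" /app/media/ user@backup:/backup/media/
```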

Comparison of Common Media Directories

| Platform | Media Directory | Storage Type |
|---|---|---|
| Laravel | storage/app/public/ | Local Filesystem or S3 |
| Django | media/ | User uploads |
| Express/Node | uploads/ | Various |
| WordPress | wp-content/uploads/ | Year/Month organized |
| Nextcloud | data/ | Full user file paths |

Common Pitfalls

| Pitfall | Consequence | Prevention |
|---|---|---|
| No --bwlimit | Production site becomes slow or non-responsive due to network saturation. | Always use bandwidth limits during peak hours. |
| Assuming Compression Helps | Files like .jpg, .mp4, or .zip are already compressed; using -z just wastes CPU. | Skip -z for already compressed media files. |
| Ignoring Disk Space | The destination disk fills up mid-sync, potentially crashing other services. | Check destination space with df -h before starting. |
| Over-Syncing Logs | Syncing /var/log or app logs via this job fills up storage. | Explicitly exclude log files/directories. |
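
The disk-space check can also be scripted so a job refuses to start when space is short. The following is a minimal sketch: has_free_space is a hypothetical helper, and it uses df -Pk rather than df -h because the POSIX output format is easier to parse. It checks a local path; for a remote destination the same df command would have to be run on that host, for example over ssh.

```shell
# Hypothetical helper: succeed only if TARGET has at least REQUIRED_KB free.
has_free_space() {
    target=$1
    required_kb=$2
    # df -Pk prints one line per filesystem in 1024-byte blocks;
    # the fourth column of the data line is the available space.
    available_kb=$(df -Pk "$target" | awk 'NR == 2 { print $4 }')
    [ "$available_kb" -ge "$required_kb" ]
}

# Example (path and size are placeholders): require about 500 GiB free.
# has_free_space /mnt/storage 524288000 || { echo "not enough space" >&2; exit 1; }
```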

What's Next