Syncing User Media and Uploads
By the end of this module, you will master the synchronization of large data directories (User-Generated Content, Media Libraries, Assets) while ensuring network stability and transfer reliability.
The Character of User Media
Unlike application code, User-Generated Content (UGC) is:
- Stateful: It lives and grows in production.
- Large: Often hundreds of gigabytes or even terabytes.
- Append-Only: Files are usually added but rarely modified after upload.
- Irreplaceable: Often the most critical data to back up.
/data/app/
├── media/ ← Main user uploads (images, PDFs)
├── assets/
│ ├── avatars/
│ ├── documents/
│ └── optimized/ ← Auto-generated thumbnails/derivatives
└── tmp/ ← Exclude from sync
Basic Sync Strategies
Resumable Progress Sync
For any transfer of large media, always use -P (equivalent to --progress --partial).
rsync -avP /usr/share/media/ user@backup:/mnt/storage/media/
- --progress shows the real-time transfer status of each file.
- --partial keeps partially transferred files instead of deleting them if the connection drops.
Incremental Backups (New Files Only)
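Since media files are rarely modified, you can skip every file that already exists at the destination to save I/O. Note that --ignore-existing checks only whether a file is present: a file that changed at the source, or one left incomplete at the destination by an interrupted --partial transfer, is skipped as well.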
rsync -av --ignore-existing /local/media/ user@remote:/remote/media/
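Before running a new-files-only sync for real, it can help to preview which files it would copy. The sketch below wraps a dry run (-n) in a small helper. The function name preview_new_files is made up for this example, and --out-format='%n' is used only to print one path per line.

```shell
# preview_new_files SRC_DIR DEST
# Lists what a --ignore-existing sync would copy, without transferring anything.
# (Hypothetical helper name; the rsync flags themselves are standard.)
preview_new_files() {
  local src=${1%/} dest=${2%/}
  # -n (dry run) only reports actions; %n prints the path of each item.
  rsync -an --ignore-existing --out-format='%n' "$src/" "$dest/"
}
```

For example, `preview_new_files /local/media user@remote:/remote/media` prints the candidate files and leaves the destination untouched.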
Bandwidth and Network Stability
Syncing large media libraries can saturate server network bandwidth, affecting application performance.
Limiting Transfer Speed
Use --bwlimit to cap rsync's transfer rate. By default the value is read as KiB per second, so 10240 limits the transfer to roughly 10 MiB/s.
rsync -av --bwlimit=10240 /local/media/ user@remote:/remote/media/
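Link speeds are usually quoted in megabits per second, while --bwlimit defaults to KiB per second, so a quick conversion avoids setting a limit that is several times too high. A minimal sketch, assuming decimal megabits (1 Mbit = 1,000,000 bits); the helper name mbit_to_bwlimit is hypothetical:

```shell
# mbit_to_bwlimit MBIT_PER_SECOND
# Prints the equivalent --bwlimit value in KiB/s, rounded down.
mbit_to_bwlimit() {
  # bits -> bytes (/8) -> KiB (/1024)
  echo $(( $1 * 1000000 / 8 / 1024 ))
}

# Example: cap the sync at 80 Mbit/s of a shared link.
# rsync -av --bwlimit="$(mbit_to_bwlimit 80)" /local/media/ user@remote:/remote/media/
```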
Handling Interruptions
If a 50 GB transfer drops at 40 GB, running the same command again (provided you used -P) skips the files that already arrived and reuses the kept partial file instead of restarting it from zero.
# Simply run the interrupted command again
rsync -avP /local/huge-library/ user@remote:/remote/huge-library/
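On an unreliable link, the rerun can be automated. The loop below is a sketch and not a feature of rsync: retry_sync is a hypothetical helper that repeats any command until it exits with status 0 or the attempt budget runs out.

```shell
# retry_sync MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry_sync() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}
```

A call such as `retry_sync 5 30 rsync -avP /local/huge-library/ user@remote:/remote/huge-library/` would try up to five times with a 30-second pause between attempts. Because -P keeps partial files, each retry builds on what the previous attempt transferred.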
Optimization Patterns
Syncing by Sub-directory
If the root directory has millions of files, rsync's file-list building phase can take a long time and consume a lot of memory, in the worst case ending in a timeout or an Out-Of-Memory (OOM) error. In this case, sync sub-directories individually.
rsync -av /media/2022/ user@remote:/media/2022/
rsync -av /media/2023/ user@remote:/media/2023/
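When there are many sub-directories, listing each command by hand gets tedious. The following sketch loops over the first level of a source root and runs one rsync per sub-directory. The function name sync_subdirs and the RSYNC override variable are assumptions made for this example (the override exists only so the loop can be exercised with a stand-in command).

```shell
# sync_subdirs SRC_ROOT DEST_ROOT
# Runs one rsync per first-level sub-directory of SRC_ROOT.
sync_subdirs() {
  local src_root=${1%/} dest_root=${2%/}
  local rsync_cmd=${RSYNC:-rsync}   # hypothetical override hook
  local dir name
  for dir in "$src_root"/*/; do
    [ -d "$dir" ] || continue       # glob matched nothing: no sub-directories
    name=$(basename "$dir")
    # Stop at the first failure so it can be investigated or retried.
    "$rsync_cmd" -av "$src_root/$name/" "$dest_root/$name/" || return 1
  done
}
```

Calling `sync_subdirs /media user@remote:/media` would issue the same two commands shown above for 2022 and 2023, plus one for any other sub-directory. Files that sit directly in the source root are not covered by this loop.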
Excluding Derivatives and Temp Data
Many applications generate thumbnails or optimized versions of original uploads. If these can be regenerated easily, exclude them to save storage.
rsync -av --exclude='optimized/' --exclude='cache/' --exclude='tmp/' \
/app/media/ user@backup:/backup/media/
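As the exclusion list grows, it may be easier to keep the patterns in a file and pass it with --exclude-from. A sketch, assuming a scratch location for the pattern file; the file name and the sync_media helper are placeholders for this example:

```shell
# Write the exclusion patterns to a file, one per line.
# Lines starting with '#' are treated as comments by rsync.
exclude_file="${TMPDIR:-/tmp}/media-excludes.txt"
cat > "$exclude_file" <<'EOF'
# Regenerable derivatives and scratch data
optimized/
cache/
tmp/
EOF

# sync_media SRC DEST (hypothetical helper)
sync_media() {
  rsync -av --exclude-from="$exclude_file" "$1" "$2"
}
```

`sync_media /app/media/ user@backup:/backup/media/` would then behave like the command above, with the patterns maintained in one place.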
Comparison of Common Media Directories
| Platform | Media Directory | Notes |
|---|---|---|
| Laravel | storage/app/public/ | Local Filesystem or S3 |
| Django | media/ | User uploads |
| Express/Node | uploads/ | Various |
| WordPress | wp-content/uploads/ | Year/Month organized |
| Nextcloud | data/ | Full user file paths |
Common Pitfalls
| Pitfall | Consequence | Prevention |
|---|---|---|
| No --bwlimit | Production site becomes slow or non-responsive due to network saturation. | Always use bandwidth limits during peak hours. |
| Assuming Compression Helps | Files like .jpg, .mp4, or .zip are already compressed; using -z just wastes CPU. | Skip -z for already compressed media files. |
| Ignoring Disk Space | The destination disk fills up mid-sync, potentially crashing other services. | Check destination space with df -h before starting. |
| Over-Syncing Logs | Syncing /var/log or app logs via this job fills up storage. | Explicitly exclude log files/directories. |
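The disk-space check from the table can be scripted as a pre-flight step. This is a rough sketch that assumes the destination is a locally mounted path (for a remote host, the df part would have to run there, for example over ssh). It compares the full size of the source with the free space, so it is a worst-case estimate that ignores files already present at the destination. The name has_room is hypothetical.

```shell
# has_room SRC_DIR DEST_DIR
# Succeeds if DEST_DIR's filesystem has more free space than SRC_DIR occupies.
has_room() {
  local src=$1 dest=$2
  local need_kb free_kb
  need_kb=$(du -sk "$src" | awk '{print $1}')            # source size in KiB
  free_kb=$(df -Pk "$dest" | awk 'NR == 2 {print $4}')   # "Available" column
  echo "need ${need_kb} KiB, free ${free_kb} KiB"
  [ "$free_kb" -gt "$need_kb" ]
}

# Example: only start the sync when the check passes.
# has_room /app/media /mnt/storage/media && rsync -av /app/media/ /mnt/storage/media/
```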
What's Next
- Compression and Bandwidth — Advanced networking and speed optimization for rsync.
- Backup Strategies — Putting it all together into a multi-tier backup system.
- Performance Tuning — Optimizing I/O for high-volume transfers.