Workflow

Transferring Millions of Small Files Without the Wait

You have a 10 Gbps connection. Your files total 12 TB. Quick math says ~3 hours. But when those 12 TB are spread across a million files, it takes a day. Here's what's happening and how to fix it.
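The arithmetic behind those two figures can be sketched in a few lines. The 75 ms per-file overhead below is an illustrative assumption chosen to show how a day-long transfer happens, not a measured value:

```python
# Ideal transfer time: pure bandwidth, no per-file overhead.
total_bytes = 12e12          # 12 TB payload
link_bps = 10e9              # 10 Gbps link

ideal_seconds = total_bytes * 8 / link_bps
print(f"ideal: {ideal_seconds / 3600:.1f} hours")        # ideal: 2.7 hours

# Same payload split into a million files, with an assumed
# per-file overhead (metadata exchange, acks) of 75 ms each.
n_files = 1_000_000
per_file_overhead_s = 0.075  # illustrative, not measured

total_seconds = ideal_seconds + n_files * per_file_overhead_s
print(f"with overhead: {total_seconds / 3600:.1f} hours")  # with overhead: 23.5 hours
```

The payload math never changes; only the per-file tax does, and it scales with file count rather than file size.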

The Experience Everyone Recognizes

You start a transfer of a large folder — say, a DPX sequence from a VFX render, or an ML training dataset, or a game build going to QA. The total size looks manageable. You kick it off and... it's slow. Much slower than you'd expect from the file sizes.

The progress bar crawls. Your network monitor shows the connection is barely being used. It's not a bandwidth problem — your link has plenty of capacity. The tool is spending most of its time doing something other than sending data.

This is the millions-of-files problem. The more files in a transfer, the more overhead accumulates per file — metadata exchange, status checks, path negotiation, acknowledgments. Each one adds a small delay. Multiply a small delay by a million files and you get hours of waiting that have nothing to do with your actual data.

Who Runs Into This

VFX and Post-Production

Frame sequences are the core of VFX pipelines. A 2K DPX frame is ~12 MB. A 4K EXR with AOV layers (beauty, depth, normals, cryptomatte) can be 200-500 MB per frame. A 90-minute project at 24fps is 129,600 frames — and that's one version of one pass. Include multiple takes, render iterations, and department handoffs, and the file count climbs into the millions.
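The frame arithmetic is quick to verify; the pass and version counts below are illustrative placeholders, not figures from any real pipeline:

```python
# Frames in a feature-length sequence.
minutes, fps = 90, 24
frames = minutes * 60 * fps
print(frames)  # 129600 -- one version of one pass

# Multiple passes, versions, and takes multiply that count.
passes, versions = 8, 5  # assumed for illustration
print(frames * passes * versions)  # 5184000 -- into the millions
```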

ML and AI Training Data

Image datasets like ImageNet contain 14 million files at a few hundred KB each. Medical imaging datasets, satellite tile collections, and synthetic training data follow the same pattern. Moving these between training clusters is painfully slow — not because the files are large, but because there are so many of them.

Game Development

A modern game build contains hundreds of thousands of files — textures, meshes, shaders, configs, localization strings, audio clips. Distributing a build to QA, external partners, or CI/CD pipelines means transferring all of them with directory structure intact.

Genomics and Research

Sequencing and analysis pipelines produce millions of per-chromosome, per-sample, or per-region intermediate files that need to move between processing nodes.

Why “Just Zip It” Doesn't Work

The common workaround is to archive everything into one big file first. This helps with the per-file overhead, but creates new problems:

  • You need double the disk space — the original files plus the archive. With a 12 TB sequence, that's 12 TB of free space just for the zip/tar.
  • Archiving takes hours — and so does extracting on the other end. You've traded transfer overhead for archiving overhead.
  • If the transfer fails, you start over — there's no resuming midway through a 12 TB archive.
  • Changed 50 files out of a million? You're re-archiving and re-sending the whole thing.

How Handrive Handles It

Handrive's protocol is designed around this problem. Instead of processing each file as a separate operation with its own overhead, it opens a single encrypted session between two devices and streams files continuously through it. The per-file overhead effectively disappears.

The practical result: your transfer runs at whatever speed your disk or network can sustain — whichever is slower. The file count doesn't change the throughput. Whether you're sending 10 files or 10 million files, the protocol overhead is the same (nearly zero).
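That "whichever is slower" rule can be stated directly. The throughput figures here are rough, illustrative numbers:

```python
def effective_throughput(disk_mb_s: float, net_mb_s: float) -> float:
    """With per-file protocol overhead removed, the slower leg sets the pace."""
    return min(disk_mb_s, net_mb_s)

link = 1250  # 10 Gbps is ~1250 MB/s

print(effective_throughput(3000, link))  # NVMe SSD: link-bound at 1250 MB/s
print(effective_throughput(160, link))   # HDD random reads: disk-bound at 160 MB/s
```

Note that file count appears nowhere in the formula; it only matters when a per-file overhead term gets added back in.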

What this looks like in practice

1 million DPX frames (12 TB total) on a 10 Gbps link with NVMe SSDs on both ends: the transfer completes in roughly the time it takes to push 12 TB — around 3 hours. Not 3 hours plus a day of overhead.

14 million training images (2 TB total) between two servers on the same campus: limited by disk I/O, not by file count. An NVMe drive reading at 100K+ IOPS moves through the file list in minutes.
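The "minutes" claim follows from IOPS arithmetic, treating each small file as roughly one random read (an approximation that ignores metadata operations):

```python
n_files = 14_000_000
nvme_iops = 100_000  # order-of-magnitude NVMe random-read rate

seconds = n_files / nvme_iops
print(f"{seconds / 60:.1f} minutes")  # 2.3 minutes of read operations

# The same file list on a spinning disk at ~150 random IOPS:
hdd_iops = 150
print(f"{n_files / hdd_iops / 3600:.1f} hours")  # 25.9 hours
```

The same dataset, the same protocol, and a two-orders-of-magnitude difference from storage alone; this is why the hardware notes later in this piece matter.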

Resume That Actually Works

Fast initial transfer is one thing. But what happens when a million-file transfer gets interrupted at 80%? This is where most tools fall apart a second time.

With rsync, resuming means re-walking the entire directory tree on both sides and comparing size and modification time for every file (or re-reading every byte, with --checksum) to figure out what's already done. For a million files, that scan alone can take hours. Many cloud upload tools have no resume at all — a dropped session means starting from scratch.

Handrive tracks state at both the file level and the byte level:

  • Completed files are skipped in seconds — no re-reading, no re-checksumming. If you were 800,000 files into a million-file transfer, resuming skips those 800,000 almost instantly and picks up with the remaining 200,000.
  • Partial files resume where they left off — if a 50 GB file was 30 GB in when the connection dropped, it picks up at byte 30 GB. This matters when your transfer mixes millions of small files with some large ones.
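A minimal sketch of that two-level bookkeeping, assuming a JSON state file on disk. This is a hypothetical illustration of the idea, not Handrive's actual implementation:

```python
import json
from pathlib import Path

class TransferState:
    """Two-level resume state (illustrative sketch, not Handrive's real format).

    done:    completed relative paths -- skipped instantly on resume
    partial: relative path -> bytes already written -- resume at that offset
    """

    def __init__(self, state_file: str = ".transfer_state.json"):
        self.state_file = Path(state_file)
        self.done: set = set()
        self.partial: dict = {}

    def resume_offset(self, rel_path: str):
        """Return None if the file is complete, else the byte offset to resume from."""
        if rel_path in self.done:
            return None
        return self.partial.get(rel_path, 0)

    def record_progress(self, rel_path: str, bytes_written: int) -> None:
        self.partial[rel_path] = bytes_written

    def record_complete(self, rel_path: str) -> None:
        self.partial.pop(rel_path, None)
        self.done.add(rel_path)

    def save(self) -> None:
        self.state_file.write_text(
            json.dumps({"done": sorted(self.done), "partial": self.partial}))
```

Skipping a completed file is a single set lookup, so checking 800,000 finished files takes seconds; there is no need to re-read or re-checksum their contents.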

A network hiccup, a closed laptop, a power outage — restart the transfer and you're back to full speed in seconds.

The Disk Is Your Bottleneck (and That's Good)

Once protocol overhead is removed, the limiting factor is your storage hardware. A few practical notes:

  • NVMe SSDs are ideal — 100K+ random IOPS means millions of small files read/write smoothly.
  • SATA SSDs are adequate for most workflows.
  • Spinning hard drives will be the bottleneck. HDDs struggle with the random I/O pattern that millions of small files create. If speed matters, SSD storage on at least one end helps significantly.
  • NAS with SSD cache handles this better than pure HDD arrays, but adds some per-operation latency.

The point isn't that every setup will be fast — it's that the transfer protocol is no longer the thing slowing you down. Whatever throughput your hardware can sustain, that's what you get.

Why This Happens (Technical Aside)

For those curious about the underlying cause: most file transfer tools process files one at a time, with per-file overhead at the protocol level. Even with connection reuse (so there's no new TCP handshake per file), each file still requires metadata exchange, path negotiation, acknowledgment round-trips, and status checks. Even a modest 5-10 ms of per-file overhead, multiplied across a million files, adds up to hours — and on a link with 30 ms of round-trip latency, a single acknowledgment round trip per file costs more than eight hours on its own.
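The accumulation is easy to quantify using the latency and overhead figures mentioned above:

```python
n_files = 1_000_000

# One acknowledgment round trip per file on a 30 ms link:
rtt_s = 0.030
print(f"{n_files * rtt_s / 3600:.1f} hours")  # 8.3 hours of pure waiting

# Even a conservative 5-10 ms of per-file overhead:
for overhead_ms in (5, 10):
    hours = n_files * overhead_ms / 1000 / 3600
    print(f"{overhead_ms} ms/file -> {hours:.1f} hours")
    # 5 ms/file -> 1.4 hours
    # 10 ms/file -> 2.8 hours
```

None of this time moves a single byte of payload, which is why the connection looks idle on a network monitor while the transfer crawls.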

Handrive's UDP-based protocol sidesteps this by treating a multi-file transfer as a single continuous stream rather than a sequence of individual file operations. The session is established once, and files flow through it without per-file negotiation. For a deeper dive on why UDP matters for large transfers, see Why TCP Fails for AI Data Transfer.


Transfer millions of files at disk speed

Free P2P transfer. No per-file overhead. Instant resume. Your SSD is the bottleneck, not the protocol.