Understanding Binary Patch Diffing

Learn the fundamentals of binary patch diffing with step-by-step examples, tools, and practical Python implementations for reverse engineering and security analysis.

Binary Analysis perfecXion.ai Team April 7, 2025 12 min read

Binary patch diffing is the process of comparing two binary files (e.g., executables, libraries, or firmware) to identify differences and generate a compact "patch" file that can be applied to the original binary to produce the modified version. Unlike text-based diffing (which works well for source code), binary diffing requires specialized algorithms to handle non-line-based data efficiently, often focusing on delta compression to minimize patch size.

This technique is commonly used in software updates (e.g., to distribute small patches instead of full binaries), reverse engineering (e.g., analyzing security patches), and version control for binary assets.

Key Differences from Text Diffing

  • Text diffs (like diff -u) rely on line-by-line comparisons and context.
  • Binary diffs use byte-level delta encoding, suffix sorting, or other optimizations to create smaller patches, as binaries don't have natural "lines."
  • Patches are not always human-readable; they're designed for machine application.
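To make the "no natural lines" point concrete, the following minimal sketch compares a position-by-position check with content matching from Python's standard difflib.SequenceMatcher. It is not taken from any of the tools discussed here; the sample data and the helper name positional_mismatches are illustrative assumptions.

```python
from difflib import SequenceMatcher

def positional_mismatches(a: bytes, b: bytes) -> int:
    """Count offsets where the inputs differ, comparing byte i with byte i."""
    common = min(len(a), len(b))
    differing = sum(1 for i in range(common) if a[i] != b[i])
    return differing + abs(len(a) - len(b))

old = bytes(range(64))   # 64 distinct byte values
new = b"\xff" + old      # same content with one byte inserted at the front

# Every later byte has shifted by one, so a positional check flags all of them.
print(positional_mismatches(old, new))  # 65

# Content matching reports one inserted byte and treats the rest as unchanged.
matcher = SequenceMatcher(None, old, new, autojunk=False)
ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(ops)  # [('insert', 0, 0, 0, 1)]
```

A single inserted byte makes a naive comparison report the whole file as changed, which is why binary diff tools search for matching content rather than matching positions.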

Popular Binary Diffing Tools

bsdiff/bspatch

Efficient for general binary patches, originally developed by Colin Percival. Uses suffix sorting and delta encoding to produce small patches.

xdelta

Similar functionality with support for VCDIFF format (RFC 3284), often used in backups and version control systems.
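VCDIFF describes a target file as a sequence of COPY, ADD, and RUN instructions. The sketch below illustrates that idea only: it does not read or write the real VCDIFF byte encoding, it copies from the source file only, and the tuple layout and the name apply_instructions are assumptions made for this example.

```python
def apply_instructions(source: bytes, instructions) -> bytes:
    """Rebuild a target from ("COPY", offset, size), ("ADD", data), ("RUN", byte, count)."""
    out = bytearray()
    for inst in instructions:
        kind = inst[0]
        if kind == "COPY":            # reuse bytes that already exist in the source
            _, offset, size = inst
            out += source[offset:offset + size]
        elif kind == "ADD":           # literal bytes carried inside the delta
            out += inst[1]
        elif kind == "RUN":           # one byte value repeated count times
            _, value, count = inst
            out += bytes([value]) * count
        else:
            raise ValueError(f"unknown instruction: {kind!r}")
    return bytes(out)

source = b"This program cannot be run in DOS mode"
delta = [
    ("COPY", 0, 13),        # "This program "
    ("ADD", b"must not"),   # new text not present in the source
    ("COPY", 22, 16),       # " run in DOS mode"
    ("RUN", 0, 4),          # four null bytes of padding
]
print(apply_instructions(source, delta))
```

Only the ADD payload and a few small integers need to be stored, which is where the size savings of delta encoding come from.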

BinDiff/Diaphora

Geared toward reverse-engineering analysis rather than patch distribution. They work alongside disassemblers such as IDA Pro or Ghidra and highlight structural changes, such as functions that were added, removed, or modified.

rdiff

Part of librsync. It computes a delta from a compact signature of the old file, so the two versions do not need to be on the same machine, which suits large files such as disk images.

Step-by-Step Example with bsdiff

Below is a detailed, step-by-step example using bsdiff and bspatch, standard tools for creating and applying binary patches. This assumes you're on a Unix-like system (Linux/macOS) with bsdiff installed.

Installation: Install via sudo apt install bsdiff on Ubuntu, brew install bsdiff on macOS, or compile from source.

1 Prepare the Original Binary File

Create a simple binary file representing a small program or data blob. For this example, we'll use hexadecimal data to simulate two versions of a binary.

00000000: 4d5a 9000 0300 0000 0400 0000 ffff 0000  MZ..............
00000010: b800 0000 0000 0000 4000 0000 0000 0000  ........@.......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 8000 0000 0e1f ba0e  ................
00000040: 00b4 09cd 21b8 014c cd21 5468 6973 2070  ....!..L.!This p
00000050: 726f 6772 616d 2063 616e 6e6f 7420 6265  rogram cannot be
00000060: 2072 756e 2069 6e20 444f 5320 6d6f 6465  run in DOS mode
00000070: 2e0d 0d0a 2400 0000 0000 0000            ....$.......

This data is modeled on the MS-DOS header and stub message that begin a Windows PE executable (EXE). Save it as original.bin using Python:

# Rebuild the 124 bytes shown in the hex dump above, one dump row per string
data = bytes.fromhex(
    "4d5a 9000 0300 0000 0400 0000 ffff 0000"
    "b800 0000 0000 0000 4000 0000 0000 0000"
    "0000 0000 0000 0000 0000 0000 0000 0000"
    "0000 0000 0000 0000 8000 0000 0e1f ba0e"
    "00b4 09cd 21b8 014c cd21 5468 6973 2070"
    "726f 6772 616d 2063 616e 6e6f 7420 6265"
    "2072 756e 2069 6e20 444f 5320 6d6f 6465"
    "2e0d 0d0a 2400 0000 0000 0000"
)
with open('original.bin', 'wb') as f:
    f.write(data)

2 Create the Modified Binary File

Create modified.bin with changes. We'll alter the message from "cannot be run" to "must not run" and add some padding.

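Changes: In the hex dump above, "cannot be run" occupies offsets 0x57-0x63 (13 bytes). Replacing it with the 12-byte "must not run" shifts every later byte down by one, and null bytes are then appended at the end for padding.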
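The article does not show how modified.bin is produced, so here is one possible sketch. It assumes original.bin from step 1 exists, and it assumes four null bytes of padding; the function names make_modified and write_modified are hypothetical.

```python
def make_modified(original: bytes, padding: int = 4) -> bytes:
    """Return a copy with the DOS-stub message changed and null padding appended."""
    old_text = b"cannot be run"
    new_text = b"must not run"
    # Refuse to guess if the message is missing or appears more than once.
    if original.count(old_text) != 1:
        raise ValueError("expected exactly one occurrence of the message")
    return original.replace(old_text, new_text) + b"\x00" * padding

def write_modified(src: str = "original.bin", dst: str = "modified.bin") -> None:
    """Read src, apply the edit, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(make_modified(data))
```

Calling write_modified() in the directory that holds original.bin creates modified.bin. Because the new message is one byte shorter than the old one, the result is three bytes longer than the original with the default padding.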

3 Generate the Binary Patch

Use bsdiff to create the patch file:

bsdiff original.bin modified.bin patch.bsdiff

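bsdiff builds a suffix array over the original file to find matching regions efficiently, then encodes the differences with bzip2 compression. For inputs this small, fixed overhead dominates: the classic bsdiff 4.x patch format starts with a 32-byte header and wraps its data in separate bzip2 streams, so the patch might be ~100-200 bytes, which is no smaller than the ~120-byte files themselves. The savings only appear on larger binaries.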
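The suffix-array idea can be sketched in a few lines. This is a deliberately naive illustration, not bsdiff's implementation: real tools use a much faster suffix-sorting routine and also tolerate approximate matches. The function names are assumptions made for this example.

```python
from __future__ import annotations

def build_suffix_array(data: bytes) -> list[int]:
    """Naive suffix array: start offsets of all suffixes, in sorted order."""
    return sorted(range(len(data)), key=lambda i: data[i:])

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def longest_match(old: bytes, sa: list[int], target: bytes) -> tuple[int, int]:
    """Return (offset_in_old, length) of the longest prefix of target found in old."""
    lo, hi = 0, len(sa)
    while lo < hi:                    # binary search for target's sorted position
        mid = (lo + hi) // 2
        if old[sa[mid]:] < target:
            lo = mid + 1
        else:
            hi = mid
    best = (0, 0)
    for idx in (lo - 1, lo):          # the best match is a sorted neighbour
        if 0 <= idx < len(sa):
            length = common_prefix_len(old[sa[idx]:], target)
            if length > best[1]:
                best = (sa[idx], length)
    return best

old = b"This program cannot be run in DOS mode"
sa = build_suffix_array(old)
print(longest_match(old, sa, b"run in DOS mode!"))  # (23, 15)
```

Once the array is built, each lookup needs only a logarithmic number of comparisons, which is what makes searching a large old file for reusable content practical.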

4 Apply the Binary Patch

Transform original.bin into the modified version using bspatch:

bspatch original.bin new.bin patch.bsdiff

This creates new.bin, which should be identical to modified.bin.

# Verification
cmp modified.bin new.bin  # Should show no differences
sha256sum modified.bin new.bin  # Compare checksums

Python Script for Binary Patch Diffing

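Here's a pure-Python script that demonstrates the two halves of the process separately: binary_diff renders a human-readable unified diff of the hex encoding, and apply_patch applies a hand-written list of deltas. It is far less efficient than specialized tools like bsdiff and does not produce a compact patch file: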

import difflib
import binascii

# Function to split hex string into lines for diff readability
def split_into_lines(data, chars_per_line=32):
    return [data[i:i+chars_per_line] for i in range(0, len(data), chars_per_line)]

# Function to generate a unified diff of two binaries (in hex for readability)
def binary_diff(original_bytes, modified_bytes):
    orig_hex = binascii.hexlify(original_bytes).decode('ascii').upper()
    mod_hex = binascii.hexlify(modified_bytes).decode('ascii').upper()
    
    orig_lines = split_into_lines(orig_hex)
    mod_lines = split_into_lines(mod_hex)
    
    diff = difflib.unified_diff(
        orig_lines, mod_lines,
        fromfile='original.bin (hex)',
        tofile='modified.bin (hex)',
        lineterm=''
    )
    return '\n'.join(diff)

# Simple patching function: applies a list of (offset, remove_len, add_bytes) deltas.
# Offsets refer to the original file; deltas must be sorted by offset and must not overlap.
def apply_patch(original_bytes, deltas):
    result = bytearray(original_bytes)
    offset_adjust = 0
    for offset, remove_len, add_bytes in deltas:
        adj_offset = offset + offset_adjust
        result = result[:adj_offset] + add_bytes + result[adj_offset + remove_len:]
        offset_adjust += len(add_bytes) - remove_len
    return bytes(result)

# Example usage
if __name__ == "__main__":
    # Sample original and modified binaries
    original = b'\x4d\x5a\x90\x00...'  # (truncated for brevity)
    modified = b'\x4d\x5a\x90\x00...'  # (truncated for brevity)
    
    # Generate and print diff
    diff_output = binary_diff(original, modified)
    print("Unified Diff (Hex Representation):\n")
    print(diff_output)
    
    # Example deltas for patching (illustrative and truncated; real offsets and
    # lengths must be computed from the actual files)
    deltas = [
        (85, 18, b'\x6d\x75\x73\x74\x20\x6e\x6f\x74\x20\x72\x75\x6e...'),
        (112, 0, b'\x00\x00\x00\x00')  # Added padding
    ]
    
    # Apply patch and verify
    patched = apply_patch(original, deltas)
    assert patched == modified, "Patch failed!"
    print("\nPatch applied successfully!")

Note: This script is educational. difflib is useful for visualizing changes, but it is slow on large binaries and is not designed to produce compact patches. For production, use a specialized tool such as xdelta, or implement an algorithm like the one in bsdiff.
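The script above hard-codes its deltas. As a sketch of how they could be derived instead, the block below uses difflib.SequenceMatcher directly on the bytes and emits the same (offset, remove_len, add_bytes) tuples that apply_patch expects. The names compute_deltas and apply_deltas are hypothetical; apply_deltas restates the patching logic so the block runs on its own, and the sample bytes are made up.

```python
from difflib import SequenceMatcher

def compute_deltas(original: bytes, modified: bytes):
    """Build (offset, remove_len, add_bytes) tuples that turn original into modified."""
    matcher = SequenceMatcher(None, original, modified, autojunk=False)
    deltas = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":            # replace, delete, and insert all fit one tuple shape
            deltas.append((i1, i2 - i1, modified[j1:j2]))
    return deltas

def apply_deltas(original: bytes, deltas) -> bytes:
    """Apply deltas in ascending offset order, copying unchanged bytes between them."""
    out = bytearray()
    cursor = 0
    for offset, remove_len, add_bytes in deltas:
        out += original[cursor:offset]
        out += add_bytes
        cursor = offset + remove_len
    out += original[cursor:]
    return bytes(out)

original = b"This program cannot be run in DOS mode.\r\r\n$\x00\x00"
modified = b"This program must not run in DOS mode.\r\r\n$" + b"\x00" * 6

deltas = compute_deltas(original, modified)
assert apply_deltas(original, deltas) == modified
print(deltas)
```

SequenceMatcher can take quadratic time in the worst case, so this approach is only reasonable for small inputs.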

Reverse Engineering with Binary Patch Diffing

Reverse engineering often involves analyzing differences between two versions to understand changes, such as bug fixes, added features, or security patches. Binary patch diffing highlights modified code sections without requiring full disassembly of both files.

Analysis Steps

  • 1 Obtain both binary versions (vulnerable and patched)
  • 2 Generate a diff using tools or scripts
  • 3 Review the hunks for meaningful changes
  • 4 Disassemble the changed bytes to see which instructions differ
  • 5 Test patches and derive exploits
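Steps 2 and 3 can be sketched for the simplest case, an in-place patch where no code has shifted. The block below lists the byte ranges that differ at the same offset and prints them in hex. The function names, the merge_gap parameter, and the sample bytes are all assumptions made for illustration; binaries whose layout has moved need a matching-based tool instead.

```python
def diff_hunks(old: bytes, new: bytes, merge_gap: int = 4):
    """Return (start, end) ranges where old and new differ at the same offset.

    Ranges separated by fewer than merge_gap unchanged bytes are merged.
    """
    length = min(len(old), len(new))
    hunks = []
    for i in range(length):
        if old[i] == new[i]:
            continue
        if hunks and i - hunks[-1][1] < merge_gap:
            hunks[-1][1] = i + 1      # extend the previous hunk
        else:
            hunks.append([i, i + 1])  # start a new hunk
    if len(old) != len(new):          # report a size change as a trailing hunk
        hunks.append([length, max(len(old), len(new))])
    return [tuple(h) for h in hunks]

def format_hunk(old: bytes, new: bytes, start: int, end: int) -> str:
    """Render one hunk as its offset plus the old and new bytes in hex."""
    return f"0x{start:08x}: {old[start:end].hex(' ')} -> {new[start:end].hex(' ')}"

# Made-up sample: two single-byte changes twelve bytes apart.
old = bytes.fromhex("74 05 90 90 00 00 00 00 00 00 00 00 c3")
new = bytes.fromhex("eb 05 90 90 00 00 00 00 00 00 00 00 cc")
for start, end in diff_hunks(old, new):
    print(format_hunk(old, new, start, end))
```

The reported offsets are the places to point a disassembler at in step 4.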

What Diffs Reveal

  • Patched Vulnerabilities: Changes from strcpy to strncpy
  • Added Features: New code blocks or library calls
  • Obfuscation: Randomized changes to evade detection
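A quick first pass for spotting changes like strcpy becoming strncpy is to compare the printable strings in each version, much as the Unix strings utility would extract them. This sketch only helps when such names are actually present in the binary (for example, as imported symbol names); the function names and sample bytes are assumptions.

```python
from __future__ import annotations
import re

def printable_strings(data: bytes, min_len: int = 4) -> set[bytes]:
    """Collect runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return set(re.findall(pattern, data))

def string_changes(old: bytes, new: bytes):
    """Return (removed, added) lists of strings between two binaries."""
    old_set, new_set = printable_strings(old), printable_strings(new)
    return sorted(old_set - new_set), sorted(new_set - old_set)

# Made-up sample data standing in for two versions of a binary.
old = b"\x00\x01strcpy\x00printf\x00\x7f\x02"
new = b"\x00\x01strncpy\x00printf\x00\x7f\x02"
removed, added = string_changes(old, new)
print(removed, added)  # [b'strcpy'] [b'strncpy']
```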

Handling Advanced Scenarios

Large Files

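For gigabyte-sized files, consider xdelta3. bsdiff's documentation lists its memory requirement as max(17n, 9n+m) bytes for an n-byte old file and an m-byte new file, which becomes impractical at that scale. In the commands below, -e encodes a patch, -d decodes (applies) it, and -s names the source file: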

xdelta3 -e -s original.bin modified.bin patch.delta
xdelta3 -d -s original.bin patch.delta new.bin

Reverse Engineering Analysis

Use Ghidra's "Version Tracking" tool or integrate BinDiff for function matching and structural analysis.

Conflicts & Failures

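A patch applies cleanly only to the exact binary it was generated from. If the original has changed, the patch may fail or produce a wrong output file. Tools like bspatch don't have built-in conflict resolution, so it is worth verifying the base file's checksum before patching.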
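One way to guard against patching the wrong base file is to compare its SHA-256 digest with a known value before running bspatch. The sketch below assumes the expected digest is distributed alongside the patch; that convention and the function names sha256_of and check_base are hypothetical, not part of bsdiff.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_base(path: str, expected_sha256: str) -> None:
    """Raise ValueError if the file is not the base the patch was built from."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"{path}: expected {expected_sha256}, got {actual}")
```

Calling check_base("original.bin", expected) before bspatch turns a confusing failure or a corrupt output into an immediate, explicit error. The same sha256_of helper can confirm afterwards that new.bin matches modified.bin.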

Efficiency Example

As an illustration, for a 100MB app with 1MB of changes, bsdiff might produce a patch of around 500KB, roughly 0.5% of the size of a full download.
