Binary patch diffing is the process of comparing two binary files (e.g., executables, libraries, or firmware) to identify differences and generate a compact "patch" file that can be applied to the original binary to produce the modified version. Unlike text-based diffing (which works well for source code), binary diffing requires specialized algorithms to handle non-line-based data efficiently, often focusing on delta compression to minimize patch size.
This technique is commonly used in software updates (e.g., to distribute small patches instead of full binaries), reverse engineering (e.g., analyzing security patches), and version control for binary assets.
Key Differences from Text Diffing
- Text diffs (like diff -u) rely on line-by-line comparisons and context.
- Binary diffs use byte-level delta encoding, suffix sorting, or other optimizations to create smaller patches, as binaries don't have natural "lines."
- Patches are not always human-readable; they're designed for machine application.
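To make the byte-level point concrete, here is a minimal sketch that compares two byte strings differing by a single inserted byte. A naive position-by-position comparison flags almost everything after the insertion, while an alignment-based matcher reports one small insertion. Python's difflib.SequenceMatcher is used purely for illustration; it is not what bsdiff or xdelta use internally, and the function names and sample data are made up for this example.

```python
from difflib import SequenceMatcher


def positional_mismatches(old: bytes, new: bytes) -> int:
    """Count positions where the two inputs differ, comparing index by index."""
    shared = min(len(old), len(new))
    differing = sum(1 for i in range(shared) if old[i] != new[i])
    # Bytes past the end of the shorter input have no counterpart at all.
    return differing + abs(len(old) - len(new))


def aligned_changes(old: bytes, new: bytes):
    """Return the non-equal opcodes from an alignment of the two inputs."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]


old = bytes(range(64))
new = old[:10] + b"\xff" + old[10:]  # one byte inserted at offset 10

print(positional_mismatches(old, new))  # 55: everything from offset 10 on
print(aligned_changes(old, new))        # a single 'insert' opcode
```

The gap between the two results is why binary diff tools search for matching regions instead of comparing fixed positions.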
Popular Binary Diffing Tools
bsdiff/bspatch
Efficient for general binary patches, originally developed by Colin Percival. Uses suffix sorting and delta encoding to produce small patches.
xdelta
Similar functionality with support for VCDIFF format (RFC 3284), often used in backups and version control systems.
BinDiff/Diaphora
More for analysis in reverse engineering (e.g., with IDA Pro or Ghidra), highlighting structural changes like function differences.
rdiff
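From librsync, which implements the rsync delta algorithm. It computes a delta against a signature of the old file rather than the old file itself, which suits large files like disk images and cases where the two versions live on different machines.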
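rdiff builds on the rsync approach, in which a cheap rolling checksum lets the tool slide a fixed-size window across a file one byte at a time without rehashing the whole window. The sketch below shows an rsync-style weak checksum and its constant-time update. It illustrates the idea only and is not librsync's exact checksum; the function names, the modulus, and the window size are assumptions for this example.

```python
MOD = 1 << 16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two 16-bit sums (a, b) of an rsync-style weak checksum."""
    a = sum(block) % MOD
    # b weights each byte by its distance from the end of the block.
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, block_len: int) -> tuple[int, int]:
    """Slide the window one byte to the right in constant time."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b


data = b"binary patch diffing example"
window = 8
a, b = weak_checksum(data[:window])
for start in range(1, len(data) - window + 1):
    a, b = roll(a, b, data[start - 1], data[start + window - 1], window)
    # The rolled value must equal a from-scratch checksum of the same window.
    assert (a, b) == weak_checksum(data[start:start + window])
```

In a real tool, a weak match like this is only a candidate; a stronger hash confirms it before the block is treated as unchanged.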
Step-by-Step Example with bsdiff
Below is a detailed, step-by-step example using bsdiff and bspatch, standard tools for creating and applying binary patches. This assumes you're on a Unix-like system (Linux/macOS) with bsdiff installed.
Installation: Install via sudo apt install bsdiff on Ubuntu, brew install bsdiff on macOS, or compile from source.
1 Prepare the Original Binary File
Create a simple binary file representing a small program or data blob. For this example, we'll use hexadecimal data to simulate two versions of a binary.
00000000: 4d5a 9000 0300 0000 0400 0000 ffff 0000 MZ..............
00000010: b800 0000 0000 0000 4000 0000 0000 0000 ........@.......
00000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
00000030: 0000 0000 0000 0000 8000 0000 0e1f ba0e ................
00000040: 00b4 09cd 21b8 014c cd21 5468 6973 2070 ....!..L.!This p
00000050: 726f 6772 616d 2063 616e 6e6f 7420 6265 rogram cannot be
00000060: 2072 756e 2069 6e20 444f 5320 6d6f 6465 run in DOS mode
00000070: 2e0d 0d0a 2400 0000 0000 0000 ....$.......
This resembles the DOS header and stub found at the start of a Windows PE executable (EXE). Save it as original.bin using Python:
# The same bytes as the hex dump above, one dump row per line.
ORIGINAL_HEX = """
4d5a 9000 0300 0000 0400 0000 ffff 0000
b800 0000 0000 0000 4000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 8000 0000 0e1f ba0e
00b4 09cd 21b8 014c cd21 5468 6973 2070
726f 6772 616d 2063 616e 6e6f 7420 6265
2072 756e 2069 6e20 444f 5320 6d6f 6465
2e0d 0d0a 2400 0000 0000 0000
"""

with open('original.bin', 'wb') as f:
    # bytes.fromhex skips the whitespace between byte pairs (Python 3.7+)
    f.write(bytes.fromhex(ORIGINAL_HEX))
2 Create the Modified Binary File
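Create modified.bin with changes. We'll alter the message from "cannot be run" to "must not run" and add some padding.
Changes: In the dump above, "cannot be run" occupies offsets 0x57-0x63, so the modified text starts at 0x57. The replacement is one byte shorter, which shifts every later byte down by one, and null bytes are added at the end for padding.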
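One way to produce modified.bin, assuming original.bin from step 1 is in the current directory, is sketched below. The helper name make_modified is hypothetical, and the padding length of four null bytes is an assumption chosen to mirror the padding delta in the Python script later in this article.

```python
from pathlib import Path

OLD_TEXT = b"cannot be run"
NEW_TEXT = b"must not run"


def make_modified(original: bytes, padding: int = 4) -> bytes:
    """Replace the stub message and append null padding."""
    if OLD_TEXT not in original:
        raise ValueError("expected message not found in original data")
    # Replace only the first occurrence; the new text is one byte shorter,
    # so every byte after it shifts down by one.
    patched = original.replace(OLD_TEXT, NEW_TEXT, 1)
    return patched + b"\x00" * padding


src = Path("original.bin")
if src.exists():  # only write modified.bin when step 1's file is present
    Path("modified.bin").write_bytes(make_modified(src.read_bytes()))
```

Because the replacement changes the length of the data, this is a useful test case: a purely positional comparison would see every later byte as changed.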
3 Generate the Binary Patch
Use bsdiff to create the patch file:
bsdiff original.bin modified.bin patch.bsdiff
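bsdiff computes a delta using a suffix array to find matching regions between the two files efficiently, then compresses the encoded differences with bzip2. Don't expect savings on inputs this small: the patch might be ~100-200 bytes, about the same as or larger than the ~124-byte originals, because the patch header and bzip2 framing carry fixed overhead. The benefit shows up on larger binaries.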
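One reason bsdiff patches stay small is that, for regions that match approximately, it stores the bytewise difference between old and new data; that difference is mostly zeros and compresses well. The sketch below illustrates only that idea on two equal-length buffers. It does no suffix sorting and does not produce the bsdiff patch format, and the function names and sample data are made up for this example.

```python
import bz2


def bytewise_delta(old: bytes, new: bytes) -> bytes:
    """Subtract old from new byte by byte (mod 256); inputs must be equal length."""
    if len(old) != len(new):
        raise ValueError("this sketch only handles equal-length inputs")
    return bytes((n - o) % 256 for o, n in zip(old, new))


def apply_delta(old: bytes, delta: bytes) -> bytes:
    """Add the delta back onto old (mod 256) to rebuild new."""
    return bytes((o + d) % 256 for o, d in zip(old, delta))


# Deterministic sample data standing in for a code section.
old = bytes((i * 37 + 11) % 256 for i in range(4096))
buf = bytearray(old)
buf[100] ^= 0x01   # flip a couple of bytes, as a small code change might
buf[2000] ^= 0x80
new = bytes(buf)

delta = bytewise_delta(old, new)
assert apply_delta(old, delta) == new
print(delta.count(0), "of", len(delta), "delta bytes are zero")
print(len(bz2.compress(delta)), "bytes after bz2 compression of the delta")
```

The real tool also has to decide which old and new regions to line up, which is where the suffix array comes in.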
4 Apply the Binary Patch
Transform original.bin into the modified version using bspatch:
bspatch original.bin new.bin patch.bsdiff
This creates new.bin, which should be identical to modified.bin.
# Verification
cmp modified.bin new.bin # Should show no differences
sha256sum modified.bin new.bin # Compare checksums
Python Script for Binary Patch Diffing
Here's a Python script that demonstrates a simple binary diffing and patching process. This isn't as efficient as specialized tools like bsdiff, but it's a pure-Python implementation that simulates the process:
import difflib
import binascii

# Function to split hex string into lines for diff readability
def split_into_lines(data, chars_per_line=32):
    return [data[i:i+chars_per_line] for i in range(0, len(data), chars_per_line)]

# Function to generate a unified diff of two binaries (in hex for readability)
def binary_diff(original_bytes, modified_bytes):
    orig_hex = binascii.hexlify(original_bytes).decode('ascii').upper()
    mod_hex = binascii.hexlify(modified_bytes).decode('ascii').upper()
    orig_lines = split_into_lines(orig_hex)
    mod_lines = split_into_lines(mod_hex)
    diff = difflib.unified_diff(
        orig_lines, mod_lines,
        fromfile='original.bin (hex)',
        tofile='modified.bin (hex)',
        lineterm=''
    )
    return '\n'.join(diff)

# Simple patching function: Applies a list of (offset, remove_len, add_bytes) deltas.
# Offsets refer to positions in the original data and must be in ascending order.
def apply_patch(original_bytes, deltas):
    result = bytearray(original_bytes)
    offset_adjust = 0  # running shift caused by earlier deltas
    for offset, remove_len, add_bytes in deltas:
        adj_offset = offset + offset_adjust
        result = result[:adj_offset] + add_bytes + result[adj_offset + remove_len:]
        offset_adjust += len(add_bytes) - remove_len
    return bytes(result)

# Example usage
if __name__ == "__main__":
    # Sample original and modified binaries.
    # NOTE: these literals are truncated placeholders. Substitute the full byte
    # strings (or read original.bin and modified.bin) before running.
    original = b'\x4d\x5a\x90\x00...'  # (truncated for brevity)
    modified = b'\x4d\x5a\x90\x00...'  # (truncated for brevity)

    # Generate and print diff
    diff_output = binary_diff(original, modified)
    print("Unified Diff (Hex Representation):\n")
    print(diff_output)

    # Example deltas for patching (also truncated; the offsets and lengths
    # must match your actual files for the assertion below to hold)
    deltas = [
        (85, 18, b'\x6d\x75\x73\x74\x20\x6e\x6f\x74\x20\x72\x75\x6e...'),
        (112, 0, b'\x00\x00\x00\x00')  # Added padding
    ]

    # Apply patch and verify
    patched = apply_patch(original, deltas)
    assert patched == modified, "Patch failed!"
    print("\nPatch applied successfully!")
Note: This script is educational. difflib is great for visualization but not optimal for large binaries or compact patches. For production, use specialized tools like xdelta or implement algorithms like those in bsdiff.
Reverse Engineering with Binary Patch Diffing
Reverse engineering often involves analyzing differences between two versions to understand changes, such as bug fixes, added features, or security patches. Binary patch diffing highlights modified code sections without requiring full disassembly of both files.
Analysis Steps
1. Obtain binary versions (vulnerable vs. patched)
2. Generate diff using tools or scripts
3. Analyze hunks for meaningful changes
4. Convert hex to assembly for analysis
5. Test patches and derive exploits
What Diffs Reveal
- Patched Vulnerabilities: Changes from strcpy to strncpy
- Added Features: New code blocks or library calls
- Obfuscation: Randomized changes to evade detection
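A quick first pass at spotting changes like these is to compare the printable strings in the two versions, since imported function names such as strcpy or strncpy often appear as plain ASCII in a binary. The sketch below does that with a regular expression. The function names and the sample blobs are hypothetical, and a real analysis would follow up with a disassembler-based diff.

```python
import re


def printable_strings(data: bytes, min_len: int = 4) -> set[bytes]:
    """Collect runs of printable ASCII at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return set(re.findall(pattern, data))


def string_changes(old: bytes, new: bytes, min_len: int = 4):
    """Return (removed, added) string sets between two binary versions."""
    old_strings = printable_strings(old, min_len)
    new_strings = printable_strings(new, min_len)
    return old_strings - new_strings, new_strings - old_strings


# Made-up blobs standing in for the import names of two builds.
vulnerable = b"\x00\x01strcpy\x00printf\x00\x7f\x02"
patched = b"\x00\x01strncpy\x00printf\x00\x7f\x02"

removed, added = string_changes(vulnerable, patched)
print("removed:", sorted(removed))  # [b'strcpy']
print("added:", sorted(added))      # [b'strncpy']
```

Strings that appear in only one version point to the functions worth inspecting first.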
Handling Advanced Scenarios
Large Files
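bsdiff's documentation puts its memory use at max(17n, 9n+m) bytes for an old file of n bytes and a new file of m bytes, which is impractical for gigabyte-sized files. For those, consider xdelta3: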
xdelta3 -e -s original.bin modified.bin patch.delta
xdelta3 -d -s original.bin patch.delta new.bin
Reverse Engineering Analysis
Use Ghidra's "Version Tracking" tool or integrate BinDiff for function matching and structural analysis.
Conflicts & Failures
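If the file on disk is not the exact version the patch was generated from, applying the patch may fail outright or produce an incorrect output file. Tools like bspatch don't have built-in conflict resolution, so verify the base file (for example, by checksum) before patching.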
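A common safeguard is to record a hash of the expected base file alongside the patch and refuse to apply the patch when the base doesn't match. The sketch below wraps the bspatch command from step 4 this way. The function names, the exception class, and the idea of shipping an expected hash with the patch are assumptions for illustration, not features of bsdiff itself.

```python
import hashlib
import subprocess
from pathlib import Path


class BaseMismatchError(Exception):
    """Raised when the file on disk is not the version the patch was built from."""


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file (read fully into memory)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def apply_verified(old_path: str, new_path: str, patch_path: str,
                   expected_old_sha256: str) -> None:
    """Run bspatch only if old_path hashes to the expected value."""
    actual = sha256_of(old_path)
    if actual != expected_old_sha256:
        raise BaseMismatchError(
            f"{old_path}: expected {expected_old_sha256}, got {actual}"
        )
    # Same argument order as the command in step 4: old, new, patch.
    subprocess.run(["bspatch", old_path, new_path, patch_path], check=True)
```

A call such as apply_verified("original.bin", "new.bin", "patch.bsdiff", expected_hash) then fails fast with a clear error when the base is wrong. Checking the output file's hash afterwards covers the remaining failure modes.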
Efficiency Example
For a 100MB app with 1MB of changes, bsdiff might produce a patch of around 500KB, roughly 0.5% of the size of the full download.