cmp Command: Tutorial & Examples

Compare two files byte by byte and locate the first difference

The cmp command is a standard Unix utility, shipped on Linux as part of GNU diffutils, that compares two files at the byte level to determine whether they are identical or where they first differ. It is particularly useful for verifying file integrity, comparing binary files, and automating checks in server and virtualization environments. This article covers what cmp does, its options, practical examples, common problems, and best practices.

What cmp Does

The cmp command compares two files byte by byte and reports the position of the first difference found. Unlike line-oriented tools such as diff, cmp focuses on binary-level differences, making it suitable for any file type, including executables, images, archives, or any arbitrary data file.

If the files are identical, cmp produces no output and returns an exit status of 0. When the files differ, it prints the byte number and line number of the first mismatch (both counted from 1) and exits with status 1. This behavior lets scripts verify file consistency from the exit status alone, without parsing verbose output.

Why cmp Is Important

In server administration, virtualization, and data management, ensuring file integrity is critical. cmp provides a reliable way to detect file corruption, unsynchronized backups, or unauthorized changes by comparing files precisely at the byte level.

Its ability to handle binary files directly distinguishes it from text-based comparison tools. Moreover, the silent mode (-s) makes it efficient for scripting scenarios where only the comparison result matters, minimizing noise in logs or output.

How cmp Works

cmp opens both files and reads them sequentially, comparing each corresponding byte until it encounters a difference or reaches the end of both files. It keeps track of:

  • Byte numbers: the position of the current byte, counted from 1 at the beginning of the files.
  • Line numbers: starting at 1 and incremented on each newline character (\n) encountered.

When a difference is found, cmp reports the byte number and line number of the first mismatch. If one file is shorter but otherwise identical to the beginning of the longer file, cmp instead prints an "EOF on" message naming the shorter file to standard error, and still exits with status 1.
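
Both cases can be reproduced with small scratch files. The sketch below is illustrative only: the temporary directory and file names are assumptions, and the exact wording of the messages varies between cmp implementations (GNU cmp says "byte", some others say "char").

```shell
# Scratch files; all names here are hypothetical.
tmpdir=$(mktemp -d)
printf 'alpha\nbeta\n'        > "$tmpdir/f1"
printf 'alpha\nbexa\n'        > "$tmpdir/f2"
printf 'alpha\nbeta\ngamma\n' > "$tmpdir/f3"

# f1 and f2 first differ at the 9th byte, which sits on line 2.
diff_msg=$(cmp "$tmpdir/f1" "$tmpdir/f2")
diff_status=$?

# f1 is a strict prefix of f3, so cmp reports EOF on f1 (on stderr).
eof_msg=$(cmp "$tmpdir/f1" "$tmpdir/f3" 2>&1)
eof_status=$?

printf '%s (status %s)\n' "$diff_msg" "$diff_status"
printf '%s (status %s)\n' "$eof_msg" "$eof_status"
rm -rf "$tmpdir"
```

In both cases the exit status is 1, so a script that only checks the status treats a truncated file the same way as a modified one.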

The exit status codes are as follows:

  • 0 : Files are identical.
  • 1 : Files differ.
  • 2 : An error occurred (e.g., file not found or permission denied).

This precise reporting and exit status make cmp useful in conditional scripting.
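
As a minimal sketch of such conditional logic, the function below maps the three exit statuses to distinct results. The name compare_files and the scratch files are hypothetical, chosen only for illustration.

```shell
# compare_files is a hypothetical helper, not part of cmp itself.
compare_files() {
    cmp -s "$1" "$2"
    case $? in
        0) echo "identical" ;;
        1) echo "different" ;;
        *) echo "error" ;;      # status 2 or higher: cmp could not compare
    esac
}

tmpdir=$(mktemp -d)
printf 'same\n'  > "$tmpdir/a"
printf 'same\n'  > "$tmpdir/b"
printf 'other\n' > "$tmpdir/c"

r_same=$(compare_files "$tmpdir/a" "$tmpdir/b")
r_diff=$(compare_files "$tmpdir/a" "$tmpdir/c")
r_err=$(compare_files "$tmpdir/a" "$tmpdir/missing" 2>/dev/null)
echo "$r_same / $r_diff / $r_err"
rm -rf "$tmpdir"
```

Treating anything other than 0 and 1 as an error keeps a missing or unreadable file from being reported as a mere difference.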

Common Parameters of cmp

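The most frequently used options are listed below. Only -l and -s are specified by POSIX; -i, -n, --help, and --version are available in GNU cmp, so check the local manual page on other systems.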

  • -l
    List every differing byte: its byte number in decimal, followed by the byte's value in octal in each file.

  • -s
    Silent mode; suppresses all output. Only the exit status indicates whether files match.

  • -i SKIP
    Skip the first SKIP bytes in both files before starting the comparison.

  • -n LIMIT
    Compare at most LIMIT bytes.

  • --help
    Display help and usage information.

  • --version
    Show version information of the cmp command.

These options enhance flexibility, enabling partial comparisons, quiet checks, or detailed difference reports.

Basic Usage Examples

Compare two binary files file1.bin and file2.bin:

    cmp file1.bin file2.bin

If the files are identical, there is no output, and the exit status is:

    echo $?
    0

If they differ, output displays the first differing byte and line number:

    file1.bin file2.bin differ: byte 15, line 1

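Check the exit status in a script to detect differences, keeping errors (status 2) separate from real differences (status 1):

    cmp file1.bin file2.bin
    status=$?
    if [ "$status" -eq 0 ]; then
            echo "Files are identical"
    elif [ "$status" -eq 1 ]; then
            echo "Files differ"
    else
            echo "cmp failed (status $status)" >&2
    fi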

List all byte differences with their octal values:

    cmp -l file1.bin file2.bin

Sample output:

    15 141 142
    20 170 171

This indicates that at byte 15, the first file has octal value 141, and the second has 142, and similarly at byte 20.
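
To read such octal values as characters, the shell's printf can decode them. The sketch below uses the values from the sample output; the variable names are illustrative.

```shell
# printf interprets \NNN in its format string as an octal byte value.
first=$(printf '\141')    # octal 141
second=$(printf '\142')   # octal 142
echo "byte 15: '$first' in the first file, '$second' in the second"
```

GNU cmp can also print the bytes itself when -b (--print-bytes) is added, for example cmp -l -b file1.bin file2.bin.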

Example of handling a missing file or permission error:

    cmp missingfile.bin file2.bin
    cmp: missingfile.bin: No such file or directory

    echo $?
    2

Advanced Usage Examples

Skip the first 100 bytes in both files before comparing:

    cmp -i 100 file1.bin file2.bin

Compare only the first 256 bytes:

    cmp -n 256 file1.bin file2.bin

Use silent mode in scripts to check if files match without producing output:

    if cmp -s file1.bin file2.bin; then
            echo "Files match"
    else
            echo "Files differ"
    fi

Compare two large log files ignoring initial metadata (e.g., timestamp headers):

    cmp -i 1024 /var/log/app1.log /var/log/app2.log

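This is useful when both files begin with headers of the same length that are expected to differ, while the content after the headers should be identical. Note that -i 1024 skips the same number of bytes in each file.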
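
When the headers have different lengths, GNU cmp accepts a separate skip count for each file in the form -i SKIP1:SKIP2. The sketch below assumes GNU cmp; the file names and header contents are made up for illustration.

```shell
tmpdir=$(mktemp -d)
# Hypothetical files: identical payload behind headers of different lengths.
printf 'HDR-A\npayload data\n'    > "$tmpdir/a.log"   # 6-byte header line
printf 'HEADER-B\npayload data\n' > "$tmpdir/b.log"   # 9-byte header line

# Skip 6 bytes of the first file and 9 bytes of the second (GNU extension).
if cmp -s -i 6:9 "$tmpdir/a.log" "$tmpdir/b.log"; then
    header_result="payload matches"
else
    header_result="payload differs"
fi
echo "$header_result"
rm -rf "$tmpdir"
```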

Example script snippet to automate backup verification:

    BACKUP=/backup/config.bak
    ORIGINAL=/etc/config
    if cmp -s "$BACKUP" "$ORIGINAL"; then
            echo "Backup verified"
    else
            echo "Backup differs from original!"
    fi

Performance Considerations

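Comparing very large files byte by byte can be time-consuming, although cmp stops reading at the first difference unless -l is given. To optimize:

  • Use the -n option to limit comparison to a relevant subset of bytes.
  • Use checksum tools like md5sum or sha256sum when the files live on different machines or when one reference file is checked repeatedly. For a single local comparison, hashing reads both files in full and is not faster than cmp.
  • For extremely large files, consider sampling or specialized tools designed for performance.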
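
A checksum recorded once can stand in for the original file in later checks, which helps when the original is no longer at hand. The sketch below assumes sha256sum from GNU coreutils; the file and manifest names are hypothetical.

```shell
tmpdir=$(mktemp -d)
printf 'config v1\n' > "$tmpdir/config"

# Record the checksum once (manifest.sha256 is a made-up name).
(cd "$tmpdir" && sha256sum config > manifest.sha256)

# Later: verify the file against the recorded checksum.
if (cd "$tmpdir" && sha256sum -c manifest.sha256 >/dev/null 2>&1); then
    before="unchanged"
else
    before="changed"
fi

# Simulate a modification and verify again.
printf 'config v2\n' > "$tmpdir/config"
if (cd "$tmpdir" && sha256sum -c manifest.sha256 >/dev/null 2>&1); then
    after="unchanged"
else
    after="changed"
fi
echo "$before, then $after"
rm -rf "$tmpdir"
```

Unlike cmp, a checksum mismatch says nothing about where the files differ, so cmp remains the tool for locating the first differing byte once both copies are available.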

Security Considerations

  • Ensure you have appropriate read permissions on both files; otherwise, cmp will return an error.
  • Avoid comparing sensitive files in environments where output or logs might be exposed.
  • Use silent mode (-s) in automated scripts to prevent potentially sensitive data from appearing in logs.
  • Be aware that cmp compares raw bytes: the same text stored in different encodings, or the same plaintext encrypted separately, will be reported as different.

Potential Problems and Troubleshooting

  • No output despite differences: Without options, cmp always prints a message when files differ, so silence means the files are identical. If -s is in use, rely on the exit status instead: 0 means identical, 1 means the files differ, and 2 means an error occurred.
  • Permission errors: If you get "Permission denied," check file permissions and run as an appropriate user.
  • File not found errors: Ensure the specified file paths are correct.
  • Confusing output for text files: cmp reports byte offsets and line numbers, which may be less intuitive than line-based differences. Use diff for text comparison.
  • Binary files with embedded null bytes: cmp handles these correctly, but output may be confusing if interpreted as text.
  • Text encoding differences: Different encodings (UTF-8 vs UTF-16) will cause cmp to report differences even if content appears similar.
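
Line endings are a common instance of the same effect. The sketch below, using made-up file names, shows a Unix and a DOS version of the same text comparing as different, and then as identical once carriage returns are stripped. It relies on cmp reading standard input when a file operand is "-".

```shell
tmpdir=$(mktemp -d)
printf 'one\ntwo\n'     > "$tmpdir/unix.txt"   # LF line endings
printf 'one\r\ntwo\r\n' > "$tmpdir/dos.txt"    # CRLF line endings

# Raw comparison: the files differ at the first carriage return.
cmp -s "$tmpdir/unix.txt" "$tmpdir/dos.txt"
raw_status=$?

# Normalized comparison: strip \r and feed the result to cmp on stdin.
tr -d '\r' < "$tmpdir/dos.txt" | cmp -s "$tmpdir/unix.txt" -
normalized_status=$?

echo "raw=$raw_status normalized=$normalized_status"
rm -rf "$tmpdir"
```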

Tips and Best Practices

  1. Use cmp -s in scripts for efficient, quiet checks relying on exit codes.
  2. Combine cmp with hash utilities like md5sum or sha256sum when the reference copy is remote or checked repeatedly; a stored checksum avoids re-reading the original each time.
  3. When comparing files with headers or metadata, use -i to skip irrelevant sections.
  4. Use -l to get a detailed list of all differing bytes when debugging.
  5. Remember that cmp compares bytes literally; differences in text encoding or line endings will show as mismatches.
  6. Check exit codes carefully to distinguish between identical files, differences, and errors.
  7. Use cmp in automation scripts to verify backups, deployments, or configuration consistency.

Real-World Use Cases

  • Backup Verification: Confirm that copied files are identical to originals after backups.
  • Configuration Drift Detection: Detect unauthorized or accidental changes in server config files.
  • Binary Patch Validation: Ensure patches modify only intended bytes in compiled binaries.
  • Automated Testing: Compare program output files or logs against expected results.
  • Virtual Machine and Container Image Integrity: Verify disk images or container layers for corruption or unexpected changes.
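
The automated testing case can be sketched as a comparison between a program's output and a stored expected file. Here generate_output is a hypothetical stand-in for the program under test, and the expected file is created inline only to keep the example self-contained.

```shell
tmpdir=$(mktemp -d)
printf '3\n' > "$tmpdir/expected.out"

# generate_output is a hypothetical placeholder for the real program.
generate_output() { echo $((1 + 2)); }

# "-" makes cmp read the first input from standard input.
if generate_output | cmp -s - "$tmpdir/expected.out"; then
    test_result="PASS"
else
    test_result="FAIL"
fi
echo "$test_result"
rm -rf "$tmpdir"
```

Dropping -s in the failing branch would additionally show the byte and line of the first mismatch, which is often enough to locate the regression.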


The text above is licensed under CC BY-SA 4.0.