
Most Python code deals with strings, JSON, and high-level data structures. But some problems take you closer to the metal: parsing a binary file format, reading sensor data over a serial port, implementing a network protocol, or working with legacy data from a C program. At that level, you are not dealing with JSON objects — you are dealing with raw bytes, and the layout of those bytes follows a precise binary format that Python’s JSON parser cannot touch. This is the domain of Python’s struct module.

The struct module lets you pack Python values into raw bytes and unpack raw bytes back into Python values, following an exact layout you specify with a format string. It handles integer sizes, floating-point precision, byte order (little-endian vs big-endian), and alignment — exactly the things you need to interoperate with compiled code, hardware, or binary file formats. No third-party installation required — struct is part of Python’s standard library.

In this tutorial, you’ll learn how struct format strings work, how to pack and unpack individual values and records, how byte order affects your data, how to work with fixed-size binary file headers, and how to handle variable-length data. By the end, you’ll be able to read and write binary data formats with confidence.

Python struct: Quick Example

Here is a minimal example that packs three Python values into 10 bytes and unpacks them back:

# struct_quick.py
import struct

# Pack: '<' = little-endian (no padding), 'I' = unsigned int (4 bytes),
# 'H' = unsigned short (2 bytes), 'f' = float (4 bytes)
fmt = '<IHf'
data = struct.pack(fmt, 1000, 42, 3.14)

print(f"Packed bytes ({len(data)} bytes): {data.hex()}")
print(f"Calculated size: {struct.calcsize(fmt)} bytes")

# Unpack: reverse the process
user_id, status, temperature = struct.unpack(fmt, data)
print(f"Unpacked: user_id={user_id}, status={status}, temperature={temperature}")

Output:

Packed bytes (10 bytes): e80300002a00c3f54840
Calculated size: 10 bytes
Unpacked: user_id=1000, status=42, temperature=3.1400001049041748

Three Python values become 10 bytes of binary data. struct.pack() encodes them, struct.unpack() decodes them. The format string '<IHf' describes the exact layout: little-endian, a 4-byte unsigned int, a 2-byte unsigned short, and a 4-byte float. The '<' prefix matters: without it, native alignment typically inserts two padding bytes before the float, making the struct 12 bytes instead of 10. The slight floating-point difference is normal — 32-bit IEEE 754 floats cannot represent 3.14 exactly.
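Whether padding will change your layout is easy to check up front with struct.calcsize(). A small sketch (the native-mode size shown assumes a typical x86-64 CPython build):

```python
import struct

# Explicit little-endian: fields are packed back-to-back, no padding
print(struct.calcsize('<IHf'))  # 10 bytes: 4 + 2 + 4

# Native mode (the default) aligns 'f' on a 4-byte boundary, so two
# padding bytes are typically inserted after the 2-byte 'H'
print(struct.calcsize('IHf'))   # usually 12 bytes on x86-64
```

Checking calcsize() against the size your format specification expects is a cheap way to catch alignment surprises before any data is written.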

Understanding Format Strings

The format string is the core of struct. Each character represents a C data type with a fixed byte size. A number prefix repeats a character (e.g., '3B' for three unsigned bytes); for 's', the number is instead the byte length of a single bytes field ('10s' is one 10-byte field, not ten 1-byte fields).

Format Char | C Type             | Python Type | Size
------------+--------------------+-------------+------------
b           | signed char        | int         | 1 byte
B           | unsigned char      | int         | 1 byte
h           | short              | int         | 2 bytes
H           | unsigned short     | int         | 2 bytes
i           | int                | int         | 4 bytes
I           | unsigned int       | int         | 4 bytes
q           | long long          | int         | 8 bytes
Q           | unsigned long long | int         | 8 bytes
f           | float              | float       | 4 bytes
d           | double             | float       | 8 bytes
s           | char[]             | bytes       | 1 byte each
?           | bool               | bool        | 1 byte
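A quick illustration of how count prefixes behave, since 's' is the odd one out:

```python
import struct

# For most codes, a count repeats the field: '3B' is three separate bytes
print(struct.unpack('<3B', b'\x01\x02\x03'))  # (1, 2, 3)

# For 's', the count is the length of ONE bytes field
print(struct.unpack('<3s', b'\x01\x02\x03'))  # (b'\x01\x02\x03',)
```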

The first character of the format string optionally specifies byte order. '<' means little-endian (x86, ARM, most modern hardware), '>' means big-endian (network byte order, some file formats), and '=' means native byte order without alignment padding.

# byte_order.py
import struct

value = 0x01020304  # = 16909060 decimal

little_endian = struct.pack('<I', value)
big_endian = struct.pack('>I', value)

print(f"Value: {value:#010x}")
print(f"Little-endian bytes: {little_endian.hex()}")  # Least significant byte first
print(f"Big-endian bytes:    {big_endian.hex()}")     # Most significant byte first

# Verify round-trip
le_val = struct.unpack('<I', little_endian)[0]
be_val = struct.unpack('>I', big_endian)[0]
print(f"LE decoded: {le_val:#010x}")
print(f"BE decoded: {be_val:#010x}")

Output:

Value: 0x01020304
Little-endian bytes: 04030201
Big-endian bytes:    01020304
LE decoded: 0x01020304
BE decoded: 0x01020304

In little-endian, the least significant byte comes first: 04, 03, 02, 01. In big-endian, the most significant byte comes first: 01, 02, 03, 04. Most PC hardware is little-endian. Network protocols (TCP/IP) use big-endian. When working with an unknown format, look up whether it specifies byte order — getting it wrong produces silently corrupted values.
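To see that silent corruption in action, here is a small sketch that deliberately unpacks with the wrong byte order:

```python
import struct

packed = struct.pack('<I', 0x01020304)  # written little-endian
wrong = struct.unpack('>I', packed)[0]  # read back as big-endian
print(hex(wrong))  # 0x4030201 -- byte-swapped value, no error raised
```

No exception is raised; the value is simply wrong, which is why the byte-order prefix deserves a comment in any format definition.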

Packing and Unpacking Records

The real power of struct is packing multiple related values into a compact binary record — and unpacking them back. This is how binary file headers, network packets, and sensor data frames are built.

Fixed-Size Records

When all records have the same format, use a single format string and loop over the data with iter_unpack() for efficiency:

# struct_records.py
import struct

# Format: timestamp (4B uint), sensor_id (2B uint), temperature (4B float), humidity (4B float)
RECORD_FMT = '<IHff'
RECORD_SIZE = struct.calcsize(RECORD_FMT)
print(f"Record size: {RECORD_SIZE} bytes")

readings = [
    (1706000001, 1, 22.5, 65.3),
    (1706000002, 2, 23.1, 63.8),
    (1706000003, 1, 22.8, 64.1),
]

# Write the records to a binary file
with open('sensor.bin', 'wb') as f:
    for rec in readings:
        f.write(struct.pack(RECORD_FMT, *rec))
print(f"Wrote {len(readings)} records ({len(readings) * RECORD_SIZE} bytes)")

# Read the raw bytes back
with open('sensor.bin', 'rb') as f:
    data = f.read()
print(f"Read {len(data)} bytes")

# iter_unpack yields one tuple per record
print("Parsed records:")
for timestamp, sensor_id, temp, humidity in struct.iter_unpack(RECORD_FMT, data):
    print(f"  t={timestamp}, sensor={sensor_id}, temp={temp:.1f}C, humidity={humidity:.1f}%")

Output:

Record size: 14 bytes
Wrote 3 records (42 bytes)
Read 42 bytes
Parsed records:
  t=1706000001, sensor=1, temp=22.5C, humidity=65.3%
  t=1706000002, sensor=2, temp=23.1C, humidity=63.8%
  t=1706000003, sensor=1, temp=22.8C, humidity=64.1%

struct.iter_unpack() is the efficient way to read a stream of same-format records. It yields tuples one at a time without buffering the whole dataset, which matters when reading large binary files.
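A related tool: struct.unpack_from() reads a single record at a given byte offset without slicing the buffer, which is handy for random access into a block of records. A sketch with made-up values:

```python
import struct

fmt = '<IHff'
size = struct.calcsize(fmt)
buf = struct.pack(fmt, 1, 2, 3.0, 4.0) + struct.pack(fmt, 5, 6, 7.0, 8.0)

# Read the second record directly at its offset -- no slice copy needed
second = struct.unpack_from(fmt, buf, offset=size)
print(second)  # (5, 6, 7.0, 8.0)
```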

Packing String Fields

Strings in struct are fixed-length byte fields. Use 'Ns' (e.g., '20s') for a string of exactly N bytes. Shorter strings must be padded; longer strings are truncated.

# struct_strings.py
import struct

NAME_LEN = 20
FMT = f'<{NAME_LEN}sI'  # 20-byte name string + 4-byte uint score

def pack_player(name: str, score: int) -> bytes:
    # Encode to bytes and truncate/pad to exactly NAME_LEN bytes
    name_bytes = name.encode('utf-8')[:NAME_LEN].ljust(NAME_LEN, b'\x00')
    return struct.pack(FMT, name_bytes, score)

def unpack_player(data: bytes) -> tuple:
    name_bytes, score = struct.unpack(FMT, data)
    name = name_bytes.rstrip(b'\x00').decode('utf-8')  # Strip null padding
    return name, score

players = [
    ("Alice", 9500),
    ("Bob", 7200),
    ("CharlieLongNameHere", 3400),
]

packed = b''.join(pack_player(n, s) for n, s in players)
print(f"Total size: {len(packed)} bytes ({len(players)} x {struct.calcsize(FMT)} bytes each)")

print("\nUnpacked players:")
record_size = struct.calcsize(FMT)
for i in range(0, len(packed), record_size):
    name, score = unpack_player(packed[i:i + record_size])
    print(f"  {name!r}: {score}")

Output:

Total size: 72 bytes (3 x 24 bytes each)

Unpacked players:
  'Alice': 9500
  'Bob': 7200
  'CharlieLongNameHere': 3400

The 19-byte name "CharlieLongNameHere" just fits in the 20-byte field, leaving a single null pad byte; anything longer would be silently truncated by [:NAME_LEN]. Null bytes are stripped on unpack with .rstrip(b'\x00'). Always check encoding: if names can contain non-ASCII characters, use 'utf-8' and note that some characters take 2-4 bytes each, which can change how many characters fit in the fixed field.
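One caveat when combining [:NAME_LEN] with UTF-8: a byte-level cut can land in the middle of a multi-byte character, leaving a field that strict decoding rejects. A sketch of the failure mode and one way to soften it (the sample name is made up):

```python
name = 'abcdefghijklmnopqrs' + 'ë'  # 19 ASCII bytes + one 2-byte character
raw = name.encode('utf-8')[:20]     # the cut splits the 2-byte 'ë'

# Strict decoding would raise UnicodeDecodeError on the dangling lead
# byte; errors='ignore' drops it instead
print(raw.decode('utf-8', errors='ignore'))  # abcdefghijklmnopqrs
```

If corrupted trailing characters are unacceptable, decode with errors='ignore' (or 'replace') on read, or truncate character-by-character on write so every stored field is valid UTF-8.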

Reading Binary File Headers

Many binary file formats start with a fixed-size header that describes the file's contents. Here is an example of writing and reading a custom binary file format with a header:

# binary_file_format.py
import struct
from pathlib import Path

# File format:
# Header (16 bytes):
#   - magic: 4 bytes (b'PYDB')
#   - version: 2 bytes (uint16)
#   - record_count: 4 bytes (uint32)
#   - record_size: 2 bytes (uint16)
#   - flags: 4 bytes (uint32)
# Records follow the header

HEADER_FMT = '>4sHIHI'  # Big-endian (common for file formats)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAGIC = b'PYDB'

RECORD_FMT = '>If'      # Big-endian: int32 id + float32 value
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def write_db(path: Path, records: list[tuple]):
    with open(path, 'wb') as f:
        # Write header
        header = struct.pack(
            HEADER_FMT,
            MAGIC,
            1,               # version
            len(records),    # record_count
            RECORD_SIZE,     # record_size
            0,               # flags (unused)
        )
        f.write(header)
        # Write records
        for rec_id, value in records:
            f.write(struct.pack(RECORD_FMT, rec_id, value))

def read_db(path: Path) -> list[tuple]:
    with open(path, 'rb') as f:
        raw_header = f.read(HEADER_SIZE)
        magic, version, count, rec_size, flags = struct.unpack(HEADER_FMT, raw_header)

        if magic != MAGIC:
            raise ValueError(f"Not a PYDB file: {magic}")
        print(f"File: magic={magic}, version={version}, records={count}, rec_size={rec_size}")

        records = []
        for _ in range(count):
            raw = f.read(rec_size)
            rec_id, value = struct.unpack(RECORD_FMT, raw)
            records.append((rec_id, value))
    return records

# Demo
data = [(1001, 98.6), (1002, 37.2), (1003, 100.1), (1004, 36.9)]
path = Path("readings.pydb")
write_db(path, data)

print(f"File size: {path.stat().st_size} bytes (header {HEADER_SIZE} + {len(data)} x {RECORD_SIZE})")
records = read_db(path)
print("Records:", records)
path.unlink()

Output:

File size: 48 bytes (header 16 + 4 x 8)
File: magic=b'PYDB', version=1, records=4, rec_size=8
Records: [(1001, 98.5999984741211), (1002, 37.20000076293945), (1003, 100.0999984741211), (1004, 36.900001525878906)]

The file is exactly 48 bytes: 16 bytes of header plus 4 records of 8 bytes each. The magic bytes b'PYDB' let the reader verify it is the right format before processing. The slight float differences are expected from 32-bit float precision -- use d (double) instead of f (float) if you need full precision.

Real-Life Example: BMP Image Header Parser

The BMP (Windows Bitmap) image format has a well-documented binary header. Here is a parser that reads the header from any BMP file and reports its dimensions, color depth, and compression:

# bmp_header_parser.py
import struct
from pathlib import Path

# BMP File Header (14 bytes, little-endian)
FILE_HEADER_FMT = '<2sIHHI'
FILE_HEADER_SIZE = struct.calcsize(FILE_HEADER_FMT)

# DIB Header -- BITMAPINFOHEADER (40 bytes, little-endian)
DIB_HEADER_FMT = '<IiiHHIIiiII'
DIB_HEADER_SIZE = struct.calcsize(DIB_HEADER_FMT)

# BI_* compression codes from the BMP spec
COMPRESSION = {
    0: 'None (RGB)',
    1: 'RLE-8',
    2: 'RLE-4',
    3: 'Bitfields',
}

def parse_bmp(path: Path) -> dict:
    with open(path, 'rb') as f:
        raw_file_hdr = f.read(FILE_HEADER_SIZE)
        raw_dib_hdr = f.read(DIB_HEADER_SIZE)

    # Parse file header
    sig, file_size, reserved1, reserved2, data_offset = struct.unpack(FILE_HEADER_FMT, raw_file_hdr)
    if sig != b'BM':
        raise ValueError(f"Not a BMP file (signature: {sig})")

    # Parse DIB header
    (dib_size, width, height, planes, bpp, compression,
     img_size, x_ppm, y_ppm, colors_used, colors_important) = struct.unpack(DIB_HEADER_FMT, raw_dib_hdr)

    return {
        'file_size_bytes': file_size,
        'data_offset': data_offset,
        'width': abs(width),
        'height': abs(height),
        'bits_per_pixel': bpp,
        'compression': COMPRESSION.get(compression, f'Unknown ({compression})'),
        'image_size_bytes': img_size,
    }


# Create a minimal valid 2x2 BMP for demo
def create_tiny_bmp(path: Path):
    """Create a 2x2 24-bit BMP (white pixels)."""
    width, height, bpp = 2, 2, 24
    row_size = (width * 3 + 3) & ~3  # Rows padded to 4-byte boundaries
    img_data = b'\xff\xff\xff' * width + b'\x00' * (row_size - width * 3)
    img_data = img_data * height
    dib = struct.pack(DIB_HEADER_FMT, 40, width, -height, 1, bpp, 0, len(img_data), 2835, 2835, 0, 0)
    file_hdr = struct.pack(FILE_HEADER_FMT, b'BM', FILE_HEADER_SIZE + DIB_HEADER_SIZE + len(img_data), 0, 0, FILE_HEADER_SIZE + DIB_HEADER_SIZE)
    with open(path, 'wb') as f:
        f.write(file_hdr + dib + img_data)


bmp_path = Path("test.bmp")
create_tiny_bmp(bmp_path)
info = parse_bmp(bmp_path)

print("BMP Header Info:")
for key, value in info.items():
    print(f"  {key}: {value}")

bmp_path.unlink()

Output:

BMP Header Info:
  file_size_bytes: 70
  data_offset: 54
  width: 2
  height: 2
  bits_per_pixel: 24
  compression: None (RGB)
  image_size_bytes: 16

This parser works on any valid BMP file -- save a screenshot as .bmp and point it at that file to see the real dimensions and color depth. The format string mirrors the exact binary layout of the BMP spec, and struct.unpack() does all the byte-level decoding.

Frequently Asked Questions

When should I use struct vs other binary libraries?

Use struct for simple, fixed-layout binary formats where you know the exact byte positions. Use ctypes when interoperating directly with C shared libraries. Use construct (third-party) for complex binary formats with conditionals, enums, and nested structures. Use numpy when working with arrays of numeric data -- numpy dtype handling is faster for bulk array operations than looping with struct.

How do I know which byte order to use?

Check the format specification of whatever you are reading or writing. Network protocols (TCP, UDP, most internet standards) use big-endian ('>'). Most PC hardware (x86, ARM) is little-endian ('<'). File formats vary: BMP headers and WAV audio are little-endian, PNG is big-endian, and TIFF can be either (the file declares its own byte order). When in doubt, use an explicit '<' or '>' prefix rather than relying on native order.

What is struct alignment and padding?

By default ('@' byte order), struct adds padding between fields to align them on natural boundaries, just like a C compiler would. A 4-byte int after a 1-byte char gets 3 bytes of padding inserted. Use '=' (native order, no padding) or '<'/'>' (explicit endian, no padding) to disable this. Most binary file formats specify exact byte positions with no padding, so you almost always want '<' or '>'.
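The char-then-int case can be checked directly with calcsize() (the native-mode size may vary by platform; the value shown assumes a typical x86-64 build):

```python
import struct

print(struct.calcsize('<bi'))  # 5 -- explicit byte order, no padding
print(struct.calcsize('bi'))   # typically 8: 1 + 3 padding bytes + 4
```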

What does struct.error mean?

struct.error is raised when the data buffer is too short for the format (not enough bytes to unpack), when a value is out of range for the target type (e.g., 300 for a signed byte), or when the format string is invalid. Always check that len(data) == struct.calcsize(fmt) before calling unpack(), and catch struct.error when parsing untrusted binary data.
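A defensive wrapper along those lines (a sketch; the helper name is made up):

```python
import struct

def safe_unpack(fmt: str, data: bytes) -> tuple:
    """Unpack only if the buffer length matches the format exactly."""
    expected = struct.calcsize(fmt)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return struct.unpack(fmt, data)

try:
    safe_unpack('<IHf', b'\x00\x00\x00\x00')  # 4 bytes, format needs 10
except ValueError as exc:
    print(exc)  # expected 10 bytes, got 4
```

Raising your own ValueError with the expected and actual sizes gives a far more actionable message than the generic struct.error.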

How do I read partial structs or variable-length data?

For variable-length data, use a fixed-size header that contains a length field, then read that many bytes. Pattern: length = struct.unpack('>I', f.read(4))[0] reads a 4-byte length prefix, then data = f.read(length) reads the payload. This is the standard framing pattern used in virtually every binary protocol that includes variable-length fields.
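The same length-prefix framing as a small, self-contained sketch using an in-memory buffer:

```python
import io
import struct

def write_frame(f, payload: bytes) -> None:
    # 4-byte big-endian length prefix, then the payload itself
    f.write(struct.pack('>I', len(payload)))
    f.write(payload)

def read_frame(f) -> bytes:
    length = struct.unpack('>I', f.read(4))[0]
    return f.read(length)

buf = io.BytesIO()
write_frame(buf, b'hello')
write_frame(buf, b'binary world')
buf.seek(0)
print(read_frame(buf))  # b'hello'
print(read_frame(buf))  # b'binary world'
```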

Conclusion

Python's struct module bridges the gap between Python's high-level data types and the raw binary world. The key skills are writing format strings with the right type codes (I, H, f, s, etc.), choosing the correct byte order prefix ('<' for little-endian, '>' for big-endian), and using calcsize() to verify your layout before committing to it. For reading streams of same-format records, iter_unpack() is the efficient choice over a manual loop.

The BMP parser and sensor data examples in this article represent the two most common use cases: parsing an existing binary format and designing your own. Both follow the same pattern: define the format string, pack or unpack with a single call, and handle byte order explicitly. Extend the sensor example by adding a checksum field ('I' at the end, computed as the XOR of all bytes) for data integrity validation.
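A sketch of that checksum extension, using the XOR-of-bytes scheme suggested above on the sensor record layout:

```python
import struct
from functools import reduce

RECORD_FMT = '<IHff'            # timestamp, sensor_id, temperature, humidity
CHECKED_FMT = RECORD_FMT + 'I'  # same record plus a 4-byte checksum

def xor_checksum(payload: bytes) -> int:
    # XOR of every byte in the payload, stored as an unsigned int
    return reduce(lambda acc, b: acc ^ b, payload, 0)

def pack_checked(timestamp, sensor_id, temp, humidity) -> bytes:
    body = struct.pack(RECORD_FMT, timestamp, sensor_id, temp, humidity)
    return body + struct.pack('<I', xor_checksum(body))

def verify(record: bytes) -> bool:
    body, stored = record[:-4], struct.unpack('<I', record[-4:])[0]
    return xor_checksum(body) == stored

rec = pack_checked(1706000001, 1, 22.5, 65.3)
print(verify(rec))               # True: checksum matches
print(verify(b'\x00' + rec[1:])) # False: a flipped body byte fails the check
```

An XOR checksum catches single-byte corruption but not all multi-byte errors; for stronger guarantees, swap in zlib.crc32 with the same 'I' field.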

For the complete format string reference and edge cases, see the official struct documentation. The array module is also worth knowing for typed arrays of uniform numeric values.