
Most Python code deals with strings, JSON, and high-level data structures. But some problems take you closer to the metal: parsing a binary file format, reading sensor data over a serial port, implementing a network protocol, or working with legacy data from a C program. At that level, you are not dealing with JSON objects — you are dealing with raw bytes, and the layout of those bytes follows a precise binary format that Python’s JSON parser cannot touch. This is the domain of Python’s struct module.

The struct module lets you pack Python values into raw bytes and unpack raw bytes back into Python values, following an exact layout you specify with a format string. It handles integer sizes, floating-point precision, byte order (little-endian vs big-endian), and alignment — exactly the things you need to interoperate with compiled code, hardware, or binary file formats. No third-party installation required — struct is part of Python’s standard library.

In this tutorial, you’ll learn how struct format strings work, how to pack and unpack individual values and records, how byte order affects your data, how to work with fixed-size binary file headers, and how to handle variable-length data. By the end, you’ll be able to read and write binary data formats with confidence.

Python struct: Quick Example

Here is a minimal example that packs three Python values into 10 bytes and unpacks them back:

# struct_quick.py
import struct

# Pack: '<' = little-endian (no padding), 'I' = unsigned int (4 bytes),
# 'H' = unsigned short (2 bytes), 'f' = float (4 bytes)
fmt = '<IHf'
data = struct.pack(fmt, 1000, 42, 3.14)

print(f"Packed bytes ({len(data)} bytes): {data.hex()}")
print(f"Calculated size: {struct.calcsize(fmt)} bytes")

# Unpack: reverse the process
user_id, status, temperature = struct.unpack(fmt, data)
print(f"Unpacked: user_id={user_id}, status={status}, temperature={temperature}")

Output:

Packed bytes (10 bytes): e80300002a00c3f54840
Calculated size: 10 bytes
Unpacked: user_id=1000, status=42, temperature=3.1400001049041748

Three Python values become 10 bytes of binary data. struct.pack() encodes them, struct.unpack() decodes them. The format string '<IHf' describes the exact layout: little-endian, a 4-byte unsigned int, a 2-byte unsigned short, and a 4-byte float. The '<' prefix matters: without it, native alignment typically inserts two padding bytes before the float, making the struct 12 bytes instead of 10. The slight floating-point difference is normal — 32-bit IEEE 754 floats cannot represent 3.14 exactly.
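Whether padding will change your layout is easy to check up front with struct.calcsize(). A small sketch (the native-mode size shown assumes a typical x86-64 CPython build):

```python
import struct

# Explicit little-endian: fields are packed back-to-back, no padding
print(struct.calcsize('<IHf'))  # 10 bytes: 4 + 2 + 4

# Native mode (the default) aligns 'f' on a 4-byte boundary, so two
# padding bytes are typically inserted after the 2-byte 'H'
print(struct.calcsize('IHf'))   # usually 12 bytes on x86-64
```

Checking calcsize() against the size your format specification expects is a cheap way to catch alignment surprises before any data is written.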

Understanding Format Strings

The format string is the core of struct. Each character represents a C data type with a fixed byte size. A number prefix repeats a character (e.g., '3B' for three unsigned bytes); for 's', the number is instead the byte length of a single bytes field ('10s' is one 10-byte field, not ten 1-byte fields).

Format Char | C Type             | Python Type | Size
------------+--------------------+-------------+------------
b           | signed char        | int         | 1 byte
B           | unsigned char      | int         | 1 byte
h           | short              | int         | 2 bytes
H           | unsigned short     | int         | 2 bytes
i           | int                | int         | 4 bytes
I           | unsigned int       | int         | 4 bytes
q           | long long          | int         | 8 bytes
Q           | unsigned long long | int         | 8 bytes
f           | float              | float       | 4 bytes
d           | double             | float       | 8 bytes
s           | char[]             | bytes       | 1 byte each
?           | bool               | bool        | 1 byte
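A quick illustration of how count prefixes behave, since 's' is the odd one out:

```python
import struct

# For most codes, a count repeats the field: '3B' is three separate bytes
print(struct.unpack('<3B', b'\x01\x02\x03'))  # (1, 2, 3)

# For 's', the count is the length of ONE bytes field
print(struct.unpack('<3s', b'\x01\x02\x03'))  # (b'\x01\x02\x03',)
```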

The first character of the format string optionally specifies byte order. '<' means little-endian (x86, ARM, most modern hardware), '>' means big-endian (network byte order, some file formats), and '=' means native byte order without alignment padding.

# byte_order.py
import struct

value = 0x01020304  # = 16909060 decimal

little_endian = struct.pack('<I', value)
big_endian = struct.pack('>I', value)

print(f"Value: {value:#010x}")
print(f"Little-endian bytes: {little_endian.hex()}")  # Least significant byte first
print(f"Big-endian bytes:    {big_endian.hex()}")     # Most significant byte first

# Verify round-trip
le_val = struct.unpack('<I', little_endian)[0]
be_val = struct.unpack('>I', big_endian)[0]
print(f"LE decoded: {le_val:#010x}")
print(f"BE decoded: {be_val:#010x}")

Output:

Value: 0x01020304
Little-endian bytes: 04030201
Big-endian bytes:    01020304
LE decoded: 0x01020304
BE decoded: 0x01020304

In little-endian, the least significant byte comes first: 04, 03, 02, 01. In big-endian, the most significant byte comes first: 01, 02, 03, 04. Most PC hardware is little-endian. Network protocols (TCP/IP) use big-endian. When working with an unknown format, look up whether it specifies byte order — getting it wrong produces silently corrupted values.
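To see that silent corruption in action, here is a small sketch that deliberately unpacks with the wrong byte order:

```python
import struct

packed = struct.pack('<I', 0x01020304)  # written little-endian
wrong = struct.unpack('>I', packed)[0]  # read back as big-endian
print(hex(wrong))  # 0x4030201 -- byte-swapped value, no error raised
```

No exception is raised; the value is simply wrong, which is why the byte-order prefix deserves a comment in any format definition.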

Packing and Unpacking Records

The real power of struct is packing multiple related values into a compact binary record — and unpacking them back. This is how binary file headers, network packets, and sensor data frames are built.

Fixed-Size Records

When all records have the same format, use a single format string and loop over the data with iter_unpack() for efficiency:

# struct_records.py
import struct

# Format: timestamp (4B uint), sensor_id (2B uint), temperature (4B float), humidity (4B float)
RECORD_FMT = '<IHff'
RECORD_SIZE = struct.calcsize(RECORD_FMT)
print(f"Record size: {RECORD_SIZE} bytes")

readings = [
    (1706000001, 1, 22.5, 65.3),
    (1706000002, 2, 23.1, 63.8),
    (1706000003, 1, 22.8, 64.1),
]

# Write the records to a binary file
with open('sensor.bin', 'wb') as f:
    for rec in readings:
        f.write(struct.pack(RECORD_FMT, *rec))
print(f"Wrote {len(readings)} records ({len(readings) * RECORD_SIZE} bytes)")

# Read the raw bytes back
with open('sensor.bin', 'rb') as f:
    data = f.read()
print(f"Read {len(data)} bytes")

# iter_unpack yields one tuple per record
print("Parsed records:")
for timestamp, sensor_id, temp, humidity in struct.iter_unpack(RECORD_FMT, data):
    print(f"  t={timestamp}, sensor={sensor_id}, temp={temp:.1f}C, humidity={humidity:.1f}%")

Output:

Record size: 14 bytes
Wrote 3 records (42 bytes)
Read 42 bytes
Parsed records:
  t=1706000001, sensor=1, temp=22.5C, humidity=65.3%
  t=1706000002, sensor=2, temp=23.1C, humidity=63.8%
  t=1706000003, sensor=1, temp=22.8C, humidity=64.1%

struct.iter_unpack() is the efficient way to read a stream of same-format records. It yields tuples one at a time without buffering the whole dataset, which matters when reading large binary files.
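A related tool: struct.unpack_from() reads a single record at a given byte offset without slicing the buffer, which is handy for random access into a block of records. A sketch with made-up values:

```python
import struct

fmt = '<IHff'
size = struct.calcsize(fmt)
buf = struct.pack(fmt, 1, 2, 3.0, 4.0) + struct.pack(fmt, 5, 6, 7.0, 8.0)

# Read the second record directly at its offset -- no slice copy needed
second = struct.unpack_from(fmt, buf, offset=size)
print(second)  # (5, 6, 7.0, 8.0)
```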

Packing String Fields

Strings in struct are fixed-length byte fields. Use 'Ns' (e.g., '20s') for a string of exactly N bytes. Shorter strings must be padded; longer strings are truncated.

# struct_strings.py
import struct

NAME_LEN = 20
FMT = f'<{NAME_LEN}sI'  # 20-byte name string + 4-byte uint score

def pack_player(name: str, score: int) -> bytes:
    # Encode to bytes and truncate/pad to exactly NAME_LEN bytes
    name_bytes = name.encode('utf-8')[:NAME_LEN].ljust(NAME_LEN, b'\x00')
    return struct.pack(FMT, name_bytes, score)

def unpack_player(data: bytes) -> tuple:
    name_bytes, score = struct.unpack(FMT, data)
    name = name_bytes.rstrip(b'\x00').decode('utf-8')  # Strip null padding
    return name, score

players = [
    ("Alice", 9500),
    ("Bob", 7200),
    ("CharlieLongNameHere", 3400),
]

packed = b''.join(pack_player(n, s) for n, s in players)
print(f"Total size: {len(packed)} bytes ({len(players)} x {struct.calcsize(FMT)} bytes each)")

print("\nUnpacked players:")
record_size = struct.calcsize(FMT)
for i in range(0, len(packed), record_size):
    name, score = unpack_player(packed[i:i + record_size])
    print(f"  {name!r}: {score}")

Output:

Total size: 72 bytes (3 x 24 bytes each)

Unpacked players:
  'Alice': 9500
  'Bob': 7200
  'CharlieLongNameHere': 3400

The 19-byte name "CharlieLongNameHere" just fits in the 20-byte field, leaving a single null pad byte; anything longer would be silently truncated by [:NAME_LEN]. Null bytes are stripped on unpack with .rstrip(b'\x00'). Always check encoding: if names can contain non-ASCII characters, use 'utf-8' and note that some characters take 2-4 bytes each, which can change how many characters fit in the fixed field.
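One caveat when combining [:NAME_LEN] with UTF-8: a byte-level cut can land in the middle of a multi-byte character, leaving a field that strict decoding rejects. A sketch of the failure mode and one way to soften it (the sample name is made up):

```python
name = 'abcdefghijklmnopqrs' + 'ë'  # 19 ASCII bytes + one 2-byte character
raw = name.encode('utf-8')[:20]     # the cut splits the 2-byte 'ë'

# Strict decoding would raise UnicodeDecodeError on the dangling lead
# byte; errors='ignore' drops it instead
print(raw.decode('utf-8', errors='ignore'))  # abcdefghijklmnopqrs
```

If corrupted trailing characters are unacceptable, decode with errors='ignore' (or 'replace') on read, or truncate character-by-character on write so every stored field is valid UTF-8.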

Reading Binary File Headers

Many binary file formats start with a fixed-size header that describes the file's contents. Here is an example of writing and reading a custom binary file format with a header:

# binary_file_format.py
import struct
from pathlib import Path

# File format:
# Header (16 bytes):
#   - magic: 4 bytes (b'PYDB')
#   - version: 2 bytes (uint16)
#   - record_count: 4 bytes (uint32)
#   - record_size: 2 bytes (uint16)
#   - flags: 4 bytes (uint32)
# Records follow the header

HEADER_FMT = '>4sHIHI'  # Big-endian (common for file formats)
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAGIC = b'PYDB'

RECORD_FMT = '>If'      # Big-endian: int32 id + float32 value
RECORD_SIZE = struct.calcsize(RECORD_FMT)

def write_db(path: Path, records: list[tuple]):
    with open(path, 'wb') as f:
        # Write header
        header = struct.pack(
            HEADER_FMT,
            MAGIC,
            1,               # version
            len(records),    # record_count
            RECORD_SIZE,     # record_size
            0,               # flags (unused)
        )
        f.write(header)
        # Write records
        for rec_id, value in records:
            f.write(struct.pack(RECORD_FMT, rec_id, value))

def read_db(path: Path) -> list[tuple]:
    with open(path, 'rb') as f:
        raw_header = f.read(HEADER_SIZE)
        magic, version, count, rec_size, flags = struct.unpack(HEADER_FMT, raw_header)

        if magic != MAGIC:
            raise ValueError(f"Not a PYDB file: {magic}")
        print(f"File: magic={magic}, version={version}, records={count}, rec_size={rec_size}")

        records = []
        for _ in range(count):
            raw = f.read(rec_size)
            rec_id, value = struct.unpack(RECORD_FMT, raw)
            records.append((rec_id, value))
    return records

# Demo
data = [(1001, 98.6), (1002, 37.2), (1003, 100.1), (1004, 36.9)]
path = Path("readings.pydb")
write_db(path, data)

print(f"File size: {path.stat().st_size} bytes (header {HEADER_SIZE} + {len(data)} x {RECORD_SIZE})")
records = read_db(path)
print("Records:", records)
path.unlink()

Output:

File size: 48 bytes (header 16 + 4 x 8)
File: magic=b'PYDB', version=1, records=4, rec_size=8
Records: [(1001, 98.5999984741211), (1002, 37.20000076293945), (1003, 100.0999984741211), (1004, 36.900001525878906)]

The file is exactly 48 bytes: 16 bytes of header plus 4 records of 8 bytes each. The magic bytes b'PYDB' let the reader verify it is the right format before processing. The slight float differences are expected from 32-bit float precision -- use d (double) instead of f (float) if you need full precision.

Real-Life Example: BMP Image Header Parser

The BMP (Windows Bitmap) image format has a well-documented binary header. Here is a parser that reads the header from any BMP file and reports its dimensions, color depth, and compression:

# bmp_header_parser.py
import struct
from pathlib import Path

# BMP File Header (14 bytes, little-endian)
FILE_HEADER_FMT = '<2sIHHI'
FILE_HEADER_SIZE = struct.calcsize(FILE_HEADER_FMT)

# DIB Header -- BITMAPINFOHEADER (40 bytes, little-endian)
DIB_HEADER_FMT = '<IiiHHIIiiII'
DIB_HEADER_SIZE = struct.calcsize(DIB_HEADER_FMT)

# BI_* compression codes from the BMP spec
COMPRESSION = {
    0: 'None (RGB)',
    1: 'RLE-8',
    2: 'RLE-4',
    3: 'Bitfields',
}

def parse_bmp(path: Path) -> dict:
    with open(path, 'rb') as f:
        raw_file_hdr = f.read(FILE_HEADER_SIZE)
        raw_dib_hdr = f.read(DIB_HEADER_SIZE)

    # Parse file header
    sig, file_size, reserved1, reserved2, data_offset = struct.unpack(FILE_HEADER_FMT, raw_file_hdr)
    if sig != b'BM':
        raise ValueError(f"Not a BMP file (signature: {sig})")

    # Parse DIB header
    (dib_size, width, height, planes, bpp, compression,
     img_size, x_ppm, y_ppm, colors_used, colors_important) = struct.unpack(DIB_HEADER_FMT, raw_dib_hdr)

    return {
        'file_size_bytes': file_size,
        'data_offset': data_offset,
        'width': abs(width),
        'height': abs(height),
        'bits_per_pixel': bpp,
        'compression': COMPRESSION.get(compression, f'Unknown ({compression})'),
        'image_size_bytes': img_size,
    }


# Create a minimal valid 2x2 BMP for demo
def create_tiny_bmp(path: Path):
    """Create a 2x2 24-bit BMP (white pixels)."""
    width, height, bpp = 2, 2, 24
    row_size = (width * 3 + 3) & ~3  # Rows padded to 4-byte boundaries
    img_data = b'\xff\xff\xff' * width + b'\x00' * (row_size - width * 3)
    img_data = img_data * height
    dib = struct.pack(DIB_HEADER_FMT, 40, width, -height, 1, bpp, 0, len(img_data), 2835, 2835, 0, 0)
    file_hdr = struct.pack(FILE_HEADER_FMT, b'BM', FILE_HEADER_SIZE + DIB_HEADER_SIZE + len(img_data), 0, 0, FILE_HEADER_SIZE + DIB_HEADER_SIZE)
    with open(path, 'wb') as f:
        f.write(file_hdr + dib + img_data)


bmp_path = Path("test.bmp")
create_tiny_bmp(bmp_path)
info = parse_bmp(bmp_path)

print("BMP Header Info:")
for key, value in info.items():
    print(f"  {key}: {value}")

bmp_path.unlink()

Output:

BMP Header Info:
  file_size_bytes: 70
  data_offset: 54
  width: 2
  height: 2
  bits_per_pixel: 24
  compression: None (RGB)
  image_size_bytes: 16

This parser works on any valid BMP file -- save a screenshot as .bmp and point it at that file to see the real dimensions and color depth. The format string mirrors the exact binary layout of the BMP spec, and struct.unpack() does all the byte-level decoding.

Frequently Asked Questions

When should I use struct vs other binary libraries?

Use struct for simple, fixed-layout binary formats where you know the exact byte positions. Use ctypes when interoperating directly with C shared libraries. Use construct (third-party) for complex binary formats with conditionals, enums, and nested structures. Use numpy when working with arrays of numeric data -- numpy dtype handling is faster for bulk array operations than looping with struct.

How do I know which byte order to use?

Check the format specification of whatever you are reading or writing. Network protocols (TCP, UDP, most internet standards) use big-endian ('>'). Most PC hardware (x86, ARM) is little-endian ('<'). File formats vary: BMP headers and WAV audio are little-endian, PNG is big-endian, and TIFF can be either (the file declares its own byte order). When in doubt, use an explicit '<' or '>' prefix rather than relying on native order.

What is struct alignment and padding?

By default ('@' byte order), struct adds padding between fields to align them on natural boundaries, just like a C compiler would. A 4-byte int after a 1-byte char gets 3 bytes of padding inserted. Use '=' (native order, no padding) or '<'/'>' (explicit endian, no padding) to disable this. Most binary file formats specify exact byte positions with no padding, so you almost always want '<' or '>'.
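The char-then-int case can be checked directly with calcsize() (the native-mode size may vary by platform; the value shown assumes a typical x86-64 build):

```python
import struct

print(struct.calcsize('<bi'))  # 5 -- explicit byte order, no padding
print(struct.calcsize('bi'))   # typically 8: 1 + 3 padding bytes + 4
```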

What does struct.error mean?

struct.error is raised when the data buffer is too short for the format (not enough bytes to unpack), when a value is out of range for the target type (e.g., 300 for a signed byte), or when the format string is invalid. Always check that len(data) == struct.calcsize(fmt) before calling unpack(), and catch struct.error when parsing untrusted binary data.
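A defensive wrapper along those lines (a sketch; the helper name is made up):

```python
import struct

def safe_unpack(fmt: str, data: bytes) -> tuple:
    """Unpack only if the buffer length matches the format exactly."""
    expected = struct.calcsize(fmt)
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    return struct.unpack(fmt, data)

try:
    safe_unpack('<IHf', b'\x00\x00\x00\x00')  # 4 bytes, format needs 10
except ValueError as exc:
    print(exc)  # expected 10 bytes, got 4
```

Raising your own ValueError with the expected and actual sizes gives a far more actionable message than the generic struct.error.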

How do I read partial structs or variable-length data?

For variable-length data, use a fixed-size header that contains a length field, then read that many bytes. Pattern: length = struct.unpack('>I', f.read(4))[0] reads a 4-byte length prefix, then data = f.read(length) reads the payload. This is the standard framing pattern used in virtually every binary protocol that includes variable-length fields.
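The same length-prefix framing as a small, self-contained sketch using an in-memory buffer:

```python
import io
import struct

def write_frame(f, payload: bytes) -> None:
    # 4-byte big-endian length prefix, then the payload itself
    f.write(struct.pack('>I', len(payload)))
    f.write(payload)

def read_frame(f) -> bytes:
    length = struct.unpack('>I', f.read(4))[0]
    return f.read(length)

buf = io.BytesIO()
write_frame(buf, b'hello')
write_frame(buf, b'binary world')
buf.seek(0)
print(read_frame(buf))  # b'hello'
print(read_frame(buf))  # b'binary world'
```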

Conclusion

Python's struct module bridges the gap between Python's high-level data types and the raw binary world. The key skills are writing format strings with the right type codes (I, H, f, s, etc.), choosing the correct byte order prefix ('<' for little-endian, '>' for big-endian), and using calcsize() to verify your layout before committing to it. For reading streams of same-format records, iter_unpack() is the efficient choice over a manual loop.

The BMP parser and sensor data examples in this article represent the two most common use cases: parsing an existing binary format and designing your own. Both follow the same pattern: define the format string, pack or unpack with a single call, and handle byte order explicitly. Extend the sensor example by adding a checksum field ('I' at the end, computed as the XOR of all bytes) for data integrity validation.
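A sketch of that checksum extension, using the XOR-of-bytes scheme suggested above on the sensor record layout:

```python
import struct
from functools import reduce

RECORD_FMT = '<IHff'            # timestamp, sensor_id, temperature, humidity
CHECKED_FMT = RECORD_FMT + 'I'  # same record plus a 4-byte checksum

def xor_checksum(payload: bytes) -> int:
    # XOR of every byte in the payload, stored as an unsigned int
    return reduce(lambda acc, b: acc ^ b, payload, 0)

def pack_checked(timestamp, sensor_id, temp, humidity) -> bytes:
    body = struct.pack(RECORD_FMT, timestamp, sensor_id, temp, humidity)
    return body + struct.pack('<I', xor_checksum(body))

def verify(record: bytes) -> bool:
    body, stored = record[:-4], struct.unpack('<I', record[-4:])[0]
    return xor_checksum(body) == stored

rec = pack_checked(1706000001, 1, 22.5, 65.3)
print(verify(rec))               # True: checksum matches
print(verify(b'\x00' + rec[1:])) # False: a flipped body byte fails the check
```

An XOR checksum catches single-byte corruption but not all multi-byte errors; for stronger guarantees, swap in zlib.crc32 with the same 'I' field.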

For the complete format string reference and edge cases, see the official struct documentation. The array module is also worth knowing for typed arrays of uniform numeric values.