Your microservices are talking JSON. It works, but have you measured it? JSON serialization converts every integer to a string of digit characters, wraps every key in double quotes, and adds field separators across every nested object. A payload that could be 40 bytes of binary data becomes 120 bytes of ASCII text — and at scale that extra weight shows up in latency, bandwidth bills, and CPU time spent encoding and decoding.
MessagePack (msgpack) is a binary serialization format that fixes this. It encodes the same dict, list, str, int, float, bool, and None types that JSON handles, plus raw bytes (which JSON can only carry as base64 text), and stores them all in compact binary form. The Python msgpack library has no third-party dependencies — the optional C extension that provides its speed ships inside the wheel — making it easy to add to any project.
This article covers installation, basic packing and unpacking, working with bytes and strings, streaming large data, customizing encoding for non-standard types, and a real-world inter-service messaging example. By the end you will be able to replace JSON in any serialization-heavy Python workflow and measure the difference.
Packing Python Data: Quick Example
The core msgpack API has two functions: packb() serializes a Python object to bytes, and unpackb() deserializes bytes back to Python. Here is the simplest possible usage:
# quick_msgpack.py
import msgpack
data = {'user': 'alice', 'score': 9850, 'active': True, 'tags': ['python', 'async']}
# Serialize to bytes
packed = msgpack.packb(data, use_bin_type=True)
print(f'Packed type: {type(packed)}')
print(f'Packed size: {len(packed)} bytes')
# Deserialize back to Python
unpacked = msgpack.unpackb(packed, raw=False)
print(f'Unpacked: {unpacked}')
import json
json_bytes = json.dumps(data).encode()
print(f'JSON size: {len(json_bytes)} bytes')
print(f'Size reduction: {1 - len(packed)/len(json_bytes):.0%}')
Output:
Packed type: <class 'bytes'>
Packed size: 48 bytes
Unpacked: {'user': 'alice', 'score': 9850, 'active': True, 'tags': ['python', 'async']}
JSON size: 77 bytes
Size reduction: 38%
The two keyword arguments you almost always want are use_bin_type=True on pack (encode Python str as MessagePack str rather than the legacy raw type) and raw=False on unpack (decode MessagePack str back to Python str instead of bytes). They have been the defaults since msgpack 1.0, but passing them explicitly documents intent and protects you on older installs — without them on pre-1.0 versions you end up with unexpected bytes keys in your dicts. The sections below cover the other important options.
What Is MessagePack and How Does It Work?
MessagePack is a binary serialization standard — a compact encoding where each value is prefixed with a type tag and length. An integer like 255 takes 2 bytes (tag + value). A short string like "alice" takes 6 bytes (1 tag byte with the length folded in, 5 payload bytes), while JSON needs 7 bytes for the same value: 5 characters plus the 2 surrounding quotes.
| Type | JSON encoding | msgpack encoding |
|---|---|---|
| Integer 42 | 42 — 2 bytes | \x2a — 1 byte |
| Boolean True | true — 4 bytes | \xc3 — 1 byte |
| String “hello” | "hello" — 7 bytes | \xa5hello — 6 bytes |
| None / null | null — 4 bytes | \xc0 — 1 byte |
| Raw bytes | base64 string (33% larger) | native bin type |
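You can confirm these encodings yourself by packing a few values and inspecting the raw bytes. The snippet below is a minimal illustration; the hex values follow directly from the MessagePack spec.
# inspect_bytes.py
import msgpack
# Each packed value is a type tag followed by the payload (if any)
for value in [42, 255, True, None, 'hello']:
    packed = msgpack.packb(value, use_bin_type=True)
    print(f'{value!r}: {packed.hex()}')
Output:
42: 2a
255: ccff
True: c3
None: c0
'hello': a568656c6c6f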
Savings are modest for single values but compound across large nested structures with many keys and short values. The bigger win is often speed: msgpack skips the number-to-text conversion that JSON encoders perform on every numeric value, and the text-to-number parsing that JSON decoders must do on the way back.
Packing and Unpacking Options
String vs Bytes: use_bin_type and raw
The most common source of msgpack confusion is str/bytes handling. The original MessagePack spec had a single raw type covering both text and binary data; the current spec distinguishes str (text) from bin (binary). You want use_bin_type=True plus raw=False to get Python 3 semantics — strings stay strings, bytes stay bytes.
# str_bytes_demo.py
import msgpack
payload = {'name': 'bob', 'avatar': b'\x89PNG\r\n\x1a\n'}
# Correct settings for Python 3
packed = msgpack.packb(payload, use_bin_type=True)
result = msgpack.unpackb(packed, raw=False)
print('name type:', type(result['name']), result['name'])
print('avatar type:', type(result['avatar']), result['avatar'][:4])
# Legacy mode (the default before msgpack 1.0): everything comes back as bytes
packed_legacy = msgpack.packb(payload, use_bin_type=False)
result_legacy = msgpack.unpackb(packed_legacy, raw=True)
print('\nLegacy mode name type:', type(result_legacy[b'name']))
Output:
name type: <class 'str'> bob
avatar type: <class 'bytes'> b'\x89PNG'
Legacy mode name type: <class 'bytes'>
In legacy mode (the default in msgpack versions before 1.0) unpackb() returns all string-like data as bytes. That means your dict keys come back as b'name' instead of 'name' — note the b'name' lookup above — breaking any code that expects string keys. Modern versions default to the explicit mode, but state it in new code so behavior does not depend on the installed version.
Strict Map Keys
By default, unpackb() only accepts dict keys that are strings or bytes (strict_map_key=True). If you have integer keys (common in lookup tables), pass strict_map_key=False to unpackb():
# int_keys.py
import msgpack
data = {1: 'one', 2: 'two', 100: 'hundred'}
packed = msgpack.packb(data, use_bin_type=True)
result = msgpack.unpackb(packed, raw=False, strict_map_key=False)
print(result)
Output:
{1: 'one', 2: 'two', 100: 'hundred'}
Streaming and Large Payloads
When serializing large data that will be transmitted over a socket or written to a file, the Packer and Unpacker objects give you control over chunked encoding and incremental decoding. This avoids loading the whole payload into memory at once.
# streaming.py
import msgpack
import io
# Streaming pack: write many objects to a buffer
buf = io.BytesIO()
packer = msgpack.Packer(use_bin_type=True)
records = [
{'id': 1, 'event': 'login', 'user': 'alice'},
{'id': 2, 'event': 'click', 'user': 'bob'},
{'id': 3, 'event': 'logout', 'user': 'alice'},
]
for record in records:
buf.write(packer.pack(record))
# Streaming unpack: read back incrementally
buf.seek(0)
unpacker = msgpack.Unpacker(raw=False)
unpacker.feed(buf.read())
for obj in unpacker:
print(obj)
Output:
{'id': 1, 'event': 'login', 'user': 'alice'}
{'id': 2, 'event': 'click', 'user': 'bob'}
{'id': 3, 'event': 'logout', 'user': 'alice'}
The Packer and Unpacker objects are designed for streaming use cases: writing to a socket in chunks, reading from a Kafka stream, or processing a large binary file one record at a time. The Unpacker.feed() method accepts arbitrary byte chunks — it will reconstruct complete objects even if the data arrives in fragments.
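Here is a small sketch of that fragment handling: the packed records are fed to an Unpacker in 7-byte slices (an arbitrary size chosen for the demo), and complete dicts still come out in order.
# chunked_feed.py
import msgpack
packer = msgpack.Packer(use_bin_type=True)
stream = b''.join(packer.pack({'id': i, 'event': 'ping'}) for i in range(3))
unpacker = msgpack.Unpacker(raw=False)
# Feed the stream in 7-byte fragments, as if it arrived piecemeal from a socket
for start in range(0, len(stream), 7):
    unpacker.feed(stream[start:start + 7])
    for obj in unpacker:  # yields only the objects that are complete so far
        print(obj)
Output:
{'id': 0, 'event': 'ping'}
{'id': 1, 'event': 'ping'}
{'id': 2, 'event': 'ping'}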
Custom Type Encoding
MessagePack natively handles dict, list, str, bytes, int, float, bool, and None. For custom objects like datetime, you need to provide serializer and deserializer hooks via default and object_hook.
# custom_types.py
import msgpack
from datetime import datetime
def encode_datetime(obj):
if isinstance(obj, datetime):
return {'__datetime__': True, 'iso': obj.isoformat()}
raise TypeError(f'Unknown type: {type(obj)}')
def decode_datetime(obj):
if obj.get('__datetime__'):
return datetime.fromisoformat(obj['iso'])
return obj
event = {
'event': 'user_signup',
'timestamp': datetime(2026, 5, 7, 9, 30, 0),
'user_id': 42,
}
packed = msgpack.packb(event, default=encode_datetime, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_datetime, raw=False)
print('event:', result['event'])
print('timestamp:', result['timestamp'], type(result['timestamp']))
Output:
event: user_signup
timestamp: 2026-05-07 09:30:00 <class 'datetime.datetime'>
The default function receives any object that packb cannot handle natively and must return something msgpack can encode — typically a dict with a type tag. The object_hook is called on every decoded dict, so check your type tag before transforming. This pattern works for any custom class: Decimal, UUID, NumPy arrays, Pydantic models, and so on.
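As a sketch of how the same hooks extend to other types, here is a hypothetical encoder covering Decimal and UUID; the __decimal__ and __uuid__ tag names are arbitrary choices for this example, not part of msgpack.
# multi_type_hooks.py
import msgpack
from decimal import Decimal
from uuid import UUID, uuid4
def encode_custom(obj):
    if isinstance(obj, Decimal):
        return {'__decimal__': str(obj)}
    if isinstance(obj, UUID):
        return {'__uuid__': obj.hex}
    raise TypeError(f'Cannot encode {type(obj)}')
def decode_custom(obj):
    if '__decimal__' in obj:
        return Decimal(obj['__decimal__'])
    if '__uuid__' in obj:
        return UUID(hex=obj['__uuid__'])
    return obj
order = {'order_id': uuid4(), 'total': Decimal('19.99')}
packed = msgpack.packb(order, default=encode_custom, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_custom, raw=False)
print(type(result['order_id']), type(result['total']), result['total'])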
Real-Life Example: Inter-Service Message Queue
The script below simulates a producer publishing events and a consumer reading them, both using msgpack for serialization. It is the same pattern you would use with Redis Streams, Kafka, or any other broker that carries opaque byte payloads.
# message_queue_demo.py
import msgpack
import queue
import threading
import time
from datetime import datetime
# Shared in-memory queue (stands in for Redis/Kafka in this demo)
message_bus = queue.Queue()
def encode_event(obj):
if isinstance(obj, datetime):
return {'__dt__': obj.isoformat()}
raise TypeError(f'Cannot encode {type(obj)}')
def decode_event(obj):
if '__dt__' in obj:
return datetime.fromisoformat(obj['__dt__'])
return obj
def producer(n_events):
events = [
{'type': 'page_view', 'url': '/home', 'user': 1},
        {'type': 'add_to_cart', 'sku': 'WIDGET-42', 'user': 1},
        {'type': 'page_view', 'url': '/checkout', 'user': 2},
]
for i in range(n_events):
event = dict(events[i % len(events)])
event['timestamp'] = datetime.now()
event['seq'] = i
packed = msgpack.packb(event, default=encode_event, use_bin_type=True)
message_bus.put(packed)
message_bus.put(None) # Sentinel to stop consumer
def consumer():
received = 0
total_bytes = 0
while True:
raw = message_bus.get()
if raw is None:
break
event = msgpack.unpackb(raw, object_hook=decode_event, raw=False)
received += 1
total_bytes += len(raw)
print(f'Consumer: {received} events, {total_bytes} bytes total')
print(f'Average bytes/event: {total_bytes/received:.1f}')
prod_thread = threading.Thread(target=producer, args=(9,))
cons_thread = threading.Thread(target=consumer)
cons_thread.start()
prod_thread.start()
prod_thread.join()
cons_thread.join()
Output:
Consumer: 9 events, 768 bytes total
Average bytes/event: 85.3
The same 9 events encoded as JSON (with the timestamps as plain ISO strings) come out around 100-110 bytes each, so msgpack trims roughly 20% here — most of each payload is the 26-character ISO timestamp, and the relative saving grows as keys and structure make up a larger share of the message. The datetime hook keeps the types round-tripping cleanly. In a real system, replace queue.Queue with a Redis or Kafka client: the packb/unpackb calls stay identical.
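For illustration, swapping the in-memory queue for Redis might look like the sketch below. It assumes the redis-py client and a Redis list named 'events' (both arbitrary choices for this example); the msgpack calls are unchanged.
# redis_bus_sketch.py
import msgpack
import redis  # assumes the redis-py client is installed
r = redis.Redis(host='localhost', port=6379)
def publish(event, encode_hook):
    packed = msgpack.packb(event, default=encode_hook, use_bin_type=True)
    r.rpush('events', packed)  # push the raw bytes onto a Redis list
def consume(decode_hook):
    while True:
        _key, raw = r.blpop('events')  # blocking pop returns a (key, value) tuple of bytes
        yield msgpack.unpackb(raw, object_hook=decode_hook, raw=False)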
Frequently Asked Questions
When should I use msgpack instead of JSON?
Use msgpack when you control both the serializer and deserializer (internal services, message queues, caches), you care about payload size or serialization speed, or you need to encode binary data without base64 overhead. Stick with JSON when the data will be read by humans, logged to text files, or consumed by third-party services that expect JSON.
How does msgpack compare to pickle?
Pickle handles arbitrary Python objects but is Python-only and carries a remote code execution risk if you unpickle untrusted data. msgpack is language-agnostic (clients exist in Go, Rust, Java, JavaScript, etc.) and does not execute code during deserialization, which makes it far safer for untrusted input — though you should still cap message sizes to guard against resource exhaustion. Use msgpack for cross-language or cross-service communication; use pickle only for trusted Python-to-Python data that benefits from full object serialization.
Does msgpack require a C compiler?
No. pip install msgpack downloads a pre-compiled wheel on all major platforms (Windows, macOS, Linux). The C extension is bundled in the wheel. If no matching wheel exists for your platform, it falls back to a pure-Python implementation that is slower but fully functional.
Does msgpack enforce a schema?
No. Like JSON, msgpack is schema-less by default — you can pack any Python dict structure without declaring types upfront. If you need schema validation, combine msgpack with a library like msgspec (which has its own built-in msgpack encoder) or validate with Pydantic after unpacking.
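As a minimal sketch of the validate-after-unpacking approach, assuming Pydantic is installed and using a made-up Event model:
# validate_after_unpack.py
import msgpack
from pydantic import BaseModel
class Event(BaseModel):
    type: str
    user: int
packed = msgpack.packb({'type': 'page_view', 'user': 1}, use_bin_type=True)
event = Event(**msgpack.unpackb(packed, raw=False))  # raises if fields are missing or mistyped
print(event)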
Can msgpack handle very large integers?
Yes, up to 64 bits. msgpack natively supports signed and unsigned 64-bit integers, from -9,223,372,036,854,775,808 up to 18,446,744,073,709,551,615. For Python's arbitrary-precision integers beyond that range, you need a custom encoder that converts them to strings or bytes, similar to the datetime example above.
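A minimal sketch of such an encoder, assuming packb falls back to the default hook when an integer overflows the 64-bit range (recent msgpack versions behave this way); the __bigint__ tag is an arbitrary choice:
# bigint_hooks.py
import msgpack
def encode_bigint(obj):
    if isinstance(obj, int):
        # Only out-of-range ints reach this hook; store them as decimal strings
        return {'__bigint__': str(obj)}
    raise TypeError(f'Cannot encode {type(obj)}')
def decode_bigint(obj):
    if '__bigint__' in obj:
        return int(obj['__bigint__'])
    return obj
huge = 2**80
packed = msgpack.packb({'value': huge}, default=encode_bigint, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_bigint, raw=False)
print(result['value'] == huge)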
Conclusion
You have covered the full msgpack toolkit: packb() and unpackb() with the correct use_bin_type/raw settings, streaming with Packer/Unpacker, integer-keyed dicts, and custom type encoding for datetime and similar objects. The message queue demo shows the pattern in a realistic producer/consumer scenario.
The next step is replacing JSON in a high-throughput code path and benchmarking the difference with timeit or pytest-benchmark. You may also want to look at msgspec, which builds on the MessagePack format and adds compile-time schema validation with Pydantic-like model classes.
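As a starting point, a minimal timeit comparison might look like the sketch below — the payload is a made-up example, and the absolute numbers will vary with your machine, payload shape, and library versions.
# bench_encode.py
import json
import timeit
import msgpack
# A made-up payload; substitute a message representative of your workload
payload = {'user_id': 12345, 'scores': list(range(100)), 'name': 'alice', 'active': True}
n = 10_000
json_time = timeit.timeit(lambda: json.dumps(payload).encode(), number=n)
msgpack_time = timeit.timeit(lambda: msgpack.packb(payload, use_bin_type=True), number=n)
print(f'json.dumps:    {json_time:.3f}s for {n} iterations')
print(f'msgpack.packb: {msgpack_time:.3f}s for {n} iterations')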
Official documentation: msgpack.org and the Python library on GitHub.