Your microservices are talking JSON. It works, but have you measured it? JSON serialization converts every integer to a string of digit characters, wraps every key in double quotes, and adds field separators across every nested object. A payload that could be 40 bytes of binary data becomes 120 bytes of ASCII text — and at scale that extra weight shows up in latency, bandwidth bills, and CPU time spent encoding and decoding.
MessagePack (msgpack) is a binary serialization format that fixes this. It encodes the same dict, list, str, int, float, bool, and None types that JSON handles, plus raw bytes (which JSON can only carry as base64 text), and stores them all in compact binary form. The Python msgpack library has no third-party dependencies — the optional C extension that provides its speed ships inside the wheel — making it easy to add to any project.
This article covers installation, basic packing and unpacking, working with bytes and strings, streaming large data, customizing encoding for non-standard types, and a real-world inter-service messaging example. By the end you will be able to replace JSON in any serialization-heavy Python workflow and measure the difference.
Packing Python Data: Quick Example
The core msgpack API has two functions: packb() serializes a Python object to bytes, and unpackb() deserializes bytes back to Python. Here is the simplest possible usage:
# quick_msgpack.py
import msgpack
data = {'user': 'alice', 'score': 9850, 'active': True, 'tags': ['python', 'async']}
# Serialize to bytes
packed = msgpack.packb(data, use_bin_type=True)
print(f'Packed type: {type(packed)}')
print(f'Packed size: {len(packed)} bytes')
# Deserialize back to Python
unpacked = msgpack.unpackb(packed, raw=False)
print(f'Unpacked: {unpacked}')
import json
json_bytes = json.dumps(data).encode()
print(f'JSON size: {len(json_bytes)} bytes')
print(f'Size reduction: {1 - len(packed)/len(json_bytes):.0%}')
Output:
Packed type: <class 'bytes'>
Packed size: 48 bytes
Unpacked: {'user': 'alice', 'score': 9850, 'active': True, 'tags': ['python', 'async']}
JSON size: 77 bytes
Size reduction: 38%
The two keyword arguments you almost always want are use_bin_type=True on pack (encode Python str as MessagePack str rather than the legacy raw type) and raw=False on unpack (decode MessagePack str back to Python str instead of bytes). They have been the defaults since msgpack 1.0, but passing them explicitly documents intent and protects you on older installs — without them on pre-1.0 versions you end up with unexpected bytes keys in your dicts. The sections below cover the other important options.
What Is MessagePack and How Does It Work?
MessagePack is a binary serialization standard — a compact encoding where each value is prefixed with a type tag and length. An integer like 255 takes 2 bytes (tag + value). A short string like "alice" takes 6 bytes (1 tag byte with the length folded in, 5 payload bytes), while JSON needs 7 bytes for the same value: 5 characters plus the 2 surrounding quotes.
| Type | JSON encoding | msgpack encoding |
|---|---|---|
| Integer 42 | 42 — 2 bytes | \x2a — 1 byte |
| Boolean True | true — 4 bytes | \xc3 — 1 byte |
| String “hello” | "hello" — 7 bytes | \xa5hello — 6 bytes |
| None / null | null — 4 bytes | \xc0 — 1 byte |
| Raw bytes | base64 string (33% larger) | native bin type |
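You can confirm these encodings yourself by packing a few values and inspecting the raw bytes. The snippet below is a minimal illustration; the hex values follow directly from the MessagePack spec.
# inspect_bytes.py
import msgpack
# Each packed value is a type tag followed by the payload (if any)
for value in [42, 255, True, None, 'hello']:
    packed = msgpack.packb(value, use_bin_type=True)
    print(f'{value!r}: {packed.hex()}')
Output:
42: 2a
255: ccff
True: c3
None: c0
'hello': a568656c6c6f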
Savings are modest for single values but compound across large nested structures with many keys and short values. The bigger win is often speed: msgpack skips the number-to-text conversion that JSON encoders perform on every numeric value, and the text-to-number parsing that JSON decoders must do on the way back.
Packing and Unpacking Options
String vs Bytes: use_bin_type and raw
The most common source of msgpack confusion is str/bytes handling. The original MessagePack spec had a single raw type covering both text and binary data; the current spec distinguishes str (text) from bin (binary). You want use_bin_type=True plus raw=False to get Python 3 semantics — strings stay strings, bytes stay bytes.
# str_bytes_demo.py
import msgpack
payload = {'name': 'bob', 'avatar': b'\x89PNG\r\n\x1a\n'}
# Correct settings for Python 3
packed = msgpack.packb(payload, use_bin_type=True)
result = msgpack.unpackb(packed, raw=False)
print('name type:', type(result['name']), result['name'])
print('avatar type:', type(result['avatar']), result['avatar'][:4])
# Legacy mode (the default before msgpack 1.0): everything comes back as bytes
packed_legacy = msgpack.packb(payload, use_bin_type=False)
result_legacy = msgpack.unpackb(packed_legacy, raw=True)
print('\nLegacy mode name type:', type(result_legacy[b'name']))
Output:
name type: <class 'str'> bob
avatar type: <class 'bytes'> b'\x89PNG'
Legacy mode name type: <class 'bytes'>
In legacy mode (the default in msgpack versions before 1.0) unpackb() returns all string-like data as bytes. That means your dict keys come back as b'name' instead of 'name' — note the b'name' lookup above — breaking any code that expects string keys. Modern versions default to the explicit mode, but state it in new code so behavior does not depend on the installed version.
Strict Map Keys
By default, unpackb() only accepts dict keys that are strings or bytes (strict_map_key=True). If you have integer keys (common in lookup tables), pass strict_map_key=False to unpackb():
# int_keys.py
import msgpack
data = {1: 'one', 2: 'two', 100: 'hundred'}
packed = msgpack.packb(data, use_bin_type=True)
result = msgpack.unpackb(packed, raw=False, strict_map_key=False)
print(result)
Output:
{1: 'one', 2: 'two', 100: 'hundred'}
Streaming and Large Payloads
When serializing large data that will be transmitted over a socket or written to a file, the Packer and Unpacker objects give you control over chunked encoding and incremental decoding. This avoids loading the whole payload into memory at once.
# streaming.py
import msgpack
import io
# Streaming pack: write many objects to a buffer
buf = io.BytesIO()
packer = msgpack.Packer(use_bin_type=True)
records = [
{'id': 1, 'event': 'login', 'user': 'alice'},
{'id': 2, 'event': 'click', 'user': 'bob'},
{'id': 3, 'event': 'logout', 'user': 'alice'},
]
for record in records:
buf.write(packer.pack(record))
# Streaming unpack: read back incrementally
buf.seek(0)
unpacker = msgpack.Unpacker(raw=False)
unpacker.feed(buf.read())
for obj in unpacker:
print(obj)
Output:
{'id': 1, 'event': 'login', 'user': 'alice'}
{'id': 2, 'event': 'click', 'user': 'bob'}
{'id': 3, 'event': 'logout', 'user': 'alice'}
The Packer and Unpacker objects are designed for streaming use cases: writing to a socket in chunks, reading from a Kafka stream, or processing a large binary file one record at a time. The Unpacker.feed() method accepts arbitrary byte chunks — it will reconstruct complete objects even if the data arrives in fragments.
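Here is a small sketch of that fragment handling: the packed records are fed to an Unpacker in 7-byte slices (an arbitrary size chosen for the demo), and complete dicts still come out in order.
# chunked_feed.py
import msgpack
packer = msgpack.Packer(use_bin_type=True)
stream = b''.join(packer.pack({'id': i, 'event': 'ping'}) for i in range(3))
unpacker = msgpack.Unpacker(raw=False)
# Feed the stream in 7-byte fragments, as if it arrived piecemeal from a socket
for start in range(0, len(stream), 7):
    unpacker.feed(stream[start:start + 7])
    for obj in unpacker:  # yields only the objects that are complete so far
        print(obj)
Output:
{'id': 0, 'event': 'ping'}
{'id': 1, 'event': 'ping'}
{'id': 2, 'event': 'ping'}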
Custom Type Encoding
MessagePack natively handles dict, list, str, bytes, int, float, bool, and None. For custom objects like datetime, you need to provide serializer and deserializer hooks via default and object_hook.
# custom_types.py
import msgpack
from datetime import datetime
def encode_datetime(obj):
if isinstance(obj, datetime):
return {'__datetime__': True, 'iso': obj.isoformat()}
raise TypeError(f'Unknown type: {type(obj)}')
def decode_datetime(obj):
if obj.get('__datetime__'):
return datetime.fromisoformat(obj['iso'])
return obj
event = {
'event': 'user_signup',
'timestamp': datetime(2026, 5, 7, 9, 30, 0),
'user_id': 42,
}
packed = msgpack.packb(event, default=encode_datetime, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_datetime, raw=False)
print('event:', result['event'])
print('timestamp:', result['timestamp'], type(result['timestamp']))
Output:
event: user_signup
timestamp: 2026-05-07 09:30:00 <class 'datetime.datetime'>
The default function receives any object that packb cannot handle natively and must return something msgpack can encode — typically a dict with a type tag. The object_hook is called on every decoded dict, so check your type tag before transforming. This pattern works for any custom class: Decimal, UUID, NumPy arrays, Pydantic models, and so on.
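As a sketch of how the same hooks extend to other types, here is a hypothetical encoder covering Decimal and UUID; the __decimal__ and __uuid__ tag names are arbitrary choices for this example, not part of msgpack.
# multi_type_hooks.py
import msgpack
from decimal import Decimal
from uuid import UUID, uuid4
def encode_custom(obj):
    if isinstance(obj, Decimal):
        return {'__decimal__': str(obj)}
    if isinstance(obj, UUID):
        return {'__uuid__': obj.hex}
    raise TypeError(f'Cannot encode {type(obj)}')
def decode_custom(obj):
    if '__decimal__' in obj:
        return Decimal(obj['__decimal__'])
    if '__uuid__' in obj:
        return UUID(hex=obj['__uuid__'])
    return obj
order = {'order_id': uuid4(), 'total': Decimal('19.99')}
packed = msgpack.packb(order, default=encode_custom, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_custom, raw=False)
print(type(result['order_id']), type(result['total']), result['total'])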
Real-Life Example: Inter-Service Message Queue
The script below simulates a producer publishing events and a consumer reading them, both using msgpack for serialization. It is the same pattern you would use with Redis Streams, Kafka, or any other broker that carries opaque byte payloads.
# message_queue_demo.py
import msgpack
import queue
import threading
import time
from datetime import datetime
# Shared in-memory queue (stands in for Redis/Kafka in this demo)
message_bus = queue.Queue()
def encode_event(obj):
if isinstance(obj, datetime):
return {'__dt__': obj.isoformat()}
raise TypeError(f'Cannot encode {type(obj)}')
def decode_event(obj):
if '__dt__' in obj:
return datetime.fromisoformat(obj['__dt__'])
return obj
def producer(n_events):
events = [
{'type': 'page_view', 'url': '/home', 'user': 1},
        {'type': 'add_to_cart', 'sku': 'WIDGET-42', 'user': 1},
        {'type': 'page_view', 'url': '/checkout', 'user': 2},
]
for i in range(n_events):
event = dict(events[i % len(events)])
event['timestamp'] = datetime.now()
event['seq'] = i
packed = msgpack.packb(event, default=encode_event, use_bin_type=True)
message_bus.put(packed)
message_bus.put(None) # Sentinel to stop consumer
def consumer():
received = 0
total_bytes = 0
while True:
raw = message_bus.get()
if raw is None:
break
event = msgpack.unpackb(raw, object_hook=decode_event, raw=False)
received += 1
total_bytes += len(raw)
print(f'Consumer: {received} events, {total_bytes} bytes total')
print(f'Average bytes/event: {total_bytes/received:.1f}')
prod_thread = threading.Thread(target=producer, args=(9,))
cons_thread = threading.Thread(target=consumer)
cons_thread.start()
prod_thread.start()
prod_thread.join()
cons_thread.join()
Output:
Consumer: 9 events, 768 bytes total
Average bytes/event: 85.3
The same 9 events encoded as JSON (with the timestamps as plain ISO strings) come out around 100-110 bytes each, so msgpack trims roughly 20% here — most of each payload is the 26-character ISO timestamp, and the relative saving grows as keys and structure make up a larger share of the message. The datetime hook keeps the types round-tripping cleanly. In a real system, replace queue.Queue with a Redis or Kafka client: the packb/unpackb calls stay identical.
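For illustration, swapping the in-memory queue for Redis might look like the sketch below. It assumes the redis-py client and a Redis list named 'events' (both arbitrary choices for this example); the msgpack calls are unchanged.
# redis_bus_sketch.py
import msgpack
import redis  # assumes the redis-py client is installed
r = redis.Redis(host='localhost', port=6379)
def publish(event, encode_hook):
    packed = msgpack.packb(event, default=encode_hook, use_bin_type=True)
    r.rpush('events', packed)  # push the raw bytes onto a Redis list
def consume(decode_hook):
    while True:
        _key, raw = r.blpop('events')  # blocking pop returns a (key, value) tuple of bytes
        yield msgpack.unpackb(raw, object_hook=decode_hook, raw=False)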
Frequently Asked Questions
When should I use msgpack instead of JSON?
Use msgpack when you control both the serializer and deserializer (internal services, message queues, caches), you care about payload size or serialization speed, or you need to encode binary data without base64 overhead. Stick with JSON when the data will be read by humans, logged to text files, or consumed by third-party services that expect JSON.
How does msgpack compare to pickle?
Pickle handles arbitrary Python objects but is Python-only and carries a remote code execution risk if you unpickle untrusted data. msgpack is language-agnostic (clients exist in Go, Rust, Java, JavaScript, etc.) and does not execute code during deserialization, which makes it far safer for untrusted input — though you should still cap message sizes to guard against resource exhaustion. Use msgpack for cross-language or cross-service communication; use pickle only for trusted Python-to-Python data that benefits from full object serialization.
Does msgpack require a C compiler?
No. pip install msgpack downloads a pre-compiled wheel on all major platforms (Windows, macOS, Linux). The C extension is bundled in the wheel. If no matching wheel exists for your platform, it falls back to a pure-Python implementation that is slower but fully functional.
Does msgpack enforce a schema?
No. Like JSON, msgpack is schema-less by default — you can pack any Python dict structure without declaring types upfront. If you need schema validation, combine msgpack with a library like msgspec (which has its own built-in msgpack encoder) or validate with Pydantic after unpacking.
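As a minimal sketch of the validate-after-unpacking approach, assuming Pydantic is installed and using a made-up Event model:
# validate_after_unpack.py
import msgpack
from pydantic import BaseModel
class Event(BaseModel):
    type: str
    user: int
packed = msgpack.packb({'type': 'page_view', 'user': 1}, use_bin_type=True)
event = Event(**msgpack.unpackb(packed, raw=False))  # raises if fields are missing or mistyped
print(event)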
Can msgpack handle very large integers?
Yes, up to 64 bits. msgpack natively supports signed and unsigned 64-bit integers, from -9,223,372,036,854,775,808 up to 18,446,744,073,709,551,615. For Python's arbitrary-precision integers beyond that range, you need a custom encoder that converts them to strings or bytes, similar to the datetime example above.
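A minimal sketch of such an encoder, assuming packb falls back to the default hook when an integer overflows the 64-bit range (recent msgpack versions behave this way); the __bigint__ tag is an arbitrary choice:
# bigint_hooks.py
import msgpack
def encode_bigint(obj):
    if isinstance(obj, int):
        # Only out-of-range ints reach this hook; store them as decimal strings
        return {'__bigint__': str(obj)}
    raise TypeError(f'Cannot encode {type(obj)}')
def decode_bigint(obj):
    if '__bigint__' in obj:
        return int(obj['__bigint__'])
    return obj
huge = 2**80
packed = msgpack.packb({'value': huge}, default=encode_bigint, use_bin_type=True)
result = msgpack.unpackb(packed, object_hook=decode_bigint, raw=False)
print(result['value'] == huge)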
Conclusion
You have covered the full msgpack toolkit: packb() and unpackb() with the correct use_bin_type/raw settings, streaming with Packer/Unpacker, integer-keyed dicts, and custom type encoding for datetime and similar objects. The message queue demo shows the pattern in a realistic producer/consumer scenario.
The next step is replacing JSON in a high-throughput code path and benchmarking the difference with timeit or pytest-benchmark. You may also want to look at msgspec, which builds on the MessagePack format and adds compile-time schema validation with Pydantic-like model classes.
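As a starting point, a minimal timeit comparison might look like the sketch below — the payload is a made-up example, and the absolute numbers will vary with your machine, payload shape, and library versions.
# bench_encode.py
import json
import timeit
import msgpack
# A made-up payload; substitute a message representative of your workload
payload = {'user_id': 12345, 'scores': list(range(100)), 'name': 'alice', 'active': True}
n = 10_000
json_time = timeit.timeit(lambda: json.dumps(payload).encode(), number=n)
msgpack_time = timeit.timeit(lambda: msgpack.packb(payload, use_bin_type=True), number=n)
print(f'json.dumps:    {json_time:.3f}s for {n} iterations')
print(f'msgpack.packb: {msgpack_time:.3f}s for {n} iterations')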
Official documentation: msgpack.org and the Python library on GitHub.