Intermediate

You have trained a machine learning model that took three hours to fit, and now you need to save it so you can load it tomorrow without retraining. Or you have a complex nested data structure — dictionaries containing lists of custom objects — and you need to persist it between script runs. JSON cannot handle custom Python objects, and writing a custom serializer for every class is tedious. This is where pickle comes in.

Python’s pickle module serializes Python objects into a byte stream and deserializes them back. Its companion module shelve builds on pickle to give you a persistent dictionary backed by a file. Both are in the standard library — no installation needed.

In this article, we will start with a quick example of saving and loading objects, then explain how pickle works and its security implications. We will cover pickling custom classes, using shelve as a persistent key-value store, and best practices for safe serialization. We will finish with a real-life project that caches expensive API results.

Python Pickle Quick Example

# quick_pickle.py
import pickle

# Save a complex data structure
data = {
    "users": ["Alice", "Bob", "Charlie"],
    "scores": {"Alice": [95, 87, 92], "Bob": [78, 85], "Charlie": [90]},
    "metadata": {"version": 2, "created": "2026-04-12"}
}

# Serialize to file
with open("data.pkl", "wb") as f:
    pickle.dump(data, f)
print("Saved to data.pkl")

# Load it back
with open("data.pkl", "rb") as f:
    loaded = pickle.load(f)

print(f"Users: {loaded['users']}")
print(f"Alice scores: {loaded['scores']['Alice']}")
print(f"Same data: {data == loaded}")

Output:

Saved to data.pkl
Users: ['Alice', 'Bob', 'Charlie']
Alice scores: [95, 87, 92]
Same data: True

Two function calls: pickle.dump() to save and pickle.load() to restore. The entire nested structure — dict, lists, strings, integers — survives the round trip perfectly. Let us explore how this works and when to use it.

What Is Pickle and When Should You Use It?

Pickle converts Python objects into a byte stream (serialization) and reconstructs them from that byte stream (deserialization). Unlike JSON, which only handles basic types (strings, numbers, lists, dicts), pickle can serialize almost any Python object: custom classes, functions, sets, datetime objects, and complex nested structures.
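To make the difference concrete, here is a small comparison (the record contents are invented for illustration): JSON rejects sets and datetime objects outright, while pickle round-trips them without any custom encoder.

```python
# json_vs_pickle.py
import json
import pickle
from datetime import datetime

record = {"tags": {"python", "serialization"}, "created": datetime(2026, 4, 12)}

# JSON cannot serialize sets or datetime objects
try:
    json.dumps(record)
except TypeError as e:
    print(f"JSON failed: {e}")

# Pickle round-trips both without extra code
restored = pickle.loads(pickle.dumps(record))
print(f"Tags: {sorted(restored['tags'])}")
print(f"Created: {restored['created']}")
```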

Feature               | pickle                     | JSON                  | shelve
----------------------|----------------------------|-----------------------|---------------------------
Python-specific types | Yes (classes, sets, etc.)  | No (basic types only) | Yes (uses pickle)
Human readable        | No (binary)                | Yes                   | No
Cross-language        | No (Python only)           | Yes                   | No
Security              | Unsafe with untrusted data | Safe                  | Unsafe with untrusted data
Key-value access      | No (full load)             | No (full load)        | Yes (dict-like)

Use pickle when: you need to save Python-specific objects (custom classes, ML models, complex structures) and the data stays within your own code. Use JSON when: you need human-readable output or cross-language compatibility. Never unpickle data from untrusted sources — pickle can execute arbitrary code during deserialization.
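To see why that warning matters, here is a deliberately harmless sketch of the attack. The class name Malicious and its payload are invented for illustration, and print stands in for what could be any callable, such as os.system.

```python
# pickle_danger.py
import pickle

class Malicious:
    # __reduce__ tells pickle how to rebuild the object.
    # An attacker can make it call any function with any arguments;
    # we use print here so the demo stays harmless.
    def __reduce__(self):
        return (print, ("Arbitrary call executed during load!",))

evil_bytes = pickle.dumps(Malicious())

# Just *loading* the bytes runs the call — no method is ever invoked explicitly
pickle.loads(evil_bytes)
```

This is why the rule is absolute: the damage happens during deserialization itself, before your code ever inspects the result.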

Pickling objects into jars
pickle.dump() — freeze any Python object. pickle.load() — thaw it back to life.

Pickling Custom Classes

One of pickle’s most powerful features is its ability to serialize custom objects. Your classes work with pickle automatically as long as their attributes are themselves picklable.

# custom_pickle.py
import pickle
from datetime import datetime

class Task:
    def __init__(self, title, priority="medium"):
        self.title = title
        self.priority = priority
        self.created = datetime.now()
        self.completed = False
    
    def complete(self):
        self.completed = True
    
    def __repr__(self):
        status = "done" if self.completed else "pending"
        return f"Task('{self.title}', {self.priority}, {status})"

# Create tasks
tasks = [
    Task("Write tests", "high"),
    Task("Update docs", "low"),
    Task("Fix bug #42", "high"),
]
tasks[0].complete()

# Save
with open("tasks.pkl", "wb") as f:
    pickle.dump(tasks, f)
print(f"Saved {len(tasks)} tasks")

# Load
with open("tasks.pkl", "rb") as f:
    loaded_tasks = pickle.load(f)

for task in loaded_tasks:
    print(f"  {task}")
print(f"\nFirst task created: {loaded_tasks[0].created}")

Output:

Saved 3 tasks
  Task('Write tests', high, done)
  Task('Update docs', low, pending)
  Task('Fix bug #42', high, pending)

First task created: 2026-04-12 09:30:15.123456

Notice that everything was preserved: the string attributes, the boolean completed state, and even the datetime object. Pickle captures the full state of each object, including the state changes we made by calling complete().
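Not every attribute is picklable, though: locks, open files, and network connections are not. One common pattern is to use the __getstate__ and __setstate__ hooks to drop such attributes before pickling and rebuild them on load — a sketch using a hypothetical Counter class:

```python
# unpicklable_attrs.py
import pickle
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self.lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Return a copy of the instance dict without the lock
        state = self.__dict__.copy()
        del state["lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and recreate the lock
        self.__dict__.update(state)
        self.lock = threading.Lock()

c = Counter()
c.value = 7
restored = pickle.loads(pickle.dumps(c))
print(f"Value: {restored.value}")
print(f"Has lock: {hasattr(restored, 'lock')}")
```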

Pickle Protocols and Bytes

# pickle_protocols.py
import pickle

data = {"key": "value", "numbers": [1, 2, 3]}

# Serialize to bytes (not to a file)
pickled_bytes = pickle.dumps(data)
print(f"Pickled size: {len(pickled_bytes)} bytes")
print(f"Bytes preview: {pickled_bytes[:30]}...")

# Deserialize from bytes
restored = pickle.loads(pickled_bytes)
print(f"Restored: {restored}")

# Check available protocols
print(f"\nHighest protocol: {pickle.HIGHEST_PROTOCOL}")
print(f"Default protocol: {pickle.DEFAULT_PROTOCOL}")

# Use highest protocol for best performance
fast_bytes = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
print(f"Highest protocol size: {len(fast_bytes)} bytes")

Output:

Pickled size: 52 bytes
Bytes preview: b'\x80\x05\x95*\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x03key'...
Restored: {'key': 'value', 'numbers': [1, 2, 3]}

Highest protocol: 5
Default protocol: 5
Highest protocol size: 52 bytes

Use pickle.dumps() and pickle.loads() (note the ‘s’) for in-memory serialization to bytes — useful for sending objects over networks or storing in databases. Higher protocol numbers generally produce smaller output and deserialize faster.
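If you are curious what those bytes actually encode, the standard library's pickletools module can disassemble a pickle without loading it — a small sketch:

```python
# inspect_pickle.py
import pickle
import pickletools

data = {"key": "value"}
raw = pickle.dumps(data)

# Print the opcode stream of the pickle — useful for seeing
# what a pickle would do without executing it
pickletools.dis(raw)

# pickletools.optimize strips unused PUT opcodes, which can shrink a pickle
smaller = pickletools.optimize(raw)
print(f"Original: {len(raw)} bytes, optimized: {len(smaller)} bytes")
```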

Pickle protocols and bytes
pickle.dumps() for bytes, pickle.dump() for files. One letter changes everything.

Shelve: A Persistent Dictionary

shelve builds on pickle to give you a dictionary-like interface backed by a file. Instead of loading and saving entire data structures, you read and write individual keys — perfect for caching and simple data stores.

# shelve_basics.py
import shelve

# Open a shelf (creates the file if it does not exist)
with shelve.open("mydata") as db:
    db["users"] = ["Alice", "Bob", "Charlie"]
    db["config"] = {"theme": "dark", "lang": "en"}
    db["counter"] = 42
    print(f"Stored {len(db)} keys")

# Read back individual keys
with shelve.open("mydata") as db:
    print(f"Users: {db['users']}")
    print(f"Config: {db['config']}")
    print(f"Keys: {list(db.keys())}")
    
    # Check if key exists
    print(f"Has 'users': {'users' in db}")
    print(f"Has 'missing': {'missing' in db}")
    
    # Update a value
    db["counter"] = db["counter"] + 1
    print(f"Counter: {db['counter']}")

Output:

Stored 3 keys
Users: ['Alice', 'Bob', 'Charlie']
Config: {'theme': 'dark', 'lang': 'en'}
Keys: ['users', 'config', 'counter']
Has 'users': True
Has 'missing': False
Counter: 43

Shelve works exactly like a dictionary — you use square bracket syntax to get and set values, in to check membership, and keys() to list all keys. The difference is that the data persists on disk between runs. Always use the with statement to ensure the shelf is properly closed and synced.
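One subtlety worth knowing: mutating a stored object in place does not persist, because shelve only writes when a key is assigned. The sketch below shows the pitfall and two fixes (read-modify-reassign, or opening with writeback=True).

```python
# shelve_writeback.py
import shelve

with shelve.open("demo_shelf") as db:
    db["nums"] = [1, 2, 3]
    # This append is silently lost: db["nums"] returns a fresh copy,
    # and shelve never sees the mutation
    db["nums"].append(4)

with shelve.open("demo_shelf") as db:
    after_inplace = db["nums"]
    print(f"After in-place append: {after_inplace}")  # still [1, 2, 3]

# Fix 1: read, modify, reassign
with shelve.open("demo_shelf") as db:
    nums = db["nums"]
    nums.append(4)
    db["nums"] = nums

# Fix 2: writeback=True caches accessed entries in memory
# and writes them back when the shelf is synced or closed
with shelve.open("demo_shelf", writeback=True) as db:
    db["nums"].append(5)

with shelve.open("demo_shelf") as db:
    final_nums = db["nums"]
    print(f"After proper updates: {final_nums}")
```

writeback=True is convenient but costs memory and slower closes, since every accessed entry is cached and rewritten; reassignment is the cheaper habit.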

Real-Life Example: API Response Cache

API response cache with shelve
Cache hit: 0ms. Cache miss: 2,000ms. shelve makes the difference.

Let us build an API response cache that stores results on disk to avoid making the same request twice. This is useful for development, testing, or any scenario where API calls are slow or rate-limited.

# api_cache.py
import shelve
import time
import hashlib
import json

class APICache:
    def __init__(self, cache_file="api_cache", ttl=3600):
        self.cache_file = cache_file
        self.ttl = ttl  # Time to live in seconds
    
    def _make_key(self, url, params=None):
        key_data = url + json.dumps(params or {}, sort_keys=True)
        return hashlib.md5(key_data.encode()).hexdigest()
    
    def get(self, url, params=None):
        key = self._make_key(url, params)
        with shelve.open(self.cache_file) as db:
            if key in db:
                entry = db[key]
                age = time.time() - entry["timestamp"]
                if age < self.ttl:
                    print(f"  Cache HIT for {url} (age: {age:.0f}s)")
                    return entry["data"]
                else:
                    print(f"  Cache EXPIRED for {url} (age: {age:.0f}s)")
        return None
    
    def set(self, url, data, params=None):
        key = self._make_key(url, params)
        with shelve.open(self.cache_file) as db:
            db[key] = {"data": data, "timestamp": time.time(), "url": url}
        print(f"  Cached response for {url}")
    
    def stats(self):
        with shelve.open(self.cache_file) as db:
            total = len(db)
            fresh = sum(1 for k in db if time.time() - db[k]["timestamp"] < self.ttl)
            print(f"  Cache: {total} entries, {fresh} fresh, {total - fresh} expired")

def fetch_with_cache(cache, url, params=None):
    cached = cache.get(url, params)
    if cached is not None:
        return cached
    
    # Simulate API call
    print(f"  Fetching {url}...")
    time.sleep(0.1)  # Simulate network delay
    data = {"url": url, "status": "ok", "results": [1, 2, 3]}
    
    cache.set(url, data, params)
    return data

# Demo
cache = APICache(ttl=60)

print("First requests (cache miss):")
fetch_with_cache(cache, "https://api.example.com/users")
fetch_with_cache(cache, "https://api.example.com/products")

print("\nSecond requests (cache hit):")
fetch_with_cache(cache, "https://api.example.com/users")
fetch_with_cache(cache, "https://api.example.com/products")

print("\nCache statistics:")
cache.stats()

Output:

First requests (cache miss):
  Fetching https://api.example.com/users...
  Cached response for https://api.example.com/users
  Fetching https://api.example.com/products...
  Cached response for https://api.example.com/products

Second requests (cache hit):
  Cache HIT for https://api.example.com/users (age: 0s)
  Cache HIT for https://api.example.com/products (age: 0s)

Cache statistics:
  Cache: 2 entries, 2 fresh, 0 expired

This cache uses shelve for persistent storage, MD5 hashes for unique cache keys, and TTL-based expiration. You could extend it with LRU eviction, cache size limits, or async support for real production use.
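As one example of such an extension, a standalone prune helper that deletes expired entries might look like this (the function name and file layout are hypothetical, matching the entry format used above):

```python
# cache_prune.py
import shelve
import time

def prune(cache_file, ttl):
    """Delete cache entries older than ttl seconds (hypothetical helper)."""
    with shelve.open(cache_file) as db:
        # Collect keys first — deleting while iterating a shelf is unsafe
        expired = [k for k in db if time.time() - db[k]["timestamp"] >= ttl]
        for key in expired:
            del db[key]
    return len(expired)

# Demo: one fresh entry, one already expired
with shelve.open("prune_demo") as db:
    db["fresh"] = {"data": 1, "timestamp": time.time()}
    db["stale"] = {"data": 2, "timestamp": time.time() - 9999}

removed_count = prune("prune_demo", ttl=3600)
with shelve.open("prune_demo") as db:
    remaining = list(db.keys())

print(f"Removed: {removed_count}")
print(f"Remaining: {remaining}")
```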

Pickle security warning
Never unpickle data you did not create. Deserialization runs arbitrary code.

Frequently Asked Questions

Is pickle safe to use?

Pickle is safe when you only load data you created yourself. Never unpickle data from untrusted sources: a malicious pickle file can execute arbitrary code on your machine during deserialization. For data exchange with external systems, use JSON, MessagePack, or Protocol Buffers instead.

What types can pickle handle?

Pickle handles most Python types: basic types (int, float, str, bytes, None), containers (list, dict, set, tuple), custom classes, functions defined at module level, and classes defined at module level. It cannot pickle lambda functions, generators, open file handles, database connections, or thread locks.
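You can verify the boundary yourself — a lambda fails to pickle, while an equivalent module-level function pickles fine (by reference to its name, not by its code):

```python
# unpicklable.py
import pickle

square = lambda x: x * x

try:
    pickle.dumps(square)
    lambda_pickled = True
except Exception as e:
    lambda_pickled = False
    print(f"Cannot pickle lambda: {type(e).__name__}")

def square_fn(x):
    return x * x

# Module-level functions are pickled as a reference and resolved on load
data = pickle.dumps(square_fn)
restored = pickle.loads(data)
print(f"Works after load: {restored(6)}")
```

Note the consequence of pickling by reference: the function must be importable under the same name wherever you unpickle it.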

When should I use shelve instead of pickle?

Use shelve when you need to read and write individual keys without loading the entire dataset into memory. Pickle loads everything at once. Shelve is better for caches, configuration stores, and any scenario where you access data by key rather than as a whole.

How do I handle class changes after pickling?

If you add attributes to a class after pickling objects, implement __getstate__ and __setstate__ to handle migration, or use __reduce__ for full control over how objects are reconstructed. For simple cases, a fallback at read time works: getattr(self, "new_attr", default_value).
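A minimal migration sketch, using a hypothetical Profile class whose theme attribute was added after some objects were already pickled:

```python
# schema_migration.py
import pickle

class Profile:
    def __init__(self, name):
        self.name = name
        self.theme = "light"  # attribute added in a later version

    def __setstate__(self, state):
        # Restore whatever was pickled, then backfill missing attributes
        self.__dict__.update(state)
        if "theme" not in self.__dict__:
            self.theme = "light"

# Simulate an old pickle created before 'theme' existed
old = Profile("Alice")
del old.theme
old_bytes = pickle.dumps(old)

restored = pickle.loads(old_bytes)
print(f"Name: {restored.name}, theme: {restored.theme}")
```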

Is there a size limit for pickle files?

There is no hard limit, but very large pickle files (multi-GB) can cause memory issues since the entire object must fit in RAM during deserialization. For large datasets, consider using pickle with chunked processing, or switch to formats like Parquet or HDF5 that support lazy loading.
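Chunked processing is straightforward because multiple pickles can be concatenated in one file: each pickle.dump() appends a complete record, and repeated pickle.load() calls read them back one at a time until EOFError. A sketch:

```python
# chunked_pickle.py
import pickle

# Write records one at a time — each dump() appends a complete pickle
records = [{"id": i, "payload": list(range(5))} for i in range(3)]
with open("stream.pkl", "wb") as f:
    for record in records:
        pickle.dump(record, f)

# Read them back one at a time until the file is exhausted
loaded = []
with open("stream.pkl", "rb") as f:
    while True:
        try:
            loaded.append(pickle.load(f))
        except EOFError:
            break

print(f"Loaded {len(loaded)} records")
print(f"First: {loaded[0]}")
```

This keeps only one record in memory at a time on both the writing and reading side.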

Conclusion

We covered Python's serialization tools: pickle for saving and loading any Python object, pickle protocols for optimization, shelve for persistent key-value storage, and critical security considerations. The API cache project demonstrated how shelve powers practical caching solutions.

Try building a session manager or a simple document store with shelve. For the complete reference, see the official pickle documentation and shelve documentation.