
You assign a list to a new variable, modify the new variable, and suddenly your original list has changed too. If you’ve been burned by this Python “gotcha,” you’ve encountered the difference between a reference and a copy. Python doesn’t automatically duplicate objects when you assign them — it creates another reference to the same object in memory. For simple integers and strings this doesn’t matter, but for mutable containers like lists, dicts, and custom objects it causes subtle bugs that are notoriously hard to track down.

The fix is Python’s built-in copy module, which provides two copying strategies: shallow copy and deep copy. Shallow copy creates a new container but shares references to the inner objects. Deep copy recursively duplicates everything, giving you a completely independent clone. The module is part of Python’s standard library — no installation needed. You just need to understand which strategy to use and why.

In this tutorial, we’ll cover how Python assignment works under the hood, what shallow and deep copies actually do to memory, how to use copy.copy() and copy.deepcopy(), and how to customize copying behavior for your own classes. By the end, you’ll know exactly which tool to reach for when you need to duplicate data safely.

Quick Example: copy vs deepcopy

Here’s a minimal example that shows the core difference between the two copy strategies:

# copy_quick.py
import copy

original = [[1, 2, 3], [4, 5, 6]]

shallow = copy.copy(original)
deep = copy.deepcopy(original)

# Modify a nested list
original[0][0] = 99

print("Original:", original)   # [[99, 2, 3], [4, 5, 6]]
print("Shallow: ", shallow)    # [[99, 2, 3], [4, 5, 6]]  <-- affected!
print("Deep:    ", deep)       # [[1, 2, 3], [4, 5, 6]]   <-- unaffected

Output:

Original: [[99, 2, 3], [4, 5, 6]]
Shallow:  [[99, 2, 3], [4, 5, 6]]
Deep:     [[1, 2, 3], [4, 5, 6]]

The shallow copy created a new list, but the inner lists are still shared references. When we modified original[0][0], the shallow copy reflected that change. The deep copy is completely independent -- modifying the original has no effect on it. The sections below explain exactly why this happens and how to handle more complex cases.

How Python Assignment Actually Works

Before we talk about copying, we need to understand what Python assignment actually does. When you write b = a, Python does not create a new object. It creates a new name that points to the same object in memory. Both a and b are references -- labels attached to the same underlying data.

# assignment_vs_copy.py
a = [1, 2, 3]
b = a          # b is just another name for the same list

b.append(4)

print("a:", a)  # [1, 2, 3, 4]  <-- modified!
print("b:", b)  # [1, 2, 3, 4]
print("Same object?", a is b)  # True

Output:

a: [1, 2, 3, 4]
b: [1, 2, 3, 4]
Same object? True

The is operator checks whether two variables point to the same object in memory. Since a and b are the same object, modifying one modifies both. This is a feature, not a bug -- it's how Python keeps memory usage low. But it means you must be deliberate when you actually want an independent copy.
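One subtlety worth keeping straight: only in-place mutation travels through a shared reference. Rebinding a name with = builds a new object and points the name at it, leaving the other name alone. A small sketch:

```python
# rebind_vs_mutate.py
a = [1, 2, 3]
b = a

b = b + [4]    # rebinding: builds a NEW list and points b at it
print(a)       # [1, 2, 3] -- untouched
print(b)       # [1, 2, 3, 4]
print(a is b)  # False -- b is now a different object

c = a
c += [4]       # in-place: += on a list mutates the shared object
print(a)       # [1, 2, 3, 4] -- changed through c
```

This is why `b = b + [4]` and `b += [4]` behave differently for lists: the first rebinds, the second mutates in place.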

Shallow Copy with copy.copy()

A shallow copy creates a new container object, but fills it with references to the same items as the original. For a flat list (a list containing only immutable items like integers or strings), a shallow copy behaves like a fully independent copy. The problem arises with nested structures.

# shallow_copy.py
import copy

# Flat list -- shallow copy works fine
flat = [1, 2, 3, 4, 5]
flat_copy = copy.copy(flat)
flat_copy.append(99)

print("Flat original:", flat)       # [1, 2, 3, 4, 5]
print("Flat copy:    ", flat_copy)  # [1, 2, 3, 4, 5, 99]

# Nested list -- shallow copy shares inner references
nested = [[10, 20], [30, 40]]
nested_copy = copy.copy(nested)

nested[0].append(99)  # Mutate an inner list

print("Nested original:", nested)       # [[10, 20, 99], [30, 40]]
print("Nested copy:    ", nested_copy)  # [[10, 20, 99], [30, 40]] -- shared!

Output:

Flat original: [1, 2, 3, 4, 5]
Flat copy:     [1, 2, 3, 4, 5, 99]
Nested original: [[10, 20, 99], [30, 40]]
Nested copy:     [[10, 20, 99], [30, 40]]

With the flat list, appending to flat_copy doesn't affect flat -- they're separate containers. But with the nested list, nested_copy[0] and nested[0] are still the same inner list. The outer containers are different, but the inner objects are shared. Use copy.copy() when your data structure is flat or when you intentionally want to share the inner objects.
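You can verify the sharing directly with the is operator -- the outer containers are distinct objects, while the inner lists are still identical:

```python
# shallow_identity.py
import copy

nested = [[10, 20], [30, 40]]
nested_copy = copy.copy(nested)

print(nested is nested_copy)        # False -- new outer container
print(nested[0] is nested_copy[0])  # True  -- same inner list, shared
```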

Shorthand for Shallow Copies

Python offers several built-in shorthand ways to create a shallow copy without importing the copy module. These are idiomatic and commonly seen in real codebases:

# shallow_shorthand.py
original = [1, 2, 3]

# These all produce shallow copies:
copy1 = original[:]          # slice notation
copy2 = list(original)       # list constructor
copy3 = original.copy()      # list.copy() method

# For dicts:
d = {'a': 1, 'b': [2, 3]}
d_copy = d.copy()            # dict.copy() -- also shallow!

d['b'].append(99)
print(d_copy['b'])  # [2, 3, 99] -- shared inner list

Output:

[2, 3, 99]

All of these shortcuts -- slice notation, the list() constructor, and .copy() -- produce shallow copies. This is an important gotcha: dict.copy() does NOT create a deep copy of a dict. If your dict contains mutable values (like lists or other dicts), mutations to those inner objects will still be visible in the copy.
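The distinction to keep in mind: rebinding a top-level key in a shallow dict copy is safe, because it only changes which object that key points to in the copy; the leak happens only when you mutate a shared inner object. A sketch:

```python
# dict_copy_gotcha.py
import copy

d = {'a': 1, 'b': [2, 3]}
d_shallow = d.copy()

d_shallow['a'] = 100        # rebinds the key in the copy only
print(d['a'])               # 1 -- original untouched

d_shallow['b'].append(99)   # mutates the SHARED inner list
print(d['b'])               # [2, 3, 99] -- leaks into the original

d_deep = copy.deepcopy(d)   # fully independent, inner list included
d_deep['b'].append(7)
print(d['b'])               # [2, 3, 99] -- unaffected by the deep copy
```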

Deep Copy with copy.deepcopy()

A deep copy recursively duplicates every object in the structure. The copy is completely independent -- modifying any part of the original, at any depth, will not affect the copy. This is the safest choice when you need true independence between your original data and your copy.

# deep_copy.py
import copy

data = {
    'name': 'Alice',
    'scores': [95, 87, 92],
    'address': {
        'city': 'Melbourne',
        'postcode': 3000
    }
}

clone = copy.deepcopy(data)

# Mutate everything in original
data['name'] = 'Bob'
data['scores'].append(100)
data['address']['city'] = 'Sydney'

print("Original:", data)
print("Clone:   ", clone)

Output:

Original: {'name': 'Bob', 'scores': [95, 87, 92, 100], 'address': {'city': 'Sydney', 'postcode': 3000}}
Clone:    {'name': 'Alice', 'scores': [95, 87, 92], 'address': {'city': 'Melbourne', 'postcode': 3000}}

Every nested object -- the list of scores and the address dict -- was independently duplicated. Changes to the original have zero effect on the clone. Deep copy is the right tool when you need to preserve a snapshot of a complex data structure before modifying it, or when you're passing data to a function that might mutate it unexpectedly.

Deep Copy Performance Considerations

Deep copy is not free. It has to traverse the entire object graph, allocate new objects, and track circular references. For large nested structures, this can be noticeably slower than a shallow copy. Here's a simple benchmark to make the difference concrete:

# copy_performance.py
import copy
import time

# Build a deeply nested structure
big_data = {'items': [{'id': i, 'tags': [str(i*j) for j in range(10)]} for i in range(1000)]}

start = time.time()
for _ in range(100):
    copy.copy(big_data)
print(f"Shallow copy x100: {(time.time()-start)*1000:.1f}ms")

start = time.time()
for _ in range(100):
    copy.deepcopy(big_data)
print(f"Deep copy x100:    {(time.time()-start)*1000:.1f}ms")

Output (approximate):

Shallow copy x100: 0.3ms
Deep copy x100:    245.7ms

Deep copy is roughly 800x slower on this example. For small objects or infrequent copying this doesn't matter at all. But in a tight loop processing thousands of items, it can become a real bottleneck. If performance is critical and you only need independence at the top level, shallow copy is faster.
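A middle ground when only part of the structure will change: shallow-copy the container and replace just the entries you are about to mutate. This is a sketch of the idea, not a general-purpose substitute for deepcopy:

```python
# selective_copy.py
big = {'items': [[1, 2], [3, 4], [5, 6]], 'meta': {'version': 1}}

# Shallow-copy the outer dict, then rebuild only the part we will mutate
cheap = dict(big)
cheap['items'] = [list(inner) for inner in big['items']]

cheap['items'][0].append(99)
print(big['items'][0])               # [1, 2] -- untouched
print(cheap['meta'] is big['meta'])  # True -- still shared, by design
```

The trade-off is that you must know exactly which parts you will mutate; deepcopy needs no such knowledge.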

Copying Custom Objects

The copy module works with custom Python classes too. By default, copy.copy() creates a new instance of the class but shares all attribute references (shallow), while copy.deepcopy() recursively copies all attributes (deep). You can customize this behavior by implementing __copy__ and __deepcopy__ on your class.

# copy_custom_class.py
import copy

class Config:
    def __init__(self, name, settings):
        self.name = name
        self.settings = settings  # mutable dict

    def __repr__(self):
        return f"Config(name={self.name!r}, settings={self.settings})"

cfg = Config("production", {"debug": False, "retries": 3})

# Shallow copy -- settings dict is shared
shallow_cfg = copy.copy(cfg)
shallow_cfg.settings['debug'] = True

print("Original:", cfg)         # settings shows debug=True too
print("Shallow: ", shallow_cfg)

# Deep copy -- settings dict is independent
deep_cfg = copy.deepcopy(cfg)
deep_cfg.settings['retries'] = 99

print("Original:", cfg)         # settings unchanged
print("Deep:    ", deep_cfg)

Output:

Original: Config(name='production', settings={'debug': True, 'retries': 3})
Shallow:  Config(name='production', settings={'debug': True, 'retries': 3})
Original: Config(name='production', settings={'debug': True, 'retries': 3})
Deep:     Config(name='production', settings={'debug': True, 'retries': 99})

The shallow copy shares the settings dict, so modifying it in the copy affects the original. The deep copy has its own independent settings dict. This is the typical behavior -- it applies to any class that holds mutable attributes.

Custom __copy__ and __deepcopy__

Sometimes you want fine-grained control. For example, you might want a deep copy that excludes a heavy cache attribute (no point duplicating cached data). Implement __copy__ and __deepcopy__ to control exactly what gets duplicated:

# custom_copy_hooks.py
import copy

class ExpensiveModel:
    def __init__(self, data, cache=None):
        self.data = data
        self.cache = cache or {}  # expensive to duplicate

    def __copy__(self):
        # Shallow: share data reference, reset cache
        cls = self.__class__
        result = cls.__new__(cls)
        result.__dict__.update(self.__dict__)
        result.cache = {}  # fresh cache, don't share
        return result

    def __deepcopy__(self, memo):
        cls = self.__class__
        result = cls.__new__(cls)
        memo[id(self)] = result
        result.data = copy.deepcopy(self.data, memo)  # deep copy data
        result.cache = {}                              # fresh cache, skip deepcopy
        return result

    def __repr__(self):
        return f"ExpensiveModel(data={self.data}, cache_size={len(self.cache)})"

model = ExpensiveModel({'weights': [0.1, 0.2, 0.3]}, cache={'key': 'value'})
cloned = copy.deepcopy(model)

print("Original:", model)
print("Cloned:  ", cloned)  # cache is empty, data is independent

Output:

Original: ExpensiveModel(data={'weights': [0.1, 0.2, 0.3]}, cache_size=1)
Cloned:   ExpensiveModel(data={'weights': [0.1, 0.2, 0.3]}, cache_size=0)

The memo parameter in __deepcopy__ is important -- it tracks already-copied objects to handle circular references. Always pass it to recursive copy.deepcopy() calls inside your implementation.

Shallow vs Deep vs Assignment: Quick Reference

Here's a summary of all three approaches to help you choose the right one:

Approach           New Container?   New Inner Objects?   Use When
b = a              No               No                   You want two names for the same object
copy.copy(a)       Yes              No (shared)          Flat structures, or intentional sharing of inner objects
copy.deepcopy(a)   Yes              Yes (independent)    Nested structures, snapshots, safety-first
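Each row of the table can be checked directly with the is operator on a small nested list:

```python
# quick_reference_check.py
import copy

a = [[1, 2]]

b = a                     # assignment: same object all the way down
assert b is a

c = copy.copy(a)          # shallow: new outer container, shared inner list
assert c is not a and c[0] is a[0]

d = copy.deepcopy(a)      # deep: new outer AND new inner objects
assert d is not a and d[0] is not a[0]

print("All three behaviors confirmed")
```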

Real-Life Example: Safe Configuration Management

A common use case for deep copy is managing environment-specific configurations. You start with a base config and create independent overrides for each environment without risking mutations bleeding across environments.

# config_manager.py
import copy

BASE_CONFIG = {
    'database': {
        'host': 'localhost',
        'port': 5432,
        'name': 'myapp_dev',
        'pool_size': 5
    },
    'cache': {
        'backend': 'redis',
        'timeout': 300
    },
    'debug': True,
    'allowed_hosts': ['localhost', '127.0.0.1']
}

def create_env_config(base, overrides):
    """Create an independent environment config from base + overrides."""
    config = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and key in config:
            config[key].update(value)
        else:
            config[key] = value
    return config

# Create staging and production configs
staging = create_env_config(BASE_CONFIG, {
    'database': {'host': 'staging-db.internal', 'name': 'myapp_staging'},
    'debug': False
})

production = create_env_config(BASE_CONFIG, {
    'database': {'host': 'prod-db.internal', 'name': 'myapp_prod', 'pool_size': 20},
    'debug': False,
    'allowed_hosts': ['myapp.com', 'www.myapp.com']
})

# Mutate production config -- base should be untouched
production['allowed_hosts'].append('api.myapp.com')

print("Base allowed_hosts:", BASE_CONFIG['allowed_hosts'])
print("Staging DB host:   ", staging['database']['host'])
print("Production hosts:  ", production['allowed_hosts'])
print("Production pool:   ", production['database']['pool_size'])

Output:

Base allowed_hosts: ['localhost', '127.0.0.1']
Staging DB host:    staging-db.internal
Production hosts:   ['myapp.com', 'www.myapp.com', 'api.myapp.com']
Production pool:    20

Because create_env_config uses copy.deepcopy(base), every call produces a fully independent config. You can safely mutate staging and production configs without ever touching the base. This pattern is common in application factories, test setup, and any system where you need to start from a known baseline and diverge safely.

Frequently Asked Questions

When should I use deepcopy vs shallow copy?

Use deepcopy when your data structure contains nested mutable objects (lists, dicts, or custom classes) and you need full independence between the original and the copy. Use shallow copy when your structure is flat (contains only immutable items like integers, strings, or tuples) or when you deliberately want the copy to share references to inner objects for memory efficiency. If you're unsure, deepcopy is the safe default.

Does deepcopy handle circular references?

Yes. Python's copy.deepcopy() uses a memo dictionary to track objects it has already copied. When it encounters an object it has already processed, it returns the already-copied version instead of recursing infinitely. This means you can safely deepcopy objects that reference themselves or form cycles in their object graph. The same memo dict is why you must pass it through when implementing __deepcopy__.
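A minimal demonstration: a list that contains itself deep-copies without infinite recursion, and the cycle is preserved in the clone:

```python
# circular_deepcopy.py
import copy

cycle = [1, 2]
cycle.append(cycle)           # the list now contains itself

clone = copy.deepcopy(cycle)  # no RecursionError -- the memo breaks the loop

print(clone[2] is clone)      # True  -- the cycle now points at the CLONE
print(clone[2] is cycle)      # False -- not back at the original
```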

Do I need to copy immutable objects like tuples and strings?

No. Immutable objects like integers, strings, tuples, and frozensets cannot be modified in place, so sharing references is completely safe. Both copy.copy() and copy.deepcopy() will often return the original object unchanged for immutable types -- there's no point in creating a duplicate. The copying concern only applies to mutable objects.
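You can observe this identity-preserving behavior directly. The exact cases are a CPython implementation detail, but the checks below hold in practice:

```python
# immutable_copy.py
import copy

t = (1, 2, 3)
s = "hello"

print(copy.copy(t) is t)      # True -- the same tuple is returned
print(copy.deepcopy(t) is t)  # True -- all elements immutable, no new tuple
print(copy.copy(s) is s)      # True -- strings are never duplicated

# But a tuple CONTAINING a mutable object does get deep-copied
t2 = ([1, 2], 3)
print(copy.deepcopy(t2) is t2)  # False -- the inner list forces a new tuple
```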

Does deepcopy work with NumPy arrays and Pandas DataFrames?

It does, but these libraries provide their own more efficient copy methods: array.copy() for NumPy and df.copy(deep=True) for Pandas. Use those instead of copy.deepcopy() when working with numerical or tabular data -- they're optimized for the underlying C memory layout and will be significantly faster. copy.deepcopy() works as a fallback but is much slower on large arrays.

How is deepcopy related to pickling?

Both deepcopy and pickle serialize an object's full state, but they serve different purposes. Pickling converts an object to bytes for saving to disk or sending over a network. Deepcopy keeps everything in memory and returns a live Python object. Interestingly, objects that implement __reduce__ or __getstate__/__setstate__ for pickling will often use those same hooks during deepcopy. If you need a deep copy of an object that can be pickled, pickle.loads(pickle.dumps(obj)) is an alternative, though slower.
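A quick sketch of the pickle round-trip producing an independent clone:

```python
# pickle_roundtrip.py
import pickle

data = {'scores': [95, 87], 'tags': ('a', 'b')}

clone = pickle.loads(pickle.dumps(data))  # serialize to bytes, then back

clone['scores'].append(100)
print(data['scores'])   # [95, 87] -- original untouched
print(clone['tags'])    # ('a', 'b') -- tuples survive, unlike the JSON trick
```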

Can I use json.loads(json.dumps(data)) as a deepcopy?

Yes, for simple data structures containing only JSON-serializable types (dicts, lists, strings, numbers, booleans, None). This pattern is occasionally used as a fast deep copy alternative since the JSON round-trip is sometimes faster than deepcopy for large flat structures. However, it loses non-JSON types: tuples become lists, datetime objects raise errors, custom class instances are not preserved. Use it with caution and only when you know your data is 100% JSON-serializable.
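A sketch showing both the independence and the type loss:

```python
# json_roundtrip.py
import json

data = {'scores': [95, 87], 'point': (1, 2)}

clone = json.loads(json.dumps(data))

clone['scores'].append(100)
print(data['scores'])        # [95, 87] -- independent copy
print(clone['point'])        # [1, 2] -- the tuple silently became a list!
print(type(clone['point']))  # <class 'list'>
```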

Conclusion

Python's copy module gives you precise control over how objects are duplicated. Assignment creates a second reference to the same object -- no copying at all. copy.copy() creates a new container but shares references to inner objects, which is fast and appropriate for flat structures. copy.deepcopy() recursively duplicates everything, giving you a completely independent clone at the cost of more memory and CPU time.

The real-life config manager example shows how deepcopy enables safe environment-specific configuration without any risk of mutations leaking between environments. The same pattern applies anywhere you need to branch from a known baseline: test setup, undo/redo history, job queue snapshots, or any pipeline where you need to preserve the original state while a downstream process modifies the data.

For custom classes, implementing __copy__ and __deepcopy__ gives you full control -- useful when some attributes (like caches or open file handles) should not be duplicated. The official documentation at docs.python.org/3/library/copy.html has the complete reference.