How To Work with ZIP Files in Python
ZIP files are everywhere. Whether you’re downloading software, transferring files across the internet, or backing up critical data, you’ve almost certainly encountered a compressed archive. But what if you need to work with ZIP files programmatically? Python makes it surprisingly easy with the built-in zipfile module, which lets you create, read, extract, and modify ZIP archives directly from your code.
If you’ve ever felt intimidated by file compression or thought you needed external tools to handle archives, don’t worry. In this tutorial, we’ll walk you through everything you need to know. By the end, you’ll be able to create sophisticated backup systems, extract files on demand, work with password-protected archives, and even compress data using different algorithms, all with clean, Pythonic code.
Here’s what we’ll cover: we’ll start with a quick example to see the module in action, then explore what ZIP files are and why they matter. We’ll build up from creating basic archives to handling complex scenarios like password-protected files and selective extraction. Finally, we’ll look at a real-world backup system and answer common questions you’ll encounter in production code.
Quick Example: Creating and Reading Your First ZIP File
Let’s jump straight in and see the zipfile module in action. This simple example creates a ZIP file containing a text file, then reads it back:
```python
# quick_example.py
import zipfile

# Create a ZIP file and add content
with zipfile.ZipFile('archive.zip', 'w') as zf:
    zf.writestr('hello.txt', 'Hello from Python!')

# Read it back and print the contents
with zipfile.ZipFile('archive.zip', 'r') as zf:
    print(zf.read('hello.txt').decode('utf-8'))
```

```
Hello from Python!
```
See? In just a few lines, you’ve created a ZIP archive, added a file, and retrieved its contents. The with statement handles opening and closing the archive automatically, which keeps your code clean and prevents resource leaks. This pattern—using with for context management—will be your bread and butter when working with ZIP files.
What Are ZIP Files and Why Use Python?
ZIP is a widely supported archive format that combines file compression with a directory structure. Unlike raw compression formats (like GZIP), ZIP files are containers that can hold multiple files and folders while preserving their hierarchy and metadata. ZIP compression is lossless, meaning no data is lost during compression, and the format is supported natively on Windows, macOS, and Linux—no special software required.
You might ask: why not just use shell commands or GUI tools? Python offers several advantages. First, it lets you automate archival workflows inside your application. Second, you can process ZIP files without extracting them to disk, saving I/O overhead. Third, you get programmatic control over compression levels, passwords, and selective extraction. Fourth, your code becomes cross-platform instantly—the same script runs on any OS with Python.
Here’s how ZIP compares to other formats:
| Format | Compression Ratio | Multiple Files | Directories | Password Support | Platform Support |
|---|---|---|---|---|---|
| ZIP | Good | Yes | Yes | Yes | Universal |
| TAR + GZIP | Excellent | Yes | Yes | No | Unix/Linux |
| 7-Zip | Excellent | Yes | Yes | Yes | Limited |
| RAR | Good | Yes | Yes | Yes | Limited |
ZIP strikes a sweet spot: it’s universally recognized, compresses reasonably well, and requires no external dependencies in Python. The standard library’s zipfile module gives you everything you need for most real-world scenarios.
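One of those advantages, working with archives entirely in memory, is worth a quick demonstration before we move on. This sketch builds and reads a ZIP inside an `io.BytesIO` buffer, so nothing ever touches disk:

```python
# in_memory_zip.py
import io
import zipfile

# Build a ZIP archive entirely in memory -- no temporary files on disk
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('report.txt', 'Quarterly numbers look good.')

# The same buffer can be read back (or sent over the network) directly
buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as zf:
    print(zf.read('report.txt').decode('utf-8'))
```

This pattern is handy for web services that generate downloads on the fly or process uploaded archives without writing temporary files.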
Creating ZIP Files from Scratch
The most common task is creating a ZIP file from existing files on disk. The ZipFile class handles this elegantly. You instantiate it with a filename and a mode ('w' for write), then add files using write():
```python
# create_archive.py
import zipfile

# Create a ZIP file (pass ZIP_DEFLATED explicitly; the default is ZIP_STORED)
with zipfile.ZipFile('my_archive.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write('data.txt')
    zf.write('config.json')
    zf.write('README.md')

# Verify the contents
with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    print('Files in archive:')
    for filename in zf.namelist():
        info = zf.getinfo(filename)
        print(f'  {filename} ({info.file_size} bytes)')
```

```
Files in archive:
  data.txt (142 bytes)
  config.json (89 bytes)
  README.md (256 bytes)
```
The namelist() method returns a list of all files in the archive, and getinfo() retrieves metadata like the original file size. Notice that the files are stored with their bare names—no directory paths. If you want to preserve directory structure, you need to be explicit about it:
```python
# preserve_structure.py
import zipfile

with zipfile.ZipFile('my_archive.zip', 'w') as zf:
    # Add files with their directory paths
    zf.write('src/main.py', arcname='src/main.py')
    zf.write('src/utils.py', arcname='src/utils.py')
    zf.write('data/config.txt', arcname='data/config.txt')

# Read and display structure
with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    zf.printdir()
```

```
File Name                                 Modified             Size
src/main.py                        2026-04-05 10:23:14         1024
src/utils.py                       2026-04-05 10:23:14          512
data/config.txt                    2026-04-05 10:23:14          256
```
The arcname parameter sets the path inside the archive, allowing you to organize files hierarchically. You can also add entire directories recursively:
```python
# add_directory.py
import zipfile
import os

def add_directory(zipf, directory_path, archive_path=''):
    """Recursively add a directory to the ZIP file"""
    for root, dirs, files in os.walk(directory_path):
        for file in files:
            file_path = os.path.join(root, file)
            arcname = os.path.join(archive_path, os.path.relpath(file_path, directory_path))
            zipf.write(file_path, arcname)

with zipfile.ZipFile('project.zip', 'w') as zf:
    add_directory(zf, 'my_project', 'my_project')
    print(f'Created project.zip with {len(zf.namelist())} files')
```

```
Created project.zip with 47 files
```
Reading and Extracting ZIP Files
Once you have a ZIP file, you’ll need to read its contents and extract files. Python gives you fine-grained control over this process:
```python
# read_archive.py
import zipfile

with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    # Get list of all files
    all_files = zf.namelist()
    print(f'Total files: {len(all_files)}')

    # Read a specific file into memory
    content = zf.read('config.json')
    print(f'Config content type: {type(content)}')
    print(f'Config data: {content.decode("utf-8")}')

    # Get file info
    info = zf.getinfo('data.txt')
    print(f'Compressed size: {info.compress_size}')
    print(f'Uncompressed size: {info.file_size}')
    print(f'Compression ratio: {100 * info.compress_size / info.file_size:.1f}%')
```

```
Total files: 3
Config content type: <class 'bytes'>
Config data: {"setting": "value"}
Compressed size: 45
Uncompressed size: 89
Compression ratio: 50.6%
```
The read() method loads files into memory as bytes, which is efficient for small files but memory-intensive for large ones. For extracting all files to disk, use extractall():
```python
# extract_all.py
import zipfile
import os

with zipfile.ZipFile('my_archive.zip', 'r') as zf:
    # Extract everything to a directory
    zf.extractall('output_folder')

# Verify extraction
for root, dirs, files in os.walk('output_folder'):
    for file in files:
        filepath = os.path.join(root, file)
        print(filepath)
```

```
output_folder/data.txt
output_folder/config.json
output_folder/README.md
```
For large files or streaming use cases, open() lets you read files as file-like objects without loading them entirely into memory:
```python
# stream_large_file.py
import zipfile

with zipfile.ZipFile('archive.zip', 'r') as zf:
    # Open a file for streaming
    with zf.open('large_video.mp4') as f:
        # Process in chunks
        chunk_size = 8192
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Process chunk (e.g., write to disk, compute hash)
            print(f'Processed {len(chunk)} bytes')
```

```
Processed 8192 bytes
Processed 8192 bytes
Processed 7456 bytes
```
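If you want to do something concrete with those chunks, here is a small sketch that computes a SHA-256 checksum of a single archive member while streaming. The archive and member names are placeholders, and `member_sha256` is a hypothetical helper, not part of the zipfile API:

```python
# stream_hash.py
import hashlib
import zipfile

def member_sha256(zip_path, member, chunk_size=8192):
    """Compute the SHA-256 of one archive member without loading it whole."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(zip_path, 'r') as zf:
        with zf.open(member) as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                digest.update(chunk)
    return digest.hexdigest()
```

Memory use stays bounded by `chunk_size` regardless of how large the member is.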
Adding Files to Existing Archives
Sometimes you need to add files to an archive that already exists. Use the 'a' (append) mode to open an existing ZIP file and add new content:
```python
# append_to_archive.py
import zipfile
from datetime import datetime

# Create initial archive
with zipfile.ZipFile('log_archive.zip', 'w') as zf:
    zf.writestr('startup.log', 'Application started at 10:00 AM')

# Later, append new log data
with zipfile.ZipFile('log_archive.zip', 'a') as zf:
    timestamp = datetime.now().isoformat()
    zf.writestr('runtime.log', f'Running at {timestamp}')
    zf.writestr('shutdown.log', 'Application stopped at 11:30 AM')

# Verify all entries
with zipfile.ZipFile('log_archive.zip', 'r') as zf:
    for name in zf.namelist():
        print(name)
```

```
startup.log
runtime.log
shutdown.log
```
The writestr() method adds string content directly without needing a file on disk. This is perfect for generating content on the fly, such as logs, reports, or dynamically created data. You can also add binary data the same way:
```python
# add_binary_content.py
import zipfile
import json

with zipfile.ZipFile('data.zip', 'w') as zf:
    # Add JSON data as a string
    user_data = {'name': 'Alice', 'role': 'Engineer', 'level': 5}
    zf.writestr('users.json', json.dumps(user_data, indent=2))

    # Add binary data
    binary_data = bytes([0x89, 0x50, 0x4E, 0x47])  # PNG header
    zf.writestr('image.bin', binary_data)

print('Archive created with mixed content types')
```

```
Archive created with mixed content types
```
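One gotcha worth knowing about append mode: writing a name that already exists adds a second entry rather than replacing the first (Python emits a UserWarning when this happens). A quick sketch of the behavior:

```python
# duplicate_names.py
import zipfile

with zipfile.ZipFile('dup.zip', 'w') as zf:
    zf.writestr('status.txt', 'v1')

# Appending the same name adds a SECOND entry; it does not replace the first
with zipfile.ZipFile('dup.zip', 'a') as zf:
    zf.writestr('status.txt', 'v2')  # triggers a "Duplicate name" UserWarning

with zipfile.ZipFile('dup.zip', 'r') as zf:
    print(zf.namelist())                          # both entries are present
    print(zf.read('status.txt').decode('utf-8'))  # read() resolves to the last entry
```

If you need true replacement, rebuild the archive, as discussed in the FAQ at the end of this tutorial.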
Extracting Specific Files Without Unpacking Everything
When working with large archives, extracting everything to disk can be wasteful. You might need only a single configuration file or a subset of data. The zipfile module lets you extract exactly what you need:
```python
# selective_extraction.py
import zipfile

with zipfile.ZipFile('large_archive.zip', 'r') as zf:
    # Extract one file
    zf.extract('critical_config.json', path='configs')

    # Extract multiple specific files
    files_needed = ['user_list.csv', 'permissions.txt', 'system.log']
    for filename in files_needed:
        if filename in zf.namelist():
            zf.extract(filename, path='output')
        else:
            print(f'Warning: {filename} not found in archive')

print('Selective extraction complete')
```

```
Selective extraction complete
```
You can also check what files are in the archive before extracting, which is helpful for validating archives or building conditional logic:
```python
# validate_and_extract.py
import zipfile

def is_safe_archive(zipf_path, max_files=1000, max_size_mb=500):
    """Validate archive before extraction"""
    with zipfile.ZipFile(zipf_path, 'r') as zf:
        # Check number of files
        if len(zf.namelist()) > max_files:
            return False, f'Archive contains too many files ({len(zf.namelist())})'
        # Check total uncompressed size (prevent zip bombs)
        total_size = sum(info.file_size for info in zf.infolist())
        if total_size > max_size_mb * 1024 * 1024:
            return False, f'Archive is too large ({total_size / (1024*1024):.1f} MB)'
    return True, 'Archive is safe'

# Validate before extracting
is_safe, message = is_safe_archive('archive.zip')
print(f'Validation: {message}')
if is_safe:
    with zipfile.ZipFile('archive.zip', 'r') as zf:
        zf.extractall('output')
```

```
Validation: Archive is safe
```
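Beyond size limits, the standard library offers two quick integrity checks: zipfile.is_zipfile() confirms a file really is a ZIP, and testzip() CRC-verifies every member. A small sketch combining them (`check_archive` is a hypothetical helper, not a stdlib function):

```python
# integrity_check.py
import zipfile

def check_archive(path):
    """Run basic integrity checks before trusting an archive."""
    if not zipfile.is_zipfile(path):
        return f'{path} is not a ZIP file'
    with zipfile.ZipFile(path, 'r') as zf:
        # testzip() CRC-checks every member; returns the first bad name, or None
        bad = zf.testzip()
        if bad is not None:
            return f'Corrupt member: {bad}'
    return 'OK'
```

Note that testzip() reads and decompresses every member, so it can be slow on very large archives.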
Working with Password-Protected ZIP Files
For sensitive data, ZIP archives can be encrypted with passwords. Python’s zipfile module supports reading encrypted archives out of the box; creating them is another story, as we’ll see in a moment. First, reading:
```python
# read_encrypted.py
import zipfile

# Read a password-protected archive
password = b'my_secret_password'
with zipfile.ZipFile('secure_archive.zip', 'r') as zf:
    # Set the default password for the archive
    zf.setpassword(password)
    # Extract files (they'll be decrypted automatically)
    zf.extractall('secure_output')
    # Or read a specific file
    content = zf.read('secret.txt', pwd=password)
    print(content.decode('utf-8'))
```

```
This is a secret message
```
Important: pwd must be bytes, not a string. Be aware, too, that the standard library can only read encrypted archives; it cannot create them. The legacy ZipCrypto scheme it decrypts is also weak, suitable for casual protection but not for highly sensitive data. The next example shows what actually happens if you call setpassword() on an archive opened for writing:
```python
# no_write_encryption.py
import zipfile

with zipfile.ZipFile('secure_archive.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    # setpassword() here only sets a default password for later reads;
    # it does NOT encrypt the files being written
    zf.setpassword(b'my_secret_password')
    zf.writestr('secret.txt', 'Confidential information')

# The content can be read back without any password at all
with zipfile.ZipFile('secure_archive.zip', 'r') as zf:
    print('Files in archive:', zf.namelist())
    print(zf.read('secret.txt').decode('utf-8'))
```

```
Files in archive: ['secret.txt']
Confidential information
```
To create genuinely password-protected archives from Python, reach for a third-party library such as pyzipper (which supports AES-256) or pyminizip. For production systems, consider encrypting sensitive data before zipping, or use an alternative format such as an encrypted container.
Choosing Compression Algorithms and Levels
The zipfile module supports multiple compression methods, each with different trade-offs between compression ratio and speed:
```python
# compression_comparison.py
import zipfile
import os

test_file = 'large_data.txt'

# Create test data
with open(test_file, 'w') as f:
    f.write('The quick brown fox jumps over the lazy dog. ' * 10000)

original_size = os.path.getsize(test_file)

# Test different compression methods
methods = [
    (zipfile.ZIP_STORED, 'Stored (no compression)'),
    (zipfile.ZIP_DEFLATED, 'DEFLATE (default)')
]

results = []
for method, description in methods:
    archive_name = f'archive_{description.replace(" ", "_")}.zip'
    with zipfile.ZipFile(archive_name, 'w', method) as zf:
        zf.write(test_file)
    archive_size = os.path.getsize(archive_name)
    ratio = 100 * archive_size / original_size
    results.append({
        'method': description,
        'size': archive_size,
        'ratio': ratio
    })
    print(f'{description}: {archive_size} bytes ({ratio:.1f}%)')

# Cleanup
os.remove(test_file)
```

```
Stored (no compression): 458234 bytes (100.0%)
DEFLATE (default): 45823 bytes (10.0%)
```
The ZIP_DEFLATED method uses the DEFLATE algorithm, which offers excellent compression for text and code. Note that ZipFile’s actual default is ZIP_STORED (no compression), so pass ZIP_DEFLATED explicitly whenever you want compression. ZIP_STORED is useful mainly for files that are already compressed (like images), where re-compressing wastes CPU. The module also supports ZIP_BZIP2 and ZIP_LZMA, which usually compress better but run slower and are less widely supported by other ZIP tools. You can control the compression level when using DEFLATE:
```python
# compression_level.py
import zipfile
import os
import time

with open('test.txt', 'w') as f:
    f.write('Sample data. ' * 50000)

for level in [0, 1, 6, 9]:
    start = time.time()
    with zipfile.ZipFile(f'test_level_{level}.zip', 'w', zipfile.ZIP_DEFLATED, compresslevel=level) as zf:
        zf.write('test.txt')
    elapsed = time.time() - start
    size = os.path.getsize(f'test_level_{level}.zip')
    print(f'Level {level}: {size} bytes in {elapsed:.3f}s')
```

```
Level 0: 645234 bytes in 0.002s
Level 1: 89234 bytes in 0.015s
Level 6: 78923 bytes in 0.045s
Level 9: 78234 bytes in 0.089s
```
Higher levels give better compression but take longer. Level 6 is usually the sweet spot for production use—it offers 95% of the compression benefit with a fraction of the time cost.
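The ZIP_STORED advice above also suggests a per-file strategy: skip the DEFLATE pass for formats that are already compressed. Here is a sketch; `smart_write` and the extension set are illustrative choices, not a standard recipe:

```python
# per_file_compression.py
import os
import zipfile

# File types that are already compressed and gain nothing from DEFLATE
ALREADY_COMPRESSED = {'.png', '.jpg', '.gz', '.zip', '.mp4'}

def smart_write(zf, path, arcname=None):
    """Store already-compressed files as-is; DEFLATE everything else."""
    ext = os.path.splitext(path)[1].lower()
    method = zipfile.ZIP_STORED if ext in ALREADY_COMPRESSED else zipfile.ZIP_DEFLATED
    zf.write(path, arcname, compress_type=method)
```

The compress_type argument to write() overrides the archive-wide default on a per-member basis, so one archive can freely mix methods.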
Real-World Example: Building a Backup Manager
Let’s build a practical backup system that demonstrates multiple concepts together:
```python
# backup_manager.py
import zipfile
import os
import json
from datetime import datetime
from pathlib import Path

class BackupManager:
    """Manages incremental backups with metadata tracking"""

    def __init__(self, backup_dir='./backups'):
        self.backup_dir = Path(backup_dir)
        self.backup_dir.mkdir(exist_ok=True)
        self.manifest_file = self.backup_dir / 'manifest.json'
        self.load_manifest()

    def load_manifest(self):
        """Load backup history"""
        if self.manifest_file.exists():
            with open(self.manifest_file, 'r') as f:
                self.manifest = json.load(f)
        else:
            self.manifest = {'backups': []}

    def save_manifest(self):
        """Save backup history"""
        with open(self.manifest_file, 'w') as f:
            json.dump(self.manifest, f, indent=2)

    def create_backup(self, source_dir, backup_name=None):
        """Create a new backup of the source directory"""
        if backup_name is None:
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            backup_name = f'backup_{timestamp}'
        backup_path = self.backup_dir / f'{backup_name}.zip'
        file_count = 0
        total_size = 0
        with zipfile.ZipFile(backup_path, 'w', zipfile.ZIP_DEFLATED, compresslevel=6) as zf:
            for root, dirs, files in os.walk(source_dir):
                for file in files:
                    file_path = os.path.join(root, file)
                    arcname = os.path.relpath(file_path, source_dir)
                    zf.write(file_path, arcname)
                    file_count += 1
                    total_size += os.path.getsize(file_path)
        # Record in manifest
        backup_info = {
            'name': backup_name,
            'timestamp': datetime.now().isoformat(),
            'files': file_count,
            'uncompressed_size': total_size,
            'compressed_size': os.path.getsize(backup_path)
        }
        self.manifest['backups'].append(backup_info)
        self.save_manifest()
        return backup_path, backup_info

    def list_backups(self):
        """List all available backups"""
        for backup in self.manifest['backups']:
            ratio = 100 * backup['compressed_size'] / backup['uncompressed_size']
            print(f"{backup['name']}: {backup['files']} files, {ratio:.1f}% of original")

    def restore_backup(self, backup_name, restore_dir):
        """Restore a backup to a directory"""
        backup_path = self.backup_dir / f'{backup_name}.zip'
        if not backup_path.exists():
            raise FileNotFoundError(f'Backup {backup_name} not found')
        with zipfile.ZipFile(backup_path, 'r') as zf:
            zf.extractall(restore_dir)
        print(f'Restored {backup_name} to {restore_dir}')

# Usage example
if __name__ == '__main__':
    manager = BackupManager()

    # Create a backup
    backup_path, info = manager.create_backup('./my_project')
    print(f'Created backup: {backup_path}')
    print(f'Files: {info["files"]}, Compression: {100 * info["compressed_size"] / info["uncompressed_size"]:.1f}%')

    # List all backups
    manager.list_backups()

    # Restore if needed
    # manager.restore_backup('backup_20260405_143022', './restored_project')
```

```
Created backup: backups/backup_20260405_143022.zip
Files: 47, Compression: 23.4%
backup_20260405_143022: 47 files, 23.4% of original
```
This backup manager demonstrates several key techniques: directory traversal with os.walk(), metadata tracking with JSON, timestamp-based naming, compression statistics, and restoration capabilities. You can extend it further with incremental backups (only backing up files that changed), multiple backup retention policies, or automatic scheduled backups using the schedule library.
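As a starting point for the incremental-backup extension, here is a sketch of a change-detection helper. The `files_changed_since` name is hypothetical; it compares each file’s modification time against the previous backup’s timestamp:

```python
# incremental_sketch.py
import os

def files_changed_since(source_dir, last_backup_timestamp):
    """Yield paths of files modified after the given Unix timestamp."""
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_backup_timestamp:
                yield path
```

You would feed the yielded paths to zf.write() instead of walking the whole tree, and record the backup time in the manifest for the next run. Note that mtime-based detection misses permission-only changes; hashing is more robust but slower.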
Frequently Asked Questions
What’s a ZIP bomb and how do I protect against it?
A ZIP bomb is a malicious archive that expands to enormous size when extracted, potentially consuming all available disk space. For example, a 45 MB file might decompress to 45 GB. Protect yourself by validating archives before extraction: check the uncompressed size against available disk space, limit the number of files, and use timeouts for extraction operations. The validate_and_extract.py example earlier demonstrates this approach.
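To complement those size checks, you can also flag individual members with implausible expansion ratios. A sketch; `suspicious_members` and the ratio threshold are illustrative choices:

```python
# ratio_check.py
import zipfile

def suspicious_members(path, max_ratio=100):
    """Flag members whose uncompressed/compressed ratio looks like a zip bomb."""
    flagged = []
    with zipfile.ZipFile(path, 'r') as zf:
        for info in zf.infolist():
            if info.compress_size > 0 and info.file_size / info.compress_size > max_ratio:
                flagged.append(info.filename)
    return flagged
```

Legitimate data rarely deflates beyond roughly 10:1, so anything past 100:1 deserves scrutiny before extraction.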
Does the zipfile module handle symbolic links?
The zipfile module doesn’t preserve symbolic links by default—it follows them and backs up the actual files. If you need to preserve symlink information, you’ll need a different approach, such as using the tarfile module (which natively supports symlinks) or custom code that stores symlink metadata separately in the archive.
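To see the difference, here is a minimal tarfile sketch, assuming a Unix-like OS where symlinks are available (the file names are made up):

```python
# tar_symlinks.py
import os
import tarfile

# Set up a file and a symlink pointing at it
with open('real_config.txt', 'w') as f:
    f.write('setting = on')
if not os.path.lexists('config_link'):
    os.symlink('real_config.txt', 'config_link')

# tarfile records the link itself rather than following it
with tarfile.open('links.tar', 'w') as tf:
    tf.add('config_link')

with tarfile.open('links.tar', 'r') as tf:
    member = tf.getmember('config_link')
    print(member.issym(), member.linkname)
```

On extraction, tarfile recreates the symlink instead of duplicating the target’s contents, which zipfile would do.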
How do I handle very large files (multi-GB)?
For large files, use the streaming approach with zf.open() to read archive members in chunks without loading them entirely into memory. When creating archives, pass file paths to write() so data streams from disk, rather than reading whole files into memory and using writestr(). For extremely large archives, consider splitting them into multiple ZIP files or using tar+gzip instead.
Can I modify files inside a ZIP without re-creating it?
The zipfile module doesn’t support in-place modification of individual files. To modify a file, you must create a new archive, copy over unchanged files, and write the modified file. Alternatively, extract everything, make changes, and re-create the archive. This is a limitation of the ZIP format itself.
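The copy-everything approach can be wrapped in a small helper. This sketch (`replace_member` is a hypothetical name) rebuilds the archive into a temp file, then atomically swaps it in:

```python
# replace_member.py
import os
import zipfile

def replace_member(zip_path, member, new_data):
    """Rebuild the archive with one member's content swapped out."""
    tmp_path = zip_path + '.tmp'
    with zipfile.ZipFile(zip_path, 'r') as src, \
         zipfile.ZipFile(tmp_path, 'w', zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == member:
                dst.writestr(member, new_data)
            else:
                # Passing the ZipInfo preserves the original metadata
                dst.writestr(item, src.read(item.filename))
    os.replace(tmp_path, zip_path)  # atomically swap in the rebuilt archive
```

For large archives this still copies every byte, so batch your changes rather than calling it once per file.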
How do I ensure archives are portable across Windows, macOS, and Linux?
Use forward slashes in archive paths (even on Windows), avoid characters that are illegal on some filesystems (like colons), normalize line endings in text files, and store file permissions with external_attr if needed. The code examples in this tutorial use os.path and os.walk(), which handle platform differences automatically.
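Storing permissions with external_attr looks like this: the Unix mode goes in the upper 16 bits of the field (the file name and mode here are just examples):

```python
# portable_permissions.py
import zipfile

# Build a ZipInfo by hand so we control the metadata
info = zipfile.ZipInfo('scripts/run.sh', date_time=(2026, 4, 5, 10, 0, 0))
info.external_attr = 0o755 << 16  # Unix rwxr-xr-x in the upper 16 bits

with zipfile.ZipFile('portable.zip', 'w') as zf:
    zf.writestr(info, '#!/bin/sh\necho hello\n')

with zipfile.ZipFile('portable.zip', 'r') as zf:
    mode = zf.getinfo('scripts/run.sh').external_attr >> 16
    print(oct(mode))
```

On Unix-like systems, extract() restores these mode bits; Windows ignores them, which is exactly the cross-platform behavior you want.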
Conclusion
You now have a complete toolkit for working with ZIP files in Python. From creating simple archives to building sophisticated backup systems, the zipfile module handles everything without requiring external dependencies. Remember the key patterns: use with statements for resource safety, validate archives before extraction, stream large files to conserve memory, and choose compression levels based on your speed/size trade-offs.
For deeper dives, check the official Python zipfile documentation, which includes advanced features like comment handling, timestamp preservation, and cross-archive operations.