Python’s built-in sorting handles strings lexicographically — which means “file10.txt” sorts before “file2.txt” because “1” comes before “2” in ASCII. This is fine for computers, but deeply frustrating for anyone who has to look at a sorted file listing. Natural sort order — the order humans expect when they see numbers inside strings — is what natsort provides. It is the difference between a file manager that makes sense and one that scrambles your numbered files.
The natsort library is a pure-Python solution that detects numeric substrings inside strings and compares them as numbers, not as character sequences. It handles filenames, version strings, IP addresses, biological sequences, and locale-aware sorting with a clean, minimal API. No need to write custom sort key functions for every new case.
In this article, you will learn how natsort works, how to use natsorted() for common cases, sort paths and version numbers, use natural sort keys with sort() and sorted(), handle case-insensitive and locale-aware sorting, and build a practical file organizer that uses natural sort to group and display files. Install natsort with pip install natsort.
Natural Sorting: Quick Example
# natsort_quick.py
from natsort import natsorted
files = ['file10.txt', 'file2.txt', 'file1.txt', 'file20.txt', 'file3.txt']
# Default Python sort -- lexicographic (wrong order for humans)
print("Default sort:", sorted(files))
# Natural sort -- numbers compared as integers
print("Natural sort:", natsorted(files))
Output:
Default sort: ['file1.txt', 'file10.txt', 'file2.txt', 'file20.txt', 'file3.txt']
Natural sort: ['file1.txt', 'file2.txt', 'file3.txt', 'file10.txt', 'file20.txt']
One import and one function call are all it takes to fix the ordering. natsorted() returns a new list, just like sorted(). The rest of this article covers what natsort can do beyond basic filename sorting.
What Is Natural Sort Order?
Natural sort order is a collation algorithm that treats numeric substrings within strings as actual numbers when comparing. Instead of comparing character by character (‘2’ vs ‘1’ vs ‘0’), it identifies contiguous digit sequences, converts them to integers, and compares those integers. This produces an ordering that matches human intuition.
| Input | Lexicographic (default) | Natural sort (natsort) |
|---|---|---|
| file1, file2, file10 | file1, file10, file2 | file1, file2, file10 |
| v1.2, v1.10, v1.9 | v1.10, v1.2, v1.9 | v1.2, v1.9, v1.10 |
| Chapter 1, Chapter 2, Chapter 10 | Chapter 1, Chapter 10, Chapter 2 | Chapter 1, Chapter 2, Chapter 10 |
| img_001.jpg, img_010.jpg, img_100.jpg | Correct (leading zeros) | Correct (numeric value) |
The key insight: lexicographic sorting is correct for most string data (names, words, identifiers), but breaks down whenever numeric sequences appear within strings. natsort detects these sequences and applies numeric comparison only where it is appropriate, leaving the rest of the string to standard character comparison.
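To make the idea concrete, here is a minimal standard-library sketch of a natural sort key. It is an illustration of the technique described above, not natsort's actual implementation: the helper name natural_key is hypothetical, and the sketch ignores signs, floats, case, and locale, all of which natsort handles through its algorithm flags.

```python
import re

# Hypothetical helper for illustration only; not part of natsort's API.
_DIGITS = re.compile(r"(\d+)")


def natural_key(text: str) -> list:
    """Split text into alternating string and integer chunks."""
    parts = _DIGITS.split(text)
    # With one capturing group, re.split places digit runs at odd indices,
    # so strings are only ever compared with strings and ints with ints.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


files = ["file10.txt", "file2.txt", "file1.txt"]
print(natural_key("file10.txt"))       # ['file', 10, '.txt']
print(sorted(files, key=natural_key))  # ['file1.txt', 'file2.txt', 'file10.txt']
```

Because every key alternates string, integer, string, and so on, Python's ordinary list comparison does the rest: text chunks compare character by character and digit chunks compare by value.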
Using natsorted() for Common Cases
The main entry point is natsorted(), which accepts any iterable and optional keyword arguments to control sorting behavior:
# natsort_basics.py
from natsort import natsorted, ns
# Version strings
versions = ['1.10.2', '1.9.0', '2.0.1', '1.2.3', '1.10.0']
print("Versions:", natsorted(versions))
# Mixed strings with numbers and text
items = ['item_5', 'item_12', 'item_3', 'item_10', 'item_1']
print("Items:", natsorted(items))
# Case-insensitive natural sort
mixed_case = ['File10.txt', 'file2.TXT', 'FILE1.txt', 'File20.txt']
print("Case-insensitive:", natsorted(mixed_case, alg=ns.IGNORECASE))
# Reverse natural sort
print("Reversed:", natsorted(versions, reverse=True))
# Sort with a key function (e.g., sort tuples by second element)
data = [('b', 'v10'), ('a', 'v2'), ('c', 'v1'), ('d', 'v20')]
print("By version:", natsorted(data, key=lambda x: x[1]))
Output:
Versions: ['1.2.3', '1.9.0', '1.10.0', '1.10.2', '2.0.1']
Items: ['item_1', 'item_3', 'item_5', 'item_10', 'item_12']
Case-insensitive: ['FILE1.txt', 'file2.TXT', 'File10.txt', 'File20.txt']
Reversed: ['2.0.1', '1.10.2', '1.10.0', '1.9.0', '1.2.3']
By version: [('c', 'v1'), ('a', 'v2'), ('b', 'v10'), ('d', 'v20')]
Sorting File Paths
File path sorting has an extra wrinkle: you usually want to sort directory components separately from filenames, and you want numbers within both components to sort naturally. natsort’s PATH algorithm handles this correctly:
# natsort_paths.py
from natsort import natsorted, ns
from pathlib import Path
# Simulate a file listing with numbered paths
paths = [
    '/data/project10/output2.csv',
    '/data/project2/output10.csv',
    '/data/project1/output1.csv',
    '/data/project2/output2.csv',
    '/data/project10/output1.csv',
]
# PATH algorithm: sorts directory components and filenames separately
sorted_paths = natsorted(paths, alg=ns.PATH)
for p in sorted_paths:
    print(p)
Output:
/data/project1/output1.csv
/data/project2/output2.csv
/data/project2/output10.csv
/data/project10/output1.csv
/data/project10/output2.csv
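Without ns.PATH, path separators and extension dots are compared as ordinary characters, so a name like folder (1)/file.txt can sort ahead of folder/file.txt. The PATH algorithm splits on path separators and file extensions first, then applies natural sort within each component, giving you the result a file manager would show. For the simple listing above, the default algorithm happens to produce the same order; the difference appears once directory names share a prefix.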
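The effect of splitting on path separators can be sketched with the standard library alone. This is a rough approximation of the idea, not natsort's ns.PATH implementation (it does not split file extensions, for instance), and the helpers natural_key and path_key are hypothetical names introduced for the example.

```python
import re
from pathlib import PurePosixPath

_DIGITS = re.compile(r"(\d+)")


def natural_key(text: str) -> list:
    """Alternating string/integer chunks (hypothetical helper)."""
    parts = _DIGITS.split(text)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def path_key(path: str) -> list:
    """One natural key per path component (hypothetical helper)."""
    return [natural_key(part) for part in PurePosixPath(path).parts]


paths = [
    "folder (1)/file.txt",
    "folder/file.txt",
    "folder (10)/file.txt",
    "folder (2)/file.txt",
]

# Whole-string key: the space in "folder (" sorts before "/",
# so the plain "folder" directory ends up last.
whole = sorted(paths, key=natural_key)
print(whole[-1])  # folder/file.txt

# Per-component key: "folder" is compared as a complete directory name,
# so it comes first, followed by (1), (2), (10).
split = sorted(paths, key=path_key)
print(split[0])   # folder/file.txt
```

Comparing component by component means a directory name is never compared against the separator or the filename that follows it, which is why the plain folder moves to the front.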
Using natsort_keygen() with sort()
If you need to sort an existing list in place (rather than getting a new sorted list), use natsort_keygen() to get a key function you can pass to list.sort():
# natsort_key.py
from natsort import natsort_keygen, ns
# Sort a list in place
logs = ['error_10.log', 'error_2.log', 'error_1.log', 'error_100.log']
key = natsort_keygen()
logs.sort(key=key)
print("Sorted logs:", logs)
# Use as a sort key in more complex operations
students = [
    {'name': 'Alice', 'grade': 'Grade 10'},
    {'name': 'Bob', 'grade': 'Grade 2'},
    {'name': 'Carol', 'grade': 'Grade 1'},
    {'name': 'Dave', 'grade': 'Grade 20'},
]
grade_key = natsort_keygen(key=lambda s: s['grade'])
students.sort(key=grade_key)
for s in students:
    print(f" {s['grade']}: {s['name']}")
Output:
Sorted logs: ['error_1.log', 'error_2.log', 'error_10.log', 'error_100.log']
Grade 1: Carol
Grade 2: Bob
Grade 10: Alice
Grade 20: Dave
Real-Life Example: File Report Generator
This script scans a directory, groups files by type, sorts them naturally within each group, and produces a clean report with file sizes:
# file_report.py
from pathlib import Path
from natsort import natsorted, ns
from collections import defaultdict

def format_size(bytes_count: int) -> str:
    """Format bytes as human-readable size."""
    for unit in ['B', 'KB', 'MB', 'GB']:
        if bytes_count < 1024:
            return f"{bytes_count:.1f} {unit}"
        bytes_count /= 1024
    return f"{bytes_count:.1f} TB"

def generate_file_report(folder: str) -> None:
    base = Path(folder)
    if not base.exists():
        print(f"Folder not found: {folder}")
        return
    # Group files by extension
    groups = defaultdict(list)
    for path in base.rglob('*'):
        if path.is_file():
            ext = path.suffix.lower() or '.no_ext'
            groups[ext].append(path)
    # Sort extension groups naturally, then files within each group
    sorted_exts = natsorted(groups.keys())
    total_files = 0
    total_size = 0
    for ext in sorted_exts:
        files = natsorted(groups[ext], key=lambda p: p.name, alg=ns.PATH)
        ext_size = sum(f.stat().st_size for f in files)
        print(f"\n{ext} ({len(files)} files, {format_size(ext_size)})")
        print("-" * 40)
        for f in files[:5]:  # Show first 5 to keep output manageable
            size = f.stat().st_size
            rel = f.relative_to(base)
            print(f" {str(rel):<35} {format_size(size):>8}")
        if len(files) > 5:
            print(f" ... and {len(files) - 5} more")
        total_files += len(files)
        total_size += ext_size
    print(f"\nTotal: {total_files} files, {format_size(total_size)}")

generate_file_report('./my_project')
Output:
.csv (3 files, 48.2 KB)
----------------------------------------
 data/output1.csv                     12.1 KB
 data/output2.csv                     18.4 KB
 data/output10.csv                    17.7 KB

.py (5 files, 22.8 KB)
----------------------------------------
 src/module1.py                        4.2 KB
 src/module2.py                        5.1 KB
 src/module10.py                       6.8 KB
 src/module11.py                       3.9 KB
 src/utils.py                          2.8 KB

Total: 8 files, 71.0 KB
Frequently Asked Questions
Does natsort handle floating-point numbers in strings?
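Yes -- use the ns.FLOAT algorithm flag: natsorted(items, alg=ns.FLOAT). This treats sequences like 3.14 and 2.71 as floating-point numbers rather than two separate integer sequences. You can combine flags: ns.FLOAT | ns.IGNORECASE. Do not use ns.FLOAT for dotted version strings, where 1.10 would be read as the number 1.1; the default integer-based algorithm already orders versions correctly, as the versions example earlier in this article shows. Older natsort releases offered ns.VERSION (or ns.V) as an alias for that default behavior, but it was deprecated and is absent from current releases.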
How do I sort with locale-aware character ordering?
Use natsorted(items, alg=ns.LOCALE) after calling locale.setlocale(locale.LC_ALL, '') to set the system locale. This sorts accented characters (like é, à, ö in European languages) according to the locale's collation rules rather than Unicode code point order. This is important for sorting names in languages where a and ä are considered equal or adjacent in the alphabet.
Can I use natsort with pandas DataFrames?
Yes. Use natsort_keygen() as the key parameter in pandas sort_values(): df.sort_values('column', key=lambda s: s.map(natsort_keygen())). For sorting DataFrame row labels (index), use df.reindex(natsorted(df.index)). The natsort documentation has a dedicated pandas section with more examples for multi-column natural sorting.
Does natsort handle negative numbers?
By default, natsort does not treat a leading minus sign as part of a number -- "file-10.txt" sorts as the string "file-" followed by natural sort on "10.txt". To enable signed number handling, use ns.SIGNED: natsorted(items, alg=ns.SIGNED). This is useful for data with negative values in filenames or log entries, but can produce unexpected results if hyphens are used as separators in non-numeric contexts.
Is natsort fast enough for large lists?
natsort is implemented in pure Python with optional C acceleration via the fastnumbers package. Install it with pip install fastnumbers to speed up numeric-heavy data; the size of the gain depends on your data and the versions involved, so measure it on your own workload. For most real-world use cases (thousands of filenames, hundreds of version strings), natsort's performance is more than adequate. If you are sorting millions of strings, profile first -- the Python overhead may matter more than the sorting algorithm itself.
Conclusion
You now know how to use natsort to make string sorting behave the way humans expect. We covered natsorted() for common cases, the PATH algorithm for file system paths, natsort_keygen() for in-place sorting and complex key functions, case-insensitive and locale-aware sorting with ns flags, and a practical file report generator that groups and naturally sorts files by extension.
The key takeaway is simple: any time you are sorting strings that contain numbers -- filenames, version strings, log entries, chapter titles, IP addresses -- reach for natsort instead of writing a custom key function. It handles the edge cases (floats, negative numbers, leading zeros, locale) that custom one-liners miss.
See the natsort documentation for the full list of algorithm flags, pandas integration examples, and performance tips.