Beginner
The simplest way to store data from Python is a plain file, which is often, but does not have to be, text based. You do not need any libraries: Python's built-in functions can open and write to a file very easily.
File storage has many use cases and is usually the “go to” method when hacking together a quick solution or prototype. Files are also arguably a good solution for some production use cases.
Overview of storing data in files in Python
The typical use cases have the following in common:
- Setup: No setup is required for files. You can even create the file from Python.
- Volume: Small-ish file sizes (< 5-10 MB). You can go larger, of course, if your application is not doing heavy reads or writes and does not need fast responses (e.g. batch processing).
- Record access: You do not need to search within the file to extract just a portion of the records. You load or save all the data in the file in one go.
- Data writes: You either append to the file, or read and rewrite all of the data in the file.
- Write reliability: You do not need multiple writes at the same time: only one person is likely to be writing at any one time, and if several people did write at once, the consequences would not be serious for your application. There are ways to put a lock on a file to prevent conflicts, but if you need them you should double-check whether a file is the right option for you.
- Data formats: You may have structured, record-based data (such as comma-separated values, CSV, or tab-delimited) or less rigidly structured data (e.g. a text document or JSON). You can also store binary data in a file, e.g. for images.
- Editability: You may want, or allow, the file to be edited directly by other applications or by people.
- Redundancy: There is no built-in redundancy. If anything fails (the data is corrupted, the server holding the file fails), you are out of luck. You need to set up your own mechanisms (e.g. automatically replicating the file to another server).
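To make the "Data formats" point in the list above concrete, here is a minimal sketch that saves the same records once as CSV and once as JSON using only the standard library. The record values, field names, and file names are hypothetical, chosen purely for illustration, and the files go into a temporary directory.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical sample records, used only for illustration.
records = [
    {"country": "Japan", "population_millions": 125},
    {"country": "Australia", "population_millions": 26},
]

workdir = Path(tempfile.mkdtemp())  # scratch directory so nothing real is overwritten

# Structured, record-based: one CSV row per record.
csv_path = workdir / "population.csv"
with open(csv_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["country", "population_millions"])
    writer.writeheader()
    writer.writerows(records)

# Document-style: the whole structure saved in one go as JSON.
json_path = workdir / "population.json"
with open(json_path, "w", encoding="utf-8") as f:
    json.dump(records, f, indent=2)

with open(csv_path, newline="", encoding="utf-8") as f:
    csv_rows = list(csv.DictReader(f))  # every CSV value comes back as a string

with open(json_path, encoding="utf-8") as f:
    json_rows = json.load(f)  # JSON keeps numbers as numbers
```

One practical difference to keep in mind: CSV hands every value back as a string, while JSON round-trips numbers, booleans, and nesting.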
Code examples to read and write to a file
Here are a few examples of writing to and reading from a file. It is very easy and does not require any libraries. The one thing to be mindful of is the mode in which you open the file: read, write, or read and write.
Open a text file for (over)writing:
To write to a file, pass the ‘w’ mode to the open() function. The available modes include:
- ‘r’: reading
- ‘w’: writing to a file (the file is created, or overwritten if it already exists)
- ‘a’: appending to the end of the file
- ‘r+’: reading and writing to the same file
- ‘x’: creating and writing to a new file (fails if the file already exists)
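file = open('population.txt', 'w')
file.write('Japan\n')  # write() does not add line breaks, so include '\n' yourself
file.write('United States\n')
file.write('Australia\n')
file.write('China')  # last row: no trailing newline in this example
file.close()  # file is released and closed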
You will then have the following output file of population.txt:
Japan
United States
Australia
China
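The same write can also be expressed with a with statement, which closes the file automatically. This is a minimal sketch rather than a replacement for the example above: the temporary directory is an assumption made so the demo does not touch your working folder, and here every row ends with a newline, the usual convention for text files.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "population.txt"  # scratch location for the demo
countries = ["Japan", "United States", "Australia", "China"]

# The with block closes the file even if write() raises part-way through.
with open(path, "w", encoding="utf-8") as file:
    for country in countries:
        file.write(country + "\n")

written = path.read_text(encoding="utf-8")
print(written, end="")
```

Because the file is closed as soon as the block ends, there is no file.close() call to forget.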
Open a text file fully for reading:
Using the same population.txt file created above –
file = open('population.txt', 'r')
data = file.read()  # read full contents of file into a single string
file.close()  # file is released and closed
print("*** file start ***")
print(data)
print("*** end file ***")
The output would be:
*** file start ***
Japan
United States
Australia
China
*** end file ***
To explain this a bit further: open() needs to be told how the file is to be opened, in this case with ‘r’ to indicate it is for reading. The other modes (‘w’, ‘a’, ‘r+’ and ‘x’) are the same ones listed in the writing example above.
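To see how the modes differ in practice, the following sketch runs ‘x’, ‘a’, ‘r+’ and ‘w’ against one scratch file. The file name modes_demo.txt and the temporary directory are assumptions made for the demo only.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "modes_demo.txt"  # hypothetical scratch file

with open(path, "x", encoding="utf-8") as f:   # 'x' creates the file, failing if it exists
    f.write("first\n")

try:
    open(path, "x", encoding="utf-8")           # second attempt: the file is already there
    exclusive_create_failed = False
except FileExistsError:
    exclusive_create_failed = True

with open(path, "a", encoding="utf-8") as f:   # 'a' keeps existing content, adds to the end
    f.write("second\n")

with open(path, "r+", encoding="utf-8") as f:  # 'r+' reads and writes without truncating
    before = f.read()
    f.write("third\n")                          # after read() the position is at the end

with open(path, "w", encoding="utf-8") as f:   # 'w' truncates: previous content is discarded
    f.write("replaced\n")

after = path.read_text(encoding="utf-8")
```

The main thing to take away is that ‘w’ silently discards whatever was in the file, so ‘a’ or ‘x’ is the safer choice when existing data must survive.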
Read a text file line by line:
file = open('population.txt', 'r')
data_list = file.readlines()  # read full contents of file into a list of rows
file.close()  # file is released and closed
print("*** file start ***")
counter = 0
for row in data_list:
    counter = counter + 1
    print(f"{counter}: {row.rstrip()}")  # rstrip() drops the newline kept at the end of each row
print("*** end file ***")
The output would be:
*** file start ***
1: Japan
2: United States
3: Australia
4: China
*** end file ***
The difference from the first example is that the data comes back as a list with one entry per line, so that you can process each row. Note that you can simplify the above with enumerate() to avoid setting up a separate counter variable, e.g.
print("*** file start ***")
for index, row in enumerate(data_list):
    print(f"{index+1}: {row.rstrip()}")  # Note that when using enumerate, the first index is 0
print("*** end file ***")
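As a further variation, you can loop over the file object itself instead of calling readlines(), so that only one row is held in memory at a time. This is a sketch under the assumption that the file looks like the population.txt example; it first writes a scratch copy into a temporary directory so it can run on its own.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "population.txt"  # scratch copy of the example file
path.write_text("Japan\nUnited States\nAustralia\nChina\n", encoding="utf-8")

numbered = []
with open(path, encoding="utf-8") as file:
    # The file object yields one line at a time, so the whole file is never loaded at once.
    for number, row in enumerate(file, start=1):  # start=1 removes the need for index+1
        numbered.append(f"{number}: {row.rstrip()}")

print("\n".join(numbered))
```

For a four-line file the difference is academic, but the same loop works unchanged on files too large to read in one go.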
Summary of writing and reading to a file
Reading and writing to a file is a very straightforward native operation in Python. There are many other related operations, ranging from putting a lock on a file to prevent two processes writing to it at once, to checking file attributes such as access and size. At its most basic, though, you can simply use the open() function to read and write, which will satisfy most of your needs.
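The attribute checks mentioned above can be sketched with pathlib and os from the standard library. The file names are hypothetical and the file lives in a temporary directory. Locking is left out because it is platform-specific.

```python
import os
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "population.txt"  # scratch file for the demo
path.write_text("Japan\n", encoding="utf-8")

exists = path.exists()                   # is there a file at this path?
size_bytes = path.stat().st_size         # size on disk in bytes
can_read = os.access(path, os.R_OK)      # does this process have read permission?
can_write = os.access(path, os.W_OK)     # does this process have write permission?
missing = (path.parent / "no_such_file.txt").exists()

print(f"exists={exists} size={size_bytes} readable={can_read} writable={can_write}")
```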
How To Use Python orjson for Fast JSON Processing
Intermediate
You have a Python service that parses JSON responses from an API thousands of times per second, and the standard json module is quietly becoming a bottleneck. At low traffic volumes this goes unnoticed, but once you scale up, milliseconds of serialization overhead compound into real latency. If you have ever profiled a Python web service and found json.dumps or json.loads sitting near the top of the flame graph, you already know this pain.
orjson is a fast, correct JSON library for Python written in Rust. It drops into nearly any codebase as a replacement for the standard json module and typically runs 2-10x faster on both serialization and deserialization. It also natively supports types the standard library forces you to handle manually — datetime, UUID, numpy arrays, and dataclasses.
In this article you will learn how to install orjson, serialize and deserialize JSON with it, use its built-in support for Python-native types, benchmark it against the standard library, and integrate it into a real-world FastAPI project. By the end you will have a working understanding of when and why to choose orjson over the alternatives.
orjson Quick Example
Before diving deep, here is a self-contained example that shows the core pattern. orjson is nearly a drop-in replacement for the standard json module, but dumps() returns bytes instead of str, while loads() accepts either bytes or str.
# quick_example.py
import orjson
from datetime import datetime
data = {
"name": "Alice",
"score": 98.6,
"logged_in": True,
"joined": datetime(2024, 3, 15, 9, 30, 0),
"tags": ["python", "backend","fast"]
}
# Serialize to bytes (not str like the standard json module)
encoded = orjson.dumps(data)
print(encoded)
print(type(encoded))
# Deserialize back to a Python dict
decoded = orjson.loads(encoded)
print(decoded["joined"]) # datetime is serialized as ISO 8601 string
print(type(decoded))
Output:
b'{"name":"Alice","score":98.6,"logged_in":true,"joined":"2024-03-15T09:30:00","tags":["python","backend","fast"]}'
<class 'bytes'>
2024-03-15T09:30:00
<class 'dict'>
Two things stand out right away. First, orjson.dumps() returns bytes, not a string — this is intentional and saves an unnecessary encoding step when writing to network sockets or files. Second, the datetime object is automatically serialized to ISO 8601 format without any extra work, which the standard json module would refuse to handle at all.
What Is orjson and Why Use It?
orjson is a Python JSON library implemented in Rust using the Serde framework. It was created specifically to address the performance limitations of Python’s built-in json module, which is implemented in C but still shows its age when processing large payloads at high throughput.
The key differences between orjson and the standard library are:
| Feature | Standard json | orjson |
|---|---|---|
| Output type of dumps() | str | bytes |
| datetime support | Raises TypeError | Native ISO 8601 |
| UUID support | Raises TypeError | Native string |
| dataclass support | Raises TypeError | Native dict-like |
| numpy array support | Not supported | Native (optional dep) |
| Performance (typical) | Baseline | 2-10x faster |
| Strict UTF-8 validation | No | Yes |
The Rust implementation takes advantage of SIMD instructions and a highly optimized Serde-based serialization pipeline. For applications doing heavy JSON processing — API gateways, caching layers, log aggregators — the improvement is measurable and often significant.
Installing orjson
orjson is available on PyPI and installs with a single command:
# install_orjson.sh
pip install orjson
Output:
Collecting orjson
Downloading orjson-3.10.x-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)
Successfully installed orjson-3.10.x
orjson ships as a pre-compiled binary for most platforms (Linux, macOS, Windows on x86-64 and ARM), so there is no Rust toolchain required. If you are on a less common platform you may need Rust installed to build from source. Verify the installation with a quick import check:
# verify_install.py
import orjson
print(orjson.__version__)
Output:
3.10.x
Serializing Python Objects with orjson.dumps()
The orjson.dumps() function converts Python objects to JSON bytes. The most important thing to remember is that it always returns bytes, not str. If you need a string, call .decode() on the result.
# serialization_basics.py
import orjson
from datetime import datetime, date
from uuid import UUID
from dataclasses import dataclass
@dataclass
class User:
    id: UUID
    name: str
    created: datetime
    active: bool
user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Bob Smith",
    created=datetime(2025, 1, 10, 14, 30),
    active=True
)
# Serialize the dataclass directly -- no custom encoder needed
result = orjson.dumps(user)
print(result)
# Decode to string if needed
print(result.decode("utf-8"))
Output:
b'{"id":"12345678-1234-5678-1234-567812345678","name":"Bob Smith","created":"2025-01-10T14:30:00","active":true}'
{"id":"12345678-1234-5678-1234-567812345678","name":"Bob Smith","created":"2025-01-10T14:30:00","active":true}
Notice that the UUID, datetime, and dataclass are all handled automatically with zero configuration. With the standard json module, each of these would raise a TypeError: Object of type X is not JSON serializable error, requiring a custom default function.
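For comparison, this is roughly what the standard library needs in order to serialize the same kinds of values. It is a sketch of one possible default function, not the only way to write it. The name to_jsonable is made up for this example, and the User class is a trimmed copy of the one above.

```python
import dataclasses
import json
from datetime import datetime
from uuid import UUID

@dataclasses.dataclass
class User:
    id: UUID
    name: str
    created: datetime

def to_jsonable(obj):
    """Fallback for json.dumps: called only for objects json cannot encode itself."""
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, UUID):
        return str(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Bob Smith",
    created=datetime(2025, 1, 10, 14, 30),
)

try:
    json.dumps(user)              # no default function: the dataclass is rejected
    plain_call_failed = False
except TypeError:
    plain_call_failed = True

encoded = json.dumps(user, default=to_jsonable)
print(encoded)
```

Every branch in to_jsonable is code you would have to write, test, and keep in sync yourself, which is the work orjson's native type support removes.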
orjson Options and Flags
orjson supports serialization options passed via the option parameter as bitwise-OR combinations of constants. These let you control formatting, sorting, and type handling:
# orjson_options.py
import orjson
data = {
"z_key": "last",
"a_key": "first",
"count": 42,
"ratio": 3.14159
}
# Pretty-print with indented output
pretty = orjson.dumps(data, option=orjson.OPT_INDENT_2)
print("Pretty:")
print(pretty.decode())
# Sort keys alphabetically
sorted_output = orjson.dumps(data, option=orjson.OPT_SORT_KEYS)
print("\nSorted keys:")
print(sorted_output.decode())
# Combine options with bitwise OR
both = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
print("\nPretty + Sorted:")
print(both.decode())
Output:
Pretty:
{
  "z_key": "last",
  "a_key": "first",
  "count": 42,
  "ratio": 3.14159
}
Sorted keys:
{"a_key":"first","count":42,"ratio":3.14159,"z_key":"last"}
Pretty + Sorted:
{
  "a_key": "first",
  "count": 42,
  "ratio": 3.14159,
  "z_key": "last"
}
The most useful options in practice are OPT_INDENT_2 for human-readable output during debugging, OPT_SORT_KEYS for deterministic output in tests or caches, OPT_NON_STR_KEYS for dicts with integer or float keys, and OPT_UTC_Z to use Z suffix instead of +00:00 for UTC datetimes.
Deserializing with orjson.loads()
The orjson.loads() function accepts both bytes and str input and returns Python objects. Unlike the standard library, it performs strict UTF-8 validation on input, which means malformed data fails loudly rather than silently corrupting your data.
# deserialization.py
import orjson
# From bytes (most common in API and network scenarios)
json_bytes = b'{"name": "Charlie", "score": 99.5, "tags": ["fast", "correct"]}'
data = orjson.loads(json_bytes)
print(data)
print(type(data["score"]))
# From string also works
json_str = '{"status": "ok", "count": 1000}'
data2 = orjson.loads(json_str)
print(data2)
# Error handling -- orjson raises JSONDecodeError for invalid input
try:
    orjson.loads(b'{"broken": }')
except orjson.JSONDecodeError as e:
    print(f"Parse error: {e}")
Output:
{'name': 'Charlie', 'score': 99.5, 'tags': ['fast', 'correct']}
<class 'float'>
{'status': 'ok', 'count': 1000}
Parse error: expected value at line 1 column 12
One important detail: orjson.JSONDecodeError is a subclass of json.JSONDecodeError, so any existing except blocks using json.JSONDecodeError will still catch orjson errors without modification. This makes the migration path from the standard library seamless.
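The subclass relationship means a single except clause can cover both libraries. The sketch below assumes you want orjson to be optional: it falls back to the standard json module when orjson is not installed. The helper name safe_loads and its fallback parameter are invented for this example.

```python
import json

try:
    import orjson  # optional: used when installed
    _loads = orjson.loads
except ImportError:
    _loads = json.loads  # fallback keeps the sketch runnable without orjson

def safe_loads(payload, fallback=None):
    """Parse JSON, returning `fallback` instead of raising on malformed input."""
    try:
        return _loads(payload)
    except json.JSONDecodeError:
        # Also catches orjson.JSONDecodeError, which subclasses json.JSONDecodeError.
        return fallback

good = safe_loads(b'{"status": "ok"}')
bad = safe_loads(b'{"broken": }', fallback={})
print(good, bad)
```

Swallowing parse errors like this only makes sense where a default value is acceptable. Elsewhere, let the exception propagate.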
Benchmarking orjson vs Standard json
Let us run a concrete benchmark so you can see the actual performance difference on your hardware. We test serializing and deserializing a moderately complex nested dictionary 100,000 times:
# benchmark_orjson.py
import json
import orjson
import time
from datetime import datetime
# Test data -- similar to a typical API response
sample_data = {
"users": [
{"id": i, "name": f"User{i}", "email": f"user{i}@example.com",
"score": i * 1.5, "active": i % 2 == 0, "tags": ["python", "backend"]}
for i in range(50)
],
"total": 50,
"page": 1
}
ITERATIONS = 100_000
# Benchmark json.dumps
start = time.perf_counter()
for _ in range(ITERATIONS):
    json.dumps(sample_data)
json_dumps_time = time.perf_counter() - start
# Benchmark orjson.dumps (returns bytes)
start = time.perf_counter()
for _ in range(ITERATIONS):
    orjson.dumps(sample_data)
orjson_dumps_time = time.perf_counter() - start
# Benchmark json.loads
json_str = json.dumps(sample_data)
start = time.perf_counter()
for _ in range(ITERATIONS):
    json.loads(json_str)
json_loads_time = time.perf_counter() - start
# Benchmark orjson.loads
orjson_bytes = orjson.dumps(sample_data)
start = time.perf_counter()
for _ in range(ITERATIONS):
    orjson.loads(orjson_bytes)
orjson_loads_time = time.perf_counter() - start
print(f"json.dumps: {json_dumps_time:.3f}s")
print(f"orjson.dumps: {orjson_dumps_time:.3f}s ({json_dumps_time/orjson_dumps_time:.1f}x faster)")
print(f"json.loads: {json_loads_time:.3f}s")
print(f"orjson.loads: {orjson_loads_time:.3f}s ({json_loads_time/orjson_loads_time:.1f}x faster)")
Output (typical results on a modern CPU):
json.dumps: 2.841s
orjson.dumps: 0.482s (5.9x faster)
json.loads: 2.103s
orjson.loads: 0.631s (3.3x faster)
Actual speedups vary based on payload size, nesting depth, and hardware, but 3-6x faster on both operations is typical. For a service handling 1,000 requests per second with 100KB payloads each, this translates to substantial CPU savings that compound at scale.
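If you want less noisy numbers than a single timed loop gives, one option is the standard library's timeit module, taking the fastest of several runs. The sketch below measures only the standard json module so that it runs anywhere. The helper name best_time and the repeat and number values are arbitrary choices for illustration, and you could pass orjson.dumps or orjson.loads to the same helper.

```python
import json
import timeit

def best_time(func, repeat=5, number=200):
    """Return the fastest per-call time in seconds across `repeat` runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number  # the minimum is the run least disturbed by other processes

payload = {"users": [{"id": i, "name": f"User{i}"} for i in range(50)]}
encoded = json.dumps(payload)

dumps_seconds = best_time(lambda: json.dumps(payload))
loads_seconds = best_time(lambda: json.loads(encoded))

print(f"json.dumps: {dumps_seconds * 1e6:.1f} microseconds per call")
print(f"json.loads: {loads_seconds * 1e6:.1f} microseconds per call")
```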
Real-Life Example: FastAPI Response Caching with orjson
Here is a practical example that integrates orjson into a FastAPI application. We use orjson for both serializing API responses and caching them in memory, demonstrating a common production pattern:
# fastapi_orjson_cache.py
"""
FastAPI app with orjson-powered response serialization and in-memory caching.
Run with: uvicorn fastapi_orjson_cache:app --reload
"""
import orjson
from fastapi import FastAPI
from fastapi.responses import Response
from datetime import datetime, timezone
from dataclasses import dataclass, field
from typing import Optional
import hashlib
app = FastAPI()
# Simple in-memory cache using orjson bytes as values
_cache: dict[str, bytes] = {}
@dataclass
class ProductRecord:
    id: int
    name: str
    price: float
    in_stock: bool
    last_updated: datetime
    tags: list[str] = field(default_factory=list)
def get_product_from_db(product_id: int) -> Optional[ProductRecord]:
    """Simulates a database lookup."""
    if product_id > 100:
        return None
    return ProductRecord(
        id=product_id,
        name=f"Product {product_id}",
        price=round(product_id * 9.99, 2),
        in_stock=product_id % 3 != 0,
        last_updated=datetime.now(timezone.utc),
        tags=["electronics", "featured"] if product_id < 50 else ["clearance"]
    )
@app.get("/products/{product_id}")
async def get_product(product_id: int):
    cache_key = f"product:{product_id}"
    # Check cache first
    if cache_key in _cache:
        # Return cached bytes directly -- no re-serialization needed
        return Response(content=_cache[cache_key], media_type="application/json")
    product = get_product_from_db(product_id)
    if product is None:
        error = orjson.dumps({"error": "Product not found", "id": product_id})
        return Response(content=error, media_type="application/json", status_code=404)
    # Serialize with orjson -- handles dataclass and datetime natively
    encoded = orjson.dumps(product, option=orjson.OPT_INDENT_2)
    _cache[cache_key] = encoded
    return Response(content=encoded, media_type="application/json")
@app.get("/cache/stats")
async def cache_stats():
    stats = {
        "cached_keys": len(_cache),
        "cache_size_bytes": sum(len(v) for v in _cache.values()),
        "timestamp": datetime.now(timezone.utc)
    }
    return Response(content=orjson.dumps(stats), media_type="application/json")
Example curl output:
$ curl http://localhost:8000/products/42
{
  "id": 42,
  "name": "Product 42",
  "price": 419.58,
  "in_stock": false,
  "last_updated": "2025-03-15T10:22:41.123456+00:00",
  "tags": [
    "electronics",
    "featured"
  ]
}
The power here is that the serialized bytes are stored in the cache and served directly as the HTTP response body without deserialization or re-serialization. orjson's native datetime handling means the UTC-aware datetime in last_updated is serialized to a full ISO 8601 string with timezone offset -- exactly what frontend clients expect.
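One thing the plain dict cache above never does is expire an entry, so a cached product keeps its original last_updated value for as long as the process runs. A possible refinement, sketched here with only the standard library, is to store an expiry time next to each payload. The class name TTLBytesCache, its methods, and the 30-second lifetime are all hypothetical and not part of FastAPI or orjson.

```python
import time

class TTLBytesCache:
    """Hypothetical in-memory cache that forgets entries after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # key -> (expires_at, payload bytes)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, payload = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # stale: drop it so the caller re-fetches
            return None
        return payload

    def set(self, key, payload):
        self._entries[key] = (self._clock() + self._ttl, payload)

# Demo with a fake clock so no real waiting is needed.
now = [0.0]
cache = TTLBytesCache(ttl_seconds=30, clock=lambda: now[0])
cache.set("product:42", b'{"id":42}')
fresh = cache.get("product:42")    # still within the 30-second window
now[0] = 31.0
expired = cache.get("product:42")  # past the window: treated as a miss
```

In the endpoint, a None result from get() would simply take the same path as a cache miss.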
Frequently Asked Questions
Why does orjson return bytes instead of str?
orjson returns bytes because JSON data in Python is almost always immediately encoded to bytes for network transport or file writing. Returning bytes directly avoids an extra .encode("utf-8") step. If you need a string, just call result.decode(). This is a deliberate performance decision -- the bytes representation is the final form that gets sent over the wire.
Is orjson a drop-in replacement for the json module?
Almost, but not completely. The function signatures are similar, but orjson.dumps() returns bytes while json.dumps() returns str. Any code that does f.write(json.dumps(data)) will break because you cannot write bytes to a text-mode file. The fix is either f.write(orjson.dumps(data).decode()) or opening the file in binary mode "wb". The default= parameter also works slightly differently in edge cases.
How do I serialize custom types that orjson doesn't support natively?
Use the default parameter with a callback function, just like the standard library. The function receives the object and should return a JSON-serializable value, or raise TypeError for any type it does not handle. For example, to serialize a Decimal, write a function default(obj) that returns float(obj) when isinstance(obj, Decimal) is true and raises TypeError otherwise, then call orjson.dumps(data, default=default). orjson's native type support is broad enough that custom default handlers are rarely needed for modern Python code.
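A default callback for Decimal might look like the sketch below. It is demonstrated with the standard json module so that it runs without orjson installed. The same function can be passed as default= to orjson.dumps, which follows the same contract of returning a supported value or raising TypeError.

```python
import json
from decimal import Decimal

def default(obj):
    """Convert types the encoder does not know; raise TypeError for the rest."""
    if isinstance(obj, Decimal):
        return float(obj)  # note: float can lose precision; str(obj) keeps exact digits
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoded = json.dumps({"price": Decimal("19.99")}, default=default)
print(encoded)

try:
    json.dumps({"bad": object()}, default=default)
    unsupported_raised = False
except TypeError:
    unsupported_raised = True  # unknown types still fail loudly
```

Raising, rather than returning, the TypeError is what lets the encoder report an unsupported type instead of trying to serialize the exception class itself.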
Is orjson thread-safe?
Yes. orjson functions are stateless -- each call to dumps() or loads() is entirely independent. There is no global mutable state, so multiple threads can call orjson simultaneously without any synchronization. This makes it a natural fit for multi-threaded web servers like gunicorn or uvicorn workers.
How does orjson compare to ujson?
Both are faster than the standard library, but orjson is consistently faster than ujson in benchmarks and has better correctness guarantees. ujson has a history of silently dropping or corrupting data in edge cases (very large integers, NaN values, deeply nested structures). orjson prioritizes correctness alongside speed. For production code where data integrity matters, orjson is the better choice.
Conclusion
orjson delivers a simple, high-value upgrade to any Python codebase that does significant JSON processing. The Rust-based implementation provides 3-6x faster serialization and deserialization, native support for datetime, UUID, dataclasses, and numpy arrays, and correct strict UTF-8 validation -- all with an API close enough to the standard library that migration is usually a matter of replacing the import and handling the bytes return type.
Try extending the FastAPI caching example to use Redis as a backend instead of in-memory storage, or add a Cache-Control header to the response based on the product's last_updated timestamp. These are natural next steps that reinforce how orjson fits into production API patterns.
For the full API reference and advanced options like OPT_PASSTHROUGH_DATETIME, see the orjson GitHub repository.
Related Articles
- How To Read and Write JSON Files in Python 3
- How To Use the Logging Module in Python 3
- How To Split And Organise Your Source Code Into Multiple Files in Python 3
Further Reading: For more details, see the Python Input and Output tutorial.
Frequently Asked Questions
How do I read a text file in Python?
Use open('file.txt', 'r') with a with statement: with open('file.txt') as f: content = f.read(). This reads the entire file and automatically closes it. Use f.readlines() to get a list of lines instead.
What is the difference between read(), readline(), and readlines()?
read() returns the entire file as a single string. readline() reads one line at a time. readlines() returns a list of all lines. For large files, iterating with for line in f: is the most memory-efficient approach.
How do I write to a file in Python?
Use open('file.txt', 'w') to write (overwrites existing content) or 'a' to append. Write with f.write('text') or f.writelines(list_of_strings). Always use a with statement to ensure the file is properly closed.
What encoding should I use when reading text files?
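Use encoding='utf-8' for most modern text files. UTF-8 handles international characters, but open() does not always default to it: when you omit the encoding argument, Python uses a platform-dependent default, which on Windows is often a legacy code page. Passing encoding explicitly avoids surprises. For legacy files, you may need 'latin-1' or 'cp1252'.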
How do I handle file not found errors in Python?
Use a try/except block catching FileNotFoundError. Alternatively, check if the file exists first with pathlib.Path('file.txt').exists() before attempting to read it.
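Put together, the try/except approach might look like this small helper. The function name read_text_or_default and the placeholder text are made up for the example, and the missing file is looked up in a fresh temporary directory so the demo behaves the same everywhere.

```python
import tempfile
from pathlib import Path

def read_text_or_default(path, default=""):
    """Return the file's text, or `default` when the file does not exist."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return default

missing_path = Path(tempfile.mkdtemp()) / "missing.txt"  # guaranteed not to exist
content = read_text_or_default(missing_path, default="(no data yet)")
print(content)
```

Catching the exception is generally more robust than checking exists() first, because the file could be removed between the check and the open() call.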