Intermediate
A config file is a flat file used to read and write settings that affect the behaviour of your application. These files are incredibly useful: you put individual settings inside a human-editable file and have your application read them at runtime. This lets you configure your application the way you need without having to change the application code.
Typically the user edits the config file with a simple text editor, then the application runs and reads the config file. If the config file changes, the application will normally (depending on how the code is written) have to be restarted to pick up the new settings.
Some of the considerations for using a config file as a “data store” include:
- Setup: No setup is required for files, although you should use one of the available Python config-management libraries to make config files easier to manipulate.
- Volume: Small-ish file sizes (under 5-10 MB).
- Record access: You do not need to search within the file to extract just a portion of the records; you load or save all the data in the file in one go.
- Data writes: Applications do not generally write to a config file, although it can be done. Instead, the config file is edited externally in a text editor.
- Data formats: Normally the data is structured and record-based (such as comma-separated values (CSV) or tab-delimited), or uses a more complex structure such as what you see in Windows-style .INI files, or even JSON.
- Editability: You generally want to allow direct editing of the file by users
- Redundancy: There is no inbuilt redundancy. If anything fails (the data is corrupted, the server holding the file goes down), you are out of luck unless you set up your own mechanisms (e.g. automatically replicating the file to another server).
Code examples to read and write config files
Setting up a config file is actually not much harder than simply creating constants inside your application. Your main decision will be which configuration file format to use, as there are quite a few to choose from. Here are some options and samples:
| File type | Python library | Example config file |
|---|---|---|
| 1. Simple tab-delimited text file | none | `records_per_page    10` |
| 2. A properties file with key-value pairs | none | `#webpage display` |
| 3. INI file format | configparser | `[database]` |
| 4. JSON file format | json | `{"records_per_page": 10, "logo_icon": "/images/company_log.jpg"}` |
Example 1: Simple text file which is tab-delimited
You can see a full article on how to read a text file in our “Storing Data in Files in Python” article. The short version of opening a tab-delimited file is as follows:
Suppose you have a configuration file as follows, where each row has two fields separated by a tab:
config_data.txt
records_per_page 10
logo_icon /images/company_log.jpg
You can load the data into a python dictionary like the following:
config = {}
file_handler = open('config_data.txt', 'r')
for rec in file_handler:
    config.update( [ tuple( rec.strip().split('\t') ) ] )
file_handler.close()
print(config)
The output will be as follows:
{'records_per_page': '10', 'logo_icon': '/images/company_log.jpg'}
Some explanation may be required to make the code easier to understand. Firstly, the for loop is used to read the file line by line. Each time the for loop iterates, it reads a line into the variable rec until the whole file has been read.
The following code is a little tricky, but the intent is to take the two columns in the tab delimited file and create a dictionary key value pair.
config.update( [ tuple( rec.strip().split('\t') ) ] )
It works by the following:
- It first removes the newline character from the end of the line (through rec.strip())
- The resulting string is then split by the tab character (denoted by '\t') using split()
- The result is a two-field list, which is then converted into a tuple
- The tuple is then wrapped in a list using the [] brackets
- The dictionary .update() method is used to finally add the key-value pair
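If you prefer, the same load can be written more compactly with a dict comprehension. Here is a self-contained sketch that first writes the sample file from above so it can run on its own (the file name config_data.txt matches the example):

```python
# Write the sample tab-delimited config from above so the sketch is runnable
with open('config_data.txt', 'w') as file_handler:
    file_handler.write('records_per_page\t10\n')
    file_handler.write('logo_icon\t/images/company_log.jpg\n')

# Load it back with a dict comprehension -- equivalent to the update() loop
with open('config_data.txt', 'r') as file_handler:
    config = {key: value
              for key, value in (rec.strip().split('\t') for rec in file_handler)}

print(config)
# {'records_per_page': '10', 'logo_icon': '/images/company_log.jpg'}
```

The with statement also closes the file automatically, so there is no separate close() call to forget.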
Example 2: A properties file with key value pair
If you have fairly simple configuration needs with just key-value pairs, then a properties-type file will work for you, where each line has the form <config name> = <config value>. This can easily be loaded as a text file, with the key-value pairs read into a dictionary.
Imagine this was the config file: config_data.txt
#webpage display
records_per_page =10
logo_icon =/images/company_log.jpg
The following code could easily load this configuration:
config = {}
with open('config_data.txt', 'r') as file_handler:
    for rec in file_handler:
        if rec.startswith('#'):
            continue
        key, value = rec.strip().split('=', 1)
        key = key.strip()
        if key:
            config[key] = value.strip()
print( config )
Here the code ignores any comment lines (i.e. lines starting with a ‘#’), splits each remaining line at the first ‘=’ sign, and strips the surrounding whitespace before adding the key-value pair to the ‘config’ dictionary.
Example 3: INI file format using configparser
You can see a full article on how the configparser library works in our earlier article. The short version is as follows.
Suppose you have a configuration file as follows:
test.ini
[DEFAULT]
name = development
host = 192.168.1.1
port = 31
username = admin
password = admin
[database]
name = production
host = 144.101.1.1
You can then read the file with the following simple code:
import configparser
config = configparser.ConfigParser()
#Read the config file
config.read('test.ini')
print( config['database']['name'] ) # This will output 'production'
print( config['database']['port'] ) # This will output '31'. As there is no port under
                                    # [database], the value from [DEFAULT] is used
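The heading above also mentions writing config files, so here is a hedged sketch of creating an INI file with configparser; the file name settings.ini and the values are made up for illustration:

```python
import configparser

config = configparser.ConfigParser()
config['DEFAULT'] = {'name': 'development', 'port': '31'}
config['database'] = {'name': 'production', 'host': '144.101.1.1'}

# write() persists the in-memory sections to disk in INI format
with open('settings.ini', 'w') as file_handler:
    config.write(file_handler)

# Read it back to confirm the round trip
check = configparser.ConfigParser()
check.read('settings.ini')
print(check['database']['name'])  # production
print(check['database']['port'])  # 31 -- inherited from [DEFAULT]
```

Note that configparser stores every value as a string, so numbers like the port need to be converted with int() after reading.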
Example 4: Reading Config values from a JSON file
With JSON being so popular, this is another alternative you could use to keep all your config data in. It is also very easy to load.
Assume your config file is as follows: config_data.txt
{
    "records_per_page": 10,
    "logo_icon": "/images/company_log.jpg"
}
Then the following code can be used to bring these into a dictionary:
import json
with open('config_data.txt', 'r') as file_handler:
    config = json.loads( file_handler.read() )
print(config)
Where the output would be:
{'records_per_page': 10, 'logo_icon': '/images/company_log.jpg'}
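Writing the configuration back out is just as simple with json.dump(); here is a minimal sketch reusing the same settings (indent=4 keeps the file human-editable):

```python
import json

config = {'records_per_page': 10, 'logo_icon': '/images/company_log.jpg'}

# json.dump() writes the dictionary straight to the file handle
with open('config_data.txt', 'w') as file_handler:
    json.dump(config, file_handler, indent=4)

# Reload to confirm the round trip
with open('config_data.txt', 'r') as file_handler:
    print(json.load(file_handler))
# {'records_per_page': 10, 'logo_icon': '/images/company_log.jpg'}
```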
Summary
A config file is a great option if you are looking to store settings for your application. The settings are usually read at the start of the application and loaded into a dictionary, which can then serve as a set of constants for your application to use. This both avoids the need to hardcode settings and allows you to change the behaviour of your application without touching the code.
How To Use Python orjson for Fast JSON Processing
Intermediate
You have a Python service that parses JSON responses from an API thousands of times per second, and the standard json module is quietly becoming a bottleneck. At low traffic volumes this goes unnoticed, but once you scale up, milliseconds of serialization overhead compound into real latency. If you have ever profiled a Python web service and found json.dumps or json.loads sitting near the top of the flame graph, you already know this pain.
orjson is a fast, correct JSON library for Python written in Rust. It drops into nearly any codebase as a replacement for the standard json module and typically runs 2-10x faster on both serialization and deserialization. It also natively supports types the standard library forces you to handle manually — datetime, UUID, numpy arrays, and dataclasses.
In this article you will learn how to install orjson, serialize and deserialize JSON with it, use its built-in support for Python-native types, benchmark it against the standard library, and integrate it into a real-world FastAPI project. By the end you will have a working understanding of when and why to choose orjson over the alternatives.
orjson Quick Example
Before diving deep, here is a self-contained example that shows the core pattern. orjson is nearly a drop-in replacement for the standard json module, but returns and accepts bytes instead of str.
# quick_example.py
import orjson
from datetime import datetime
data = {
    "name": "Alice",
    "score": 98.6,
    "logged_in": True,
    "joined": datetime(2024, 3, 15, 9, 30, 0),
    "tags": ["python", "backend", "fast"]
}
# Serialize to bytes (not str like the standard json module)
encoded = orjson.dumps(data)
print(encoded)
print(type(encoded))
# Deserialize back to a Python dict
decoded = orjson.loads(encoded)
print(decoded["joined"]) # datetime is serialized as ISO 8601 string
print(type(decoded))
Output:
b'{"name":"Alice","score":98.6,"logged_in":true,"joined":"2024-03-15T09:30:00","tags":["python","backend","fast"]}'
<class 'bytes'>
2024-03-15T09:30:00
<class 'dict'>
Two things stand out right away. First, orjson.dumps() returns bytes, not a string — this is intentional and saves an unnecessary encoding step when writing to network sockets or files. Second, the datetime object is automatically serialized to ISO 8601 format without any extra work, which the standard json module would refuse to handle at all.
What Is orjson and Why Use It?
orjson is a Python JSON library implemented in Rust using the Serde framework. It was created specifically to address the performance limitations of Python’s built-in json module, which is implemented in C but still shows its age when processing large payloads at high throughput.
The key differences between orjson and the standard library are:
| Feature | Standard json | orjson |
|---|---|---|
| Output type of dumps() | str | bytes |
| datetime support | Raises TypeError | Native ISO 8601 |
| UUID support | Raises TypeError | Native string |
| dataclass support | Raises TypeError | Native dict-like |
| numpy array support | Not supported | Native (optional dep) |
| Performance (typical) | Baseline | 2-10x faster |
| Strict UTF-8 validation | No | Yes |
The Rust implementation takes advantage of SIMD instructions and a highly optimized Serde-based serialization pipeline. For applications doing heavy JSON processing — API gateways, caching layers, log aggregators — the improvement is measurable and often significant.
Installing orjson
orjson is available on PyPI and installs with a single command:
# install_orjson.sh
pip install orjson
Output:
Collecting orjson
Downloading orjson-3.10.x-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (144 kB)
Successfully installed orjson-3.10.x
orjson ships as a pre-compiled binary for most platforms (Linux, macOS, Windows on x86-64 and ARM), so there is no Rust toolchain required. If you are on a less common platform you may need Rust installed to build from source. Verify the installation with a quick import check:
# verify_install.py
import orjson
print(orjson.__version__)
Output:
3.10.x
Serializing Python Objects with orjson.dumps()
The orjson.dumps() function converts Python objects to JSON bytes. The most important thing to remember is that it always returns bytes, not str. If you need a string, call .decode() on the result.
# serialization_basics.py
import orjson
from datetime import datetime, date
from uuid import UUID
from dataclasses import dataclass
@dataclass
class User:
    id: UUID
    name: str
    created: datetime
    active: bool

user = User(
    id=UUID("12345678-1234-5678-1234-567812345678"),
    name="Bob Smith",
    created=datetime(2025, 1, 10, 14, 30),
    active=True
)
# Serialize the dataclass directly -- no custom encoder needed
result = orjson.dumps(user)
print(result)
# Decode to string if needed
print(result.decode("utf-8"))
Output:
b'{"id":"12345678-1234-5678-1234-567812345678","name":"Bob Smith","created":"2025-01-10T14:30:00","active":true}'
{"id":"12345678-1234-5678-1234-567812345678","name":"Bob Smith","created":"2025-01-10T14:30:00","active":true}
Notice that the UUID, datetime, and dataclass are all handled automatically with zero configuration. With the standard json module, each of these would raise a TypeError: Object of type X is not JSON serializable error, requiring a custom default function.
orjson Options and Flags
orjson supports serialization options passed via the option parameter as bitwise-OR combinations of constants. These let you control formatting, sorting, and type handling:
# orjson_options.py
import orjson
data = {
    "z_key": "last",
    "a_key": "first",
    "count": 42,
    "ratio": 3.14159
}
# Pretty-print with indented output
pretty = orjson.dumps(data, option=orjson.OPT_INDENT_2)
print("Pretty:")
print(pretty.decode())
# Sort keys alphabetically
sorted_output = orjson.dumps(data, option=orjson.OPT_SORT_KEYS)
print("\nSorted keys:")
print(sorted_output.decode())
# Combine options with bitwise OR
both = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
print("\nPretty + Sorted:")
print(both.decode())
Output:
Pretty:
{
  "z_key": "last",
  "a_key": "first",
  "count": 42,
  "ratio": 3.14159
}
Sorted keys:
{"a_key":"first","count":42,"ratio":3.14159,"z_key":"last"}
Pretty + Sorted:
{
  "a_key": "first",
  "count": 42,
  "ratio": 3.14159,
  "z_key": "last"
}
The most useful options in practice are OPT_INDENT_2 for human-readable output during debugging, OPT_SORT_KEYS for deterministic output in tests or caches, OPT_NON_STR_KEYS for dicts with integer or float keys, and OPT_UTC_Z to use Z suffix instead of +00:00 for UTC datetimes.
Deserializing with orjson.loads()
The orjson.loads() function accepts both bytes and str input and returns Python objects. Unlike the standard library, it performs strict UTF-8 validation on input, which means malformed data fails loudly rather than silently corrupting your data.
# deserialization.py
import orjson
# From bytes (most common in API and network scenarios)
json_bytes = b'{"name": "Charlie", "score": 99.5, "tags": ["fast", "correct"]}'
data = orjson.loads(json_bytes)
print(data)
print(type(data["score"]))
# From string also works
json_str = '{"status": "ok", "count": 1000}'
data2 = orjson.loads(json_str)
print(data2)
# Error handling -- orjson raises JSONDecodeError for invalid input
try:
    orjson.loads(b'{"broken": }')
except orjson.JSONDecodeError as e:
    print(f"Parse error: {e}")
Output:
{'name': 'Charlie', 'score': 99.5, 'tags': ['fast', 'correct']}
<class 'float'>
{'status': 'ok', 'count': 1000}
Parse error: expected value at line 1 column 12
One important detail: orjson.JSONDecodeError is a subclass of json.JSONDecodeError, so any existing except blocks using json.JSONDecodeError will still catch orjson errors without modification. This makes the migration path from the standard library seamless.
Benchmarking orjson vs Standard json
Let us run a concrete benchmark so you can see the actual performance difference on your hardware. We test serializing and deserializing a moderately complex nested dictionary 100,000 times:
# benchmark_orjson.py
import json
import orjson
import time
from datetime import datetime
# Test data -- similar to a typical API response
sample_data = {
    "users": [
        {"id": i, "name": f"User{i}", "email": f"user{i}@example.com",
         "score": i * 1.5, "active": i % 2 == 0, "tags": ["python", "backend"]}
        for i in range(50)
    ],
    "total": 50,
    "page": 1
}
ITERATIONS = 100_000
# Benchmark json.dumps
start = time.perf_counter()
for _ in range(ITERATIONS):
    json.dumps(sample_data)
json_dumps_time = time.perf_counter() - start

# Benchmark orjson.dumps (returns bytes)
start = time.perf_counter()
for _ in range(ITERATIONS):
    orjson.dumps(sample_data)
orjson_dumps_time = time.perf_counter() - start

# Benchmark json.loads
json_str = json.dumps(sample_data)
start = time.perf_counter()
for _ in range(ITERATIONS):
    json.loads(json_str)
json_loads_time = time.perf_counter() - start

# Benchmark orjson.loads
orjson_bytes = orjson.dumps(sample_data)
start = time.perf_counter()
for _ in range(ITERATIONS):
    orjson.loads(orjson_bytes)
orjson_loads_time = time.perf_counter() - start
print(f"json.dumps: {json_dumps_time:.3f}s")
print(f"orjson.dumps: {orjson_dumps_time:.3f}s ({json_dumps_time/orjson_dumps_time:.1f}x faster)")
print(f"json.loads: {json_loads_time:.3f}s")
print(f"orjson.loads: {orjson_loads_time:.3f}s ({json_loads_time/orjson_loads_time:.1f}x faster)")
Output (typical results on a modern CPU):
json.dumps: 2.841s
orjson.dumps: 0.482s (5.9x faster)
json.loads: 2.103s
orjson.loads: 0.631s (3.3x faster)
Actual speedups vary based on payload size, nesting depth, and hardware, but 3-6x faster on both operations is typical. For a service handling 1,000 requests per second with 100KB payloads each, this translates to substantial CPU savings that compound at scale.
Real-Life Example: FastAPI Response Caching with orjson
Here is a practical example that integrates orjson into a FastAPI application. We use orjson for both serializing API responses and caching them in memory, demonstrating a common production pattern:
# fastapi_orjson_cache.py
"""
FastAPI app with orjson-powered response serialization and in-memory caching.
Run with: uvicorn fastapi_orjson_cache:app --reload
"""
import orjson
from fastapi import FastAPI
from fastapi.responses import Response
from datetime import datetime, timezone
from dataclasses import dataclass, field
from typing import Optional
app = FastAPI()

# Simple in-memory cache using orjson bytes as values
_cache: dict[str, bytes] = {}

@dataclass
class ProductRecord:
    id: int
    name: str
    price: float
    in_stock: bool
    last_updated: datetime
    tags: list[str] = field(default_factory=list)

def get_product_from_db(product_id: int) -> Optional[ProductRecord]:
    """Simulates a database lookup."""
    if product_id > 100:
        return None
    return ProductRecord(
        id=product_id,
        name=f"Product {product_id}",
        price=round(product_id * 9.99, 2),
        in_stock=product_id % 3 != 0,
        last_updated=datetime.now(timezone.utc),
        tags=["electronics", "featured"] if product_id < 50 else ["clearance"]
    )

@app.get("/products/{product_id}")
async def get_product(product_id: int):
    cache_key = f"product:{product_id}"
    # Check cache first
    if cache_key in _cache:
        # Return cached bytes directly -- no re-serialization needed
        return Response(content=_cache[cache_key], media_type="application/json")
    product = get_product_from_db(product_id)
    if product is None:
        error = orjson.dumps({"error": "Product not found", "id": product_id})
        return Response(content=error, media_type="application/json", status_code=404)
    # Serialize with orjson -- handles dataclass and datetime natively
    encoded = orjson.dumps(product, option=orjson.OPT_INDENT_2)
    _cache[cache_key] = encoded
    return Response(content=encoded, media_type="application/json")

@app.get("/cache/stats")
async def cache_stats():
    stats = {
        "cached_keys": len(_cache),
        "cache_size_bytes": sum(len(v) for v in _cache.values()),
        "timestamp": datetime.now(timezone.utc)
    }
    return Response(content=orjson.dumps(stats), media_type="application/json")
Example curl output:
$ curl http://localhost:8000/products/42
{
  "id": 42,
  "name": "Product 42",
  "price": 419.58,
  "in_stock": true,
  "last_updated": "2025-03-15T10:22:41.123456+00:00",
  "tags": [
    "electronics",
    "featured"
  ]
}
The power here is that the serialized bytes are stored in the cache and served directly as the HTTP response body without deserialization or re-serialization. orjson's native datetime handling means the UTC-aware datetime in last_updated is serialized to a full ISO 8601 string with timezone offset -- exactly what frontend clients expect.
Frequently Asked Questions
Why does orjson return bytes instead of str?
orjson returns bytes because JSON data in Python is almost always immediately encoded to bytes for network transport or file writing. Returning bytes directly avoids an extra .encode("utf-8") step. If you need a string, just call result.decode(). This is a deliberate performance decision -- the bytes representation is the final form that gets sent over the wire.
Is orjson a drop-in replacement for the json module?
Almost, but not completely. The function signatures are similar, but orjson.dumps() returns bytes while json.dumps() returns str. Any code that does f.write(json.dumps(data)) will break because you cannot write bytes to a text-mode file. The fix is either f.write(orjson.dumps(data).decode()) or opening the file in binary mode "wb". The default= parameter also works slightly differently in edge cases.
How do I serialize custom types that orjson doesn't support natively?
Use the default parameter with a callback function, just like the standard library. The function receives the unsupported object and should return a JSON-serializable value, or raise TypeError for types it does not handle. orjson's native type support is broad enough that custom default handlers are rarely needed for modern Python code.
Is orjson thread-safe?
Yes. orjson functions are stateless -- each call to dumps() or loads() is entirely independent. There is no global mutable state, so multiple threads can call orjson simultaneously without any synchronization. This makes it a natural fit for multi-threaded web servers like gunicorn or uvicorn workers.
How does orjson compare to ujson?
Both are faster than the standard library, but orjson is consistently faster than ujson in benchmarks and has better correctness guarantees. ujson has a history of silently dropping or corrupting data in edge cases (very large integers, NaN values, deeply nested structures). orjson prioritizes correctness alongside speed. For production code where data integrity matters, orjson is the better choice.
Conclusion
orjson delivers a simple, high-value upgrade to any Python codebase that does significant JSON processing. The Rust-based implementation provides 3-6x faster serialization and deserialization, native support for datetime, UUID, dataclasses, and numpy arrays, and correct strict UTF-8 validation -- all with an API close enough to the standard library that migration is usually a matter of replacing the import and handling the bytes return type.
Try extending the FastAPI caching example to use Redis as a backend instead of in-memory storage, or add a Cache-Control header to the response based on the product's last_updated timestamp. These are natural next steps that reinforce how orjson fits into production API patterns.
For the full API reference and advanced options like OPT_PASSTHROUGH_DATETIME, see the orjson GitHub repository.
Related Articles
- How To Manage Python Environment Variables With dotenv and os.environ
- How To Read and Write JSON Files in Python 3
- How To Split And Organise Your Source Code Into Multiple Files in Python 3
Further Reading: For more details, see the Python configparser documentation.
Frequently Asked Questions
What is the best way to store settings in Python?
For simple key-value settings, use INI files with ConfigParser. For nested data, use JSON or TOML. For environment-specific settings, use .env files with python-dotenv. The best choice depends on your complexity needs and whether non-developers will edit the settings.
How do I create a config file in Python?
Use ConfigParser to create INI files: instantiate the parser, add sections and key-value pairs with config['section'] = {'key': 'value'}, then write the file with config.write(file_handler) inside a with open('config.ini', 'w') block. For JSON, use json.dump().
Should I use environment variables or config files?
Use environment variables for sensitive data (API keys, passwords) and deployment-specific settings. Use config files for application-level settings that rarely change. Many projects combine both: a config file for defaults and environment variables for overrides and secrets.
How do I prevent config files from being committed to Git?
Add your config file names to .gitignore (e.g., config.ini, .env). Provide a config.example.ini template in the repository so other developers know what settings are needed without exposing actual values.
Can I use YAML for Python configuration files?
Yes. Install PyYAML with pip install pyyaml and use yaml.safe_load() to read YAML files. YAML supports nested structures, lists, and comments, making it more expressive than INI. However, it is not part of Python’s standard library.
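A minimal sketch of loading YAML with yaml.safe_load(), assuming PyYAML is installed; the document here is made up for illustration:

```python
import yaml  # requires: pip install pyyaml

yaml_text = """
# webpage display
webpage:
  records_per_page: 10
  logo_icon: /images/company_log.jpg
tags:
  - featured
  - clearance
"""

config = yaml.safe_load(yaml_text)
print(config['webpage']['records_per_page'])  # 10
print(config['tags'])                         # ['featured', 'clearance']
```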