If you have ever profiled a Python application and found JSON serialization eating up CPU time, you already know the frustration. The built-in json module is reliable and convenient, but it was not designed for speed. When you are processing thousands of API responses per second, converting dataclass objects for a REST API, or serializing datetime values without jumping through hoops, those milliseconds add up fast. That is exactly the problem orjson was built to solve.
orjson is a third-party JSON library written in Rust that integrates seamlessly into Python. It handles encoding and decoding with the same simple interface you already know — dumps() and loads() — but operates 5-10x faster than the standard library and adds native support for types that json cannot handle without custom encoders, like datetime, numpy arrays, UUID objects, and dataclasses. You install it with pip install orjson and it requires no other dependencies.
In this article, you will learn how to serialize and deserialize JSON with orjson, how to handle Python-native types like datetimes and dataclasses automatically, how to use orjson options for output formatting, how to benchmark the speed difference yourself, and how to build a practical data pipeline that processes JSON records at high throughput. By the end, you will have a drop-in replacement for the standard library that improves performance without changing your application’s logic.
Quick Example: orjson in Action
Here is a self-contained script that shows the most common orjson operations in one place. You can run this immediately after installing orjson with pip install orjson.
# quick_orjson.py
import orjson
from datetime import datetime, timezone
from dataclasses import dataclass
@dataclass
class Event:
    name: str
    timestamp: datetime
    count: int
event = Event(name="page_view", timestamp=datetime(2026, 5, 6, 9, 0, 0, tzinfo=timezone.utc), count=42)
# Serialize to bytes (orjson always returns bytes, not str)
raw = orjson.dumps(event)
print(raw)
# Deserialize back to a dict
data = orjson.loads(raw)
print(data)
# Pretty-print
pretty = orjson.dumps(event, option=orjson.OPT_INDENT_2)
print(pretty.decode())
Output:
b'{"name":"page_view","timestamp":"2026-05-06T09:00:00+00:00","count":42}'
{'name': 'page_view', 'timestamp': '2026-05-06T09:00:00+00:00', 'count': 42}
{
  "name": "page_view",
  "timestamp": "2026-05-06T09:00:00+00:00",
  "count": 42
}
Two things stand out here. First, orjson serialized the dataclass directly — no custom encoder needed. Second, the datetime object was automatically converted to an ISO 8601 string including the UTC offset. The standard library would raise a TypeError for both of these without extra configuration.
The sections below cover every major feature: types supported natively, formatting options, how to benchmark performance, common pitfalls, and a real-world data pipeline example.
What Is orjson and Why Is It Faster?
orjson is a JSON library written in Rust and compiled as a Python extension. It uses the serde serialization framework — the same foundation used by many high-performance Rust services — and exposes a minimal Python API. Because the heavy lifting happens in compiled native code, it avoids Python’s interpreter overhead for every character it processes.
The standard json module is a pure Python implementation (with a small C accelerator for some operations). It loops over Python objects at the Python level, which means each key lookup, type check, and string conversion goes through the interpreter. orjson does all of that in Rust, then returns the finished bytes directly to Python.
| Feature | json (stdlib) | orjson |
|---|---|---|
| Return type of dumps() | str | bytes |
| datetime support | Raises TypeError | Native (ISO 8601) |
| dataclass support | Raises TypeError | Native |
| UUID support | Raises TypeError | Native |
| numpy array support | No | Yes (with option) |
| Pretty print | indent= param | OPT_INDENT_2 option |
| Speed (encode) | Baseline | 5-10x faster |
| Installation | Built-in | pip install orjson |
The key trade-off is that orjson’s dumps() always returns bytes, not str. If you need a string, call .decode() on the result. This is intentional — most real-world destinations (writing to a file, sending over a socket, writing to a database) work directly with bytes, and skipping the str conversion saves an extra memory allocation.
Installation and Basic API
Install orjson with pip. It requires Python 3.8+ and has no Python dependencies — only the compiled Rust extension.
# terminal
pip install orjson
The API mirrors the standard library closely. orjson.dumps() serializes a Python object to JSON bytes, and orjson.loads() deserializes JSON bytes or strings back to Python objects.
# basic_api.py
import orjson
# Serializing
data = {"user": "alice", "score": 99, "active": True}
encoded = orjson.dumps(data)
print(type(encoded))  # <class 'bytes'>
print(encoded) # b'{"user":"alice","score":99,"active":true}'
# Deserializing -- accepts bytes, bytearray, memoryview, or str
decoded = orjson.loads(encoded)
print(type(decoded))  # <class 'dict'>
print(decoded)
Output:
<class 'bytes'>
b'{"user":"alice","score":99,"active":true}'
<class 'dict'>
{'user': 'alice', 'score': 99, 'active': True}
Because loads() accepts both bytes and str, you can pass the output of dumps() directly to loads() without decoding first. This makes round-tripping data frictionless.
Native Support for Python Types
This is where orjson earns its reputation. The standard library raises TypeError for datetime, dataclass, UUID, and Enum objects unless you write a custom encoder. orjson handles all of them out of the box.
Datetimes and Dates
orjson serializes datetime, date, and time objects to ISO 8601 strings automatically. Timezone-aware datetimes include the UTC offset; naive datetimes are serialized without one.
# datetime_types.py
import orjson
from datetime import datetime, date, time, timezone, timedelta
eastern = timezone(timedelta(hours=-5))
data = {
    "created_at": datetime(2026, 5, 6, 14, 30, 0, tzinfo=eastern),
    "birth_date": date(1990, 3, 15),
    "run_time": time(8, 45, 0),
    "utc_now": datetime(2026, 5, 6, 19, 30, 0, tzinfo=timezone.utc),
}
print(orjson.dumps(data).decode())
Output:
{"created_at":"2026-05-06T14:30:00-05:00","birth_date":"1990-03-15","run_time":"08:45:00","utc_now":"2026-05-06T19:30:00+00:00"}
The timezone offset is handled correctly for each value — -05:00 for Eastern and +00:00 for UTC. This is exactly what you need when sending timestamps to APIs or storing records that must be unambiguous about timezone.
Dataclasses and Named Tuples
orjson serializes Python dataclass instances and NamedTuple subclasses as JSON objects, mapping field names to values. You do not need a to_dict() method or a custom serializer.
# dataclass_serial.py
import orjson
from dataclasses import dataclass
from typing import List
from datetime import datetime, timezone
@dataclass
class Article:
    title: str
    author: str
    tags: List[str]
    published_at: datetime
    views: int

article = Article(
    title="How To Use orjson in Python",
    author="alice",
    tags=["python", "performance", "json"],
    published_at=datetime(2026, 5, 6, 10, 0, 0, tzinfo=timezone.utc),
    views=1024,
)
result = orjson.dumps(article, option=orjson.OPT_INDENT_2)
print(result.decode())
Output:
{
  "title": "How To Use orjson in Python",
  "author": "alice",
  "tags": [
    "python",
    "performance",
    "json"
  ],
  "published_at": "2026-05-06T10:00:00+00:00",
  "views": 1024
}
Notice that the nested datetime field inside the dataclass is also serialized correctly — orjson handles type conversion recursively through the entire object tree.
orjson Options for Output Control
orjson uses a set of integer flags (passed via the option parameter) to control serialization behavior. You combine multiple options with the bitwise OR operator (|).
# orjson_options.py
import orjson
from datetime import datetime, timezone
data = {
    "name": "alice",
    "score": 99,
    "timestamp": datetime(2026, 5, 6, 10, 0, 0, tzinfo=timezone.utc),
    "metadata": {"source": "api", "version": 2},
}
# Pretty print with 2-space indent
pretty = orjson.dumps(data, option=orjson.OPT_INDENT_2)
print("Pretty:\n", pretty.decode())
# Sort keys alphabetically
sorted_keys = orjson.dumps(data, option=orjson.OPT_SORT_KEYS)
print("\nSorted keys:", sorted_keys.decode())
# Non-string dict keys (e.g., int keys)
int_key_data = {1: "one", 2: "two", 3: "three"}
int_keys = orjson.dumps(int_key_data, option=orjson.OPT_NON_STR_KEYS)
print("\nInt keys:", int_keys.decode())
# Combine options
combined = orjson.dumps(data, option=orjson.OPT_INDENT_2 | orjson.OPT_SORT_KEYS)
print("\nCombined:", combined.decode())
Output:
Pretty:
{
  "name": "alice",
  "score": 99,
  "timestamp": "2026-05-06T10:00:00+00:00",
  "metadata": {
    "source": "api",
    "version": 2
  }
}
Sorted keys: {"metadata":{"source":"api","version":2},"name":"alice","score":99,"timestamp":"2026-05-06T10:00:00+00:00"}
Int keys: {"1":"one","2":"two","3":"three"}
Combined: {
  "metadata": {
    "source": "api",
    "version": 2
  },
  "name": "alice",
  "score": 99,
  "timestamp": "2026-05-06T10:00:00+00:00"
}
The most commonly used options are OPT_INDENT_2 for human-readable output, OPT_SORT_KEYS for deterministic output (useful for hashing or diffing), and OPT_NON_STR_KEYS when your dicts use integer or tuple keys.
Benchmarking orjson vs stdlib json
Speed claims are only useful if you can measure them yourself. Here is a self-contained benchmark you can run to compare orjson against the standard library on your own machine.
# benchmark.py
import json
import orjson
import timeit
from datetime import datetime, timezone
# Sample payload -- similar to a typical API response
payload = {
    "id": 12345,
    "name": "alice",
    "email": "alice@example.com",
    "created_at": "2026-05-06T10:00:00+00:00",
    "scores": [98, 87, 92, 95, 88],
    "metadata": {"source": "api", "active": True, "version": 3},
}
ITERATIONS = 50000
# Benchmark json.dumps
json_encode_time = timeit.timeit(
    lambda: json.dumps(payload),
    number=ITERATIONS,
)
# Benchmark orjson.dumps
orjson_encode_time = timeit.timeit(
    lambda: orjson.dumps(payload),
    number=ITERATIONS,
)
json_str = json.dumps(payload)
orjson_bytes = orjson.dumps(payload)
# Benchmark json.loads
json_decode_time = timeit.timeit(
    lambda: json.loads(json_str),
    number=ITERATIONS,
)
# Benchmark orjson.loads
orjson_decode_time = timeit.timeit(
    lambda: orjson.loads(orjson_bytes),
    number=ITERATIONS,
)
print(f"Encoding ({ITERATIONS:,} iterations):")
print(f" json.dumps: {json_encode_time:.3f}s")
print(f" orjson.dumps: {orjson_encode_time:.3f}s")
print(f" Speedup: {json_encode_time / orjson_encode_time:.1f}x")
print(f"\nDecoding ({ITERATIONS:,} iterations):")
print(f" json.loads: {json_decode_time:.3f}s")
print(f" orjson.loads: {orjson_decode_time:.3f}s")
print(f" Speedup: {json_decode_time / orjson_decode_time:.1f}x")
Output (approximate — your numbers will vary by hardware):
Encoding (50,000 iterations):
json.dumps: 0.621s
orjson.dumps: 0.089s
Speedup: 7.0x
Decoding (50,000 iterations):
json.loads: 0.514s
orjson.loads: 0.112s
Speedup: 4.6x
The encoding speedup is most dramatic because orjson avoids Python-level object traversal entirely. For decoding, the speedup is still significant but slightly lower because Python dict construction has some overhead regardless of where parsing happens. The more complex and deeply nested your data, the larger the performance gap tends to be.
Real-Life Example: High-Throughput Log Processor
Here is a practical script that reads a large batch of JSON log records, deserializes them with orjson, filters for error events, enriches each record with processing metadata, and serializes the results back to a JSONL file. This pattern is common in data pipelines, ETL jobs, and log analytics systems.
# log_processor.py
import orjson
from datetime import datetime, timezone
from dataclasses import dataclass
from typing import List, Optional
# -- Sample log data (in a real pipeline, this would come from a file or stream) --
SAMPLE_LOGS = [
    b'{"level":"ERROR","service":"auth","message":"token expired","user_id":101,"ts":"2026-05-06T09:01:00Z"}',
    b'{"level":"INFO","service":"api","message":"request received","user_id":202,"ts":"2026-05-06T09:01:05Z"}',
    b'{"level":"ERROR","service":"db","message":"connection timeout","user_id":null,"ts":"2026-05-06T09:01:10Z"}',
    b'{"level":"WARN","service":"cache","message":"cache miss","user_id":303,"ts":"2026-05-06T09:01:15Z"}',
    b'{"level":"ERROR","service":"auth","message":"invalid signature","user_id":404,"ts":"2026-05-06T09:01:20Z"}',
]
@dataclass
class EnrichedLog:
    level: str
    service: str
    message: str
    user_id: Optional[int]
    original_ts: str
    processed_at: datetime
    is_critical: bool
def process_log_batch(raw_logs: List[bytes]) -> List[EnrichedLog]:
    """Filter ERROR logs and enrich them with processing metadata."""
    enriched = []
    now = datetime.now(timezone.utc)
    for raw in raw_logs:
        record = orjson.loads(raw)
        # Only process errors
        if record.get("level") != "ERROR":
            continue
        log = EnrichedLog(
            level=record["level"],
            service=record.get("service", "unknown"),
            message=record.get("message", ""),
            user_id=record.get("user_id"),
            original_ts=record.get("ts", ""),
            processed_at=now,
            is_critical="timeout" in record.get("message", "").lower(),
        )
        enriched.append(log)
    return enriched
def write_jsonl(records: List[EnrichedLog]) -> str:
    """Serialize records to JSONL format (one JSON object per line)."""
    lines = []
    for record in records:
        lines.append(orjson.dumps(record, option=orjson.OPT_NON_STR_KEYS).decode())
    return "\n".join(lines)
# Run the pipeline
errors = process_log_batch(SAMPLE_LOGS)
output = write_jsonl(errors)
print(f"Processed {len(SAMPLE_LOGS)} log records")
print(f"Found {len(errors)} ERROR events\n")
print("=== Enriched Error Logs (JSONL) ===")
print(output)
Output:
Processed 5 log records
Found 3 ERROR events
=== Enriched Error Logs (JSONL) ===
{"level":"ERROR","service":"auth","message":"token expired","user_id":101,"original_ts":"2026-05-06T09:01:00Z","processed_at":"2026-05-06T09:01:30+00:00","is_critical":false}
{"level":"ERROR","service":"db","message":"connection timeout","user_id":null,"original_ts":"2026-05-06T09:01:10Z","processed_at":"2026-05-06T09:01:30+00:00","is_critical":true}
{"level":"ERROR","service":"auth","message":"invalid signature","user_id":404,"original_ts":"2026-05-06T09:01:20Z","processed_at":"2026-05-06T09:01:30+00:00","is_critical":false}
This script demonstrates three real benefits: orjson deserializes each log line with loads(), the EnrichedLog dataclass is serialized directly without any manual conversion, and the datetime.now() value is automatically converted to ISO 8601 with timezone. You can extend this by reading from a real file with open("app.log", "rb") and writing to disk instead of printing.
Frequently Asked Questions
Why does orjson.dumps() return bytes instead of str?
JSON is defined as UTF-8 encoded text, and bytes is the natural representation for encoded data. Most real-world destinations — network sockets, file I/O, HTTP response bodies, database BLOBs — work with bytes directly. Returning bytes skips an extra allocation and avoids the overhead of converting to a Python string object. If you need a string, simply call result.decode(). The performance difference between returning bytes vs. string is small, but across millions of serializations it is measurable.
Can I use a custom encoder for unsupported types?
orjson supports a default parameter similar to the stdlib. You pass a callable that receives the unserializable object and must return something orjson can handle. For example: orjson.dumps(obj, default=lambda o: o.__dict__). This is the escape hatch for types like Decimal, custom objects, or third-party classes that orjson does not natively recognize. Keep the default function fast because it is called once per unrecognized type instance.
Is orjson a drop-in replacement for the stdlib json module?
Almost, but not quite. The main behavioral differences are: dumps() returns bytes not str; the indent parameter is replaced by option=orjson.OPT_INDENT_2; and cls and object_hook parameters are not supported. If your code calls json.dumps() and passes the result directly to write(), the transition is a one-line change. If you depend on indent= or custom decoders, you will need small adjustments.
Does orjson support numpy arrays?
Yes, with option=orjson.OPT_SERIALIZE_NUMPY. This serializes numpy arrays as JSON arrays, respecting the dtype. Supported dtypes include all integer and float variants, bool, and str. If you frequently serialize numpy data (e.g., embedding vectors, model predictions), this option makes orjson significantly more convenient than the stdlib, which requires converting arrays to lists manually.
What errors should I watch for?
orjson raises orjson.JSONDecodeError (a subclass of json.JSONDecodeError) for invalid input to loads(), so existing try/except blocks that catch json.JSONDecodeError still work. For serialization, it raises orjson.JSONEncodeError for types it cannot handle. The most common case is passing an object that is not a dataclass, dict, list, or primitive — pass a default function to handle those cases.
Is orjson thread-safe?
Yes. orjson has no mutable global state and each dumps() and loads() call is independent. You can call it from multiple threads concurrently without locks. This matters in web servers (FastAPI, Django) that process many requests in parallel — you can replace the stdlib json calls in serialization middleware without introducing thread contention.
Conclusion
orjson is one of those rare libraries that does exactly one thing and does it better than the alternative in nearly every way. You get 5-10x faster serialization, native support for datetime, dataclass, UUID, and Enum objects without custom encoders, a clean options API for formatting control, and thread safety — all from a single pip install orjson. The only adjustment you need to make is calling .decode() when you need a string instead of bytes.
Start with the benchmark script from this article to measure the difference on your own workload. For data pipelines, API serializers, and any code that processes large volumes of JSON, orjson is a straightforward performance win. You can also explore orjson's OPT_SERIALIZE_NUMPY option if your project works with numpy arrays, and combine it with OPT_SORT_KEYS for reproducible output in tests. For the full option reference, visit the official orjson documentation on GitHub.