Intermediate

If you’ve ever written the same list-filtering loop for the hundredth time, or stared at a nested dictionary wondering how to safely pluck a value three levels deep, you already know the problem Pydash solves. Python’s standard library is powerful, but working with collections and data pipelines often produces verbose, repetitive code that buries your intent under implementation details. Pydash is a utility library — inspired by Lodash from JavaScript — that gives you a clean, consistent set of functions for transforming data the way you think about it, not the way Python’s builtins happen to expose it.

Pydash is a lightweight, pure Python library, and installing it takes one command: pip install pydash. Current releases support Python 3.8 and newer. The functions cover six main areas: lists, dictionaries, strings, numbers, functions (higher-order utilities), and chaining. Most functions are forgiving by design — they return sensible defaults instead of raising exceptions when data is missing or the wrong shape.

In this article, we cover the most useful Pydash utilities for everyday Python work: deep key access with get and set_, list operations with chunk, flatten, and group_by, string utilities, functional tools like curry and partial, and Pydash’s method chaining API. By the end you’ll have a working data pipeline that processes a collection of records using only Pydash functions — no manual loops required.

Working with Pydash: Quick Example

To see how Pydash changes the shape of everyday code, here is a complete example that deep-reads nested dictionary keys, filters a list, and groups results — all without writing a single for-loop or try/except block:

# quick_example.py
import pydash as _

records = [
    {"user": {"name": "Alice", "role": "admin"}, "score": 91},
    {"user": {"name": "Bob",   "role": "member"}, "score": 74},
    {"user": {"name": "Carol", "role": "admin"}, "score": 88},
    {"user": {"name": "Dan",   "role": "member"}, "score": 55},
]

# Deep-get a nested key with a fallback default
first_name = _.get(records[0], "user.name", "Unknown")
print("First user:", first_name)

# Filter records where score >= 80
high_scorers = _.filter_(records, lambda r: r["score"] >= 80)
print("High scorers:", _.map_(high_scorers, "user.name"))

# Group all records by role
by_role = _.group_by(records, "user.role")
print("Admins:", [r["user"]["name"] for r in by_role["admin"]])

Output:

First user: Alice
High scorers: ['Alice', 'Carol']
Admins: ['Alice', 'Carol']

Notice that _.map_(high_scorers, "user.name") uses a path string instead of a lambda — Pydash accepts dot-notation paths wherever it accepts iteratee functions. That single convention eliminates a large category of boilerplate lambdas. The trailing underscore on filter_ and map_ avoids shadowing Python’s built-in names.
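
To make the convention concrete, here is a minimal sketch (with hypothetical sample data) showing that a path string and an explicit lambda produce the same result:

# iteratee_paths.py
import pydash as _

data = [{"user": {"name": "Alice"}}, {"user": {"name": "Bob"}}]

# A dot-notation path and a lambda are interchangeable iteratees
print(_.map_(data, "user.name"))                  # ['Alice', 'Bob']
print(_.map_(data, lambda r: r["user"]["name"]))  # ['Alice', 'Bob']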

The sections below go deeper into each function group, with realistic examples you can run immediately.

Six levels deep. One dot-separated string. Zero KeyErrors.

What Is Pydash and Why Use It?

Pydash is a port of Lodash — one of JavaScript’s most-downloaded utility libraries — to Python. Where Python’s standard library organizes utilities by data type (itertools, functools, collections, str methods), Pydash organizes everything under a single consistent namespace: import pydash as _ and every utility is one call away.

The library solves a specific problem: Python’s builtins are not composable at the data level. You cannot safely read data["user"]["address"]["city"] without guarding every bracket with a try/except or a chain of .get() calls. Pydash’s _.get(data, "user.address.city", "Unknown") does the same thing in one line and never raises an exception. This matters most when you’re processing API responses, configuration files, or any JSON-shaped data where fields may or may not exist.
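
As a minimal sketch of the difference (sample data assumed):

# safe_access.py
import pydash as _

data = {"user": {"address": {"city": "Melbourne"}}}

# Vanilla Python: every level needs its own guard
city_vanilla = data.get("user", {}).get("address", {}).get("city", "Unknown")

# Pydash: one call, one path, one default
city_pydash = _.get(data, "user.address.city", "Unknown")

print(city_vanilla == city_pydash)  # True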

Here is a quick comparison of what Pydash replaces:

Task                        Vanilla Python                               Pydash
Safe nested key access      data.get("a", {}).get("b", default)         _.get(data, "a.b", default)
Flatten nested list         list(itertools.chain.from_iterable(...))    _.flatten(nested)
Group list by field         Manual defaultdict loop                     _.group_by(items, "field")
Split list into pages       Slice arithmetic in a loop                  _.chunk(items, size)
Unique by field             Seen-set + loop                             _.uniq_by(items, "id")
Partial application         functools.partial(fn, ...)                  _.partial(fn, ...) or _.curry(fn)

Pydash is not a replacement for pandas or numpy — it doesn’t do vectorized math or DataFrames. It’s the missing middle ground between raw Python and a full data science stack: a clean toolbox for transforming ordinary Python dicts and lists.

Installing Pydash

Install Pydash from PyPI with a single command:

# install_pydash.sh
pip install pydash

Output:

Successfully installed pydash-8.0.3

Throughout this article we import Pydash as _, which mirrors the Lodash convention and makes function calls read naturally. If the underscore alias conflicts with something in your codebase, pick a different alias (avoid pd if you also use pandas) or import functions individually: from pydash import get, flatten, group_by.

Dictionary Utilities: get, set_, has, omit, pick

Safe Deep Access with get and set_

The two most-used Pydash functions are _.get() and _.set_(). They read and write nested keys using dot-notation paths without raising exceptions on missing keys. This is invaluable when consuming API responses where any field might be absent:

# dict_get_set.py
import pydash as _

user = {
    "profile": {
        "name": "Alice",
        "address": {
            "city": "Melbourne",
            "postcode": "3000"
        }
    },
    "scores": [91, 85, 78]
}

# Safe nested read — returns default if path is missing
city = _.get(user, "profile.address.city", "Unknown")
country = _.get(user, "profile.address.country", "Australia")  # key missing
print("City:", city)
print("Country (default):", country)

# Array index access in paths
first_score = _.get(user, "scores[0]", 0)
print("First score:", first_score)

# Deep write — creates intermediate keys if needed
_.set_(user, "profile.settings.theme", "dark")
print("Theme set:", _.get(user, "profile.settings.theme"))

Output:

City: Melbourne
Country (default): Australia
First score: 91
Theme set: dark

_.get() never raises a KeyError or TypeError — if any segment of the path is missing or None, it returns your default value. _.set_() mutates the original dict in place and creates intermediate dicts automatically, so you never need to pre-initialize nested structures.

Selecting and Filtering Keys: pick, omit, has

When you need a subset of a dictionary’s keys — for serialization, logging, or passing to an API — _.pick() and _.omit() do the job cleanly without dictionary comprehensions:

# dict_pick_omit.py
import pydash as _

record = {
    "id": 42,
    "name": "Alice",
    "email": "alice@example.com",
    "password_hash": "abc123",
    "created_at": "2024-01-15",
    "internal_notes": "VIP customer"
}

# Keep only safe fields for API response
public = _.pick(record, ["id", "name", "email", "created_at"])
print("Public record:", public)

# Remove sensitive fields before logging
loggable = _.omit(record, ["password_hash", "internal_notes"])
print("Loggable record:", loggable)

# Check if a nested key exists
print("Has email?", _.has(record, "email"))
print("Has address?", _.has(record, "address.city"))  # nested path

Output:

Public record: {'id': 42, 'name': 'Alice', 'email': 'alice@example.com', 'created_at': '2024-01-15'}
Loggable record: {'id': 42, 'name': 'Alice', 'email': 'alice@example.com', 'created_at': '2024-01-15'}
Has email? True
Has address? False

Both _.pick() and _.omit() return new dicts — the original is untouched. _.has() accepts dot-notation paths just like _.get(), so you can check for deeply nested keys before trying to access them.

pick() and omit() — faster than writing ‘if key in dict’ for the thousandth time.

List Utilities: chunk, flatten, group_by, uniq_by, zip_

Splitting Lists with chunk

When sending items to an API in batches, paginating results, or splitting data for parallel processing, you need to divide a list into fixed-size groups. _.chunk() handles this in one call:

# list_chunk.py
import pydash as _

item_ids = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]

# Split into batches of 3 for API calls
batches = _.chunk(item_ids, 3)
print("Batches:", batches)

# Simulate batch API calls
for i, batch in enumerate(batches, 1):
    print(f"  Batch {i}: processing IDs {batch}")

Output:

Batches: [[101, 102, 103], [104, 105, 106], [107, 108, 109], [110]]
  Batch 1: processing IDs [101, 102, 103]
  Batch 2: processing IDs [104, 105, 106]
  Batch 3: processing IDs [107, 108, 109]
  Batch 4: processing IDs [110]

The last batch contains whatever items remain — Pydash never drops items to make batches uniform. No slice arithmetic, no off-by-one errors.

Flattening Nested Lists

API responses frequently return nested arrays — list of lists of items, or lists of dicts containing lists. Pydash provides three levels of flattening: _.flatten() for one level deep, _.flatten_deep() for all levels, and _.flatten_depth(n) for a specific depth:

# list_flatten.py
import pydash as _

# One level of nesting (common from paginated API results)
pages = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
all_items = _.flatten(pages)
print("Flatten one level:", all_items)

# Deeply nested structure
nested = [1, [2, [3, [4, [5]]]]]
print("Flatten deep:", _.flatten_deep(nested))
print("Flatten 2 levels:", _.flatten_depth(nested, 2))

Output:

Flatten one level: [1, 2, 3, 4, 5, 6, 7, 8, 9]
Flatten deep: [1, 2, 3, 4, 5]
Flatten 2 levels: [1, 2, 3, [4, [5]]]

Grouping Records with group_by

Grouping a list of dicts by a shared field is one of the most common data transformation tasks. With vanilla Python you build a defaultdict and a loop. With Pydash it’s a single call, and it accepts both a field name string and a lambda for computed groupings:

# list_group_by.py
import pydash as _

orders = [
    {"id": 1, "status": "shipped",   "amount": 120.00},
    {"id": 2, "status": "pending",   "amount":  45.50},
    {"id": 3, "status": "shipped",   "amount":  89.99},
    {"id": 4, "status": "cancelled", "amount":  30.00},
    {"id": 5, "status": "pending",   "amount": 210.75},
]

# Group by field name string
by_status = _.group_by(orders, "status")
for status, items in by_status.items():
    total = sum(o["amount"] for o in items)
    print(f"  {status}: {len(items)} orders, ${total:.2f} total")

# Group by computed value (lambda)
by_size = _.group_by(orders, lambda o: "large" if o["amount"] > 100 else "small")
print("\nLarge orders:", len(by_size.get("large", [])))
print("Small orders:", len(by_size.get("small", [])))

Output:

  shipped: 2 orders, $209.99 total
  pending: 2 orders, $256.25 total
  cancelled: 1 orders, $30.00 total

Large orders: 2
Small orders: 3

Deduplication with uniq_by

When merging datasets or deduplicating records from multiple sources, _.uniq_by() keeps the first occurrence of each unique key value — no seen-set bookkeeping required:

# list_uniq.py
import pydash as _

# Raw records with duplicates (e.g., merged from two data sources)
contacts = [
    {"id": 1, "name": "Alice", "source": "CRM"},
    {"id": 2, "name": "Bob",   "source": "CRM"},
    {"id": 1, "name": "Alice", "source": "CSV"},  # duplicate ID 1
    {"id": 3, "name": "Carol", "source": "CSV"},
]

unique_contacts = _.uniq_by(contacts, "id")
print("Unique contacts:", [c["name"] for c in unique_contacts])
print("Sources kept:", [c["source"] for c in unique_contacts])

Output:

Unique contacts: ['Alice', 'Bob', 'Carol']
Sources kept: ['CRM', 'CRM', 'CSV']

The first occurrence wins, so when merging two sources, put the preferred one first in the input list before calling _.uniq_by(), as the sketch below shows.
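
Here is a quick sketch of that ordering trick, using two hypothetical source lists:

# uniq_prefer.py
import pydash as _

crm = [{"id": 1, "name": "Alice",  "source": "CRM"}]
csv = [{"id": 1, "name": "Alicia", "source": "CSV"}]

# Whichever list comes first wins on duplicate IDs
print(_.uniq_by(crm + csv, "id")[0]["source"])  # CRM
print(_.uniq_by(csv + crm, "id")[0]["source"])  # CSV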

group_by: because a defaultdict loop is a crime against readability.

String Utilities: camel_case, snake_case, truncate, words

Pydash includes a full set of string case converters that are useful when normalizing data from APIs (which often use camelCase) into Python conventions (snake_case), or formatting output for display:

# string_utils.py
import pydash as _

# Case conversion — common when consuming REST APIs
api_key = "getUserProfileData"
print("Snake case:", _.snake_case(api_key))      # get_user_profile_data
print("Kebab case:", _.kebab_case(api_key))      # get-user-profile-data
print("Title case:", _.start_case(api_key))      # Get User Profile Data

# Reverse: snake_case to camelCase for sending back to API
field_name = "user_created_at"
print("Camel case:", _.camel_case(field_name))   # userCreatedAt

# String truncation for display
long_text = "Python is a versatile programming language used in web development, data science, and automation."
print("Truncated:", _.truncate(long_text, 60))

# Split into words (handles camelCase and snake_case)
print("Words:", _.words("getUserData"))          # ['get', 'User', 'Data']
print("Words:", _.words("get_user_data"))        # ['get', 'user', 'data']

# Pad strings for tabular output
print(_.pad("OK", 10))                           # '    OK    '
print(_.pad_end("Loading", 12, "."))             # 'Loading.....'

Output:

Snake case: get_user_profile_data
Kebab case: get-user-profile-data
Title case: Get User Profile Data
Camel case: userCreatedAt
Truncated: Python is a versatile programming language used in web de...
Words: ['get', 'User', 'Data']
Words: ['get', 'user', 'data']
    OK    
Loading.....

These functions are particularly valuable when writing API adapters that translate between external naming conventions and your internal Python code. Calling _.snake_case() on every key in an API response dict is faster to read and less error-prone than a regex substitution.
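
As a minimal sketch of that normalization using _.map_keys() (the response dict here is hypothetical):

# snake_case_keys.py
import pydash as _

api_response = {"userId": 7, "createdAt": "2024-01-15", "isActive": True}

# map_keys rewrites each key; the values pass through untouched
normalized = _.map_keys(api_response, lambda value, key: _.snake_case(key))
print(normalized)  # {'user_id': 7, 'created_at': '2024-01-15', 'is_active': True}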

Functional Utilities: curry, partial, flow

Currying Functions

Currying transforms a multi-argument function into a chain of single-argument functions. This is useful for creating specialized functions from general ones without repeating arguments everywhere:

# functional_curry.py
import pydash as _

# A general function with multiple parameters
def multiply(a, b):
    return a * b

# Curry it — now each call takes one argument at a time
curried_multiply = _.curry(multiply)

double = curried_multiply(2)   # fix the first argument
triple = curried_multiply(3)

print("Double 5:", double(5))   # 10
print("Triple 5:", triple(5))   # 15

# Practical use: apply a tax rate to a list of prices
add_tax = _.curry(lambda rate, price: round(price * (1 + rate), 2))
add_gst = add_tax(0.10)  # 10% GST

prices = [19.99, 49.99, 9.95, 149.00]
with_tax = list(map(add_gst, prices))
print("Prices with GST:", with_tax)

Output:

Double 5: 10
Triple 5: 15
Prices with GST: [21.99, 54.99, 10.94, 163.9]
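
The section heading also promises _.partial(). Unlike currying, it fixes any number of leading arguments in a single call, much like functools.partial. A brief sketch:

# functional_partial.py
import pydash as _

def greet(greeting, name):
    return f"{greeting}, {name}!"

# Fix the first argument in one call; _.partial_right fixes trailing arguments
hello = _.partial(greet, "Hello")
print(hello("Alice"))  # Hello, Alice!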

Building Pipelines with flow

_.flow() composes a series of single-argument functions into a pipeline — the output of each function becomes the input of the next. This is the cleanest way to express multi-step data transformations without deeply nested function calls:

# functional_flow.py
import pydash as _

# Define individual transformation steps
def normalize_text(text):
    return text.strip().lower()

def remove_punctuation(text):
    return ''.join(c for c in text if c.isalnum() or c.isspace())

def split_words(text):
    return text.split()

def count_words(words):
    return len(words)

# Compose into a pipeline
word_count_pipeline = _.flow(
    normalize_text,
    remove_punctuation,
    split_words,
    count_words
)

samples = [
    "  Hello, World! This is Python.  ",
    "Pydash makes functional programming easy!",
    "  One.  ",
]

for sample in samples:
    count = word_count_pipeline(sample)
    print(f"'{sample.strip()[:40]}...' -> {count} words")

Output:

'Hello, World! This is Python.' -> 5 words
'Pydash makes functional programming easy!' -> 5 words
'One.' -> 1 words

_.flow() makes the transformation sequence explicit and readable. Adding a new step is one line — insert the function anywhere in the chain. Compare this to the equivalent nested call: count_words(split_words(remove_punctuation(normalize_text(text)))), which you have to read right-to-left to understand the execution order.

flow() — because nested function calls read inside-out and pipelines don’t.

Method Chaining with _.chain()

Pydash’s chaining API lets you apply multiple operations to a collection in sequence without intermediate variables or nested calls. The chain is lazy — nothing executes until you call .value():

# chaining.py
import pydash as _

employees = [
    {"name": "Alice",   "dept": "Engineering", "salary": 95000, "years": 4},
    {"name": "Bob",     "dept": "Marketing",   "salary": 72000, "years": 2},
    {"name": "Carol",   "dept": "Engineering", "salary": 110000, "years": 7},
    {"name": "Dan",     "dept": "Marketing",   "salary": 68000, "years": 1},
    {"name": "Eve",     "dept": "Engineering", "salary": 88000, "years": 3},
    {"name": "Frank",   "dept": "HR",          "salary": 65000, "years": 5},
]

# Chain: filter engineers -> sort by salary desc -> take top 2 -> extract names
top_engineers = (
    _.chain(employees)
    .filter_(lambda e: e["dept"] == "Engineering")
    .sort_by("salary", reverse=True)
    .take(2)
    .map_("name")
    .value()
)

print("Top 2 engineers by salary:", top_engineers)

# Chain: group by dept -> map to dept summary stats
dept_summary = (
    _.chain(employees)
    .group_by("dept")
    .map_values(lambda members: {
        "count": len(members),
        "avg_salary": round(sum(m["salary"] for m in members) / len(members))
    })
    .value()
)

for dept, stats in dept_summary.items():
    print(f"  {dept}: {stats['count']} people, avg ${stats['avg_salary']:,}")

Output:

Top 2 engineers by salary: ['Carol', 'Alice']
  Engineering: 3 people, avg $97,667
  Marketing: 2 people, avg $70,000
  HR: 1 people, avg $65,000

Each method in the chain wraps the previous result. The chain object accumulates operations without executing them — execution happens only when .value() is called. This means you can build reusable chain templates and conditionally add operations before calling .value().
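
As a sketch of that pattern, here is a hypothetical helper (reusing the employees list above) that appends a filter step only when a department is specified:

# chain_conditional.py
import pydash as _

def top_names(people, dept=None, limit=2):
    ch = _.chain(people).sort_by("salary", reverse=True)
    if dept is not None:
        ch = ch.filter_(lambda e: e["dept"] == dept)  # added lazily; nothing runs yet
    return ch.take(limit).map_("name").value()        # the single pass happens here

print(top_names(employees, "Marketing"))  # ['Bob', 'Dan']
print(top_names(employees))               # ['Carol', 'Alice']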

Real-Life Example: Employee Report Pipeline

This project combines everything from the article into a self-contained pipeline that ingests raw employee records, cleans and validates them, computes department statistics, and produces a formatted summary report — all using Pydash functions.

Real data pipelines: because manual loops are where maintainability goes to die.

# employee_report.py
import pydash as _

RAW_DATA = [
    {"id": 1,  "full_name": "  alice chen  ", "department": "engineering", "annual_salary": 95000, "tenure_years": 4, "active": True},
    {"id": 2,  "full_name": "BOB SMITH",       "department": "marketing",   "annual_salary": 72000, "tenure_years": 2, "active": True},
    {"id": 3,  "full_name": "Carol Ng",        "department": "engineering", "annual_salary": 110000,"tenure_years": 7, "active": True},
    {"id": 4,  "full_name": "dan jones",       "department": "marketing",   "annual_salary": 68000, "tenure_years": 1, "active": False},
    {"id": 5,  "full_name": "Eve Rodrigo",     "department": "engineering", "annual_salary": 88000, "tenure_years": 3, "active": True},
    {"id": 6,  "full_name": " FRANK LEE ",     "department": "hr",          "annual_salary": 65000, "tenure_years": 5, "active": True},
    {"id": 7,  "full_name": "Grace Kim",       "department": "engineering", "annual_salary": 102000,"tenure_years": 6, "active": True},
    {"id": 8,  "full_name": "henry park",      "department": "hr",          "annual_salary": 61000, "tenure_years": 2, "active": False},
]

# Step 1: Clean and normalize records
def normalize_record(rec):
    return _.assign({}, rec, {
        "full_name": _.start_case(rec["full_name"].strip().lower()),
        "department": _.start_case(rec["department"]),
    })

# Step 2: Build the report pipeline
report = (
    _.chain(RAW_DATA)
    # Normalize names and departments
    .map_(normalize_record)
    # Active employees only
    .filter_(lambda e: e["active"])
    # Group by department
    .group_by("department")
    # Compute stats per department
    .map_values(lambda members: {
        "headcount": len(members),
        "avg_salary": round(sum(m["annual_salary"] for m in members) / len(members)),
        "max_salary": max(m["annual_salary"] for m in members),
        "avg_tenure": round(sum(m["tenure_years"] for m in members) / len(members), 1),
        "top_earner": _.max_by(members, "annual_salary")["full_name"],
    })
    .value()
)

# Step 3: Print the report
print("=" * 54)
print("EMPLOYEE REPORT — ACTIVE STAFF BY DEPARTMENT")
print("=" * 54)
for dept, stats in sorted(report.items()):
    print(f"\n  {dept}")
    print(f"    Headcount  : {stats['headcount']}")
    print(f"    Avg Salary : ${stats['avg_salary']:,}")
    print(f"    Max Salary : ${stats['max_salary']:,}")
    print(f"    Avg Tenure : {stats['avg_tenure']} years")
    print(f"    Top Earner : {stats['top_earner']}")

# Step 4: Global stats
all_active = _.filter_(_.map_(RAW_DATA, normalize_record), lambda e: e["active"])
print(f"\n{'=' * 54}")
print(f"  Total active employees : {len(all_active)}")
print(f"  Company avg salary     : ${round(_.mean(_.map_(all_active, 'annual_salary'))):,}")
print(f"  Highest paid overall   : {_.max_by(all_active, 'annual_salary')['full_name']}")
print("=" * 54)

Output:

======================================================
EMPLOYEE REPORT — ACTIVE STAFF BY DEPARTMENT
======================================================

  Engineering
    Headcount  : 4
    Avg Salary : $98,750
    Max Salary : $110,000
    Avg Tenure : 5.0 years
    Top Earner : Carol Ng

  Hr
    Headcount  : 1
    Avg Salary : $65,000
    Max Salary : $65,000
    Avg Tenure : 5.0 years
    Top Earner : Frank Lee

  Marketing
    Headcount  : 1
    Avg Salary : $72,000
    Max Salary : $72,000
    Avg Tenure : 2.0 years
    Top Earner : Bob Smith

======================================================
  Total active employees : 6
  Company avg salary     : $88,667
  Highest paid overall   : Carol Ng
======================================================

The pipeline reads as a clear sequence of intentions: normalize, filter, group, aggregate. To add a new transformation — say, flagging departments with average tenure under 2 years — you add one .map_values() step to the chain. No refactoring, no new loop variables, no off-by-one concerns.
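
For instance, here is a hedged sketch of that tenure flag, written as one more _.map_values() pass over the finished report dict (the low_tenure field name is hypothetical); the same lambda could equally be appended to the chain before .value():

# flag_low_tenure.py -- reuses the `report` dict built above
import pydash as _

flagged = _.map_values(report, lambda stats: _.assign({}, stats, {
    "low_tenure": stats["avg_tenure"] < 2,
}))
# No department in the sample data dips below 2 years, so all flags are False
print({dept: s["low_tenure"] for dept, s in flagged.items()})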

Frequently Asked Questions

How does Pydash compare to toolz or cytoolz?

Toolz and cytoolz (the Cython-accelerated version) focus on purely functional composition — they’re excellent for pipeline-heavy code with large datasets and prioritize performance. Pydash covers a broader surface area including string utilities, nested dict access, and a chaining API, and is more beginner-friendly because its function names mirror Lodash. For data pipelines that process millions of records, toolz may be faster; for everyday JSON and dict manipulation, Pydash’s ergonomics usually win.

What path string formats does _.get() support?

Pydash’s _.get() accepts dot-notation for nested dicts ("user.address.city"), bracket notation for list indices ("scores[0]"), and combinations of both ("users[2].profile.name"). If a key itself contains a dot (rare but possible), you can pass a list of key segments instead: _.get(data, ["key.with.dot", "nested"]). This covers virtually all real-world JSON structures.
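
A compact sketch of the list-segment form (the dotted key is hypothetical):

# path_segments.py
import pydash as _

data = {"config.prod": {"debug": False}}

# A list of segments treats "config.prod" as one literal key
print(_.get(data, ["config.prod", "debug"]))  # False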

Does Pydash mutate the original data?

Most Pydash functions return new objects and do not mutate their inputs — _.filter_(), _.map_(), _.pick(), _.omit(), and so on. The exceptions are functions that explicitly write: _.set_() and _.assign() mutate their first argument by design. If you need immutable behavior from these, pass a copy: _.set_(dict(original), "key", value). The trailing underscore on function names does not indicate mutation — it’s just used to avoid shadowing Python builtins like filter and map.
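
A quick sketch of that copy idiom. Keep in mind that dict(original) is a shallow copy, so writes into already-nested keys can still touch inner dicts shared with the original; copy.deepcopy gives full isolation:

# mutation_copy.py
import copy
import pydash as _

original = {"a": 1}
updated = _.set_(dict(original), "b.c", 2)  # write to a copy, not the original
print(original)  # {'a': 1}
print(updated)   # {'a': 1, 'b': {'c': 2}}

nested = {"a": {"x": 1}}
safe = _.set_(copy.deepcopy(nested), "a.y", 2)
print(nested)    # {'a': {'x': 1}} -- the deep copy kept the inner dict isolated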

Is Pydash performant enough for production?

Pydash is pure Python, so it will be slower than numpy, pandas, or C-extension libraries for large-scale data processing. For typical web application work — processing API responses, transforming configuration data, building report summaries — performance is more than adequate. The library’s functions are implemented straightforwardly and don’t introduce significant overhead over vanilla Python. If you’re processing millions of records in a tight loop, profile first; for everything else, developer productivity gains from clean Pydash code usually outweigh the marginal speed difference.

Is the _.chain() API truly lazy?

Yes — _.chain() creates a wrapper that accumulates operations without executing them. No iteration happens until you call .value(). This means you can build a chain object, conditionally add steps based on runtime conditions, and then execute it — only one pass through the data occurs at the end. It also means a bug in a late chain step won’t be revealed until .value() is called, so testing individual chain steps in isolation during development is good practice.

Conclusion

Pydash fills a genuine gap in Python’s utility landscape. We covered the most useful parts of the library: safe nested dict access with _.get() and _.set_(), targeted key selection with _.pick() and _.omit(), list operations including _.chunk(), _.flatten(), _.group_by(), and _.uniq_by(), string case converters, functional tools like _.curry() and _.flow(), and the full method chaining API with _.chain().

The best way to extend the real-life example is to connect it to a real data source — a CSV export, a REST API response, or a database query result. Try replacing the RAW_DATA list with records from a requests.get() call to jsonplaceholder.typicode.com/users and applying the same pipeline to normalize and summarize that data. The chain won’t change; only the source does.
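
A minimal sketch of that swap, assuming the requests package is installed and the endpoint is reachable:

# fetch_users.py
import requests
import pydash as _

users = requests.get("https://jsonplaceholder.typicode.com/users").json()

# Same shape as the report pipeline: group, then summarize per group
by_city = (
    _.chain(users)
    .group_by("address.city")
    .map_values(lambda members: len(members))
    .value()
)
print(by_city)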

For the full API reference, including the 150+ functions not covered here, see the official Pydash documentation.