Intermediate
Putting parameters in configuration files takes some extra effort at the start, but it can save you a lot of time and heartache later. We are all tempted to hardcode parameters directly into our code because it saves precious time while writing, even though doing it properly takes only a little more effort. Some of us at least create constants or store parameters in a variable, while others store them in a class variable to keep things cleaner. Arguably the best option is to store them in a configuration file. In this article you’ll learn the steps required to use configuration files in Python 3, following the official Python 3 documentation.
ConfigParser is the class used to work with configuration files in Python 3. The main purpose of these files is to let you write Python programs that end users can easily modify. This article walks through a complete implementation of configuration files and covers three main aspects: setup, file format and the basic API.
Introduction to Python 3 Configuration Files
Configuration files can play a vital role in any program and its management. One popular approach to separating code from configuration is to store the settings in YAML, JSON or INI files rather than in .py files. One reason .py files are not used is reloading: if you stored your config in a Python .py file, you would typically need to restart the whole program to pick up a change. Also, an end user editing a .py file can modify the code itself at will. Configuration files make it easier to change a program’s behaviour without touching its code. The point of the separation is that the programmer can focus on keeping the code as clean as possible, while the user only needs to touch the configuration file.
Setup of Python 3 ConfigParser
The class used to create configuration files is ConfigParser, which lives in the configparser module. It is part of the standard Python 3 library, so there is no need for a pip installation. To use it, import the module. Note that the plain import below is specific to Python 3; in Python 2 the module was named ConfigParser:
import configparser
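The text above alludes to an import that works on both Python 2 and Python 3. A common pattern for that, sketched here on the assumption that the Python 2 module name is ConfigParser, is to try the Python 3 name first and fall back to the old one:

```python
# Compatibility import: a sketch, assuming the Python 2 module name "ConfigParser".
try:
    import configparser  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 fallback

# Either way, the rest of the code can refer to "configparser".
config = configparser.ConfigParser()
print(type(config).__name__)
```

On Python 3 the except branch is never taken, so this is only worth the extra lines if the code really has to run on both versions.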
File Format of configuration file
One convention for the file format is to use the extension .ini (short for initialization), but you can choose the extension based on your own or your client’s preferences. A configuration file has the following parts:
- A configuration file consists of one or more sections.
- Section names are written inside square brackets: [section name].
- Each section works like a mapping. It consists of key-value pairs: the name of the configuration item (“key”) and the actual value of that item (“value”).
- Keys and values are separated by either the equals sign (=) or a colon (:).
- You can add a comment on its own line using the # or ; prefix.
Example:
[default]
host = 192.168.1.1
port = 31
username = admin
password = admin
[database]
#database related configuration files
port = 22
forwardx11 = no
name = db_test
In the above configuration file example, we have two sections: first [default] and second [database]. Each section has its own key-value pairs, such as username = admin and name = db_test. Because every key-value pair belongs to a given section, it is easier to organise your configuration files. Finally, the line with the # prefix is a comment.
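To check how these rules are interpreted, the example can be parsed straight from a string with read_string(). This is a minimal sketch: the extra "timeout: 30" entry and the semicolon comment are hypothetical additions, included only to show the colon delimiter and the second comment prefix.

```python
import configparser

# The example file from above, held in a string so the sketch is self-contained.
# "timeout: 30" and the ";" comment are hypothetical additions for illustration.
SAMPLE = """
[default]
host = 192.168.1.1
port = 31
username = admin
password = admin

[database]
# database related configuration
; a semicolon also starts a comment
port = 22
forwardx11 = no
name = db_test
timeout: 30
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE)

print(config.sections())          # section names, in file order
print(dict(config["database"]))   # comments are dropped, values stay strings
```

Note that the lowercase [default] section here is an ordinary section; only the uppercase name DEFAULT is treated specially by ConfigParser.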
Reading the configuration file from python code
Now we will talk about how to read from the config file. As mentioned earlier, ConfigParser is the class (in the configparser module) used to work with configuration files. First, a ConfigParser object has to be initialised: config = configparser.ConfigParser(). The following sections cover the main operations:
Initialization of ConfigParser
You can initialise the configuration object with the following syntax. Here the variable “config” will hold all the values.
config = configparser.ConfigParser()
Write to a Configuration file with ConfigParser
Although you would normally edit a configuration file by hand in a text editor, there are times when you want to write to a config file programmatically. For example, this could be to create a default config file that a user can then use as a basis for their own edits. You may also want to override an erroneous config entry (after confirming with the user).
Once the object is initialised, we can write to it. There are several ways to initialise a section before writing it to the config file. We are going to use the example from the file format section above. Let’s initialise the default section using a dictionary.
Example:
config['default'] = {
"host" : "192.168.1.1",
"port" : "22",
"username" : "username",
"password" : "password"
}
Here, “default” is the name of the section (the part in the actual configuration file that had the square “[” and “]” brackets) and curly braces denote the start and end of a dictionary. Inside the dictionary are key-value pairs, i.e. “host” is the key and “192.168.1.1” is the value, separated by a colon “:”.
Now, let’s initialise the database section using an empty dictionary and add the key-value pairs line by line.
Example:
config['database'] = {}
config['database']['port'] = "22"
config['database']['forwardx11'] = "no"
config['database']['name'] = "db_test"
Here, “database” is the name of the section, and the empty curly braces create an empty dictionary. Each following line then assigns one key-value pair: “port” is the key and “22” is the value, joined by the assignment operator “=”. This method provides a lot more flexibility, because entries can be added one at a time.
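Besides dictionary-style assignment, ConfigParser also offers add_section(), set() and read_dict(). The sketch below shows them as alternatives to the two approaches above; it is not meant to replace them.

```python
import configparser

config = configparser.ConfigParser()

# Option 1: add_section() followed by set(); values must be strings.
config.add_section("database")
config.set("database", "port", "22")
config.set("database", "name", "db_test")

# Option 2: read_dict() loads one or more sections from a nested dictionary.
config.read_dict({"default": {"host": "192.168.1.1", "port": "22"}})

print(config.sections())
print(config["default"]["host"])
```

Which form to use is mostly a matter of taste: dictionary assignment is compact, while add_section() and set() make each step explicit.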
Here’s the full code so far:
import configparser
config = configparser.ConfigParser()
config['default'] = {
"host" : "192.168.1.1",
"port" : "22",
"username" : "username",
"password" : "password"
}
config['database'] = {}
config['database']['port'] = "22"
config['database']['forwardx11'] = "no"
config['database']['name'] = "db_test"
with open('test.ini', 'w') as configfile:
    config.write(configfile)
After initialising the sections in config, you can write them to a config file:
with open('test.ini', 'w') as configfile:
    config.write(configfile)
Now, you will be able to see the file named test.ini created.
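If you want to see what write() produces without creating a file, it accepts any text file-like object. A small sketch using an in-memory buffer:

```python
import configparser
import io

config = configparser.ConfigParser()
config["database"] = {"port": "22", "forwardx11": "no", "name": "db_test"}

# write() accepts any text file-like object, so an in-memory buffer
# lets us inspect the output without touching the disk.
buffer = io.StringIO()
config.write(buffer)
written = buffer.getvalue()
print(written)
```

The printed text has the same layout that ends up in test.ini: a [database] header followed by one "key = value" line per entry.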
Read config from the config file using ConfigParser
The next step is to read the file you have just created.
- The config file can be read using the read() method: config.read('test.ini'). This reads the test.ini file you just created.
- If you want to list just the sections available in the configuration file, use the sections() method: config.sections().
- Next is getting the value of any key stored in a section: config['database']['name']
This will give you “db_test”, the value of the key called “name” stored in the database section.
The following code prints all the values stored against the keys in the default section using a for loop.
for key in config['default']:
    print(config['default'][key])
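One detail worth knowing when reading: read() does not raise an error for a missing file. It returns the list of files it managed to parse, so checking that list is a simple way to detect a missing config. The sketch below uses a temporary directory and a hypothetical missing.ini name purely for illustration.

```python
import configparser
import os
import tempfile

# Write a small file into a temporary directory (paths are for illustration only).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.ini")
with open(path, "w") as configfile:
    configfile.write("[database]\nname = db_test\nport = 22\n")

config = configparser.ConfigParser()
# read() returns the files it parsed; files that do not exist are skipped silently.
loaded = config.read([path, os.path.join(tmp_dir, "missing.ini")])

if not loaded:
    raise FileNotFoundError("no configuration file could be read")

print(config.sections())
for key, value in config["database"].items():
    print(key, "=", value)
```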
Changing the datatype of the configuration value from ConfigParser
ConfigParser stores every value as a string by default. This is fine for most situations, but suppose you want a true/false value instead, or a number to do maths with. For these cases the string default will not work. You can convert a value to another type such as integer, float or boolean, either manually (for example with int()) or by using the getter methods. The preferred way is to use the getter methods.
There are three getter methods:
- getint()
- getfloat()
- getboolean()
Example: config['default'].getint('port')
getint() returns the value of the port key in the “default” section converted to an integer. If you call type() on the result, it will show int. The value stored in the config object itself remains a string.
There is another way of doing it:
Example: config.getboolean('database', 'forwardx11')
In this form, the getboolean() method is called on the config object itself and takes two arguments. The first is the name of the section and the second is the key whose value will be converted.
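The two calling styles can be tried side by side. In this sketch the "timeout" key is a hypothetical entry added so that getfloat() has something to convert:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[default]
port = 22
timeout = 2.5

[database]
forwardx11 = no
""")

# Style 1: getter on the section proxy.
port = config["default"].getint("port")
# Style 2: getter on the parser, with section and key as arguments.
timeout = config.getfloat("default", "timeout")
forward = config.getboolean("database", "forwardx11")

print(type(port).__name__, port)
print(type(timeout).__name__, timeout)
print(type(forward).__name__, forward)

# The stored value is still a string; only the returned value is converted.
print(type(config["default"]["port"]).__name__)
```

getboolean() understands several spellings, including yes/no, on/off, true/false and 1/0, which is why "no" comes back as False here.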
What to do if a value is not available from a configfile
A fallback result can also be obtained. Fallback is the result obtained when the key or section we want to get isn’t available.
Example: config.get('default', 'database', fallback='not_database')
In this case, not_database will be returned if the “database” key isn’t available or the section default is not found.
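A short sketch of the fallback behaviour described above, covering a missing key, a missing section, a typed getter, and what happens when no fallback is given. The section name "no_such_section" is made up for the example.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("[default]\nhost = 192.168.1.1\n")

# Missing key in an existing section.
missing_key = config.get("default", "database", fallback="not_database")
# Missing section altogether.
missing_section = config.get("no_such_section", "host", fallback="not_found")
# The typed getters accept a fallback as well.
port = config.getint("default", "port", fallback=22)

# Without a fallback, a missing key raises an exception instead.
try:
    config.get("default", "database")
except configparser.NoOptionError as exc:
    error = str(exc)

print(missing_key, missing_section, port)
print(error)
```

Choosing between a fallback and an exception depends on the setting: a fallback suits optional values with a sensible default, while letting the exception surface is safer for values the program cannot run without.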
Conclusion
We started with the setup: importing configparser in order to work with configuration files. The next section covered the file format and the basic syntax of a configuration file, which consists of sections and key-value pairs.
We then worked with the data types of keys in the default and database sections, converting them with the getter methods. Last but not least, we looked at the basic API for writing and reading, and at fallback values.
Using configuration files is not difficult and can save a lot of time. So in your next coding work, take the extra few minutes to create a configuration file instead of hardcoding.
Full Code: ConfigParser Example Code
import configparser
config = configparser.ConfigParser()
#Set up default item for hosts using dictionary
config['default'] = {"host" : "192.168.1.1",
"port" : "22",
"username" : "username",
"password" : "password" }
#setup config item bytes
config['database'] = {}
config['database']['port'] = "22"
config['database']['forwardx11'] = "no"
config['database']['name'] = "db_test"
#Write default file
with open('test.ini', 'w') as configfile:
    config.write(configfile)
#Open the file again to try to read it
config.read('test.ini')
#Print the sections
print(config.sections())
print( config['database']['name'] )
#Print each key pair
for key in config['default']:
    print(config['default'][key])
#print the type of integer value
print (type (config['default'].getint('port')))
print( config.getboolean('database', 'forwardx11') )
#Print default value
print( config.get('default', 'databaseabc', fallback='not_database') )
Reference
https://docs.python.org/3/library/configparser.html
Want to see more useful tips?
How To Use Python Joblib for Parallel Computing and Caching
Intermediate
You have a data processing loop that runs one item at a time — checking each file, scoring each user, training each model configuration. Your machine has eight cores and only one of them is working. The loop that takes twenty minutes could finish in three if you could just split the work across all available processors.
Joblib is a Python library that makes parallel computing and result caching easy to add to existing code. Its Parallel and delayed utilities turn a regular Python loop into a parallel job with one wrapper. Its Memory class caches function results to disk so that the second call with the same arguments returns instantly. Install it with pip install joblib. Scikit-learn uses Joblib internally for its own parallelism, so if you have scikit-learn installed, Joblib is already there.
This article covers parallelising loops with Parallel and delayed, choosing the right backend (loky, threading, multiprocessing), caching expensive computations with Memory, integrating with scikit-learn pipelines, and diagnosing performance with verbosity settings. By the end you will have both parallel execution and disk caching working in a realistic data pipeline.
Joblib Parallel: Quick Example
The quickest way to see Joblib’s effect is to replace a for loop with a Parallel call. The structure is almost identical — the main change is wrapping the function call with delayed().
# quick_joblib.py
import time
from joblib import Parallel, delayed
def slow_square(n: int) -> int:
    """Simulate a slow computation."""
    time.sleep(0.5)
    return n * n
numbers = list(range(8))
# Sequential -- takes 8 * 0.5 = 4 seconds
start = time.perf_counter()
sequential = [slow_square(n) for n in numbers]
seq_time = time.perf_counter() - start
print(f"Sequential: {sequential} in {seq_time:.2f}s")
# Parallel -- uses all available CPU cores
start = time.perf_counter()
parallel = Parallel(n_jobs=-1)(delayed(slow_square)(n) for n in numbers)
par_time = time.perf_counter() - start
print(f"Parallel: {parallel} in {par_time:.2f}s")
print(f"Speedup: {seq_time / par_time:.1f}x")
Output (on an 8-core machine):
Sequential: [0, 1, 4, 9, 16, 25, 36, 49] in 4.01s
Parallel: [0, 1, 4, 9, 16, 25, 36, 49] in 0.56s
Speedup: 7.2x
The n_jobs=-1 argument tells Joblib to use all available CPU cores. n_jobs=4 would use exactly four. The delayed(func)(args) pattern creates a lazy description of the function call without executing it — Joblib collects these descriptions and distributes them across workers. The return values are collected in the same order as the input, so parallel[3] is always the result of slow_square(3) regardless of which worker finished first.
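To make the "lazy description" idea concrete, here is a toy version built only on the standard library. It is not Joblib's implementation: toy_delayed and toy_parallel are hypothetical names, and a thread pool stands in for Joblib's worker processes. It only illustrates how a call can be recorded as a (function, args, kwargs) tuple, run later, and returned in input order.

```python
from concurrent.futures import ThreadPoolExecutor


def toy_delayed(func):
    """Return a wrapper that records a call instead of running it."""
    def record(*args, **kwargs):
        return func, args, kwargs
    return record


def toy_parallel(tasks, max_workers=4):
    """Run recorded calls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(f, *a, **kw) for f, a, kw in tasks]
        # Collecting in submission order preserves the order of the inputs.
        return [future.result() for future in futures]


def square(n):
    return n * n


tasks = [toy_delayed(square)(n) for n in range(8)]  # nothing has run yet
results = toy_parallel(tasks)
print(results)
```

Because results are gathered in the order the tasks were submitted, results[3] corresponds to square(3) no matter which worker finished first.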
What Is Joblib and When Should You Use It?
Joblib provides two things: easy parallelism through a process pool, and persistent disk caching of function results. These two features are independent: you can use either without the other. The parallelism is built on top of the loky process pool by default (a robust, reusable implementation of concurrent.futures.ProcessPoolExecutor), with Python’s threading and the original multiprocessing pool available as alternative backends.
| Tool | Best for | Overhead |
|---|---|---|
| Joblib Parallel (loky) | CPU-bound tasks, data processing | ~100ms startup |
| Joblib Parallel (threading) | IO-bound tasks, numpy releases GIL | ~5ms startup |
| concurrent.futures | Simple async IO, process pools | ~50ms startup |
| multiprocessing.Pool | CPU-bound, full control needed | ~100ms startup |
| asyncio | High-concurrency network IO | Near zero |
Joblib excels when your loop body is CPU-bound (model training, file parsing, image processing) and each iteration takes at least a few milliseconds — enough to justify the inter-process communication cost. For very fast operations (microsecond loops), parallelism overhead outweighs the benefit. The caching feature is valuable for any function with expensive deterministic computations: feature extraction, data loading, hyperparameter search.
Choosing the Right Backend
Joblib supports three execution backends, each suited to different workloads. Understanding when to use each prevents a common trap: the default process-based backend actually slows down IO-bound work because of serialisation overhead.
# backends.py
import time
import numpy as np
from joblib import Parallel, delayed
def cpu_task(size: int) -> float:
    """CPU-bound: pure Python computation."""
    data = list(range(size))
    return sum(x * x for x in data) / len(data)
def numpy_task(size: int) -> float:
    """Numpy releases the GIL -- threading backend works well here."""
    arr = np.random.rand(size)
    return float(np.sqrt(np.sum(arr ** 2)))
items = [100_000] * 8
# Default loky backend (separate processes, best for pure Python CPU work)
start = time.perf_counter()
results_loky = Parallel(n_jobs=4, backend="loky")(
delayed(cpu_task)(n) for n in items
)
print(f"loky (CPU work): {time.perf_counter() - start:.2f}s")
# Threading backend (shares memory, good when GIL is released by C extensions)
start = time.perf_counter()
results_thread = Parallel(n_jobs=4, backend="threading")(
delayed(numpy_task)(n) for n in items
)
print(f"threading (NumPy): {time.perf_counter() - start:.2f}s")
# Sequential for comparison
start = time.perf_counter()
results_seq = [numpy_task(n) for n in items]
print(f"sequential: {time.perf_counter() - start:.2f}s")
Output:
loky (CPU work): 0.48s
threading (NumPy): 0.31s
sequential: 1.12s
The loky backend spawns separate Python processes, each with their own memory space and GIL. This is the right choice for pure Python CPU work because it truly runs in parallel. The threading backend runs in threads within the same process. Because Python’s GIL prevents true parallel execution of pure Python code, threading only helps when the task calls into a C extension that releases the GIL — like NumPy, Pandas, or scikit-learn. The multiprocessing backend is the original process pool; prefer loky unless you have a specific compatibility reason to use it.
Caching Expensive Results with Memory
Joblib’s Memory class caches a function’s return value to disk, keyed by the function’s source code and its arguments. The second call with the same arguments reads from the cache instead of recomputing. This is useful for data loading, feature extraction, or any expensive deterministic step that you run repeatedly during development.
# caching.py
import time
import numpy as np
from joblib import Memory
# Create a cache directory
cache = Memory("./joblib_cache", verbose=1)
@cache.cache
def load_and_process(filepath: str, scale: float = 1.0) -> np.ndarray:
    """Simulate expensive data loading and processing."""
    print(f" [COMPUTING] Loading {filepath} with scale={scale}")
    time.sleep(2) # Simulate a 2-second load
    data = np.random.rand(1000) * scale
    return data
print("First call (cold cache):")
start = time.perf_counter()
result1 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result1.mean():.4f}")
print("\nSecond call (cache hit):")
start = time.perf_counter()
result2 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.4f}s, mean={result2.mean():.4f}")
print("\nDifferent args (cache miss):")
start = time.perf_counter()
result3 = load_and_process("data/features.npy", scale=3.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result3.mean():.4f}")
Output:
First call (cold cache):
[COMPUTING] Loading data/features.npy with scale=2.0
Took: 2.01s, mean=0.9987
Second call (cache hit):
Took: 0.0031s, mean=0.9987
Different args (cache miss):
[COMPUTING] Loading data/features.npy with scale=3.0
Took: 2.01s, mean=1.4991
The cache is stored as pickle files in the directory you specify (compression is off by default and can be enabled with the compress argument of Memory). It is keyed on the function’s source code hash and all arguments: if you change the function body, Joblib invalidates the cache automatically on the next call. To clear the cache manually, call cache.clear() or delete the cache directory. The verbose=1 argument makes Joblib print whether it computed or loaded from cache; set it to 0 to silence this output in production.
Joblib with scikit-learn Pipelines
Scikit-learn uses Joblib internally for all its n_jobs parameters — cross-validation, grid search, random forests, and more all use the same Joblib infrastructure. You can control the backend and number of jobs globally using Joblib’s parallel_backend context manager, or pass n_jobs directly to estimators.
# sklearn_parallel.py
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
from joblib import parallel_backend
# Generate a sample dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
# Train a random forest using all CPU cores
print("Training RandomForest with n_jobs=-1...")
start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(f" Fit time: {time.perf_counter() - start:.2f}s")
# Cross-validation in parallel
start = time.perf_counter()
scores = cross_val_score(rf, X, y, cv=5, n_jobs=-1, scoring="accuracy")
print(f" CV scores: {scores.round(3)}, mean={scores.mean():.3f}, time={time.perf_counter() - start:.2f}s")
# Hyperparameter search -- each combo evaluated in parallel
param_grid = {
"n_estimators": [50, 100],
"max_depth": [5, 10, None],
}
start = time.perf_counter()
grid = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=3,
n_jobs=-1,
verbose=0,
)
grid.fit(X, y)
elapsed = time.perf_counter() - start
print(f" Best params: {grid.best_params_}, score={grid.best_score_:.3f}, time={elapsed:.2f}s")
Output:
Training RandomForest with n_jobs=-1...
Fit time: 0.31s
CV scores: [0.934 0.931 0.929 0.927 0.932], mean=0.931, time=0.48s
Best params: {'max_depth': None, 'n_estimators': 100}, score=0.931, time=1.24s
The n_jobs=-1 parameter on scikit-learn estimators and model-selection utilities goes directly to Joblib. Setting it uses all available cores for that operation. For nested parallelism (a parallel grid search that itself trains parallel random forests), Joblib automatically avoids over-subscribing the CPU — the inner jobs run sequentially when the outer jobs already fill all cores.
Real-Life Example: Parallel Feature Extraction Pipeline
The following pipeline processes a directory of text files, extracts word-frequency features from each, and caches the results. Combining Parallel with Memory gives you both speed and resilience — if the pipeline is interrupted, the cached results mean you do not repeat work already done.
# feature_pipeline.py
import os
import time
import re
from collections import Counter
from pathlib import Path
from joblib import Parallel, delayed, Memory
cache = Memory("./feature_cache", verbose=0)
# --- Create sample text files ---
SAMPLE_DIR = Path("sample_texts")
SAMPLE_DIR.mkdir(exist_ok=True)
sample_texts = {
"python.txt": "Python is a high-level programming language. Python emphasises readability.",
"data.txt": "Data science uses statistics and programming. Data analysis reveals patterns.",
"web.txt": "Web development creates websites and applications. The web uses HTML CSS JavaScript.",
"ai.txt": "Artificial intelligence mimics human thinking. Machine learning trains models on data.",
"cloud.txt": "Cloud computing provides on-demand resources. Cloud services scale automatically.",
}
for fname, text in sample_texts.items():
    (SAMPLE_DIR / fname).write_text(text * 50) # Make files large enough to matter
@cache.cache
def extract_features(filepath: str) -> dict:
    """Extract word frequency features from a text file (cached)."""
    text = Path(filepath).read_text().lower()
    words = re.findall(r'\b[a-z]{3,}\b', text)
    top_words = dict(Counter(words).most_common(10))
    time.sleep(0.3) # Simulate expensive NLP processing
    return {"file": Path(filepath).name, "word_count": len(words), "top_words": top_words}
def run_pipeline(data_dir: Path) -> list[dict]:
    files = [str(f) for f in data_dir.glob("*.txt")]
    print(f"Processing {len(files)} files in parallel...")
    start = time.perf_counter()
    results = Parallel(n_jobs=-1, verbose=10)(
        delayed(extract_features)(f) for f in files
    )
    elapsed = time.perf_counter() - start
    print(f"Done in {elapsed:.2f}s")
    return results
features = run_pipeline(SAMPLE_DIR)
for feat in features:
    top3 = list(feat["top_words"].keys())[:3]
    print(f" {feat['file']:15s} words={feat['word_count']:,} top={top3}")
Output (first run — cold cache):
Processing 5 files in parallel...
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.4s finished
Done in 0.41s
python.txt words=350 top=['python', 'language', 'high']
data.txt words=350 top=['data', 'science', 'analysis']
web.txt words=350 top=['web', 'development', 'html']
ai.txt words=350 top=['learning', 'machine', 'data']
cloud.txt words=350 top=['cloud', 'computing', 'services']
s from a file or a database, the cache becomes stale when that data changes. You are responsible for clearing the cache when upstream data changes, either by calling memory.clear(), by passing a version argument to the function, or by using a time-based expiry implemented in the function body.
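The "version argument" idea can be sketched with the file's modification time as the extra argument. To keep the example runnable without Joblib, functools.lru_cache is used here as a stand-in for Memory, and count_words and count_words_fresh are hypothetical names. The same idea should carry over to a function decorated with Memory's cache, since its arguments are part of the cache key.

```python
import functools
import os
import tempfile

calls = []  # records each real computation, for demonstration


@functools.lru_cache(maxsize=None)
def count_words(filepath: str, mtime_ns: int) -> int:
    """mtime_ns is unused in the body; it only makes the cache key change."""
    calls.append(filepath)
    with open(filepath) as handle:
        return len(handle.read().split())


def count_words_fresh(filepath: str) -> int:
    """Look up the modification time and pass it along as part of the key."""
    return count_words(filepath, os.stat(filepath).st_mtime_ns)


path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w") as handle:
    handle.write("one two three")

first = count_words_fresh(path)
second = count_words_fresh(path)  # same mtime: served from the cache

with open(path, "w") as handle:
    handle.write("one two three four five")
# Bump the timestamp explicitly so the demo does not depend on clock resolution.
stat = os.stat(path)
os.utime(path, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))

third = count_words_fresh(path)  # new mtime: recomputed
print(first, second, third, len(calls))
```

The trade-off is that old entries are never reused once the file changes, so a disk-backed cache will still need occasional clearing to reclaim space.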
How do I track progress in a long Parallel job?
Set verbose=10 in Parallel() to report every completed task; the status lines include the elapsed time and, while work is still pending, an estimate of the remaining time. Lower non-zero values print less often. For a progress bar, use the tqdm library: wrap the generator with tqdm((delayed(func)(x) for x in items), total=len(items)). Joblib pulls items from the tqdm-wrapped iterator and tqdm updates the bar as items are consumed, so the bar tracks tasks being dispatched rather than tasks finishing.
Are there memory issues with Joblib on long-running jobs?
When using the loky backend, workers are reused across Parallel calls, so memory held by a worker can accumulate over a long run. The max_nbytes argument of Parallel() (for example max_nbytes="10M") sets the size threshold above which NumPy arrays passed to workers are memory-mapped instead of pickled; max_nbytes=None disables memory mapping. To recycle workers periodically, loky’s get_reusable_executor() accepts kill_workers=True, which shuts down the existing workers before new ones are started.
Conclusion
Joblib makes two of the most common performance problems in data pipelines trivially easy to solve: parallelising embarrassingly parallel loops with Parallel and delayed, and caching expensive deterministic computations with Memory. You have seen how to replace a for loop with a parallel equivalent in four lines, choose the right backend for CPU-bound versus IO-bound work, cache results to disk, and integrate both patterns with scikit-learn.
The natural extension of the feature extraction pipeline is to add a cache validation step that checks file modification timestamps, and to feed the extracted features directly into a scikit-learn pipeline with n_jobs=-1 cross-validation -- so both the feature extraction and the model evaluation run in parallel with full caching.
For the full Joblib reference including memory-mapped arrays, batch processing, and custom backends, see the official Joblib documentation.
Frequently Asked Questions
What is ConfigParser used for in Python?
ConfigParser is a built-in Python module for reading and writing configuration files in INI format. It handles settings organized into sections with key-value pairs, making it easy to store and retrieve application configuration without hardcoding values.
What format does ConfigParser use?
ConfigParser uses the INI file format with sections in square brackets ([section]), followed by key-value pairs using = or : as delimiters. Comments start with # or ;. A special [DEFAULT] section, if present, supplies fallback values for all other sections.
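A minimal sketch of how the special DEFAULT section behaves. The section and key names, including the hypothetical [cache] section, are made up for illustration:

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[DEFAULT]
port = 22

[database]
name = db_test

[cache]
port = 6379
""")

print(config.sections())           # DEFAULT is not listed as a regular section
print(config["database"]["port"])  # not set in [database], so DEFAULT applies
print(config["cache"]["port"])     # the section's own value takes priority
```

The name is case-sensitive: a lowercase [default] section is treated as an ordinary section and does not provide fallback values.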
How do I read a config file with ConfigParser?
Create a ConfigParser() instance, call config.read('filename.ini'), then access values with config['section']['key'] or config.get('section', 'key'). Use getint(), getfloat(), or getboolean() for type conversion.
Can ConfigParser handle nested sections?
No, ConfigParser does not support nested sections natively. For nested configuration structures, consider using TOML (tomllib in Python 3.11+), YAML (PyYAML), or JSON configuration files instead.
What is the difference between ConfigParser and JSON for configuration?
ConfigParser uses human-friendly INI format with sections and is ideal for simple settings. JSON supports nested structures and lists but lacks comments. ConfigParser has built-in type conversion methods and a DEFAULT section for fallback values, while JSON requires manual type handling.