Beginner
Printing output is one of the most important things you do when you program, and probably the very first thing you ever did! This is your full guide on how to print in both Python 2 and Python 3.
The quickest and simplest scenario on how to print is to simply write the following:
print("Hello World")

However, many other variations of printing come up when you are coding in Python. These include printing JSON files, printing without a new line, printing to a log file, printing formatted text, and many more. Find what you're looking for below in this one-stop guide to printing!
Printing without a new line
When you use the print("abc") construct, Python adds a newline character at the end by default. To print without a new line, use the end parameter.
print("Hello World", end="")
Normally when printing:

See example with the print parameter:
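As a minimal sketch of the difference, the snippet below sends the same text to an in-memory buffer with and without end="". The buffer (io.StringIO) is only an assumption made here so the exact characters can be inspected; on a terminal you would simply omit the file argument.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for the terminal

# Default behaviour: each call ends with a newline character
print("Hello", file=buffer)
print("World", file=buffer)

# end="" suppresses the newline, so the next call continues on the same line
print("Hello", end="", file=buffer)
print(" World", file=buffer)

captured = buffer.getvalue()
print(repr(captured))
```

The repr() call makes the newline characters visible, which is handy when you are not sure where a line break is coming from.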

Printing Text together
When printing items, there are often times you need to print text right next to each other, or you need to concatenate text together. Concatenating text in Python is simple and can be done in several ways.
Note that in the approach below, method 2 puts no space between the two pieces of text; method 3 solves this by adding the space explicitly.
text1 = 'shoe'
text2 = 'laces'
print("Method 1:", text1, text2)
print("Method 2:", text1 + text2)
print("Method 3:", text1 + ' ' + text2)
print("Method 4:", "%s %s" % ( text1, text2) )
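A few further options are worth knowing, sketched below under the assumption that you are on Python 3.6 or later (f-strings are not available before that). The method numbers simply continue the list above and are not part of the original example.

```python
text1 = 'shoe'
text2 = 'laces'

# f-string: expressions inside {} are evaluated and inserted into the text
method5 = f"{text1} {text2}"

# str.join: glue a list of strings together with a separator
method6 = " ".join([text1, text2])

# str.format: positional placeholders filled from the arguments
method7 = "{} {}".format(text1, text2)

print("Method 5:", method5)
print("Method 6:", method6)
print("Method 7:", method7)
```

All three produce the same text as method 3, so the choice is mostly about readability.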

Formatting numeric output when printing
When printing, you often need to format the output. Here's a list of formatting scenarios.
Printing a number with a string
When printing a number, it is typically simple to do with the following statement:
counter = 5
print(counter)

The problem arises when you want to print text along with the number on the same line. If you try to add a number and a string with +, as in counter + ' apples', you will typically get this error: TypeError: unsupported operand type(s) for +: 'int' and 'str'.
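A minimal sketch that reproduces the error safely is shown here. The try/except wrapper is an addition for illustration, so the script keeps running and the message can be inspected.

```python
counter = 5

try:
    # Adding an int and a str is not allowed, so this line raises TypeError
    message = counter + ' apples'
except TypeError as error:
    error_text = str(error)
    print("TypeError:", error_text)
```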

The trick is that you can always concatenate two strings together. Hence, you simply need to convert the int (short for integer, or whole number) into a str (string).
counter = 5
print( str(counter) + ' apples' )

Padding zeros when printing numbers
When printing numbers, you often need to pad them with zeros. There are multiple ways to do this, but one of the easier ways is to first convert the number to a string, and then use the string's zfill method, where you specify the total width the result should have.
# this prints the number 10 padded with zeros to a total width of 8 characters
counter = 10
print(str(counter).zfill(8))
A more advanced example follows where we're printing 6 numbers. Notice that the last number, which has more than 8 digits, gets no padded zeros.
counter = 2
for x in range(1, 7):
    print( str(counter).zfill(8) )
    counter = counter * counter

Another way to pad zeros is to use the format function, where a zero is placed in front of the total width. Here the "08" means pad with zeros to a width of 8 characters.
counter = 2
for x in range(1, 7):
    print( format(counter, '08'))
    counter = counter * counter
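To compare the approaches side by side, here is a short sketch. The f-string variant is an extra option not covered above and assumes Python 3.6 or later; the value 42 is arbitrary.

```python
counter = 42

padded_zfill = str(counter).zfill(8)      # string method
padded_format = format(counter, '08')     # built-in format function
padded_fstring = f"{counter:08d}"         # f-string with the same format spec

print(padded_zfill, padded_format, padded_fstring)
```

All three give the same eight-character result, so you can pick whichever reads best in your code.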

Printing with text alignment
The following can be used when you want to print something like a table of contents. It uses the structure '{:>nn}'.format('text to format'), where nn is the total width of the field; the text is right-aligned and padded with spaces on the left.
'{:>15}'.format('text')
Without any alignment:

With alignment:
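As a small sketch of how this looks for a table of contents, the snippet below left-aligns a title in 15 characters and right-aligns a page number in 5. The chapter titles and page numbers are made up for illustration.

```python
rows = [("Introduction", 1), ("Printing", 12), ("Exceptions", 108)]

lines = []
for title, page in rows:
    # '<' left-aligns the title, '>' right-aligns the page number
    lines.append('{:<15}{:>5}'.format(title, page))

for line in lines:
    print(line)
```

Because every line has the same total width, the page numbers line up in a column on the right.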

Printing complex data structures in readable format
One of the great things about python is that you can put together complex data structures fairly easily. This could be a dictionary where each dictionary item is a list. However, to print this out normally is quite difficult to read. This is where pretty print comes in. Suppose you have the following structure:

Within Python, this is represented as a dictionary where the main items "furniture" and "appliances" each hold a list of sub-items. So if the data structure is called "listitems", the data could be represented as follows:
listitems = {'furniture': ['desk', 'chair', 'sofa'], 'appliances': ['tv', 'lamp', 'hifi']}
With this in mind, printing this data would be as follows:
listitems ={ 'furniture':[ 'desk', 'chair', 'sofa'], 'appliances':['tv', 'lamp', 'hifi']}
print(listitems)

This is where the pprint library comes in. You can simply import it and use it to print the output in a more readable fashion. There are two important parameters though. Use the indent parameter to specify how much space is added per nesting level, and the width parameter to limit how many characters are put on a single line. If you set a width of 1 character, then at most one element is shown per line (an element longer than 1 character is still printed whole, but a second element cannot join it on the same line as the width limit is already exceeded).
import pprint
listitems ={ 'furniture':[ 'desk', 'chair', 'sofa'], 'appliances':['tv', 'lamp', 'hifi']}
pprint.pprint(listitems, indent=1, width=1)

Printing time
Printing the time is something you tend to do often, for example to monitor performance or to show that a long operation is still running.
Print the time
First let's simply print the current date and time.
import datetime
print(datetime.datetime.now())

The date and time can be easily formatted using the datetime object's "strftime" method. With strftime you can convert the time to a specified format of hours, minutes, seconds and date, with or without the timezone information.
import datetime
currentTime = datetime.datetime.now()
print(currentTime.strftime("%Y-%m-%d"))
print(currentTime.strftime("%Y-%m-%d %H:%M:%S"))
print(currentTime.strftime("%Y-%m-%d %H:%M:%S %Z%z"))

As you can guess, Y = year, m = month, d = day, H = hour, M = minutes, S = seconds, Z = timezone name and z = UTC offset. You'll notice that for the 3rd print item the timezone is blank. We'll address that in the next section.
Print the time in the correct timezone
However, if you are using a remote machine, or a virtual machine where your local timezone is not set, you may want to choose your own timezone. Also, if you are running services across different machines, it is important to make sure you use the right timezone. One simple way is to use universal time (UTC), or simply to set a single timezone. You can then convert as required.
import datetime
import pytz  # third-party timezone module (pip install pytz)
currentTime = datetime.datetime.now( pytz.timezone('UTC') )
print(currentTime.strftime("%Y-%m-%d"))
print(currentTime.strftime("%Y-%m-%d %H:%M:%S"))
print(currentTime.strftime("%Y-%m-%d %H:%M:%S %Z%z"))

Please note in the above example that when the timezone format was shown it showed that the timezone was set to UTC+0000 unlike the previous example. This means that the timezone information was present there.
In the following code, we will first get the time in the UTC timezone, and then convert the time to Hong Kong timezone.
import datetime
import pytz
currentTime = datetime.datetime.now( pytz.timezone('UTC') )
print("Time 1 (UTC time):", currentTime.strftime("%Y-%m-%d %H:%M:%S %Z%z"))
now_local = currentTime.astimezone(pytz.timezone('Asia/Hong_Kong'))
print("Time 2a(HK time) :", now_local.strftime("%Y-%m-%d %H:%M:%S %Z%z"))
print("Time 2b(HK time) :", now_local.strftime("%Y-%m-%d %H:%M:%S "))

Please note that in the "Time 2a" output, you can see the Hong Kong time (2am when this example was run) with the timezone indicator at the end showing +0800, that is, 8 hours ahead of UTC. The final "Time 2b" is the same time without the timezone included.
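If you would rather not install pytz, a rough equivalent can be sketched with the standard library alone. The fixed +08:00 offset and the name "HKT" below are assumptions made for this illustration; a fixed offset does not handle daylight saving rules, so it only suits zones without them. A fixed moment is used instead of now() so the result is reproducible.

```python
from datetime import datetime, timedelta, timezone

# Fixed +08:00 offset as a stand-in for Hong Kong time (assumption: no DST handling)
HK_OFFSET = timezone(timedelta(hours=8), name="HKT")

# A fixed moment so the result is reproducible: 6pm UTC on 1 January 2024
utc_moment = datetime(2024, 1, 1, 18, 0, 0, tzinfo=timezone.utc)
hk_moment = utc_moment.astimezone(HK_OFFSET)

utc_text = utc_moment.strftime("%Y-%m-%d %H:%M:%S %Z%z")
hk_text = hk_moment.strftime("%Y-%m-%d %H:%M:%S %Z%z")

print("UTC time:", utc_text)
print("HK time :", hk_text)
```

Note how 6pm UTC becomes 2am on the following day once the 8 hour offset is applied.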
Finally, you can get a list of all the timezones available with a quick check on the pytz module and checking “all_timezones”.
import pytz
for tz in pytz.all_timezones:
    print(tz)

How to print an exception
Things will go wrong in your code all the time, especially things that you don't expect. This is where try/except blocks fit in quite nicely. The tricky part is that you need to make sure you output what the exception is in order to understand what's going on.
First, a quick example of where a try/except can be helpful. Suppose you had the following code where, after the definition of the function, the function was called.
def badFunction():
    print(a)  # print an undefined variable
badFunction()  # call the function
print("have a nice day")

Here, as the variable "a" was not defined, the program terminated and the final line "have a nice day" was never printed.
This is where try/except blocks come in: they let you catch errors from uncertain actions. So you can wrap the call to "badFunction" in a try block. See the following example:
def badFunction():
    print(a)
#Try the unsafe code
try:
    badFunction()
except NameError:
    print("Variable x is not defined")
except:
    print("Something else went wrong")
print("have a nice day")

Here, the program continued to run gracefully and it caught the exception with the error “Variable x is not defined”. The reason it was caught was due to “NameError” exception object being defined.
In this next example, we have put a different error. Now the variable is defined as a number but there will be an exception as the number will be concatenated to a string.
def badFunction():
    a = 1
    print(a + ' join str') #this will fail as joining a string with a number
#try the unsafe code
try:
    badFunction()
except NameError:
    print("Variable x is not defined")
except:
    print("Something else went wrong")
print("have a nice day")

Here another exception was caught, but with the generic message "Something else went wrong". This is where printing the actual exception is really important: you can bind the exception object to a name and print out the error.
def badFunction():
    a = 1
    print(a + ' join str')
try:
    badFunction()
except NameError:
    print("Variable x is not defined")
except Exception as e:
    print(e)  # print the exception object
print("have a nice day")

Here you can see that the reason for the failure was included, and the program continued to run.
There’s a final improvement we can make which is to include where the problem occurred. This is really important where you have logging defined and you can see where the issue was caused.
import traceback
def badFunction():
    a = 1
    print(a + ' join str')
#run the unsafe code
try:
    badFunction()
except NameError:
    print("Variable x is not defined")
except Exception as e:
    print(e)
    traceback.print_tb(e.__traceback__) #show the call list
print("have a nice day")

Here you can see the error description "unsupported operand type(s) for +", and also where the error occurred: the initial call to "badFunction()" inside the try block, and the actual offending line inside the function.
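When you want the same information as text, for example to write it to a log file, one option is traceback.format_exc(), which returns the traceback as a string instead of printing it. The sketch below is a variation on the example above; the function name bad_function and the summary format are choices made here for illustration.

```python
import traceback

def bad_function():
    a = 1
    return a + ' join str'  # raises TypeError: cannot add int and str

try:
    bad_function()
except Exception as e:
    # Short one-line summary: exception class name plus its message
    summary = f"{type(e).__name__}: {e}"
    # Full traceback of the exception being handled, as a single string
    details = traceback.format_exc()
    print(summary)
    print(details)

print("have a nice day")
```

Having the traceback in a variable means you can pass it to whatever logging setup you already use.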
Many more ways to print in Python
There are many more ways to print output within Python. This was intended to be a simple resource for some of the common printing challenges that come up, with simple examples to get you up to speed very quickly with usable code. More to come!
How To Use Python Joblib for Parallel Computing and Caching
Intermediate
You have a data processing loop that runs one item at a time — checking each file, scoring each user, training each model configuration. Your machine has eight cores and only one of them is working. The loop that takes twenty minutes could finish in three if you could just split the work across all available processors.
Joblib is a Python library that makes parallel computing and result caching easy to add to existing code. Its Parallel and delayed utilities turn a regular Python loop into a parallel job with one wrapper. Its Memory class caches function results to disk so that the second call with the same arguments returns instantly. Install it with pip install joblib. Scikit-learn uses Joblib internally for its own parallelism, so if you have scikit-learn installed, Joblib is already there.
This article covers parallelising loops with Parallel and delayed, choosing the right backend (loky, threading, multiprocessing), caching expensive computations with Memory, integrating with scikit-learn pipelines, and diagnosing performance with verbosity settings. By the end you will have both parallel execution and disk caching working in a realistic data pipeline.
Joblib Parallel: Quick Example
The quickest way to see Joblib’s effect is to replace a for loop with a Parallel call. The structure is almost identical — the main change is wrapping the function call with delayed().
# quick_joblib.py
import time
from joblib import Parallel, delayed
def slow_square(n: int) -> int:
    """Simulate a slow computation."""
    time.sleep(0.5)
    return n * n
numbers = list(range(8))
# Sequential -- takes 8 * 0.5 = 4 seconds
start = time.perf_counter()
sequential = [slow_square(n) for n in numbers]
seq_time = time.perf_counter() - start
print(f"Sequential: {sequential} in {seq_time:.2f}s")
# Parallel -- uses all available CPU cores
start = time.perf_counter()
parallel = Parallel(n_jobs=-1)(delayed(slow_square)(n) for n in numbers)
par_time = time.perf_counter() - start
print(f"Parallel: {parallel} in {par_time:.2f}s")
print(f"Speedup: {seq_time / par_time:.1f}x")
Output (on an 8-core machine):
Sequential: [0, 1, 4, 9, 16, 25, 36, 49] in 4.01s
Parallel: [0, 1, 4, 9, 16, 25, 36, 49] in 0.56s
Speedup: 7.2x
The n_jobs=-1 argument tells Joblib to use all available CPU cores. n_jobs=4 would use exactly four. The delayed(func)(args) pattern creates a lazy description of the function call without executing it — Joblib collects these descriptions and distributes them across workers. The return values are collected in the same order as the input, so parallel[3] is always the result of slow_square(3) regardless of which worker finished first.
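To make the idea of a lazy description concrete, here is a toy illustration that uses only the standard library. It is not Joblib's code: lazy_call and run_all are hypothetical names invented for this sketch, and a thread pool stands in for Joblib's workers.

```python
from concurrent.futures import ThreadPoolExecutor

def lazy_call(func):
    """Hypothetical stand-in for the idea behind delayed(): capture, don't run."""
    def capture(*args, **kwargs):
        return (func, args, kwargs)
    return capture

def run_all(tasks, max_workers=4):
    """Run captured calls on a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, *args, **kwargs) for func, args, kwargs in tasks]
        # Reading the futures in submission order keeps results aligned with inputs
        return [future.result() for future in futures]

def square(n):
    return n * n

# Nothing is computed here: each item is just (function, args, kwargs)
tasks = [lazy_call(square)(n) for n in range(8)]
results = run_all(tasks)
print(results)
```

Separating the description of a call from its execution is what lets a scheduler decide where and when each call runs while still returning results in the original order.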
What Is Joblib and When Should You Use It?
Joblib provides two things: easy parallelism through a process pool, and persistent disk caching of function results. These two features are independent — you can use either without the other. The parallelism is built on top of the loky process pool by default (a robust reimplementation of multiprocessing.Pool) with fallback to Python’s threading or the original multiprocessing pool.
| Tool | Best for | Overhead |
|---|---|---|
| Joblib Parallel (loky) | CPU-bound tasks, data processing | ~100ms startup |
| Joblib Parallel (threading) | IO-bound tasks, numpy releases GIL | ~5ms startup |
| concurrent.futures | Simple async IO, process pools | ~50ms startup |
| multiprocessing.Pool | CPU-bound, full control needed | ~100ms startup |
| asyncio | High-concurrency network IO | Near zero |
Joblib excels when your loop body is CPU-bound (model training, file parsing, image processing) and each iteration takes at least a few milliseconds — enough to justify the inter-process communication cost. For very fast operations (microsecond loops), parallelism overhead outweighs the benefit. The caching feature is valuable for any function with expensive deterministic computations: feature extraction, data loading, hyperparameter search.
Choosing the Right Backend
Joblib supports three execution backends, each suited to different workloads. Understanding when to use each prevents a common trap: the default process-based backend actually slows down IO-bound work because of serialisation overhead.
# backends.py
import time
import numpy as np
from joblib import Parallel, delayed
def cpu_task(size: int) -> float:
    """CPU-bound: pure Python computation."""
    data = list(range(size))
    return sum(x * x for x in data) / len(data)
def numpy_task(size: int) -> float:
    """Numpy releases the GIL -- threading backend works well here."""
    arr = np.random.rand(size)
    return float(np.sqrt(np.sum(arr ** 2)))
items = [100_000] * 8
# Default loky backend (separate processes, best for pure Python CPU work)
start = time.perf_counter()
results_loky = Parallel(n_jobs=4, backend="loky")(
    delayed(cpu_task)(n) for n in items
)
print(f"loky (CPU work): {time.perf_counter() - start:.2f}s")
# Threading backend (shares memory, good when GIL is released by C extensions)
start = time.perf_counter()
results_thread = Parallel(n_jobs=4, backend="threading")(
    delayed(numpy_task)(n) for n in items
)
print(f"threading (NumPy): {time.perf_counter() - start:.2f}s")
# Sequential for comparison
start = time.perf_counter()
results_seq = [numpy_task(n) for n in items]
print(f"sequential: {time.perf_counter() - start:.2f}s")
Output:
loky (CPU work): 0.48s
threading (NumPy): 0.31s
sequential: 1.12s
The loky backend spawns separate Python processes, each with their own memory space and GIL. This is the right choice for pure Python CPU work because it truly runs in parallel. The threading backend runs in threads within the same process. Because Python’s GIL prevents true parallel execution of pure Python code, threading only helps when the task calls into a C extension that releases the GIL — like NumPy, Pandas, or scikit-learn. The multiprocessing backend is the original process pool; prefer loky unless you have a specific compatibility reason to use it.
Caching Expensive Results with Memory
Joblib’s Memory class caches a function’s return value to disk, keyed by the function’s source code and its arguments. The second call with the same arguments reads from the cache instead of recomputing. This is useful for data loading, feature extraction, or any expensive deterministic step that you run repeatedly during development.
# caching.py
import time
import numpy as np
from joblib import Memory
# Create a cache directory
cache = Memory("./joblib_cache", verbose=1)
@cache.cache
def load_and_process(filepath: str, scale: float = 1.0) -> np.ndarray:
    """Simulate expensive data loading and processing."""
    print(f" [COMPUTING] Loading {filepath} with scale={scale}")
    time.sleep(2) # Simulate a 2-second load
    data = np.random.rand(1000) * scale
    return data
print("First call (cold cache):")
start = time.perf_counter()
result1 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result1.mean():.4f}")
print("\nSecond call (cache hit):")
start = time.perf_counter()
result2 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.4f}s, mean={result2.mean():.4f}")
print("\nDifferent args (cache miss):")
start = time.perf_counter()
result3 = load_and_process("data/features.npy", scale=3.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result3.mean():.4f}")
Output:
First call (cold cache):
[COMPUTING] Loading data/features.npy with scale=2.0
Took: 2.01s, mean=0.9987
Second call (cache hit):
Took: 0.0031s, mean=0.9987
Different args (cache miss):
[COMPUTING] Loading data/features.npy with scale=3.0
Took: 2.01s, mean=1.4991
The cache is stored as pickle files in the directory you specify (pass compress=True to Memory to compress them). It is keyed on the function's source code hash and all arguments, so if you change the function body, Joblib invalidates the cache automatically on the next call. To clear the cache manually, call cache.clear() or delete the cache directory. The verbose=1 argument makes Joblib print whether it computed or loaded from cache; set it to 0 to silence this output in production.
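The keying idea can be sketched with the standard library alone. The decorator below is a simplified illustration, not Joblib's implementation: disk_cached, CACHE_DIR and call_log are hypothetical names, and the function's compiled bytecode is used here as a stand-in for a hash of its source code.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path

CACHE_DIR = Path(tempfile.mkdtemp())  # throwaway cache folder for the demo
call_log = []                         # records which calls really computed

def disk_cached(func):
    """Cache results on disk, keyed on the function's code and its arguments."""
    code_bytes = func.__code__.co_code  # stand-in for a hash of the source
    def wrapper(*args, **kwargs):
        key_material = pickle.dumps((code_bytes, args, sorted(kwargs.items())))
        key = hashlib.sha256(key_material).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():               # cache hit: load instead of recomputing
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)  # cache miss: compute and store
        path.write_bytes(pickle.dumps(result))
        return result
    return wrapper

@disk_cached
def slow_double(n):
    call_log.append(n)
    return n * 2

first = slow_double(21)   # computed
second = slow_double(21)  # loaded from disk
third = slow_double(5)    # different argument, computed
print(first, second, third, call_log)
```

Because the key includes the function's code, editing the function body produces a different key, which is the same effect as the automatic invalidation described above.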
Joblib with scikit-learn Pipelines
Scikit-learn uses Joblib internally for all its n_jobs parameters — cross-validation, grid search, random forests, and more all use the same Joblib infrastructure. You can control the backend and number of jobs globally using Joblib’s parallel_backend context manager, or pass n_jobs directly to estimators.
# sklearn_parallel.py
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
from joblib import parallel_backend
# Generate a sample dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
# Train a random forest using all CPU cores
print("Training RandomForest with n_jobs=-1...")
start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(f" Fit time: {time.perf_counter() - start:.2f}s")
# Cross-validation in parallel
start = time.perf_counter()
scores = cross_val_score(rf, X, y, cv=5, n_jobs=-1, scoring="accuracy")
print(f" CV scores: {scores.round(3)}, mean={scores.mean():.3f}, time={time.perf_counter() - start:.2f}s")
# Hyperparameter search -- each combo evaluated in parallel
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [5, 10, None],
}
start = time.perf_counter()
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=3,
    n_jobs=-1,
    verbose=0,
)
grid.fit(X, y)
elapsed = time.perf_counter() - start
print(f" Best params: {grid.best_params_}, score={grid.best_score_:.3f}, time={elapsed:.2f}s")
Output:
Training RandomForest with n_jobs=-1...
Fit time: 0.31s
CV scores: [0.934 0.931 0.929 0.927 0.932], mean=0.931, time=0.48s
Best params: {'max_depth': None, 'n_estimators': 100}, score=0.931, time=1.24s
The n_jobs=-1 parameter on scikit-learn estimators and model-selection utilities goes directly to Joblib. Setting it uses all available cores for that operation. For nested parallelism (a parallel grid search that itself trains parallel random forests), Joblib automatically avoids over-subscribing the CPU — the inner jobs run sequentially when the outer jobs already fill all cores.
Real-Life Example: Parallel Feature Extraction Pipeline
The following pipeline processes a directory of text files, extracts word-frequency features from each, and caches the results. Combining Parallel with Memory gives you both speed and resilience — if the pipeline is interrupted, the cached results mean you do not repeat work already done.
# feature_pipeline.py
import os
import time
import re
from collections import Counter
from pathlib import Path
from joblib import Parallel, delayed, Memory
cache = Memory("./feature_cache", verbose=0)
# --- Create sample text files ---
SAMPLE_DIR = Path("sample_texts")
SAMPLE_DIR.mkdir(exist_ok=True)
sample_texts = {
"python.txt": "Python is a high-level programming language. Python emphasises readability.",
"data.txt": "Data science uses statistics and programming. Data analysis reveals patterns.",
"web.txt": "Web development creates websites and applications. The web uses HTML CSS JavaScript.",
"ai.txt": "Artificial intelligence mimics human thinking. Machine learning trains models on data.",
"cloud.txt": "Cloud computing provides on-demand resources. Cloud services scale automatically.",
}
for fname, text in sample_texts.items():
    (SAMPLE_DIR / fname).write_text(text * 50) # Make files large enough to matter
@cache.cache
def extract_features(filepath: str) -> dict:
    """Extract word frequency features from a text file (cached)."""
    text = Path(filepath).read_text().lower()
    words = re.findall(r'\b[a-z]{3,}\b', text)
    top_words = dict(Counter(words).most_common(10))
    time.sleep(0.3) # Simulate expensive NLP processing
    return {"file": Path(filepath).name, "word_count": len(words), "top_words": top_words}
def run_pipeline(data_dir: Path) -> list[dict]:
    files = [str(f) for f in data_dir.glob("*.txt")]
    print(f"Processing {len(files)} files in parallel...")
    start = time.perf_counter()
    results = Parallel(n_jobs=-1, verbose=10)(
        delayed(extract_features)(f) for f in files
    )
    elapsed = time.perf_counter() - start
    print(f"Done in {elapsed:.2f}s")
    return results
features = run_pipeline(SAMPLE_DIR)
for feat in features:
    top3 = list(feat["top_words"].keys())[:3]
    print(f" {feat['file']:15s} words={feat['word_count']:,} top={top3}")
Output (first run — cold cache):
Processing 5 files in parallel...
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.4s finished
Done in 0.41s
python.txt words=350 top=['python', 'language', 'high']
data.txt words=350 top=['data', 'science', 'analysis']
web.txt words=350 top=['web', 'development', 'html']
ai.txt words=350 top=['learning', 'machine', 'data']
cloud.txt words=350 top=['cloud', 'computing', 'services']
If the cached function reads from a file or a database, the cache becomes stale when that data changes. You are responsible for clearing the cache when upstream data changes, either by calling clear() on the Memory object, by passing a version argument to the function, or by using a time-based expiry implemented in the function body.
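The version-argument idea can be sketched as follows. To keep the example dependency-free, functools.lru_cache stands in for a disk cache here; the assumption is that the same trick (passing the file's modification time as an extra argument) carries over to any cache keyed on arguments. The names _read_cached and read_fresh are hypothetical.

```python
import functools
import os
import tempfile

@functools.lru_cache(maxsize=None)
def _read_cached(path: str, mtime_ns: int) -> str:
    """Cached on (path, mtime_ns); a new mtime means a new cache entry."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()

def read_fresh(path: str) -> str:
    """Hypothetical wrapper: passes the file's modification time as a 'version'."""
    return _read_cached(path, os.stat(path).st_mtime_ns)

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("version 1")
    first = read_fresh(path)

    with open(path, "w", encoding="utf-8") as handle:
        handle.write("version 2")
    # Force a clearly different timestamp so the demo does not depend on clock resolution
    os.utime(path, ns=(1_000_000_000, 1_000_000_000))
    second = read_fresh(path)

print(first, "->", second)
```

The second call sees a different modification time, so the cached entry for the old file content is simply never looked up again.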
How do I track progress in a long Parallel job?
Set verbose=10 in Parallel() to print frequent status lines as jobs complete, including elapsed time and estimated remaining time; values above 10 report every iteration. For a progress bar, use the tqdm library: wrap the generator with tqdm(delayed(func)(x) for x in items, total=len(items)). Joblib will pull items from the tqdm-wrapped iterator and tqdm updates the bar as items are consumed.
Are there memory issues with Joblib on long-running jobs?
When using the loky backend, worker processes are reused across calls, so memory held by workers can accumulate over many batches. The max_nbytes argument of Parallel() controls memory mapping of large NumPy arrays passed to the workers as inputs: with max_nbytes="10M", arrays above 10 MB are memory-mapped instead of being pickled for each worker, and max_nbytes=None disables this. To release memory held by reused workers, you can shut down the reusable executor periodically, for example with get_reusable_executor().shutdown(wait=True) from joblib.externals.loky; a fresh pool is then started on the next Parallel call.
Conclusion
Joblib makes two of the most common performance problems in data pipelines trivially easy to solve: parallelising embarrassingly parallel loops with Parallel and delayed, and caching expensive deterministic computations with Memory. You have seen how to replace a for loop with a parallel equivalent in four lines, choose the right backend for CPU-bound versus IO-bound work, cache results to disk, and integrate both patterns with scikit-learn.
The natural extension of the feature extraction pipeline is to add a cache validation step that checks file modification timestamps, and to feed the extracted features directly into a scikit-learn pipeline with n_jobs=-1 cross-validation -- so both the feature extraction and the model evaluation run in parallel with full caching.
For the full Joblib reference including memory-mapped arrays, batch processing, and custom backends, see the official Joblib documentation.
Further Reading: For more details, see the Python print() function documentation.
Frequently Asked Questions
What does \n do in Python print statements?
The \n escape sequence creates a newline character, causing text after it to appear on the next line. For example, print('Hello\nWorld') outputs ‘Hello’ and ‘World’ on separate lines.
How do I print multiple lines without using \n?
You can use triple-quoted strings (''' or """) to write multi-line text directly, or call print() multiple times. The textwrap.dedent() function also helps format multi-line strings cleanly.
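A short sketch comparing these options is shown below. The backslash after the opening triple quote in the dedent example is a small trick to avoid a leading blank line; the variable names are only for illustration.

```python
import textwrap

# Escape sequence inside an ordinary string
with_escape = "Hello\nWorld"

# Triple-quoted string: the line break in the source becomes a newline
triple_quoted = """Hello
World"""

# dedent strips the common leading whitespace, so the text can be indented in code
dedented = textwrap.dedent("""\
    Hello
    World""")

print(with_escape == triple_quoted == dedented)
print(dedented)
```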
What is a format exception in Python?
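Python has no dedicated "format exception" class; formatting problems surface as built-in exceptions when a format string and its arguments do not match. For example, str.format() raises IndexError or KeyError when a placeholder has no matching argument and ValueError when a format code does not suit the value's type, while the % operator raises TypeError for mismatched types.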
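The exact exception class depends on what went wrong. The sketch below triggers three common mismatches and records which built-in exception each one raises; the dictionary keys are just labels chosen for this illustration.

```python
errors = {}

try:
    "{} {}".format("only one")      # two placeholders, one argument
except IndexError as error:
    errors["missing argument"] = type(error).__name__

try:
    "{:d}".format("text")           # integer format code applied to a string
except ValueError as error:
    errors["bad format code"] = type(error).__name__

try:
    "%d" % "text"                   # %-formatting with the wrong type
except TypeError as error:
    errors["percent type mismatch"] = type(error).__name__

for label, name in errors.items():
    print(f"{label}: {name}")
```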
How do I use f-strings for text formatting in Python?
F-strings (formatted string literals) use the syntax f'text {variable}' and were introduced in Python 3.6. They allow you to embed expressions directly inside string literals for readable, efficient formatting.
What is the difference between print() and sys.stdout.write()?
print() adds a newline by default and accepts multiple arguments with separators. sys.stdout.write() writes raw text without any automatic newline, giving you more control over output formatting.
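The difference is easy to see by writing to an in-memory text stream. In this sketch io.StringIO is used as a stand-in for sys.stdout (an assumption made so the output can be inspected); its write() method behaves like the one on sys.stdout for this purpose.

```python
import io

buffer = io.StringIO()  # stand-in for sys.stdout

# print joins the arguments with sep and appends a newline
print("a", "b", sep="-", file=buffer)

# write takes exactly one string, adds nothing, and returns the character count
written = buffer.write("a-b")
buffer.write("\n")  # the newline has to be written explicitly

output = buffer.getvalue()
print(repr(output), written)
```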