Generating random numbers in Python is a fairly straightforward activity which can be done in a few lines. There maybe many variations which you need to do ranging from decimal places, random numbers between a start and end number, and many more. We’ll go through many useful examples in this article.
The most basic way to generate random numbers in python is with the random library:
import random
num = random.random()
print( f"Random number between 0.0 and 1.0 ={num}\n")
Output as follows:

You’ll see that each time it is run it has a new random number.
Generating the same random number each time and why this matters
Sometimes, you may want to generate some random numbers, but then be able to generate the same random numbers each time. Now this may sound counter intuitive as the whole point of getting random numbers is so that, well, they are random. One scenario where you would like to regenerate the same random numbers is during testing. You may find some unusual behaviour and this is where you may want to replicate that behaviour for which you’l l need the same input. This is where you’d want to generate the same random number and you can do that in python using the seed function from the random library.
The idea behind the seed function is that you can think of it as a specific key which can be used to generate a series of random numbers which stems from a given key. Use a different seed and you’ll generate a different set of random numbers.
See the following example code which generates a random number between 1 and 0:
import random
random.seed(1)
for i in range(1,5):
num = random.random()
print( f"Random number between 0.0 and 1.0 ={num}\n")
Output as follows:

No matter how many times it is run, since the seed is the same each time, it generates the same numbers.
Python Random Number Between 1 and 10
Now that we know how to generate random numbers, how do you do it between two numbers? This is easily done in with either randint() for whole numbers or with uniform() for decimal numbers.
import random
num_int = random.randint(1,10)
print( f"Random whole number between 1 and 10 ={num_int}\n")
num_uni = random.uniform(1,10)
print( f"Random decimal number between 1 and 10 ={num_uni}\n")

Python Generate Random Numbers From A Range
Suppose you needed to generate random numbers from a range of data whether that be numbers, names or even a pack of cards. This can be done through selecting the random element in an array by choosing the index randomly. For example, if you had an array of 5 items, then you can randomly chose and index from 0 to 4 (where 0 is the index of the first item).
There is another and shorter way in python which is to use the random.choice() function. If you pass it an array, it will then randomly return one of the elements.
Here’s an example to randomly select a name from a list with both using the index (to show you how it works), and the much most efficient random.choice() library function:
import random
###### Selecing numbers from a range
names_list = [ "Judy", "Harry", "Sarah", "Tom", "Gloria"]
rand_index = random.randint( 0, len(names_list)-1 )
print( f"Randomly selected person 1 is = { names_list[ rand_index] }\n")
print( f"Randomly selected person 2 is = { random.choice( names_list) }\n")
And the output is different each time:

Generate Random String Of Length n in Python
If you want to generate a specific length string (e.g. to generate a password), both the random and the string libraries can come in handy where you can use it to create an easy password generator as follows:
import random, string
###### Create a random password
def generate_password( pass_len=10):
password = ""
for i in range(1,pass_len+1):
password = password + random.choice( string.ascii_letters + string.punctuation )
return password
print( f"Password generated = [{ generate_password(10) }] ")
This will output a new password each time between square brackets:

If there are specific characters you want to include or exclude, you can simply replace the string.punctuation with your own list/array of specific characters to be included
Random Choice Without Replacement In Python
Suppose you wanted to randomly select items from a list without repeating any items. For example, you have a list of students and you have to select them in a random order to go first in a specific activity. In many programming languages you may need to generate a random list and remember the previously selected items to prevent any repeated selections. In the random library, there is a function called random.sample() that will do all that for you:
import random
#### Select unique random elements
students = ["John", "Tom", "Paul", "Sarah", "July", "Rachel"]
random_order = random.sample( students, 6)
print(random_order)
This will generate a unique list without repeating any selections:

Sign up to the email list and get articles straight to your inbox. Plus get our free python one liner list!
” list=”237850″ redirect=”https://pythonhowtoprogram.com/thank-you-for-subscribing/” check_last_name=”off” layout=”top_bottom” first_name_fullwidth=”off” email_fullwidth=”off” _builder_version=”4.17.4″ _module_preset=”default” body_font=”|700|||||||” body_line_height=”1em” result_message_font=”|700|||||||” body_ul_line_height=”0.1em” custom_button=”on” button_bg_color=”#0C71C3″ button_border_color=”#FFFFFF” button_border_radius=”20px” button_letter_spacing=”0px” button_font=”|800|||||||” button_use_icon=”off” button_custom_margin=”0px||||false|false” button_custom_padding=”1px|1px|1px|1px|false|false” text_orientation=”center” background_layout=”light” custom_padding=”20px|30px|20px|30px|false|false” hover_enabled=”0″ border_radii=”on|3px|3px|3px|3px” box_shadow_style_button=”preset2″ box_shadow_vertical_button=”2px” global_colors_info=”{}” sticky_enabled=”0″][/mfe_send_fox]
Generate Date Between Two Dates in Python
In order to generate a date between two dates, this can be done by converting the dates into days first. This can be combined with the random.randint() in addition to the days of the date differences then adding back to the start date:
import random, datetime
#### Select a random date between two dates:
d1 = datetime.date( 2013, 2, 26 )
d2 = datetime.date( 2015, 12, 15 )
diff = d2 - d1
new_date_days = random.randint( 0, diff.days )
print( f"Random date is { d1 + datetime.timedelta( days=new_date_days ) }")
The output would be as follows:

Generate Random Temporary Filename in Python
A common need is to generate a random filename often for temporary storage. This might be for a log file, a cache file or some other scenario and can be easily done with the similar string generation as above. First a letter should be determined and then the remaining letters can be added with also numbers as well.
import random, string
def generate_random_filename( filename_len=10):
filename = ""
filename = filename + random.choice( string.ascii_lowercase )
for i in range(2, filename_len+1):
filename = filename + random.choice( string.ascii_lowercase + string.digits )
return filename
print( f"Random filename = [{ generate_random_filename( 10) }.txt]")
Output as follows:

There is in fact a specific python library though that does this which is even simpler:
import tempfile
filename = tempfile.NamedTemporaryFile( prefix="temp_" , suffix =".txt" )
print( f" Temporary filename is [{ filename.name }] ")
Output of the temporary filename generator is:

Conclusion
The random library has many uses from generating numbers to specific strings with a given length for password generation. Typically, these use cases sometimes have specialised libraries as there can be nuances (e.g for passwords, you may not want a repeating sequence which may be possible through random luck) which you can search for through pypi.org. However, many can be created with simple lines of code as demonstrated above. Send comments below or email me to ask further questions.
Subscribe
Not subscribed to our email list? Sign up now and get your next article in your inbox:
Related Articles
How To Use Python Joblib for Parallel Computing and Caching
Intermediate
You have a data processing loop that runs one item at a time — checking each file, scoring each user, training each model configuration. Your machine has eight cores and only one of them is working. The loop that takes twenty minutes could finish in three if you could just split the work across all available processors.
Joblib is a Python library that makes parallel computing and result caching easy to add to existing code. Its Parallel and delayed utilities turn a regular Python loop into a parallel job with one wrapper. Its Memory class caches function results to disk so that the second call with the same arguments returns instantly. Install it with pip install joblib. Scikit-learn uses Joblib internally for its own parallelism, so if you have scikit-learn installed, Joblib is already there.
This article covers parallelising loops with Parallel and delayed, choosing the right backend (loky, threading, multiprocessing), caching expensive computations with Memory, integrating with scikit-learn pipelines, and diagnosing performance with verbosity settings. By the end you will have both parallel execution and disk caching working in a realistic data pipeline.
Joblib Parallel: Quick Example
The quickest way to see Joblib’s effect is to replace a for loop with a Parallel call. The structure is almost identical — the main change is wrapping the function call with delayed().
# quick_joblib.py
import time
from joblib import Parallel, delayed
def slow_square(n: int) -> int:
"""Simulate a slow computation."""
time.sleep(0.5)
return n * n
numbers = list(range(8))
# Sequential -- takes 8 * 0.5 = 4 seconds
start = time.perf_counter()
sequential = [slow_square(n) for n in numbers]
seq_time = time.perf_counter() - start
print(f"Sequential: {sequential} in {seq_time:.2f}s")
# Parallel -- uses all available CPU cores
start = time.perf_counter()
parallel = Parallel(n_jobs=-1)(delayed(slow_square)(n) for n in numbers)
par_time = time.perf_counter() - start
print(f"Parallel: {parallel} in {par_time:.2f}s")
print(f"Speedup: {seq_time / par_time:.1f}x")
Output (on an 8-core machine):
Sequential: [0, 1, 4, 9, 16, 25, 36, 49] in 4.01s
Parallel: [0, 1, 4, 9, 16, 25, 36, 49] in 0.56s
Speedup: 7.2x
The n_jobs=-1 argument tells Joblib to use all available CPU cores. n_jobs=4 would use exactly four. The delayed(func)(args) pattern creates a lazy description of the function call without executing it — Joblib collects these descriptions and distributes them across workers. The return values are collected in the same order as the input, so parallel[3] is always the result of slow_square(3) regardless of which worker finished first.
What Is Joblib and When Should You Use It?
Joblib provides two things: easy parallelism through a process pool, and persistent disk caching of function results. These two features are independent — you can use either without the other. The parallelism is built on top of the loky process pool by default (a robust reimplementation of multiprocessing.Pool) with fallback to Python’s threading or the original multiprocessing pool.
| Tool | Best for | Overhead |
|---|---|---|
| Joblib Parallel (loky) | CPU-bound tasks, data processing | ~100ms startup |
| Joblib Parallel (threading) | IO-bound tasks, numpy releases GIL | ~5ms startup |
| concurrent.futures | Simple async IO, process pools | ~50ms startup |
| multiprocessing.Pool | CPU-bound, full control needed | ~100ms startup |
| asyncio | High-concurrency network IO | Near zero |
Joblib excels when your loop body is CPU-bound (model training, file parsing, image processing) and each iteration takes at least a few milliseconds — enough to justify the inter-process communication cost. For very fast operations (microsecond loops), parallelism overhead outweighs the benefit. The caching feature is valuable for any function with expensive deterministic computations: feature extraction, data loading, hyperparameter search.
Choosing the Right Backend
Joblib supports three execution backends, each suited to different workloads. Understanding when to use each prevents a common trap: the default process-based backend actually slows down IO-bound work because of serialisation overhead.
# backends.py
import time
import numpy as np
from joblib import Parallel, delayed
def cpu_task(size: int) -> float:
"""CPU-bound: pure Python computation."""
data = list(range(size))
return sum(x * x for x in data) / len(data)
def numpy_task(size: int) -> float:
"""Numpy releases the GIL -- threading backend works well here."""
arr = np.random.rand(size)
return float(np.sqrt(np.sum(arr ** 2)))
items = [100_000] * 8
# Default loky backend (separate processes, best for pure Python CPU work)
start = time.perf_counter()
results_loky = Parallel(n_jobs=4, backend="loky")(
delayed(cpu_task)(n) for n in items
)
print(f"loky (CPU work): {time.perf_counter() - start:.2f}s")
# Threading backend (shares memory, good when GIL is released by C extensions)
start = time.perf_counter()
results_thread = Parallel(n_jobs=4, backend="threading")(
delayed(numpy_task)(n) for n in items
)
print(f"threading (NumPy): {time.perf_counter() - start:.2f}s")
# Sequential for comparison
start = time.perf_counter()
results_seq = [numpy_task(n) for n in items]
print(f"sequential: {time.perf_counter() - start:.2f}s")
Output:
loky (CPU work): 0.48s
threading (NumPy): 0.31s
sequential: 1.12s
The loky backend spawns separate Python processes, each with their own memory space and GIL. This is the right choice for pure Python CPU work because it truly runs in parallel. The threading backend runs in threads within the same process. Because Python’s GIL prevents true parallel execution of pure Python code, threading only helps when the task calls into a C extension that releases the GIL — like NumPy, Pandas, or scikit-learn. The multiprocessing backend is the original process pool; prefer loky unless you have a specific compatibility reason to use it.
Caching Expensive Results with Memory
Joblib’s Memory class caches a function’s return value to disk, keyed by the function’s source code and its arguments. The second call with the same arguments reads from the cache instead of recomputing. This is useful for data loading, feature extraction, or any expensive deterministic step that you run repeatedly during development.
# caching.py
import time
import numpy as np
from joblib import Memory
# Create a cache directory
cache = Memory("./joblib_cache", verbose=1)
@cache.cache
def load_and_process(filepath: str, scale: float = 1.0) -> np.ndarray:
"""Simulate expensive data loading and processing."""
print(f" [COMPUTING] Loading {filepath} with scale={scale}")
time.sleep(2) # Simulate a 2-second load
data = np.random.rand(1000) * scale
return data
print("First call (cold cache):")
start = time.perf_counter()
result1 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result1.mean():.4f}")
print("\nSecond call (cache hit):")
start = time.perf_counter()
result2 = load_and_process("data/features.npy", scale=2.0)
print(f" Took: {time.perf_counter() - start:.4f}s, mean={result2.mean():.4f}")
print("\nDifferent args (cache miss):")
start = time.perf_counter()
result3 = load_and_process("data/features.npy", scale=3.0)
print(f" Took: {time.perf_counter() - start:.2f}s, mean={result3.mean():.4f}")
Output:
First call (cold cache):
[COMPUTING] Loading data/features.npy with scale=2.0
Took: 2.01s, mean=0.9987
Second call (cache hit):
Took: 0.0031s, mean=0.9987
Different args (cache miss):
[COMPUTING] Loading data/features.npy with scale=3.0
Took: 2.01s, mean=1.4991
The cache is stored as compressed pickle files in the directory you specify. It is keyed on the function’s source code hash and all arguments — if you change the function body, Joblib invalidates the cache automatically on the next call. To clear the cache manually, call cache.clear() or delete the cache directory. The verbose=1 argument makes Joblib print whether it computed or loaded from cache; set it to 0 to silence this output in production.
Joblib with scikit-learn Pipelines
Scikit-learn uses Joblib internally for all its n_jobs parameters — cross-validation, grid search, random forests, and more all use the same Joblib infrastructure. You can control the backend and number of jobs globally using Joblib’s parallel_backend context manager, or pass n_jobs directly to estimators.
# sklearn_parallel.py
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, GridSearchCV
from joblib import parallel_backend
# Generate a sample dataset
X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
# Train a random forest using all CPU cores
print("Training RandomForest with n_jobs=-1...")
start = time.perf_counter()
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
rf.fit(X, y)
print(f" Fit time: {time.perf_counter() - start:.2f}s")
# Cross-validation in parallel
start = time.perf_counter()
scores = cross_val_score(rf, X, y, cv=5, n_jobs=-1, scoring="accuracy")
print(f" CV scores: {scores.round(3)}, mean={scores.mean():.3f}, time={time.perf_counter() - start:.2f}s")
# Hyperparameter search -- each combo evaluated in parallel
param_grid = {
"n_estimators": [50, 100],
"max_depth": [5, 10, None],
}
start = time.perf_counter()
grid = GridSearchCV(
RandomForestClassifier(random_state=42),
param_grid,
cv=3,
n_jobs=-1,
verbose=0,
)
grid.fit(X, y)
elapsed = time.perf_counter() - start
print(f" Best params: {grid.best_params_}, score={grid.best_score_:.3f}, time={elapsed:.2f}s")
Output:
Training RandomForest with n_jobs=-1...
Fit time: 0.31s
CV scores: [0.934 0.931 0.929 0.927 0.932], mean=0.931, time=0.48s
Best params: {'max_depth': None, 'n_estimators': 100}, score=0.931, time=1.24s
The n_jobs=-1 parameter on scikit-learn estimators and model-selection utilities goes directly to Joblib. Setting it uses all available cores for that operation. For nested parallelism (a parallel grid search that itself trains parallel random forests), Joblib automatically avoids over-subscribing the CPU — the inner jobs run sequentially when the outer jobs already fill all cores.
Real-Life Example: Parallel Feature Extraction Pipeline
The following pipeline processes a directory of text files, extracts word-frequency features from each, and caches the results. Combining Parallel with Memory gives you both speed and resilience — if the pipeline is interrupted, the cached results mean you do not repeat work already done.
# feature_pipeline.py
import os
import time
import re
from collections import Counter
from pathlib import Path
from joblib import Parallel, delayed, Memory
cache = Memory("./feature_cache", verbose=0)
# --- Create sample text files ---
SAMPLE_DIR = Path("sample_texts")
SAMPLE_DIR.mkdir(exist_ok=True)
sample_texts = {
"python.txt": "Python is a high-level programming language. Python emphasises readability.",
"data.txt": "Data science uses statistics and programming. Data analysis reveals patterns.",
"web.txt": "Web development creates websites and applications. The web uses HTML CSS JavaScript.",
"ai.txt": "Artificial intelligence mimics human thinking. Machine learning trains models on data.",
"cloud.txt": "Cloud computing provides on-demand resources. Cloud services scale automatically.",
}
for fname, text in sample_texts.items():
(SAMPLE_DIR / fname).write_text(text * 50) # Make files large enough to matter
@cache.cache
def extract_features(filepath: str) -> dict:
"""Extract word frequency features from a text file (cached)."""
text = Path(filepath).read_text().lower()
words = re.findall(r'\b[a-z]{3,}\b', text)
top_words = dict(Counter(words).most_common(10))
time.sleep(0.3) # Simulate expensive NLP processing
return {"file": Path(filepath).name, "word_count": len(words), "top_words": top_words}
def run_pipeline(data_dir: Path) -> list[dict]:
files = [str(f) for f in data_dir.glob("*.txt")]
print(f"Processing {len(files)} files in parallel...")
start = time.perf_counter()
results = Parallel(n_jobs=-1, verbose=10)(
delayed(extract_features)(f) for f in files
)
elapsed = time.perf_counter() - start
print(f"Done in {elapsed:.2f}s")
return results
features = run_pipeline(SAMPLE_DIR)
for feat in features:
top3 = list(feat["top_words"].keys())[:3]
print(f" {feat['file']:15s} words={feat['word_count']:,} top={top3}")
Output (first run — cold cache):
Processing 5 files in parallel...
[Parallel(n_jobs=-1)]: Done 5 out of 5 | elapsed: 0.4s finished
Done in 0.41s
python.txt words=350 top=['python', 'language', 'high']
data.txt words=350 top=['data', 'science', 'analysis']
web.txt words=350 top=['web', 'development', 'html']
ai.txt words=350 top=['learning', 'machine', 'data']
cloud.txt words=350 top=['cloud', 'computing', 'services']
s from a file or a database, the cache becomes stale when that data changes. You are responsible for clearing the cache when upstream data changes, either by calling memory.clear(), by passing a version argument to the function, or by using a time-based expiry implemented in the function body.
How do I track progress in a long Parallel job?
Set verbose=10 (the maximum) in Parallel() to print a status line after each completed job, including elapsed time, estimated remaining time, and memory usage. For a progress bar, use the tqdm library: wrap the generator with tqdm(delayed(func)(x) for x in items, total=len(items)) -- Joblib will pull items from the tqdm-wrapped iterator and tqdm updates the bar as items are consumed.
Are there memory issues with Joblib on long-running jobs?
When using the loky backend with large return values, worker memory can accumulate if workers are reused across many batches. Set max_nbytes="10M" in Parallel() to use memory-mapped files for return values above 10 MB instead of pickle serialisation. To prevent worker memory from growing across restarts, set Parallel(n_jobs=4, max_nbytes=None) combined with periodic worker recycling using loky.get_reusable_executor(max_workers=4, reuse="kill_workers").
Conclusion
Joblib makes two of the most common performance problems in data pipelines trivially easy to solve: parallelising embarrassingly parallel loops with Parallel and delayed, and caching expensive deterministic computations with Memory. You have seen how to replace a for loop with a parallel equivalent in four lines, choose the right backend for CPU-bound versus IO-bound work, cache results to disk, and integrate both patterns with scikit-learn.
The natural extension of the feature extraction pipeline is to add a cache validation step that checks file modification timestamps, and to feed the extracted features directly into a scikit-learn pipeline with n_jobs=-1 cross-validation -- so both the feature extraction and the model evaluation run in parallel with full caching.
For the full Joblib reference including memory-mapped arrays, batch processing, and custom backends, see the official Joblib documentation.
Related Articles
Further Reading: For more details, see the Python random module documentation.
Frequently Asked Questions
How do I generate a random number in Python?
Use random.randint(a, b) for integers or random.random() for a float between 0 and 1. Example: import random; num = random.randint(1, 100).
What is the difference between random and secrets?
The random module is for simulations and games but NOT for security. The secrets module provides cryptographically secure randomness for passwords, tokens, and security-sensitive applications.
How do I generate a random list of numbers?
Use [random.randint(1, 100) for _ in range(10)] for random integers. For unique numbers, use random.sample(range(1, 101), 10). For float arrays, use numpy.random.rand(10).
How do I set a random seed?
Call random.seed(42) before generating numbers. The same seed always produces the same sequence, useful for testing and reproducible experiments.
Can I generate numbers following a specific distribution?
Yes. Use random.gauss() for normal, random.uniform() for uniform. NumPy offers numpy.random.normal(), poisson(), binomial(), and many more.