Intermediate

Networks are everywhere: social connections, airline routes, software dependencies, financial transactions, biological pathways. When you need to find the shortest route between two cities, detect communities in a social network, or identify the most influential node in a dependency graph, you need graph analysis tools. Python’s NetworkX library provides all of this in a clean, expressive API that integrates naturally with NumPy, SciPy, and Matplotlib.

NetworkX represents graphs as Python objects, so you can build networks programmatically and then apply powerful algorithms without writing them yourself. It supports undirected graphs, directed graphs (digraphs), multigraphs, and weighted graphs. Installation is simple: pip install networkx matplotlib. The Matplotlib package is needed for visualization.

This article covers everything you need to start working with graphs in Python: creating graphs and adding nodes and edges, calculating basic graph metrics, finding shortest paths, running centrality analysis, detecting connected components, and visualizing networks. By the end, we will analyze a real social network dataset to find the most connected people in a group.

NetworkX Quick Example

Here is a complete example building a small social network and running basic analysis on it:

# quick_networkx.py
import networkx as nx

# Create an undirected graph
G = nx.Graph()

# Add edges (nodes are created automatically)
G.add_edges_from([
    ('Alice', 'Bob'), ('Alice', 'Carol'),
    ('Bob', 'Carol'), ('Bob', 'Dave'),
    ('Carol', 'Eve'), ('Dave', 'Eve')
])

print("Nodes:", list(G.nodes()))
print("Edges:", list(G.edges()))
print("Number of nodes:", G.number_of_nodes())
print("Number of edges:", G.number_of_edges())

# Shortest path between Alice and Eve
path = nx.shortest_path(G, source='Alice', target='Eve')
print("Shortest path Alice -> Eve:", path)

# Degree of each node (number of connections)
for node, degree in G.degree():
    print(f"  {node}: {degree} connections")

Output:

Nodes: ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
Edges: [('Alice', 'Bob'), ('Alice', 'Carol'), ('Bob', 'Carol'), ('Bob', 'Dave'), ('Carol', 'Eve'), ('Dave', 'Eve')]
Number of nodes: 6
Number of edges: 6
Shortest path Alice -> Eve: ['Alice', 'Carol', 'Eve']

  Alice: 2 connections
  Bob: 3 connections
  Carol: 3 connections
  Dave: 2 connections
  Eve: 2 connections

NetworkX found the shortest social path from Alice to Eve through Carol in two hops. Bob and Carol have the most connections (degree 3), making them the most central figures in this small network. We built this in about 10 lines of code.

Graph Types in NetworkX

NetworkX supports four main graph types, each suited to different real-world scenarios. Choosing the right type matters because it changes which algorithms are available and what the edges mean.

ClassDirectionMulti-edgesUse Case
GraphUndirectedNoFriendships, protein interactions
DiGraphDirectedNoTwitter follows, dependency graphs
MultiGraphUndirectedYesMulti-lane roads, parallel networks
MultiDiGraphDirectedYesFinancial transactions, flight routes
# graph_types.py
import networkx as nx

# Undirected graph: friendship is mutual
G = nx.Graph()
G.add_edge('Alice', 'Bob')
print("Alice in Bob's neighbors:", 'Alice' in G.neighbors('Bob'))  # True

# Directed graph: following is one-way
DG = nx.DiGraph()
DG.add_edge('Alice', 'Bob')  # Alice follows Bob
print("Bob follows Alice:", DG.has_edge('Bob', 'Alice'))  # False
print("Alice follows Bob:", DG.has_edge('Alice', 'Bob'))  # True

# Weighted graph: distances or strengths
WG = nx.Graph()
WG.add_edge('NYC', 'Chicago', weight=790)
WG.add_edge('Chicago', 'LA', weight=2015)
WG.add_edge('NYC', 'LA', weight=2800)
print("NYC-Chicago distance:", WG['NYC']['Chicago']['weight'])

Output:

Alice in Bob's neighbors: True
Bob follows Alice: False
Alice follows Bob: True
NYC-Chicago distance: 790

Edge weights can represent anything — distance, cost, bandwidth, similarity, or strength of relationship. The weight key is the standard convention in NetworkX, and most shortest-path algorithms use it automatically.

Building network graphs
Graphs are just nodes and edges. NetworkX handles the math in between.

Centrality Analysis

Centrality measures answer the question “which nodes are most important in this network?” Different centrality metrics define “important” in different ways. Degree centrality looks at number of direct connections. Betweenness centrality identifies nodes that bridge different parts of the network. PageRank (the algorithm behind Google Search) measures influence based on the quality of incoming connections.

# centrality_demo.py
import networkx as nx

# Build a slightly larger network
edges = [
    ('Alice', 'Bob'), ('Alice', 'Carol'), ('Alice', 'Dave'),
    ('Bob', 'Eve'), ('Carol', 'Frank'), ('Dave', 'Grace'),
    ('Eve', 'Frank'), ('Frank', 'Grace'), ('Grace', 'Bob')
]
G = nx.Graph()
G.add_edges_from(edges)

# Degree centrality: fraction of nodes connected to this node
degree_c = nx.degree_centrality(G)
print("Degree centrality:")
for node, val in sorted(degree_c.items(), key=lambda x: -x[1]):
    print(f"  {node}: {val:.3f}")

# Betweenness centrality: fraction of shortest paths passing through node
between_c = nx.betweenness_centrality(G)
print("\nBetweenness centrality (bridges):")
for node, val in sorted(between_c.items(), key=lambda x: -x[1])[:3]:
    print(f"  {node}: {val:.3f}")

Output:

Degree centrality:
  Alice: 0.500
  Frank: 0.500
  Grace: 0.500
  Bob: 0.375
  Carol: 0.250
  Dave: 0.250
  Eve: 0.250

Betweenness centrality (bridges):
  Frank: 0.286
  Alice: 0.238
  Grace: 0.214

Frank and Grace are the key bridges in this network — removing them would disconnect the graph into separate clusters. This kind of analysis is invaluable for understanding vulnerability in supply chains, critical nodes in infrastructure, or key connectors in organizational charts.

Shortest Paths and Weighted Routing

Finding the shortest path between nodes is one of the most common graph problems. NetworkX implements Dijkstra’s algorithm, Bellman-Ford, and A* for weighted graphs. For unweighted graphs, it uses BFS. All of these are available with a single function call.

# shortest_path_demo.py
import networkx as nx

# City distance network (weights in miles)
G = nx.Graph()
G.add_weighted_edges_from([
    ('NYC', 'Philadelphia', 95),
    ('Philadelphia', 'Baltimore', 100),
    ('Baltimore', 'DC', 45),
    ('NYC', 'Boston', 215),
    ('Boston', 'Providence', 50),
    ('NYC', 'DC', 230),
    ('DC', 'Richaond', 100)
])

# Shortest path by number of hops (ignoring weight)
hop_path = nx.shortest_path(G, 'NYC', 'Richaond')
print("Shortest by hops:", hop_path)
print("Hops:", len(hop_path) - 1)

# Shortest path by distance (using weights)
dist_path = nx.shortest_path(G, 'NYC', 'Richmond', weight='weight')
dist_len = nx.shortest_path_length(G, 'NYC', 'Richmond', weight='weight')
print("\nShortest by distance:", dist_path)
print("Total distance:", dist_len, "miles")

# All shortest paths from NYC
all_paths = dict(nx.single_source_shortest_path_length(G, 'NYC'))
print("\nAll cities reachable from NYC and hop count:")
for city, hops in sorted(all_paths.items()):
    print(f"  {city}: {hops} hops")

Output:

Shortest by hops: ['NYC', 'DC', 'Richmond']
Hops: 2

Shortest by distance: ['NYC', 'Philadelphia', 'Baltimore', 'DC', 'Richaond']
Total distance: 340 miles

All cities reachable from NYC and hop count:
  NYC: 0 hops
  Boston: 1 hops
  DC: 1 hops
  Philadelphia: 1 hops
  Baltimore: 2 hops
  Providence: 2 hops
  Richmond: 2 hops

Notice that the shortest route by hops (NYC -> DC -> Richmond, 2 stops) is actually 340 miles via DC, the same distance as the longer hop path via Philadelphia. But this is because the direct NYC->DC edge (230 miles) plus DC->Richaond (100) equals 330 miles — actually shorter! NetworkX correctly identifies this using Dijkstra’s algorithm on weighted edges.

Analyzing graph properties
Shortest path, centrality, clustering. NetworkX has an algorithm for that.

Connected Components and Clustering

A connected component is a subgraph where every node can reach every other node. Identifying components helps you find isolated clusters, detect when a network has split, or understand the community structure of a social network.

# components_demo.py
import networkx as nx

# Build a graph with two separate clusters
G = nx.Graph()
# Cluster 1: Python developers
G.add_edges_from([('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')])
# Cluster 2: Data scientists (not connected to cluster 1 yet)
G.add_edges_from([('Dave', 'Eve'), ('Eve', 'Frank')])
# Isolated node
G.add_node('Greg')

# Find connected components
components = list(nx.connected_components(G))
print(f"Number of components: {len(components)}")
for i, comp in enumerate(components, 1):
    print(f"  Component {i}: {comp}")

# Is the graph connected?
print("Fully connected:", nx.is_connected(G))

# Add a bridge between clusters
G.add_edge('Carol', 'Dave')
print("After bridge -- fully connected:", nx.is_connected(G))

# Clustering coefficient: how tightly knit is each node's neighborhood?
clustering = nx.clustering(G)
print("\nClustering coefficients:")
for node, coef in clustering.items():
    print(f"  {node}: {coef:.3f}")

Output:

Number of components: 3
  Component 1: {'Alice', 'Bob', 'Carol'}
  Component 2: {'Dave', 'Eve', 'Frank'}
  Component 3: {'Greg'}
Fully connected: False
After bridge -- fully connected: True

Clustering coefficients:
  Alice: 1.000
  Bob: 1.000
  Carol: 0.333
  Dave: 0.000
  Eve: 0.000
  Frank: 0.000
  Greg: 0.000

Alice and Bob have clustering coefficient 1.0 — all their neighbors are also connected to each other (a perfect triangle). Carol’s coefficient drops to 0.333 after she bridges to Dave, because Dave and Alice/Bob are not connected. High clustering indicates tight-knit groups; low clustering indicates bridging roles.

Real-Life Example: Analyzing a Software Package Dependency Graph

We will build a directed dependency graph for a Python project, find circular dependencies, and identify the most critical packages by centrality.

# dependency_graph.py
import networkx as nx

# Define package dependencies (A depends on B means edge A -> B)
dependencies = {
    'app': ['flask', 'sqlalchemy', 'celery'],
    'flask': ['werkzeug', 'jinja2', 'click'],
    'sqlalchemy': ['greenlet'],
    'celery': ['kombu', 'billiard', 'redis'],
    'kombu': ['redis', 'amqp'],
    'amqp': ['vine'],
    'jinja2': ['markupsafe'],
    'click': [],
    'werkzeug': [],
    'greenlet': [],
    'billiard': [],
    'redis': [],
    'vine': [],
    'markupsafe': []
}

# Build directed graph
DG = nx.DiGraph()
for package, deps in dependencies.items():
    DG.add_node(package)
    for dep in deps:
        DG.add_edge(package, dep)

print(f"Total packages: {DG.number_of_nodes()}")
print(f"Total dependencies: {DG.number_of_edges()}")

# Find packages with no dependencies (leaf nodes)
leaves = [n for n, d in DG.out_degree() if d == 0]
print(f"\nLeaf packages (no dependencies): {leaves}")

# Find the most depended-upon packages (high in-degree)
print("\nMost required packages:")
for pkg, in_deg in sorted(DG.in_degree(), key=lambda x: -x[1])[:3]:
    print(f"  {pkg}: required by {in_deg} others")

# Topological sort: valid installation order
try:
    install_order = list(nx.topological_sort(DG))
    print(f"\nInstall order (reversed): {install_order[:5]}...")
except nx.NetworkXUnfeasible:
    print("Circular dependency detected!")

# Check for cycles
cycles = list(nx.simple_cycles(DG))
if cycles:
    print(f"Circular dependencies: {cycles}")
else:
    print("\nNo circular dependencies found.")

Output:

Total packages: 14
Total dependencies: 14

Leaf packages (no dependencies): ['click', 'werkzeug', 'greenlet', 'billiard', 'redis', 'vine', 'markupsafe']

Most required packages:
  redis: required by 2 others
  click: required by 1 others
  jinja2: required by 1 others

Install order (reversed): ['app', 'flask', 'jinja2', 'markupsafe', 'click']...

No circular dependencies found.

This analysis immediately reveals that redis is a shared dependency (required by both celery and kombu), making it a critical package to monitor for version conflicts. The topological sort gives us a valid installation order — exactly what pip’s dependency resolver computes internally. You can extend this to parse actual requirements.txt or pyproject.toml files using the pip library or packaging module.

Communities, centrality, shortest path. NetworkX has the algorithms.
Communities, centrality, shortest path. NetworkX has the algorithms.

Frequently Asked Questions

Can NetworkX handle large graphs with millions of nodes?

NetworkX works well for graphs up to a few hundred thousand nodes on a modern machine. For graphs with millions of nodes, consider graph-parallel frameworks like graph-tool (C++ backend) or NetworKit. For distributed processing of billion-scale graphs, Apache Spark’s GraphX or GraphFrames are the standard choice.

How do I visualize larger networks properly?

For small graphs (under ~100 nodes), nx.draw(G, with_labels=True) with Matplotlib works well. For larger graphs, use nx.spring_layout() for force-directed layouts, or export to Gephi with nx.write_gexf(G, 'graph.gexf') for interactive exploration. The pyvis library creates interactive HTML visualizations that work in Jupyter notebooks.

How do I load graph data from files?

NetworkX supports many formats: nx.read_edgelist('edges.txt') for simple edge lists, nx.read_gml() for GML format, nx.read_graphml() for GraphML, and nx.from_pandas_edgelist(df) for Pandas DataFrames. For large datasets, edge list files are the most efficient — each line is a pair of node IDs.

How do weighted shortest paths differ from unweighted?

Unweighted shortest paths minimize the number of edges (hops). Weighted shortest paths (using Dijkstra’s algorithm) minimize the total edge weight. Always pass weight='weight' (or your custom weight attribute name) to nx.shortest_path() when you want distance-based routing. Without this parameter, NetworkX ignores weights and counts hops only.

How do I detect communities in a network?

NetworkX includes several community detection algorithms. The most popular for undirected graphs is the Louvain algorithm, available via the community module: from networkx.algorithms.community import louvain_communities; communities = louvain_communities(G). For smaller graphs, Girvan-Newman algorithm works well: from networkx.algorithms.community import girvan_newman. Community detection is useful for finding friend groups, topic clusters, or organizational units.

Conclusion

We have covered the NetworkX fundamentals: creating graphs with Graph, DiGraph, and weighted edges; calculating centrality metrics with degree_centrality() and betweenness_centrality(); finding shortest paths with shortest_path() and shortest_path_length(); identifying connected components with connected_components(); and applying topological sorting with topological_sort() for dependency resolution. The dependency graph example showed how to translate a real-world engineering problem into a graph analysis workflow.

From here, explore NetworkX’s community detection algorithms for social network analysis, or try exporting your graphs to Gephi for interactive visualization. The nx.generate_random_graphs module provides benchmark graphs for testing your algorithms at scale.

Official documentation: networkx.org/documentation

Creating and Modifying Graphs

NetworkX represents graphs as objects with add/remove methods for nodes and edges. The four graph types cover most needs:

import networkx as nx

# Undirected — friendships, transit lines
G = nx.Graph()
G.add_node("Alice")
G.add_nodes_from(["Bob", "Carol", "Dave"])
G.add_edge("Alice", "Bob")
G.add_edges_from([("Alice", "Carol"), ("Bob", "Dave")])

# Directed — Twitter follows, dependency graphs
D = nx.DiGraph()
D.add_edge("user1", "user2")    # user1 follows user2

# Weighted — road networks, network capacity
W = nx.Graph()
W.add_edge("A", "B", weight=10)
W.add_edge("B", "C", weight=5)
W.add_edge("A", "C", weight=20)

# Multi — multiple edges between same nodes (parallel routes)
M = nx.MultiGraph()
M.add_edge("X", "Y", route="highway")
M.add_edge("X", "Y", route="back-road")

print(G.number_of_nodes(), G.number_of_edges())   # 4 3
print(list(G.neighbors("Alice")))                 # ['Bob', 'Carol']

Reading Graphs from Files

For real-world graphs, load from files instead of building manually:

# Edge list — one edge per line, space-separated
G = nx.read_edgelist("friendships.txt")

# CSV with attributes
import pandas as pd
df = pd.read_csv("edges.csv")
G = nx.from_pandas_edgelist(df, source="from", target="to", edge_attr="weight")

# GraphML — preserves all attributes
G = nx.read_graphml("network.graphml")
nx.write_graphml(G, "out.graphml")

# JSON (for D3.js visualization)
import json
data = nx.node_link_data(G)
with open("graph.json", "w") as f:
    json.dump(data, f)

Shortest Paths and Distances

One of the most-used graph operations — find the shortest route between two nodes:

G = nx.Graph()
G.add_weighted_edges_from([
    ("Home", "Work", 30),
    ("Home", "Gym", 10),
    ("Gym", "Work", 25),
    ("Home", "Cafe", 5),
    ("Cafe", "Work", 35),
])

# Dijkstra with edge weights
path = nx.dijkstra_path(G, "Home", "Work", weight="weight")
print(path)        # ['Home', 'Gym', 'Work']
distance = nx.dijkstra_path_length(G, "Home", "Work", weight="weight")
print(distance)    # 35

# BFS for unweighted graphs (faster)
print(nx.shortest_path(G, "Home", "Work"))

# All shortest paths from one node
print(dict(nx.shortest_path_length(G, "Home", weight="weight")))

Centrality: Who’s Important?

Centrality measures rank nodes by importance. Four common ones:

# Degree centrality — most connections
nx.degree_centrality(G)
# {'Alice': 0.8, 'Bob': 0.4, ...}

# Betweenness — most common "bridge" between others
nx.betweenness_centrality(G)

# Closeness — quickest to reach everyone
nx.closeness_centrality(G)

# PageRank — Google's algorithm
nx.pagerank(G, alpha=0.85)

For a social network, PageRank or betweenness identifies influencers. For a transit network, closeness identifies hubs.

Community Detection

Communities are dense clusters of connected nodes — groups, neighborhoods, market segments:

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, louvain_communities

communities = louvain_communities(G, seed=42)
for i, c in enumerate(communities):
    print(f"Community {i}: {len(c)} nodes — {list(c)[:5]}")

# Compare community quality
print(nx.community.modularity(G, communities))

Visualizing Graphs

NetworkX uses matplotlib for plots. For small graphs, the defaults are fine; for larger ones, use a proper graph viz tool (Gephi, Graphistry):

import matplotlib.pyplot as plt

pos = nx.spring_layout(G, seed=42)
nx.draw_networkx_nodes(G, pos, node_color="lightblue", node_size=400)
nx.draw_networkx_edges(G, pos, alpha=0.5)
nx.draw_networkx_labels(G, pos)
plt.axis("off")
plt.savefig("graph.png", dpi=150)

Common Pitfalls

  • Wrong graph type. Using Graph when you have directed data hides the directionality. Pick DiGraph for follows, citations, dependencies.
  • Forgetting weight parameter. Many algorithms accept a weight argument to use edge weights. Default ignores them — your “shortest path” may not be shortest.
  • Slow on large graphs. NetworkX is pure Python — fine to ~100K nodes. For millions, switch to graph-tool (C++) or networkit.
  • Naming algorithm wrong. Some algorithms only work on connected graphs. Check nx.is_connected(G) first or operate on subgraphs.
  • Mutating during iteration. Modifying a graph (add/remove nodes) while iterating over it raises RuntimeError. Build a snapshot first.

FAQ

Q: NetworkX, graph-tool, or igraph?
A: NetworkX for general use — pure Python, clean API, comprehensive algorithm library. graph-tool for performance (compiled). igraph for R interop or specific algorithms NetworkX lacks.

Q: How do I scale to millions of nodes?
A: NetworkX is too slow. Use cuGraph (GPU-accelerated), graph-tool, or specialized graph databases (Neo4j, Memgraph).

Q: Can I run graph algorithms on a DataFrame directly?
A: nx.from_pandas_edgelist converts; nx.to_pandas_edgelist goes back. For graph operations, you need the NetworkX object — keeping it pandas-only doesn’t work.

Q: How do I find cycles in a directed graph?
A: nx.simple_cycles(D) for all simple cycles. nx.is_directed_acyclic_graph(D) to check if a graph is a DAG. nx.topological_sort(D) orders DAG nodes.

Q: Best layout algorithm for visualization?
A: spring_layout for general use, kamada_kawai_layout for smaller graphs (better quality), circular_layout for highly connected graphs.

Wrapping Up

NetworkX is the Swiss Army knife of graph analysis in Python — undirected, directed, weighted, multi-edge graphs all behave consistently. Add a graph from edges, ask shortest-path or centrality questions, and you’re already 80% of the way to solving real network problems. For graphs too large for pure Python (millions of nodes), specialized libraries take over; for everything else NetworkX is the right tool.