Intermediate

Networks are everywhere: social connections, airline routes, software dependencies, financial transactions, biological pathways. When you need to find the shortest route between two cities, detect communities in a social network, or identify the most influential node in a dependency graph, you need graph analysis tools. Python’s NetworkX library provides all of this in a clean, expressive API that integrates naturally with NumPy, SciPy, and Matplotlib.

NetworkX represents graphs as Python objects, so you can build networks programmatically and then apply powerful algorithms without writing them yourself. It supports undirected graphs, directed graphs (digraphs), multigraphs, and weighted graphs. Installation is simple: pip install networkx matplotlib. The Matplotlib package is needed for visualization.

This article covers everything you need to start working with graphs in Python: creating graphs and adding nodes and edges, calculating basic graph metrics, finding shortest paths, running centrality analysis, detecting connected components, and visualizing networks. By the end, we will analyze a real social network dataset to find the most connected people in a group.

NetworkX Quick Example

Here is a complete example building a small social network and running basic analysis on it:

# quick_networkx.py
import networkx as nx

# Create an undirected graph
G = nx.Graph()

# Add edges (nodes are created automatically)
G.add_edges_from([
    ('Alice', 'Bob'), ('Alice', 'Carol'),
    ('Bob', 'Carol'), ('Bob', 'Dave'),
    ('Carol', 'Eve'), ('Dave', 'Eve')
])

print("Nodes:", list(G.nodes()))
print("Edges:", list(G.edges()))
print("Number of nodes:", G.number_of_nodes())
print("Number of edges:", G.number_of_edges())

# Shortest path between Alice and Eve
path = nx.shortest_path(G, source='Alice', target='Eve')
print("Shortest path Alice -> Eve:", path)

# Degree of each node (number of connections)
for node, degree in G.degree():
    print(f"  {node}: {degree} connections")

Output:

Nodes: ['Alice', 'Bob', 'Carol', 'Dave', 'Eve']
Edges: [('Alice', 'Bob'), ('Alice', 'Carol'), ('Bob', 'Carol'), ('Bob', 'Dave'), ('Carol', 'Eve'), ('Dave', 'Eve')]
Number of nodes: 6
Number of edges: 6
Shortest path Alice -> Eve: ['Alice', 'Carol', 'Eve']

  Alice: 2 connections
  Bob: 3 connections
  Carol: 3 connections
  Dave: 2 connections
  Eve: 2 connections

NetworkX found the shortest social path from Alice to Eve through Carol in two hops. Bob and Carol have the most connections (degree 3), making them the most central figures in this small network. We built this in about 10 lines of code.

Graph Types in NetworkX

NetworkX supports four main graph types, each suited to different real-world scenarios. Choosing the right type matters because it changes which algorithms are available and what the edges mean.

ClassDirectionMulti-edgesUse Case
GraphUndirectedNoFriendships, protein interactions
DiGraphDirectedNoTwitter follows, dependency graphs
MultiGraphUndirectedYesMulti-lane roads, parallel networks
MultiDiGraphDirectedYesFinancial transactions, flight routes
# graph_types.py
import networkx as nx

# Undirected graph: friendship is mutual
G = nx.Graph()
G.add_edge('Alice', 'Bob')
print("Alice in Bob's neighbors:", 'Alice' in G.neighbors('Bob'))  # True

# Directed graph: following is one-way
DG = nx.DiGraph()
DG.add_edge('Alice', 'Bob')  # Alice follows Bob
print("Bob follows Alice:", DG.has_edge('Bob', 'Alice'))  # False
print("Alice follows Bob:", DG.has_edge('Alice', 'Bob'))  # True

# Weighted graph: distances or strengths
WG = nx.Graph()
WG.add_edge('NYC', 'Chicago', weight=790)
WG.add_edge('Chicago', 'LA', weight=2015)
WG.add_edge('NYC', 'LA', weight=2800)
print("NYC-Chicago distance:", WG['NYC']['Chicago']['weight'])

Output:

Alice in Bob's neighbors: True
Bob follows Alice: False
Alice follows Bob: True
NYC-Chicago distance: 790

Edge weights can represent anything — distance, cost, bandwidth, similarity, or strength of relationship. The weight key is the standard convention in NetworkX, and most shortest-path algorithms use it automatically.

Loop Larry untangling a web of network nodes
NetworkX: because someone has to untangle your social graph.

Centrality Analysis

Centrality measures answer the question “which nodes are most important in this network?” Different centrality metrics define “important” in different ways. Degree centrality looks at number of direct connections. Betweenness centrality identifies nodes that bridge different parts of the network. PageRank (the algorithm behind Google Search) measures influence based on the quality of incoming connections.

# centrality_demo.py
import networkx as nx

# Build a slightly larger network
edges = [
    ('Alice', 'Bob'), ('Alice', 'Carol'), ('Alice', 'Dave'),
    ('Bob', 'Eve'), ('Carol', 'Frank'), ('Dave', 'Grace'),
    ('Eve', 'Frank'), ('Frank', 'Grace'), ('Grace', 'Bob')
]
G = nx.Graph()
G.add_edges_from(edges)

# Degree centrality: fraction of nodes connected to this node
degree_c = nx.degree_centrality(G)
print("Degree centrality:")
for node, val in sorted(degree_c.items(), key=lambda x: -x[1]):
    print(f"  {node}: {val:.3f}")

# Betweenness centrality: fraction of shortest paths passing through node
between_c = nx.betweenness_centrality(G)
print("\nBetweenness centrality (bridges):")
for node, val in sorted(between_c.items(), key=lambda x: -x[1])[:3]:
    print(f"  {node}: {val:.3f}")

Output:

Degree centrality:
  Alice: 0.500
  Frank: 0.500
  Grace: 0.500
  Bob: 0.375
  Carol: 0.250
  Dave: 0.250
  Eve: 0.250

Betweenness centrality (bridges):
  Frank: 0.286
  Alice: 0.238
  Grace: 0.214

Frank and Grace are the key bridges in this network — removing them would disconnect the graph into separate clusters. This kind of analysis is invaluable for understanding vulnerability in supply chains, critical nodes in infrastructure, or key connectors in organizational charts.

Shortest Paths and Weighted Routing

Finding the shortest path between nodes is one of the most common graph problems. NetworkX implements Dijkstra’s algorithm, Bellman-Ford, and A* for weighted graphs. For unweighted graphs, it uses BFS. All of these are available with a single function call.

# shortest_path_demo.py
import networkx as nx

# City distance network (weights in miles)
G = nx.Graph()
G.add_weighted_edges_from([
    ('NYC', 'Philadelphia', 95),
    ('Philadelphia', 'Baltimore', 100),
    ('Baltimore', 'DC', 45),
    ('NYC', 'Boston', 215),
    ('Boston', 'Providence', 50),
    ('NYC', 'DC', 230),
    ('DC', 'Richaond', 100)
])

# Shortest path by number of hops (ignoring weight)
hop_path = nx.shortest_path(G, 'NYC', 'Richaond')
print("Shortest by hops:", hop_path)
print("Hops:", len(hop_path) - 1)

# Shortest path by distance (using weights)
dist_path = nx.shortest_path(G, 'NYC', 'Richmond', weight='weight')
dist_len = nx.shortest_path_length(G, 'NYC', 'Richmond', weight='weight')
print("\nShortest by distance:", dist_path)
print("Total distance:", dist_len, "miles")

# All shortest paths from NYC
all_paths = dict(nx.single_source_shortest_path_length(G, 'NYC'))
print("\nAll cities reachable from NYC and hop count:")
for city, hops in sorted(all_paths.items()):
    print(f"  {city}: {hops} hops")

Output:

Shortest by hops: ['NYC', 'DC', 'Richmond']
Hops: 2

Shortest by distance: ['NYC', 'Philadelphia', 'Baltimore', 'DC', 'Richaond']
Total distance: 340 miles

All cities reachable from NYC and hop count:
  NYC: 0 hops
  Boston: 1 hops
  DC: 1 hops
  Philadelphia: 1 hops
  Baltimore: 2 hops
  Providence: 2 hops
  Richmond: 2 hops

Notice that the shortest route by hops (NYC -> DC -> Richmond, 2 stops) is actually 340 miles via DC, the same distance as the longer hop path via Philadelphia. But this is because the direct NYC->DC edge (230 miles) plus DC->Richaond (100) equals 330 miles — actually shorter! NetworkX correctly identifies this using Dijkstra’s algorithm on weighted edges.

Pyro Pete sprinting along a glowing network path
G.add_edge() — the digital friendship you never had to ask for.

Connected Components and Clustering

A connected component is a subgraph where every node can reach every other node. Identifying components helps you find isolated clusters, detect when a network has split, or understand the community structure of a social network.

# components_demo.py
import networkx as nx

# Build a graph with two separate clusters
G = nx.Graph()
# Cluster 1: Python developers
G.add_edges_from([('Alice', 'Bob'), ('Bob', 'Carol'), ('Carol', 'Alice')])
# Cluster 2: Data scientists (not connected to cluster 1 yet)
G.add_edges_from([('Dave', 'Eve'), ('Eve', 'Frank')])
# Isolated node
G.add_node('Greg')

# Find connected components
components = list(nx.connected_components(G))
print(f"Number of components: {len(components)}")
for i, comp in enumerate(components, 1):
    print(f"  Component {i}: {comp}")

# Is the graph connected?
print("Fully connected:", nx.is_connected(G))

# Add a bridge between clusters
G.add_edge('Carol', 'Dave')
print("After bridge -- fully connected:", nx.is_connected(G))

# Clustering coefficient: how tightly knit is each node's neighborhood?
clustering = nx.clustering(G)
print("\nClustering coefficients:")
for node, coef in clustering.items():
    print(f"  {node}: {coef:.3f}")

Output:

Number of components: 3
  Component 1: {'Alice', 'Bob', 'Carol'}
  Component 2: {'Dave', 'Eve', 'Frank'}
  Component 3: {'Greg'}
Fully connected: False
After bridge -- fully connected: True

Clustering coefficients:
  Alice: 1.000
  Bob: 1.000
  Carol: 0.333
  Dave: 0.000
  Eve: 0.000
  Frank: 0.000
  Greg: 0.000

Alice and Bob have clustering coefficient 1.0 — all their neighbors are also connected to each other (a perfect triangle). Carol’s coefficient drops to 0.333 after she bridges to Dave, because Dave and Alice/Bob are not connected. High clustering indicates tight-knit groups; low clustering indicates bridging roles.

Real-Life Example: Analyzing a Software Package Dependency Graph

We will build a directed dependency graph for a Python project, find circular dependencies, and identify the most critical packages by centrality.

# dependency_graph.py
import networkx as nx

# Define package dependencies (A depends on B means edge A -> B)
dependencies = {
    'app': ['flask', 'sqlalchemy', 'celery'],
    'flask': ['werkzeug', 'jinja2', 'click'],
    'sqlalchemy': ['greenlet'],
    'celery': ['kombu', 'billiard', 'redis'],
    'kombu': ['redis', 'amqp'],
    'amqp': ['vine'],
    'jinja2': ['markupsafe'],
    'click': [],
    'werkzeug': [],
    'greenlet': [],
    'billiard': [],
    'redis': [],
    'vine': [],
    'markupsafe': []
}

# Build directed graph
DG = nx.DiGraph()
for package, deps in dependencies.items():
    DG.add_node(package)
    for dep in deps:
        DG.add_edge(package, dep)

print(f"Total packages: {DG.number_of_nodes()}")
print(f"Total dependencies: {DG.number_of_edges()}")

# Find packages with no dependencies (leaf nodes)
leaves = [n for n, d in DG.out_degree() if d == 0]
print(f"\nLeaf packages (no dependencies): {leaves}")

# Find the most depended-upon packages (high in-degree)
print("\nMost required packages:")
for pkg, in_deg in sorted(DG.in_degree(), key=lambda x: -x[1])[:3]:
    print(f"  {pkg}: required by {in_deg} others")

# Topological sort: valid installation order
try:
    install_order = list(nx.topological_sort(DG))
    print(f"\nInstall order (reversed): {install_order[:5]}...")
except nx.NetworkXUnfeasible:
    print("Circular dependency detected!")

# Check for cycles
cycles = list(nx.simple_cycles(DG))
if cycles:
    print(f"Circular dependencies: {cycles}")
else:
    print("\nNo circular dependencies found.")

Output:

Total packages: 14
Total dependencies: 14

Leaf packages (no dependencies): ['click', 'werkzeug', 'greenlet', 'billiard', 'redis', 'vine', 'markupsafe']

Most required packages:
  redis: required by 2 others
  click: required by 1 others
  jinja2: required by 1 others

Install order (reversed): ['app', 'flask', 'jinja2', 'markupsafe', 'click']...

No circular dependencies found.

This analysis immediately reveals that redis is a shared dependency (required by both celery and kombu), making it a critical package to monitor for version conflicts. The topological sort gives us a valid installation order — exactly what pip’s dependency resolver computes internally. You can extend this to parse actual requirements.txt or pyproject.toml files using the pip library or packaging module.

Frequently Asked Questions

Can NetworkX handle large graphs with millions of nodes?

NetworkX works well for graphs up to a few hundred thousand nodes on a modern machine. For graphs with millions of nodes, consider graph-parallel frameworks like graph-tool (C++ backend) or NetworKit. For distributed processing of billion-scale graphs, Apache Spark’s GraphX or GraphFrames are the standard choice.

How do I visualize larger networks properly?

For small graphs (under ~100 nodes), nx.draw(G, with_labels=True) with Matplotlib works well. For larger graphs, use nx.spring_layout() for force-directed layouts, or export to Gephi with nx.write_gexf(G, 'graph.gexf') for interactive exploration. The pyvis library creates interactive HTML visualizations that work in Jupyter notebooks.

How do I load graph data from files?

NetworkX supports many formats: nx.read_edgelist('edges.txt') for simple edge lists, nx.read_gml() for GML format, nx.read_graphml() for GraphML, and nx.from_pandas_edgelist(df) for Pandas DataFrames. For large datasets, edge list files are the most efficient — each line is a pair of node IDs.

How do weighted shortest paths differ from unweighted?

Unweighted shortest paths minimize the number of edges (hops). Weighted shortest paths (using Dijkstra’s algorithm) minimize the total edge weight. Always pass weight='weight' (or your custom weight attribute name) to nx.shortest_path() when you want distance-based routing. Without this parameter, NetworkX ignores weights and counts hops only.

How do I detect communities in a network?

NetworkX includes several community detection algorithms. The most popular for undirected graphs is the Louvain algorithm, available via the community module: from networkx.algorithms.community import louvain_communities; communities = louvain_communities(G). For smaller graphs, Girvan-Newman algorithm works well: from networkx.algorithms.community import girvan_newman. Community detection is useful for finding friend groups, topic clusters, or organizational units.

Conclusion

We have covered the NetworkX fundamentals: creating graphs with Graph, DiGraph, and weighted edges; calculating centrality metrics with degree_centrality() and betweenness_centrality(); finding shortest paths with shortest_path() and shortest_path_length(); identifying connected components with connected_components(); and applying topological sorting with topological_sort() for dependency resolution. The dependency graph example showed how to translate a real-world engineering problem into a graph analysis workflow.

From here, explore NetworkX’s community detection algorithms for social network analysis, or try exporting your graphs to Gephi for interactive visualization. The nx.generate_random_graphs module provides benchmark graphs for testing your algorithms at scale.

Official documentation: networkx.org/documentation