Random Graphs and Random Walks Tutorial

Introduction to Random Graphs and Random Walks

Random graphs and random walks are foundational topics in network science, with applications ranging from social network analysis to the spread of information in AI systems. In this tutorial, you will learn how to generate Erdős–Rényi (ER) and preferential attachment networks, analyze their degree distributions, identify giant connected components (GCC), and simulate random walks. These concepts are central to the ECE 232E summer 2025 project on random graphs and random walks. We'll use Python with the igraph library, and draw parallels to modern trends like recommendation algorithms and viral content propagation.

1. Generating Erdős–Rényi Random Networks

1.1 Creating ER Networks and Degree Distributions

The Erdős–Rényi model G(n,p) creates a graph with n nodes where each edge exists independently with probability p. For n=900 and p values 0.002, 0.006, 0.012, 0.045, and 0.1, you can generate networks using igraph.Graph.Erdos_Renyi(n, p). The degree distribution follows a binomial distribution B(n-1, p), which for large n approximates a Poisson distribution with mean λ = (n-1)p. Plotting the degree distribution (using degree_distribution()) will show a bell-shaped curve for moderate p. For small p, the distribution is skewed right; for larger p, it becomes more symmetric. The theoretical mean degree is (n-1)p, and variance is (n-1)p(1-p). Compare your empirical results to these values.

1.2 Connectivity and Giant Component Analysis

For each p, determine if the network is connected using is_connected(). For n=900, the threshold for connectivity is around p = ln(n)/n ≈ 0.005. For p below this, networks are likely disconnected; above it, they become almost surely connected. Use clusters() to find the giant connected component (GCC). The diameter of the GCC can be computed with diameter(). For p just above the threshold, the GCC is large but the diameter may be relatively high.

1.3 Phase Transition: GCC Size vs. p

Sweep p from 0 to a pmax (e.g., 0.1) and generate 100 networks per p. Plot the normalized GCC size (size of GCC / n) against p. You'll observe a sharp transition near p = 1/n ≈ 0.0011, where a giant component emerges. The theoretical threshold for emergence is p = 1/(n-1). For p = ln(n)/n ≈ 0.005, the GCC includes >99% of nodes. This phase transition is analogous to the tipping point in viral marketing campaigns: a small increase in connection probability can lead to widespread network connectivity.

1.4 Expected GCC Size vs. Number of Nodes

Fix average degree c = n*p. For c=0.5, 1, 1.15, 1.25, 1.35, vary n from 100 to 10000 and plot expected GCC size. For c < 1, the GCC size scales sublinearly; for c > 1, it scales linearly with n. This reflects the percolation threshold: when average degree exceeds 1, a giant component exists. This concept is used in epidemiology to model disease spread thresholds.

2. Preferential Attachment (Barabási–Albert) Networks

2.1 Generating and Analyzing BA Networks

Use Graph.Barabasi(n, m) to create networks with n=1050 nodes and m=1 edge per new node. These networks are always connected because each new node attaches to an existing node. The degree distribution follows a power law: P(k) ~ k^{-3}. Plot the degree distribution on a log-log scale; it should appear linear with slope approximately -3. Use community_fastgreedy() to detect communities. Modularity measures the strength of community structure; typical values range from 0.3 to 0.7. Assortativity (degree correlation) can be computed with assortativity_degree(). For BA networks, assortativity is often slightly negative (disassortative).

2.2 Scaling Up: Larger Networks

Repeat with n=10500 nodes. The modularity may decrease slightly because community structure becomes weaker in larger power-law networks. The degree distribution slope remains near -3. Compare the community structures visually.

2.3 Degree Distribution of Neighbors

Randomly pick a node i, then a random neighbor j. Plot the degree distribution of j on a log-log scale. This distribution is broader than the original degree distribution because high-degree nodes are more likely to be neighbors (the friendship paradox). The slope is shallower (around -2). This phenomenon is observed in social networks: your friends have more friends than you do.

2.4 Age vs. Degree Relationship

In the BA model, older nodes have higher expected degrees. Plot the degree of each node vs. its age (time of addition). The relationship is approximately linear on a log-log scale, indicating a power-law growth: k_i ~ t_i^{-1/2}. This reflects the 'rich-get-richer' effect seen in citation networks and YouTube video popularity.

2.5 Varying m: m=2 and m=6

Repeat the above for m=2 and m=6. Higher m increases the average degree and reduces the power-law exponent (makes the distribution less skewed). The modularity decreases as m increases because the network becomes denser and community structure weakens.

2.6 Stub Matching vs. Preferential Attachment

Generate a BA network with n=1050, m=1. Extract its degree sequence and create a new network using Graph.Degree_Sequence(deg_seq, method="vl") (stub matching). Compare the two networks: the stub-matched network preserves the degree sequence but loses the preferential attachment structure. Plot both with community detection. The modularity of the stub-matched network may be lower because it lacks the hierarchical community structure inherent in BA networks.

3. Modified Preferential Attachment with Age Penalty

3.1 Creating Networks with Age-Dependent Attachment

Use a custom model where attachment probability is proportional to (k_i + 1) * (l_i^{-1} + 1) (with parameters α=1, β=-1, a=c=d=1, b=0). This penalizes older nodes. Generate a network with n=1050, m=1. The degree distribution still follows a power law but with a steeper exponent (e.g., -3.5). Plot the degree distribution on log-log scale and estimate the slope using linear regression.

3.2 Community Detection and Modularity

Apply fast greedy community detection. The modularity may be higher than in standard BA networks because age penalization creates more modular structure, similar to how older social media accounts lose influence over time.

4. Random Walks on Networks

4.1 Random Walk on ER Networks

Create an ER network with n=900, p=0.015. Start a random walk from a random node. Measure the average distance ⟨s(t)⟩ from the start after t steps, and its variance σ²(t). Plot both vs. t. ⟨s(t)⟩ grows as √t for small t, then saturates near the network diameter. The variance also increases initially, then stabilizes. This behavior is analogous to the spread of information in a social network: the average distance from the source grows with the square root of time until the entire network is reached.

4.2 Degree Distribution of Visited Nodes

Measure the degree distribution of nodes visited by the random walk. It will be biased toward higher-degree nodes compared to the original degree distribution, because random walks preferentially visit high-degree nodes (the 'friendship paradox' in action).

4.3 Scaling to Larger Networks

Repeat with n=9000. The saturation distance (diameter) increases, but the overall shape remains similar. The diameter plays a role in the saturation point: larger diameters require more steps to explore.

4.4 Random Walk on Preferential Attachment Networks

Generate a BA network with n=900, m=1. Repeat the random walk analysis. In power-law networks, ⟨s(t)⟩ grows slower (logarithmically) because hubs act as short-cuts. The variance also behaves differently. This reflects how information spreads faster in networks with hubs, like Twitter trending topics.

Conclusion

This tutorial covered the core techniques for generating and analyzing random graphs and random walks as required in ECE 232E. By implementing these steps, you will gain hands-on experience with network science concepts that are crucial for understanding complex systems, from the internet to AI recommendation engines. Remember to compare your empirical results with theoretical predictions and interpret any discrepancies.