Assignment Chef icon Assignment Chef
All English tutorials

Programming lesson

Simplified PageRank Algorithm: Build Google's Core Ranking with C++ Adjacency List

Learn to implement a simplified PageRank algorithm using an adjacency list in C++. This tutorial breaks down the graph-based ranking system that powers Google, with step-by-step code examples and performance analysis.

simplified PageRank algorithm PageRank C++ adjacency list COP3530 project 2 graph ranking algorithm tutorial power iterations PageRank adjacency list vs matrix Google PageRank explained C++ graph data structure web graph ranking dangling nodes PageRank PageRank complexity Big O search engine ranking algorithm PageRank implementation guide C++ power iterations adjacency list graph C++ PageRank for beginners

Introduction: The Algorithm That Changed the Web

In the late 1990s, as the internet exploded with millions of webpages, search engines struggled to deliver relevant results. Two Stanford PhD students, Sergey Brin and Larry Page, asked a simple but powerful question: How can we trust information? Their answer became the PageRank algorithm, the foundation of Google. Today, similar ranking concepts power not only search engines but also recommendation systems in social media, e-commerce, and even AI-driven content feeds. In this tutorial, you'll implement a simplified PageRank algorithm in C++ using an adjacency list, a fundamental data structure for representing graphs. This project is typical of advanced data structures courses like COP3530, and mastering it will sharpen your skills in graph theory, iteration, and performance optimization.

Understanding PageRank: The Core Idea

PageRank treats the web as a directed graph. Each webpage is a node, and each hyperlink is an edge from the source page to the target page. The rank of a page depends on the ranks of pages that link to it. Importantly, a page distributes its rank equally among all its outgoing links. This means that a link from a high-ranking page (like Wikipedia) carries more weight than a link from a personal blog. The algorithm iteratively updates ranks until they converge. In this simplified version, you'll perform a fixed number of power iterations (p), similar to how Google's early prototypes worked.

Why Adjacency List?

For large graphs (like the web with billions of pages), an adjacency matrix is impractical due to O(V²) memory usage. An adjacency list stores only existing edges, making it memory-efficient for sparse graphs. In this project, you are required to implement an adjacency list; using a matrix will incur a 50% penalty. The adjacency list is typically implemented as a vector of lists or a map of vectors, where each vertex stores its outgoing neighbors.

Step-by-Step Implementation in C++

1. Data Structures

We'll define a Graph class with the following members:

  • adj_list: a std::unordered_map<std::string, std::vector<std::string>> mapping each URL to its outgoing links.
  • out_degree: a map storing the number of outgoing links per page.
  • in_degree: a map storing incoming links count (optional for rank calculation).
  • ranks: a map holding the current rank for each page.
class Graph {private:
    std::unordered_map<std::string, std::vector<std::string>> adj_list;
    std::unordered_map<std::string, int> out_degree;
    std::unordered_map<std::string, double> ranks;
    std::vector<std::string> pages; // sorted list of all unique pages
public:
    void addEdge(const std::string& from, const std::string& to);
    void PageRank(int power_iterations);
    void printRanks();
};

2. Adding Edges

When reading input, for each line from_page to_page, we add an edge from from to to. We also ensure both pages exist in our page list.

void Graph::addEdge(const std::string& from, const std::string& to) {
    adj_list[from].push_back(to);
    out_degree[from]++;
    // Ensure 'from' and 'to' are in pages set (we'll use a set later)
}

3. Initializing Ranks

All pages start with an equal rank: 1/N, where N is the number of unique pages. We'll store pages in a sorted vector for alphabetical output.

// After reading all edges, collect unique pages
std::set<std::string> all_pages;
for (auto& p : adj_list) {
    all_pages.insert(p.first);
    for (auto& neighbor : p.second)
        all_pages.insert(neighbor);
}
pages.assign(all_pages.begin(), all_pages.end());
int N = pages.size();
for (auto& page : pages)
    ranks[page] = 1.0 / N;

4. Power Iterations

For each iteration, compute new ranks based on incoming contributions. For each page, sum the ranks of pages that link to it, divided by their out-degree. We need to handle dangling nodes (pages with no outgoing links). In the simplified algorithm, dangling nodes are ignored (they don't distribute rank). However, a common approach is to treat them as having out-degree 0, so they contribute nothing. Alternatively, you can redistribute their rank equally to all pages. Check your assignment spec; here we'll assume they contribute nothing.

void Graph::PageRank(int power_iterations) {
    int N = pages.size();
    for (int iter = 0; iter < power_iterations; ++iter) {
        std::unordered_map<std::string, double> new_ranks;
        for (auto& page : pages)
            new_ranks[page] = 0.0;
        // For each page that has outgoing links, distribute its rank
        for (auto& page : pages) {
            if (out_degree[page] > 0) {
                double share = ranks[page] / out_degree[page];
                for (auto& neighbor : adj_list[page])
                    new_ranks[neighbor] += share;
            }
            // If out_degree == 0, do nothing (dangling node)
        }
        ranks = new_ranks;
    }
}

5. Output

Print each page in alphabetical order with its rank rounded to two decimal places.

void Graph::printRanks() {
    std::cout << std::fixed << std::setprecision(2);
    for (auto& page : pages)
        std::cout << page << " " << ranks[page] << std::endl;
}

Complexity Analysis

Understanding the computational complexity is crucial for optimization and documentation. Here's the Big O analysis:

  • addEdge(): O(1) average (hash map insertion).
  • PageRank(): O(p * (V + E)) where p = power iterations, V = number of vertices, E = number of edges. Each iteration loops through all pages and their outgoing edges.
  • printRanks(): O(V log V) due to sorting (if not already sorted). In our code, we maintain a sorted vector, so it's O(V).

Overall, the main method reads input (O(E)) and calls PageRank, so total complexity is O(E + p*(V+E)). For large graphs, this is efficient.

Real-World Connections: From Google to TikTok

PageRank's influence extends far beyond search engines. Today, social media platforms like TikTok and Instagram use similar graph-based ranking to recommend content. For example, a viral video (node) with many shares (edges) from influential creators gains higher visibility. Even AI chatbots like ChatGPT use attention mechanisms inspired by ranking algorithms. Understanding PageRank gives you insight into how information flows in networks—a skill valuable for data science, machine learning, and system design.

Testing Your Implementation

Test with small graphs. For instance, a simple 3-node cycle: A->B, B->C, C->A. After many iterations, each rank should converge to 1/3. Use the Catch2 framework to write unit tests as required by your assignment. Example test case:

TEST_CASE("Simple cycle rank") {
    Graph g;
    g.addEdge("A", "B");
    g.addEdge("B", "C");
    g.addEdge("C", "A");
    g.PageRank(100);
    // Each rank should be approximately 0.33
    CHECK(std::abs(g.getRank("A") - 1.0/3) < 0.01);
}

Common Pitfalls and Tips

  • Dangling nodes: Decide how to handle pages with no outgoing links. Some implementations redistribute their rank equally to all pages. Read your assignment spec carefully.
  • Floating-point precision: Use double and round to two decimals for output.
  • Memory: Use unordered_map for fast lookups. If URLs are long, consider hashing them.
  • Sorting: Keep a sorted list of page names to output in alphabetical order.

Conclusion

Implementing a simplified PageRank algorithm is a classic exercise that combines graph theory, data structures, and algorithmic thinking. By building this in C++ with an adjacency list, you're not only completing a course project but also gaining a deeper appreciation for the technology that powers the modern web. Whether you're preparing for a career in software engineering, data science, or AI, the concepts you learn here will serve you well. Now go ahead, code your solution, and test it thoroughly. Good luck!