Python Game Theory Simulation: Iterated Prisoner's Dilemma Strategies Tutorial

Introduction: Why Game Theory Matters in 2026

Game theory isn't just for economists; it applies wherever agents with competing interests interact. In 2026, AI agents negotiate trades in decentralized markets, esports teams decide whether to share resources or go solo, and even your favorite apps use cooperative strategies to recommend content. Learning to simulate and evaluate strategies in games like the Prisoner's Dilemma helps you anticipate behavior in competitive environments. In this tutorial, you'll build a Python simulation that pits classic strategies against each other, analyze their performance, and design your own strategy to compete with them.

Understanding the Prisoner's Dilemma

The classic game: two players choose to cooperate or defect. Payoffs:

Both cooperate: +3 each
One cooperates, one defects: defector gets +5, cooperator gets 0
Both defect: +1 each

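In a single round, defection is dominant: whatever the other player does, defecting pays more (5 instead of 3 against a cooperator, 1 instead of 0 against a defector). But in an iterated game, cooperation can emerge, because mutual cooperation (3 each per round) beats mutual defection (1 each per round) over time. This is the core of your assignment.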
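
The payoff rules above can be written as a small lookup table. The sketch below is illustrative only: the names PAYOFFS and payoff are assumptions rather than part of the assignment, and moves use the 1 = cooperate, 0 = defect encoding that the strategy functions in this tutorial return.

```python
# Hypothetical helper names; moves are encoded as 1 = cooperate, 0 = defect.
# Each entry maps (move of player 1, move of player 2) to (coins for 1, coins for 2).
PAYOFFS = {
    (1, 1): (3, 3),  # both cooperate
    (1, 0): (0, 5),  # player 1 cooperates, player 2 defects
    (0, 1): (5, 0),  # player 1 defects, player 2 cooperates
    (0, 0): (1, 1),  # both defect
}

def payoff(my_move, opp_move):
    """Return the coins earned by the player who played my_move."""
    return PAYOFFS[(my_move, opp_move)][0]

# Single round: defecting earns strictly more against either opponent move.
for opp_move in (0, 1):
    assert payoff(0, opp_move) > payoff(1, opp_move)

# Repeated play: mutual cooperation accumulates more than mutual defection.
rounds = 10
mutual_coop = rounds * payoff(1, 1)
mutual_defect = rounds * payoff(0, 0)
print(mutual_coop, mutual_defect)  # 30 10
```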

Setting Up the Simulation Framework

You'll write a Python program that accepts command-line arguments: num_of_iterations (default 2000) and num_of_strategies (default 8). The program simulates round-robin matches between strategies, recording total coins earned.

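import sys

# Payoffs as (player 1, player 2), keyed by (move1, move2); 1 = cooperate, 0 = defect
PAYOFFS = {(1, 1): (3, 3), (1, 0): (0, 5), (0, 1): (5, 0), (0, 0): (1, 1)}

def simulate_match(strategy1, strategy2, rounds):
    history1 = []  # moves played so far by strategy1
    history2 = []  # moves played so far by strategy2
    total1 = 0
    total2 = 0
    for r in range(rounds):
        # each strategy receives its own history first, then its opponent's
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        payoff1, payoff2 = PAYOFFS[(move1, move2)]
        total1 += payoff1
        total2 += payoff2
        # record the moves only after both players have chosen
        history1.append(move1)
        history2.append(move2)
    return total1, total2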

Implementing the Predefined Strategies

alwaysCooperate and alwaysDefect

These are your baselines. alwaysCooperate returns 1 (cooperate) every round. alwaysDefect returns 0 (defect). Simple but essential for comparison.

def strategy_alwaysCooperate(myHistory, oppHistory):
    return 1

def strategy_alwaysDefect(myHistory, oppHistory):
    return 0

probeAndLock – The Scout

This strategy defects for the first 20 rounds, cooperates for the next 20, then locks onto whichever action yielded higher total reward. It's like a player testing the waters before committing.

def strategy_probeAndLock(myHistory, oppHistory):
    if len(myHistory) < 20:
        return 0
    elif len(myHistory) < 40:
        return 1
    else:
        reward1 = rangeReward(0, 20, myHistory, oppHistory)
        reward2 = rangeReward(20, 40, myHistory, oppHistory)
        return 0 if reward1 > reward2 else 1
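
probeAndLock calls rangeReward, which this tutorial leaves for you to write. One possible sketch follows, assuming the helper should return the coins this player earned in rounds start through end - 1 and that moves are encoded as 1 = cooperate, 0 = defect. The MY_PAYOFF table is a hypothetical name introduced here for illustration.

```python
# Assumed lookup: (my move, opponent's move) -> coins I earn; 1 = cooperate, 0 = defect.
MY_PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}

def rangeReward(start, end, myHistory, oppHistory):
    """Total coins this player earned in rounds start..end-1 (end is exclusive)."""
    return sum(
        MY_PAYOFF[(mine, theirs)]
        for mine, theirs in zip(myHistory[start:end], oppHistory[start:end])
    )

# Example: 20 defections, then 20 cooperations, against an opponent who always cooperates.
my_moves = [0] * 20 + [1] * 20
opp_moves = [1] * 40
print(rangeReward(0, 20, my_moves, opp_moves))   # 100
print(rangeReward(20, 40, my_moves, opp_moves))  # 60
```

In this example the probing phase earns more from defecting (100) than from cooperating (60), so probeAndLock would lock onto defection against this opponent.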

continuousProbe – The Adaptive Learner

After the first two rounds (defect then cooperate), it calculates average reward for each action and picks the one with higher average. This strategy learns on the fly, similar to how AI models update based on feedback.

def strategy_continuousProbe(myHistory, oppHistory):
    if len(myHistory) == 0:
        return 0
    if len(myHistory) == 1:
        return 1
    # compute average rewards for defect and cooperate
    defect_rewards = []
    coop_rewards = []
    for i in range(len(myHistory)):
        # reward for that round based on both moves
        if myHistory[i] == 0:
            defect_rewards.append(reward_for_round(i, myHistory, oppHistory))
        else:
            coop_rewards.append(reward_for_round(i, myHistory, oppHistory))
    avg_defect = sum(defect_rewards)/len(defect_rewards) if defect_rewards else 0
    avg_coop = sum(coop_rewards)/len(coop_rewards) if coop_rewards else 0
    return 0 if avg_defect >= avg_coop else 1
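
The listing above relies on a reward_for_round helper that is not defined in this tutorial. A minimal sketch is shown below, assuming it returns the coins this player earned in round i. The MY_PAYOFF table is a hypothetical name, and the short worked example shows how the two averages are formed.

```python
# Assumed lookup: (my move, opponent's move) -> coins I earn; 1 = cooperate, 0 = defect.
MY_PAYOFF = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}

def reward_for_round(i, myHistory, oppHistory):
    """Coins this player earned in round i, looked up from both moves."""
    return MY_PAYOFF[(myHistory[i], oppHistory[i])]

# Worked example: I played defect, cooperate, defect;
# the opponent played cooperate, cooperate, defect.
my_moves = [0, 1, 0]
opp_moves = [1, 1, 0]
rewards = [reward_for_round(i, my_moves, opp_moves) for i in range(3)]
print(rewards)  # [5, 3, 1]

avg_defect = (rewards[0] + rewards[2]) / 2  # rounds where I defected
avg_coop = rewards[1] / 1                   # rounds where I cooperated
print(avg_defect, avg_coop)  # 3.0 3.0
```

Both averages come out equal here, and because the strategy compares with >=, it would break the tie in favor of defecting.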

defectUntilCooperate – The Grudge Holder

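Defects until the opponent cooperates once, then cooperates forever. Despite the nickname, this strategy never retaliates after it has switched: a single cooperative move from the opponent earns permanent cooperation. That makes it very forgiving, and an opponent that cooperates once and then defects can exploit it for the rest of the match.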

def strategy_defectUntilCooperate(myHistory, oppHistory):
    if 1 in oppHistory:
        return 1
    else:
        return 0

opponentCooperatePercentage – The Threshold Strategist

Three variants: thresholds at 10%, 50%, and 90%. If the opponent's cooperation rate so far exceeds the threshold, cooperate; otherwise defect. With no history in the first round, the strategy defects. This strategy is like a social media algorithm that changes behavior based on user engagement.

def strategy_opponentCooperate10Percentage(myHistory, oppHistory):
    if len(oppHistory) == 0:
        return 0
    coop_rate = sum(oppHistory) / len(oppHistory)
    return 1 if coop_rate > 0.1 else 0

# similarly for 50% and 90%
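
Rather than copying the function for each threshold, one option is a small factory that builds all three variants. This is a sketch under assumptions: make_threshold_strategy is a hypothetical helper, and the 50% and 90% function names simply follow the naming pattern of the 10% version.

```python
def make_threshold_strategy(threshold):
    """Build a strategy that cooperates when the opponent's cooperation rate exceeds threshold."""
    def strategy(myHistory, oppHistory):
        if len(oppHistory) == 0:
            return 0  # no information yet, so defect, as in the 10% version
        coop_rate = sum(oppHistory) / len(oppHistory)
        return 1 if coop_rate > threshold else 0
    return strategy

# Assumed names, mirroring strategy_opponentCooperate10Percentage.
strategy_opponentCooperate50Percentage = make_threshold_strategy(0.5)
strategy_opponentCooperate90Percentage = make_threshold_strategy(0.9)

# A rate of exactly 50% does not exceed the 50% threshold, so the strategy defects.
print(strategy_opponentCooperate50Percentage([0, 0], [1, 0]))  # 0
# An opponent who always cooperated clears the 90% threshold.
print(strategy_opponentCooperate90Percentage([0] * 10, [1] * 10))  # 1
```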

random50 – The Wildcard

Randomly chooses cooperate or defect with 50% probability. Unpredictable, but rarely optimal.

import random

def strategy_random50(myHistory, oppHistory):
    return random.choice([0, 1])

Designing Your Own Strategy

Now for the creative part. To beat the predefined strategies, consider a hybrid: start with cooperation to build trust, but switch to defection if the opponent defects frequently. One option is a tit-for-tat variant that forgives occasional defections but retaliates after repeated betrayals. In 2026's AI-driven world, strategies that balance cooperation with self-defense often win, much as in multiplayer online games where alliances shift. The starting point below uses a simple threshold: cooperate by default, and defect once the opponent has defected in more than 30% of past rounds.

def strategy_myCustom(myHistory, oppHistory):
    # If opponent defected more than 30% of the time, defect
    if len(oppHistory) > 0 and sum(oppHistory)/len(oppHistory) < 0.7:
        return 0
    # Otherwise, cooperate
    return 1
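
The forgiving tit-for-tat variant mentioned above could be sketched as follows. This version is commonly known as tit-for-two-tats: it cooperates by default and retaliates only when the opponent defected in both of the two most recent rounds. The function name is an assumption, and this is one possible design rather than a recommended entry.

```python
def strategy_titForTwoTats(myHistory, oppHistory):
    """Cooperate unless the opponent defected in each of the last two rounds."""
    if len(oppHistory) >= 2 and oppHistory[-1] == 0 and oppHistory[-2] == 0:
        return 0  # two defections in a row: retaliate
    return 1      # otherwise cooperate, forgiving an isolated defection

print(strategy_titForTwoTats([], []))                # 1, opens with cooperation
print(strategy_titForTwoTats([1, 1], [1, 0]))        # 1, a single defection is forgiven
print(strategy_titForTwoTats([1, 1, 1], [1, 0, 0]))  # 0, repeated defection is punished
print(strategy_titForTwoTats([1, 1, 1, 0], [1, 0, 0, 1]))  # 1, returns to cooperation
```

The trade-off is that an opponent who alternates defect and cooperate is never punished, so it is worth testing any variant against the full strategy pool before relying on it.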

Running the Simulation and Analyzing Results

Your main function loops over all strategy pairs, calls simulate_match for each pair, and adds each player's score to that strategy's running total. With the default 2000 rounds, you may see patterns such as continuousProbe leading, while alwaysDefect does well against naive cooperators but poorly against adaptive strategies.

Example output for 4 rounds and 2 strategies (alwaysDefect earns 5 coins in each of the 4 rounds, alwaysCooperate earns nothing):

alwaysCooperate: 0
alwaysDefect: 20
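
A round-robin loop that produces the totals above might look like the sketch below. It assumes each pair of distinct strategies meets exactly once and that no strategy plays against itself, which is consistent with the sample output. The names run_tournament and PAYOFFS are hypothetical, and a compact simulate_match is repeated so the example runs on its own.

```python
from itertools import combinations

# (move1, move2) -> (coins for player 1, coins for player 2); 1 = cooperate, 0 = defect
PAYOFFS = {(1, 1): (3, 3), (1, 0): (0, 5), (0, 1): (5, 0), (0, 0): (1, 1)}

def strategy_alwaysCooperate(myHistory, oppHistory):
    return 1

def strategy_alwaysDefect(myHistory, oppHistory):
    return 0

def simulate_match(strategy1, strategy2, rounds):
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        payoff1, payoff2 = PAYOFFS[(move1, move2)]
        total1 += payoff1
        total2 += payoff2
        history1.append(move1)
        history2.append(move2)
    return total1, total2

def run_tournament(strategies, rounds):
    """Play every pair of distinct strategies once and sum each strategy's coins."""
    totals = {name: 0 for name in strategies}
    for (name1, s1), (name2, s2) in combinations(strategies.items(), 2):
        score1, score2 = simulate_match(s1, s2, rounds)
        totals[name1] += score1
        totals[name2] += score2
    return totals

strategies = {
    "alwaysCooperate": strategy_alwaysCooperate,
    "alwaysDefect": strategy_alwaysDefect,
}
totals = run_tournament(strategies, rounds=4)
for name, total in totals.items():
    print(f"{name}: {total}")
# alwaysCooperate: 0
# alwaysDefect: 20
```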

With more rounds, cooperation becomes viable. At 2000 rounds, continuousProbe might score ~30096, while alwaysDefect gets ~22084. The key insight: over long matches, adaptive strategies tend to outperform fixed ones.

Tips for Success

Implement rangeReward carefully, since probeAndLock depends on it.
Test with small iteration counts (4, 5) to verify your logic matches sample outputs.
Use Python's sys.argv to parse command-line arguments.
Keep your custom strategy simple but clever; often a small twist on tit-for-tat works wonders.
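
For the sys.argv tip, a minimal parsing sketch is shown below. It assumes the two values are passed positionally, iterations first and strategy count second; the assignment does not state the order here, so check your own specification. parse_args and the script name pd_sim.py are hypothetical.

```python
def parse_args(argv):
    """Return (num_of_iterations, num_of_strategies) from a list shaped like sys.argv."""
    # argv[0] is the script name; missing arguments fall back to the defaults.
    num_of_iterations = int(argv[1]) if len(argv) > 1 else 2000
    num_of_strategies = int(argv[2]) if len(argv) > 2 else 8
    return num_of_iterations, num_of_strategies

# Simulated command lines; in the real program, import sys and call parse_args(sys.argv).
print(parse_args(["pd_sim.py"]))            # (2000, 8)
print(parse_args(["pd_sim.py", "4"]))       # (4, 8)
print(parse_args(["pd_sim.py", "4", "2"]))  # (4, 2)
```

Taking the argument list as a parameter, instead of reading sys.argv inside the function, makes the small-iteration checks from the tips easy to run without a shell.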

Conclusion

By building this simulation, you've not only completed your assignment but also gained insight into how cooperation evolves in competitive systems. Whether you're analyzing esports team dynamics, AI negotiation tactics, or even financial markets, the same principles apply. Now go ahead: implement your strategies, run the numbers, and see which one comes out on top.