Python Game Theory Simulation: Building and Testing Iterated Prisoner's Dilemma Strategies
Learn to implement and evaluate classic game theory strategies in Python using the Prisoner's Dilemma. This tutorial covers alwaysCooperate, alwaysDefect, probeAndLock, continuousProbe, and more, with simulation code and performance analysis.
Introduction: Why Game Theory Matters in 2026
Game theory isn't just for economists—it's everywhere. In 2026, AI agents negotiate trades in decentralized markets, esports teams decide whether to share resources or go solo, and even your favorite apps use cooperative strategies to recommend content. Understanding how to simulate and evaluate strategies in games like the Prisoner's Dilemma gives you a superpower: predicting behavior in competitive environments. In this tutorial, you'll build a Python simulation that pits classic strategies against each other, analyze their performance, and design your own winning approach.
Understanding the Prisoner's Dilemma
The classic game: two players choose to cooperate or defect. Payoffs:
- Both cooperate: +3 each
- One cooperates, one defects: defector gets +5, cooperator gets 0
- Both defect: +1 each
In a single round, defection is dominant. But in an iterated game, cooperation can emerge. This is the core of your assignment.
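The payoffs above can be written down as a small lookup table. This is a minimal sketch: it assumes the 1 = cooperate, 0 = defect encoding that the strategies in this tutorial use, and the names PAYOFFS and payoff are illustrative rather than part of the assignment.

```python
# Payoff table keyed by (my_move, opponent_move); 1 = cooperate, 0 = defect.
# PAYOFFS and payoff are illustrative names, not required by the assignment.
PAYOFFS = {
    (1, 1): 3,  # both cooperate
    (1, 0): 0,  # I cooperate, opponent defects
    (0, 1): 5,  # I defect, opponent cooperates
    (0, 0): 1,  # both defect
}


def payoff(my_move, opp_move):
    """Return the coins earned by the player who chose my_move."""
    return PAYOFFS[(my_move, opp_move)]


# Whatever the opponent does, defecting pays more in a single round.
for opp_move in (0, 1):
    assert payoff(0, opp_move) > payoff(1, opp_move)
```

The final loop checks the dominance claim directly: 5 beats 3 when the opponent cooperates, and 1 beats 0 when the opponent defects.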
Setting Up the Simulation Framework
You'll write a Python program that accepts command-line arguments: num_of_iterations (default 2000) and num_of_strategies (default 8). The program simulates round-robin matches between strategies, recording total coins earned.
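One way to read the two arguments is sketched here. It assumes they are positional, with num_of_iterations first and num_of_strategies second; the helper name parse_args and the file name simulate.py are hypothetical.

```python
import sys


def parse_args(argv):
    """Read num_of_iterations and num_of_strategies, falling back to the defaults.

    Assumes positional arguments in that order (not confirmed by the assignment).
    """
    num_of_iterations = int(argv[1]) if len(argv) > 1 else 2000
    num_of_strategies = int(argv[2]) if len(argv) > 2 else 8
    return num_of_iterations, num_of_strategies


# "python simulate.py 4 2" would give sys.argv == ["simulate.py", "4", "2"].
print(parse_args(["simulate.py", "4", "2"]))  # (4, 2)
print(parse_args(["simulate.py"]))            # (2000, 8)
# In the real program, call parse_args(sys.argv) instead.
```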
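import sys

# Payoffs keyed by (my move, opponent's move); 1 = cooperate, 0 = defect
PAYOFFS = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}

def simulate_match(strategy1, strategy2, rounds):
    my_history = []   # moves played by strategy1
    opp_history = []  # moves played by strategy2
    total1 = 0
    total2 = 0
    for r in range(rounds):
        # each strategy sees its own history first, then its opponent's
        move1 = strategy1(my_history, opp_history)
        move2 = strategy2(opp_history, my_history)
        # update histories and scores
        my_history.append(move1)
        opp_history.append(move2)
        total1 += PAYOFFS[(move1, move2)]
        total2 += PAYOFFS[(move2, move1)]
    return total1, total2
Implementing the Predefined Strategies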
alwaysCooperate and alwaysDefect
These are your baselines. alwaysCooperate returns 1 (cooperate) every round. alwaysDefect returns 0 (defect). Simple but essential for comparison.
def strategy_alwaysCooperate(myHistory, oppHistory):
    return 1

def strategy_alwaysDefect(myHistory, oppHistory):
    return 0
probeAndLock – The Scout
This strategy defects for the first 20 rounds, cooperates for the next 20, then locks onto whichever action yielded higher total reward. It's like a player testing the waters before committing.
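probeAndLock relies on a rangeReward helper to total the reward earned over a span of rounds. The version here is a sketch under two assumptions: the end index is exclusive, and the reward is the payoff of the player whose history is passed first. Check both against your assignment's specification.

```python
# Payoffs keyed by (my move, opponent's move); 1 = cooperate, 0 = defect.
PAYOFFS = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}


def rangeReward(start, end, myHistory, oppHistory):
    """Total coins earned in rounds start..end-1 (exclusive end is an assumption)."""
    return sum(
        PAYOFFS[(mine, theirs)]
        for mine, theirs in zip(myHistory[start:end], oppHistory[start:end])
    )


# Worked example: the 40 probing rounds against an opponent that always cooperates.
my_moves = [0] * 20 + [1] * 20
opp_moves = [1] * 40
print(rangeReward(0, 20, my_moves, opp_moves))   # 100 (20 rounds x 5)
print(rangeReward(20, 40, my_moves, opp_moves))  # 60  (20 rounds x 3)
```

Against that opponent the defection phase earns more, so the strategy would lock onto defecting from round 41 onward.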
def strategy_probeAndLock(myHistory, oppHistory):
    if len(myHistory) < 20:
        return 0  # probe phase 1: defect
    elif len(myHistory) < 40:
        return 1  # probe phase 2: cooperate
    else:
        # rangeReward is a helper you write yourself (see Tips for Success)
        reward1 = rangeReward(0, 20, myHistory, oppHistory)
        reward2 = rangeReward(20, 40, myHistory, oppHistory)
        return 0 if reward1 > reward2 else 1
continuousProbe – The Adaptive Learner
After the first two rounds (defect then cooperate), it calculates average reward for each action and picks the one with higher average. This strategy learns on the fly, similar to how AI models update based on feedback.
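The averaging step needs the reward earned in each individual round. A per-round helper could look like the sketch below; the name reward_for_round and its argument order are assumptions made for illustration.

```python
# Payoffs keyed by (my move, opponent's move); 1 = cooperate, 0 = defect.
PAYOFFS = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}


def reward_for_round(i, myHistory, oppHistory):
    """Coins this player earned in round i (hypothetical helper)."""
    return PAYOFFS[(myHistory[i], oppHistory[i])]


# Trace of the two probing rounds against an opponent that always defects.
my_moves = [0, 1]   # defect, then cooperate
opp_moves = [0, 0]
avg_defect = reward_for_round(0, my_moves, opp_moves)  # both defected
avg_coop = reward_for_round(1, my_moves, opp_moves)    # cooperated and was exploited
print(avg_defect, avg_coop)  # 1 0
```

With one sample per action the averages are just the two rewards, so in round three the strategy defects, since 1 is higher than 0.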
def strategy_continuousProbe(myHistory, oppHistory):
    if len(myHistory) == 0:
        return 0  # round 1: defect
    if len(myHistory) == 1:
        return 1  # round 2: cooperate
    # compute average rewards for defect and cooperate
    defect_rewards = []
    coop_rewards = []
    for i in range(len(myHistory)):
        # reward for that round based on both moves
        # (reward_for_round is a helper you need to define)
        if myHistory[i] == 0:
            defect_rewards.append(reward_for_round(i, myHistory, oppHistory))
        else:
            coop_rewards.append(reward_for_round(i, myHistory, oppHistory))
    avg_defect = sum(defect_rewards) / len(defect_rewards) if defect_rewards else 0
    avg_coop = sum(coop_rewards) / len(coop_rewards) if coop_rewards else 0
    # ties go to defection
    return 0 if avg_defect >= avg_coop else 1
defectUntilCooperate – The Grudge Holder
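Defects until the opponent cooperates once, then cooperates forever. This is a classic 'forgiving' strategy, but it never retaliates once it has switched, so an opponent that cooperates a single time and then defects can exploit it for the rest of the match.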
def strategy_defectUntilCooperate(myHistory, oppHistory):
    if 1 in oppHistory:
        return 1
    else:
        return 0
opponentCooperatePercentage – The Threshold Strategist
Three variants: thresholds at 10%, 50%, and 90%. If opponent cooperation rate exceeds threshold, cooperate; else defect. This strategy is like a social media algorithm that changes behavior based on user engagement.
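Because the three variants differ only in the threshold, they could all come from one factory function. This is a sketch: make_threshold_strategy is a hypothetical helper, the 50% and 90% names are extrapolated from the 10% variant's name, and defecting on the first move mirrors the 10% variant in this section.

```python
def make_threshold_strategy(threshold):
    """Build a strategy that cooperates once the opponent's cooperation rate exceeds threshold."""
    def strategy(myHistory, oppHistory):
        if len(oppHistory) == 0:
            return 0  # no information yet: defect
        coop_rate = sum(oppHistory) / len(oppHistory)
        return 1 if coop_rate > threshold else 0
    return strategy


strategy_opponentCooperate10Percentage = make_threshold_strategy(0.1)
strategy_opponentCooperate50Percentage = make_threshold_strategy(0.5)
strategy_opponentCooperate90Percentage = make_threshold_strategy(0.9)

# The opponent cooperated in 1 of 4 rounds (25%).
opp_moves = [1, 0, 0, 0]
print(strategy_opponentCooperate10Percentage([0, 1, 1, 1], opp_moves))  # 1
print(strategy_opponentCooperate50Percentage([0, 0, 0, 0], opp_moves))  # 0
```

Writing the three functions out by hand works equally well; the factory only removes the duplication.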
def strategy_opponentCooperate10Percentage(myHistory, oppHistory):
    if len(oppHistory) == 0:
        return 0
    coop_rate = sum(oppHistory) / len(oppHistory)
    return 1 if coop_rate > 0.1 else 0
# similarly for 50% and 90%
random50 – The Wildcard
Randomly chooses cooperate or defect with 50% probability. Unpredictable, but rarely optimal.
import random

def strategy_random50(myHistory, oppHistory):
    return random.choice([0, 1])
Designing Your Own Strategy
Now for the creative part. To beat the predefined strategies, consider a hybrid: start with cooperation to build trust, but if the opponent defects frequently, switch to defection. For instance, a 'tit-for-tat' variant that forgives occasional defections but retaliates after repeated betrayals. In 2026's AI-driven world, strategies that balance cooperation with self-defense often win—just like in multiplayer online games where alliances shift.
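One concrete form of the forgiving tit-for-tat idea is the variant often called tit-for-two-tats: cooperate by default and retaliate only after two consecutive defections. The sketch below is one possible starting point, not a strategy tuned for this tournament; the function name is made up for illustration.

```python
def strategy_forgivingTitForTat(myHistory, oppHistory):
    """Cooperate unless the opponent defected in both of the last two rounds."""
    if len(oppHistory) >= 2 and oppHistory[-1] == 0 and oppHistory[-2] == 0:
        return 0  # repeated betrayal: retaliate
    return 1      # otherwise cooperate, forgiving a single defection


print(strategy_forgivingTitForTat([], []))          # 1: opens with cooperation
print(strategy_forgivingTitForTat([1, 1], [1, 0]))  # 1: one defection is forgiven
print(strategy_forgivingTitForTat([1, 1], [0, 0]))  # 0: two in a row trigger retaliation
```

Because it only looks at the last two moves, it returns to cooperation as soon as the opponent cooperates again.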
def strategy_myCustom(myHistory, oppHistory):
    # If opponent defected more than 30% of the time, defect
    if len(oppHistory) > 0 and sum(oppHistory) / len(oppHistory) < 0.7:
        return 0
    # Otherwise, cooperate
    return 1
Running the Simulation and Analyzing Results
Your main function loops over all strategy pairs, calls simulate_match, and accumulates totals. With default 2000 rounds, you'll see patterns like continuousProbe often leading, while alwaysDefect does well against naive cooperators but poorly against adaptive strategies.
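The round-robin loop described above might be sketched as follows. It assumes each pair of distinct strategies meets exactly once and that no strategy plays itself, which is consistent with the two-strategy, 4-round sample output in this tutorial; run_tournament is a hypothetical name.

```python
from itertools import combinations

# Payoffs keyed by (my move, opponent's move); 1 = cooperate, 0 = defect.
PAYOFFS = {(1, 1): 3, (1, 0): 0, (0, 1): 5, (0, 0): 1}


def simulate_match(strategy1, strategy2, rounds):
    """Play two strategies against each other and return both totals."""
    history1, history2 = [], []
    total1 = total2 = 0
    for _ in range(rounds):
        move1 = strategy1(history1, history2)
        move2 = strategy2(history2, history1)
        history1.append(move1)
        history2.append(move2)
        total1 += PAYOFFS[(move1, move2)]
        total2 += PAYOFFS[(move2, move1)]
    return total1, total2


def strategy_alwaysCooperate(myHistory, oppHistory):
    return 1


def strategy_alwaysDefect(myHistory, oppHistory):
    return 0


def run_tournament(strategies, rounds):
    """Play every distinct pair once and accumulate each strategy's coins."""
    totals = {name: 0 for name in strategies}
    for (name1, s1), (name2, s2) in combinations(strategies.items(), 2):
        score1, score2 = simulate_match(s1, s2, rounds)
        totals[name1] += score1
        totals[name2] += score2
    return totals


strategies = {
    "alwaysCooperate": strategy_alwaysCooperate,
    "alwaysDefect": strategy_alwaysDefect,
}
for name, total in run_tournament(strategies, 4).items():
    print(f"{name}: {total}")
# alwaysCooperate: 0
# alwaysDefect: 20
```

If your assignment also requires each strategy to play a copy of itself, swap combinations for itertools.combinations_with_replacement and take care to credit a self-match only once.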
Example output for 4 rounds, 2 strategies:
alwaysCooperate: 0
alwaysDefect: 20
With more rounds, cooperation becomes viable. At 2000 rounds, continuousProbe might score ~30096, while alwaysDefect gets ~22084. The key insight: adaptive strategies outperform fixed ones in the long run.
Tips for Success
- Implement rangeReward carefully; it's used by probeAndLock.
- Test with small iteration counts (4, 5) to verify your logic matches sample outputs.
- Use Python's sys.argv to parse command-line arguments.
- Keep your custom strategy simple but clever; often a small twist on tit-for-tat works wonders.
Conclusion
By building this simulation, you've not only completed your assignment but also gained insight into how cooperation evolves in competitive systems. Whether you're analyzing esports team dynamics, AI negotiation tactics, or even financial markets, the same principles apply. Now go ahead—implement your strategies, run the numbers, and see which one reigns supreme.