CS6601 Assignment 3 Bayes Nets: James Bond Security System Tutorial

Introduction: Bayesian Networks and the MI6 Security System

In CS6601 Assignment 3, you build a Bayesian network that models an espionage mission involving James Bond, Q, M, and the villainous Spectre. Bayesian networks (Bayes nets) are a tool for reasoning under uncertainty, used in applications ranging from diagnostic systems to recommendation systems. In this tutorial, we’ll walk through designing the network for the MI6 security system using the pgmpy library. We’ll also connect the concepts to current trends such as AI-driven threat detection in cybersecurity.

Understanding the Scenario: Nodes and Relationships

The problem describes seven random variables (nodes) that capture the key events in Spectre’s plan:

H: Spectre hires professional hackers
C: Spectre buys the state-of-the-art computer “Contra”
M: Spectre hires mercenaries
B: Bond is guarding M at the time of the kidnapping
Q: Q’s database is hacked and the cipher is compromised
K: M gets kidnapped and gives away the key
D: Spectre succeeds in obtaining the “Double-0” files

These nodes are connected by causal dependencies. For example, whether Q’s database is hacked (Q) depends on both hiring hackers (H) and having Contra (C). Similarly, whether M gets kidnapped (K) depends on hiring mercenaries (M) and Bond’s presence (B). Finally, success (D) depends on both Q and K. This forms a directed acyclic graph (DAG) that we will encode using pgmpy.
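
Before touching pgmpy, the dependencies just described can be sanity-checked with the standard library alone. The sketch below is an illustration rather than part of the assignment: the PARENTS mapping is a hypothetical name, and graphlib requires Python 3.9 or later. It confirms that the graph is acyclic and lists the nodes without parents.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each node maps to the set of its parents, as described in the text.
# PARENTS is a name chosen for this sketch, not part of the assignment.
PARENTS = {
    "H": set(),
    "C": set(),
    "M": set(),
    "B": set(),
    "Q": {"H", "C"},
    "K": {"M", "B"},
    "D": {"Q", "K"},
}

# static_order() raises graphlib.CycleError if the graph contains a cycle,
# so getting a list back means the structure is a valid DAG.
order = list(TopologicalSorter(PARENTS).static_order())

# Root nodes are the ones with no parents.
roots = sorted(node for node, parents in PARENTS.items() if not parents)

print("One valid topological order:", order)
print("Root nodes:", roots)
```

Every parent appears before its children in the resulting order, which is also the order in which the variables could be sampled or eliminated.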

Step 1: Setting Up the Bayesian Network Structure

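First, import the necessary library and create an empty network. pgmpy is the allowed package for this assignment, but some of its modules, such as pgmpy.sampling, are prohibited. If the import below fails, your installed pgmpy release may use a different class name (older releases call it BayesianModel, newer ones DiscreteBayesianNetwork); use whichever name the course environment provides.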

from pgmpy.models import BayesianNetwork

# Create an instance of the network
security_net = BayesianNetwork()

# Add nodes
security_net.add_node("H")
security_net.add_node("C")
security_net.add_node("M")
security_net.add_node("B")
security_net.add_node("Q")
security_net.add_node("K")
security_net.add_node("D")

Now, add edges based on the causal relationships described in the problem. For instance, Q depends on H and C, so we add edges from H to Q and from C to Q.

# Add edges
security_net.add_edge("H", "Q")
security_net.add_edge("C", "Q")
security_net.add_edge("M", "K")
security_net.add_edge("B", "K")
security_net.add_edge("Q", "D")
security_net.add_edge("K", "D")

Note that nodes H, C, M, and B are root nodes (no parents). This structure captures the intuition that hiring hackers and buying Contra independently influence the hacking success, while mercenaries and Bond’s presence influence the kidnapping outcome.
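
Written out, this structure corresponds to the standard Bayes net factorization of the joint distribution, with one factor per node conditioned on its parents:

```latex
P(H, C, M, B, Q, K, D) =
  P(H)\, P(C)\, P(M)\, P(B)\,
  P(Q \mid H, C)\, P(K \mid M, B)\, P(D \mid Q, K)
```

With binary variables, the four root factors need one free parameter each and the three child factors need four each, for 16 parameters in total, compared with 2^7 - 1 = 127 for an unstructured joint table.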

Step 2: Defining Conditional Probability Tables (CPTs)

Each node requires a conditional probability distribution given its parents. For root nodes, we specify marginal probabilities. The problem provides these probabilities in textual form. Let’s convert them into numbers.

H: P(H=false) = 0.5 → P(H=true) = 0.5
C: P(C=true) = 0.3 → P(C=false) = 0.7
M: P(M=false) = 0.2 → P(M=true) = 0.8
B: No prior is given here. As a placeholder we assume P(B=true) = 0.5 → P(B=false) = 0.5; check your assignment details and use the stated value if one is provided.
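
The conversions above all use the complement rule, P(X=false) = 1 - P(X=true). A minimal sketch of that step follows; root_values is a hypothetical helper, and the 0.5 prior for B is an assumption, as noted. It produces one-column tables with the "false" row first, which matches the state order used throughout this tutorial.

```python
def root_values(p_true):
    """Return [[P(false)], [P(true)]]: a one-column table, 'false' row first."""
    if not 0.0 <= p_true <= 1.0:
        raise ValueError(f"probability out of range: {p_true}")
    return [[1.0 - p_true], [p_true]]


# Hypothetical container for this sketch; values taken from the list above.
priors = {
    "H": root_values(0.5),
    "C": root_values(0.3),
    "M": root_values(1 - 0.2),  # the text gives P(M=false) = 0.2
    "B": root_values(0.5),      # ASSUMPTION: no prior for B is stated here
}

for name, table in priors.items():
    print(name, table)
```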

Non-root nodes need a full conditional table, with one column per combination of parent states. Let’s define them using TabularCPD from pgmpy.

from pgmpy.factors.discrete import TabularCPD

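# Q: parents H and C
# Rows: Q=false, then Q=true. States are ordered false, true.
# Columns follow the evidence list with the last parent varying fastest:
# (H=false, C=false), (H=false, C=true), (H=true, C=false), (H=true, C=true)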
cpd_q = TabularCPD(
    variable='Q', variable_card=2,
    values=[[0.95, 0.75, 0.45, 0.1],  # Q=false
            [0.05, 0.25, 0.55, 0.9]], # Q=true
    evidence=['H', 'C'], evidence_card=[2, 2],
    state_names={'Q': ['false', 'true'],
                 'H': ['false', 'true'],
                 'C': ['false', 'true']}
)

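# K: parents M and B
# Rows: K=false, then K=true.
# Columns: (M=false, B=false), (M=false, B=true), (M=true, B=false), (M=true, B=true)
# Note: as written, P(K=true) is higher when B=true than when B=false for the
# same M, which is counterintuitive if Bond's presence should deter the
# kidnapping. Confirm the column order against the problem statement.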
cpd_k = TabularCPD(
    variable='K', variable_card=2,
    values=[[0.99, 0.85, 0.25, 0.05],  # K=false
            [0.01, 0.15, 0.75, 0.95]], # K=true
    evidence=['M', 'B'], evidence_card=[2, 2],
    state_names={'K': ['false', 'true'],
                 'M': ['false', 'true'],
                 'B': ['false', 'true']}
)

# D: parents Q and K
# Rows: D=false, then D=true.
# Columns: (Q=false, K=false), (Q=false, K=true), (Q=true, K=false), (Q=true, K=true)
cpd_d = TabularCPD(
    variable='D', variable_card=2,
    values=[[0.98, 0.35, 0.6, 0.01],  # D=false
            [0.02, 0.65, 0.4, 0.99]], # D=true
    evidence=['Q', 'K'], evidence_card=[2, 2],
    state_names={'D': ['false', 'true'],
                 'Q': ['false', 'true'],
                 'K': ['false', 'true']}
)
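
Two mistakes are easy to make in tables like these: a column that does not sum to 1, and a column that belongs to a different parent combination than intended. The following standard-library sketch checks both for the values above. The helper names are hypothetical and nothing here calls pgmpy; the column order it prints is the one with the last parent varying fastest.

```python
from itertools import product


def column_labels(evidence, states=("false", "true")):
    """Parent-state combination for each column, last parent varying fastest."""
    return [dict(zip(evidence, combo))
            for combo in product(states, repeat=len(evidence))]


def columns_sum_to_one(values, tol=1e-9):
    """True if every column of a [row_false, row_true] table sums to 1."""
    return all(abs(sum(column) - 1.0) <= tol for column in zip(*values))


# Values copied from the three tables above.
tables = {
    "Q": (["H", "C"], [[0.95, 0.75, 0.45, 0.10], [0.05, 0.25, 0.55, 0.90]]),
    "K": (["M", "B"], [[0.99, 0.85, 0.25, 0.05], [0.01, 0.15, 0.75, 0.95]]),
    "D": (["Q", "K"], [[0.98, 0.35, 0.60, 0.01], [0.02, 0.65, 0.40, 0.99]]),
}

for child, (evidence, values) in tables.items():
    assert columns_sum_to_one(values), f"{child}: a column does not sum to 1"
    for label, p_true in zip(column_labels(evidence), values[1]):
        print(f"P({child}=true | {label}) = {p_true}")
```

Reading the printed lines against the problem statement is a quick way to catch swapped columns before submitting.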

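Now add the CPDs to the network. pgmpy requires a CPD for every node, including the roots, so first define one-column tables for H, C, M, and B. The 0.5 prior for B is the assumption discussed above; replace it with the value from your assignment.

cpd_h = TabularCPD('H', 2, [[0.5], [0.5]], state_names={'H': ['false', 'true']})
cpd_c = TabularCPD('C', 2, [[0.7], [0.3]], state_names={'C': ['false', 'true']})
cpd_m = TabularCPD('M', 2, [[0.2], [0.8]], state_names={'M': ['false', 'true']})
cpd_b = TabularCPD('B', 2, [[0.5], [0.5]], state_names={'B': ['false', 'true']})  # assumed prior

security_net.add_cpds(cpd_h, cpd_c, cpd_m, cpd_b, cpd_q, cpd_k, cpd_d)

# check_model() returns True for a valid network and raises an error otherwise
print(security_net.check_model())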

Step 3: Verifying the Network

You can now perform inference using VariableElimination (allowed) to answer queries. For example, what is the probability that Spectre succeeds given that Bond is not guarding M?

from pgmpy.inference import VariableElimination

inference = VariableElimination(security_net)
prob = inference.query(variables=['D'], evidence={'B': 'false'})
print(prob)

This computes the posterior distribution of D given B=false, that is, P(D=false | B=false) and P(D=true | B=false). You can experiment with different evidence to understand the network’s behavior.
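
As a cross-check on the query above, the same posterior can be computed by brute-force enumeration over all 2^7 assignments, using only the standard library. This is a sketch, not the assignment's method: the dictionary and function names are hypothetical, the conditional probabilities are copied from the tables in this tutorial as given, and the prior on B is the assumed 0.5 (it cancels out when B is observed).

```python
from itertools import product

# P(X=true) for the root nodes. B's prior is an ASSUMPTION (see Step 2).
PRIOR_TRUE = {"H": 0.5, "C": 0.3, "M": 0.8, "B": 0.5}

# P(child=true | parents), keyed by the parents' boolean states.
Q_TRUE = {(False, False): 0.05, (False, True): 0.25,
          (True, False): 0.55, (True, True): 0.90}   # key: (H, C)
K_TRUE = {(False, False): 0.01, (False, True): 0.15,
          (True, False): 0.75, (True, True): 0.95}   # key: (M, B)
D_TRUE = {(False, False): 0.02, (False, True): 0.65,
          (True, False): 0.40, (True, True): 0.99}   # key: (Q, K)

NAMES = ["H", "C", "M", "B", "Q", "K", "D"]


def bern(p_true, value):
    """Probability of a boolean outcome given P(true)."""
    return p_true if value else 1.0 - p_true


def joint(w):
    """Joint probability of one full assignment, via the Bayes net factorization."""
    return (
        bern(PRIOR_TRUE["H"], w["H"]) * bern(PRIOR_TRUE["C"], w["C"])
        * bern(PRIOR_TRUE["M"], w["M"]) * bern(PRIOR_TRUE["B"], w["B"])
        * bern(Q_TRUE[(w["H"], w["C"])], w["Q"])
        * bern(K_TRUE[(w["M"], w["B"])], w["K"])
        * bern(D_TRUE[(w["Q"], w["K"])], w["D"])
    )


def posterior_d_true(evidence):
    """P(D=true | evidence), summing the joint over all consistent assignments."""
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(world)
        denominator += p
        if world["D"]:
            numerator += p
    return numerator / denominator


print(round(posterior_d_true({"B": False}), 4))  # prints 0.5354
```

If VariableElimination reports a noticeably different value for D=true, the most likely cause is a column-order mismatch in one of the CPDs.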

Trend Connection: Bayesian Networks in Modern AI

Bayesian networks are not just academic; they appear in deployed systems. For instance, threat-detection tools in cybersecurity can use probabilistic graphical models to infer from sensor data whether a network intrusion is under way. Recommendation systems also draw on Bayesian methods to predict user preferences, and modern generative AI is built on probabilistic models as well, though far more complex ones. By mastering Bayes nets in CS6601, you’re building a foundation for these applications.

Common Pitfalls and Tips

Edge direction: Ensure edges go from cause to effect. For example, H and C cause Q, not the reverse.
CPD values: Double-check the probabilities against the problem statement. The order of parent states matters: pgmpy lays out the columns in the order of the evidence list, with the last parent varying fastest (e.g., (H=false, C=false), (H=false, C=true), (H=true, C=false), (H=true, C=true)).
Submission limit: You only have 7 submissions on Gradescope, so test thoroughly locally. Use check_model() to validate your network.
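Prohibited modules: Avoid pgmpy.sampling, pgmpy.factor.*, and pgmpy.estimators.*. Stick to BayesianNetwork, TabularCPD, and VariableElimination. Note that TabularCPD is imported from pgmpy.factors.discrete, so confirm with the assignment instructions how the pgmpy.factor.* restriction is meant to apply.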
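
One way to sidestep the column-order pitfall is to write each probability next to the parent states it belongs to and let a small helper arrange the columns. The sketch below uses only the standard library; build_values is a hypothetical helper, not a pgmpy function. It returns rows in the [child=false, child=true] layout with the last parent varying fastest, so the result could be passed as the values argument of a CPD.

```python
from itertools import product

STATES = ("false", "true")


def build_values(evidence, p_true_by_parents):
    """Build [row_false, row_true] from P(child=true) keyed by parent states.

    Keys are tuples of state names in the same order as `evidence`.
    Columns are emitted with the last parent varying fastest.
    """
    combos = list(product(STATES, repeat=len(evidence)))
    missing = [combo for combo in combos if combo not in p_true_by_parents]
    if missing:
        raise KeyError(f"missing parent combinations: {missing}")
    true_row = [p_true_by_parents[combo] for combo in combos]
    false_row = [1.0 - p for p in true_row]
    return [false_row, true_row]


# P(Q=true | H, C), taken from the Q table in Step 2; keys are (H, C).
q_values = build_values(
    ["H", "C"],
    {
        ("true", "true"): 0.90,    # entries can be listed in any order
        ("true", "false"): 0.55,
        ("false", "true"): 0.25,
        ("false", "false"): 0.05,
    },
)
print(q_values[1])  # the Q=true row, in column order
```

Because each entry is keyed by named states, reordering the dictionary or the evidence list cannot silently shift a probability into the wrong column.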

Conclusion

You’ve now designed a Bayesian network for the MI6 security system. This tutorial covered the structure, CPTs, and inference, all within the constraints of CS6601 Assignment 3. Remember to fill in the make_security_system_net() function in submission.py with your code. Good luck, and may your network be both accurate and efficient!