Trust and Distributed Systems: Foundations for Blockchain

Section I: Introduction

Army Cyber Institute

April 9, 2026

Opening Exercise: The Insider

Scenario: You are a group that shares a sensitive secret. However, an “Insider” has infiltrated your ranks. You must identify them before they figure out the secret.

The Rules:

  1. The Secret: 6 of you will receive the exact same Secret Word. 1 of you will receive a blank card that says: You are the Insider.
  2. The Broadcast: Go around the circle. Each person must say exactly one word related to the Secret Word to prove to the group that they are in on it.
    • Strategy: Be specific enough to prove you know the secret, but vague enough that the Insider can’t guess it!
  3. Consensus: After everyone has spoken, count to three and point at the person you believe is the Insider.
  4. Resolution: If the group votes out the Insider, the group is safe! If the group votes out an honest member, or if the Insider successfully guesses the Secret Word, the Insider wins!

Exercise (Digital Version)

blockchain-6dd63ebf.training-as-code.com/exercises/insider/

Trust

  • Two‑party trust (A ↔︎ B): At human scale, trust is an expectation that the other party will behave as promised; in security engineering, this becomes an assumption the system relies on about an entity’s behavior under stated conditions.
  • Introducing a third party (A ↔︎ TTP ↔︎ B): A trusted third party (e.g., broker, auditor, or certification authority) vouches for identity/attributes so A and B don’t need direct verification.
  • Trade‑offs: Third parties can reduce bilateral friction but create transitive trust and potential single points of failure; the path‑validation algorithm and trust‑anchor model formalize how that trust is established and verified.

What is “trust” in digital systems?

  • Trust is the set of explicit assumptions and dependencies about an entity that a system’s security properties rely on.

  • Where trust “lives” in practice:

    • Centralized Trust: Concentrated in one or more third parties (e.g., a bank or Certificate Authority). The system’s security relies entirely on the trusted entities behaving correctly. If compromised, the entire system fails.
    • Distributed Trust: Spread across multiple participants and mechanisms. Instead of trusting one entity, trust is relocated to cryptography, consensus algorithms, and economic incentives, assuming that a majority of the network will act honestly.

Centralized vs. Distributed Architectures

  • Centralized Architecture
    • Control: A single authority dictates system state and coordinates all work.
    • Complexity: Simpler to design, update, monitor, and maintain.
    • Trust Model: Users must place implicit, complete trust in the central hub.
    • Vulnerability: Creates a single point of failure (both technical and administrative) that can compromise the entire service.
  • Distributed Architecture
    • Control: Shared responsibility among many peer nodes without a central hub.
    • Complexity: Highly complex to coordinate, synchronize state, and test under load.
    • Trust Model: Trust is shifted from a central hub to cryptographic proofs and consensus rules.
    • Vulnerability: Highly resilient to individual node failures, but susceptible to coordination breakdowns or network partitions.

Everyday Examples of Distributed Systems

  • Helium (IoT Network):
    • A decentralized wireless network built by individuals deploying hotspots in their homes to provide connectivity over unlicensed radio frequencies (e.g., 915 MHz in the US).
  • InterPlanetary File System (IPFS):
    • A peer-to-peer network for storing and sharing data. Files are found by their cryptographic hash rather than a centralized server location.
  • Hivemapper:
    • A decentralized, community-owned mapping network where contributors use dashcams to collect imagery and update a global map.
  • The Fediverse (e.g., Mastodon):
    • An ensemble of federated social networks where independent servers communicate with each other, eliminating reliance on a single corporate platform.

Why Suspicion Matters

  • The Reality of Distributed Trust: Unlike centralized systems with a single trusted referee, distributed systems span multiple participants where perfect trust cannot be guaranteed.
  • Accident vs. Adversary: A node might fail due to a benign hardware crash (accidental), or it might actively lie and attempt to subvert the network (malicious).
  • Mutual Suspicion: The core design principle that every participant must operate as if others may be faulty or hostile. The system must be engineered to coordinate and reach consensus despite this distrust.

Suspicion In Practice

  • Hardware & Network Glitches: A faulty router might arbitrarily drop or misroute packets, creating inconsistent views of the network’s state.
  • Data Corruption: A degraded database replica might unknowingly serve different, conflicting answers to different clients querying the same record.
  • Malicious Insiders: A compromised node in a coalition system might deliberately alter intelligence data before forwarding it to allied peers.
  • The Takeaway: Mutual suspicion requires engineering systems that survive both benign accidents and deliberate attacks, often without needing to differentiate between the two.

[Figure: a mesh network of honest nodes n1–n7, with a known-bad node (bad1, “!”) and a suspect node (bad2, “?”) wired into the same mesh.]

Crash Faults

  • The Failure: A node suddenly and completely stops responding, going “dark” without sending any further messages.
  • Common Causes: Benign hardware failures (like a power supply death), software panics, or deliberate denial-of-service attacks.
  • Detection & Recovery: Relatively easy to detect because silence is a clear indicator of failure. Systems recover using simple redundancy, such as failing over to a hot standby or routing traffic to another replica.
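The detect-and-failover pattern can be sketched in a few lines: declare the primary dead after a period of heartbeat silence and route traffic to the standby. The class, node names, and 3-second timeout below are illustrative, not taken from any particular system.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before we presume a crash

class ReplicaSet:
    """Toy primary/standby pair with heartbeat-based crash detection."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.last_heartbeat = time.monotonic()

    def record_heartbeat(self):
        """Called whenever the primary sends a heartbeat message."""
        self.last_heartbeat = time.monotonic()

    def active_node(self, now=None):
        """Route to the standby once the primary has been silent too long."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            return self.standby  # primary presumed crashed: fail over
        return self.primary

rs = ReplicaSet("primary", "standby")
print(rs.active_node())                              # -> "primary"
print(rs.active_node(now=rs.last_heartbeat + 10.0))  # -> "standby"
```

Silence is the only signal here, which is exactly why crash faults are the easy case: no cross-checking of message *contents* is required.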

Omission Faults

  • The Failure: A node fails to send or receive some messages, but otherwise remains active and appears functional to the rest of the network.
  • Common Causes: Network congestion, dropped packets, flaky hardware connections, or an overloaded server quietly discarding requests.
  • Detection & Recovery: Harder to diagnose than a complete crash because the failures are intermittent. Defended against using robust communication protocols that require acknowledgments and automatic retries (e.g., TCP).
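The acknowledge-and-retry idea can be sketched as follows; the lossy channel and its 30% drop rate are simulated stand-ins for a real network, and the retry budget is an arbitrary choice.

```python
import random

random.seed(42)  # deterministic demo

def lossy_send(msg, drop_rate=0.3):
    """Simulated channel: returns an ACK only if the message survives."""
    if random.random() < drop_rate:
        return None          # omission fault: message silently dropped
    return ("ACK", msg)

def reliable_send(msg, max_retries=10):
    """Retransmit until acknowledged; give up after max_retries attempts."""
    for attempt in range(1, max_retries + 1):
        ack = lossy_send(msg)
        if ack is not None:
            return attempt   # number of tries the delivery took
    raise TimeoutError(f"no ACK for {msg!r} after {max_retries} tries")

print(reliable_send("hello"))
```

This is the same loop TCP runs for you under the hood: the sender cannot distinguish a dropped message from a slow one, so it simply retransmits until an acknowledgment arrives.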

Timing Faults

  • The Failure: A node responds with the correct data, but outside of the required time window (typically arriving too late to be useful).
  • Common Causes: Overloaded processors, heavy network latency, or clock drift across distributed machines.
  • Detection & Recovery: Critical in real-time systems (like algorithmic trading or air traffic control) where a late response is practically equivalent to a wrong response. Managed through strict deadlines, timeouts, and logical sequencing.
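A minimal sketch of deadline enforcement using Python’s standard thread pool: the worker’s answer is correct, but because it misses the deadline it is treated as a failure. The 0.1 s deadline and 0.5 s delay are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as Deadline

def slow_query():
    time.sleep(0.5)          # simulates an overloaded or distant node
    return "correct answer"

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_query)
    try:
        result = future.result(timeout=0.1)  # hard deadline
    except Deadline:
        result = None        # late == failed, for real-time purposes

print(result)  # None: the answer was right, but it missed the window
```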

Commission Faults

  • The Failure: A node actively processes a request but returns an incorrect or corrupted result to the network.
  • Common Causes: Undetected software bugs, memory corruption, misconfigurations, or unintentional logic errors.
  • Detection & Recovery: Highly dangerous because the output initially appears structurally valid. Systems must use cross-checking, replication (asking multiple nodes and comparing answers), and error-correcting codes to spot the discrepancy.
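The cross-checking strategy can be sketched as a majority read: query several replicas, accept the most common answer, and flag any replica that disagrees. The replica names and values are hypothetical.

```python
from collections import Counter

def majority_read(responses):
    """responses: mapping of replica name -> value it returned.
    Returns the majority value and the list of dissenting replicas."""
    tally = Counter(responses.values())
    value, _votes = tally.most_common(1)[0]
    suspects = [node for node, v in responses.items() if v != value]
    return value, suspects

value, suspects = majority_read({
    "replica-a": 100,
    "replica-b": 100,
    "replica-c": 73,   # commission fault: corrupted or buggy result
})
print(value, suspects)  # 100 ['replica-c']
```

Note what this buys and what it doesn’t: simple voting exposes an *honestly buggy* replica, but a node that lies differently to different clients (the Byzantine case, next) defeats it.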

Byzantine Faults

  • The Failure: A node behaves arbitrarily and unpredictably, often with deliberate malicious intent to subvert the system.
  • The Behavior: The node doesn’t just crash; it actively lies, sends conflicting messages to different peers, or impersonates other nodes to break consensus.
  • The Challenge: This is the most dangerous and difficult fault to defend against, as the malicious node can dynamically adapt its behavior to exploit network rules.
  • Detection & Recovery: Simple redundancy is useless here. Defense requires specialized Byzantine Fault Tolerant (BFT) consensus protocols that utilize strict multi-party verification and voting thresholds.
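The sizing arithmetic behind those voting thresholds (as used in PBFT-style protocols) can be sketched directly: tolerating f Byzantine nodes takes at least 3f + 1 total nodes, and each decision needs 2f + 1 matching votes so that any two quorums overlap in at least one honest node.

```python
def min_nodes(f):
    """Smallest cluster that tolerates f Byzantine nodes (n >= 3f + 1)."""
    return 3 * f + 1

def quorum(f):
    """Matching votes required per decision (2f + 1)."""
    return 2 * f + 1

for f in range(1, 4):
    print(f"f={f}: n={min_nodes(f)} nodes, quorum={quorum(f)}")
# f=1: n=4 nodes, quorum=3
# f=2: n=7 nodes, quorum=5
# f=3: n=10 nodes, quorum=7
```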

Byzantine Generals Problem

Question: How can loyal actors reach reliable agreement despite malicious behavior?

  • Peer-to-Peer: Every node communicates directly with every other node
  • Independent Decisions: Each node broadcasts its decision (“Attack” or “Retreat”)
  • Information Isolation: Nodes only see the messages sent directly to them
  • The Threat: A “traitor” node actively sends conflicting decisions
  • The Goal: All loyal nodes must arrive at the exact same correct conclusion

[Figure: complete message mesh among loyal nodes L1–L4 and traitor nodes T1–T3. Every loyal node sends the same “Attack” (A) message to all peers; each traitor sends a mix of “Attack” (A) and “Retreat” (R) messages to different peers.]

Fault Tolerance Spectrum

  • Crash Faults (Easy): The node simply stops. Mitigated relatively easily with basic redundancy and timeouts.
  • Omission Faults (Moderate): The node remains active but drops messages. Addressed through acknowledgments and retransmissions.
  • Timing Faults (Moderate): The response is correct but arrives too late. Managed via strict deadlines and logical clocks.
  • Commission Faults (Hard): The node actively sends incorrect data due to bugs. Requires cross-checking across multiple nodes.
  • Byzantine Faults (Extreme): A node acts maliciously or unpredictably to actively subvert the system. Requires specialized cryptographic consensus protocols.

[Figure: fault severity spectrum, Crash → Omission → Timing → Commission → Byzantine.]

Why Failures Become Security Risks

  • Ambiguity of Intent: In distributed systems, benign operational faults and deliberate malicious attacks often manifest identically.
  • Crash vs. Denial-of-Service: A node suddenly going dark could be a simple power failure, or a targeted DoS attack.
  • Omission vs. Censorship: Dropped messages might stem from standard network congestion, or an adversary intentionally filtering traffic.
  • Commission vs. Compromise: A node returning incorrect data could be suffering from a software bug, or it may have been actively hijacked.
  • The Engineering Reality: Because failures do not come with labels, reliability and security are inextricably linked. Systems must survive the behavior, regardless of the intent.

Adversarial Scenario: The Compromised Node

  • The 1990s Dinner Problem: You and five friends are coordinating dinner plans using only 1:1 landline phone calls. Everyone must agree on the same restaurant, or the night is ruined.
  • The “Alien Imposter”: Unbeknownst to you, one friend is compromised by a malicious imposter. To cause maximum chaos, the imposter tells half the group “we are getting pizza” and the other half “we are getting tacos.”
  • Spreading Confusion: Because everyone only communicates 1:1, honest friends start passing along conflicting information. A single bad actor creates disagreement where none existed.
  • The Defense: To survive this, the group needs a protocol (like everyone calling everyone else to cross-check answers) to identify the lie and reach a majority decision despite the imposter.

Strategies for Handling Suspicion

  • Redundancy & Replication: Instead of relying on a single server, multiple nodes maintain identical copies of the data and provide the same service, ensuring that no single hardware failure can destroy the information.
  • Majority Voting: The system relies on the collective decision of the group rather than trusting any individual node. The result agreed upon by the majority is accepted as truth.
  • Quorum Systems: Protocols require a strict minimum number of consistent responses (a quorum) before committing a transaction or making a system-wide decision, preventing a small malicious faction from hijacking the network.
  • Cryptographic Proofs: Data is mathematically verified (e.g., through digital signatures and cryptographic hashes) to ensure it has not been altered by a Byzantine node.
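The last strategy can be illustrated with a cryptographic hash check. In a real deployment the hash would itself be covered by a digital signature so a Byzantine node couldn’t simply recompute it; here we show only the integrity test, with made-up message contents.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a message."""
    return hashlib.sha256(data).hexdigest()

original = b"move 3 units to grid NK 1234 5678"
fingerprint = digest(original)           # published alongside the data

tampered = b"move 3 units to grid NK 9999 9999"
print(digest(original) == fingerprint)   # True: data intact
print(digest(tampered) == fingerprint)   # False: alteration detected
```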

Consensus as the Core Goal

  • The Objective: In a distributed network, consensus is the mechanism by which all honest, non-faulty nodes agree on a single shared state or action.
  • Safety (Consistency): The guarantee that the system never contradicts itself. Once an agreement is reached, it is final, and no two honest nodes will ever decide on conflicting values.
  • Liveness (Availability): The guarantee that the system always makes progress. The system must continue to process transactions and cannot be permanently stalled by failures or malicious actors.

Classical Consensus Approaches

  • Simple Replication: Early systems merely copied data across multiple nodes and periodically checked for consistency, offering basic redundancy but poor conflict resolution.
  • Two-Phase Commit: A central coordinator asks all participants if they are ready to proceed. Only if everyone says yes does the coordinator issue a final commit command.
  • Paxos & Raft: Formal, robust algorithms that use leader election and majority voting to achieve consensus safely, even if some nodes crash or messages are lost.
  • The Limitation: These classical models were designed for benign corporate environments. They expertly handle crash and omission faults but completely break down if a node turns Byzantine (malicious).
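Two-phase commit is small enough to sketch directly. This toy version omits the durable logging and timeout handling a real implementation needs; the participant names and vote logic are invented for illustration.

```python
def two_phase_commit(participants):
    """participants: mapping of name -> callable returning True if ready.
    Returns the coordinator's decision and the recorded votes."""
    # Phase 1 (prepare): the coordinator asks everyone if they can commit.
    votes = {name: ready() for name, ready in participants.items()}
    # Phase 2 (commit/abort): commit only on a unanimous yes.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    return decision, votes

decision, votes = two_phase_commit({
    "db-1": lambda: True,
    "db-2": lambda: True,
    "db-3": lambda: False,   # one participant is not ready
})
print(decision)  # ABORT: a single "no" vetoes the transaction
```

Notice the built-in fragility: the protocol trusts every “yes” at face value, and the whole group blocks if the coordinator crashes mid-protocol — exactly the gaps Paxos/Raft (crash faults) and BFT protocols (malicious faults) were designed to close.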

Why Decentralization Is Harder

  • No Central Referee: There is no single trusted authority to dictate the truth, resolve disputes, or coordinate actions. Every node must independently verify the state of the network.
  • Massive Scale: Reaching agreement requires coordination and communication across thousands of independent nodes, vastly increasing the computational overhead.
  • Network Latency: In a global, decentralized network, messages take time to travel across continents, leading to unavoidable delays and temporary inconsistencies.
  • Adversarial Environments: Because anyone can join an open network, systems must assume the presence of malicious participants actively attempting to subvert the consensus or feed the network bad data.

The Byzantine Generals Problem (formal)

  • Problem Statement: A group of distributed actors (generals) must agree on a single plan (attack/retreat), but some actors (traitors) or communication channels may be faulty and actively malicious.

  • Goal (Interactive Consistency):

    1. All loyal actors must agree on the same plan.
    2. If every loyal actor starts with the same value, that value must be the plan they agree on (so traitors cannot impose a plan of their own).
  • The Impossibility Result: With oral (unauthenticated) messages, a solution exists if and only if the total number of generals, \(n\), is strictly greater than three times the number of traitors, \(m\). \[n > 3m \quad (\text{equivalently } n \ge 3m + 1)\]

Note: this presentation slightly adapts the original 1982 formulation by removing the hierarchical Commander/Lieutenant dynamic to better reflect flat, peer-to-peer blockchain networks.
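A one-round illustration of the bound (deliberately not the full OM(m) algorithm from the paper): with \(n = 3\) and one traitor, the traitor can echo each loyal general’s own value back to it, leaving the two loyal generals with mirror-image majorities. Names and values are illustrative.

```python
from collections import Counter

def tally(own_value, received):
    """A loyal node majority-votes over its own value plus messages."""
    return Counter([own_value] + received).most_common(1)[0][0]

# Loyal generals L1 and L2 start with different values; the traitor
# reinforces each node's own value, cementing the disagreement.
l1 = tally("Attack",  ["Retreat", "Attack"])   # from L2, from traitor
l2 = tally("Retreat", ["Attack", "Retreat"])   # from L1, from traitor
print(l1, l2)  # Attack Retreat -> the loyal nodes fail to agree
```

With \(n \ge 4\) and one traitor, extra rounds of relaying (“who told you what?”) give the loyal majority enough cross-checks to outvote any single liar — which is what the full algorithm formalizes.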

Mini-Activity: Byzantine Agreement

  • Setup: Form groups of 6 or 7. The instructor will hand you a secret card defining your Role (Loyal or Traitor) and your Initial Value (“Attack” or “Retreat”).
  • Phase 1: Broadcast: You must send a direct, secret message (e.g., a slip of paper) to every other person in your group stating your value.
    • Loyal nodes: Must accurately send their given Initial Value to everyone.
    • Traitor nodes: Can lie, and should actively send conflicting values to different peers to cause confusion.
  • Phase 2: Tally: Look at the messages you received. Combine them with your own Initial Value. What is the majority?
  • Phase 3: Reveal: On the count of three, point up for “Attack” or down for “Retreat.” Did the loyal nodes reach consensus?

Mini-Activity: Digital Version

blockchain-6dd63ebf.training-as-code.com/exercises/bgp/

Recap & Key Takeaways

  • Trust is an Engineering Assumption: In secure systems, trust is not a feeling; it’s a documented dependency.

  • Centralized Trust requires delegating verification to trusted third parties.

  • Decentralized Trust requires a mechanism for forming consensus.

  • The Byzantine Generals Problem is the foundational challenge of reaching agreement in the presence of malicious actors. The solution depends on the assumptions you can make (e.g., unforgeable signatures).

References

[1]
Helium Documentation, “Helium hotspot app.” Accessed: Mar. 28, 2026. [Online]. Available: https://docs.helium.com/mine-hnt/helium-hotspot-app
[2]
IPFS, “IPFS.” Accessed: Mar. 18, 2026. [Online]. Available: https://ipfs.tech
[3]
Hivemapper, “Hivemapper - build a decentralized global map.” Accessed: Mar. 28, 2026. [Online]. Available: https://hivemapper.com/tos/map-products/
[4]
Mastodon Documentation, “Using the network features.” Accessed: Mar. 28, 2026. [Online]. Available: https://docs.joinmastodon.org/user/network/
[5]
D. Yaga, P. Mell, N. Roby, and K. Scarfone, “Blockchain technology overview,” National Institute of Standards and Technology, Gaithersburg, MD, NIST IR 8202, Oct. 2018. doi: 10.6028/NIST.IR.8202.
[6]
D. L. Chaum, “Computer systems established, maintained and trusted by mutually suspicious groups,” PhD thesis, 1982. Available: https://cdn.nakamotoinstitute.org/docs/computer-systems-by-mutually-suspicious-groups.pdf
[7]
L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, pp. 382–401, Jul. 1982. Available: https://nakamotoinstitute.org/static/docs/the-byzantine-generals-problem.pdf
[8]
L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978, doi: 10.1145/359545.359563.
[9]
L. Lamport, “Paxos made simple,” ACM SIGACT News, vol. 32, no. 4, pp. 51–58, Dec. 2001. Available: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
[10]
D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm,” in 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA: USENIX Association, Jun. 2014, pp. 305–319. Available: https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro
[11]
M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, LA, USA, Feb. 1999, pp. 173–186. Available: http://pmg.csail.mit.edu/papers/osdi99.pdf