Trust and Distributed Systems: Foundations for Blockchain

Section I: Introduction

Army Cyber Institute

April 9, 2026

Opening Exercise: The Insider

Scenario: You are a group that shares a sensitive secret. However, an “Insider” has infiltrated your ranks. You must identify them before they figure out the secret.

The Rules:

  1. The Secret: 6 of you will receive the exact same Secret Word. 1 of you will receive a blank card that says: You are the Insider.
  2. The Broadcast: Go around the circle. Each person must say exactly one word related to the Secret Word to prove to the group that they are in on it.
    • Strategy: Be specific enough to prove you know the secret, but vague enough that the Insider can’t guess it!
  3. Consensus: After everyone has spoken, count to three and point at the person you believe is the Insider.
  4. Resolution: If the group votes out the Insider, the group is safe! If the group votes out an honest member, or if the Insider successfully guesses the Secret Word, the Insider wins!

Exercise (Digital Version)

blockchain-6dd63ebf.training-as-code.com/exercises/insider/

Trust

  • Two‑party trust (A ↔︎ B): At human scale, trust is an expectation that the other party will behave as promised; in security engineering, this becomes an assumption the system relies on about an entity’s behavior under stated conditions.
  • Introducing a third party (A ↔︎ TTP ↔︎ B): A trusted third party (e.g., broker, auditor, or certification authority) vouches for identity/attributes so A and B don’t need direct verification.
  • Trade‑offs: Third parties can reduce bilateral friction but create transitive trust and potential single points of failure; the path‑validation algorithm and trust‑anchor model formalize how that trust is established and verified.

What is “trust” in digital systems?

  • Trust is the set of explicit assumptions and dependencies about an entity that a system’s security properties rely on.

  • Where trust “lives” in practice:

    • Centralized Trust: Concentrated in one or more third parties (e.g., a bank or Certificate Authority). The system’s security relies entirely on the trusted entities behaving correctly. If compromised, the entire system fails.
    • Distributed Trust: Spread across multiple participants and mechanisms. Instead of trusting one entity, trust is relocated to cryptography, consensus algorithms, and economic incentives, assuming that a majority of the network will act honestly.

Centralized vs. Distributed Architectures

  • Centralized Architecture
    • Control: A single authority dictates system state and coordinates all work.
    • Complexity: Simpler to design, update, monitor, and maintain.
    • Trust Model: Users must place implicit, complete trust in the central hub.
    • Vulnerability: Creates a single point of failure (both technical and administrative) that can compromise the entire service.
  • Distributed Architecture
    • Control: Shared responsibility among many peer nodes without a central hub.
    • Complexity: Highly complex to coordinate, synchronize state, and test under load.
    • Trust Model: Trust is shifted from a central hub to cryptographic proofs and consensus rules.
    • Vulnerability: Highly resilient to individual node failures, but susceptible to coordination breakdowns or network partitions.

Everyday Examples of Distributed Systems

  • Helium (IoT Network):
    • A decentralized wireless network built by individuals deploying hotspots in their homes to provide connectivity over unlicensed radio frequencies (e.g., 915 MHz in the US).
  • InterPlanetary File System (IPFS):
    • A peer-to-peer network for storing and sharing data. Files are found by their cryptographic hash rather than a centralized server location.
  • Hivemapper:
    • A decentralized, community-owned mapping network where contributors use dashcams to collect imagery and update a global map.
  • The Fediverse (e.g., Mastodon):
    • An ensemble of federated social networks where independent servers communicate with each other, eliminating reliance on a single corporate platform.

Why Suspicion Matters

  • The Reality of Distributed Trust: Unlike centralized systems with a single trusted referee, distributed systems span multiple participants where perfect trust cannot be guaranteed.
  • Accident vs. Adversary: A node might fail due to a benign hardware crash (accidental), or it might actively lie and attempt to subvert the network (malicious).
  • Mutual Suspicion: The core design principle that every participant must operate as if others may be faulty or hostile. The system must be engineered to coordinate and reach consensus despite this distrust.

Suspicion In Practice

  • Hardware & Network Glitches: A faulty router might arbitrarily drop or misroute packets, creating inconsistent views of the network’s state.
  • Data Corruption: A degraded database replica might unknowingly serve different, conflicting answers to different clients querying the same record.
  • Malicious Insiders: A compromised node in a coalition system might deliberately alter intelligence data before forwarding it to allied peers.
  • The Takeaway: Mutual suspicion requires engineering systems that survive both benign accidents and deliberate attacks, often without needing to differentiate between the two.

[Figure: a mesh network of honest nodes n1–n7, with a known-bad node (bad1, “!”) and a suspect node (bad2, “?”) wired into the same mesh.]

Crash Faults

  • The Failure: A node suddenly and completely stops responding, going “dark” without sending any further messages.
  • Common Causes: Benign hardware failures (like a power supply death), software panics, or deliberate denial-of-service attacks.
  • Detection & Recovery: Relatively easy to detect because silence is a clear indicator of failure. Systems recover using simple redundancy, such as failing over to a hot standby or routing traffic to another replica.
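The detect-and-failover pattern can be sketched in a few lines: declare the primary dead after a period of heartbeat silence and route traffic to the standby. The class, node names, and 3-second timeout below are illustrative, not taken from any particular system.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds of silence before we presume a crash

class ReplicaSet:
    """Toy primary/standby pair with heartbeat-based crash detection."""

    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby
        self.last_heartbeat = time.monotonic()

    def record_heartbeat(self):
        """Called whenever the primary sends a heartbeat message."""
        self.last_heartbeat = time.monotonic()

    def active_node(self, now=None):
        """Route to the standby once the primary has been silent too long."""
        now = time.monotonic() if now is None else now
        if now - self.last_heartbeat > HEARTBEAT_TIMEOUT:
            return self.standby  # primary presumed crashed: fail over
        return self.primary

rs = ReplicaSet("primary", "standby")
print(rs.active_node())                              # -> "primary"
print(rs.active_node(now=rs.last_heartbeat + 10.0))  # -> "standby"
```

Silence is the only signal here, which is exactly why crash faults are the easy case: no cross-checking of message *contents* is required.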

Omission Faults

  • The Failure: A node fails to send or receive some messages, but otherwise remains active and appears functional to the rest of the network.
  • Common Causes: Network congestion, dropped packets, flaky hardware connections, or an overloaded server quietly discarding requests.
  • Detection & Recovery: Harder to diagnose than a complete crash because the failures are intermittent. Defended against using robust communication protocols that require acknowledgments and automatic retries (e.g., TCP).
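The acknowledge-and-retry idea can be sketched as follows; the lossy channel and its 30% drop rate are simulated stand-ins for a real network, and the retry budget is an arbitrary choice.

```python
import random

random.seed(42)  # deterministic demo

def lossy_send(msg, drop_rate=0.3):
    """Simulated channel: returns an ACK only if the message survives."""
    if random.random() < drop_rate:
        return None          # omission fault: message silently dropped
    return ("ACK", msg)

def reliable_send(msg, max_retries=10):
    """Retransmit until acknowledged; give up after max_retries attempts."""
    for attempt in range(1, max_retries + 1):
        ack = lossy_send(msg)
        if ack is not None:
            return attempt   # number of tries the delivery took
    raise TimeoutError(f"no ACK for {msg!r} after {max_retries} tries")

print(reliable_send("hello"))
```

This is the same loop TCP runs for you under the hood: the sender cannot distinguish a dropped message from a slow one, so it simply retransmits until an acknowledgment arrives.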

Timing Faults

  • The Failure: A node responds with the correct data, but outside of the required time window (typically arriving too late to be useful).
  • Common Causes: Overloaded processors, heavy network latency, or clock drift across distributed machines.
  • Detection & Recovery: Critical in real-time systems (like algorithmic trading or air traffic control) where a late response is practically equivalent to a wrong response. Managed through strict deadlines, timeouts, and logical sequencing.
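A minimal sketch of deadline enforcement using Python’s standard thread pool: the worker’s answer is correct, but because it misses the deadline it is treated as a failure. The 0.1 s deadline and 0.5 s delay are illustrative.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as Deadline

def slow_query():
    time.sleep(0.5)          # simulates an overloaded or distant node
    return "correct answer"

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_query)
    try:
        result = future.result(timeout=0.1)  # hard deadline
    except Deadline:
        result = None        # late == failed, for real-time purposes

print(result)  # None: the answer was right, but it missed the window
```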

Commission Faults

  • The Failure: A node actively processes a request but returns an incorrect or corrupted result to the network.
  • Common Causes: Undetected software bugs, memory corruption, misconfigurations, or unintentional logic errors.
  • Detection & Recovery: Highly dangerous because the output initially appears structurally valid. Systems must use cross-checking, replication (asking multiple nodes and comparing answers), and error-correcting codes to spot the discrepancy.
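The cross-checking strategy can be sketched as a majority read: query several replicas, accept the most common answer, and flag any replica that disagrees. The replica names and values are hypothetical.

```python
from collections import Counter

def majority_read(responses):
    """responses: mapping of replica name -> value it returned.
    Returns the majority value and the list of dissenting replicas."""
    tally = Counter(responses.values())
    value, _votes = tally.most_common(1)[0]
    suspects = [node for node, v in responses.items() if v != value]
    return value, suspects

value, suspects = majority_read({
    "replica-a": 100,
    "replica-b": 100,
    "replica-c": 73,   # commission fault: corrupted or buggy result
})
print(value, suspects)  # 100 ['replica-c']
```

Note what this buys and what it doesn’t: simple voting exposes an *honestly buggy* replica, but a node that lies differently to different clients (the Byzantine case, next) defeats it.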

Byzantine Faults

  • The Failure: A node behaves arbitrarily and unpredictably, often with deliberate malicious intent to subvert the system.
  • The Behavior: The node doesn’t just crash; it actively lies, sends conflicting messages to different peers, or impersonates other nodes to break consensus.
  • The Challenge: This is the most dangerous and difficult fault to defend against, as the malicious node can dynamically adapt its behavior to exploit network rules.
  • Detection & Recovery: Simple redundancy is useless here. Defense requires specialized Byzantine Fault Tolerant (BFT) consensus protocols that utilize strict multi-party verification and voting thresholds.
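The sizing arithmetic behind those voting thresholds (as used in PBFT-style protocols) can be sketched directly: tolerating f Byzantine nodes takes at least 3f + 1 total nodes, and each decision needs 2f + 1 matching votes so that any two quorums overlap in at least one honest node.

```python
def min_nodes(f):
    """Smallest cluster that tolerates f Byzantine nodes (n >= 3f + 1)."""
    return 3 * f + 1

def quorum(f):
    """Matching votes required per decision (2f + 1)."""
    return 2 * f + 1

for f in range(1, 4):
    print(f"f={f}: n={min_nodes(f)} nodes, quorum={quorum(f)}")
# f=1: n=4 nodes, quorum=3
# f=2: n=7 nodes, quorum=5
# f=3: n=10 nodes, quorum=7
```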

Byzantine Generals Problem

Question: How can loyal actors reach reliable agreement despite malicious behavior?

  • Peer-to-Peer: Every node communicates directly with every other node
  • Independent Decisions: Each node broadcasts its decision (“Attack” or “Retreat”)
  • Information Isolation: Nodes only see the messages sent directly to them
  • The Threat: A “traitor” node actively sends conflicting decisions
  • The Goal: All loyal nodes must arrive at the exact same correct conclusion

[Figure: complete message mesh among loyal nodes L1–L4 and traitor nodes T1–T3. Every loyal node sends the same “Attack” (A) message to all peers; each traitor sends a mix of “Attack” (A) and “Retreat” (R) messages to different peers.]

Fault Tolerance Spectrum

  • Crash Faults (Easy): The node simply stops. Mitigated relatively easily with basic redundancy and timeouts.
  • Omission Faults (Moderate): The node remains active but drops messages. Addressed through acknowledgments and retransmissions.
  • Timing Faults (Moderate): The response is correct but arrives too late. Managed via strict deadlines and logical clocks.
  • Commission Faults (Hard): The node actively sends incorrect data due to bugs. Requires cross-checking across multiple nodes.
  • Byzantine Faults (Extreme): A node acts maliciously or unpredictably to actively subvert the system. Requires specialized cryptographic consensus protocols.

[Figure: fault severity spectrum, Crash → Omission → Timing → Commission → Byzantine.]

Why Failures Become Security Risks

  • Ambiguity of Intent: In distributed systems, benign operational faults and deliberate malicious attacks often manifest identically.
  • Crash vs. Denial-of-Service: A node suddenly going dark could be a simple power failure, or a targeted DoS attack.
  • Omission vs. Censorship: Dropped messages might stem from standard network congestion, or an adversary intentionally filtering traffic.
  • Commission vs. Compromise: A node returning incorrect data could be suffering from a software bug, or it may have been actively hijacked.
  • The Engineering Reality: Because failures do not come with labels, reliability and security are inextricably linked. Systems must survive the behavior, regardless of the intent.

Adversarial Scenario: The Compromised Node

  • The 1990s Dinner Problem: You and five friends are coordinating dinner plans using only 1:1 landline phone calls. Everyone must agree on the same restaurant, or the night is ruined.
  • The “Alien Imposter”: Unbeknownst to you, one friend is compromised by a malicious imposter. To cause maximum chaos, the imposter tells half the group “we are getting pizza” and the other half “we are getting tacos.”
  • Spreading Confusion: Because everyone only communicates 1:1, honest friends start passing along conflicting information. A single bad actor creates disagreement where none existed.
  • The Defense: To survive this, the group needs a protocol (like everyone calling everyone else to cross-check answers) to identify the lie and reach a majority decision despite the imposter.

Strategies for Handling Suspicion

  • Redundancy & Replication: Instead of relying on a single server, multiple nodes maintain identical copies of the data and provide the same service, ensuring that no single hardware failure can destroy the information.
  • Majority Voting: The system relies on the collective decision of the group rather than trusting any individual node. The result agreed upon by the majority is accepted as truth.
  • Quorum Systems: Protocols require a strict minimum number of consistent responses (a quorum) before committing a transaction or making a system-wide decision, preventing a small malicious faction from hijacking the network.
  • Cryptographic Proofs: Data is mathematically verified (e.g., through digital signatures and cryptographic hashes) to ensure it has not been altered by a Byzantine node.
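The last strategy can be illustrated with a cryptographic hash check. In a real deployment the hash would itself be covered by a digital signature so a Byzantine node couldn’t simply recompute it; here we show only the integrity test, with made-up message contents.

```python
import hashlib

def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a message."""
    return hashlib.sha256(data).hexdigest()

original = b"move 3 units to grid NK 1234 5678"
fingerprint = digest(original)           # published alongside the data

tampered = b"move 3 units to grid NK 9999 9999"
print(digest(original) == fingerprint)   # True: data intact
print(digest(tampered) == fingerprint)   # False: alteration detected
```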

Consensus as the Core Goal

  • The Objective: In a distributed network, consensus is the mechanism by which all honest, non-faulty nodes agree on a single shared state or action.
  • Safety (Consistency): The guarantee that the system never contradicts itself. Once an agreement is reached, it is final, and no two honest nodes will ever decide on conflicting values.
  • Liveness (Availability): The guarantee that the system always makes progress. The system must continue to process transactions and cannot be permanently stalled by failures or malicious actors.

Classical Consensus Approaches

  • Simple Replication: Early systems merely copied data across multiple nodes and periodically checked for consistency, offering basic redundancy but poor conflict resolution.
  • Two-Phase Commit: A central coordinator asks all participants if they are ready to proceed. Only if everyone says yes does the coordinator issue a final commit command.
  • Paxos & Raft: Formal, robust algorithms that use leader election and majority voting to achieve consensus safely, even if some nodes crash or messages are lost.
  • The Limitation: These classical models were designed for benign corporate environments. They expertly handle crash and omission faults but completely break down if a node turns Byzantine (malicious).
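Two-phase commit is small enough to sketch directly. This toy version omits the durable logging and timeout handling a real implementation needs; the participant names and vote logic are invented for illustration.

```python
def two_phase_commit(participants):
    """participants: mapping of name -> callable returning True if ready.
    Returns the coordinator's decision and the recorded votes."""
    # Phase 1 (prepare): the coordinator asks everyone if they can commit.
    votes = {name: ready() for name, ready in participants.items()}
    # Phase 2 (commit/abort): commit only on a unanimous yes.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    return decision, votes

decision, votes = two_phase_commit({
    "db-1": lambda: True,
    "db-2": lambda: True,
    "db-3": lambda: False,   # one participant is not ready
})
print(decision)  # ABORT: a single "no" vetoes the transaction
```

Notice the built-in fragility: the protocol trusts every “yes” at face value, and the whole group blocks if the coordinator crashes mid-protocol — exactly the gaps Paxos/Raft (crash faults) and BFT protocols (malicious faults) were designed to close.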

Why Decentralization Is Harder

  • No Central Referee: There is no single trusted authority to dictate the truth, resolve disputes, or coordinate actions. Every node must independently verify the state of the network.
  • Massive Scale: Reaching agreement requires coordination and communication across thousands of independent nodes, vastly increasing the computational overhead.
  • Network Latency: In a global, decentralized network, messages take time to travel across continents, leading to unavoidable delays and temporary inconsistencies.
  • Adversarial Environments: Because anyone can join an open network, systems must assume the presence of malicious participants actively attempting to subvert the consensus or feed the network bad data.

The Byzantine Generals Problem (formal)

  • Problem Statement: A group of distributed actors (generals) must agree on a single plan (attack/retreat), but some actors (traitors) or communication channels may be faulty and actively malicious.

  • Goal (Interactive Consistency):

    1. All loyal actors must agree on the same plan.
    2. If every loyal actor starts with the same value, that value must be the plan they agree on (so traitors cannot impose a plan of their own).
  • The Impossibility Result: With oral (unauthenticated) messages, a solution exists if and only if the total number of generals, \(n\), is strictly greater than three times the number of traitors, \(m\). \[n > 3m \quad (\text{equivalently } n \ge 3m + 1)\]

Note: this presentation slightly adapts the original 1982 formulation by removing the hierarchical Commander/Lieutenant dynamic to better reflect flat, peer-to-peer blockchain networks.
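A one-round illustration of the bound (deliberately not the full OM(m) algorithm from the paper): with \(n = 3\) and one traitor, the traitor can echo each loyal general’s own value back to it, leaving the two loyal generals with mirror-image majorities. Names and values are illustrative.

```python
from collections import Counter

def tally(own_value, received):
    """A loyal node majority-votes over its own value plus messages."""
    return Counter([own_value] + received).most_common(1)[0][0]

# Loyal generals L1 and L2 start with different values; the traitor
# reinforces each node's own value, cementing the disagreement.
l1 = tally("Attack",  ["Retreat", "Attack"])   # from L2, from traitor
l2 = tally("Retreat", ["Attack", "Retreat"])   # from L1, from traitor
print(l1, l2)  # Attack Retreat -> the loyal nodes fail to agree
```

With \(n \ge 4\) and one traitor, extra rounds of relaying (“who told you what?”) give the loyal majority enough cross-checks to outvote any single liar — which is what the full algorithm formalizes.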

Mini-Activity: Byzantine Agreement

  • Setup: Form groups of 6 or 7. The instructor will hand you a secret card defining your Role (Loyal or Traitor) and your Initial Value (“Attack” or “Retreat”).
  • Phase 1: Broadcast: You must send a direct, secret message (e.g., a slip of paper) to every other person in your group stating your value.
    • Loyal nodes: Must accurately send their given Initial Value to everyone.
    • Traitor nodes: Can lie, and should actively send conflicting values to different peers to cause confusion.
  • Phase 2: Tally: Look at the messages you received. Combine them with your own Initial Value. What is the majority?
  • Phase 3: Reveal: On the count of three, point up for “Attack” or down for “Retreat.” Did the loyal nodes reach consensus?

Mini-Activity: Digital Version

blockchain-6dd63ebf.training-as-code.com/exercises/bgp/

Recap & Key Takeaways

  • Trust is an Engineering Assumption: In secure systems, trust is not a feeling; it’s a documented dependency.

  • Centralized Trust requires delegating verification to trusted third parties.

  • Decentralized Trust requires a mechanism for forming consensus.

  • The Byzantine Generals Problem is the foundational challenge of reaching agreement in the presence of malicious actors. The solution depends on the assumptions you can make (e.g., unforgeable signatures).

References

[1]
Helium Documentation, “Helium hotspot app.” Accessed: Mar. 28, 2026. [Online]. Available: https://docs.helium.com/mine-hnt/helium-hotspot-app
[2]
IPFS, “IPFS.” Accessed: Mar. 18, 2026. [Online]. Available: https://ipfs.tech
[3]
Hivemapper, “Hivemapper - build a decentralized global map.” Accessed: Mar. 28, 2026. [Online]. Available: https://hivemapper.com/tos/map-products/
[4]
Mastodon Documentation, “Using the network features.” Accessed: Mar. 28, 2026. [Online]. Available: https://docs.joinmastodon.org/user/network/
[5]
D. Yaga, P. Mell, N. Roby, and K. Scarfone, “Blockchain technology overview,” National Institute of Standards and Technology, Gaithersburg, MD, NIST IR 8202, Oct. 2018. doi: 10.6028/NIST.IR.8202.
[6]
D. L. Chaum, “Computer systems established, maintained and trusted by mutually suspicious groups,” PhD thesis, 1982. Available: https://cdn.nakamotoinstitute.org/docs/computer-systems-by-mutually-suspicious-groups.pdf
[7]
L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” ACM Transactions on Programming Languages and Systems, vol. 4, no. 3, pp. 382–401, Jul. 1982. Available: https://nakamotoinstitute.org/static/docs/the-byzantine-generals-problem.pdf
[8]
L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978, doi: 10.1145/359545.359563.
[9]
L. Lamport, “Paxos made simple,” ACM SIGACT News, vol. 32, no. 4, pp. 51–58, Dec. 2001. Available: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
[10]
D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm,” in 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA: USENIX Association, Jun. 2014, pp. 305–319. Available: https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro
[11]
M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, LA, USA, Feb. 1999, pp. 173–186. Available: http://pmg.csail.mit.edu/papers/osdi99.pdf