Consensus in Distributed Systems & Blockchains — An Introduction

Section IV: Consensus Mechanisms

Army Cyber Institute

April 9, 2026

Why Agreement Matters

  • In the previous lesson, we looked at ledgers in both permissionless and permissioned settings.
  • A ledger by itself does not solve the “which history do we all accept?” problem.
  • Some situations need system-wide agreement:
    • Multiple nodes must agree which transaction happened first
    • A double-spend attempt creates competing histories
    • A failed or slow node cannot be allowed to define the ledger alone
  • Some situations do not:
    • A single user reading a local copy of the ledger
    • A node validating a signature or hash on its own
  • In distributed systems, this problem of reaching agreement is called consensus.

Foundations of Consensus

Consensus definition
Each process proposes a value; the protocol must guarantee that all non-faulty processes eventually agree on a single value that was proposed. (Fischer, Lynch & Paterson 1985)
  • Consensus algorithms provide formal guarantees for agreement in decentralized systems.
  • We will use four core properties throughout the lesson:
    • Safety: no conflicting commits or later rollbacks.
    • Liveness: the system eventually makes forward progress.
    • Validity: only rule-following values are committed.
    • Agreement: correct nodes converge on the same result.
  • Those four are often abbreviated as SLVA.

Consensus in the Real World

Real-world example      | Problem solved                  | Consensus mechanism
Basketball shot clock   | Prevents endless delay          | Timeout
Jury supermajority vote | Prevents conflicting outcomes   | Quorum threshold
Chair or moderator      | Coordinates who speaks and when | Leader / proposer
  • We already use structured processes to reach group agreement.
  • Consensus protocols formalize those same ideas for machines operating under faults and delay.

Consensus Properties: SLVA and FPIC

Two vocabularies are commonly used to describe closely related correctness goals.

SLVA      | Meaning                                 | FPIC
Safety    | No conflicting commit or later rollback | Finality
Liveness  | Eventually makes forward progress       | Progress
Validity  | Only rule-following values commit       | Integrity
Agreement | Correct nodes converge on one result    | Consistency

SLVA is the academic vocabulary (FLP, Paxos, Raft). Blockchain documentation uses the same ideas under different names. Both appear in your readings.

Failure Models

Consensus protocols do not prevent failures; they manage them. To design robust protocols, we must first classify the kinds of failures we expect nodes and networks to exhibit.

  • Crash faults
    • A node stops responding or goes offline
    • Example: a replica process crashes and stops sending messages
  • Omission faults
    • A node or network path drops some messages but not all
    • Example: a firewall or overloaded link silently drops packets
  • Byzantine faults
    • A node can send false, conflicting, or malicious messages
    • Example: a compromised replica tells different peers different stories
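The three failure models can be sketched as replica behaviors. A toy illustration in Python (all names and values are hypothetical, not from any real protocol):

```python
# Toy sketch of the three failure models (all names/values hypothetical).
# Each function models how a replica responds when asked to report a value.

def crash_replica(value, alive):
    """Crash fault: answers correctly, or has stopped responding entirely."""
    return value if alive else None      # None = no response at all

def omission_replica(value, drop):
    """Omission fault: some messages are silently dropped, others get through."""
    return None if drop else value

def byzantine_replica(value, peer_id):
    """Byzantine fault: may tell different peers different stories."""
    return value if peer_id % 2 == 0 else f"conflicting-{value}"

print(crash_replica("tx-42", alive=False))     # None
print(omission_replica("tx-42", drop=True))    # None
print(byzantine_replica("tx-42", peer_id=1))   # conflicting-tx-42
```

Note the escalation: a crash is total silence, an omission is partial silence, and a Byzantine node actively equivocates.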

Timing Models on a Spectrum

Failure types alone don’t determine correctness—our assumptions about message timing are equally critical.

  • Asynchronous: No known bound on message or processing delay
    • Example: the network is so unpredictable that delay and failure look the same
  • Partially synchronous: The network may be unstable for a while, but eventually delays become bounded
    • Example: a congested network eventually settles down enough for timeouts to work
  • Synchronous: Message and processing delays are bounded and known ahead of time
    • Example: a tightly controlled lab network with fixed timing assumptions
[Spectrum: Asynchronous -> Partially Synchronous -> Synchronous (less predictable timing -> more predictable timing)]
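A toy sketch of why the timing model matters, with illustrative numbers: a fixed timeout misfires while delays are unbounded, but becomes trustworthy once delays settle under a known bound.

```python
# Toy sketch with illustrative numbers: a timeout-based failure detector
# is only trustworthy once message delays are bounded.

def suspect_failed(last_heard_ms, now_ms, timeout_ms):
    """A node suspects a peer after waiting longer than the timeout."""
    return now_ms - last_heard_ms > timeout_ms

# Asynchronous period: a 900 ms delay is indistinguishable from a crash,
# so a 500 ms timeout wrongly suspects a live-but-slow peer.
print(suspect_failed(last_heard_ms=0, now_ms=900, timeout_ms=500))    # True

# After stabilization (delays now bounded well under the timeout),
# the same rule no longer produces false suspicions.
print(suspect_failed(last_heard_ms=800, now_ms=900, timeout_ms=500))  # False
```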

The Timing Limits — FLP and DLS Results

  • FLP (Fischer, Lynch, Paterson 1985): in a fully asynchronous system, a deterministic protocol cannot guarantee both safety and liveness if even one node may fail.
    • If a message can be delayed forever, the rest of the system cannot tell whether a node is dead or just slow
  • DLS (Dwork, Lynch, Stockmeyer 1988): consensus becomes practical if the network eventually behaves well enough that timeouts start meaning something again.
    • This is the usual “partial synchrony” assumption behind practical BFT protocols
  • Two practical implications:
    • When timing is unstable, protocols prioritize safety
    • Once timing stabilizes, timeouts and view changes help recover liveness

Practical Consensus Design

  • If we cannot assume perfect timing, but still want safety under malicious behavior, what should a practical protocol do?
  • Practical protocols answer with four recurring ideas:
    • Structured rounds: break progress into named steps so everyone can tell which proposal and evidence belong together
    • Quorum overlap: require enough votes that two conflicting decisions cannot both gather support without sharing an honest node
    • Leader-based coordination: give one node the job of proposing the next candidate value so the protocol does not descend into chaos
    • Timeouts and view changes: detect stalls, rotate leadership, and help the protocol recover liveness once timing becomes predictable enough again
  • Together, these ideas let practical BFT protocols protect safety first and recover liveness when the network settles down.

Rounds and Phases

  • Consensus often proceeds in rounds.
  • Each round has a leader or proposer responsible for putting forward the next candidate value.
  • The three phases of a round are:
    • Propose: the leader suggests a candidate value
    • Vote: replicas or validators sign support for that value
    • Commit: enough matching votes produce a quorum certificate and allow finalization
  • The quorum certificate is the portable proof that the round gathered enough support to move forward safely.

[Diagram: Propose -> Vote -> Commit -> Quorum Certificate]
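The propose/vote/commit phases above can be sketched as a single round. This is a simplified illustration, not a real protocol; `run_round`, the replica ids, and the quorum size are all hypothetical:

```python
# Simplified sketch of one propose/vote/commit round (not a real protocol;
# run_round, the replica ids, and the quorum size are illustrative).

def accepts(replica_id, value):
    """Validity check stub: a real replica verifies protocol rules here."""
    return True

def run_round(leader_value, replicas, quorum_size):
    # Propose: the leader suggests a candidate value.
    proposal = leader_value
    # Vote: each replica that accepts the proposal signs support for it
    # (modeled here as a plain (id, value) pair instead of a signature).
    votes = [(r, proposal) for r in replicas if accepts(r, proposal)]
    # Commit: enough matching votes form a quorum certificate.
    if len(votes) >= quorum_size:
        return {"value": proposal, "votes": votes}   # the quorum certificate
    return None   # round stalls; a timeout and view change would follow

qc = run_round("block-7", replicas=["A", "B", "C", "D"], quorum_size=3)
print(qc["value"], len(qc["votes"]))   # block-7 4
```

The returned dictionary plays the role of the quorum certificate: portable evidence that enough replicas supported the value.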

Traditional BFT-Style Finality

  • In classical BFT protocols, finality is deterministic and immediate: once a quorum certificate is formed for a value at a slot, that slot is final and cannot be rolled back.
  • Once a quorum certificate is formed for a value, no conflicting value at that same position can gather a valid certificate of its own.
  • Even if Byzantine nodes vote twice, the quorum overlap still includes honest nodes that carry forward the prior safe history.
  • Those honest overlap nodes will not sign a conflicting commit.

Key idea: a quorum certificate is not just evidence that “many nodes agreed”; it is the reason a conflicting commit should no longer be possible for that slot or height.

Quorums

  • Quorums are the safety mechanism behind BFT finality.
  • For a network with total nodes n, the largest Byzantine fault set we can tolerate is:
    • t = floor((n - 1) / 3)
  • In the tight case n = 3t + 1, the corresponding quorum size is:
    • 2t + 1
  • That quorum size guarantees overlapping membership between any two valid quorums, so conflicting commits cannot both collect enough support.

Key idea: any two valid quorums must overlap in at least one honest node, and that overlap is what protects safety.

[Diagram: quorum intersection over n = 7 nodes (honest H1–H5, Byzantine B1–B2). Quorum Q1 = {H1, H2, H3, B1, B2} and quorum Q2 = {H3, H4, H5, B1, B2}, each of size 5, overlap in {H3, B1, B2}; the honest node H3 in the overlap protects safety.]
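The quorum arithmetic can be checked directly. This sketch assumes the tight case n = 3t + 1:

```python
# Checking the quorum arithmetic in the tight case n = 3t + 1.

def max_byzantine(n):
    """Largest tolerable Byzantine fault set: t = floor((n - 1) / 3)."""
    return (n - 1) // 3

def quorum_size(n):
    """Quorum of 2t + 1 votes (assumes n = 3t + 1)."""
    return 2 * max_byzantine(n) + 1

def min_overlap(n):
    """Any two quorums of size q drawn from n nodes share >= 2q - n members."""
    return 2 * quorum_size(n) - n

n = 7   # t = 2, quorum size = 5
print(max_byzantine(n), quorum_size(n), min_overlap(n))   # 2 5 3
# The overlap (3) exceeds t (2), so at least one overlapping node is honest,
# and that honest node refuses to sign two conflicting commits.
assert min_overlap(n) > max_byzantine(n)
```

In general, 2(2t + 1) − (3t + 1) = t + 1: the overlap always exceeds the number of Byzantine nodes by one, which is exactly the honest node safety depends on.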

Timeouts and View Changes

  • A leader is the node responsible for proposing the next candidate value in a round.
  • If the leader is slow, faulty, or malicious, the round can stall.
  • Nodes use timeouts to decide that they have waited long enough.
  • When enough nodes time out, they move to a new view with a new leader.
  • This is how practical BFT protocols recover liveness once the network becomes timely enough again.

Key idea: timeouts and leader rotation preserve liveness once timing stabilizes.
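A minimal sketch of timeout-driven leader rotation. The t + 1 complaint threshold is an illustrative rule (enough that at least one complaint comes from an honest node); real protocols such as PBFT exchange additional view-change messages before installing a new view.

```python
# Simplified sketch of timeout-driven leader rotation. The t + 1 complaint
# threshold is illustrative; real protocols such as PBFT exchange additional
# view-change messages before installing a new view.

def next_view(view, timed_out_nodes, t):
    # Require more than t complaints so a view change cannot be triggered
    # solely by Byzantine nodes crying wolf.
    if len(timed_out_nodes) >= t + 1:
        return view + 1
    return view

def leader_for(view, nodes):
    """Round-robin rotation: the view number selects the leader."""
    return nodes[view % len(nodes)]

nodes = ["A", "B", "C", "D"]   # n = 4 tolerates t = 1 Byzantine node
view = 0
print(leader_for(view, nodes))                           # A
view = next_view(view, timed_out_nodes={"B", "C"}, t=1)
print(leader_for(view, nodes))                           # B
```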

Crash-Fault Tolerance (CFT)

  • Identity model: known, authenticated participants
  • Sybil resistance: not applicable. A Sybil attack occurs when an attacker creates many fake identities to gain disproportionate influence; CFT assumes participants are known and pre-authenticated, so fake identities cannot enter.
  • Adversary assumption: failures are crashes, restarts, or silence, not deceit
  • Finality: deterministic once the protocol commits
  • Environment: internal clusters or tightly controlled distributed systems
  • Trade-offs: efficient and simple, but not appropriate when participants may equivocate or collude

Byzantine-Fault Tolerance (BFT)

  • Identity model: known participants with governed membership
  • Sybil resistance: handled through admission, authorization, and governance outside the voting protocol itself
  • Adversary assumption: some participants may lie, equivocate, or act maliciously
  • Finality: deterministic under the protocol’s fault and timing assumptions
  • Environment: consortium or permissioned systems where membership is controlled but trust is incomplete
  • Trade-offs: strong safety and fast finality, but higher coordination and messaging cost

Nakamoto (Proof-of-Work) Consensus

  • Identity model: open participation
  • Sybil resistance: scarce computational work makes influence expensive
    • Asymmetric difficulty: producing valid work is expensive, but verifying it is cheap
  • Adversary assumption: the protocol remains secure as long as no one controls a majority of the work
  • Finality: probabilistic rather than immediate
  • Environment: permissionless networks where nodes can appear, disappear, and remain anonymous
  • Trade-offs: global openness and censorship resistance, but with higher latency and energy cost
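The asymmetric-difficulty idea can be sketched with a toy proof-of-work loop. The difficulty here is tiny and illustrative; real networks use far larger targets and a full block header:

```python
import hashlib

# Toy sketch of proof-of-work's asymmetry: finding a valid nonce takes many
# hash attempts on average, but verifying a claimed nonce takes exactly one.

def verify(header, nonce, difficulty_bits):
    """Cheap: a single hash checks the claimed work."""
    digest = hashlib.sha256(f"{header}:{nonce}".encode()).hexdigest()
    return int(digest, 16) < 2 ** (256 - difficulty_bits)

def mine(header, difficulty_bits):
    """Expensive: brute-force search for a nonce that meets the target."""
    nonce = 0
    while not verify(header, nonce, difficulty_bits):
        nonce += 1
    return nonce

nonce = mine("block-7", difficulty_bits=16)   # ~65,000 attempts on average
assert verify("block-7", nonce, 16)           # confirmed with one hash
```

Producing influence costs hash attempts, so an attacker cannot cheaply manufacture identities or votes; verification stays cheap for everyone.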

Comparing Consensus Families

Consensus protocols differ mainly in their assumptions about identity, adversaries, and finality.

Family               | Identity model                     | Adversary model                              | Finality
CFT                  | Known, authenticated nodes         | Crash-only faults                            | Deterministic
BFT                  | Known nodes, possibly adversarial  | Arbitrary or malicious faults                | Deterministic under partial synchrony
Nakamoto-style (PoW) | Open participation                 | Rational or Byzantine under work assumptions | Probabilistic

Why Consensus Matters in Blockchains

  • Trust minimization: replaces centralized intermediaries with protocol rules and economic incentives—participants reach agreement without a central authority.
  • Ledger convergence: consensus ensures all honest nodes agree on a single canonical chain, preventing double-spending or conflicting transaction histories.
  • Finality spectrum: open, permissionless systems (like PoW) achieve probabilistic finality—confidence grows with depth—while permissioned systems can achieve deterministic finality immediately.
  • Intuition: in public chains, branches may temporarily compete; consensus gives the network a rule for deciding which history survives.

[Diagram: a fork after Block 1. Block 2a (Alice -> Carol) accumulates confirmations (risk falls with depth) while Block 2b (Alice -> Dave) is orphaned, invalidating the double-spend. Consensus selects the heaviest (most-work) chain; forks are pruned and confidence grows with depth.]
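The "confidence grows with depth" claim can be made quantitative. This sketch follows the attacker catch-up calculation in the Bitcoin paper [8]: an attacker controlling fraction q of the hash power tries to overtake a chain that is z blocks ahead.

```python
from math import exp, factorial

# Attacker catch-up probability, following the calculation in the Bitcoin
# paper [8]: attacker with hash-power fraction q vs a z-block lead.

def catch_up_probability(q, z):
    p = 1 - q                 # honest fraction of the work
    lam = z * (q / p)         # expected attacker blocks while honest chain gains z
    prob = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam ** k / factorial(k)
        prob -= poisson * (1 - (q / p) ** (z - k))
    return prob

# Risk drops sharply as confirmations accumulate.
for z in (1, 3, 6):
    print(z, round(catch_up_probability(q=0.1, z=z), 7))
```

With q = 0.1, the probability falls below one in a thousand after six confirmations, which is the quantitative content behind "probabilistic finality."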

References

[1]
M. Fischer, N. Lynch, and M. Paterson, “Impossibility of Distributed Consensus with One Faulty Process,” Journal of the ACM, vol. 32, no. 2, pp. 374–382, 1985, doi: 10.1145/3149.214121.
[2]
D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm,” in 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA: USENIX Association, Jun. 2014, pp. 305–319. Available: https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro
[3]
M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, LA, USA, Feb. 1999, pp. 173–186. Available: http://pmg.csail.mit.edu/papers/osdi99.pdf
[4]
M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” Journal of the ACM, vol. 27, no. 2, pp. 228–234, 1980, doi: 10.1145/322186.322188.
[5]
S. Bano et al., “Consensus in the Age of Blockchains.” 2017. Available: https://arxiv.org/abs/1711.03936
[6]
L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” in ACM Transactions on Programming Languages and Systems, 1982, pp. 382–401. Available: https://nakamotoinstitute.org/static/docs/the-byzantine-generals-problem.pdf
[7]
C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the presence of partial synchrony,” J. ACM, vol. 35, no. 2, pp. 288–323, Apr. 1988, doi: 10.1145/42282.42283.
[8]
S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System.” Satoshi Nakamoto Institute, Oct. 31, 2008. Accessed: Sep. 12, 2025. [Online]. Available: https://cdn.nakamotoinstitute.org/docs/bitcoin.pdf
[9]
J. Garay, A. Kiayias, and N. Leonardos, “The Bitcoin Backbone Protocol: Analysis and Applications.” Accessed: Oct. 21, 2025. [Online]. Available: https://eprint.iacr.org/2014/765