Consensus in Distributed Systems & Blockchains — An Introduction

Section IV: Consensus Mechanisms

Army Cyber Institute

April 9, 2026

Why Agreement Matters

  • In the previous lesson, we looked at ledgers in both permissionless and permissioned settings.
  • A ledger by itself does not solve the “which history do we all accept?” problem.
  • Some situations need system-wide agreement:
    • Multiple nodes must agree which transaction happened first
    • A double-spend attempt creates competing histories
    • A failed or slow node cannot be allowed to define the ledger alone
  • Some situations do not:
    • A single user reading a local copy of the ledger
    • A node validating a signature or hash on its own
  • In distributed systems, this problem of reaching agreement is called consensus.

Foundations of Consensus

Consensus definition
Each process proposes a value; the protocol must guarantee that all non-faulty processes eventually agree on a single value that was proposed. (Fischer, Lynch & Paterson 1985)
  • Consensus algorithms provide formal guarantees for agreement in decentralized systems.
  • We will use four core properties throughout the lesson:
    • Safety: no conflicting commits or later rollbacks.
    • Liveness: the system eventually makes forward progress.
    • Validity: only rule-following values are committed.
    • Agreement: correct nodes converge on the same result.
  • Those four are often abbreviated as SLVA.

Consensus in the Real World

Real-world example      | Problem solved                  | Consensus mechanism
Basketball shot clock   | Prevents endless delay          | Timeout
Jury supermajority vote | Prevents conflicting outcomes   | Quorum threshold
Chair or moderator      | Coordinates who speaks and when | Leader / proposer
  • We already use structured processes to reach group agreement.
  • Consensus protocols formalize those same ideas for machines operating under faults and delay.

Consensus Properties: SLVA and FPIC

Two vocabularies are commonly used to describe closely related correctness goals.

SLVA      | Meaning                                 | FPIC
Safety    | No conflicting commit or later rollback | Finality
Liveness  | Eventually makes forward progress       | Progress
Validity  | Only rule-following values commit       | Integrity
Agreement | Correct nodes converge on one result    | Consistency

SLVA is the academic vocabulary (FLP, Paxos, Raft). Blockchain documentation uses the same ideas under different names. Both appear in your readings.

Failure Models

Consensus protocols do not prevent failures; they manage them. To design robust protocols, we must first classify the kinds of failures we expect nodes and networks to exhibit.

  • Crash faults
    • A node stops responding or goes offline
    • Example: a replica process crashes and stops sending messages
  • Omission faults
    • A node or network path drops some messages but not all
    • Example: a firewall or overloaded link silently drops packets
  • Byzantine faults
    • A node can send false, conflicting, or malicious messages
    • Example: a compromised replica tells different peers different stories
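The three failure models can be sketched as replica behaviors. A toy illustration in Python (all names and values are hypothetical, not from any real protocol):

```python
# Toy sketch of the three failure models (all names/values hypothetical).
# Each function models how a replica responds when asked to report a value.

def crash_replica(value, alive):
    """Crash fault: answers correctly, or has stopped responding entirely."""
    return value if alive else None      # None = no response at all

def omission_replica(value, drop):
    """Omission fault: some messages are silently dropped, others get through."""
    return None if drop else value

def byzantine_replica(value, peer_id):
    """Byzantine fault: may tell different peers different stories."""
    return value if peer_id % 2 == 0 else f"conflicting-{value}"

print(crash_replica("tx-42", alive=False))     # None
print(omission_replica("tx-42", drop=True))    # None
print(byzantine_replica("tx-42", peer_id=1))   # conflicting-tx-42
```

Note the escalation: a crash is total silence, an omission is partial silence, and a Byzantine node actively equivocates.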

Timing Models on a Spectrum

Failure types alone don’t determine correctness—our assumptions about message timing are equally critical.

  • Asynchronous: No known bound on message or processing delay
    • Example: the network is so unpredictable that delay and failure look the same
  • Partially synchronous: The network may be unstable for a while, but eventually delays become bounded
    • Example: a congested network eventually settles down enough for timeouts to work
  • Synchronous: Message and processing delays are bounded and known ahead of time
    • Example: a tightly controlled lab network with fixed timing assumptions
[Spectrum: Asynchronous -> Partially Synchronous -> Synchronous (less predictable timing -> more predictable timing)]
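A toy sketch of why the timing model matters, with illustrative numbers: a fixed timeout misfires while delays are unbounded, but becomes trustworthy once delays settle under a known bound.

```python
# Toy sketch with illustrative numbers: a timeout-based failure detector
# is only trustworthy once message delays are bounded.

def suspect_failed(last_heard_ms, now_ms, timeout_ms):
    """A node suspects a peer after waiting longer than the timeout."""
    return now_ms - last_heard_ms > timeout_ms

# Asynchronous period: a 900 ms delay is indistinguishable from a crash,
# so a 500 ms timeout wrongly suspects a live-but-slow peer.
print(suspect_failed(last_heard_ms=0, now_ms=900, timeout_ms=500))    # True

# After stabilization (delays now bounded well under the timeout),
# the same rule no longer produces false suspicions.
print(suspect_failed(last_heard_ms=800, now_ms=900, timeout_ms=500))  # False
```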

The Timing Limits — FLP and DLS Results

  • FLP (Fischer, Lynch, Paterson 1985): in a fully asynchronous system, a deterministic protocol cannot guarantee both safety and liveness if even one node may fail.
    • If a message can be delayed forever, the rest of the system cannot tell whether a node is dead or just slow
  • DLS (Dwork, Lynch, Stockmeyer 1988): consensus becomes practical if the network eventually behaves well enough that timeouts start meaning something again.
    • This is the usual “partial synchrony” assumption behind practical BFT protocols
  • Two practical implications:
    • When timing is unstable, protocols prioritize safety
    • Once timing stabilizes, timeouts and view changes help recover liveness

Practical Consensus Design

  • If we cannot assume perfect timing, but still want safety under malicious behavior, what should a practical protocol do?
  • Practical protocols answer with four recurring ideas:
    • Structured rounds: break progress into named steps so everyone can tell which proposal and evidence belong together
    • Quorum overlap: require enough votes that two conflicting decisions cannot both gather support without sharing an honest node
    • Leader-based coordination: give one node the job of proposing the next candidate value so the protocol does not descend into chaos
    • Timeouts and view changes: detect stalls, rotate leadership, and help the protocol recover liveness once timing becomes predictable enough again
  • Together, these ideas let practical BFT protocols protect safety first and recover liveness when the network settles down.

Rounds and Phases

  • Consensus often proceeds in rounds.
  • Each round has a leader or proposer responsible for putting forward the next candidate value.
  • The three phases of a round are:
    • Propose: the leader suggests a candidate value
    • Vote: replicas or validators sign support for that value
    • Commit: enough matching votes produce a quorum certificate and allow finalization
  • The quorum certificate is the portable proof that the round gathered enough support to move forward safely.

[Diagram: Propose -> Vote -> Commit -> Quorum Certificate]
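The propose/vote/commit phases above can be sketched as a single round. This is a simplified illustration, not a real protocol; `run_round`, the replica ids, and the quorum size are all hypothetical:

```python
# Simplified sketch of one propose/vote/commit round (not a real protocol;
# run_round, the replica ids, and the quorum size are illustrative).

def accepts(replica_id, value):
    """Validity check stub: a real replica verifies protocol rules here."""
    return True

def run_round(leader_value, replicas, quorum_size):
    # Propose: the leader suggests a candidate value.
    proposal = leader_value
    # Vote: each replica that accepts the proposal signs support for it
    # (modeled here as a plain (id, value) pair instead of a signature).
    votes = [(r, proposal) for r in replicas if accepts(r, proposal)]
    # Commit: enough matching votes form a quorum certificate.
    if len(votes) >= quorum_size:
        return {"value": proposal, "votes": votes}   # the quorum certificate
    return None   # round stalls; a timeout and view change would follow

qc = run_round("block-7", replicas=["A", "B", "C", "D"], quorum_size=3)
print(qc["value"], len(qc["votes"]))   # block-7 4
```

The returned dictionary plays the role of the quorum certificate: portable evidence that enough replicas supported the value.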

Traditional BFT-Style Finality

  • In classical BFT protocols, finality is deterministic and immediate: once a quorum certificate is formed for a value at a slot, that slot is final and cannot be rolled back.
  • Once a quorum certificate is formed for a value, no conflicting value at that same position can gather a valid certificate of its own.
  • Even if Byzantine nodes vote twice, the quorum overlap still includes honest nodes that carry forward the prior safe history.
  • Those honest overlap nodes will not sign a conflicting commit.

Key idea: a quorum certificate is not just evidence that “many nodes agreed”; it is the reason a conflicting commit should no longer be possible for that slot or height.

Quorums

  • Quorums are the safety mechanism behind BFT finality.
  • For a network with total nodes n, the largest Byzantine fault set we can tolerate is:
    • t = floor((n - 1) / 3)
  • In the tight case n = 3t + 1, the corresponding quorum size is:
    • 2t + 1
  • That quorum size guarantees overlapping membership between any two valid quorums, so conflicting commits cannot both collect enough support.

Key idea: any two valid quorums must overlap in at least one honest node, and that overlap is what protects safety.

[Diagram: quorum intersection over n = 7 nodes (honest H1–H5, Byzantine B1–B2). Quorum Q1 = {H1, H2, H3, B1, B2} and quorum Q2 = {H3, H4, H5, B1, B2}, each of size 5, overlap in {H3, B1, B2}; the honest node H3 in the overlap protects safety.]
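The quorum arithmetic can be checked directly. This sketch assumes the tight case n = 3t + 1:

```python
# Checking the quorum arithmetic in the tight case n = 3t + 1.

def max_byzantine(n):
    """Largest tolerable Byzantine fault set: t = floor((n - 1) / 3)."""
    return (n - 1) // 3

def quorum_size(n):
    """Quorum of 2t + 1 votes (assumes n = 3t + 1)."""
    return 2 * max_byzantine(n) + 1

def min_overlap(n):
    """Any two quorums of size q drawn from n nodes share >= 2q - n members."""
    return 2 * quorum_size(n) - n

n = 7   # t = 2, quorum size = 5
print(max_byzantine(n), quorum_size(n), min_overlap(n))   # 2 5 3
# The overlap (3) exceeds t (2), so at least one overlapping node is honest,
# and that honest node refuses to sign two conflicting commits.
assert min_overlap(n) > max_byzantine(n)
```

In general, 2(2t + 1) − (3t + 1) = t + 1: the overlap always exceeds the number of Byzantine nodes by one, which is exactly the honest node safety depends on.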

Timeouts and View Changes

  • A leader is the node responsible for proposing the next candidate value in a round.
  • If the leader is slow, faulty, or malicious, the round can stall.
  • Nodes use timeouts to decide that they have waited long enough.
  • When enough nodes time out, they move to a new view with a new leader.
  • This is how practical BFT protocols recover liveness once the network becomes timely enough again.

Key idea: timeouts and leader rotation preserve liveness once timing stabilizes.
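A minimal sketch of timeout-driven leader rotation. The t + 1 complaint threshold is an illustrative rule (enough that at least one complaint comes from an honest node); real protocols such as PBFT exchange additional view-change messages before installing a new view.

```python
# Simplified sketch of timeout-driven leader rotation. The t + 1 complaint
# threshold is illustrative; real protocols such as PBFT exchange additional
# view-change messages before installing a new view.

def next_view(view, timed_out_nodes, t):
    # Require more than t complaints so a view change cannot be triggered
    # solely by Byzantine nodes crying wolf.
    if len(timed_out_nodes) >= t + 1:
        return view + 1
    return view

def leader_for(view, nodes):
    """Round-robin rotation: the view number selects the leader."""
    return nodes[view % len(nodes)]

nodes = ["A", "B", "C", "D"]   # n = 4 tolerates t = 1 Byzantine node
view = 0
print(leader_for(view, nodes))                           # A
view = next_view(view, timed_out_nodes={"B", "C"}, t=1)
print(leader_for(view, nodes))                           # B
```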

Crash-Fault Tolerance (CFT)

  • Identity model: known, authenticated participants
  • Sybil resistance: not applicable. A Sybil attack occurs when an attacker creates many fake identities to gain disproportionate influence; CFT assumes participants are known and pre-authenticated, so fake identities cannot enter.
  • Adversary assumption: failures are crashes, restarts, or silence, not deceit
  • Finality: deterministic once the protocol commits
  • Environment: internal clusters or tightly controlled distributed systems
  • Trade-offs: efficient and simple, but not appropriate when participants may equivocate or collude

Byzantine-Fault Tolerance (BFT)

  • Identity model: known participants with governed membership
  • Sybil resistance: handled through admission, authorization, and governance outside the voting protocol itself
  • Adversary assumption: some participants may lie, equivocate, or act maliciously
  • Finality: deterministic under the protocol’s fault and timing assumptions
  • Environment: consortium or permissioned systems where membership is controlled but trust is incomplete
  • Trade-offs: strong safety and fast finality, but higher coordination and messaging cost

Nakamoto (Proof-of-Work) Consensus

  • Identity model: open participation
  • Sybil resistance: scarce computational work makes influence expensive
    • Asymmetric difficulty: producing valid work is expensive, but verifying it is cheap
  • Adversary assumption: the protocol remains secure as long as no one controls a majority of the work
  • Finality: probabilistic rather than immediate
  • Environment: permissionless networks where nodes can appear, disappear, and remain anonymous
  • Trade-offs: global openness and censorship resistance, but with higher latency and energy cost
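The asymmetric-difficulty idea can be sketched with a toy proof-of-work loop. The difficulty here is tiny and illustrative; real networks use far larger targets and a full block header:

```python
import hashlib

# Toy sketch of proof-of-work's asymmetry: finding a valid nonce takes many
# hash attempts on average, but verifying a claimed nonce takes exactly one.

def verify(header, nonce, difficulty_bits):
    """Cheap: a single hash checks the claimed work."""
    digest = hashlib.sha256(f"{header}:{nonce}".encode()).hexdigest()
    return int(digest, 16) < 2 ** (256 - difficulty_bits)

def mine(header, difficulty_bits):
    """Expensive: brute-force search for a nonce that meets the target."""
    nonce = 0
    while not verify(header, nonce, difficulty_bits):
        nonce += 1
    return nonce

nonce = mine("block-7", difficulty_bits=16)   # ~65,000 attempts on average
assert verify("block-7", nonce, 16)           # confirmed with one hash
```

Producing influence costs hash attempts, so an attacker cannot cheaply manufacture identities or votes; verification stays cheap for everyone.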

Comparing Consensus Families

Consensus protocols differ mainly in their assumptions about identity, adversaries, and finality.

Family               | Identity model                     | Adversary model                              | Finality
CFT                  | Known, authenticated nodes         | Crash-only faults                            | Deterministic
BFT                  | Known nodes, possibly adversarial  | Arbitrary or malicious faults                | Deterministic under partial synchrony
Nakamoto-style (PoW) | Open participation                 | Rational or Byzantine under work assumptions | Probabilistic

Why Consensus Matters in Blockchains

  • Trust minimization: replaces centralized intermediaries with protocol rules and economic incentives—participants reach agreement without a central authority.
  • Ledger convergence: consensus ensures all honest nodes agree on a single canonical chain, preventing double-spending or conflicting transaction histories.
  • Finality spectrum: open, permissionless systems (like PoW) achieve probabilistic finality—confidence grows with depth—while permissioned systems can achieve deterministic finality immediately.
  • Intuition: in public chains, branches may temporarily compete; consensus gives the network a rule for deciding which history survives.

[Diagram: a fork after Block 1. Block 2a (Alice -> Carol) accumulates confirmations (risk falls with depth) while Block 2b (Alice -> Dave) is orphaned, invalidating the double-spend. Consensus selects the heaviest (most-work) chain; forks are pruned and confidence grows with depth.]
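The "confidence grows with depth" claim can be made quantitative. This sketch follows the attacker catch-up calculation in the Bitcoin paper [8]: an attacker controlling fraction q of the hash power tries to overtake a chain that is z blocks ahead.

```python
from math import exp, factorial

# Attacker catch-up probability, following the calculation in the Bitcoin
# paper [8]: attacker with hash-power fraction q vs a z-block lead.

def catch_up_probability(q, z):
    p = 1 - q                 # honest fraction of the work
    lam = z * (q / p)         # expected attacker blocks while honest chain gains z
    prob = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam ** k / factorial(k)
        prob -= poisson * (1 - (q / p) ** (z - k))
    return prob

# Risk drops sharply as confirmations accumulate.
for z in (1, 3, 6):
    print(z, round(catch_up_probability(q=0.1, z=z), 7))
```

With q = 0.1, the probability falls below one in a thousand after six confirmations, which is the quantitative content behind "probabilistic finality."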

References

[1]
M. Fischer, N. Lynch, and M. Paterson, “Impossibility of Distributed Consensus with One Faulty Process,” Journal of the ACM, vol. 32, no. 2, pp. 374–382, 1985, doi: 10.1145/3149.214121.
[2]
D. Ongaro and J. Ousterhout, “In Search of an Understandable Consensus Algorithm,” in 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA: USENIX Association, Jun. 2014, pp. 305–319. Available: https://www.usenix.org/conference/atc14/technical-sessions/presentation/ongaro
[3]
M. Castro and B. Liskov, “Practical Byzantine Fault Tolerance,” in Proceedings of the Third Symposium on Operating Systems Design and Implementation, New Orleans, LA, USA, Feb. 1999, pp. 173–186. Available: http://pmg.csail.mit.edu/papers/osdi99.pdf
[4]
M. Pease, R. Shostak, and L. Lamport, “Reaching Agreement in the Presence of Faults,” Journal of the ACM, vol. 27, no. 2, pp. 228–234, 1980, doi: 10.1145/322186.322188.
[5]
S. Bano et al., “Consensus in the Age of Blockchains.” 2017. Available: https://arxiv.org/abs/1711.03936
[6]
L. Lamport, R. Shostak, and M. Pease, “The Byzantine Generals Problem,” in ACM Transactions on Programming Languages and Systems, 1982, pp. 382–401. Available: https://nakamotoinstitute.org/static/docs/the-byzantine-generals-problem.pdf
[7]
C. Dwork, N. Lynch, and L. Stockmeyer, “Consensus in the presence of partial synchrony,” J. ACM, vol. 35, no. 2, pp. 288–323, Apr. 1988, doi: 10.1145/42282.42283.
[8]
S. Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System.” Satoshi Nakamoto Institute, Oct. 31, 2008. Accessed: Sep. 12, 2025. [Online]. Available: https://cdn.nakamotoinstitute.org/docs/bitcoin.pdf
[9]
J. Garay, A. Kiayias, and N. Leonardos, “The Bitcoin Backbone Protocol: Analysis and Applications.” Accessed: Oct. 21, 2025. [Online]. Available: https://eprint.iacr.org/2014/765