University of Cambridge
Computer Laboratory • Department of Computer Science and Technology
Stochastic Consensus: Mitigating Large Language Model Hallucinations in Software Generation via Formal Constraint Injection and N-Version Runtime Arbitration
Proposed Research Programme
Suitable for: Systems Research Group • Programming, Logic, and Semantics Group
Abstract
This research proposes a novel methodology for achieving mission-critical software reliability by integrating Generative AI, the Object Constraint Language (OCL) and N-Version Programming into a unified, self-healing system. The central hypothesis posits that semantic independence between implementations in heterogeneous languages (C, Java, Python) creates a probabilistic filter capable of detecting and correcting logic errors that remain invisible to single-path code generation. By using Large Language Models to generate multiple distinct implementations from a formally specified contract, and arbitrating their outputs through majority voting, we aim to achieve reliability guarantees previously reserved for the aerospace and nuclear industries, at a fraction of the cost of traditional hand-built N-version development.
Research Questions
RQ1: Constraint Efficacy
Can formal OCL constraints injected into LLM prompts measurably reduce, and in the limit eliminate, semantic hallucinations in generated software?
RQ2: Semantic Independence
To what degree does implementation in heterogeneous languages (compiled vs. interpreted, statically vs. dynamically typed) reduce correlated failures?
RQ3: Self-Healing Convergence
What is the convergence velocity of the recursive feedback loop and under what conditions does consensus become unattainable?
RQ4: Economic Viability
At what point does the computational overhead of N-version execution become economically justified by the reliability gains?
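As a concrete illustration of the constraint-injection step behind RQ1, the sketch below embeds an OCL invariant and precondition into a code-generation prompt. The `Account` contract and the `build_prompt` helper are hypothetical illustrations, not part of any existing system; the constraints themselves use standard OCL syntax.

```python
# Sketch: injecting formal OCL constraints into an LLM code-generation prompt.
# The Account contract and build_prompt helper are illustrative assumptions;
# the constraint text itself is standard OCL.

OCL_CONSTRAINT = (
    "context Account\n"
    "inv NonNegativeBalance: self.balance >= 0\n"
    "\n"
    "context Account::withdraw(amount: Real)\n"
    "pre: amount <= self.balance"
)

def build_prompt(intent: str, constraint: str, language: str) -> str:
    """Combine natural-language intent with formal OCL constraints."""
    return (
        f"Implement the following in {language}.\n"
        f"Intent: {intent}\n"
        f"The implementation MUST satisfy these OCL constraints:\n"
        f"{constraint}\n"
        f"Reject any behaviour that would violate an invariant or precondition."
    )

prompt = build_prompt(
    intent="A bank account supporting deposit and withdraw operations.",
    constraint=OCL_CONSTRAINT,
    language="Python",
)
print(prompt)
```

The same prompt template would be reused per target language (C, Java, Python), with only the `language` argument varying, so that every version is generated against an identical formal contract.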
Central Hypothesis
“Semantic independence between programming languages creates a probabilistic filter that catches logic bugs which are otherwise invisible to single-path generation. The consensus of independent implementations, constrained by formal specification, converges upon correct behaviour with quantifiable reliability guarantees.”
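The filtering effect claimed by the hypothesis can be made concrete with a minimal worked example, under the strong assumption that the versions fail independently: if each of three implementations is correct with probability p = 0.9, a strict 2-of-3 majority vote is correct whenever at least two versions agree on the right answer.

```python
# Sketch: reliability of strict-majority voting, assuming each of n versions
# is independently correct with probability p (an idealised assumption that
# the research itself would need to validate empirically).
from math import comb

def majority_reliability(p: float, n: int = 3) -> float:
    """Probability that a strict majority of n independent versions is correct."""
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

# For n = 3, p = 0.9: 3 * 0.9^2 * 0.1 + 0.9^3 = 0.243 + 0.729
print(round(majority_reliability(0.9), 3))  # 0.972
```

Three 90%-reliable versions thus vote their way to 97.2% reliability; correlated failures between versions would erode this gain, which is precisely what RQ2 measures.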
Alignment with Cambridge Research Strengths
Formal Methods
Cambridge has a long and distinguished history in formal verification. This research bridges Natural Language Processing (probabilistic) with Formal Logic (discrete) through OCL.
Security Research
The “Triumvirate” architecture pattern provides defence-in-depth against both accidental bugs and adversarial manipulation of AI-generated code.
Systems Research
This proposes a new runtime architecture involving containerisation, deterministic compilation and distributed consensus algorithms applied to code output.
AI Safety
This is not about using AI blindly—it is about building Control Theory for AI: systems that verify, validate and correct AI output in real-time.
Proposed Methodology
System Implementation
Build the Parallax runtime: orchestrator, sandboxed execution environments, consensus engine and observability stack (ELK/Grafana).
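A minimal sketch of the consensus engine's arbitration step, assuming each sandboxed version has already produced a canonicalised output string; `vote` is a hypothetical name for illustration, not an existing Parallax API.

```python
# Sketch: majority-vote arbitration over N version outputs. Outputs are assumed
# to be canonicalised (e.g. serialised JSON) so string equality implies agreement.
from collections import Counter
from typing import Optional, Sequence

def vote(outputs: Sequence[str]) -> Optional[str]:
    """Return the strict-majority output, or None when consensus is unattainable."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

# Three heterogeneous implementations, one hallucinated result:
print(vote(["42", "42", "41"]))  # 42 -- the majority masks the faulty version
print(vote(["42", "41", "40"]))  # None -- no consensus; trigger regeneration
```

A `None` result is what feeds the recursive feedback loop: the disagreeing outputs and the original contract are returned to the generator for another attempt.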
Large-Scale Generation
Generate 10,000+ programs across diverse problem domains using multiple LLM providers, capturing all execution metrics and voting outcomes.
Statistical Analysis
Graph the “Hallucination Decay Rate”, measure convergence velocity and quantify the reliability improvements across different constraint configurations.
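One way to operationalise the "Hallucination Decay Rate" (the precise metric definition below is an illustrative assumption, not fixed by this proposal): treat the fraction of non-consensus outputs per feedback iteration as a geometric decay and estimate its per-iteration ratio.

```python
# Sketch: estimating a per-iteration decay ratio from hallucination fractions.
# Modelling the non-consensus fraction as geometric decay is an illustrative
# assumption; the real analysis would fit and validate a model against data.

def decay_ratio(fractions: list[float]) -> float:
    """Geometric-mean ratio between successive hallucination fractions."""
    if len(fractions) < 2:
        raise ValueError("need at least two iterations to estimate a ratio")
    ratios = [b / a for a, b in zip(fractions, fractions[1:]) if a > 0]
    product = 1.0
    for r in ratios:
        product *= r
    return product ** (1 / len(ratios))

# Hypothetical measurements: non-consensus fraction after each repair iteration.
fractions = [0.32, 0.16, 0.08, 0.04]
print(round(decay_ratio(fractions), 6))  # 0.5 -- hallucinations halve each round
```

A ratio well below 1 indicates convergence; a ratio near or above 1 flags the RQ3 regime in which consensus becomes unattainable.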
Mathematical Proof
Develop formal proofs establishing the conditions under which consensus holds, and derive bounds on the resulting reliability guarantees.
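The general form of the bound targeted here can be sketched as follows, assuming failure independence across versions (an assumption the programme must itself validate empirically): if each of N versions is independently correct with probability p, the probability that a strict majority is correct is

```latex
P_{\text{consensus}}(N, p) \;=\; \sum_{k = \lceil (N+1)/2 \rceil}^{N} \binom{N}{k} \, p^{k} (1 - p)^{N - k}.
```

For p > 1/2 this quantity increases with N (as in Condorcet's jury theorem), so reliability can in principle be driven arbitrarily close to 1 by adding independent versions; correlated failures between versions weaken the bound, which is why RQ2's independence measurements are essential to the proof effort.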
Expected Contributions
- Novel Methodology: The Parallax Protocol as a replicable framework for consensus-driven development.
- OCL-P Specification Language: A structured schema combining natural language intent with formal OCL constraints.
- Quantified Reliability Metrics: Empirical data on hallucination rates, convergence velocity and reliability bounds.
- Open-Source Implementation: Reference implementation enabling further research and industrial adoption.
The Paradigm Shift
This research represents a fundamental shift in software development philosophy: