University of Cambridge
Computer Laboratory • Department of Computer Science and Technology
Stochastic Consensus: Mitigating Large Language Model Hallucinations in Software Generation via Formal Constraint Injection and N-Version Runtime Arbitration
Proposed Research Programme
Suitable for: Systems Research Group • Programming, Logic, and Semantics Group
Abstract
This research proposes a novel methodology for achieving mission-critical software reliability by integrating Generative AI, the Object Constraint Language (OCL) and N-Version Programming into a unified, self-healing system. The central hypothesis posits that semantic independence between implementations in heterogeneous languages (C, Java, Python) creates a probabilistic filter capable of detecting and correcting logic errors that remain invisible to single-path code generation. By using Large Language Models to generate multiple distinct implementations from a formally specified contract, and arbitrating their outputs through majority voting, we aim to achieve reliability guarantees previously reserved for the aerospace and nuclear industries, at a fraction of the cost of traditional hand-built N-version development.
Research Questions
RQ1: Constraint Efficacy
Can formal OCL constraints injected into LLM prompts measurably reduce, and in the limit eliminate, semantic hallucinations in generated software?
RQ2: Semantic Independence
To what degree does implementation in heterogeneous languages (compiled vs. interpreted, statically vs. dynamically typed) reduce correlated failures?
RQ3: Self-Healing Convergence
What is the convergence velocity of the recursive feedback loop and under what conditions does consensus become unattainable?
RQ4: Economic Viability
At what point does the computational overhead of N-version execution become economically justified by the reliability gains?
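As a concrete illustration of the constraint-injection step behind RQ1, the sketch below embeds an OCL invariant and precondition into a code-generation prompt. The `Account` contract and the `build_prompt` helper are hypothetical illustrations, not part of any existing system; the constraints themselves use standard OCL syntax.

```python
# Sketch: injecting formal OCL constraints into an LLM code-generation prompt.
# The Account contract and build_prompt helper are illustrative assumptions;
# the constraint text itself is standard OCL.

OCL_CONSTRAINT = (
    "context Account\n"
    "inv NonNegativeBalance: self.balance >= 0\n"
    "\n"
    "context Account::withdraw(amount: Real)\n"
    "pre: amount <= self.balance"
)

def build_prompt(intent: str, constraint: str, language: str) -> str:
    """Combine natural-language intent with formal OCL constraints."""
    return (
        f"Implement the following in {language}.\n"
        f"Intent: {intent}\n"
        f"The implementation MUST satisfy these OCL constraints:\n"
        f"{constraint}\n"
        f"Reject any behaviour that would violate an invariant or precondition."
    )

prompt = build_prompt(
    intent="A bank account supporting deposit and withdraw operations.",
    constraint=OCL_CONSTRAINT,
    language="Python",
)
print(prompt)
```

The same prompt template would be reused per target language (C, Java, Python), with only the `language` argument varying, so that every version is generated against an identical formal contract.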
Central Hypothesis
“Semantic independence between programming languages creates a probabilistic filter that catches logic bugs which are otherwise invisible to single-path generation. The consensus of independent implementations, constrained by formal specification, converges upon correct behaviour with quantifiable reliability guarantees.”
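The filtering effect claimed by the hypothesis can be made concrete with a minimal worked example, under the strong assumption that the versions fail independently: if each of three implementations is correct with probability p = 0.9, a strict 2-of-3 majority vote is correct whenever at least two versions agree on the right answer.

```python
# Sketch: reliability of strict-majority voting, assuming each of n versions
# is independently correct with probability p (an idealised assumption that
# the research itself would need to validate empirically).
from math import comb

def majority_reliability(p: float, n: int = 3) -> float:
    """Probability that a strict majority of n independent versions is correct."""
    need = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))

# For n = 3, p = 0.9: 3 * 0.9^2 * 0.1 + 0.9^3 = 0.243 + 0.729
print(round(majority_reliability(0.9), 3))  # 0.972
```

Three 90%-reliable versions thus vote their way to 97.2% reliability; correlated failures between versions would erode this gain, which is precisely what RQ2 measures.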
Alignment with Cambridge Research Strengths
Formal Methods
Cambridge has a long and distinguished history in formal verification. This research bridges Natural Language Processing (probabilistic) with Formal Logic (discrete) through OCL.
Security Research
The “Triumvirate” architecture pattern provides defence-in-depth against both accidental bugs and adversarial manipulation of AI-generated code.
Systems Research
This proposes a new runtime architecture involving containerisation, deterministic compilation and distributed consensus algorithms applied to code output.
AI Safety
This is not about using AI blindly—it is about building Control Theory for AI: systems that verify, validate and correct AI output in real-time.
Proposed Methodology
System Implementation
Build the Parallax runtime: orchestrator, sandboxed execution environments, consensus engine and observability stack (ELK/Grafana).
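A minimal sketch of the consensus engine's arbitration step, assuming each sandboxed version has already produced a canonicalised output string; `vote` is a hypothetical name for illustration, not an existing Parallax API.

```python
# Sketch: majority-vote arbitration over N version outputs. Outputs are assumed
# to be canonicalised (e.g. serialised JSON) so string equality implies agreement.
from collections import Counter
from typing import Optional, Sequence

def vote(outputs: Sequence[str]) -> Optional[str]:
    """Return the strict-majority output, or None when consensus is unattainable."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None

# Three heterogeneous implementations, one hallucinated result:
print(vote(["42", "42", "41"]))  # 42 -- the majority masks the faulty version
print(vote(["42", "41", "40"]))  # None -- no consensus; trigger regeneration
```

A `None` result is what feeds the recursive feedback loop: the disagreeing outputs and the original contract are returned to the generator for another attempt.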
Large-Scale Generation
Generate 10,000+ programs across diverse problem domains using multiple LLM providers, capturing all execution metrics and voting outcomes.
Statistical Analysis
Graph the “Hallucination Decay Rate”, measure convergence velocity and quantify the reliability improvements across different constraint configurations.
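One way to operationalise the "Hallucination Decay Rate" (the precise metric definition below is an illustrative assumption, not fixed by this proposal): treat the fraction of non-consensus outputs per feedback iteration as a geometric decay and estimate its per-iteration ratio.

```python
# Sketch: estimating a per-iteration decay ratio from hallucination fractions.
# Modelling the non-consensus fraction as geometric decay is an illustrative
# assumption; the real analysis would fit and validate a model against data.

def decay_ratio(fractions: list[float]) -> float:
    """Geometric-mean ratio between successive hallucination fractions."""
    if len(fractions) < 2:
        raise ValueError("need at least two iterations to estimate a ratio")
    ratios = [b / a for a, b in zip(fractions, fractions[1:]) if a > 0]
    product = 1.0
    for r in ratios:
        product *= r
    return product ** (1 / len(ratios))

# Hypothetical measurements: non-consensus fraction after each repair iteration.
fractions = [0.32, 0.16, 0.08, 0.04]
print(round(decay_ratio(fractions), 6))  # 0.5 -- hallucinations halve each round
```

A ratio well below 1 indicates convergence; a ratio near or above 1 flags the RQ3 regime in which consensus becomes unattainable.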
Mathematical Proof
Develop formal proofs establishing the conditions under which consensus holds, and derive bounds on the resulting reliability guarantees.
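The general form of the bound targeted here can be sketched as follows, assuming failure independence across versions (an assumption the programme must itself validate empirically): if each of N versions is independently correct with probability p, the probability that a strict majority is correct is

```latex
P_{\text{consensus}}(N, p) \;=\; \sum_{k = \lceil (N+1)/2 \rceil}^{N} \binom{N}{k} \, p^{k} (1 - p)^{N - k}.
```

For p > 1/2 this quantity increases with N (as in Condorcet's jury theorem), so reliability can in principle be driven arbitrarily close to 1 by adding independent versions; correlated failures between versions weaken the bound, which is why RQ2's independence measurements are essential to the proof effort.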
Expected Contributions
- Novel Methodology: The Parallax Protocol as a replicable framework for consensus-driven development.
- OCL-P Specification Language: A structured schema combining natural language intent with formal OCL constraints.
- Quantified Reliability Metrics: Empirical data on hallucination rates, convergence velocity and reliability bounds.
- Open-Source Implementation: Reference implementation enabling further research and industrial adoption.
The Paradigm Shift
This research represents a fundamental shift in software development philosophy: