Glassnest

Transparent AI from first principles
Recursive Competing Units for Interpretable Language Modelling

The Problem

Transformers are black boxes. Attention weights don't tell you what the model is actually doing. Post-hoc interpretability is reverse-engineering.
Regulation is coming. The EU AI Act and similar legislation will require explainability. Current architectures can't provide it.
Parameter inefficiency. Transformers need billions of parameters. Most of that capacity is redundant — the same features stored many times across different attention heads.
O(N²) scaling. Self-attention scales quadratically with sequence length, limiting context and driving up compute costs.

The Approach

Recursive competing units. Instead of attention layers, the model is a tree of specialists that compete on prediction error. The best predictor wins — and you can see who won.
Three primitives: prediction, error, recursion. Everything else — gating, routing, specialisation — emerges from these.
Transparency by construction. At every character, we know which group won, which cell within it, and how surprised the model was. No post-hoc analysis needed.
[Architecture diagram: a router sits above four competing groups (Group 0 through Group 3); each group contains three competing cells.]
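To illustrate the competition step described above, here is a minimal sketch matching the diagram: four groups of three cells, every cell predicts the next character, the cell with the lowest cross-entropy wins, and the winning group, cell, and surprise are recorded at each step. The names (Cell, step), the sizes, and the tiny recurrent predictor are hypothetical choices for this sketch, not the Glassnest implementation.

```python
# Minimal sketch of error-based competition among cells.
# Everything here (Cell, step, the vocab/state sizes) is hypothetical,
# chosen only to illustrate winner-take-all prediction with a visible trace.
import numpy as np

VOCAB = 65   # assumed character vocabulary size
STATE = 32   # assumed per-cell hidden-state size

class Cell:
    """One competing specialist: a tiny recurrent next-character predictor."""
    def __init__(self, rng):
        self.W_in = rng.normal(0.0, 0.1, (STATE, VOCAB))   # char  -> state
        self.W_rec = rng.normal(0.0, 0.1, (STATE, STATE))  # state -> state
        self.W_out = rng.normal(0.0, 0.1, (VOCAB, STATE))  # state -> logits
        self.h = np.zeros(STATE)

    def predict(self, char_id):
        """Advance this cell's state on the current character; return next-char probabilities."""
        x = np.zeros(VOCAB)
        x[char_id] = 1.0
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        logits = self.W_out @ self.h
        p = np.exp(logits - logits.max())
        return p / p.sum()

def step(groups, char_id, next_id):
    """One constant-cost step: every cell predicts, the lowest-error cell wins,
    and the winning group, cell, and surprise are returned as the explanation."""
    candidates = []
    for g, cells in enumerate(groups):
        for c, cell in enumerate(cells):
            p = cell.predict(char_id)
            surprise = -np.log(p[next_id] + 1e-9)   # per-cell prediction error
            candidates.append((surprise, g, c))
    surprise, group, cell = min(candidates, key=lambda t: t[0])
    return {"group": group, "cell": cell, "surprise": surprise}

rng = np.random.default_rng(0)
groups = [[Cell(rng) for _ in range(3)] for _ in range(4)]   # 4 groups x 3 cells
trace = step(groups, char_id=7, next_id=8)
print(f"winner: group {trace['group']}, cell {trace['cell']}, "
      f"surprise {trace['surprise']:.2f} nats")
```

In this sketch, each step touches a fixed number of cells and carries only fixed-size state, so processing a sequence costs O(N) in its length and the routing decision doubles as the explanation.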

Results

Cross-entropy (lower is better):
~100K parameters: Glassnest 1.598 vs Transformer 1.699 (0.10 lower)
~1M parameters: Glassnest 1.416 vs Transformer 1.631 (0.21 lower)
The gap widens with scale: more parameters, greater advantage.
Sample generations at 1M parameters:
Glassnest: O, thou shalt way, with grief, and theressely. HENRY BOLINGBROKE: Why, and yourself him for then, I prove him...
Transformer: O, the rard; or the comfort; see that the ward see would scatch her dead? LADY CAPULET: Good goodven, that in my me...

Architecture Advantages

Transparent
At every token, see which specialist won and how confident the model was. Transparency is structural, not bolted on.
Parameter Efficient
Beats transformers with the same parameter budget. Competing units avoid the redundancy of multi-head attention.
O(N) Scaling
Linear scaling with sequence length. No quadratic attention bottleneck — longer contexts at lower cost.
Recursive Grammar
The tree structure is also a communication protocol. Models can share information by exchanging consensus/entropy signals, a grammar for inter-model communication (see the sketch at the end of this section).
Tractable Parallelisation
The recursive structure lets many smaller models run in parallel across separate devices, rather than one monolithic model that demands ever-larger GPUs. This opens a path to distributed inference on commodity hardware.
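To make the Recursive Grammar point concrete, here is a hypothetical sketch of what a consensus/entropy signal could look like: the sender summarises one step as (winning group, predictive entropy), and a receiver turns that into a bias over its own groups. The message shape, the make_signal and routing_bias functions, and the weighting are assumptions for illustration only, not the Glassnest protocol.

```python
# Hypothetical inter-model signal: (winning group, predictive entropy).
# The message format and the receiver-side weighting are assumptions made
# for this sketch, not a specification of the Glassnest grammar.
import numpy as np

def make_signal(winning_group: int, probs: np.ndarray) -> dict:
    """Sender side: compact per-step message saying who won and how uncertain the model is."""
    entropy = float(-(probs * np.log(probs + 1e-9)).sum())
    return {"group": winning_group, "entropy": entropy}

def routing_bias(signal: dict, n_groups: int, strength: float = 0.5) -> np.ndarray:
    """Receiver side: favour the sender's winning group more when the sender was confident (low entropy)."""
    bias = np.zeros(n_groups)
    bias[signal["group"]] = strength / (1.0 + signal["entropy"])
    return bias

probs = np.array([0.7, 0.1, 0.1, 0.1])              # toy predictive distribution
signal = make_signal(winning_group=2, probs=probs)
print(signal)                                        # group 2, entropy ~0.94 nats
print(routing_bias(signal, n_groups=4))              # bias favouring group 2
```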

To Explore

Research

  • Scale up — larger datasets and model sizes to test scaling laws
  • Improve efficiency — optimise training speed and inference throughput
  • From transparency to interpretability — map specialist groups to semantic meaning
  • Inter-model communication — proof of concept for models sharing state via the recursive grammar

Business

  • Convert patents — pending provisional patents to full filings
  • Publish research — establish academic credibility and prior art
  • License architecture — to organisations needing explainable AI for regulatory compliance
