Glassnest

Transparent AI from first principles
Recursive Competing Units for Interpretable Language Modelling

The Problem

Transformers are black boxes. Attention weights don't tell you what the model is actually doing. Post-hoc interpretability is reverse-engineering.
Regulation is coming. The EU AI Act and similar legislation will require explainability. Current architectures can't provide it.
Parameter inefficiency. Transformers need billions of parameters. Most of that capacity is redundant — the same features stored many times across different attention heads.
O(N²) scaling. Self-attention scales quadratically with sequence length, limiting context and driving up compute costs.

The Approach

Recursive competing units. Instead of attention layers, the model is a tree of specialists that compete on prediction error. The best predictor wins — and you can see who won.
Three primitives: prediction, error, recursion. Everything else — gating, routing, specialisation — emerges from these.
Transparency by construction. At every character, we know which group won, which cell within it, and how surprised the model was. No post-hoc analysis needed.
[Architecture diagram: a router sits above four competing groups (Group 0 through Group 3); each group contains three competing cells.]
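To illustrate the competition step described above, here is a minimal sketch matching the diagram: four groups of three cells, every cell predicts the next character, the cell with the lowest cross-entropy wins, and the winning group, cell, and surprise are recorded at each step. The names (Cell, step), the sizes, and the tiny recurrent predictor are hypothetical choices for this sketch, not the Glassnest implementation.

```python
# Minimal sketch of error-based competition among cells.
# Everything here (Cell, step, the vocab/state sizes) is hypothetical,
# chosen only to illustrate winner-take-all prediction with a visible trace.
import numpy as np

VOCAB = 65   # assumed character vocabulary size
STATE = 32   # assumed per-cell hidden-state size

class Cell:
    """One competing specialist: a tiny recurrent next-character predictor."""
    def __init__(self, rng):
        self.W_in = rng.normal(0.0, 0.1, (STATE, VOCAB))   # char  -> state
        self.W_rec = rng.normal(0.0, 0.1, (STATE, STATE))  # state -> state
        self.W_out = rng.normal(0.0, 0.1, (VOCAB, STATE))  # state -> logits
        self.h = np.zeros(STATE)

    def predict(self, char_id):
        """Advance this cell's state on the current character; return next-char probabilities."""
        x = np.zeros(VOCAB)
        x[char_id] = 1.0
        self.h = np.tanh(self.W_in @ x + self.W_rec @ self.h)
        logits = self.W_out @ self.h
        p = np.exp(logits - logits.max())
        return p / p.sum()

def step(groups, char_id, next_id):
    """One constant-cost step: every cell predicts, the lowest-error cell wins,
    and the winning group, cell, and surprise are returned as the explanation."""
    candidates = []
    for g, cells in enumerate(groups):
        for c, cell in enumerate(cells):
            p = cell.predict(char_id)
            surprise = -np.log(p[next_id] + 1e-9)   # per-cell prediction error
            candidates.append((surprise, g, c))
    surprise, group, cell = min(candidates, key=lambda t: t[0])
    return {"group": group, "cell": cell, "surprise": surprise}

rng = np.random.default_rng(0)
groups = [[Cell(rng) for _ in range(3)] for _ in range(4)]   # 4 groups x 3 cells
trace = step(groups, char_id=7, next_id=8)
print(f"winner: group {trace['group']}, cell {trace['cell']}, "
      f"surprise {trace['surprise']:.2f} nats")
```

In this sketch, each step touches a fixed number of cells and carries only fixed-size state, so processing a sequence costs O(N) in its length and the routing decision doubles as the explanation.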

Results

Cross-entropy (lower is better):
~100K parameters: Glassnest 1.598 vs Transformer 1.699 (0.10 lower)
~1M parameters: Glassnest 1.416 vs Transformer 1.631 (0.21 lower)
The gap widens with scale: more parameters, greater advantage.
Sample generations at 1M parameters:
Glassnest: O, thou shalt way, with grief, and theressely. HENRY BOLINGBROKE: Why, and yourself him for then, I prove him...
Transformer: O, the rard; or the comfort; see that the ward see would scatch her dead? LADY CAPULET: Good goodven, that in my me...

Architecture Advantages

Transparent
At every token, see which specialist won and how confident the model was. Transparency is structural, not bolted on.
Parameter Efficient
Beats transformers with the same parameter budget. Competing units avoid the redundancy of multi-head attention.
O(N) Scaling
Linear scaling with sequence length. No quadratic attention bottleneck — longer contexts at lower cost.
Recursive Grammar
The tree structure is also a communication protocol. Models can share information by exchanging consensus/entropy signals, a grammar for inter-model communication (see the sketch at the end of this section).
Tractable Parallelisation
The recursive structure lets many smaller models run in parallel across separate devices, rather than one monolithic model that demands ever-larger GPUs. This opens a path to distributed inference on commodity hardware.
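To make the Recursive Grammar point concrete, here is a hypothetical sketch of what a consensus/entropy signal could look like: the sender summarises one step as (winning group, predictive entropy), and a receiver turns that into a bias over its own groups. The message shape, the make_signal and routing_bias functions, and the weighting are assumptions for illustration only, not the Glassnest protocol.

```python
# Hypothetical inter-model signal: (winning group, predictive entropy).
# The message format and the receiver-side weighting are assumptions made
# for this sketch, not a specification of the Glassnest grammar.
import numpy as np

def make_signal(winning_group: int, probs: np.ndarray) -> dict:
    """Sender side: compact per-step message saying who won and how uncertain the model is."""
    entropy = float(-(probs * np.log(probs + 1e-9)).sum())
    return {"group": winning_group, "entropy": entropy}

def routing_bias(signal: dict, n_groups: int, strength: float = 0.5) -> np.ndarray:
    """Receiver side: favour the sender's winning group more when the sender was confident (low entropy)."""
    bias = np.zeros(n_groups)
    bias[signal["group"]] = strength / (1.0 + signal["entropy"])
    return bias

probs = np.array([0.7, 0.1, 0.1, 0.1])              # toy predictive distribution
signal = make_signal(winning_group=2, probs=probs)
print(signal)                                        # group 2, entropy ~0.94 nats
print(routing_bias(signal, n_groups=4))              # bias favouring group 2
```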

To Explore

Research

  • Scale up — larger datasets and model sizes to test scaling laws
  • Improve efficiency — optimise training speed and inference throughput
  • From transparency to interpretability — map specialist groups to semantic meaning
  • Inter-model communication — proof of concept for models sharing state via the recursive grammar

Business

  • Convert patents — pending provisional patents to full filings
  • Publish research — establish academic credibility and prior art
  • License architecture — to organisations needing explainable AI for regulatory compliance
