The Problem
■ Transformers are black boxes. Attention weights don't tell you what the model is actually doing. Post-hoc interpretability is reverse-engineering.
■ Regulation is coming. The EU AI Act and similar legislation will require explainability. Current architectures can't provide it.
■ Parameter inefficiency. Transformers need billions of parameters, and much of that capacity is redundant: the same features stored many times across different attention heads.
■ O(N²) scaling. Self-attention scales quadratically with sequence length, limiting context windows and driving up compute costs.
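A back-of-the-envelope sketch of that last point (the head dimension and sequence lengths are illustrative, not from the source): the N×N attention score matrix means doubling the sequence length roughly quadruples the cost.

```python
def attention_flops(n: int, d: int) -> int:
    """Approximate multiply-adds for one attention head.

    QK^T costs n*n*d and softmax(A)V costs n*n*d more;
    the n*n score matrix is what makes self-attention O(N^2).
    """
    return 2 * n * n * d

d = 64  # per-head dimension (assumed for illustration)
for n in (1_000, 2_000, 4_000):
    print(f"seq_len={n:>5}: ~{attention_flops(n, d):,} FLOPs")
```

Each doubling of `n` multiplies the FLOP count by four, which is why long contexts are expensive under standard self-attention.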