TopoSwarm: SOM for speed, swarm for truth

Author

Mattijn van Hoek

Published

May 3, 2026

You have a large dataset and want to see its structure as a 2D map. A Self-Organizing Map does this well at scale, but the neighbourhood function that drives training also shapes the final layout: prototypes land where the grid expects them, not exactly where the data places them. The topology is approximately right, but carries a grid-geometry imprint.

The Databionic Swarm (Thrun 2018) was designed to fix this. It lets each sample find its own position on a toroidal grid by minimising a stress function, with no neighbourhood bias at all. The layout then reflects the data’s own topology. The problem: it needs one bot per sample, and Thrun and Ultsch note that beyond roughly 4,000 samples, computation exceeds a day on a single core (Thrun and Ultsch 2021).

TopoSwarm sidesteps that limit. A SOM first compresses \(N\) samples into \(n \ll N\) prototypes; the swarm then projects only those \(n\) prototypes onto the toroidal grid. The dataset size \(N\) stops mattering to the projection stage. On the LSUN benchmark, U*F flood-fill recovers the four ground-truth clusters without a preset \(k\), at a final stress of 0.0135.


Here is what it looks like live: the swarm projecting 212 points from the FCPS hepta benchmark in real time:

TopoSwarm live, hepta, 7 clusters, no k

Watch on asciinema


Why not just use a SOM?

A SOM gives you a map quickly. The catch is that the neighbourhood function couples compression and topology during training: the grid geometry influences where prototypes land, not just how close they sit to each other. Close things in feature space end up close on the grid, which is good, but the exact arrangement reflects the training grid as much as the data.

If you want the layout to reflect only the data, you need to decouple the two stages. The Databionic Swarm (Thrun 2018) does this by letting each sample find its position freely, minimising stress with no neighbourhood bias. But operating on all \(N\) raw samples is expensive; Thrun and Ultsch report that beyond roughly 4,000 samples, a single-core run exceeds a day (Thrun and Ultsch 2021).

TopoSwarm decouples compression from projection while keeping the swarm tractable:

Stage               Input                           Output                          Complexity
----------------------------------------------------------------------------------------------
SOM quantization    \(N \times d\) raw data         \(n\) prototype weight vectors  \(O(N \cdot n)\) per epoch
Swarm projection    \(n \times n\) distance matrix  grid position per prototype     \(O(n^2)\) per epoch
Sample assignment   \(N\) BMU indices               grid position per raw sample    \(O(N)\)

With \(n \ll N\) prototypes the swarm stage stays fast regardless of dataset size.


What happens inside

The algorithm in pictures

Stage 1: Quantization

SOM Compression

Figure 1: A SOM compresses \(N\) raw samples into \(n \ll N\) prototype weight vectors. Only the weights and hit counts are kept; the SOM grid topology is discarded.
Stage 2: Jump Candidates

Polar Jump Positions

Figure 2: Grid positions encoded as Complex(row,col). \(R_\text{min}\) from grid density, \(R_\text{max} = \text{Lines}/2\). Annealing sweeps \(R_\text{max} \to R_\text{min}\) in integer steps.
Stage 2: Grid Space

Pac-Man Torus

Figure 3: All four edges wrap. A bot at column 1 and one at column \(N\) are neighbours. Removes boundary artefacts that distort rectangular ESOM maps.
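The wrap-around metric of Figure 3 reduces to a few lines of numpy; a minimal sketch (the function name is hypothetical, not TopoSwarm.jl's API):

```python
import numpy as np

def torus_distance(a, b, shape):
    """Shortest distance between two grid cells on a torus.

    a, b  : (row, col) coordinates
    shape : (Lines, Columns) of the grid
    """
    d = np.abs(np.asarray(a) - np.asarray(b))
    d = np.minimum(d, np.asarray(shape) - d)   # wrap whichever way is shorter
    return np.hypot(*d)                        # Euclidean distance on the torus

# A bot in the first column and one in the last column are neighbours:
print(torus_distance((0, 0), (0, 14), (15, 15)))   # 1.0, not 14.0
```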
Stage 2: Annealing

Radius Annealing

Figure 4: Large \(R\): ~50% of bots explore widely. Small \(R\): ~5% make fine-grained adjustments. Each bot samples 4 random candidates and keeps the best. Exits at weak Nash equilibrium.
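The jump step of Figure 4 can be sketched as follows. `sample_candidates` and `jump` are hypothetical helpers: the real algorithm draws candidates from the polar annulus of Figure 2, while this sketch uses a plain disc of radius \(R\) for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_candidates(pos, R, shape, k=4):
    """Draw k candidate positions within radius R of pos, with toroidal wrap."""
    angles = rng.uniform(0, 2 * np.pi, k)
    radii  = rng.uniform(1, R, k)
    rows = (pos[0] + np.round(radii * np.sin(angles))).astype(int) % shape[0]
    cols = (pos[1] + np.round(radii * np.cos(angles))).astype(int) % shape[1]
    return list(zip(rows, cols))

def jump(pos, R, shape, payoff):
    """Move to the best of k candidates if it beats the current payoff."""
    best = max(sample_candidates(pos, R, shape), key=payoff)
    return best if payoff(best) > payoff(pos) else pos
```

In the full algorithm the payoff is the reduction in local stress; annealing shrinks \(R\) so that early jumps explore the whole grid and late jumps fine-tune.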
Stage 2: Convergence

Stress Over Iterations

Figure 5: Stress drops steeply at large \(R\) (exploration), then plateaus. Each coloured band = one radius phase. Convergence declared when the linear slope of recent stress \(< \varepsilon\).
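The stopping rule in Figure 5 can be sketched with a least-squares slope over recent stress values; `window` and `eps` here are illustrative, not TopoSwarm's actual defaults:

```python
import numpy as np

def converged(stress_history, window=10, eps=0.01):
    """Declare convergence when the fitted linear slope of the last
    `window` stress values drops below eps in magnitude."""
    if len(stress_history) < window:
        return False
    recent = np.asarray(stress_history[-window:])
    slope = np.polyfit(np.arange(window), recent, 1)[0]  # degree-1 fit
    return abs(slope) < eps

print(converged([0.0135] * 10))  # flat tail -> True
```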
Stage 3: Clustering

U-matrix & U*F

Figure 6: The U-matrix maps average neighbour distances onto the grid. U*F flood-fills from each local minimum, stopping at ridges, to assign cluster membership without a fixed \(k\).

Step 1: compress with a SOM

A SOM with \(n\) nodes is trained on the raw data \(\mathbf{X} \in \mathbb{R}^{N \times d}\) as a compressor only; its internal grid topology is discarded. We keep only the \(n\) prototype weight vectors \(\mathbf{W} \in \mathbb{R}^{n \times d}\) and the population counts \(\mathbf{p} \in \mathbb{N}^n\).

The quality of compression is the quantization error:

\[ \text{QE} = \frac{1}{N} \sum_{i=1}^{N} \| \mathbf{x}_i - \mathbf{w}_{\text{bmu}(i)} \|_2 \]

Choose \(n\) at the elbow of the QE-vs-\(n\) curve, where additional nodes stop reducing the error meaningfully.
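For reference, QE can be computed directly from the prototype matrix; `quantization_error` below is a standalone numpy sketch of the formula above, not pyesom's method:

```python
import numpy as np

def quantization_error(X, W):
    """QE = mean distance from each sample to its best-matching prototype.

    X : (N, d) raw data
    W : (n, d) prototype weight vectors
    """
    # (N, n) matrix of sample-to-prototype Euclidean distances
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    bmu = d.argmin(axis=1)                    # best-matching unit per sample
    return d[np.arange(len(X)), bmu].mean()

# Elbow scan: recompute QE over a range of n and pick the point where
# additional prototypes stop reducing the error meaningfully.
```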

Step 2: project with a swarm

The \(n\) prototype weight vectors are embedded onto the toroidal grid by minimising the population-weighted stress:

\[ \mathcal{S} = \frac{\sum_{i<j} p_i p_j \left(\hat{d}^{\text{data}}_{ij} - \hat{d}^{\text{grid}}_{ij}\right)^2}{\sum_{i<j} p_i p_j} \]

where \(\hat{d}\) denotes distances normalised to \([0, 1]\), and \(p_i\) is the normalised population of node \(i\). Dense nodes dominate the objective over sparse boundary nodes.
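A minimal numpy sketch of this objective (hypothetical helper; it assumes both distance matrices are already normalised to \([0, 1]\)):

```python
import numpy as np

def weighted_stress(D_data, D_grid, p):
    """Population-weighted stress between two normalised distance matrices.

    D_data, D_grid : (n, n) pairwise distances, each scaled to [0, 1]
    p              : (n,) node populations (hit counts)
    """
    iu = np.triu_indices(len(p), k=1)      # each pair i < j counted once
    w = (p[:, None] * p[None, :])[iu]      # p_i * p_j pair weights
    diff = (D_data[iu] - D_grid[iu]) ** 2
    return (w * diff).sum() / w.sum()

# Identical geometries give zero stress:
D = np.array([[0.0, 0.5], [0.5, 0.0]])
print(weighted_stress(D, D, np.array([3.0, 1.0])))  # 0.0
```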

Why topology is preserved

\[ d^{\text{feature}}(\mathbf{x}_i, \mathbf{x}_j) \;\approx\; d^{\text{feature}}(\mathbf{w}_{\text{bmu}(i)}, \mathbf{w}_{\text{bmu}(j)}) \;\approx\; d^{\text{grid}}(\text{pos}_i, \text{pos}_j) \]

The first approximation is controlled by QE; the second by final stress \(\mathcal{S}\). Because both are measurable at runtime, the chain is auditable end-to-end: a low QE with a high stress points to the swarm stage; a high QE with a low stress points to the SOM.

Step 3: read the clusters from the U-matrix

Once the swarm has settled, each occupied grid cell gets a U-matrix value: the average feature-space distance to its immediately adjacent occupied neighbours (Ultsch 2003). Low values mean nearby prototypes are similar; the cell sits inside a cluster. High values mean a sharp boundary; the cell sits on a ridge between clusters. Unoccupied cells are left as \(\mathrm{NaN}\).
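A simplified U-matrix computation might look like this. The sketch uses a square 8-neighbourhood with toroidal wrap, whereas the real map is hexagonal; the function name is hypothetical.

```python
import numpy as np

def umatrix(positions, W, shape):
    """Mean feature-space distance from each occupied cell to occupied
    cells in its 8-neighbourhood, with toroidal wrap-around.
    Unoccupied (and isolated) cells stay NaN."""
    occupied = {tuple(p): i for i, p in enumerate(positions)}
    U = np.full(shape, np.nan)
    for (r, c), i in occupied.items():
        dists = []
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) == (0, 0):
                    continue
                key = ((r + dr) % shape[0], (c + dc) % shape[1])  # torus wrap
                if key in occupied:
                    dists.append(np.linalg.norm(W[i] - W[occupied[key]]))
        if dists:
            U[r, c] = np.mean(dists)
    return U
```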

Most clustering algorithms ask you to pick \(k\) before you look at the data. U*F does not. The number of clusters is a property of the map, not a parameter you supply. U*F flood-fill reads \(k\) directly from the topology:

  1. Seed one flood from every local minimum on the U-matrix.
  2. Each flood expands outward to neighbouring cells in order of increasing U-value.
  3. A flood stops when it hits a cell above a threshold derived from the upper tail of the U-matrix distribution (threshold_anchor="upper"). That cell becomes a ridge.
  4. Any cell claimed by two different floods becomes a ridge too.
  5. Floods that cover fewer than min_cluster_size nodes are discarded.

What comes out is a partition that follows the natural valleys of the map. If there are four valleys separated by ridges, you get four clusters. If two valleys merge into one broad basin, they come out as one. The data decides.
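The procedure above can be sketched as a watershed-style flood fill. This is a deliberately simplified, hypothetical version (square lattice, non-toroidal, fixed threshold, first-arrival tie-breaking), not pyesom's UStarFloodClustering:

```python
import heapq
import numpy as np

def uf_flood(U, threshold, min_cluster_size=1):
    """Seed one flood per local minimum of U, grow floods in order of
    increasing U-value, mark cells above `threshold` as ridges (-1).
    Contested cells go to whichever flood reaches them first."""
    rows, cols = U.shape

    def nbrs(r, c):
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                yield rr, cc

    labels = np.full(U.shape, -2)              # -2 = unvisited, -1 = ridge
    heap, n_seeds = [], 0
    for r in range(rows):
        for c in range(cols):
            if all(U[r, c] <= U[rr, cc] for rr, cc in nbrs(r, c)):
                labels[r, c] = n_seeds         # seed a flood at each local minimum
                heapq.heappush(heap, (U[r, c], r, c, n_seeds))
                n_seeds += 1
    while heap:                                # expand floods, lowest U first
        _, r, c, lab = heapq.heappop(heap)
        for rr, cc in nbrs(r, c):
            if labels[rr, cc] == -2:
                if U[rr, cc] > threshold:
                    labels[rr, cc] = -1        # ridge: the flood stops here
                else:
                    labels[rr, cc] = lab
                    heapq.heappush(heap, (U[rr, cc], rr, cc, lab))
    for lab in range(n_seeds):                 # drop undersized floods
        if (labels == lab).sum() < min_cluster_size:
            labels[labels == lab] = -1
    return labels
```

On a toy U-matrix with two valleys separated by a high ridge, the fill returns two labels with the ridge column marked -1, no \(k\) supplied.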


The code

The SOM quantization stage uses Python (pyesom, IntraSOM backend). The swarm projection stage runs in Julia (TopoSwarm.jl). The two stages communicate via a single .npz file.

Training the SOM

from pathlib import Path
import numpy as np
from pyesom import ESOM

root = Path("../..").resolve()
z = np.load(root / "tests/fixtures/fcps.npz")
X_lsun   = z["lsun_data"].astype(np.float64)
cls_lsun = z["lsun_cls"].astype(int)
print(f"lsun: {X_lsun.shape}  classes: {np.unique(cls_lsun)}")

Export the bridge file

esom = ESOM(n_nodes=50, backend="intrasom",
            intrasom_kwargs={"mapshape": "toroid", "lattice": "hexa"})
esom.fit(X_lsun, epochs=20)

bridge_path = Path("bridge.npz")
result_path = Path("result.npz")
esom.export_npz(bridge_path, X_lsun, labels=cls_lsun)

qe = esom.quantization_error(X_lsun)
print(f"Quantization error : {qe:.4f}")
print(f"Bridge written to  : {bridge_path}")

Running the swarm

import subprocess

repo_root   = Path("../..").resolve()
script      = repo_root / "toposwarm/scripts/run_pswarm.jl"
julia_proj  = repo_root / "toposwarm"

proc = subprocess.run(
    ["julia", f"--project={julia_proj}", str(script),
     str(bridge_path.resolve()), str(result_path.resolve())],
    capture_output=True, text=True, cwd=repo_root,
)
print(proc.stdout)
if proc.returncode != 0:
    print(proc.stderr)
    raise RuntimeError("Julia step failed — see stderr above")
Reading bridge: /Users/mattijnvanhoek/mattijn/pyesom/toposwarm/paper/bridge.npz
Building distance matrix  (64 nodes × 2 features)
Running TopoSwarm...

Bots  : 64
Grid  : 15 × 15 = 225 cells
Rmax  : 7  Rmin : 1

R= 7  epochs= 10  stress=0.0347
R= 6  epochs= 10  stress=0.0305
R= 5  epochs= 10  stress=0.0233
R= 4  epochs= 10  stress=0.0179
R= 3  epochs= 10  stress=0.0154
R= 2  epochs= 10  stress=0.0144
R= 1  epochs= 10  stress=0.0135

Done. Final stress : 0.0135

── Convergence ─────────────────────────────────────────────────
Stress History  (70 epochs)

0.0824  │•                                                  
        │ •                                                 
        │ •                                                 
        │  •••                                              
        │    •••••••••                                      
        │             •••••••••••••                         
0.0135  │                          •••••••••••••••••••••••••
        └───────────────────────────────────────────────────
        1                                                 70

── Projection ──────────────────────────────────────────────────
Grid  15×15  (label=bot  ░▒▓█=cluster boundary)

                █  ▒  ·  0  ░  0  0     0  0 
                ▓  0  0  0  0  ·  0  0  ░  0 
                ░  ░  0  0  0  0  0  ░  0  0 
 ░              ·  ░  1  0  0  0  ░  0  0  · 
 ·              ·  1  ░  1  1  ░  2  ·  2  2 
 ·           ░  ·  ·  ·  ·  1  1  2  2  2  · 
 ·        ░  ░  1  ·  1  1  ·  2  ·  2  2  2 
 ░  ·     ·  1  ·  ·  ·  1  ░  ░  2  2  2  2 
 3  █  ▒  ·  ·  1  1  ·  ·  ░  2  2  2  ░    
 █  ▒     1  1  1  1  ·  ░  ▓  ░     ░  ▒  ▒ 
                   ·           ▒  ░  ▓       
                                  ·          
                                             
                               ·             
                   █  ▒     ░  ░  ▓  ▓  █    

Computing U-matrix...
Assigning raw samples to grid...

Done. Final stress: 0.0135
Result written to: /Users/mattijnvanhoek/mattijn/pyesom/toposwarm/paper/result.npz

Results

r           = np.load(result_path)
b           = np.load(bridge_path)
sample_rows = r["sample_rows"]
sample_cols = r["sample_cols"]
node_rows   = r["node_rows"]
node_cols   = r["node_cols"]
U           = r["umatrix"]
stress      = float(r["stress"][0])
hit_map     = b["hit_map"]

# Build P-matrix grid: place hit counts at occupied node positions (Julia is 1-based)
P = np.zeros_like(U)
for k, (nr, nc) in enumerate(zip(node_rows, node_cols)):
    P[nr - 1, nc - 1] = hit_map[k]
from pyesom.clustering.ustar_flood import UStarFloodClustering

clf = UStarFloodClustering(min_cluster_size=2, threshold_anchor="upper", toroidal=True)
clf.fit(U, P)

# predict needs 0-based (row, col); Julia exports 1-based
bmu_2d         = np.column_stack([sample_rows - 1, sample_cols - 1])
cluster_labels = clf.predict(bmu_2d)

print(f"U*F threshold : {clf.threshold_:.4f}")
print(f"Clusters found: {clf.n_clusters_}")

# Confusion table: true class vs. U*F cluster
classes  = np.unique(cls_lsun)
clusters = np.arange(clf.n_clusters_)
label    = "true \\ cluster"   # backslash can't appear inside an f-string expression before Python 3.12
header   = f"{label:<14}" + "".join(f"{c:>6}" for c in clusters) + f"{'unassigned':>12}"
print("\n" + header)
print("-" * len(header))
for cls in classes:
    mask = cls_lsun == cls
    row  = f"{cls:<14}"
    for cid in clusters:
        row += f"{int(np.sum((cluster_labels == cid) & mask)):>6}"
    row += f"{int(np.sum((cluster_labels == -1) & mask)):>12}"
    print(row)
U*F threshold : 0.3131
Clusters found: 4

true \ cluster     0     1     2     3  unassigned
--------------------------------------------------
0                134    44     0     0          22
1                 18     0    54     6          22
2                 93     0     0     0           7
3                  0     0     0     0           4

How well does it work?

from sklearn.manifold import trustworthiness

grid_coords = np.column_stack([sample_rows, sample_cols]).astype(float)
tw = trustworthiness(X_lsun, grid_coords, n_neighbors=10)

print(f"{'Metric':<28} {'Value':>8}")
print("-" * 38)
print(f"{'Quantization error (QE)':<28} {qe:>8.4f}")
print(f"{'Projection stress':<28} {stress:>8.4f}")
print(f"{'Trustworthiness (k=10)':<28} {tw:>8.4f}")
Table 1: End-to-end quality metrics on the lsun benchmark.
Metric                          Value
--------------------------------------
Quantization error (QE)        0.3035
Projection stress              0.0135
Trustworthiness (k=10)         0.9536
Metric              Interpretation
----------------------------------------------------------------------------------------------
Quantization error  Mean distance from each raw sample to its prototype. Governs Stage 1 fidelity.
Projection stress   Population-weighted mean squared error between normalised prototype and grid distances. Governs Stage 2 fidelity.
Trustworthiness     Fraction of true \(k\)-nearest neighbours preserved in the projection. 1.0 = perfect; >0.9 = good.

How does it scale?

Table 2: Runtime scaling. SOM: 20 epochs. Swarm: max 200 epochs.
Dataset                       N   n_nodes  Expected runtime
----------------------------------------------------------
lsun benchmark              404        50  seconds
medium dataset           50,000       500  minutes
large dataset           200,000     1,500  minutes
500K use case           500,000     2,000  tens of minutes

Wrapping up

If you want topology-honest 2D maps from large datasets, the bottleneck has always been the swarm stage. TopoSwarm removes it by running the swarm only on \(n\) prototypes, not on all \(N\) raw samples. The dataset size stops mattering to the projection.

Each stage is independently tunable. Swap the SOM backend, retrain, and check quantization error without touching the swarm. Run the swarm longer and check stress without retraining the SOM. When something looks off on the map, the metrics tell you which stage to adjust.

The output is a toroidal grid with a U-matrix and U*F cluster boundaries, readable the same way as the original Databionic Swarm, just no longer limited to small datasets.


References

Thrun, Michael C. 2018. Projection-Based Clustering Through Self-Organization and Swarm Intelligence. Springer. https://doi.org/10.1007/978-3-658-20540-9.
Thrun, Michael C., and Alfred Ultsch. 2021. “Swarm Intelligence for Self-Organized Clustering.” Artificial Intelligence. https://doi.org/10.1016/j.artint.2020.103237.
Ultsch, Alfred. 2003. “U*-Matrix: A Tool to Examine High Dimensional Data for Clusters, Topology, Density and Outliers.” Proc. Workshop on Self-Organizing Maps (WSOM).