algbio-edu

algbio-edu — interactive visualizers for algorithmic bioinformatics

Want a new visualization? Open an issue and describe the algorithm — contributions and requests are welcome.

Single-page web demos for the central algorithms taught in an “Algorithmic Bioinformatics” course (FU Berlin ALBI, WS 2024 lectures by Hölzer / Backofen / Liebl). Each demo follows the same pattern:

This is a sister repo to rna-folding-edu (Nussinov + Zuker for RNA secondary-structure prediction).

What’s here

Folder Algorithm Demo
hmm/ Viterbi, Forward, Backward, Posterior on a 2-state CpG-island HMM open
alignment/ Needleman-Wunsch (global) and Smith-Waterman (local) open
fm-index/ FM Index: BWT construction, reverse transform, backward search, locate matches open
evoformer/ 3-D voxel walk through one Evoformer block (AlphaFold) open
structure-scores/ GDT-TS / TM-score / lDDT — sliders + intuition open
phylo/ Neighbor-Joining + UPGMA + Fitch + Sankoff (Python + step-by-step PNG figures) browse

For RNA structure (Nussinov + Zuker + 3D PDB): r-sayar/rna-folding-edu (open). For HMM training/evaluation on real DNA (the algorithm code that the HMM viz here visualises): r-sayar/cpg-island-hmm.

Each demo, in one paragraph

HMM inference — hmm/

A two-state Hidden Markov Model (B = background, I = CpG island) with hand-tuned emission and transition probabilities, four algorithms selectable from a single dropdown:

For the default preset ATATATATCGCGCGCGATATATAT:

HMM Viterbi screenshot HMM Posterior screenshot

FM Index — fm-index/

Four algorithms that share the same sorted-cyclic-shifts (F/L) matrix:

  1. BWT construction — show the cyclic shifts of T = mississippi$ being added (unsorted pane), the sort, then the last column extracted row by row into the BWT string ipssm$pissii.
  2. Reverse transform (BWT⁻¹) — walk the L-to-F mapping LF(i) = C(L[i]) + Occ(L[i], i) from row 1, extracting one character of the original text at each step until mississippi$ is reconstructed.
  3. Backward search (count) — Ferragina-Manzini’s right-to-left pattern match. Maintains an interval [a..b] of matching rows; shrinks it by a' = C(c) + Occ(c, a-1) + 1, b' = C(c) + Occ(c, b) for each new pattern character. The default preset P = ssi ends with interval [11..12] → 2 occurrences.
  4. Locate matches — given the search interval, walk LF from each row until you hit a marked row (a sampled SA entry) and read the position. Recovers the text positions (3 and 6) without storing the full suffix array.

FM Index — backward search

The right sidebar always shows the C array and the Occ table with the currently-consulted cells highlighted in orange. The active line of the pseudocode lights up yellow. The bottom yellow strip spells out the arithmetic for the current step.

Pairwise alignment — alignment/

Side-by-side Needleman-Wunsch (global, end-to-end) and Smith-Waterman (local, best subsegment). Same DP shape — only the boundary conditions and traceback rules differ. Each cell shows its score and a back-pointer (↖ / ↑ / ←); the chosen traceback path is highlighted in green and the recovered alignment is rendered below the matrix with match/mismatch colouring.

Alignment screenshot

Evoformer block — evoformer/

3-D voxel-grid walkthrough of one Evoformer block from AlphaFold 2. Each cube is one tensor cell; hue runs blue (negative) → red (positive). Use prev / next / play to step through MSA-row attention → MSA-column attention → outer product mean → triangle attention → triangle multiplication → transition.

This was already a self-contained tool in albi_exams/evoformer_viz/web/; copied here unchanged for hosting.

Protein structure-similarity scores — structure-scores/

Interactive playground for the four scores that compare predicted vs. experimental protein structures: GDT-TS, TM-score, GDT-HA, lDDT. Move the per-residue distance sliders, watch each score update in real time, and read the short intuition panel.

Already self-contained at albi_exams/structure_scores/; copied here.

Phylogenetics — phylo/

Python implementations of the lecture’s distance-based (neighbor_joining.py, upgma.py) and parsimony (fitch.py, sankoff.py) tree-reconstruction algorithms, plus the rendered step-by-step PNG figures. Run e.g.

cd phylo
python3 -c "from neighbor_joining import neighbor_joining; import numpy as np
D = np.array([[0,5,9,9,8],[5,0,10,10,9],[9,10,0,8,7],[9,10,8,0,3],[8,9,7,3,0]], float)
nwk, steps = neighbor_joining(['A','B','C','D','E'], D)
print(nwk)"
# → ((C:4,(A:2,B:3):3):1,(D:2,E:1):1);

Each step in steps carries the current distance matrix, Q matrix, which two clusters were joined, the new branch lengths, and the growing Newick string — perfect raw material for a future interactive NJ visualizer (TODO).

Controls (uniform across all interactive demos)

URL parameters work the same way: ?seq=...&autoplay=1&speed=2 (plus per-demo extras documented at the top of each index.html).

Running locally

git clone https://github.com/r-sayar/algbio-edu.git
cd algbio-edu
python3 -m http.server 8765
# open http://localhost:8765

All demos are static HTML/CSS/JS — no build step. The phylo/ scripts need numpy (and matplotlib for the figures, which are already pre-rendered as PNGs).

Credits