Processing pipeline

A cryo-EM experiment produces thousands to tens of thousands of movie files. To turn those raw frames into a set of picked, extracted particles ready for 3D reconstruction, the data must pass through a well-defined sequence of processing steps. Magellon automates the early, compute-heavy stages and lets you track, inspect, and re-run any step from the Jobs panel.

The 30-second mental model

A cryo-EM dataset is a large collection of 2D images of the same molecule, frozen in random orientations. Each particle image is a 2D projection of the molecule’s 3D density. The pipeline’s job is to clean and characterise those images so that downstream reconstruction can determine each particle’s orientation and back-project everything into a 3D map.

Everything else in the pipeline exists to make that work despite noise, motion, and imaging artefacts.

The pipeline at a glance

Step	What it does	Magellon today
Motion correction	Aligns the frames in each movie to remove beam-induced specimen motion	Automated via `motioncor` plugin (MotionCor2/3)
CTF estimation	Fits the microscope’s contrast transfer function per micrograph	Automated via `ctf` plugin (CTFFIND4)
Square detection	Locates grid squares in low-magnification overview images	Automated via `ptolemy` plugin (ONNX model)
Hole detection	Locates ice holes in medium-magnification images	Automated via `ptolemy` plugin (ONNX model)
FFT	Computes the power spectrum of each micrograph via a fast Fourier transform	Automated (always-on reference plugin)
Particle picking	Finds particle coordinates in each micrograph	Automated via `topaz` (CNN-based) or `template-picker`
Micrograph denoising	Denoises micrographs using a trained CNN	Automated via Topaz denoising backend
Particle extraction	Cuts and normalises a stack of particle boxes	Automated via `stack-maker` plugin
2D classification	Clusters extracted particles by appearance; removes junk	Automated via `can-classifier` (CAN + MRA)
3D reconstruction onwards	Initial model → refinement → polishing → postprocess	Not yet automated — run externally (RELION, cryoSPARC)

Steps 1–9 (motion correction through 2D classification) are what Magellon orchestrates today. 3D reconstruction and everything after it remain in external tools; Magellon stores their outputs as session artifacts for browsing.

What a microscope session produces

A single Krios session (12–48 hours) typically contains:

Item	Typical size	Description
Movies	50–500 frames, 50 MB–several GB each	Raw detector output — one movie per acquisition position
Micrographs	1 k–10 k per session	Each micrograph captures ~100–1 000 particles
Gain reference	1 file per session	Per-pixel sensitivity correction applied to the movie frames during motion correction

When Magellon imports a session it reads these files from the configured data path (MAGELLON_GPFS_PATH) and creates a session record. Large payloads (movies, micrographs) stay on the shared filesystem; only metadata and task results travel over the message bus.
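The metadata/payload split can be illustrated with a minimal sketch. The `TaskMessage` fields and the `movies/` subdirectory are hypothetical, not Magellon’s actual message schema; the point is that a task message carries identifiers and shared-filesystem paths rather than file contents, so it stays a few hundred bytes however large the movie is.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import PurePosixPath


@dataclass
class TaskMessage:
    """Hypothetical task message: metadata only, never raw image bytes."""
    session_id: str
    step: str          # e.g. "motioncor" or "ctf"
    input_path: str    # location on the shared filesystem, not the file contents
    output_dir: str    # where the plugin should write its results


def make_task(session_id: str, step: str, movie_name: str, root: str) -> TaskMessage:
    """Build a message that points at files under a shared root directory."""
    base = PurePosixPath(root) / session_id
    return TaskMessage(
        session_id=session_id,
        step=step,
        input_path=str(base / "movies" / movie_name),  # hypothetical layout
        output_dir=str(base / step),
    )


msg = make_task("session01", "motioncor", "pos_0001.tif", "/magellon/home")
payload = json.dumps(asdict(msg))  # this small JSON string is what crosses the bus
print(payload)
```

Because every container resolves the same absolute paths, the receiving plugin opens `input_path` directly from the shared mount instead of receiving the movie over the bus.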

Step-by-step details

1. Motion correction

Each movie’s frames suffer from beam-induced specimen motion — the specimen drifts 5–50 Å during the exposure. Summing the frames naively blurs the image. Motion correction aligns the frames first, producing a single, sharper micrograph.

Inputs: movie stack + gain reference
Outputs: one aligned .mrc micrograph per movie
Plugin: motioncor — wraps MotionCor2/3; GPU-accelerated
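The core idea, align then sum, can be sketched as whole-frame alignment. This is a simplified illustration, not the MotionCor2/3 algorithm: it finds one integer shift per frame by brute-force correlation against the first frame, uses cyclic shifts for simplicity, and ignores the gain reference. MotionCor2/3 additionally model local (patch-based) motion, estimate sub-pixel shifts, and apply dose weighting.

```python
from typing import List, Tuple

Frame = List[List[float]]


def shift(frame: Frame, dy: int, dx: int) -> Frame:
    """Cyclically move the frame content by (dy, dx) pixels."""
    h, w = len(frame), len(frame[0])
    return [[frame[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def correlation(a: Frame, b: Frame) -> float:
    """Un-normalised cross-correlation of two equal-sized frames at zero offset."""
    return sum(pa * pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def estimate_shift(ref: Frame, frame: Frame, max_shift: int = 3) -> Tuple[int, int]:
    """Brute-force search for the integer shift that best aligns frame onto ref."""
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = correlation(ref, shift(frame, dy, dx))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best


def align_and_sum(frames: List[Frame], max_shift: int = 3) -> Tuple[Frame, List[Tuple[int, int]]]:
    """Align every frame to the first one, then sum the aligned frames."""
    ref = frames[0]
    shifts = [estimate_shift(ref, f, max_shift) for f in frames]
    aligned = [shift(f, dy, dx) for f, (dy, dx) in zip(frames, shifts)]
    h, w = len(ref), len(ref[0])
    total = [[sum(f[y][x] for f in aligned) for x in range(w)] for y in range(h)]
    return total, shifts
```

Summing the unaligned frames would smear each feature along the drift path; summing after applying the estimated shifts keeps it sharp while still accumulating signal.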

2. CTF estimation

The images are intentionally recorded out of focus to increase contrast, which introduces a sinusoidal modulation in Fourier space (the Contrast Transfer Function). Downstream steps, particularly 2D classification and 3D reconstruction, need each micrograph’s CTF to correctly weight and combine signal.

Inputs: aligned micrograph
Outputs: defocus, astigmatism, and CTF goodness-of-fit per micrograph
Plugin: ctf — wraps CTFFIND4; multiple backends (fast, GPU, external)
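The sinusoidal modulation comes from the phase-aberration function χ(k) = πλΔf·k² − ½·π·C_s·λ³·k⁴, with CTF(k) = −sin(χ(k) + arcsin A). The sketch below uses that one common convention (positive Δf means underfocus, amplitude contrast A entered as a phase offset); sign conventions differ between packages, and the defaults (300 kV, C_s = 2.7 mm, 7 % amplitude contrast) are illustrative values, not Magellon settings. It also solves for the first zero, the innermost dark Thon ring, which is the kind of feature a CTF fit locks onto.

```python
import math


def electron_wavelength(voltage_v: float) -> float:
    """Relativistic electron wavelength in Angstrom for an accelerating voltage in volts."""
    return 12.2643 / math.sqrt(voltage_v * (1.0 + 0.978466e-6 * voltage_v))


def ctf_1d(k: float, defocus_a: float, cs_mm: float = 2.7,
           voltage_kv: float = 300.0, amp_contrast: float = 0.07) -> float:
    """Rotationally symmetric CTF at spatial frequency k (1/Angstrom).

    defocus_a > 0 means underfocus in this convention.
    """
    lam = electron_wavelength(voltage_kv * 1e3)
    cs_a = cs_mm * 1e7  # mm -> Angstrom
    chi = math.pi * lam * defocus_a * k**2 - 0.5 * math.pi * cs_a * lam**3 * k**4
    return -math.sin(chi + math.asin(amp_contrast))


def first_zero(defocus_a: float, cs_mm: float = 2.7,
               voltage_kv: float = 300.0, amp_contrast: float = 0.07) -> float:
    """Spatial frequency (1/Angstrom) where chi(k) + arcsin(A) first reaches pi."""
    lam = electron_wavelength(voltage_kv * 1e3)
    a = 0.5 * math.pi * cs_mm * 1e7 * lam**3      # coefficient of u^2, with u = k^2
    b = math.pi * lam * defocus_a                 # coefficient of u
    c = math.pi - math.asin(amp_contrast)
    # Smaller root of a*u^2 - b*u + c = 0, in the numerically stable form.
    u = 2 * c / (b + math.sqrt(b * b - 4 * a * c))
    return math.sqrt(u)
```

For 1.5 µm underfocus, `first_zero(15000)` comes out near 0.058 Å⁻¹, a first Thon ring at roughly 17 Å; larger defocus pulls the rings inward.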

CTF quality filtering

The CTF fit quality score is stored as micrograph metadata. In the session view you can filter out poor micrographs (high astigmatism, low fit confidence) before dispatching particle picking, so junk picks from those micrographs never reach extraction and classification.
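Such a filter amounts to a simple predicate over the stored metrics. In this sketch the `CtfResult` fields, the 0-to-1 fit score, and both thresholds are assumptions chosen for illustration; sensible cut-offs depend on the dataset and on how the fit score is defined.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CtfResult:
    """Hypothetical per-micrograph CTF record (field names are illustrative)."""
    micrograph: str
    defocus_um: float      # mean defocus in micrometres
    astigmatism_a: float   # |defocus_u - defocus_v| in Angstrom
    fit_score: float       # goodness of fit; assumed to lie in 0..1, higher is better


def select_micrographs(results: Iterable[CtfResult],
                       max_astigmatism_a: float = 500.0,
                       min_fit_score: float = 0.1) -> List[str]:
    """Return the micrographs worth sending on to particle picking."""
    return [r.micrograph for r in results
            if r.astigmatism_a <= max_astigmatism_a and r.fit_score >= min_fit_score]
```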

3 & 4. Square and hole detection

At low magnification the microscope acquires overview images showing the grid squares; at medium magnification it captures the individual ice holes within each square. Magellon’s ptolemy plugin runs ONNX computer-vision models to locate both automatically, and the detections drive acquisition target selection.

Plugin: ptolemy — one plugin, two categories (square_detection and hole_detection)

5. FFT

A fast Fourier transform (FFT) of each aligned micrograph produces a power-spectrum thumbnail: the classic “Thon ring” image used to check CTF quality by eye. The FFT plugin is always on, and its output appears immediately in the image viewer for every micrograph.
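The thumbnail is essentially a log power spectrum with the zero-frequency term moved to the centre. The sketch uses a naive 2D DFT so it runs on the standard library alone; it is O(N⁴) and only suitable for tiny arrays. A real implementation would use an FFT library (for example `numpy.fft.fft2` followed by `numpy.fft.fftshift`), and the log scaling shown is one common display choice, not necessarily the plugin’s.

```python
import cmath
import math
from typing import List


def power_spectrum(img: List[List[float]]) -> List[List[float]]:
    """Centred log power spectrum via a naive 2D DFT (illustration only)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            s = sum(img[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                    for y in range(h) for x in range(w))
            # Shift so the zero-frequency term lands at the centre (like fftshift).
            out[(u + h // 2) % h][(v + w // 2) % w] = math.log1p(abs(s) ** 2)
    return out
```

A constant image puts all power in the central pixel, while a periodic stripe pattern produces a symmetric pair of peaks; the concentric CTF oscillations in a real micrograph show up the same way, as rings.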

6 & 7. Particle picking and denoising

Particle picking scans each aligned micrograph for particle-like signals of the expected size and produces a coordinate file (x, y) per micrograph. Magellon ships two pickers:

Backend	Method	Best for
`topaz`	Trained CNN (Topaz)	General-purpose; works without a reference template
`template-picker`	Cross-correlation template matching	When you already have a good 2D template

The Topaz backend also supports micrograph denoising as a companion step; picking on the denoised micrographs can improve the coordinates found in subsequent runs.
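For the `template-picker` route, cross-correlation template matching can be sketched with zero-mean normalised cross-correlation (NCC) evaluated at every position. This is an illustration under simplifying assumptions, not the plugin’s code: production pickers typically compute the correlation with FFTs, search over template rotations, and apply non-maximum suppression so each particle yields one coordinate. The threshold is arbitrary.

```python
import math
from typing import List, Tuple

Image = List[List[float]]


def ncc(patch: Image, template: Image) -> float:
    """Zero-mean normalised cross-correlation of two equal-sized arrays (-1..1)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0  # flat patches carry no signal


def pick(micrograph: Image, template: Image,
         threshold: float = 0.8) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) box centres whose NCC with the template exceeds threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for y in range(len(micrograph) - th + 1):
        for x in range(len(micrograph[0]) - tw + 1):
            patch = [row[x:x + tw] for row in micrograph[y:y + th]]
            score = ncc(patch, template)
            if score >= threshold:
                hits.append((x + tw // 2, y + th // 2, score))
    return hits
```

Normalising by the patch and template variances makes the score insensitive to local brightness and contrast changes across the micrograph, which plain correlation is not.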

8. Particle extraction

Extraction cuts a square box (e.g. 256 × 256 px) around each picked coordinate, normalises the contrast, and writes all boxes to a single .mrcs particle stack. The stack is the input to 2D classification.

Plugin: stack-maker — thin wrapper around the vendored extraction algorithm from the Magellon algorithm library
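Box extraction and normalisation can be sketched as below. Assumptions: coordinates are box centres in pixels, particles whose box would cross the micrograph edge are dropped (some tools pad instead), and normalisation uses the whole box. Many packages, RELION for example, estimate the mean and standard deviation from the background outside a circular mask instead.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = List[List[float]]


def extract_box(mic: Image, x: int, y: int, box: int) -> Optional[Image]:
    """Cut a box x box patch centred on (x, y); None if it would cross the edge."""
    half = box // 2
    top, left = y - half, x - half
    if top < 0 or left < 0 or top + box > len(mic) or left + box > len(mic[0]):
        return None
    return [row[left:left + box] for row in mic[top:top + box]]


def normalise(patch: Image) -> Image:
    """Scale the patch to zero mean and unit standard deviation."""
    vals = [v for row in patch for v in row]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
    return [[(v - mean) / std for v in row] for row in patch]


def make_stack(mic: Image, coords: Sequence[Tuple[int, int]], box: int) -> List[Image]:
    """Extract and normalise one box per coordinate, skipping edge particles."""
    stack = []
    for x, y in coords:
        patch = extract_box(mic, x, y, box)
        if patch is not None:
            stack.append(normalise(patch))
    return stack
```

Putting every box on the same intensity scale matters because classification compares particles from micrographs with different ice thickness and exposure.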

9. 2D classification

Picked particles always include some junk: ice contamination, broken molecules, and overlapping neighbours caught in the same box. 2D classification clusters all particles into K groups by appearance. Bad groups (featureless blobs, ice rings, edge artefacts) are dropped, leaving a clean particle set. Typical retention: 30–70 % of initial picks.

Plugin: can-classifier — Convolutional Autoencoder + Multi-Reference Alignment (CAN+MRA)
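Once the classifier has assigned each particle to a class, selecting the clean set is simple bookkeeping. The sketch below does not implement CAN+MRA; it assumes a hypothetical list of per-particle class labels and a set of class IDs you have marked as junk, and reports the retention fraction described above.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def class_sizes(assignments: List[int]) -> Dict[int, int]:
    """Number of particles in each class, useful when deciding which classes to drop."""
    return dict(Counter(assignments))


def select_particles(assignments: List[int], rejected: Set[int]) -> Tuple[List[int], float]:
    """Keep particles whose class is not rejected; return kept indices and retention."""
    kept = [i for i, cls in enumerate(assignments) if cls not in rejected]
    retention = len(kept) / len(assignments) if assignments else 0.0
    return kept, retention
```

For example, with labels `[0, 1, 1, 2, 0, 3, 1, 2, 0, 3]` and classes 2 and 3 rejected, six of ten particles survive, a retention of 0.6.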

Beyond 2D classification

Steps beyond 2D classification — initial 3D model generation, 3D refinement, CTF refinement, Bayesian polishing, and postprocessing — are typically run in RELION or cryoSPARC. Magellon stores and displays the results but does not yet automate these later stages.

Monitoring progress

Each automated step dispatches work as tasks visible in the Jobs panel. One import creates one job containing one task per micrograph per step. The Jobs panel shows:

Per-step progress bars
Individual task status (pending / running / completed / failed)
Live log output from the plugin processing each task
Output file locations on the shared filesystem

Failed tasks can be individually retried from the Jobs panel without re-running the whole import.
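The job/task model described above can be sketched as follows. The `Task` and `Status` types and the helper functions are hypothetical illustrations of the behaviour, not Magellon’s API: a job fans out into one task per micrograph per step, each step’s progress bar is its completed fraction, and a retry resets only the failed tasks.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    step: str
    micrograph: str
    status: Status = Status.PENDING


def create_job(micrographs: List[str], steps: List[str]) -> List[Task]:
    """One task per micrograph per step."""
    return [Task(step, m) for step in steps for m in micrographs]


def progress(tasks: List[Task]) -> Dict[str, float]:
    """Fraction of completed tasks per step (what a per-step progress bar shows)."""
    total, done = Counter(), Counter()
    for t in tasks:
        total[t.step] += 1
        if t.status is Status.COMPLETED:
            done[t.step] += 1
    return {step: done[step] / total[step] for step in total}


def retry_failed(tasks: List[Task]) -> int:
    """Reset only the failed tasks to pending; return how many were reset."""
    failed = [t for t in tasks if t.status is Status.FAILED]
    for t in failed:
        t.status = Status.PENDING
    return len(failed)
```

The fan-out makes task counts grow quickly: 2 000 micrographs through three per-micrograph steps already means 6 000 tasks, which is why retrying individual failures matters.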

Shared filesystem requirement

Metadata travels over the message bus; large files (movies, micrographs, particle stacks) travel over the shared filesystem. CoreService and every plugin container must mount the same path. In the default Docker Compose setup this is a bind mount; on HPC clusters it is typically GPFS, Lustre, or BeeGFS.

All containers must see the same path

If a plugin container can write /magellon/home/<session>/motioncor/file.mrc but CoreService cannot read that path, results will silently disappear. Verify the shared mount before running your first import — see Directory Structure for the expected layout.
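A quick way to verify the mount is a write-then-read probe: write a uniquely named file from one container and read it back, by the same absolute path, from another. The helper names and probe content below are hypothetical. In a real deployment you would call `write_probe` inside a plugin container and `can_read_probe` inside CoreService, both pointed at the configured data path; `self_check` only demonstrates the logic within a single process.

```python
import tempfile
import uuid
from pathlib import Path

PROBE_TEXT = "magellon-mount-check"  # hypothetical marker content


def write_probe(shared_root: str) -> Path:
    """Run where results are written (e.g. a plugin container)."""
    probe = Path(shared_root) / f".mount-probe-{uuid.uuid4().hex}"
    probe.write_text(PROBE_TEXT)
    return probe


def can_read_probe(probe: Path) -> bool:
    """Run where results are read (e.g. CoreService), using the same absolute path."""
    try:
        return probe.read_text() == PROBE_TEXT
    except OSError:
        return False


def self_check() -> bool:
    """Single-process demo: writer and reader share one temporary directory."""
    with tempfile.TemporaryDirectory() as root:
        probe = write_probe(root)
        return can_read_probe(probe)
```

If the reader returns `False` for a probe the writer just created, the two containers do not share that path. Delete the probe file once the check passes.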