Ab Initio Protein Structure Reconstruction
The Problem: Unmixing Molecular Shapes
Biological molecules are dynamic machines that constantly change shape (or “conformation”) to perform their functions. When scientists use cryo-electron microscopy (cryo-EM), they capture thousands of 2D projection images of these molecules from different angles. The core challenge is that a single sample contains a mix of these different conformations, a problem known as structural heterogeneity.
Traditional reconstruction algorithms often assume all projections come from one identical structure. This forces them to average dissimilar shapes, resulting in blurry reconstructions that obscure crucial details. Existing methods to separate conformations often require known reference structures or tedious manual classification, which is slow and biased and can fail to identify new, undiscovered shapes.
Our Solution: A Fully Automated Pipeline
We developed an automated algorithm that solves the ab initio reconstruction problem for heterogeneous samples without human intervention. Here’s how it works:
- Robust Clustering: We first use a variant of hierarchical clustering (single-linkage clustering) to group projections. Unlike standard k-means, this method creates “pure” clusters that contain projections from only one conformation, even in high-noise conditions.
- Projection Denoising: The projections in each pure cluster are averaged and then cleaned using a patch-based PCA denoising algorithm. This produces a small set of clean, representative projections for each conformation.
- Classification & Angle Estimation: We use a novel combination of techniques to sort the representative projections. A graph Laplacian-based algorithm reveals the underlying structure of the data, allowing us to automatically determine the number of conformations present. We then use the Helgason-Ludwig consistency conditions to find initial estimates for the viewing angle of each projection.
- Iterative Refinement: Finally, we alternately refine the viewing angles and reconstruct the structure for each class using filtered back-projection. This iterative process minimizes the reconstruction error, converging on a high-fidelity model of each distinct conformation.
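To make the first step more concrete, here is a minimal sketch of single-linkage clustering with a distance cutoff. It is an illustration only, not the paper's implementation: the function name `single_linkage_clusters`, the Euclidean distance, and the `threshold` parameter are all assumptions, and the toy "projections" are just short coordinate tuples. Cutting a single-linkage dendrogram at a given height is equivalent to taking the connected components of the graph that links every pair of points within that distance, which is what the union-find structure below computes.

```python
from math import dist


def single_linkage_clusters(points, threshold):
    """Return clusters (lists of point indices) from a single-linkage cut.

    Two points end up in the same cluster if they are connected by a chain of
    points whose consecutive distances are all <= threshold.
    `threshold` is a hypothetical tuning parameter for this sketch.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the clusters of every pair of points that are close enough.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    # Collect the indices that share a root.
    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy data: two well-separated groups standing in for two conformations.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)]
print(single_linkage_clusters(toy_points, threshold=0.15))
```

A small threshold keeps clusters small and homogeneous at the cost of producing more of them, which fits the stated goal of "pure" clusters that are averaged afterwards. The quadratic pairwise loop is fine for a sketch but would need a faster neighbor search for thousands of projections.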
Key Innovation
Our method’s strength comes from uniquely combining three mathematical frameworks:
- Graph Laplacian techniques to discover the number of distinct conformations automatically.
- Moment-based constraints (HLCC) to connect 2D projections to 3D orientations without prior knowledge.
- A robust statistical framework that withstands significant noise and correctly separates projections from different structures.
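For readers unfamiliar with the moment-based constraints, the following is the classical statement of the Helgason-Ludwig consistency conditions for parallel-beam projections of a two-dimensional object. It is given here as background under that assumption; the symbols f, P_θ, m_n, and M_{p,q} are notation introduced for this note, and the exact formulation used in the paper may differ.

```latex
% Projection of a 2D object f at viewing angle \theta:
P_\theta(s) = \iint f(x, y)\, \delta(x\cos\theta + y\sin\theta - s)\, dx\, dy

% Its n-th order moment:
m_n(\theta) = \int s^n P_\theta(s)\, ds
            = \iint f(x, y)\, (x\cos\theta + y\sin\theta)^n\, dx\, dy

% Expanding the binomial gives the consistency condition:
m_n(\theta) = \sum_{k=0}^{n} \binom{n}{k} \cos^k\theta\, \sin^{n-k}\theta\; M_{k,\,n-k},
\qquad
M_{p,q} = \iint x^p y^q f(x, y)\, dx\, dy
```

The projection moments on the left are measurable from the data, while the object moments M_{p,q} and the angles θ are unknown. Since each order n contributes only n + 1 object moments, a sufficiently large set of projections yields more equations than unknowns, which is what makes it possible to solve for the angles without a reference structure.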
Crucially, the algorithm requires no prior structural information, templates, or even knowledge of how many conformations exist. In experiments on protein complexes like Lipase, our method successfully separated and reconstructed up to eight different conformations from a single dataset, even with high levels of noise (noise standard deviation σ up to 30% of the average signal value a, i.e., σ = 0.3a).
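The noise level quoted above can be reproduced on synthetic data with a few lines of code. This is a hedged sketch of one plausible reading: it assumes additive zero-mean Gaussian noise and takes the average over a single projection, neither of which is spelled out here. The function name `add_gaussian_noise` and its parameters are hypothetical.

```python
import random
from statistics import mean


def add_gaussian_noise(signal, noise_fraction=0.3, seed=None):
    """Add zero-mean Gaussian noise with sigma = noise_fraction * mean(signal).

    Returns the noisy samples together with the sigma that was used.
    Assumes additive Gaussian noise and a per-signal average (sketch only).
    """
    rng = random.Random(seed)
    average = mean(signal)              # "a" in sigma = 0.3a
    sigma = noise_fraction * average    # noise standard deviation
    noisy = [value + rng.gauss(0.0, sigma) for value in signal]
    return noisy, sigma


# Toy projection with average value 2.0, so sigma = 0.3 * 2.0.
clean = [1.0, 2.0, 3.0]
noisy, sigma = add_gaussian_noise(clean, noise_fraction=0.3, seed=0)
print(sigma, noisy)
```

Passing a fixed `seed` makes the corrupted data reproducible, which is convenient when comparing reconstruction quality across noise levels.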
This approach enables researchers to discover novel protein states and understand molecular dynamics, free from the bias of existing structural knowledge.
Paper Details
- Title: Ab Initio Tomography With Object Heterogeneity and Unknown Viewing Parameters
- Conference: IEEE International Conference on Image Processing (ICIP) 2019
- Presentation: Oral
- Authors: Arunabh Ghosh¹, Ritwick Chaudhry², Ajit Rajwade³
- Affiliations: ¹Dept. of EE, IIT Bombay; ²Adobe Research; ³Dept. of CSE, IIT Bombay
- Pages: 1257-1261
- DOI: 10.1109/ICIP.2019.8803750