Authors: Yana Valasatava, Antonio Rosato & Claudia Andreini
In this work, we developed a methodology to perform a systematic classification, based on three-dimensional structural similarity, of the metal sites contained in metalloproteins. Our definition of metal site extends beyond the metal ion and its amino acid ligands by including all the chemical species (amino acids, nucleotides, exogenous ligands) providing at least one donor atom, as well as any other chemical species within a radius of 5.0 Å. We previously defined this as the Minimal Functional Site (MFS) of a metalloprotein, and showed that its characteristics are related to the metalloprotein function. The methodology described here leverages the MetalS2 algorithm, whose total score provides a quantitative measure of structural similarity between pairs of MFSs. We used this measure to build clusters of structurally similar MFSs using a two-stage hierarchical clustering algorithm. At the first stage, we cluster MFSs identified in corresponding positions within proteins with the same fold. For each of these clusters we then identify a representative MFS. At the second stage, all representative MFSs are clustered. The resulting groups are thus independent of the overall protein fold.
Metal ions are bound to biological macromolecules via coordination bonds. The bonds are formed by the so-called donor atoms. Such atoms can belong to either the backbone or the side chains/bases of the macromolecule (protein or nucleic acid), as well as to non-macromolecular ligands such as oligopeptides, small organic molecules, anions and water molecules. A metal ion together with its donor atoms and ligands constitutes the metal-binding site. However, the biochemical properties of such a site also depend on the surrounding macromolecular environment (5-9). Consequently, we defined the “minimal functional site” (MFS) in a metal-macromolecule adduct as the ensemble of atoms containing the metal ion or cofactor, all its ligands and any other atom belonging to a chemical species within 5 Å of a ligand (1,10). The MFS describes the local 3D environment around the cofactor, independently of the larger context of the protein fold in which it is embedded. The MetalPDB database is an updated collection of all structurally characterized MFSs (11). Recently, we developed a computational approach, implemented in the MetalS2 program, to quantify the structural similarity of MFSs in metalloproteins (1). In this work we exploited MetalS2 to perform systematic, quantitative comparisons of MFS structures with the final aim of producing a classification of metal sites. This classification does not depend on the overall metalloprotein fold and describes the structural variability of MFSs within a metalloprotein family. Furthermore, it indicates possible relationships between different metalloprotein families binding the same metal cofactors.
The present computational protocol organizes MFSs into clusters in such a way that each cluster contains sites that are structurally similar to each other and differ from the sites of the other clusters. The procedure uses a hierarchical agglomerative clustering algorithm to obtain a structure-based classification. In agglomerative clustering, every individual object is initially considered as a singleton (i.e., a cluster containing only one member). The clusters are then iteratively grouped by merging the two clusters at the shortest “distance”, i.e., the most similar pair. For the present work, the distance measure adopted was the global MetalS2 score, which increases with increasing structural diversity. Two merged clusters become one cluster, so after each iteration there is one fewer cluster. The iterations are repeated until all objects are collected into a single cluster. The result of hierarchical clustering is a nested sequence of partitions, with a single, all-inclusive cluster at the top and all singleton clusters at the bottom. Each intermediate cluster can be viewed as a combination of two clusters from the lower level or as a part of a split cluster from the higher level.
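The iterative merging described above can be illustrated with a minimal, pure-Python sketch. It is not the protocol's own code: the function name `agglomerate`, the site labels and the pairwise scores are hypothetical, and the toy scores merely stand in for global MetalS2 scores (lower means more similar).

```python
from itertools import combinations


def agglomerate(labels, dist, linkage=max):
    """Naive agglomerative clustering that records every merge.

    labels  : list of hashable item identifiers (hypothetical MFS labels)
    dist    : function (a, b) -> dissimilarity between two items
    linkage : reduces the item-to-item distances between two clusters
              to a single number (max corresponds to complete linkage)
    """
    clusters = [(label,) for label in labels]  # start from singletons
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters at the shortest distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = linkage([dist(a, b) for a in clusters[i] for b in clusters[j]])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # The two merged clusters are replaced by one: one fewer cluster.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Toy dissimilarities (illustrative values, not real MetalS2 scores).
scores = {
    frozenset(("A", "B")): 0.5,
    frozenset(("A", "C")): 2.0,
    frozenset(("B", "C")): 2.5,
}
merges = agglomerate(["A", "B", "C"], lambda a, b: scores[frozenset((a, b))])
for left, right, d in merges:
    print(left, right, d)
```

With n objects the loop performs n - 1 merges, and the recorded merge list is exactly the nested sequence of partitions: reading it forwards goes from the singletons to the all-inclusive cluster.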
Hierarchical clustering methods differ in the way they merge clusters (linkage methods). Although all methods merge the two “closest” clusters at each step, they differ in how they compute the distance between clusters, i.e., they use different metrics to compare one cluster to another. Here we used both the complete and average linkage methods (12). For complete linkage, the distance between a pair of clusters corresponds to the greatest distance from any member of one cluster to any member of the other cluster. In the average linkage method, the distance between two clusters is the average of the distances between all the members in one cluster and all the members in the other. The final clusters are defined by cutting the nested sequence of partitions at a certain threshold. The clusters are considered to be separate if the distance between them (a value of the global MetalS2 score) is greater than this threshold. The value of the threshold also determines the extent to which the objects are similar within each cluster (the diameter of the clusters, which however is affected by the linkage type used).
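As a hedged illustration of the two linkage definitions and of the threshold cut, the sketch below uses only the Python standard library. All names (`complete_linkage`, `average_linkage`, `cluster_with_threshold`) and the one-dimensional toy coordinates are assumptions made for the example; the absolute coordinate difference stands in for a MetalS2 score. For these two linkage types, stopping the merges once the closest pair exceeds the threshold gives the same partition as cutting the full hierarchy at that threshold.

```python
def complete_linkage(c1, c2, dist):
    """Greatest distance from any member of c1 to any member of c2."""
    return max(dist(a, b) for a in c1 for b in c2)


def average_linkage(c1, c2, dist):
    """Average of the distances between all members of c1 and of c2."""
    values = [dist(a, b) for a in c1 for b in c2]
    return sum(values) / len(values)


def cluster_with_threshold(items, dist, linkage, threshold):
    """Merge the closest clusters until the closest pair is farther
    apart than `threshold`; the remaining clusters are the result."""
    clusters = [[x] for x in items]
    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j], dist), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:  # clusters farther apart than this stay separate
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sites placed on a line; |x - y| stands in for the score.
position = {"s1": 0.0, "s2": 1.0, "s3": 3.0, "s4": 3.5}


def toy_dist(a, b):
    return abs(position[a] - position[b])


sites = list(position)
by_complete = cluster_with_threshold(sites, toy_dist, complete_linkage, 3.0)
by_average = cluster_with_threshold(sites, toy_dist, average_linkage, 3.0)
print(by_complete)  # [['s1', 's2'], ['s3', 's4']]
print(by_average)   # [['s1', 's2', 's3', 's4']]
```

In this toy case the two final groups are 3.5 apart under complete linkage but 2.75 apart under average linkage, so the same threshold of 3.0 keeps them separate in one case and merges them in the other. This is the sense in which the cluster diameter depends on the linkage type.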
The structures of all MFSs can be downloaded from the MetalPDB database web interface: http://metalweb.cerm.unifi.it/download/sites/.
The source code exploited by this protocol implements the pipeline as a library of Python scripts. The code requires the following freely-available software/libraries:
The protocol leverages the organization of MFSs in equistructural groups (EGs hereafter) that is already provided by the MetalPDB database (http://metalweb.cerm.unifi.it/). Such groups contain the MFSs that are found in proteins with the same fold and occur at the same position within that fold. EGs are computed in MetalPDB by superimposing the entire domain containing the MFS in the protein structures under consideration and then computing the distance between the metal centers. MFSs whose metal centers are within a threshold of 3.5 Å from one another are assigned to the same equistructural group.
The workflow consists of the following steps:
Figure 1 graphically recapitulates the protocol.
The running time of the algorithm presented here depends on the number and size of the structures to be processed.
The running time of the MetalS2 tool to compare a pair of MFS structures on an Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz processor varies from seconds to a few minutes, depending on the number of atoms in the MFS structures. As an example, for 8891 sites of heme-binding proteins the entire procedure required approximately 10000 CPU-hours (using an AMD Opteron(tm) Processor 6366 HE CPU at 1.800 GHz).
We suggest that users remove from the datasets all sites with fewer than 10 amino acids, as well as all sites where the metal ion has only one amino acid ligand with all other ligands being water molecules. These sites tend to produce low MetalS2 scores even in the absence of significant structural similarity.
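The suggested pre-filtering could be sketched as follows. The site representation is entirely hypothetical (a dictionary with an `n_residues` count and a list of ligand kinds); it is not the MetalPDB file format, and extracting these fields from the downloaded structures is left out.

```python
def keep_site(site, min_residues=10):
    """Return True if a site passes both suggested filters.

    `site` is a hypothetical record:
      site["n_residues"] : number of amino acids in the MFS
      site["ligands"]    : one kind per metal ligand, e.g.
                           "amino_acid", "water", "exogenous"
    """
    # Filter 1: sites with fewer than 10 amino acids are removed.
    if site["n_residues"] < min_residues:
        return False
    # Filter 2: sites with a single amino acid ligand whose other
    # ligands are all water are removed. A site whose only ligand is
    # one amino acid is also removed under this reading of the rule.
    kinds = site["ligands"]
    n_amino_acid = kinds.count("amino_acid")
    n_water = kinds.count("water")
    one_aa_rest_water = n_amino_acid == 1 and n_amino_acid + n_water == len(kinds)
    return not one_aa_rest_water


# Illustrative records (made-up values).
sites = [
    {"id": "site_1", "n_residues": 14, "ligands": ["amino_acid", "amino_acid", "water"]},
    {"id": "site_2", "n_residues": 7, "ligands": ["amino_acid", "amino_acid"]},
    {"id": "site_3", "n_residues": 12, "ligands": ["amino_acid", "water", "water"]},
    {"id": "site_4", "n_residues": 11, "ligands": ["amino_acid", "exogenous"]},
]
kept = [s["id"] for s in sites if keep_site(s)]
print(kept)  # ['site_1', 'site_4']
```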
Clusters of MFS structures obtained independently of the overall metalloprotein fold.
This work was supported by MIUR (Ministero Italiano dell’Università e della Ricerca) through the FIRB project RBFR08WGXT and by the European Commission through the BioMedBridges and EGI-Engage projects (grant nos. 284209 and 654142).
Figure 1: Graphical representation of the computational protocol to systematically compare and classify metal-binding sites on the basis of their structural similarity
The workflow includes the following steps: (1) select MFSs organized in equistructural groups (EGs, which contain MFSs that are found in proteins with the same fold and occur at the same position within that fold); (2) cluster MFSs within each EG into groups of highly similar structures (first or intra-group stage); (3) for each cluster built at the first stage, select a representative MFS; (4) build broader clusters by comparing representative MFSs and organizing them in groups of similar structures (second or inter-group stage).
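The four steps can be tied together in a short, self-contained sketch. Several choices here are assumptions rather than details taken from the protocol: the representative is picked as the medoid (the member with the smallest summed distance to the other members), average linkage is used at both stages, and the thresholds, site labels and toy coordinates are made up. In real use, `dist` would return the global MetalS2 score for a pair of MFSs.

```python
def cluster(items, dist, threshold):
    """Average-linkage agglomerative clustering cut at `threshold`."""
    clusters = [[x] for x in items]

    def avg(c1, c2):
        return sum(dist(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (avg(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def medoid(members, dist):
    """Assumed representative: smallest summed distance to the others."""
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def two_stage(groups, dist, intra_threshold, inter_threshold):
    """groups maps an EG identifier to the list of its MFS labels."""
    representatives = []
    for members in groups.values():                 # step (1): one EG at a time
        for c in cluster(members, dist, intra_threshold):   # step (2)
            representatives.append(medoid(c, dist))         # step (3)
    return cluster(representatives, dist, inter_threshold)  # step (4)


# Toy data: a coordinate per site; |x - y| stands in for the score.
coord = {"a1": 0.0, "a2": 0.2, "a3": 5.0, "b1": 0.1, "b2": 9.0}
groups = {"EG_A": ["a1", "a2", "a3"], "EG_B": ["b1", "b2"]}


def toy_dist(a, b):
    return abs(coord[a] - coord[b])


final = two_stage(groups, toy_dist, intra_threshold=0.5, inter_threshold=1.0)
print(final)  # [['a1', 'b1'], ['a3'], ['b2']]
```

In the toy output, the representatives `a1` and `b1` come from different EGs yet end up in the same second-stage cluster, which mirrors how the final groups can be independent of the protein fold.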
Hidden relationships between metalloproteins unveiled by structural comparison of their metal sites, Yana Valasatava, Claudia Andreini, and Antonio Rosato, Scientific Reports 5 () 30/03/2015 doi:10.1038/srep09486
Yana Valasatava, Magnetic Resonance Center (CERM) – University of Florence
Antonio Rosato & Claudia Andreini, Magnetic Resonance Center (CERM) – University of Florence, Department of Chemistry – University of Florence
Source: Protocol Exchange (2015) doi:10.1038/protex.2015.036. Originally published online 18 April 2015.