Energy Visualization Pipeline User's Guide


The energy visualization system renders force fields as a semitransparent cloud around the various geometric tinkertoys that can be used to display molecular structure. Where the cloud is thickest, the forces are strongest. Where the cloud is thin or nonexistent, the forces are reaching equilibrium. Rendering is straightforward, done by hardware with volume textures. You control the resolution detail of the texture and all important aspects of the transfer function, which is tailored to ProteinShop's functionality.

The energy rendering system is built on top of ProteinShop's older energy visualization feature (based on colored atom spheres), which remains available. In particular, the controls for that system are also used by this system. Including both the original settings and the new ones added for this system, you have a total of eight settings to control the transfer function and determine the general appearance and information conveyed by the energy cloud. The assemblage of these settings is illustrated below.

Figure 1: The energy visualization pipeline.
Image pipeline

  1. Channel: You can show either the subset sum of the energy terms selected in the discriminator (2), or the subset sum of their gradient magnitudes. You control the channel through the Input Channel toggles in the Energy Rendering Dialog.

  2. Discriminator: This is the block of toggles labeled Visualized Energy Components in the Energy Visualization Dialog. You can use these toggles to select an arbitrary subset of the energy component terms to be visualized. Those not selected will be ignored. This setting and the clamp (3) are part of ProteinShop's original energy visualization functionality.

  3. Clamp: This is the Energy Mapping Range interval in the Energy Visualization Dialog. If Max is no more than Min, it defaults to Min + 100. This interval helps eliminate outliers from the data, which might otherwise hide detailed information elsewhere. Its exact effect is detailed below.

  4. Resolution: Use the Texels per angstrom slider in the Energy Rendering Dialog to control the resolution of the texture. The resolution you select may be automatically lowered to observe constraints imposed by your platform's OpenGL rendering capabilities and the value of the variable EnergyCalculator/maxBuffer in ProteinShop.cfg.

  5. Radial specifier: Use the Radius multiplier slider and the Radial Basis Function Coefficient Type toggles in the Energy Rendering Dialog to control the radius of the basis function, which determines how far an atom's energy spreads into the texture. The coefficient type can be either uniform or relative; in the latter case, the coefficient is either the atom's physical radius or its Van der Waals radius. The final radius is defined in angstroms.

  6. Classifier: Use the Classifier menu in the Energy Rendering Dialog to select one classification function, which maps atoms to a limited range of integers [0,m), where m is the number of classifications in the function's range. The classifier's domain consists of everything ProteinShop knows about the atoms, including their element types, positions, topological relationships, current force field states, and the secondary structures and residues to which they belong. Three classifiers are provided in this release, detailed below. More can easily be added, also detailed below.

  7. Normalizing interval: Use the Energy Rendering Dialog to control this interval, which determines how the cumulative atom energy values from the input channel are normalized into the domain of the color function (8). It can be computed automatically based on the current energy levels or set to an arbitrary value, according to the value of the Calculate automatically toggle.

  8. Color function: Each integer in the classifier's range is associated with a color function. The color function maps the atom's energy to a color. The colors from all classifications are combined in a weighted average to produce the final color and transparency of the texture. You can assign color functions to classifications using the box immediately below the Classifier menu in the Energy Rendering Dialog. The Next and Previous Class buttons let you cycle through the classifications, and the Color function for x menu lets you assign the color function to the currently selected classification.
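The fallback rule for the clamp (item 3) can be sketched in a few lines. This is only an illustration, not ProteinShop's code: the names ClampRange, effectiveRange, and clampEnergy are hypothetical, and the sketch assumes the clamp simply limits each atom's value to the interval [Min, Max].

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the Energy Mapping Range interval.
struct ClampRange {
    double min;
    double max;
};

// Documented fallback: if Max is no more than Min, use Min + 100.
inline ClampRange effectiveRange(ClampRange r) {
    if (r.max <= r.min) r.max = r.min + 100.0;
    return r;
}

// ASSUMED behavior: limit an atom's energy value to the effective range.
inline double clampEnergy(double e, ClampRange r) {
    const ClampRange eff = effectiveRange(r);
    assert(eff.min < eff.max);
    return std::max(eff.min, std::min(e, eff.max));
}
```

With the default range [0, 100], an outlier of 250 would be reduced to 100, so a few extreme atoms no longer dominate the rest of the cloud.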

The data store in Figure 1 labeled Atom Energy is the energy calculator plug-in, which provides real-valued energy component terms and gradient vectors for each atom in the molecule. These numbers are processed according to the channel selected to produce a single floating-point value for each atom. Only the component terms selected in the discriminator are included. If no terms are selected in the discriminator, every atom's value will be zero. The number of toggles in the discriminator, c, is determined by the plug-in. For our AMBER plug-in, c = 5. If, for example, a solvation term is added to the force field, it will appear in the user interface as a sixth toggle in the discriminator.

Let the discriminator function $D(j) = 1$ if the $j^{th}$ energy component is selected and 0 if not, $0 \leq j < c$. We compute the value $e_i$ of the $i^{th}$ atom as


\begin{displaymath}
e_i = \sum_{j=0}^{c-1} D(j) \cdot \left \{
\begin{array}{ll...
...{A}^i_1 \Vert & \mathrm{for\:gradients}
\end{array}\right \}.
\end{displaymath} (1)
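In code, the per-atom value might be computed as follows, going by the Channel description (the subset sum of the selected energy terms, or of their gradient magnitudes). This is a sketch under assumptions: AtomTerms, Vec3, Channel, and atomValue are hypothetical names, and the plug-in is assumed to supply one scalar term and one gradient vector per component for each atom.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Hypothetical per-atom data from the energy calculator plug-in:
// one scalar term and one gradient vector for each of the c components.
struct AtomTerms {
    std::vector<double> energy;
    std::vector<Vec3> gradient;
};

enum class Channel { SubsetSum, GradientNorm };

// Sum the components selected in the discriminator (D(j) == true).
inline double atomValue(const AtomTerms &a,
                        const std::vector<bool> &selected,
                        Channel channel) {
    assert(a.energy.size() == selected.size());
    assert(a.gradient.size() == selected.size());
    double e = 0.0;
    for (std::size_t j = 0; j < selected.size(); ++j) {
        if (!selected[j]) continue;  // component ignored by the discriminator
        if (channel == Channel::SubsetSum) {
            e += a.energy[j];
        } else {
            const Vec3 &g = a.gradient[j];
            e += std::sqrt(g.x * g.x + g.y * g.y + g.z * g.z);
        }
    }
    return e;
}
```

If no component is selected, the loop adds nothing and the function returns zero, matching the behavior described for an empty discriminator.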

The value of $e_i$ is then clamped, and spread through the texel block by means of the radial basis and classification functions. The radius of the basis function $s$ is determined by the radial specifier, equal to the product of a multiplier chosen by the user with a slider and one of three coefficients: a constant (chosen with another slider), the atom's radius, or the atom's Van der Waals radius. The basis function $R(r_i)$ is a smooth curve similar to that used for the implicit modeling of molecular surfaces. It depends on the texel's distance $r_i$ from the center of each atom:


\begin{displaymath}
R(r_i) = \left \{
\begin{array}{ll}
1 - \frac{3r_i^2}{s^2} +...
...f} \: r_i < s \\
0 & \mathrm{otherwise}
\end{array}\right \}.
\end{displaymath} (2)
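The radial specifier and a compactly supported basis function can be sketched as below. The names CoefficientType, AtomRadii, basisRadius, and radialBasis are hypothetical. The falloff polynomial used here, $(1 - r^2/s^2)^3$, is an assumption chosen only because it is smooth, equals 1 at the atom center, and vanishes at $r = s$; it may not match the polynomial ProteinShop actually uses.

```cpp
#include <cassert>

enum class CoefficientType { Uniform, AtomRadius, VanDerWaalsRadius };

// Hypothetical per-atom radii, in angstroms.
struct AtomRadii {
    double physical;
    double vanDerWaals;
};

// Basis radius s = multiplier * coefficient, in angstroms.
inline double basisRadius(double multiplier, CoefficientType type,
                          double uniformCoefficient, const AtomRadii &atom) {
    switch (type) {
    case CoefficientType::Uniform:           return multiplier * uniformCoefficient;
    case CoefficientType::AtomRadius:        return multiplier * atom.physical;
    case CoefficientType::VanDerWaalsRadius: return multiplier * atom.vanDerWaals;
    }
    return 0.0;  // not reached for valid enum values
}

// ASSUMED falloff: (1 - r^2/s^2)^3 inside the radius, 0 outside.
inline double radialBasis(double r, double s) {
    assert(s > 0.0 && r >= 0.0);
    if (r >= s) return 0.0;
    const double u = 1.0 - (r * r) / (s * s);
    return u * u * u;
}
```

Whatever polynomial is used, the important property is the cutoff: a texel farther than $s$ from an atom receives no contribution from it.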

The voxel block store holds texel magnitudes for each classification. Let the classification function $L(i,k) = 1$ if the $i^{th}$ atom belongs to the $k^{th}$ classification and 0 if not; $0 \leq k < m$ and $0 \leq i < n$, where $n$ is the number of atoms in the molecule. Given the atom energy value $e_i$ (1), the radial basis $R(r_i)$ (2), and the classifier $L(i,k)$, the texel magnitude $t_k$ is


\begin{displaymath}
t_k = \sum_{i=0}^{n-1} e_i \cdot R(r_i) \cdot L(i,k) .
\end{displaymath} (3)
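Equation (3) can be evaluated at a single texel as sketched below. ClassifiedAtom and texelMagnitudes are hypothetical names, and the basis function is passed in by the caller so that no particular form of $R$ is assumed. For clarity the sketch gathers contributions per texel; it does not attempt the per-atom traversal of the texel block.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical flattened view of one atom after clamping and classification.
struct ClassifiedAtom {
    double x, y, z;    // center, in angstroms
    double energy;     // clamped e_i
    double radius;     // basis radius s for this atom
    std::size_t cls;   // the class k for which L(i,k) = 1
};

// Evaluate Equation (3) at one texel center for every class k in [0, m).
inline std::vector<double> texelMagnitudes(
        double tx, double ty, double tz,
        const std::vector<ClassifiedAtom> &atoms, std::size_t m,
        const std::function<double(double, double)> &basis) {
    std::vector<double> t(m, 0.0);
    for (const ClassifiedAtom &a : atoms) {
        assert(a.cls < m);
        const double dx = tx - a.x, dy = ty - a.y, dz = tz - a.z;
        const double r = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (r >= a.radius) continue;  // R(r) = 0 outside the radius
        t[a.cls] += a.energy * basis(r, a.radius);
    }
    return t;
}
```

Because $L(i,k)$ is 1 for exactly one class per atom, the product with $L$ reduces to adding each atom's contribution to the single entry t[a.cls].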

The normalizing interval $N(t_k)$ maps texel magnitudes to the unit interval (clamp and scale) for use with color functions. The color function $C_k(N(t_k))$ implements an arbitrary continuous color map. ProteinShop provides a dozen of these, including intensity functions (ranging from a component color at zero to white at one through different paths), constant functions, and invisibility to hide selected parts of the molecule. The final texel color $t$ is computed from the classified texel magnitudes $t_k$ (3) as a weighted average, defined as


\begin{displaymath}
t = \frac{\sum_{k=0}^{m-1} N(t_k) \cdot C_k(N(t_k))}{\sum_{k=0}^{m-1} N(t_k)} .
\end{displaymath} (4)
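A minimal sketch of the normalizing interval and Equation (4) follows. Rgba, normalize, ColorFunction, and texelColor are hypothetical names. Returning a fully transparent texel when every weight is zero is an assumption, made here only to avoid dividing by zero.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

struct Rgba { double r, g, b, a; };

// N: clamp t to [lo, hi] and scale the result to [0, 1].
inline double normalize(double t, double lo, double hi) {
    assert(lo < hi);
    if (t <= lo) return 0.0;
    if (t >= hi) return 1.0;
    return (t - lo) / (hi - lo);
}

using ColorFunction = std::function<Rgba(double)>;

// Equation (4): average of the per-class colors, weighted by N(t_k).
inline Rgba texelColor(const std::vector<double> &t,
                       const std::vector<ColorFunction> &colorOf,
                       double lo, double hi) {
    assert(t.size() == colorOf.size());
    Rgba sum{0.0, 0.0, 0.0, 0.0};
    double weight = 0.0;
    for (std::size_t k = 0; k < t.size(); ++k) {
        const double w = normalize(t[k], lo, hi);
        const Rgba c = colorOf[k](w);
        sum.r += w * c.r; sum.g += w * c.g; sum.b += w * c.b; sum.a += w * c.a;
        weight += w;
    }
    // ASSUMPTION: no contribution from any class means a transparent texel.
    if (weight == 0.0) return Rgba{0.0, 0.0, 0.0, 0.0};
    return Rgba{sum.r / weight, sum.g / weight, sum.b / weight, sum.a / weight};
}
```

For instance, two classes with equal normalized magnitudes and constant red and blue color functions would blend to an even mix of the two colors.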

This pipeline runs in $O(n \cdot (s \cdot q)^3)$ time, where $q$ is the resolution of the texture grid, by classifying each atom and determining which portion of the texture grid it will affect before iterating over Equation (3). The pixel transfer operations require $O(N^3)$ time regardless, where $N$ is the width of the texel block, but hardware makes this part of the computation relatively fast. In practice, depending on the size of the molecule and the resolution chosen, the execution of this pipeline requires anywhere from a fraction of a second to half a minute or more, but all of the textures shown here were produced in less than ten seconds on an obsolete machine (Pentium III, 733 MHz) with no 3D texture capability at all. Once generated, the textures can be viewed at interactive refresh rates, using suitable graphics hardware.
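The $(s \cdot q)^3$ factor comes from restricting each atom to the block of texels within its basis radius. The sketch below computes that range along one axis; applying it to all three axes gives the block. IndexRange and affectedTexels are hypothetical names, and the placement of texel centers at $(i + 0.5)/q$ from the grid origin is an assumption about the grid layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct IndexRange { int first; int last; };  // inclusive; empty if first > last

// Texel indices along one axis whose centers lie within `s` angstroms of
// `center`, for a grid of `width` texels at `q` texels per angstrom.
// ASSUMPTION: texel i has its center at (i + 0.5) / q from the grid origin.
inline IndexRange affectedTexels(double center, double s, double q, int width) {
    assert(s >= 0.0 && q > 0.0 && width > 0);
    int first = static_cast<int>(std::ceil((center - s) * q - 0.5));
    int last  = static_cast<int>(std::floor((center + s) * q - 0.5));
    first = std::max(first, 0);          // clip to the grid
    last  = std::min(last, width - 1);
    return IndexRange{first, last};
}
```

With a radius of 1 angstrom at 2 texels per angstrom, an atom well inside the grid touches 4 texels per axis, or 64 texels in total, independent of the size of the molecule.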


Examples

There are three classifiers in the current release: Unity, Hydrogen Bond, and Phobic-Philic. We provide an example of each. All three used the default clamp of [0, 100] and selected all components in the discriminator.

Figure 2: Configurations of CASP6 target T0209 before and after local minimization inside ProteinShop. The intensity of color shows the relative magnitude of the AMBER energy terms for each atom. Classifier: Unity. Resolution: 2.6 texels per angstrom. Radial specifier: 1 times Van der Waals. Input Channel: Subset sum. Normalizing interval: [0,508.3].
Image T209-before-and-after

Unity is the default classifier, defined as $L(i,0) = 1, i \in [0,n)$, which applies the same color function to the entire molecule. The configuration shown in Figure 2 was locally optimized inside ProteinShop by our energy plug-in. The initial state was used to initialize the normalizing interval (the Calculate automatically toggle was set), and that normalizing interval was also used for the final state (i.e. the toggle was cleared). You can use the Record Dialog to establish the normalizing interval to use for animating an entire minimization run using one of the iterations as a baseline. Or, you can simply flip back and forth between interesting points.

Figure 3: Two views of 1pgx showing gradients over hydrogen bond sites. Top: atoms that belong to bonded amide groups are blue, bonded carboxyl groups are red, all other atoms are green. Bottom: atoms not belonging to bonded dipoles are invisible. Classifier: Hydrogen Bond. Resolution: 4.06 texels per angstrom. Radial specifier: 1 times Van der Waals. Input Channel: Gradient norm. Normalizing interval: automatic.
Image 1pgx-hbond

The Hydrogen Bond classifier distinguishes atoms belonging to dipoles forming hydrogen bonds from the others. Figure 3 shows two views of 1pgx made with this classifier that are identical except in their energy rendering. This case demonstrates the utility of the invisible color function: because the dipole atoms are few in number, their force fields can be overwhelmed or obscured by the large numbers of atoms in other classes unless those classes are hidden.

Figure 4: Different configuration of 1pgx showing gradients over ball-and-stick geometry with the Corey-Pauling-Koltun (CPK) color scheme. Atoms belonging to hydrophilic residues are blue, hydrophobic orange, and unclassified residues at the ends of the chain are green. Classifier: Phobic-Philic. Resolution: 4.08 texels per angstrom. Radial specifier: 1.5 times Van der Waals. Input Channel: Gradient norm. Normalizing interval: automatic.
Image 1pgx-phobic-philic

The Phobic-Philic classifier distinguishes atoms belonging to hydrophobic residues from those belonging to hydrophilic residues, and both of these from atoms whose residues are neither hydrophobic nor hydrophilic. A larger Radius multiplier was used for Figure 4 to support a better understanding of the overall shape of the molecule. This classifier can be used to evaluate the effects of solvation terms in the force field.

Adding classifiers and color functions

The classifiers and color functions were implemented in a highly modular way that makes the process of adding new functions to the source code almost trivial. The actual time required depends on the complexity of the function, but a rich set of classifiers can easily be created in minutes based on ProteinShop's existing functionality. The four files involved in this process are AtomClassifier.cpp, AtomClassifier.h, ColorFunction.cpp, and ColorFunction.h. We offer an example for the classifier; the color function is analogous but simpler.

The default classifiers are implemented as singleton instances in AtomClassifier.cpp, made available through a static factory method (AtomClassifier::get()). The user interface uses this factory call in conjunction with AtomClassifier::numClassifiers() to initialize the menu in the Energy Rendering Dialog. To add a new classifier to the system, one only needs to define a new subclass of AtomClassifier and add it to the singleton array in AtomClassifier.cpp, where one finds a code template of the following form:

class Classifier : public AtomClassifier
{
public:

    ~Classifier() {}
    uint classify (const MD::Protein::ChainAtom &atom,
                   const ProteinState &proteinState)
    {
    }
    const char *className (uint classNum)
    {
    }
    const char *name()
    {
    }
    uint numClasses()
    {
    }
};
static Classifier f_Singleton;

To add a new classifier:

  1. Copy and paste this template into the area of AtomClassifier.cpp that contains the other classifiers.

  2. Replace each instance of the string Classifier in the copied template with the name of your new subclass. Give a unique name to the singleton instance.

  3. Fill in the function definitions.

  4. Add the address of the singleton instance to the factory array.

The name functions are important to the user interface; they are used to fill in the menu lists and dialog labels. The numClasses() function needs to return one higher than the largest possible value that can be returned by the classify() function. The classify() function can, in turn, be anything you need it to be. You can also add any new members (functions or data) that your functionality requires. You can also add new constructors, but this will probably be more complex; see below.
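The relationship between classify() and numClasses() is easy to check mechanically. The sketch below uses a deliberately simplified stand-in, MiniClassifier, whose classify() takes an integer instead of a ChainAtom and a ProteinState; MiniClassifier, ParityClassifier, and rangeIsConsistent are all hypothetical and are not part of ProteinShop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for AtomClassifier; the real classify() takes a
// ChainAtom and a ProteinState rather than an integer.
class MiniClassifier {
public:
    virtual ~MiniClassifier() {}
    virtual unsigned int classify(int atom) const = 0;
    virtual unsigned int numClasses() const = 0;
};

// True if every atom maps into the range [0, numClasses()).
inline bool rangeIsConsistent(const MiniClassifier &c,
                              const std::vector<int> &atoms) {
    assert(c.numClasses() > 0);
    for (std::size_t i = 0; i < atoms.size(); ++i)
        if (c.classify(atoms[i]) >= c.numClasses()) return false;
    return true;
}

// Example: an even/odd partition with two classes, 0 and 1.
class ParityClassifier : public MiniClassifier {
public:
    unsigned int classify(int atom) const { return atom % 2 == 0 ? 0u : 1u; }
    unsigned int numClasses() const { return 2; }
};
```

A check of this kind, run once over a loaded molecule, would catch a classifier whose numClasses() was not updated after a new class was added to classify().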

As a simple example, let's implement a new classifier called ExampleClassifier and add it to the system. This classifier will distinguish atoms belonging to Proline residues from all others. As such, it makes two classes, called Proline and Not Proline. The classifier itself will be called the Proline Partition. Following steps 1, 2, and 3, we modify the template to produce the following code:

class ExampleClassifier : public AtomClassifier
{
public:

    ~ExampleClassifier() {}
    uint classify (const MD::Protein::ChainAtom &atom,
                   const ProteinState &proteinState)
    {
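        // Look up the residue that owns this atom, then test its type.
        Protein::Residue *residue = atom.getResidue();
        Protein::Residue::AminoAcid type = residue->getType();
        if ( type == Protein::Residue::PRO ) return 0;  // class 0: Proline
        else return 1;                                  // class 1: Not Proline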
    }
    const char *className (uint classNum)
    {
        if ( classNum == 0 ) return "Proline";
        else if ( classNum == 1 ) return "Not Proline";
        else return "<parameter range error>";
    }
    const char *name()
    {
        return "Proline Partition";
    }
    uint numClasses()
    {
        return 2;
    }
};
static ExampleClassifier f_exampleSingleton;

We take care to ensure that the names returned by className() correspond to the results produced by classify(). In the classify() function, the classifier queries the atom's residue object to determine whether or not it is the amino acid Proline. Fortunately (consulting Protein.h), class Protein::ChainAtom provides direct access to its associated Protein::Residue instance via the getResidue() method. So, all we have to do is call this function, get the type of amino acid, and evaluate it.

This brings us to step 4, which requires updating the variables f_numDefaultClassifiers and f_defaultClassifiers. First, increase the value of f_numDefaultClassifiers by one, and then add the address of the singleton instance to the end of the array. Suppose the array originally looks like this:

static const uint f_numDefaultClassifiers = 3;
static AtomClassifier *f_defaultClassifiers[f_numDefaultClassifiers] = {
    &f_unitySingleton,
    &f_phobiPhilicSingleton,
    &f_hBondSingleton
};

The updated array will look like this:

static const uint f_numDefaultClassifiers = 4;
static AtomClassifier *f_defaultClassifiers[f_numDefaultClassifiers] = {
    &f_unitySingleton,
    &f_phobiPhilicSingleton,
    &f_hBondSingleton,
    &f_exampleSingleton
};

There is a large resource of information contained in class Protein that can be exploited in this manner, but other objects associated with the protein can be consulted as well, via the ProteinState argument. For example, you can tap into the settings of its ProteinRenderer, or examine its energy components or force field gradient vectors through its EnergyCalculator. Note that, to use the EnergyCalculator, you need the atom's linear index, which is not the same as its atomIndex property that comes from the PDB file. However, this information can be extracted from the ProteinState object - as can everything that ProteinShop knows about the atom. The fields of struct ProteinState are defined in Globals.h. Hint: to get the linear index, iterate with Protein::ConstAtomIterator or add a new property to Protein::ChainAtom. The latter approach is faster, but it increases the storage requirement.
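The iteration approach to finding a linear index amounts to counting atoms in traversal order until the target is reached. The sketch below shows the idea with standard containers only; MiniAtom and linearIndexOf are hypothetical stand-ins, and a real implementation would walk the protein with Protein::ConstAtomIterator instead of a std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an atom; atomIndex mimics the PDB serial number,
// which need not match the atom's position in iteration order.
struct MiniAtom { int atomIndex; };

// Return the position of `target` in iteration order, or -1 if it is absent.
inline long linearIndexOf(const std::vector<MiniAtom> &atoms,
                          const MiniAtom *target) {
    assert(target != nullptr);
    long i = 0;
    for (std::vector<MiniAtom>::const_iterator it = atoms.begin();
         it != atoms.end(); ++it, ++i) {
        if (&*it == target) return i;
    }
    return -1;
}
```

Each lookup costs time proportional to the number of atoms, which is why storing the index as an extra property on the atom is faster at the price of more memory.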

Adding a constructor to the classifier might complicate the process somewhat, while adding additional methods will almost certainly require additional coding outside of AtomClassifier.cpp. We will address these issues separately.

Constructors: If you add a new constructor, and this constructor is safe to execute during static initialization, you can follow the implementation pattern set forth above. If your new constructor takes no arguments, you can follow the pattern above exactly. If your new constructor takes arguments, and your arguments are available when the static initializer runs, and the number of different ways the constructor can meaningfully be called is fairly limited, you can add one singleton for each of these possible constructor calls. For example, if your constructor takes an enumerated type with three possible values, you can define one singleton for each possible value of the enumeration:

static ExampleClassifier2 f_x2fooSingleton (ENUM_VALUE_FOO);
static ExampleClassifier2 f_x2barSingleton (ENUM_VALUE_BAR);
static ExampleClassifier2 f_x2bazSingleton (ENUM_VALUE_BAZ);

static const uint f_numDefaultClassifiers = 7;
static AtomClassifier *f_defaultClassifiers[f_numDefaultClassifiers] = {
    &f_unitySingleton,
    &f_phobiPhilicSingleton,
    &f_hBondSingleton,
    &f_exampleSingleton,
    &f_x2fooSingleton,
    &f_x2barSingleton,
    &f_x2bazSingleton
};

Additional methods: If your constructor is much more complicated than that, or if it requires additional configuration after construction, you will need to add initialization code somewhere else in ProteinShop and add the classifier(s) to the factory with the secondary initializer AtomClassifier::add(). (You can also remove these via AtomClassifier::remove().) Whenever you call the add() or remove() methods for classifiers or color functions, the list of objects available in the factory changes, so you will also need to reinitialize the user interface widgets that display these lists to the end user (no nice event notification model here, it's all done with callbacks). These lists are loaded and reloaded by the function initializeVolumeDialogChoices() (close to the top of ProteinFltk.cpp). At the time of this writing, this function is only called from loadEnergyLibrary(), but you should call it any time after changing the name of a function or classification, or after making one or more calls to add() or remove() in the classifier or color function factories.
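One way to avoid forgetting the reload is to route every change through a small wrapper that triggers it. The sketch below is a design alternative, not how ProteinShop's factories work: Registry is a hypothetical class that stores names only, and its onChange callback plays the role of initializeVolumeDialogChoices().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a classifier or color function factory.
class Registry {
public:
    // `onChange` stands in for the function that reloads the dialog lists.
    explicit Registry(std::function<void()> onChange)
        : onChange_(std::move(onChange)) {}

    void add(const std::string &name) {
        names_.push_back(name);
        notify();
    }
    // Returns false, without reloading, if the name is not registered.
    bool remove(const std::string &name) {
        std::vector<std::string>::iterator it =
            std::find(names_.begin(), names_.end(), name);
        if (it == names_.end()) return false;
        names_.erase(it);
        notify();
        return true;
    }
    std::size_t size() const { return names_.size(); }
    const std::string &get(std::size_t i) const {
        assert(i < names_.size());
        return names_[i];
    }

private:
    void notify() { if (onChange_) onChange_(); }
    std::vector<std::string> names_;
    std::function<void()> onChange_;
};
```

The trade-off is one reload per change; code that makes several changes in a row may prefer to call the reload function once at the end, as the text above describes.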

The reason these hooks were built into the system is to allow the creation of user-editable functions. Such a function would provide its own user interface (accessed either from a menu or through an existing dialog) and allow the user to create modifiable instances of some general function type. It is in support of such features that you might need to put additional methods in your subclass. As a simple example, a classifier that partitions the elements into two sets might allow the user to edit the membership of these sets by means of a checkbox list. As a more complex example, the editor of a compound classifier might allow the user to specify one input classifier, and then associate each element of that input's range with another classifier. Note that, in both of these examples, it is probably easier to hard-code them than to build a fancy GUI, unless large numbers of different instances (partitions or compounds, in the cases of these examples) are commonly in use.
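The compound classifier idea can be sketched without any GUI. Everything here is hypothetical: MiniClassifier is a simplified stand-in whose classify() takes an integer rather than a ChainAtom and a ProteinState, ModClassifier exists only to make the example concrete, and laying the refined classes end to end is just one possible numbering scheme.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-in for AtomClassifier.
class MiniClassifier {
public:
    virtual ~MiniClassifier() {}
    virtual unsigned int classify(int atom) const = 0;
    virtual unsigned int numClasses() const = 0;
};

// Toy classifier for the example: class = atom mod n (atom must be >= 0).
class ModClassifier : public MiniClassifier {
public:
    explicit ModClassifier(unsigned int n) : n_(n) {}
    unsigned int classify(int atom) const {
        return static_cast<unsigned int>(atom) % n_;
    }
    unsigned int numClasses() const { return n_; }
private:
    unsigned int n_;
};

// Compound classifier: class k of `input` selects refine[k], and the
// output classes of the refining classifiers are numbered end to end.
class CompoundClassifier : public MiniClassifier {
public:
    CompoundClassifier(const MiniClassifier &input,
                       const std::vector<const MiniClassifier *> &refine)
        : input_(input), refine_(refine), total_(0) {
        assert(refine_.size() == input_.numClasses());
        for (std::size_t k = 0; k < refine_.size(); ++k) {
            offsets_.push_back(total_);
            total_ += refine_[k]->numClasses();
        }
    }
    unsigned int classify(int atom) const {
        const unsigned int k = input_.classify(atom);
        return offsets_[k] + refine_[k]->classify(atom);
    }
    unsigned int numClasses() const { return total_; }
private:
    const MiniClassifier &input_;
    std::vector<const MiniClassifier *> refine_;
    std::vector<unsigned int> offsets_;
    unsigned int total_;
};
```

Because numClasses() is the sum of the refining classifiers' class counts, the compound automatically satisfies the rule that numClasses() is one higher than the largest value classify() can return.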

Of course, the implementation of such nice things will undoubtedly lead to demands for more features, like generalized loading and saving of classifier and color function instances to files, etc. You might want to build new interfaces on top of the existing ones to get some code reuse out of that.



Clark Crawford 2005-04-24