Theory of Drug Design
What are drugs?
The vast majority of drugs are small molecules designed to bind to, interact with, and modulate the activity of specific biological receptors. Receptors are proteins that bind and interact with other molecules to perform the numerous functions required for the maintenance of life. They include an immense array of cell-surface receptors (hormone receptors, cell-signaling receptors, neurotransmitter receptors, etc.), enzymes, and other functional proteins. Due to genetic abnormalities, physiologic stressors, or some combination thereof, the function of specific receptors and enzymes may become altered to the point that our well-being is diminished. These alterations may manifest as minor physical symptoms, as in the case of a runny nose due to allergies, or as life-threatening and debilitating events, such as sepsis or depression. The role of drugs is to correct the functioning of these receptors to remedy the resulting medical condition.
As an example, the highest-grossing drug in 2000 was Prilosec, which earned $4.102 billion in sales. Prilosec is used to treat stomach ulcers and acid reflux disease. It targets a specific enzyme, the proton pump, which is located in the acid-producing cells lining the stomach wall. This enzyme is responsible for the production of stomach acid. Due to genetic reasons, such as deficient enzymes that regulate acid secretion, or physiologic causes, such as stress, too much acid may be produced. This leads to ulceration of the stomach lining or to acid reflux disease and heartburn. Prilosec binds to the proton pump and shuts it down, thereby diminishing the production of stomach acid and its associated symptoms.
The biochemistry of drugs
Enzymes are a subset of receptor-like proteins that are directly responsible for catalyzing the biochemical reactions that sustain life. For example, digestive enzymes act to break down the nutrients of our diet. DNA polymerase and related enzymes are crucial for cell division and replication. Enzymes are genetically programmed to be absolutely specific for their appropriate molecular targets. Any errors could have grave consequences. One can imagine the end result should blood clotting enzymes start activating throughout the body. Or consider the problems that arise when our immune system attacks our own tissues. Enzymes ensure the specificity of their targets by forming a molecular environment that excludes interactions with inappropriate molecules. The analogy most often mentioned is that of a lock and key. The enzyme is a molecular lock, which contains a keyhole that exhibits a very specific and consistent size and shape. This molecular keyhole is termed the active site of the enzyme and allows interaction with only the appropriate molecular targets. Just as a typical lock is much bigger than the keyhole, the receptor is usually much larger than the active site. The receptor, as specified by our DNA, is a folded protein whose major purpose is to form and maintain the size and shape of the active site. This is illustrated in Figure 1 using the structure of the HIV-1 protease.
The most important concept in drug design is to understand the methods by which the active site of a receptor selectively restricts the binding of inappropriate structures. Any potential molecule that can bind to a receptor is called a ligand. In order for a ligand to bind, it must contain a specific combination of atoms that presents the correct size, shape, and charge composition to interact with the receptor. In essence, the ligand must possess the molecular key that fits the receptor lock. Figure 2 schematically shows a typical ligand-receptor binding interaction.
Figure 2. Enzyme substrate complementary interactions.
We see here that a putative ligand-receptor interaction must have complementary size and shape. This is termed steric complementarity. As is the case with an actual key, if a different molecule varies by even a single atom in the wrong place, it may not fit properly, and will most likely not interact with the receptor. Conversely, the closer the fit between the ligand and receptor, the tighter the interaction becomes. Again, keep in mind that this is only a two-dimensional schematic. Both ligand and active site are volumes with complex three-dimensional shapes.
In addition to steric complementarity, electrostatic interactions influence ligand binding. Charged receptor atoms often surround the active site, imparting a localized charge in specific regions of the active site. From physics, we know that opposite charges attract while similar charges repel. Electrostatic complementarity further restricts the binding of inappropriate molecules since the ligand must contain correctly placed complementary charged atoms for interaction to occur.
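The attraction and repulsion described above follow Coulomb's law. The following is a minimal sketch, not a production force field: it assumes point charges in units of the elementary charge, distances in angstroms, a uniform dielectric constant, and the approximate conversion constant of 332 kcal·Å/(mol·e²) commonly used in molecular mechanics. The function name and the example charges are illustrative assumptions, not taken from this text.

```python
# Approximate Coulomb constant in kcal*angstrom/(mol*e^2); an assumed,
# commonly used molecular-mechanics value, not a figure from this text.
COULOMB_KCAL = 332.06


def coulomb_energy(q1, q2, distance, dielectric=1.0):
    """Pairwise electrostatic energy (kcal/mol) between two point charges.

    Negative values are attractive, positive values are repulsive.
    """
    if distance <= 0:
        raise ValueError("distance must be positive")
    return COULOMB_KCAL * q1 * q2 / (dielectric * distance)


# Hypothetical example: a negatively charged ligand group placed 3 angstroms
# from a positively charged receptor atom, then from a negatively charged one.
attractive = coulomb_energy(-1.0, +1.0, 3.0)
repulsive = coulomb_energy(-1.0, -1.0, 3.0)
print(attractive < 0, repulsive > 0)
```

A ligand whose charged atoms face oppositely charged receptor atoms accumulates favorable (negative) terms, while a misplaced charge contributes an unfavorable (positive) one.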
The main driving force for ligand and receptor binding is hydrophobic interaction. Nearly two-thirds of the body is water, and this aqueous milieu surrounds all our cells. In order for ligand and receptor to interact, there must be a driving force that compels the ligand to leave the water and bind to the receptor. The hydrophobicity of the ligand provides this force. Hydrophobicity literally means 'water fearing' and is a measure of how 'greasy' a compound is. It can be roughly approximated by the percentage of hydrogen and carbon in the molecule. This force is easily demonstrated by placing a few drops of oil in a cup of water. The oil is composed of hydrocarbon chains and is highly hydrophobic. The oil droplets will coalesce into a single globule in order to avoid the water, which is highly polar. As shown in Figure 2, the active site may contain a mixture of hydrophobic pockets and regions that are more polar. Since the hydrophobic portions of the ligand and receptor prefer to be juxtaposed, the arrangement of hydrophobic surfaces provides yet another way that receptors can limit the binding of inappropriate targets.
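The rough approximation mentioned above, the proportion of carbon and hydrogen in a molecule, can be sketched in a few lines. This sketch assumes the percentage is taken by atom count (the text does not say whether count or mass is meant) and handles only simple formulas without parentheses. The function name is hypothetical, and the measure is a crude heuristic rather than a real hydrophobicity scale.

```python
import re


def carbon_hydrogen_fraction(formula):
    """Fraction of atoms in a simple molecular formula that are C or H."""
    counts = {}
    # Each match is an element symbol followed by an optional count.
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no atoms found in formula")
    return (counts.get("C", 0) + counts.get("H", 0)) / total


hexane = carbon_hydrogen_fraction("C6H14")     # pure hydrocarbon
glucose = carbon_hydrogen_fraction("C6H12O6")  # carries six oxygen atoms
print(hexane, glucose)
```

Under this heuristic a pure hydrocarbon such as hexane scores higher than an oxygen-rich sugar such as glucose, which matches the oil-and-water intuition, although the measure ignores how the atoms are arranged.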
As discussed above, there are numerous potential interactions between ligand and receptor. Depending upon the size of the active site, there may be a myriad steric, electrostatic, and hydrophobic contacts. However, some are more important than others. The specific interactions that are crucial for ligand recognition and binding by the receptor are termed the pharmacophore. Usually, these are the interactions that directly factor into the structural integrity of a receptor or are involved in the mechanism of its action. Using our lock and key analogy from above, we can imagine a lock having numerous tumblers. There may be many keys that can sterically complement the lock and fit within the keyhole. However, all but the correct key will displace the wrong tumblers, leading to a sub-optimal interaction with the lock. Only the correct key, which presents the pharmacophore to the receptor, contacts the appropriate tumblers and properly interacts with the lock to open it. This is crucial to the design of pharmaceuticals since any successful drug must incorporate the appropriate chemical structures and present the pharmacophore to the receptor.
This is shown in Figure 3 above. In the upper left frame of this figure, we see our native ligand bound within the active site. Assume that through biochemical investigation, we determine that the phenyl ring (blue) and the carboxylic acid group (green) are vital to receptor interaction. Thus, we deduce that these two groups must be the pharmacophore that a ligand must present to the receptor for binding. In future drugs that we develop to mimic the native ligand, we must include these two pharmacophoric elements for successful binding to occur. This is shown in the upper right derivative compound where a bicyclic group has been substituted. Because it maintains the pharmacophore and retains its complementary size and shape, it has a reasonable chance of successfully binding. However, any drug that we develop which lacks a complete pharmacophore may not interact with the receptor target.
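The rule that a candidate must present every pharmacophoric element can be expressed as a simple containment check. The sketch below is deliberately simplified: it treats the pharmacophore as a set of named features and ignores three-dimensional geometry, which matters in practice. All feature labels and compound names are hypothetical and chosen to mirror the phenyl ring and carboxylic acid example above.

```python
# Hypothetical feature labels; a real system would derive features and
# their spatial arrangement from three-dimensional structures.
PHARMACOPHORE = frozenset({"phenyl_ring", "carboxylic_acid"})


def presents_pharmacophore(ligand_features, pharmacophore=PHARMACOPHORE):
    """Return True if the ligand carries every required pharmacophoric element."""
    return pharmacophore <= set(ligand_features)


native_ligand = {"phenyl_ring", "carboxylic_acid", "methyl"}
bicyclic_derivative = {"phenyl_ring", "carboxylic_acid", "bicyclic_group"}
truncated_analog = {"phenyl_ring", "methyl"}  # lacks the carboxylic acid

for name, features in [
    ("native ligand", native_ligand),
    ("bicyclic derivative", bicyclic_derivative),
    ("truncated analog", truncated_analog),
]:
    print(name, presents_pharmacophore(features))
```

The derivative passes because it keeps both required groups, while the truncated analog fails because one element is missing. Passing this check is necessary but not sufficient for binding, since size, shape, and charge must still complement the active site.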
The challenge of drug design
Given our introduction to the biochemistry of ligand receptor binding, we can begin to appreciate the difficulties in designing drugs towards specific target receptors. Table 1 lists the major tasks and concerns in this endeavor.
Table 1. Major tasks and concerns in drug development.
1. Characterize medical condition and determine receptor targets.
2. Achieve active site complementarity: steric, electrostatic, and hydrophobic.
3. Consider biochemical mechanism of receptor.
4. Adhere to laws of chemistry.
5. Synthetic feasibility.
6. Biological considerations.
7. Patent considerations.
When a medical condition exists where a drug could be beneficial, extensive scientific study must first be done in order to determine the biological and biochemical problems that underlie the disease process. This often takes years of study in order to characterize the targets for a potential drug. The reason is that nearly all biological processes in the human body are tightly interconnected. Altering the behavior of select receptors or enzymes may have detrimental effects on other systems. These are the side effects that occur with nearly all drugs. Furthermore, the human body is a homeostatic machine, and always attempts to achieve equilibrium. As a result, the body will attempt to counteract any pharmacotherapeutic intervention.
Once a receptor target has been established and well characterized, the process of ligand design begins. Obviously, the first consideration is that the designed ligand must complement the active site of the receptor target. Steric, electrostatic, and hydrophobic complementarity must be established as we discussed above. The pharmacophore must be presented to the receptor in order for recognition and binding to occur. Otherwise, the designed ligand will have no chance of interacting with the receptor.
Figure 4. Designing ligands to offset enzyme mechanism.
In addition to adequately binding the receptor, the biochemical mechanism of the receptor target must be taken into consideration. This is shown in Figure 4. In this figure we schematically represent the biochemical mechanism of a protease. A protease is an enzyme that cleaves proteins and peptides. In the top part of the figure, we see that a protease recognizes a specific group of atoms, colored in red and blue, called a peptide bond. If the peptide bond is present at a specific position in the active site when the ligand binds, it is cleaved by the protease with the addition of water (H2O) to form two separate fragments. If our goal is to inactivate this protease, any designed ligand cannot possess this peptide bond at the same position. Otherwise, it will simply be cleaved by the protease, and the protease will continue to function unperturbed. However, the ligand can be modified so that the peptide bond is no longer present as shown in the bottom portion of the figure. If this ligand is then bound by the enzyme, the enzyme will not be able to cleave it. As such, the enzyme would be inactivated, as the ligand remains lodged in the active site.
Having characterized the active site region and the mechanism of action of the target receptor, the challenge then becomes one of designing a suitable ligand. This is, by far, the most daunting task of the entire drug design process. The optimal combination of atoms and functional groups to complement the receptor is often the natural ligand of the receptor. Unfortunately, this is usually an unacceptable candidate for a drug. This is because the natural ligand either fails to inactivate the receptor, as described above, or it is a natural substance that cannot be patented. Patent considerations are often paramount, as legal protection for the developed drug affords the opportunity to recoup the financial costs of development. Therefore, alternate combinations of chemical structures must be devised.
The design of novel ligands is often restricted by what chemists are physically able to synthesize. It is of no use to design the ultimate drug if it cannot be manufactured. The laws of chemistry dictate that each atom type has a specific size, charge, and geometry with respect to the number and types of neighboring atoms that it can be joined to. The entire field of chemistry is predicated on the establishment of synthetic rules for the construction and manipulation of various combinations of atoms and functional groups. It is expertise in these chemical rules that governs the ability of the synthetic chemist to design and synthesize postulated ligand candidates. Within these rules, the drug developer must creatively propose suitable chemical structures that satisfy the requirements discussed above.
Finally, there are biological considerations in the development of new drugs. The liver is the major organ of detoxification in the human body. Any drug that is taken undergoes a number of chemical reactions in the liver as the body attempts to neutralize foreign substances. This set of reactions is well characterized, and a great deal of knowledge exists as to how drugs are modified as the body eliminates them. More importantly, various chemical structures are highly toxic to biological systems, and these are also well characterized. These constraints must also be taken into consideration as novel drugs are developed.
The drug discovery pipeline
As discussed above, the development of any potential drug begins with years of scientific study to determine the biochemistry behind a medical problem for which pharmaceutical intervention is possible. The result is the determination of specific receptor targets that must be modulated to alter their activity in some way. Once these targets have been identified, the goal is then to find compounds that will interact with the receptors in some fashion. At this initial stage of drug development, it does not matter what effect the compounds have on the targets. We simply wish to find anything that binds to the receptor in any fashion.
The modern day drug discovery pipeline is outlined in Figure 5. The first step is to determine an assay for the receptor. An assay is a chemical or biological test that turns positive when a suitable binding agent interacts with the receptor. Usually, this test is some form of colorimetric assay, in which an indicator turns a specific color when complementary ligands are present. This assay is then used in mass screening, which is a technique whereby hundreds of thousands of compounds can be tested in a matter of days to weeks. A pharmaceutical company will first screen their entire corporate database of known compounds. The reason is that if a successful match is found, the database compound is usually very well characterized. Furthermore, synthetic methods will be known for this compound, and patent protection is often present. This enables the company to rapidly prototype a candidate ligand whose chemistry is well known and within the intellectual property of the company.
If a successful match is found, the initial hit is called a lead compound. The lead compound is usually a weakly binding ligand with minimal receptor activity. The binding of this structure to the receptor is then studied to determine the interactions that foster the ligand-receptor association. If the receptor is water soluble, there is a chance that X-ray crystallographic analysis can be employed to determine the three-dimensional structure of the ligand bound to the receptor at the atomic level. This is a very powerful tool, for it allows scientists to directly visualize a snapshot of the individual atoms of the ligand as they reside within the receptor. This snapshot is referred to as a crystal structure of the ligand-receptor complex. Unfortunately, not all complexes can be analyzed in this manner. However, if a crystal structure can be determined, a strategy can then be developed based upon this characterization to improve and optimize the binding of the lead compound. From this point onward, a cycle of iterative chemical refinement and testing continues until a drug is developed that undergoes clinical trials. The techniques most often used to refine drugs are combinatorial chemistry and structure-based design.
Combinatorial chemistry is a very powerful technique that chemists can employ to aid in the refinement of the lead compound. It is a synthetic tool that enables chemists to rapidly generate thousands of lead compound derivatives for testing. As shown above in Figure 6, a scaffold is employed that contains a portion of the ligand that remains constant. Subsite groups (shown in red, green, and blue) are potential sites for derivatization. These subsites are then reacted with combinatorial libraries to generate a multitude of derivative structures, each with different substituent groups. One can see how a vast number of compounds can be generated as a result of the combinatorial process. If a scaffold contains three derivatization sites and the library contains ten groups per site, theoretically 1000 different combinations are possible. By carefully selecting libraries based upon the study of the active site, we can target the derivatization process towards optimizing ligand-receptor interaction.
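The combinatorial count for a scaffold with three derivatization sites and ten candidate groups per site can be checked by direct enumeration. This is a minimal sketch: the scaffold string and the substituent names are hypothetical placeholders, and the code only enumerates combinations without modeling any chemistry.

```python
from itertools import product


def enumerate_derivatives(scaffold, libraries):
    """Yield (scaffold, substituents) for every choice of one group per site."""
    for substituents in product(*libraries):
        yield scaffold, substituents


# Hypothetical placeholder names: three sites, ten candidate groups each.
libraries = [[f"R{site}_{i}" for i in range(10)] for site in range(1, 4)]
derivatives = list(enumerate_derivatives("scaffold", libraries))
print(len(derivatives))  # 1000
```

Because the count is the product of the library sizes, adding a fourth site with ten groups would raise the total to 10,000, which is why libraries are usually chosen with the active site in mind rather than enumerated blindly.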
Structure based design, often called rational drug design, is much more focused than combinatorial chemistry. As shown above in Figure 7, it involves using the biochemical laws of ligand-receptor association discussed above to postulate ligand refinements to improve binding. For example, we discussed that steric complementarity is vital to tight receptor binding. Using the crystal structure of the complex, we can target regions of the ligand that fit poorly within the active site and postulate chemical changes to improve complementarity with the receptor. In a similar fashion, functional groups on the ligand can be changed in order to augment electrostatic complementarity with the receptor. However, the danger in altering any portion of the ligand is the effect on the remaining ligand structures. Modifying even a single atom in the middle of the ligand can drastically change the shape of the overall structure. Even though complementarity in one portion of the ligand might be improved by the chemical revision, the overall binding might be severely compromised. This is the difficulty in any ligand refinement procedure.
Use of computers in drug design
In the early 1990s there was a great deal of optimism that computer-aided drug design would revolutionize the way in which drugs could be developed. The enduring exponential increase in computing power had progressed to the point that rudimentary estimations of ligand-receptor complementarity could be performed. Furthermore, computer graphics technology had achieved the ability to generate vector models of chemical structures and manipulate them in real time. This offered, for the first time, the ability to interactively study computer models of ligand structures and their binding interactions with a receptor.
Concomitant with the development of this technology was the emergence of the AIDS epidemic. During the 1980s, scientists had isolated the causative agent of AIDS, HIV-1. Considerable characterization of the viral life cycle provided numerous potential targets for pharmaceutical intervention. Among them was the HIV-1 protease. This aspartyl protease was an enzyme that was unique to HIV, and absolutely required for the processing and maturation of HIV proteins. Thus, if a drug could be developed to inactivate this protease, the virus would be unable to generate mature infectious particles to sustain the infection. Numerous groups around the world rapidly solved the crystal structure of this enzyme (see Figure 1). The mechanism of this enzyme was determined, and the layout of the active site was carefully mapped.
It was known that humans possess similar classes of proteases. Renin is an enzyme secreted by the kidneys that is responsible for initiating a cascade of reactions that regulate blood pressure. It too is an aspartic protease, and ligands that inhibited its function were known. With the wealth of data from the study of the HIV-1 protease, the hope was that this target could be exploited by computer-aided rational drug design to rapidly generate novel AIDS drugs. Computational chemists believed they could circumvent much of the time and effort required for drug synthesis and testing by simply generating novel compounds using the computer. Testing would be replaced by merely calculating the ligand-receptor binding affinity using the physical laws of chemistry. The concept of generating virtual lead compounds entirely through computer simulation was termed de novo design.
Difficulties implementing de novo design
Many of the world's largest pharmaceutical firms spent millions of dollars on hardware and software in their endeavor to make de novo design a reality. Unfortunately, successes were rare. With few exceptions, de novo design was an utter failure, and did not prove to be an effective method to discover lead compounds. The main reasons were limitations in computing power and the lack of useful software functionality. In scientific computing, accuracy and processing time are always a tradeoff. Thus, in order to make the calculations run in a practical period of time, a plethora of assumptions, significant approximations, and numerous algorithmic shortcuts had to be utilized. This, in turn, greatly diminished the accuracy of any calculated ligand-receptor interaction. As such, chemists could postulate numerous chemical structures that could potentially complement the active site; however, the calculated binding had no correlation with reality.
This remains the most significant challenge in de novo design to this day. Although computers have become exponentially faster, the sheer number of calculations needed to accurately predict the binding of a de novo generated ligand to its receptor in a useful timeframe still requires significant approximations. In de novo design, we are attempting to generate a whole ligand from scratch and dock it within the receptor. As stated above, the difficulty lies in predicting how the chemical structure will behave in real life. A ligand is an inherently flexible structure, and can assume a plethora of different conformations and orientations. The big question remains whether the actual binding structure will mirror the calculated one. Failure in this endeavor has undermined the utility of de novo structure-generating software. We will discuss these shortcomings and the technological advances of RACHEL, which attempt to circumvent these deficiencies, in detail below.
The second most significant problem in computer-aided de novo design is the generation of undesired chemical structures. There are a nearly infinite number of potential combinations of atoms. However, the vast majority of these structures are of no use. As discussed above, undesired structures are rejected due to toxicity, chemical instability, or synthetic difficulty. Nearly all de novo design software packages are plagued by this problem, especially with respect to synthetic feasibility. Thus, although such software can postulate potential complementary ligands, the vast majority of them are worthless. We will discuss in great detail below how RACHEL attempts to circumvent this problem with newly developed technology.
The end result of these shortcomings was that computer-aided de novo design soon fell out of favor as a means of generating viable lead compounds. By the mid-1990s a tremendous number of de novo software packages had been released; however, they all suffered from these same problems. Gradually, such programs were shelved and investigators looked to other technologies to aid in their drug development efforts.
Rebirth of computer-aided drug refinement
It was at this time that the techniques of mass screening and combinatorial chemistry began to gain widespread acceptance and use. These techniques allowed researchers to discover lead compounds in a rapid and efficient manner. As such, de novo design tools and their associated problems were no longer needed to generate lead structures. One would surmise that computer-aided drug design technology would have soon ceased to exist. On the contrary, it soon became apparent that computational tools were needed that could optimize these lead compounds into potent drugs.
The distinction between drug optimization and de novo design is an important one. The difficulty with de novo ligand generation is that an entire structure is being created from scratch. The confidence one has of accurately predicting how this structure will interact and bind within a target receptor is shaky at best. In drug optimization, we begin with a lead compound whose bound structure within the receptor has been characterized, most likely through X-ray crystallography. Subtle modifications are then performed to generate derivative compounds using structure-based drug design to improve binding affinity. Because we are making much smaller changes, our faith in the validity of the resulting structures is far greater. These derivatives then undergo testing to determine which modifications improve binding. The structures of the best ligands can then be elucidated to verify the accuracy of the modifications. This refinement process continues iteratively until optimal binding ligands are produced.
Since subtle modifications are being made to a common structure, the predictive ability of ligand refinement software is much higher. This is because the effect of a single chemical modification on ligand-receptor binding is far easier to quantitate than an extreme change. No longer are we trying to determine the binding affinities of drastically different structures. Instead, we are simply determining the rank order of a list of derivative compounds. This greatly increases the confidence that proposed structures will bind in a manner consistent with our understanding.
In addition, the act of generating chemical derivatives is highly amenable to computerized automation. Consider the application of targeted structure-based combinatorial chemistry as discussed above. Libraries of derivative components are assembled based upon the analysis of the active site. Because of the combinatorial nature of this method, an extremely large number of candidate structures may be possible. A computer can rapidly generate and predict the binding of all potential derivatives, creating a list of the best potential candidates. In essence, the computer filters out all weak-binding compounds, allowing the chemist to focus on, synthesize, and test only the most promising ligands. Thus, utilizing computer-aided drug design software to aid in the refinement of weak-binding lead compounds is the most effective manner in which these tools can be employed. The use of computer modeling to refine structures has become standard practice in modern drug design.
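The filtering step described above amounts to ranking derivatives by a predicted binding score and keeping only the top of the list. The sketch below assumes a scoring convention in which a lower (more negative) predicted energy means tighter binding. The function name, compound names, and energies are hypothetical values invented purely for illustration.

```python
def shortlist(derivatives, score, keep=5):
    """Rank derivatives by predicted score (lower = tighter) and keep the best."""
    ranked = sorted(derivatives, key=score)
    return ranked[:keep]


# Hypothetical predicted binding energies in kcal/mol; invented numbers
# used only to demonstrate the ranking, not real measurements.
predicted = {"lead": -6.1, "deriv_A": -7.4, "deriv_B": -5.2, "deriv_C": -8.0}

best = shortlist(predicted, predicted.get, keep=2)
print(best)  # ['deriv_C', 'deriv_A']
```

Only the rank order matters here, which is why modest errors in the absolute scores are tolerable when the candidates are close relatives of one lead compound.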
The primary utility of a hybrid program is in de novo ligand generation. Figure 11 depicts how these programs work. In the upper left-hand corner of this figure, we see an active site with three distinct regions colored in blue, green, and red. The goal of a hybrid program is to generate a complete ligand that complements the active site region. To do so, these programs employ a combination of scanner and builder algorithms in a divide-and-conquer approach. These programs first utilize scanner technology to find components that will complement individual subsites within the active site volume. This is depicted by the components in the blue, green, and red boxes. These individual components are then docked into their respective regions within the active site as shown in the upper right. Splicing fragments, shown in purple, are then used to join these components into a single complete ligand. It is important to note that numerous possible fragments may exist that complement the various active site regions. Thus, a potentially large number of ligands may be generated by combinatorially linking the various components.
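The scanner and builder steps described above can be sketched as a filter followed by a combinatorial join. This is a toy illustration under strong assumptions: subsites and fragments are matched by a single hypothetical property label, docking is not modeled at all, and every name below is invented for the example rather than taken from any actual hybrid program.

```python
from itertools import product

# Hypothetical property each subsite prefers, and each fragment offers.
SUBSITE_NEEDS = {"blue": "hydrophobic", "green": "positive", "red": "negative"}
FRAGMENT_KIND = {
    "phenyl": "hydrophobic",
    "cyclohexyl": "hydrophobic",
    "amine": "positive",
    "carboxylate": "negative",
    "phosphate": "negative",
}


def scan(subsite):
    """Scanner step: keep the fragments that complement one subsite."""
    wanted = SUBSITE_NEEDS[subsite]
    return [frag for frag, kind in FRAGMENT_KIND.items() if kind == wanted]


def build(components_per_subsite, linkers):
    """Builder step: one component per subsite, one linker per adjacent pair."""
    n_links = len(components_per_subsite) - 1
    for components in product(*components_per_subsite):
        for links in product(linkers, repeat=n_links):
            yield components, links


per_subsite = [scan(subsite) for subsite in SUBSITE_NEEDS]
ligands = list(build(per_subsite, ["amide", "ether"]))
print(len(ligands))  # 2 * 1 * 2 component choices times 2 ** 2 linker choices = 16
```

Even this tiny example yields sixteen candidates from five fragments and two linkers, which illustrates how quickly the number of generated ligands grows as the fragment and linker libraries expand.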
As stated above, hybrid-type programs are mainly used for de novo generation of lead compounds. The strength of these programs is in their ability to generate a large number of diverse potential hits. However, they suffer from the same shortcomings as all de novo design packages described above. The difficulties in accurately calculating ligand-receptor binding affinity are significant. In addition, the combinatorial nature of the algorithm often leads to the generation of chemical structures that are undesirable, unstable, or synthetically difficult. Finally, the developer of the software may bias the generation of compounds. For example, many of these programs place components within the active site using a pre-determined binding algorithm based upon the functional groups that are presented by the receptor. Additionally, the algorithms used to splice the components together greatly affect the generated ligands.