
Overview
Quantitative structure-activity relationships (QSAR) represent an
attempt to correlate structural or property descriptors of compounds
with activities. These physicochemical descriptors, which include
parameters to account for hydrophobicity, topology, electronic
properties, and steric effects, are determined empirically or, more
recently, by computational methods. Activities used in QSAR include
chemical measurements and biological assays. QSAR are currently
applied in many disciplines, with many applications pertaining to drug
design and environmental risk assessment.
The Early Years
QSAR date back to the 19th century. In 1863, A.F.A. Cros at the
University of Strasbourg observed that toxicity of alcohols to
mammals increased as the water solubility of the alcohols decreased
[1]. In the 1890s, Hans Horst Meyer of the University of Marburg and
Charles Ernest Overton of the University of Zurich, working
independently, noted that the toxicity of organic compounds depended
on their lipophilicity [1,2].
Linear Free Energy Relationships
Little additional development of QSAR occurred until the work of
Louis Hammett (1894-1987), who correlated electronic properties of
organic acids and bases with their equilibrium constants and
reactivity. Consider the dissociation of benzoic acid:
$$\mathrm{C_6H_5COOH \rightleftharpoons C_6H_5COO^- + H^+}$$
Hammett observed that adding substituents to the aromatic ring of
benzoic acid had an orderly and quantitative effect on the
dissociation constant. For example,
a nitro group in the meta position increases the dissociation
constant, because the nitro group is electron-withdrawing, thereby
stabilizing the negative charge that develops. Consider now the
effect of a nitro group in the para position.
The equilibrium constant is even larger than for the nitro group in
the meta position, indicating even greater electron-withdrawal.

Now consider the case in which an ethyl group is in the para
position.
In this case, the dissociation constant is lower than for the
unsubstituted compound, indicating that the ethyl group is
electron-donating, thereby destabilizing the negative charge that
arises upon dissociation.

Hammett also observed that substituents have a similar effect on the
dissociation of other organic acids and bases. Consider the
dissociation of phenylacetic acids.
Electron-withdrawal by the nitro group increases dissociation, with
the effect being less for the meta than for the para substituent,
just as was observed with benzoic acid. The electron-donating ethyl
group decreases the equilibrium constant, as would be expected.

Data for these equilibria typically are graphed as illustrated
below:

Figure 1: Example of a graph for a linear free energy
relationship. $K_0$ and $K_0'$ represent equilibrium constants for
unsubstituted compounds, and $K$ and $K'$ those for substituted
compounds. Values for the abscissa are calculated from the
dissociation constants of unsubstituted and substituted benzoic acid.
Values for the ordinate are obtained from another organic acid or
base with identical patterns of substitution, in this case
phenylacetic acid.

Because this relationship is linear, the following equation can be
written:
$$\log\frac{K'}{K_0'} = \rho \log\frac{K}{K_0}$$
where $\rho$ is the slope of the line. The values for the abscissa in
Figure 1 are always those for benzoic acid and are given the symbol
$\sigma$. Therefore, we can write:
$$\log\frac{K'}{K_0'} = \rho\sigma$$
$\rho$, the slope of the line, is a proportionality constant
pertaining to a given equilibrium. It relates the effect of
substituents on that equilibrium to the effect of those substituents
on the benzoic acid equilibrium. That is, if the effect of
substituents is proportionally greater than on the benzoic acid
equilibrium, then $\rho > 1$; if the effect is less than on the
benzoic acid equilibrium, $\rho < 1$. By definition, $\rho$ for
benzoic acid is equal to 1.

$\sigma$ is a descriptor of the substituents. The magnitude of
$\sigma$ gives the relative strength of the electron-withdrawing or
-donating properties of the substituents. $\sigma$ is positive if the
substituent is electron-withdrawing and negative if it is
electron-donating.
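As a rough illustration of how these two constants are obtained in practice, the sketch below computes sigma from the pKa of a substituted benzoic acid and estimates rho for a second acid series as the least-squares slope through the origin. The function names and all numerical inputs are hypothetical, chosen only to show the arithmetic; they are not measured data.

```python
def sigma_from_pka(pka_benzoic, pka_substituted):
    """sigma = log(K/K0) = pKa(unsubstituted) - pKa(substituted) for benzoic acids."""
    return pka_benzoic - pka_substituted


def rho_from_series(sigmas, log_k_ratios):
    """Least-squares slope of log(K'/K0') versus sigma, forced through the origin.

    The line passes through the origin because the unsubstituted compound
    has sigma = 0 and log(K'/K0') = 0.
    """
    numerator = sum(s * y for s, y in zip(sigmas, log_k_ratios))
    denominator = sum(s * s for s in sigmas)
    return numerator / denominator


# Hypothetical inputs for illustration only (not experimental values).
sigma = sigma_from_pka(pka_benzoic=4.20, pka_substituted=3.50)
sigmas = [0.0, 0.70, -0.15]
log_k_ratios = [0.0, 0.35, -0.075]  # made-up values for a second acid series
rho = rho_from_series(sigmas, log_k_ratios)
print(f"sigma = {sigma:.2f}, rho = {rho:.2f}")
```

A slope below 1, as in this made-up series, would indicate an equilibrium that is less sensitive to substituents than the benzoic acid equilibrium.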
These relationships as developed by Hammett are termed linear free
energy relationships. Recall the equation relating free energy to an
equilibrium constant:
$$\Delta G^\circ = -RT \ln K = -2.303\,RT \log K$$
That is, the free energy is proportional to the logarithm of the
equilibrium constant. These linear free energy relationships are
termed "extrathermodynamic". Although they can be stated in terms of
thermodynamic parameters, no thermodynamic principle states that the
relationships should be true.
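The connection to free energy can be made explicit with a short derivation, sketched here under the assumption that both equilibria are measured at the same temperature $T$. The symbol $\Delta G^\circ$ for the standard free energy change, with a prime for the second equilibrium and a subscript 0 for the unsubstituted compound, is notation introduced for this illustration. Substituting $\log K = -\Delta G^\circ / (2.303RT)$ into the Hammett relationship gives:

```latex
\log\frac{K'}{K_0'} = \rho \log\frac{K}{K_0}
\quad\Longrightarrow\quad
\Delta G^{\circ\prime} - \Delta G_0^{\circ\prime}
  = \rho \left( \Delta G^{\circ} - \Delta G_0^{\circ} \right)
```

That is, the substituent-induced change in free energy for one equilibrium is a linear function of the corresponding change for the benzoic acid equilibrium.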
To develop a better understanding of these relationships, it is
instructive to consider some values of $\rho$ and $\sigma$. Values of
$\rho$ are provided below:
In the aniline and phenol equilibria, the hydrogen ion that is
dissociating is one atom removed from the phenyl ring, whereas in the
benzoic acid equilibrium it is two atoms removed. Thus, substituents
are able to exert a greater effect on the dissociation in aniline and
phenol than in benzoic acid, and $\rho > 1$.
In phenylacetic and phenylpropionic acids, the hydrogen ion
dissociating is three and four atoms removed, respectively, from the
phenyl ring. Substituents are able to exert a lesser effect on the
equilibrium than on the benzoic acid equilibrium, and $\rho < 1$.

Some illustrative values of $\sigma$ for substituents in the meta and
para positions are given below:
By definition, $\sigma$ for hydrogen is 0. The positive values of
$\sigma$ for the nitro group indicate that it is
electron-withdrawing. In understanding the magnitudes of the
$\sigma$ values for the nitro group in meta vs. para
positions, consider the mechanisms of electron withdrawal or
donation. For a nitro group in the meta position, electron-withdrawal
is due to an inductive effect produced by the electronegativity of
the constituent atoms. If only induction were operative, one would
expect the electron-withdrawing effect of a nitro group in the para
position to be less than in the meta position. The larger value for a
para-substituted nitro group results from the combination of both
inductive and resonance effects. The resonance structures for
para-nitrobenzoate are illustrated below.
For chlorine, the electronegativity of the atom produces an inductive electron-withdrawing effect, with the magnitude of the effect in the para position being less than in the meta position. For chlorine, the inductive effect dominates; any resonance donation is comparatively weak.
The methoxy group can be electron-donating or -withdrawing, depending on the position of substitution. In the meta position, the electronegativity of the oxygen produces an inductive electron-withdrawing effect. In the para position, only a small inductive effect would be expected. Moreover, an electron-donating resonance effect, as illustrated below, occurs for the methoxy group in the para position, giving an overall electron-donating effect.
Tables of $\sigma$ values for numerous substituents have been published
[3,4]. In some cases, the $\sigma$ values are generally applicable to
many different equilibria. In other cases, $\sigma$ values have been
derived for specific equilibria, which is particularly true when one
considers $\sigma$ values for ortho substituents.
Applications of the Hammett Equation
Illustrative examples of the application of the Hammett relationship
will be presented. The first is the prediction of the pKa of
ionization equilibria. Recall the relationship
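$$\mathrm{p}K_a = -\log K_a$$
Therefore,
$$\mathrm{p}K_a = \mathrm{p}K_{a,0} - \rho \sum \sigma$$
where $\mathrm{p}K_{a,0}$ refers to the unsubstituted compound and the
sum runs over the substituents, which for benzoic acid ($\rho = 1$) is
$$\mathrm{p}K_a = \mathrm{p}K_{a,0} - \sum \sigma$$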
Consider a benzoic acid substituted with nitro and methyl groups.
Given $\sigma$ = 0.71 for nitro groups and $\sigma$ = -0.13 for methyl
groups, we calculate pKa = 2.91, which compares favorably with the
experimental value of 2.97.
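The arithmetic behind this estimate can be sketched as follows. The sketch assumes a pKa of 4.20 for unsubstituted benzoic acid, additive sigma values, and a compound carrying two nitro groups and one methyl group. The substitution pattern is an assumption, chosen because it reproduces the calculated value of 2.91 quoted above, and the function name `hammett_pka` is illustrative.

```python
def hammett_pka(pka_parent, rho, sigmas):
    """Predict pKa from the Hammett equation: pKa = pKa0 - rho * sum(sigma)."""
    return pka_parent - rho * sum(sigmas)


# Assumed inputs: benzoic acid pKa0 = 4.20 and rho = 1 (by definition).
# Sigma values are those given in the text; two nitro groups and one
# methyl group are assumed.
SIGMA_NITRO = 0.71
SIGMA_METHYL = -0.13

pka = hammett_pka(pka_parent=4.20, rho=1.0,
                  sigmas=[SIGMA_NITRO, SIGMA_NITRO, SIGMA_METHYL])
print(f"predicted pKa = {pka:.2f}")
```

The difference of about 0.06 pKa units from the experimental value gives a sense of how well simple additivity of substituent constants can hold.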
The second example illustrates the applicability of Hammett's
electronic descriptors in a QSAR for the inhibition of bacterial
growth by a series of sulfonamides in which X represents various
substituents [5,6]. A QSAR was developed based on the $\sigma$ values
of the substituents,
![]()
where C is the minimum concentration of compound that inhibited
growth of E. coli. From this relationship, we see that
electron-withdrawing substituents favor inhibition of growth.
Hansch Analysis
QSAR based on Hammett's relationship utilize electronic properties as
the descriptors of structures. Difficulties were encountered when
investigators attempted to apply Hammett-type relationships to
biological systems, indicating that other structural descriptors were
necessary.

Robert Muir, a botanist at Pomona College, was studying the
biological activity of compounds that resembled indoleacetic acid and
phenoxyacetic acid, which function as plant growth regulators. In
attempting to correlate the structures of the compounds with their
activities, he consulted his colleague in chemistry, Corwin Hansch.
Using Hammett sigma parameters to account for the electronic effect
of substituents did not lead to meaningful QSAR. However, Hansch
recognized the importance of lipophilicity, expressed as the
octanol-water partition coefficient, for biological activity [7]. We
now recognize that this parameter provides a measure of the
bioavailability of compounds, which determines, in part, the amount
of the compound that reaches the target site.

Relationships were developed to correlate a structural parameter
(i.e., lipophilicity) with activity. In some cases, a univariate
relationship correlating structure and activity was adequate. The
form of the equation is:
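$$\log\frac{1}{C} = a \log P + b$$
where C is the molar concentration of compound that produces a
standard response (e.g., LD50, ED50), P is the octanol-water
partition coefficient, and a and b are constants obtained by
regression. With other data, it was observed that correlations were
improved by combining Hammett's electronic parameters and Hansch's
measure of lipophilicity using an equation such as
$$\log\frac{1}{C} = a\pi + b\sigma + c$$
where $\sigma$ is the Hammett substituent parameter and $\pi$ is
defined analogously to $\sigma$. That is,
$$\pi = \log P_X - \log P_H$$
where $P_X$ and $P_H$ are the partition coefficients of the
substituted and unsubstituted compounds, respectively.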
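A minimal sketch of how the hydrophobic substituent constant and a univariate fit of activity against log P might be computed is given below. The partition coefficients and activities are invented for illustration, and the helper names are hypothetical. A real analysis would use measured values and a statistics package that also reports confidence intervals.

```python
import math


def pi_substituent(p_substituted, p_parent):
    """Hydrophobic substituent constant: pi = log P_X - log P_H."""
    return math.log10(p_substituted) - math.log10(p_parent)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Invented data: log P values and log(1/C) activities for five compounds.
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]
log_inv_c = [1.4, 1.9, 2.4, 2.9, 3.4]  # constructed to lie on a straight line

slope, intercept = fit_line(log_p, log_inv_c)
pi_value = pi_substituent(p_substituted=1000.0, p_parent=100.0)
print(f"log(1/C) = {slope:.2f} log P + {intercept:.2f}; pi = {pi_value:.2f}")
```

Extending the fit to include an electronic term would require multiple regression, which is omitted here to keep the sketch short.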
Significance of Slopes and Intercepts in Hansch Analysis
Consideration has been given to the significance of the slopes in univariate QSAR involving the correlation of logP with toxicity. For example, an analysis of data in which the lysis of erythrocytes (hemolysis) by various neutral organic compounds (e.g., alcohols, carboxylic acids, amines, phenols, esters) was studied yielded the general equation:
Note that the slope of this equation is approximately 1 and that the intercept is approximately 0. Many other QSAR involving nonspecific toxicity also show correlations of logP with toxicity with a slope near 1. However, a number of QSAR involving neutral organic compounds have slopes considerably less than 1. The reasons for this phenomenon are not entirely clear.
One analysis of the problem involves a consideration of the meaning of hydrophobicity at the molecular level. That is, hydrophobicity can be considered to be due largely to the free energy change associated with the desolvation of a compound as it moves from an aqueous phase to the biological phase with which it interacts to produce toxicity (e.g., entering a membrane, binding to a protein, etc.). It appears that the slope of the regression equation is related to the desolvation of the compound that must occur when it interacts at its target site. When the slope of the equation is approximately 1, the environment of the biological phase appears to be similar to that of octanol; partitioning of a compound from water into octanol would require complete desolvation of the compound.
For some of the examples of QSAR with a slope less than 1, the effect being measured is a result of ligands binding to proteins or DNA. For example, for the denaturation of horse heart cytochrome c by amides,
and for denaturation of T-4 phage DNA by alcohols,
In these cases, the ligands may only need to be partially desolvated for binding to occur. With partial desolvation, the free energy change is less than for complete desolvation, which may account for the slopes being less than 1.
The significance of the intercepts in these equations also has been considered to some extent. One example involves hemolysis by cationic and anionic compounds. An analysis of several sets of data revealed a mean intercept of 2.92 for cations and 2.11 for anions. Recall that for neutral compounds, the intercept is near 0. These data are interpreted to indicate that the charged compounds have a greater hemolytic potency than neutral compounds.
Parabolic Relationships with logP
The QSAR discussed thus far have involved linear relationships between logP and toxicity. In other cases, parabolic relationships between biological response and hydrophobicity have been observed as illustrated in the figure.
The line in the figure is a least-squares fit of a parabola, $\log C = a \log P - b (\log P)^2 + c$, to the data points. The significance of this observation is that an optimum hydrophobicity may exist. One interpretation to account for this observation is that many membranes may have to be traversed for compounds to get to the target site, and compounds with the greatest hydrophobicity will become localized in the membranes they encounter initially, thereby slowing their transit to the target site.
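The optimum implied by the parabola can be located by setting the derivative to zero. The following is a short derivation using the coefficients a, b, and c of the fitted equation, with $\log P_{\mathrm{opt}}$ introduced here as a label for the optimum:

```latex
\log C = a \log P - b (\log P)^2 + c
\qquad
\frac{d \log C}{d \log P} = a - 2b \log P = 0
\quad\Longrightarrow\quad
\log P_{\mathrm{opt}} = \frac{a}{2b}
```

For b > 0 the second derivative, -2b, is negative, so this stationary point is the maximum of the fitted curve.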
Steric Effects
Thus far the contributions of electronic and hydrophobic parameters in QSAR have been discussed. The third major factor that often must be considered in QSAR involves steric effects. For studies involving reactivity of organic compounds, a steric parameter, $E_s$, was defined by Taft as
$$E_s = \log k_X - \log k_H$$
where $k_X$ and $k_H$ are the rate constants for the acid hydrolysis of esters bearing the substituent X and hydrogen, respectively.
Assuming the electronic effects of substituent X can be ignored, the size of X will affect the transition state for this hydrolysis and hence the rate of reaction. By definition $E_s$ = 0 for X=H. Tables of values of $E_s$ for other substituents are available.
Another parameter that is related to molecular volume and steric effects is the molar refractivity (MR). Experimentally it is obtained from the equation
$$MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$$
where n = index of refraction; d = density; MW = molecular weight.
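As a quick check of the units and magnitudes involved, the sketch below evaluates the molar refractivity from the refractive index, density, and molecular weight using the Lorentz-Lorenz form. The function name is illustrative, and the inputs for water are approximate values supplied here as assumptions, not data from the text.

```python
def molar_refractivity(n, density, mol_weight):
    """Lorentz-Lorenz molar refractivity: ((n^2 - 1) / (n^2 + 2)) * (MW / d).

    With density in g/cm^3 and molecular weight in g/mol, the result
    is in cm^3/mol.
    """
    n_sq = n * n
    return (n_sq - 1.0) / (n_sq + 2.0) * (mol_weight / density)


# Approximate values for liquid water near room temperature (assumed inputs).
mr_water = molar_refractivity(n=1.333, density=0.997, mol_weight=18.015)
print(f"MR(water) ~ {mr_water:.2f} cm^3/mol")
```

Because MW/d is the molar volume, MR carries units of volume per mole, which is why it serves as a rough measure of molecular size.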
Steric effects can be particularly difficult to define in complex biological systems. QSAR in biological systems have been developed using parameters such as Es and MR. In addition, factors such as van der Waals radii, standard bond angles and lengths, and conformational flexibility have been applied as a way to define the space occupied by molecules. However, it is often difficult to define a single parameter that can account for all of these factors. A more recent treatment of steric effects that is being applied to biological systems is comparative molecular field analysis (CoMFA). This approach, which examines and superimposes the conformations of molecules of interest, is an extension of ligand-based drug design which is described below.
QSAR are now developed using a variety of parameters as descriptors
of the structural properties of molecules. Hammett sigma values are
often used for electronic parameters, but quantum mechanically
derived electronic parameters also may be used. Other descriptors to
account for the shape, size, lipophilicity, polarizability, and other
structural properties also have been devised. A QSAR database has been
established at Pomona College that summarizes over 6000 datasets of
biological and chemical QSAR.
Drug Design
Researchers have attempted for many years to develop drugs based on
QSAR. Easy access to computational resources was not available when
these efforts began, so attempts consisted primarily of statistical
correlations of structural descriptors with biological activities.
However, as access to high-speed computers and graphics workstations
became commonplace, this field has evolved into what is often termed
rational drug design or computer-assisted drug design.

We will discuss the application of QSAR to drug design, some examples
of which relied primarily on statistical correlation and some, on
computer-based visualization and modeling. An early example of QSAR
in drug design involves a series of 1-(X-phenyl)-3,3-dialkyltriazenes.
These compounds were of interest for their anti-tumor activity, but
they also were mutagenic. QSAR was applied to understand how the
structure might be modified to reduce the mutagenicity without
significantly decreasing the anti-tumor activity. Mutagenic activity
was evaluated in the Ames test, and from those data, the following
QSAR was developed:
![]()
where C is the molar concentration required to give 30 revertants per
$10^8$ bacteria and $\sigma^+$ is a "through resonance" electronic
parameter [8,9]. From the equation, it is seen that factors that favor
mutagenicity are increased lipophilicity and electron-donating
substituents.

Studies of the anti-tumor activity were done against L1210 leukemia
in mice. From the data, the following QSAR was developed:

where C is the molar concentration of compound producing a 40%
increase in life span of mice, MR is molar refractivity, which is a
measure of molecular volume, and EsR is a steric parameter for the R
group [10]. Based on these equations, mutagenicity is more sensitive
than anti-tumor activity to the electronic effects of the
substituents. Thus, electron-withdrawing substituents were examined,
as illustrated in the example below:
By substituting a sulfonamide group at the para position, the
anti-tumor activity was reduced 1.2-fold, whereas the mutagenicity
was reduced by about 400-fold.
Computer-Assisted Design
Computer-assisted drug design (CADD), also called computer-assisted
molecular design (CAMD), represents more recent applications of
computers as tools in the drug design process. In considering this
topic, it is important to emphasize that computers cannot substitute
for a clear understanding of the system being studied. That is, a
computer is only an additional tool to gain better insight into the
chemistry and biology of the problem at hand.

In most current applications of CADD, attempts are made to find a
ligand (the putative drug) that will interact favorably with a
receptor that represents the target site. Binding of ligand to the
receptor may include hydrophobic, electrostatic, and hydrogen-bonding
interactions. In addition, solvation energies of the ligand and
receptor site also are important because partial to complete
desolvation must occur prior to binding.

This approach to CADD optimizes the fit of a ligand in a receptor
site. However, optimum fit in a target site does not guarantee that
the desired activity of the drug will be enhanced or that undesired
side effects will be diminished. Moreover, this approach does not
consider the pharmacokinetics of the drug.

The approach used in CADD is dependent upon the amount of information
that is available about the ligand and receptor. Ideally, one would
have 3-dimensional structural information for the receptor and the
ligand-receptor complex from X-ray diffraction or NMR. This ideal is
seldom realized. At the opposite extreme, one may have no
experimental data to assist in building models of the ligand and
receptor, in which case computational methods must be applied without
the constraints that the experimental data would provide.

Based on the information that is available, one can apply either
ligand-based or receptor-based molecular design methods. The
ligand-based approach is applicable when the structure of the
receptor site is unknown but a series of compounds has been
identified that exert the activity of interest. For this approach to
be used most effectively, one should have structurally similar
compounds with high
activity, with no activity, and with a range of intermediate
activities. In recognition site mapping, an attempt is made to
identify a pharmacophore, which is a template derived from the
structures of these compounds. It is represented as a collection of
functional groups in three-dimensional space that is complementary to
the geometry of the receptor site.

In applying this approach, conformational analysis will be required,
the extent of which will be dependent on the flexibility of the
compounds under investigation. One strategy is to find the lowest
energy conformers of the most rigid compounds and superimpose them.
Conformational searching on the more flexible compounds is then done
while applying distance constraints derived from the structures of
the more rigid compounds. Ultimately, all of the structures are
superimposed to generate the pharmacophore. This template may then be
used to develop new compounds with functional groups in the desired
positions. In applying this strategy, one must recognize that one is
assuming that it is the minimum energy conformers that will bind most
favorably in the receptor site. In fact, there is no a priori
reason to exclude higher energy conformers as the source of
activity.
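One piece of this workflow, deriving distance constraints from a rigid compound and testing a conformer of a more flexible compound against them, can be sketched as follows. The coordinates, the feature labels, and the 1.0 Angstrom tolerance are all hypothetical, and real pharmacophore tools do considerably more (alignment, conformer generation, scoring).

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Return {(label_a, label_b): distance} for all pairs of labeled 3D points."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


def satisfies_constraints(conformer, reference, tolerance=1.0):
    """True if every feature-feature distance in the conformer lies within
    `tolerance` (Angstroms) of the corresponding distance in the reference."""
    ref = pairwise_distances(reference)
    conf = pairwise_distances(conformer)
    return all(abs(conf[pair] - ref[pair]) <= tolerance for pair in ref)


# Hypothetical feature positions (Angstroms) for a rigid reference compound
# and for two conformers of a more flexible compound.
rigid = {"acceptor": (0.0, 0.0, 0.0), "donor": (3.0, 0.0, 0.0), "ring": (0.0, 4.0, 0.0)}
folded = {"acceptor": (0.0, 0.0, 0.0), "donor": (1.0, 0.0, 0.0), "ring": (0.0, 4.0, 0.0)}
extended = {"acceptor": (0.0, 0.0, 0.0), "donor": (3.2, 0.0, 0.0), "ring": (0.0, 4.5, 0.0)}

print(satisfies_constraints(folded, rigid))
print(satisfies_constraints(extended, rigid))
```

Comparing internal distances rather than raw coordinates avoids the need to superimpose the molecules first, at the cost of not distinguishing mirror-image arrangements.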
The receptor-based approach to CADD applies when a reliable model of
the receptor site is available, as from X-ray diffraction, NMR, or
homology modeling. With the availability of the receptor site, the
problem is to design ligands that will interact favorably at the
site, which is a docking problem.
Example of CADD
Carbonic anhydrase
Carbonic anhydrase catalyzes the reaction $\mathrm{CO_2 + H_2O \rightleftharpoons HCO_3^- + H^+}$, the
hydration of some aldehydes and ketones, and the hydrolysis of alkyl
and aryl esters. It is a zinc-containing enzyme of about 30,000
daltons, and the three-dimensional structure has been characterized
by X-ray diffraction. Physiologically, carbonic anhydrase is involved
in gastric, urinary, pancreatic, lacrimal, and cerebrospinal
secretions. Inhibitors of carbonic anhydrase include aromatic and
heterocyclic sulfonamides, and some of these compounds have found
application as diuretics.

Both traditional QSAR and computer graphical methods have been
applied to the development of sulfonamides and other compounds as
inhibitors of carbonic anhydrase. For example, Hansch et al. [11]
developed a QSAR based on the binding constants of 29
phenylsulfonamides to the enzyme. The equation that was derived was
the following:
![]()
where K is the binding constant, $I_1$ = 1 if X is meta and 0
otherwise, and $I_2$ = 1 if X is ortho and 0 otherwise.

The negative coefficients of $I_1$ and $I_2$ suggest that they account
for unfavorable steric effects when substituents are in the meta or
ortho positions. Binding is favored by electron-withdrawing
substituents, which is consistent with the hypothesis that the ionized
form of $\mathrm{-SO_2NH_2}$ binds to the zinc in the active site of
carbonic anhydrase [12].
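Indicator variables of this kind are simple 0/1 codes for substituent position. A minimal sketch of the encoding, with a hypothetical function name and position labels, is:

```python
def position_indicators(position):
    """Return (I1, I2): I1 = 1 for meta, I2 = 1 for ortho, both 0 otherwise."""
    i1 = 1 if position == "meta" else 0
    i2 = 1 if position == "ortho" else 0
    return i1, i2


for pos in ("para", "meta", "ortho"):
    print(pos, position_indicators(pos))
```

Compounds with neither a meta nor an ortho substituent form the reference class, so the fitted coefficients of the two indicators measure the shift in predicted binding relative to that class.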
Interactive computer graphics also were applied to better understand
the interaction of carbonic anhydrase inhibitors with the enzyme, as
illustrated in Figure 2 [11].

Figure 2: Active site of carbonic anhydrase containing the
inhibitor MTS
[(4S-trans)-4-(methylamino)-5,6-dihydro-6-methyl-4H-thieno
(2,3-B)thiopyran-2-sulfonamide-7,7-dioxide]. The image was prepared
from the PDB file 1cin.pdb.
The active site is a cavity approximately 12 Angstroms deep with a
zinc atom (magenta) near the bottom of the cavity. The active site is
divided into a hydrophilic half (blue) and a hydrophobic half (red).
In the complex, the inhibitor appears to be bound such that the
sulfonamide moiety occupies the fourth coordination site of the zinc
atom, with the other three sites being occupied by histidine
residues. For subsequent discussion, note that the active site is
much larger than is required to accommodate an inhibitor of this
size.

Receptor-based drug design incorporates a number of molecular
modeling techniques, one of which is docking. The Kuntz research
group [13] applied their DOCK program to the identification of
compounds that may inhibit carbonic anhydrase. Structures of two of
the candidates are shown below.
These molecules are considerably larger than the arylsulfonamides
that traditionally are used as carbonic anhydrase inhibitors. In
fact, no arylsulfonamides were identified as potential inhibitors in
this study. These results probably arise because scoring of
candidates was based on the size and shape of the molecules. These
large candidates can engage in a greater number of favorable
interactions within the large carbonic anhydrase active site than can
the smaller arylsulfonamides. More recent versions of DOCK allow
scoring based on force fields, which include both van der Waals and
electrostatic interactions [14]. These results with DOCK illustrate
the potential for programs such as this one to search objectively for
ligands that are complementary to receptor sites, thereby assisting
researchers in identifying potential drugs that may be considerably
different from existing drugs. As yet, the efficacy as drugs of these
candidates identified by DOCK has not been demonstrated.
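To make the idea of force-field scoring concrete, the sketch below sums a 12-6 Lennard-Jones term and a Coulomb term over ligand-receptor atom pairs. It is a generic illustration, not the DOCK implementation: the parameter values, the single shared well depth and radius, the constant dielectric, and all names are assumptions made for this example.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*Angstrom/(mol*e^2); converts q1*q2/r to kcal/mol


def pair_energy(r, q1, q2, epsilon=0.1, r_min=3.5, dielectric=4.0):
    """Van der Waals (12-6 Lennard-Jones) plus Coulomb energy for one atom pair.

    r is the interatomic distance in Angstroms; epsilon (kcal/mol) and
    r_min (Angstroms) are assumed, atom-independent parameters.
    """
    ratio6 = (r_min / r) ** 6
    vdw = epsilon * (ratio6 * ratio6 - 2.0 * ratio6)
    electrostatic = COULOMB_CONSTANT * q1 * q2 / (dielectric * r)
    return vdw + electrostatic


def interaction_score(ligand, receptor):
    """Sum pair energies over all ligand-receptor atom pairs.

    Each atom is a tuple (x, y, z, partial_charge).
    """
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq in receptor:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            total += pair_energy(r, lq, rq)
    return total


# Hypothetical two-atom ligand and two-atom receptor fragment.
ligand = [(0.0, 0.0, 0.0, -0.4), (1.5, 0.0, 0.0, 0.1)]
receptor = [(0.0, 3.5, 0.0, 0.5), (4.0, 3.5, 0.0, -0.2)]
print(f"score = {interaction_score(ligand, receptor):.2f} kcal/mol (lower is better)")
```

Unlike a purely shape-based score, a sum of this form rewards complementary charges and penalizes steric overlap, which is one reason small polar ligands can score differently under the two schemes.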
Applications of Other Modeling Techniques
Once potential drugs have been identified by the methods described
above, other molecular modeling techniques may then be applied. For
example, geometry optimization may be
used to "relax" the structures and to identify low energy
orientations of drugs in receptor sites.
Molecular dynamics may assist
in exploring the energy landscape, and free energy simulations can be
used to compute the relative binding free energies of a series of
putative drugs.
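A relative binding free energy translates directly into a ratio of binding constants through the relation between free energy and the equilibrium constant. The sketch below assumes a temperature of 298.15 K and uses a hypothetical difference of -1.4 kcal/mol between two ligands; the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_constant_ratio(delta_delta_g, temperature=298.15):
    """Ratio K_B/K_A of binding (association) constants for
    delta_delta_g = dG_B - dG_A in kcal/mol, from dG = -RT ln K."""
    return math.exp(-delta_delta_g / (R_KCAL * temperature))


ratio = binding_constant_ratio(-1.4)
print(f"ligand B binds about {ratio:.0f} times more tightly than ligand A")
```

Because the dependence is exponential, an error of about 1.4 kcal/mol in a computed free energy difference corresponds to roughly a tenfold error in the predicted ratio at room temperature.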
References
[1] Borman, S. (1990) New QSAR Techniques Eyed for Environmental
Assessments. Chem. Eng. News 68: 20-23.
[2] Lipnick, R.L. (1986) Charles Ernest Overton: Narcosis Studies and
a Contribution to General Pharmacology. Trends Pharmacol. Sci. 7:
161-164.
[3] Hansch, C., Leo, A., and Taft, R.W. (1991) A Survey of Hammett
Substituent Constants and Resonance and Field Parameters. Chem. Rev.
91: 165-195.
[4] Hansch, C., Leo, A., and Hoekman, D. (1995) Exploring QSAR -
Hydrophobic, Electronic, and Steric Constants. American Chemical
Society, Washington, D.C.
[5] Seydel, J.K. (1966) Prediction of in Vitro Activity of
Sulfonamides, Using Hammett Constants or Spectrophotometric Data of
the Basic Amines for Calculation. Mol. Pharmacol. 2: 259-265.
[6] Hansch, C. (1974) Drug Research or the Luck of the Draw. J. Chem.
Educ. 51: 360-365.
[7] Hansch, C. (1969) A Quantitative Approach to Biochemical
Structure-Activity Relationships. Acc. Chem. Res. 2: 232-239.
[8] Venger, B.H., Hansch, C., Hatheway, G.J., and Amrein, Y.U. (1979)
Ames Test of 1-(X-Phenyl)-3,3-dialkyltriazenes. A Quantitative
Structure-Activity Study. J. Med. Chem. 22: 473-476.
[9] Hansch, C. (1984-85) The QSAR Paradigm in the Design of Less
Toxic Molecules. Drug Metab. Rev. 15: 1279-1294.
[10] Hatheway, G.J., Hansch, C., Kim, K.H., Milstein, S.R., Schmidt,
C.L., Smith, R.N., and Quinn, F.R. (1978) Antitumor
1-(X-Aryl)-3,3-dialkyltriazenes. 1. Quantitative Structure-Activity
Relationships vs. L1210 Leukemia in Mice. J. Med. Chem. 21:
563-574.
[11] Hansch, C., McClarin, J., Klein, T., and Langridge, R. (1985) A
Quantitative Structure-Activity Relationship and Molecular Graphics
Study of Carbonic Anhydrase Inhibitors. Mol. Pharmacol. 27:
493-498.
[12] Kumar, K., King, R.W., and Carey, P.R. (1974) Carbonic Anhydrase
- Aromatic Sulfonamide Complexes, A Resonance Raman Study. FEBS Lett.
48: 283-287.
[13] DesJarlais, R.L., Sheridan, R.P., Seibel, G.L., Dixon, J.S.,
Kuntz, I.D., and Venkataraghavan, R. (1988) Using Shape
Complementarity as an Initial Screen in Designing Ligands for a
Receptor Binding Site of Known Three-Dimensional Structure. J. Med.
Chem. 31: 722-729.
[14] Meng, E.C., Shoichet, B.K., and Kuntz, I.D. (1992) Automated
Docking with Grid-Based Energy Evaluation. J. Comput. Chem. 13:
505-524.
Copyright © 1997 David R. Bevan
All Rights Reserved
Dept. of Biochemistry
Virginia Tech
Comments to drbevan@vt.edu
Last Update: 3/22/97