Signaling expanse.
The expansion of knowledge in molecular biology is matched only by the scale of modern surveys in astronomy. The turn of this century brought the completion of the Human Genome Project. We now have a nearly complete assembly of the human genome, with more than 20,000 protein-coding genes, at least ten times as many splice isoforms, and yet another exponential increase in diversity from post-translational protein modifications. We have observed at least 200 different types of cells in the various tissues and organs of our bodies, and each new year brings the discovery of others. The function of our organs and tissues is defined in large part by the expression of distinct molecules with specific activities in individual cells. What is the physical basis for order in such a complex system?
Insofar as living things reproduce, and no human starts from nothing, one source of order is a collective of molecules—the fundamental replicator, as formalized by Dawkins—that sustains reproduction of living things. The physical vehicles for such replicators may be the unknowable original molecular collective, or single cells as we know them today, or differentiated multicellular organisms like humans. In evolutionary terms, a genetic replicator and its physical vehicle coupled with natural selection are sufficient to explain the diversity of living things and their ecological adaptations. However, the human body and its cells are considerably more complex than a prototypical replicator.
The extent of organization of the biochemical reactions that comprise the metabolism of human cells can be appreciated from a simple estimate. Assume a reactant binding time of about one hundredth of a second, as limited by bimolecular binding in water, and a cell with at least 100,000 gene products constituting at least 100,000 chemical reactions. A random sequential metabolic process would then take on the order of 20 years. As with Levinthal’s paradox of protein folding, the biological untenability of this random search means that the relatively disjointed chemical reactions constituting cellular metabolism must be organized. This organization results either from preorganization of the initial state, such that the search is not random, or from self-organization of the search process, such that the search is not sequential.
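The scale of this estimate can be checked with a few lines of arithmetic. The sketch below is only one possible reading of the calculation: the text does not spell out how random encounters are counted, so the function names and the assumption that each reaction needs on the order of N random encounters among N species are illustrative, not the author's stated method. Under that assumption an ordered pass through the reactions takes about 17 minutes, while the random search takes about 3 years. The exact figure shifts with the counting assumptions, but the gap of several orders of magnitude is the point.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def ordered_time_s(n_reactions: int, binding_time_s: float) -> float:
    """Time if every binding event is productive (fully organized search)."""
    return n_reactions * binding_time_s

def random_search_time_s(n_species: int, n_reactions: int,
                         binding_time_s: float) -> float:
    """Time if each reaction must find its partner by random encounters.

    Assumption (not from the text): a reactant needs on the order of
    n_species random encounters before it meets the right partner.
    """
    return n_species * n_reactions * binding_time_s

N = 100_000          # gene products and reactions, as in the text
T_BIND = 0.01        # seconds per binding event, as in the text

ordered = ordered_time_s(N, T_BIND)
random_search = random_search_time_s(N, N, T_BIND)

print(f"ordered pass:  {ordered / 60:.1f} minutes")
print(f"random search: {random_search / SECONDS_PER_YEAR:.1f} years")
```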
At the cellular level, we now know that this organization is accomplished by cellular compartmentalization with functionally related reactants being restricted to particular organelles and spatial domains. At the molecular level, organization of metabolically related enzymes into supramolecular multifunctional assemblies and consequent metabolic channeling among active sites lead to both pre- and self-organization of the metabolic search. For example, more than half of the enzymes of the tricarboxylic acid cycle responsible for the consumption of sugars and production of energy-carrying metabolites in our cells, including malate dehydrogenase, citrate synthase, succinate dehydrogenase, and fumarase, function in mitochondria in the context of a supramolecular multifunctional enzyme. Global analyses of flux control coefficients of biochemical reactions of intermediary metabolism and networks of protein-protein interactions that partially constitute them suggest that cellular proteins are organized in mesoscopic assemblies composed on average of about 7 proteins and 5 chemical reactions. Such nodal architecture and its hierarchic topology mean that cellular metabolism is highly organized and interconnected. This also suggests that efficiency and coherence of its operation and control are the results of such an arrangement. Based on this, many have proposed that biological systems are generally modular.
One attractive possibility is that the molecular order in our cells is generated by template patterning. For example, in the case of DNA replication that accompanies cell division needed for the growth of our bodies, the reproduction of the ordered sequence of nucleotides that comprise each of our cellular genomes is a direct product of template patterning. DNA polymerases incorporate free nucleotides from disordered pools in our cells into ordered progeny chains by their pairing with the original parent molecules that exist prior to cell division. In contrast, the organization of proteins and other biological molecules into supramolecular assemblies and assemblies into cellular compartments cannot be templated, at least not completely. Certainly, in the case of oligomeric molecular complexes, where the assembly of the complex is nucleated by an oligomer that provides a template, parent complexes in dividing cells can provide templates for the organization of their progeny from newly formed subunits. Such complexes have been observed, but are in general vanishingly rare in human cells.
Most cellular complexes are formed by heterologous subunits, be they proteins, RNA, or other biological molecules. Ribosomes, which are responsible for the translation of all our messenger RNAs into proteins, are formed by more than 80 distinct molecular components. Their organization is specified by energetic interactions (van der Waals, Coulombic, and others) and by the entropic favorability of desolvation, as macromolecules compact into organized structures and release free water and salt molecules. Most importantly, they are patterned through the selection of configurations and interactions. The physics of this process is similar to the self-organization of biological macromolecules themselves, with the main difference being that the interactions are inter- as opposed to intra-molecular. Otherwise, their orderly binding is explained by statistical thermodynamics, with physiologic thermal energy and ligand concentrations providing the requisite energy for orderly assembly. The binding search is organized by the selection of stabilizing interactions out of an ensemble of those diversified by energetic sampling. It is indeed survival of the fittest, where the fit is physical, selected by energetic binding by virtue of its biological function.
Recent studies have also clarified the molecular mechanisms that regulate the interactions of molecules in cells. By and large, this regulation is accomplished by chemical modifications of interacting surfaces, with specific modifications blocking or promoting specific interactions. For example, most protein-protein interactions in our cells are regulated by the post-translational modifications of amino acid sidechains. This includes phosphorylation of serines, threonines and tyrosines, methylation of lysines and arginines, and so on. Recent mass spectrometry studies also demonstrate that human proteins can have more than a hundred distinct chemical modifications. While the biological functions of the majority of these modifications are currently unknown, many are essential to normal physiology.
Current measurements show that the majority of human proteins are phosphorylated at one or more sites, and many of the enzymes that catalyze these phosphorylation events are essential for life. Many have referred to the cellular processes of protein phosphorylation and the signals that these biochemical events convey as networks, pathways, and programs. These metaphors suggest the operation of a deterministic process, with direct correspondence between stimulus and outcome. It is indeed possible that cellular signal transduction in our bodies operates just like a computer. Ever since von Neumann proposed his computer model and mechanical computation entered virtually all spheres of our lives, the idea of universal computation has been very tempting. Wolfram’s cellular automata are a striking example, leading to his recent proposal of a “theory of everything” based on computability. Here, I pose another explanation, drawing on a shared idea from evolutionary physics and based on the energetic favorability of stochastic sampling.
Just as conformational selection is responsible for the self-organization of biological molecules, and binding selection is responsible for the organization of molecular interactions in cells, biological signaling can be similarly organized by a selection process. Enzymes are known to catalyze post-translational protein modifications on diverse substrates. In the case of protein phosphorylation, kinases phosphorylate most human proteins, at least as evident from recent surveys. This is of course not entirely random: kinases have binding preferences, specified by the composition and structure of their surfaces and catalytic sites. However, it is entropically unlikely that these protein modifications are deterministic. Paying the entropic cost of specifying protein interactions and their biologic functions deterministically would require astronomically high inputs of energy, leading to metabolic demands that would compromise the evolutionary fitness and ecologic feasibility of the cells and organisms that bear them.
Instead, entropic considerations favor stochastic sampling to diversify binding interactions. Those interactions that promote biologic functions are selected through binding, while the majority remain stochastically diversified molecular ensembles that serve as substrates of selection. Such a model is favored entropically in that a large fraction, if not the vast majority, of events are not deterministic. It is also supported by current experimental studies of specific kinases, which can phosphorylate diverse substrates in addition to those known to mediate their specific biologic functions in cells.
In this sense, biological order and function are self-selected. Such self-selection is also favored by evolution, insofar as genetic drift in the amino acid composition of protein sequences would generate substitutions that can physically mimic post-translationally modified amino acids. For example, imagine a scenario where genetic substitution of glutamate for alanine physically stabilizes a particular protein-protein interaction. Subsequent evolution of this residue to serine, whose phosphorylated form is similar to glutamate in physical size and electrostatic charge, would provide the basis for both evolutionary selection of serine phosphorylation and interaction selection of phosphoserine binding. Such scenarios are observed frequently in experimental studies.
This suggests that many post-translational modifications of proteins can be functionally neutral, but those that contribute to biologic signaling can be selected and stabilized. This selection and stabilization can be both energetic and evolutionary: stochastic sampling of physically diversified molecules to select for biologic function, and natural selection of genetic variants to evolve fit alleles. In some cases, this can also involve feedback regulation, where induction of a post-translational modification would contribute to its biological selection and stabilization. This may explain many signaling processes in our cells, with negative and positive feedback loops contributing essential functions to signal transduction during development and homeostasis. Indeed, many characteristic features of biologic signaling, including robustness against noise, oscillatory and other dynamics, and potential for bistability and other regulatory states are specifically generated by such nested feedback loops.
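The link between positive feedback and bistability can be illustrated with a deliberately minimal toy model. This is a sketch under stated assumptions, not a model of any specific pathway: a single modified species x promotes its own production through a Hill function and is removed at a constant rate. The equation and all parameter values (`basal`, `vmax`, `k_half`, `hill_n`, `decay`) are hypothetical choices made only so that two stable states exist.

```python
def feedback_rate(x: float, basal: float = 0.01, vmax: float = 1.0,
                  k_half: float = 0.5, hill_n: int = 4,
                  decay: float = 1.0) -> float:
    """dx/dt for a species that cooperatively promotes its own production.

    All parameters are hypothetical, chosen to place the system in a
    bistable regime.
    """
    production = basal + vmax * x**hill_n / (k_half**hill_n + x**hill_n)
    return production - decay * x

def steady_state(x0: float, dt: float = 0.01, steps: int = 5000) -> float:
    """Integrate forward with explicit Euler steps from initial level x0."""
    x = x0
    for _ in range(steps):
        x += dt * feedback_rate(x)
    return x

# The same system settles into different states depending on its history.
low = steady_state(0.1)    # starts below the threshold, relaxes to the "off" state
high = steady_state(0.6)   # starts above the threshold, locks into the "on" state

print(f"from x0=0.1 -> {low:.3f}")
print(f"from x0=0.6 -> {high:.3f}")
```

With these parameters, a transient stimulus that pushes x past the threshold would leave the modification stabilized after the stimulus is withdrawn, which is one simple sense in which feedback can select and maintain a signaling state.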
Diversification and selection of chemical modifications as the basis for cellular signaling should also explain several key features of human cell biology. Many essential human proteins exhibit complex combinations of post-translational modifications. For example, histone proteins that form chromatin and package the DNA of our genomes have more than 20 distinct sites that can be variably phosphorylated, acetylated, methylated, sumoylated, ubiquitylated, and so on. There is recent evidence that specific modifications on specific amino acids are required for distinct biological functions in the control of gene expression and cell fate. For example, acetylation of lysine 27 on histone H3 contributes to the induction of high levels of expression of genes decorated with this histone modification. On the other hand, methylation of the same lysine promotes silencing of the bound genes. However, the combined methylation of lysines 27 and 4 maintains genes in a poised state, primed for de-repression. Distinct combinations of histone modifications have thus been proposed to have distinct consequences for gene expression. Given multiple different histone proteins, the resultant combinatorial complexity has been proposed to constitute a “histone code.”
The number of possible combinations of specific molecular modifications grows exponentially with the number of modifiable sites: n alternative modifications on each of 20 sites give n to the twentieth power combinations. Do n to the twentieth power possible combinations of modified histones determine specific biological states? For 10 modifications occurring on 20 sites, which is a rather conservative toy estimate, the total number of possible combinations is 100 billion billion (100 quintillion). This estimate is of course incomplete, as many human RNAs are also hypermodified, with dozens of modified bases in their nucleotide chains, and it does not account for many other biologic macromolecules in our cells. It is not quite equal to the current estimate of the total number of stars in the visible universe (1 septillion), but approaches it in terms of complexity. Can biology be as expansive as astronomy?
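The toy estimate above can be reproduced directly. The sketch below assumes the simplest counting rule, in which each site independently carries exactly one of the available modifications. The function name is illustrative, and the star count is the figure quoted in the text rather than an independent value.

```python
def count_modification_states(n_sites: int, n_modifications: int) -> int:
    """Number of combinations if each site carries one of n_modifications.

    Assumption: sites are independent and each holds exactly one
    modification. Allowing an unmodified state per site would use
    (n_modifications + 1) ** n_sites instead.
    """
    return n_modifications ** n_sites

STARS_IN_VISIBLE_UNIVERSE = 10**24   # 1 septillion, as quoted in the text

histone_states = count_modification_states(n_sites=20, n_modifications=10)
fraction_of_stars = histone_states / STARS_IN_VISIBLE_UNIVERSE

print(f"combinations: {histone_states:.0e}")
print(f"fraction of star count: {fraction_of_stars:.0e}")
```

Under this rule the count is 10 to the twentieth power, four orders of magnitude below the quoted star count. Adding even a few more sites or modification types closes that gap quickly, since each additional site multiplies the total by the number of modifications.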
If evolutionary physics applies, as one would favor based on thermodynamic and genetic considerations, the combinatorial complexity of histone and other biological macromolecular modifications provides a diversified ensemble of molecular states subject to conformational and binding selection. This may be simple configurational selection through Boltzmann sampling. Alternatively, as with signaling feedback, specific combinations of histone modifications may lead to the binding of specific cofactors that are stabilized via feedback loops induced by distinct patterns of gene expression.
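Configurational selection through Boltzmann sampling can be made concrete with a small numerical sketch. The state names and free energies below are hypothetical placeholders, not measured values. The only physics used is the standard Boltzmann weight, with probability proportional to exp(-E/RT), evaluated at a physiologic temperature of 310 K.

```python
import math
import random

R_KCAL = 1.987e-3      # gas constant in kcal/(mol*K)
T_BODY = 310.0         # physiologic temperature in K

def boltzmann_probabilities(energies: dict[str, float],
                            temperature: float = T_BODY) -> dict[str, float]:
    """Equilibrium occupancy of each state from its energy in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies.values())           # shift energies for numerical stability
    weights = {s: math.exp(-(e - e_min) / rt) for s, e in energies.items()}
    z = sum(weights.values())                # partition function
    return {s: w / z for s, w in weights.items()}

def sample_states(probs: dict[str, float], n: int, seed: int = 0) -> list[str]:
    """Draw n states at random according to their Boltzmann probabilities."""
    rng = random.Random(seed)
    states = list(probs)
    return rng.choices(states, weights=[probs[s] for s in states], k=n)

# Hypothetical modification states with illustrative binding free energies.
state_energies = {"unbound": 0.0, "weak_cofactor": -0.5, "stable_cofactor": -2.0}

probs = boltzmann_probabilities(state_energies)
draws = sample_states(probs, n=10_000)

for state, p in probs.items():
    print(f"{state:16s} p={p:.3f} sampled={draws.count(state) / len(draws):.3f}")
```

Every state is visited, but the most stabilized one dominates the ensemble. In this picture selection is a shift in occupancy rather than an all-or-none instruction.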
In the case of signaling that controls the growth and behavior of many types of somatic cells in our bodies, this model is supported by several observations. One of the first signaling pathways identified in human and other eukaryotic cells is the mitogen-activated or serum response signaling pathway. This is a process regulated by the binding of soluble protein growth factors, produced by neighboring cells, to cellular receptor proteins that activate intracellular kinase signaling to induce the expression of specific genes, leading to cell growth. The intracellular process of signal transduction is itself an enzymatic cascade that mediates sequential and branching protein phosphorylation. The activity of this pathway is regulated by feedback signaling that dampens kinase activity using protein and phosphatase regulators in response to activation. Concomitantly, signaling activity is also stabilized by positive feedback loops via transcription factors that cooperate in the induction of serum response and mitogen-activated genes, and ultimate cell growth. This is central to the development of many of our tissues, including heart, brain, and blood, to their regeneration and homeostasis during normal life, and many diseases of dysregulated somatic tissue growth such as cancer.
Stochastic sampling followed by conformational and binding selection is favored thermodynamically as the fundamental principle of cell signaling in human cells. Such a process is compatible with both neutral and adaptive evolution. While some aspects of cell signaling may have deterministic components, with exact correspondence between stimulus and outcome, many—if not most—should operate via physical sampling and selection. Biologic function, such as cell growth, motility, and regeneration, provides the basis for selection of states that mediate effective signaling, as opposed to those that may have biochemical activity, but do not contribute to the biologic fitness of their constituents and cells. In certain cases, selection may be due to feedback loops that reinforce functional signaling circuits. Such stabilizing molecular feedback loops are reminiscent of genetic replicators in evolutionary systems. Emerging methods for quantitative proteomics of single cells may reveal this in human embryonic stem cell development and tissue differentiation, where control of gene expression is biologically paramount and combinatorial complexity is astronomically high.
Understanding cell signaling in terms of physical statistical mechanics also leads to the recognition of several distinct properties and notable predictions. First, just as genetic variation provides the diversification process for the evolutionary selection of organisms, so can physical variation in substrate binding and enzymatic catalysis induced by thermodynamic fluctuations diversify biologic states for cell activity. In the case of cells at equilibrium, these would be fluctuations around mean probabilities, following Boltzmann statistics. In the case of cells away from equilibrium due to intake of energy, the fluctuations would comprise dissipative structures, described by Prigogine. In the case of kinase signaling, this would mean that a specific kinase would phosphorylate a shared set of substrates in a population of cells, but also additional substrates in phenotypically distinct cell subpopulations. The resultant plasticity can enable cells to achieve functional states needed for the differentiation of stem cells into somatic lineages during body growth and development, phenotypic transitions needed for regeneration during injury and homeostasis, and physiologic responses to stimuli, including those that may be new and not anticipated evolutionarily.
The resultant plasticity and fluctuations can also provide buffering and robustness against noise. This may be particularly important for processes that are subject to environmental fluctuations but must maintain high fidelity, as in organ and tissue development. Such robustness has been extensively documented in experimental models of cell signaling. The fluctuation-dissipation framework provides the fundamental physical principle, while state selection extends Darwinian selection to molecular systems. Lastly, energetic fluctuations of cell signaling also explain the impossibility of perfect replicators, as originally postulated by Dawkins. Fundamentally, perfect signal transduction and perfect genetic replicators are disfavored entropically, as both require excess energy to convert thermal noise into biological order.
In this sense, the organization of cell signaling from tens of thousands of genes, hundreds of thousands of their splice isoforms, and millions of biochemically modified proteins, not to mention all the other biological macromolecules, need not be templated by parent cells or encoded directly by the human genome. Thermodynamic fluctuations induced by physiologic temperature and environmental conditions can generate diversified molecular states, as defined by inter-molecular binding and catalysis. Their selection through biologic activity stabilizes those with functional outcomes, as opposed to those that are functionally neutral or deleterious, as specified by energetic favorability and ultimate cell survival, growth, and activity. If this state selection sounds reminiscent of conformational molecular selection of protein structure, this is because the underlying physics are shared. Both phenomena contribute to biologic function, which occurs at related scales, for atoms and their electrons in molecules, and for macromolecular assemblies and their biochemical reactions in cells. The associated selection principle enables complex systems to achieve order, as required for biologic function, while optimizing entropic costs. The details of the specific interactions involved in the signaling of different molecules in our cells (peptides versus nucleic acids versus sugars versus lipids) vary, but are not known to deviate from this general principle. Now we need to define how different cells are organized in distinct tissues and organs in our bodies, and how this organization functions in normal development and human disease.