William A Beresford MA, D Phil ©
Professor of Anatomy
Anatomy Department, West Virginia University, Morgantown, USA


This Chapter is still as written in 1992, in order not to delay putting the book online. The amount of recent primary literature to read and try to turn into teaching topics was too much for now. However, an update is in the works.
The identity of cells, i.e., the character or phenotype that each cell has, expresses itself in histological appearance and specific functions, e.g., organelle-free acidophilic cytoplasm and oxygen transport for the RBC. Compiling these characterisations has been the meat of what has been outlined so far.
Molecular species, e.g., involucrin and uroplakin, have been mentioned only as materials enabling particular tasks. It is time to tackle differential gene expression, or how a cell comes by its unique profile of specialized molecules.
Such molecular explanations are becoming a necessary part of cardiology, gastroenterology, immunology, oncology, surgery, and so forth. For example, cancer cells rearrange their genes, causing unusual, disruptive, and fatal expressions of materials; heart-muscle molecules change in disease; and lives are made difficult, if not miserable, by genes defective from conception. By starting now, one should be able to keep up with the changing complexity of molecular analyses of what cells are up to, and how molecular diagnosis and intervention will aid medical practice.

Histology stays in the picture because in situ hybridization and immunocytochemistry let one see some of the molecular action in relation to individual, identifiable cells and organelles.
In experimental testing for gene function with mutations, deletions, knock-outs, and excess gene dosage, histology reveals the altered phenotype at cell, tissue, and organ level, if there are phenotypic consequences.


1 Protein species provide the key to a given cell's nature and repertoire of activities - its phenotype. The proteins may be very abundant, e.g., keratin intermediate filaments in terminal keratinocytes, or minor in amount, but potent, as in the case of enzymes involved in the synthesis of a hormone or neurotransmitter.

2 Proteins are large molecules, distinctively shaped to offer regions or domains for interaction with other molecules. They achieve their eventual size, shape, and ability to act chemically, initially by the linear joining of specific amino acids in set sequences, based upon the informational content of DNA and RNA nucleotide sequences.

3 Proteins can be cell type-specific (CTS) in three ways:

  1. The protein is present in only one cell type - absolute specificity.
  2. The protein occurs in a few kinds of cell, but in each type is slightly modified to constitute an isoform of the protein - isoform specificity.
  3. The protein is found in several cell types, but its abundance is much greater in one or two types than in the others - quantitative 'specificity'.
'Tissue-specific' substitutes for CTS where one cell type predominates, e.g., in cardiac muscle.

4 In practice, there are hundreds of proteins that meet one or more of these criteria. For instance, alkaline phosphatase has separate isoforms in gut and placenta, a third isoform that is plentiful in bone, liver and kidney - the B/L/K isoform, and a fourth, similar to the placental, but occurring in thymus and testis.

5 The theoretical position is that a cell's molecular identity is represented by the constellation of special cell type-specific/luxury proteins, underpinned by a pattern of levels of housekeeping (basic-function) proteins, common to most cells of the species.


 upstream (- bps)                       |  downstream (+ bps)
 gene's regulatory region               |  transcribed region --->
5'*__//$__ENHANCER#__//$__PROMOTER%_____|__5' UTR+__EXON1 INTRON1 EXON2 INTRON2 EXON3@__3' UTR__|__3'*
          (distal)        (proximal)    |           ^                                           ^
                                        | start of translation                       cleavage & poly-A site
                           start site of transcription

Fig. 14 Regulation of a gene
* 5' and 3' refer to carbon positions in nucleotides, and hence to nt attachment and DNA orientation
# Enhancers and repressors may be distant or close
$ The breaks // mark omitted lengths of DNA, so that a distant enhancer can be kept in view
% There may be more than one promoter
@ The number of exons and introns varies by gene; some genes have no introns. Introns are transcribed, but spliced out to create the mRNA
+ The 5' UTR may influence translational efficiency


Moving now to the general cell-type-specifying mechanism of differential gene activation: how is a gene chosen for expression?
1 DNA accessibility to transcription: DNA undergoes localised changes in its binding to histones of the nucleosomes, and in the methylation of C-G cytosines.
2 The aim is for a transcription complex centred on RNA polymerase II to bind to the DNA of the gene to be transcribed, to initiate transcription, and to continue it until termination, which occurs downstream of the cleavage and poly-A site (a stop codon ends translation, not transcription). Why does the polymerase bind here? And what else is needed?
3 RNA polymerase II binds and starts because:
4 Why is the transcription complex at the 5' end of this gene, and not of another (Fig. 14)?
5 The activating power of individual TFs is usually weak, and may be + or -. Several TFs in combination must be bound, and fall exactly into place, to create a transcriptional complex that transcribes.

6 In sum, the phenotypes of cells reflect the varied activities performed, special proteins subserve the functions, and selective gene control furnishes the proteins; hence the spectrum of cell types derives from the repertoire of combinations of transcription factors.
Because of the high informational content and synergistic/antagonistic possibilities of TF combinations, far fewer regulatory factors are needed than there are genes to be controlled.
Also, a restricted number of factors makes it easier to bring the production of the phenotype's many CTS proteins into play at roughly the same time - coordinated regulation. But there is still a need for 'master' TFs to take the lead.

7 A consensus sequence in DNA is detected either by the high number of nucleotides it holds in common with another sequence known to bind a TF, or by a TF binding to a newly studied region of DNA that sequencing then reveals to contain most of the known binding sequence. As these lines of inquiry proceed, the idea gains power as: (i) other similar (homologous) sequences are found to bind the TF; (ii) it becomes evident that, even where binding-region nucleotides differ, there are restrictions on the differences, e.g., only purine substitutions (A for G, or G for A) are seen. A sample consensus sequence is GTTAATNATTAAC for hepatocyte nuclear factor 1, where N stands for any nucleotide.
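Matching DNA against a consensus such as the HNF-1 site can be sketched by turning each IUPAC letter into a regular-expression class: N = any nucleotide, R = purine (the code for purine-only substitutions), Y = pyrimidine. Because a TF may bind either strand, the sketch also searches with the reverse complement, and reports a palindromic consensus (one that is its own reverse complement, as GTTAATNATTAAC and the bHLH site CANNTG both are) only once. This is a minimal exact-match sketch under those assumptions; it does not score partial matches, and the short test sequences are made up.

```python
import re

# IUPAC nucleotide codes (a subset) and their regular-expression equivalents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any nucleotide
}
# Complements, including ambiguity codes (R pairs with Y, N with N).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "R": "Y", "Y": "R", "N": "N"}


def reverse_complement(seq):
    """The opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(consensus):
    """True if the site reads the same 5' to 3' on both strands."""
    return reverse_complement(consensus) == consensus.upper()


def find_sites(consensus, dna):
    """Return sorted (start, strand, site) hits for exact consensus matches.

    The minus strand is covered by matching the reverse-complemented
    consensus on the given strand; a palindromic consensus is searched once.
    The lookahead lets overlapping sites be reported.
    """
    dna = dna.upper()
    patterns = {"+": consensus.upper()}
    if not is_palindromic(consensus):
        patterns["-"] = reverse_complement(consensus)
    hits = []
    for strand, pat in patterns.items():
        regex = re.compile("(?=(" + "".join(IUPAC[b] for b in pat) + "))")
        hits += [(m.start(), strand, m.group(1)) for m in regex.finditer(dna)]
    return sorted(hits)


print(find_sites("GTTAATNATTAAC", "CCGTTAATGATTAACCC"))  # [(2, '+', 'GTTAATGATTAAC')]
print(find_sites("ATTTGCAT", "ATGCAAAT"))  # [(0, '-', 'ATGCAAAT')]
```

Searching the given strand with the reverse-complemented pattern avoids building the second strand explicitly, and keeps every reported position in one coordinate system.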


TFs have structural features that stabilize their shape so as to present an alpha helix that binds the DNA in a sequence-specific way, domains for pairing with other TFs as dimers, and domains that activate transcription through other protein-protein interactions. The classification of TFs is currently based on the structures concerned with DNA-binding and making dimers, rather than the transcription-activating or -silencing domains.

1 Types
1 Leucine-zipper - aligned ridges of leucine-rich regions on two such TFs (the same or different) join to create the 'zipper' union. The leucines line up because they occur at every seventh residue along each coil, roughly every two turns of the alpha helix, which places them along one face. Nearby is a basic region in the TF to bind to the DNA. The dimerization of TFs so created: (i) multiplies their instructional power, with 'allowed' and 'non-allowed' combinations; and (ii) presents the DNA-binding domains to match the DNA's shape.
2 Helix-loop-helix (HLH) - A basic DNA-binding domain lies adjacent to two alpha helices (13 & 15 amino acids (AA) long), separated by a loop (5-20 AA). The HLH region mediates oligomer formation between TFs, which can change the DNA-binding preferences. Several bHLH TFs recognize the sequence CANNTG.
3 The homeodomain is about 60 AA long, arranged in a helix-turn-helix DNA-binding conformation. It came to notice through genetic-molecular studies of the products of homeotic genes controlling insect development.
The POU domain comprises a 75-82 AA POU-specific domain, a variable link, and a 60 AA POU homeodomain: all are involved in binding to DNA.
Why POU? The first TFs where the domain was noticed were Pit-1 (in pituitary cells), Oct-1 (general) and Oct-2 (B lymphocytes), and a TF controlling the nematode's gene unc-86. [Genes' names are in italics; their protein products in roman.] The octamer TFs bind to the 8-nt sequence ATTTGCAT.
4 Zinc-finger, C2-H2 - a zinc ion, tetrahedrally linked to pairs of appropriately spaced cysteines and histidines, creates short loops of amino acids (the fingers) to interact with the DNA.
5 Zinc-finger, C2-C2, is a different (cysteine-only), zinc-centred structure used to construct two fingers, which fold together and help orient the alpha-helical 'DNA-recognition' domains. The steroid/thyroid/retinoid receptors employ this motif. Attachment of the hormone ligand brings about the receptors' dissociation from heat shock protein 90, and movement into the nucleus, where they bind as dimers.
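Two of the motifs above are defined partly by residue spacing, which can be checked mechanically in a protein sequence. A minimal sketch: `heptad_leucine_runs` reports leucines spaced exactly seven residues apart (requiring four in a row is an illustrative threshold, not a definition), and `c2h2_fingers` searches for the commonly quoted simplified C2-H2 spacing C-x(2-4)-C-x(12)-H-x(3-5)-H, where x is any amino acid. Both are simplifications that ignore the other conserved residues real motifs carry, and the test sequences are illustrative only.

```python
import re

# Simplified C2-H2 zinc-finger spacing (an assumption, not a full definition):
# C, 2-4 residues, C, 12 residues, H, 3-5 residues, H ("." = any amino acid).
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def heptad_leucine_runs(protein, min_repeats=4):
    """Return (start, count) for runs of leucines spaced exactly 7 residues apart.

    Each run is reported once, from its first leucine.
    """
    seq = protein.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip a leucine that continues a run begun seven residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


def c2h2_fingers(protein):
    """Return (start, motif) for each non-overlapping C2-H2-like stretch."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(protein.upper())]


print(heptad_leucine_runs("RMKQLEDKVEELLSKNYHLENEVARLKKLVGER"))  # [(4, 4)]
print(c2h2_fingers("YKCPECGKSFSQKSDLVKHQRTHTG"))  # [(2, 'CPECGKSFSQKSDLVKHQRTH')]
```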

2 What controls TFs?
Positive and selective regulation
1 As proteins, their regulation can be at typical places in the general sequence of protein synthesis, e.g., transcription, alternative splicing of mRNA, protein stability, etc.
2 Auto-regulation, by the TF activating transcription of its own gene, e.g., MyoD1 and Pit-1, helps maintain and stabilize the phenotype specified by the TF, and renders the cell less dependent on the outside stimuli that evoked the phenotype.
3 Dimerization: homo- and heterodimerization of TFs.
4 Ligand activation, e.g., the binding of steroid and thyroid hormones and retinoids causes their receptors to be moved into the nucleus, and to activate transcription. The DNA sequence to which the receptor-ligand complex attaches is a response element (RE) named for its ligand, e.g., oestrogen RE (ERE) and thyroid RE (TRE); likewise, the cyclic AMP RE (CRE) allows genes to be controlled by the CREB TFs stimulated by cyclic AMP.
5 Phosphorylation of TFs can induce DNA-binding, e.g., by CREB, or transcriptional activation, e.g., by Oct-2.

Negative regulation
6 Heterodimerization, e.g., the Id factor has a HLH, but no basic region to bind DNA. When Id forms heterodimers with bHLH TFs, e.g., MyoD, binding to DNA is blocked.
7 Competitors for DNA binding - competitive inhibition, e.g., NF-kappaB binds to the CCAAT box of the foetal gamma-globin gene, obstructing CP1's activation of the gene.
8 Inactivation by bound protein factors that do not prevent DNA binding - quenching. NF-kappaB's control of an Ig light chain gene in B cells is prevented by a cytoplasmic protein, IkappaB, which detains NF-kappaB in the cytoplasm until the IkappaB is phosphorylated.
9 Non-translation of TF mRNA, e.g., Pit-1 mRNA is made, but not translated, in corticotrophs and gonadotrophs.
10 A great excess of one factor in solution may so tie up its normal binding partner, another TF, that the latter is unavailable for participating in the transcription complex - squelching.
11 The TF itself inhibits transcription as a silencer TF [negative regulation by, not of the TF], e.g., thyroid hormone receptor alone (without ligand) can bind to the TRE, causing a repression of transcription.
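Heterodimerization by Id (item 6) and squelching (item 10) both work by tying up a partner. Under the simplifying assumption of a single 1:1 binding step at equilibrium, with dissociation constant Kd and all amounts in the same units, the free partner left over follows from the standard quadratic for bimolecular binding. This is a minimal sketch; the amounts and Kd below are hypothetical, and real cells involve many competing partners.

```python
from math import sqrt


def complex_at_equilibrium(total_a, total_b, kd):
    """Amount of A:B complex for simple 1:1 binding, A + B <-> AB.

    Kd = [A free][B free] / [AB]. Substituting the mass balances gives
    C**2 - (A + B + Kd) * C + A * B = 0; the smaller root is the physical one.
    All amounts must be in the same concentration units as Kd.
    """
    s = total_a + total_b + kd
    return (s - sqrt(s * s - 4 * total_a * total_b)) / 2


def free_partner(total_a, total_b, kd):
    """Free A left after B has bound what it can at equilibrium."""
    return total_a - complex_at_equilibrium(total_a, total_b, kd)


# Hypothetical amounts in arbitrary units: 10 of MyoD, rising Id, tight binding.
myod_total, kd = 10.0, 0.01
for id_total in (0.0, 2.0, 5.0, 10.0, 20.0):
    print(id_total, round(free_partner(myod_total, id_total, kd), 2))
```

With tight binding, free MyoD falls roughly in step with added Id until Id matches MyoD, after which little free MyoD remains to bind DNA.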


The following steps provide levels of possible regulation: decision points in the overall choice of how much of what kind of protein is to be formed by the cell.
1 Extra- and intracellular signalling, with signalling molecules, receptors, signal-transduction machinery, binding proteins, and transport into the nucleus.
2 DNA accessibility to signals, regulatory factors, and the polymerization apparatus.
3 Transcription: pre-initiation, initiation, elongation, and termination.
4 RNA processing of the primary transcripts to make mRNA.
5 Stabilization of the mRNA.
6 Transport of the mRNA to the ribosomes in the cytoplasm.
7 Use and re-use of the mRNA in translation to protein sequences.
8 Direction of the protein to sites for post-translational modification, e.g., cleavage, glycosylation, phosphorylation, addition of prosthetic groups, e.g., haeme to globin.
9 The use of chaperones for the stability and folding of the protein.
10 Intracellular storage or degradation of the product.


1 The above progression creates a hierarchy of control points: if no primary RNA is transcribed, post-transcriptional controls are redundant; if a mRNA is made unstable, post-translational influences are superfluous.
2 For most CTS proteins, the prime control is at transcription.
3 The mechanisms can act in concert: as transcription is increased, the mRNA produced may be made more stable, and translational and post-translational efficiencies improved.
4 Signals from outside the cell act not only on transcription, but on the other steps, and upon the intracellular signalling pathways, which include feedback loops and network interactions.
5 Many cell type-specific products are constructed by means other than differential transcription: one gene yields more than one protein or polypeptide.

6 Significant examples whereby one gene results in different products are:

7 Variants of a protein can derive from multiple genes. These can differ slightly in their coding region, but markedly in how and when they are regulated, and may be scattered over different chromosomes, e.g., non-muscle myosin heavy chains A & B, on chromosomes 22 & 17, respectively. On the other hand, a family of genes can be close together on the same chromosome, may share some controls, and be in a developmentally meaningful order 5' to 3', e.g., the complex of beta globin genes on chromosome 11 is under the control of a distant upstream 'locus control region'. But genes do not have to be on the same chromosome to be regulated coordinately.
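Points 1 and 3 of the list above can be made concrete with a deliberately simple model, an assumption rather than a measured law: the steady-state amount of a protein is taken as proportional to the product of factors for transcription, mRNA stability, translation, and protein stability. A zero at an early step cannot be rescued later, and modest gains at several steps multiply. The step names and values are hypothetical.

```python
from math import prod


def relative_output(steps):
    """Relative amount of protein: the product of per-step factors (1.0 = baseline)."""
    return prod(steps.values())


# Hypothetical step factors, each relative to an unregulated baseline of 1.0.
baseline = {
    "transcription": 1.0,
    "mrna_stability": 1.0,
    "translation": 1.0,
    "protein_stability": 1.0,
}

# Point 1: with no primary transcript, a fivefold boost in translation changes nothing.
silenced = dict(baseline, transcription=0.0, translation=5.0)

# Point 3: controls acting in concert; doubling three steps gives 2 x 2 x 2 = 8-fold.
concerted = dict(baseline, transcription=2.0, mrna_stability=2.0, translation=2.0)

for name, steps in [("baseline", baseline), ("silenced", silenced), ("concerted", concerted)]:
    print(name, relative_output(steps))  # 1.0, 0.0, 8.0 respectively
```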


1 Although it is possible to pick out several abundant luxury proteins on a two-dimensional electrophoresis gel, the regulation of a protein's synthesis has to be studied one protein at a time. The underlying assumption is that far fewer than a hundred proteins can illustrate the general principles of regulation. Suppose eight or so CTS proteins are examined in hepatocytes or skeletal muscle cells, and five, say, are synthesized in coordination (they appear at the same time in development, and are extinguished together in de-differentiation), while three are not: one can conclude that coordinate regulation occurs, but is not obligatory, and one has to go on examining proteins case by case.
2 What goes on in human cells in the body may not be exactly what transpires in animal cells, or in transformed human cells that are not above living and multiplying in plastic dishes, but it is close.
3 Cells acquire their identity in stages, controlled by sequences of signals and cell-cell interactions. Cells continue to respond to their environment as their activities are controlled to fit in. Where is the line between control of ongoing activities, and regulation of the phenotype to be maintained as the means to execute the activities?
4 What is known about cells is patchy, and varies in amount: much for hepatocytes, far less for pericytes.
5 In considering differentiation, the properties common to cells also have significance, but attention is seized by the differences. Likewise, quantitative differences are less inspiring than qualitative ones, although probably not much further from the truth of cell differentiation.
6 Ubiquitous cells - fibroblasts, endothelial and smooth muscle cells, and macrophages - are adapted to the local needs of each organ that they serve in: there is no single hard-and-fast cell phenotype.


Questions for a given cell type are: What are the CTS proteins? And in what sense: absolute, isoform, quantitative?
For each, at what stage of synthesis is the primary control? When is it transcriptional? What are the cis aspects - the regulatory regions and sequences of DNA? And what are the corresponding trans-acting factors - the TFs - in terms of: their class (e.g., bHLH vs. homeodomain, specific versus general), dimerization, regulation, and what is special about the circumstances, e.g., the role of growth factors.
These questions form the basis for Table 6 presenting a few results for some cell types. The point is to have a small armamentarium of informed molecular questions with which to confront issues of cell phenotype.
Viewed in total, there is a daunting jungle of interactions among a host of sometimes cryptically abbreviated entities. In practice, investigators take them on one cell type and one gene at a time, and then look for evidence of coordination.

Table 6

Cell type         DNA: position & sequence              Cell type-specific
& gene                                                  transcription factors

Skeletal muscle
fast skeletal     Enhancer: internal regulatory         MyoD, myogenin, Myf-5
troponin I        element (IRE) in intron 1, with a     bind the MRF
                  25-bp muscle regulatory
                  factor-binding sequence (MRF)

Thyroid follicular cell
thyroglobulin     Sites A, B, C, & K in promoter        TTF-1 binds A, B, & C
                  (-168 to -42); a consensus            TTF-2 binds K
                  sequence for TTF-1

Pituitary lactotroph
prolactin         Proximal enhancer has four sites      Pit-1
                  (-200 to -38); distal enhancer
                  also has four sites (-1718 to -1386)

Erythroid cell
? globin          Proximal promoter (+12 to -60)        NF-E1
                  including CCAAT; distal
                  promoter (-252 to -226)

Hepatocyte
albumin           Promoter (-185 to -74) with CCAAT;    HNF-1
                  proximal element (PE) -62 to -45;
                  distal element II (-123 to -110)

alpha-fetoprotein Promoter regions I through V          NF-1 binds IA
                  (-1 to -839); enhancers at -2.5,      C/EBP binds IB & V
                  -5.0, & -6.5 kb; repressor at         HNF-1 & C/EBP bind II
                  -250 to -836                          NP-III binds III
                                                        NP-IV binds IV

ApoB-100          Proximal promoter sequences           C/EBP binds the more distal
lipoprotein       -169 to -152 & -86 to -61             AF1 binds the more proximal
The full 1992 table also gave the animal species for each protein, the type of each CTS TF, the general/ubiquitous TFs participating, and references. All will be given in the coming version.

More points on transcription-factor action

  1. One gene can have multiple binding sites for one TF.
  2. One CTS TF can be used in the control of many CTS genes, e.g., hepatocyte nuclear factor 1 (HNF-1) for albumin, fibrinogens, alpha1-antitrypsin, alpha-fetoprotein, & transthyretin.
  3. There can be several different CTS TFs for the activation of one gene; and one factor can be used before another during development, e.g., MyoD precedes myogenin.
  4. Negative regulation by TFs, rather than at the chromatin level, is not uncommon. One use is to repress, in the adult cell, expression of an embryonically active gene, e.g., alpha-fetoprotein.
  5. A so-called cell type-specific TF can be used by closely related cells, e.g., erythroid cells and megakaryocytes.
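Points 2 and 3 reflect the economy of combinatorial control: far fewer regulatory factors are needed than there are genes to be controlled. Simple counting shows why. This toy sketch assumes that any factor can pair or combine with any other; real TFs have 'allowed' and 'non-allowed' combinations, so true numbers are lower. The figure of 20 factors is hypothetical.

```python
from math import comb


def dimer_count(n_factors):
    """Distinct dimers from n factors: n homodimers plus C(n, 2) heterodimers."""
    return n_factors + comb(n_factors, 2)


def combination_count(n_factors, k):
    """Ways to choose k different factors to act together on one gene."""
    return comb(n_factors, k)


def presence_patterns(n_factors):
    """Present/absent patterns when each of n factors is simply on or off."""
    return 2 ** n_factors


n = 20  # hypothetical number of cell type-specific factors
print(dimer_count(n))            # 210
print(combination_count(n, 3))   # 1140
print(presence_patterns(n))      # 1048576
```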


1 Cell-specific gene regulation goes on under the influence of hormones, extracellular-matrix components, growth factors, etc. Such factors affect phenotype, and are not just physiological modulators of levels of activities, whose nature is specified once and for all when the cell first becomes terminally differentiated.
2 The regulation of one cell phenotype is very complicated, given the many CTS genes to be set, the numerous TFs, and the many levels of control for each protein, including the TFs. It is a little early to recognize the integrating mechanisms that make the task manageable for the cell, but they are starting to take shape as temporal and spatial patterns of homeodomain gene expression.
3 Is this all too high-flown for clinicians? More elaborate versions of the above table are appearing in the journals of clinical research. For example, the table in Eckert RL et al. The epidermis: genes on - genes off. J Invest Dermatol 1997;109:501-509. It covers many genes and transcription factors for the keratinocytes of the epidermis alone, and paves the way for strategies of diagnosis and treatment that are just a few years off.
4 The goal is to target therapy at the molecular controls on the activity of particular cells. Histology, with its approaches and methods, is there to show one whether the molecularly corrected cell is also now working properly in its cell-to-cell and organ contexts.
William A Beresford, Anatomy Department, School of Medicine, West Virginia University, Morgantown, WV 26506-9128, USA. E-mail: wberesfo@wvu.edu, wberesfo@hotmail.com, beresfo@wvnvm.wvnet.edu. Fax: 304-293-8159