William A Beresford MA, D Phil ©
Professor of Anatomy
Anatomy Department, West Virginia University, Morgantown, USA
This Chapter is still as written in 1992, in order not to delay putting the
book online. The amount of recent primary literature to read and try to
turn into teaching topics was too much for now. However, an update is in the works.
The identity of cells, i.e., the character or phenotype that each cell has,
expresses itself in histological appearance and specific functions, e.g.,
organelle-free acidophilic cytoplasm and oxygen transport for the RBC.
Compiling these characterisations has been the meat of what
has been outlined so far.
Molecular species, e.g., involucrin and uroplakin, have been mentioned only as
materials enabling particular tasks. It is time to tackle differential
gene expression, or how a cell comes by its unique profile of specialized
proteins.
Such molecular explanations are becoming a necessary part of cardiology,
gastroenterology, immunology, oncology, surgery, and so forth. For example,
cancer cells rearrange their genes, causing unusual, disruptive, and fatal
expressions of materials; heart-muscle molecules change in disease; and
lives are made difficult, if not miserable, by genes defective from conception.
By starting now, one should be able to keep up with the changing complexity of
molecular analyses of what cells are up to, and how molecular diagnosis
and intervention will aid medical practice.
Histology stays in the picture because in situ hybridization and
immunocytochemistry let one see some of the molecular action in relation to
individual, identifiable cells and organelles.
In experimental testing for gene function with mutations, deletions, knock-outs, and excess
gene dosage, histology reveals the altered phenotype at cell, tissue, and
organ level, if there are phenotypic consequences.
A PROTEINS AS THE KEY TO CELL IDENTITY
1 Protein species provide the key to a given cell's nature and
repertoire of activities - its phenotype. The proteins may be very abundant,
e.g., keratin intermediate filaments in terminal keratinocytes, or minor in
amount, but potent, as in the case of enzymes involved in the
synthesis of a hormone or neurotransmitter.
2 Proteins are large molecules, distinctively shaped to offer regions or
domains for interaction with other molecules. They achieve their
eventual size, shape, and ability to act chemically, initially by the
linear joining of specific amino acids in set sequences, based upon the
informational content of DNA and RNA nucleotide sequences.
3 Proteins can be cell type-specific (CTS) in three ways:
- The protein is present in only one cell type - absolute specificity.
- The protein occurs in a few kinds of cell, but in each type is
slightly modified to constitute an isoform of the protein - isoform specificity.
- The protein is found in several cell types, but its abundance is much
greater in one or two types than in the others - quantitative 'specificity'.
'Tissue-specific' substitutes for CTS where one cell type predominates, e.g.,
in cardiac muscle.
4 In practice, there are hundreds of proteins that meet one or more of
these criteria. For instance, alkaline phosphatase has separate isoforms in
gut and placenta, a third isoform that is plentiful in bone, liver and kidney
- the B/L/K isoform, and a fourth, similar to the placental, but occurring in
thymus and testis.
5 The theoretical position is that a cell's molecular identity is represented
by the constellation of special cell type-specific/luxury proteins,
underpinned by a pattern of levels of housekeeping (basic-function)
proteins, common to most cells of the species.
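The three senses of cell type-specificity in item 3 can be sketched as a small classifier. This is only an illustration, not an established method: the function name `classify_specificity`, the input layout (cell type mapped to an isoform label and a relative abundance), and the 5-fold cut-off standing in for 'much greater' are all assumptions made for the example.

```python
def classify_specificity(expression, fold_threshold=5.0):
    """Classify a protein's cell type-specificity from toy expression data.

    expression: dict mapping cell type -> (isoform_label, abundance).
    Cell types with abundance 0 are treated as not making the protein.
    fold_threshold is an arbitrary, illustrative cut-off for 'much greater'.
    """
    present = {cell: v for cell, v in expression.items() if v[1] > 0}
    if not present:
        return "not expressed"
    if len(present) == 1:
        return "absolute"
    # More than one isoform among the expressing cell types.
    if len({isoform for isoform, _ in present.values()}) > 1:
        return "isoform"
    # Same isoform everywhere: do one or two cell types dominate the rest?
    levels = sorted((abundance for _, abundance in present.values()), reverse=True)
    for k in (1, 2):
        if k < len(levels) and levels[k - 1] >= fold_threshold * levels[k]:
            return "quantitative"
    return "not specific"


# Invented numbers, for illustration only.
print(classify_specificity({"gut": ("intestinal", 3), "placenta": ("placental", 4)}))
```

Real proteins can meet more than one criterion at once, as item 4 notes; the sketch simply reports the first one that applies.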
B THE ANATOMY OF A GENE
Ruler:  - bps (upstream) IIIIIIIIIIIIIII | IIIIIIIIIIIIIIII + bps (downstream)
                                         | = start site of transcription
        gene's regulatory region         | coding region --->

DNA:    5'* __//$__ ENHANCER# __//$__ PROMOTER% (distal to proximal) __ EXON1 INTRON1 EXON2 INTRON2 EXON3@ __ 3'*

RNA landmarks, 5' to 3': 5' untranslated region of RNA +; start of translation;
3' untranslated region; cleavage & poly-A site

Fig. 14 Regulation of a gene
* 5' and 3' refer to carbon positions in nucleotides, and hence to nt
attachment and DNA orientation
# Enhancers and repressors may be distant or close
$ The breaks // in the DNA keep the enhancer in view
% There may be more than one promoter
@ Number of exons and introns varies by gene, including no introns. Introns
are transcribed, but spliced out to create the mRNA
+ The 5' UTR may influence translational efficiency
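Footnote @ says that introns are transcribed but then spliced out. A minimal sketch of that bookkeeping follows, assuming a toy gene held as an ordered list of segments. The sequences are invented placeholders, not real gene sequence, and the function names are hypothetical.

```python
# Toy gene: ordered (kind, sequence) segments of the transcribed region.
# The sequences are invented placeholders, written as the coding DNA strand.
GENE = [
    ("exon", "ATGGCC"),
    ("intron", "GTAAGTTTTCAG"),
    ("exon", "GATTAC"),
    ("intron", "GTCCCCAG"),
    ("exon", "TGGTAA"),
]


def primary_transcript(segments):
    """Exons and introns are both transcribed, in order."""
    return "".join(seq for _, seq in segments)


def spliced_mrna(segments):
    """Splicing removes the introns and joins the exons."""
    return "".join(seq for kind, seq in segments if kind == "exon")


def to_rna(dna):
    """Write a coding-strand sequence as RNA (T becomes U)."""
    return dna.replace("T", "U")


print(to_rna(primary_transcript(GENE)))
print(to_rna(spliced_mrna(GENE)))
```

A gene with no introns gives the same string from both functions, which matches the footnote's remark that some genes have none.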
C TRANSCRIPTIONAL REGULATION
Moving now to the general cell-type-specifying mechanism of differential gene
activation: how is a gene chosen for expression?
1 DNA accessibility to transcription: DNA undergoes localised changes
in its binding to histones of the nucleosomes, and in the methylation of C-G
2 The aim is for a transcription complex centred on RNA polymerase
II to bind to the DNA of the gene to be transcribed, to initiate
transcription, and to continue it until a termination signal is met. Why does the
polymerase bind here? And what else is needed?
3 RNA polymerase II binds and starts because:
- (i) the DNA here includes particular nucleotides, e.g., it is rich in -A-T-
sequences (the TATA box), which have bound the first of several
general/basic transcription factors (here the TFIID or TATA factor) to
this core promoter region of the DNA.
What is promoted is the start of transcription; also, one strand of the DNA
has to be chosen.
- (ii) there is, 25 or so basepairs (bps) downstream from the TATA box, a
transcription start-site in the nucleotide sequence;
- (iii) activating units of cell type-specific transcription factors, bound
elsewhere to the DNA, have caused the attachment of other general TFs and RNA
polymerase II to the transcription initiation complex on the promoter.
4 Why is the transcription complex at the 5' end of this gene, and not
of another (Fig. 14)?
- (i) Mechanisms making the DNA locally accessible to regulatory factors.
- (ii) Special sequences in the DNA bind several cell-specific and
general/ubiquitous TFs in combination. The DNA sequences bind factors that favour or
reduce transcription: the sequences are termed enhancers, or
silencers and repressors, respectively.
- (iii) Enhancers and silencers can work from thousands of basepairs away
from the promoter region, upstream or, less often, downstream of the gene's
start site, because the DNA bends and loops to allow the bound enhancer TFs
to participate in the transcription complex: a position-independent action.
Also, most enhancers act regardless of their 5'-3' orientation: orientation
independence.
Repressors also impede transcription, but the term implies a sequence
that is not position-independent - it has to lie between an upstream enhancer
and the promoter.
- (iv) The upstream promoter region has sequences that bind either
general or CTS TFs, e.g., CCAAT with Sp1 (general), but here 5'-3'
orientation still matters.
- (v) Silencer, enhancer, and alternative-promoter DNA sequences may be
present in introns or more rarely exons, so although a gene's coding region
has absolute limits, the regulatory region can overlap it.
5 The activating power of individual TFs is usually weak, and may be
positive or negative. Several TFs in combination must be bound, and fall
exactly into place, to create a transcriptional complex that transcribes.
6 In sum, the phenotypes of cells reflect the varied activities
performed, special proteins subserve the functions, and selective gene control
furnishes the proteins; hence the spectrum of cell types derives from the
repertoire of combinations of transcription factors.
Because of the high informational content and synergistic/antagonistic
possibilities of TF combinations, far fewer regulatory factors are needed than
there are genes to be controlled.
Also, a restricted number of factors makes it easier to bring the production
of the phenotype's many CTS proteins into play at roughly the same time -
coordinated regulation. But, there is still a need for 'master' TFs to take
7 A consensus sequence in DNA is detected: either by the high
number of nucleotides held in common with another sequence that is known
to bind a TF; or the TF binds to a newly studied region of DNA, which
sequencing then reveals to have most of the known binding sequence. As these
lines of inquiry proceed, the idea gains power as: (i) other similar
(homologous) sequences are found to bind the TF; (ii) it becomes evident
that, even where binding-region nucleotides differ, there are restrictions
on the differences, e.g., only purine substitutions are seen. A sample
consensus sequence is GTTAATNATTAAC for hepatocyte nuclear factor 1, where N
stands for any nucleotide.
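The idea of a consensus sequence with a wildcard position can be made concrete with a short search routine. This is a sketch under simple assumptions: it reports exact matches only (with N matching any base), searches one strand, and counts departures from the consensus with a helper. The function names and the test DNA string are invented for the example.

```python
import re

# Only the codes needed here: the four bases, plus N for any nucleotide.
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}

HNF1_CONSENSUS = "GTTAATNATTAAC"  # from the text; N = any nucleotide


def find_sites(consensus, dna):
    """Return 0-based start positions where dna matches the consensus exactly."""
    body = "".join(CODES[base] for base in consensus)
    # A lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + body + ")")
    return [m.start() for m in pattern.finditer(dna.upper())]


def mismatches(consensus, site):
    """Count positions where a same-length site departs from the consensus."""
    return sum(1 for c, s in zip(consensus, site.upper()) if c != "N" and c != s)


dna = "CCGTTAATGATTAACTT"  # invented test sequence
print(find_sites(HNF1_CONSENSUS, dna))
print(mismatches(HNF1_CONSENSUS, "GTTACTCATTAAC"))
```

Because real binding regions usually carry only most of the known sequence, a practical search would rank candidate sites by `mismatches` rather than demand an exact match.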
D TRANSCRIPTION FACTORS
TFs have devices to stabilize their shape to present an alpha helix to bind
the DNA in a sequence-specific way, domains for pairing with other TFs as
dimers, and domains for activating transcription by other protein-protein
interactions. The classification of TFs is currently based on the structures
concerned with DNA-binding and making dimers, rather than the
transcription-activating or -silencing domains.
1 Leucine-zipper - aligned ridges of leucine-rich regions on two such
TFs (the same or different) join to create the 'zipper' union. The leucines
line up in this way because they occur at every seventh residue along each coil.
Nearby is a basic region in the TF that binds to the DNA. The dimerization
of TFs so created: (i) multiplies their instructional power, with 'allowed'
and 'non-allowed' combinations; and (ii) presents the DNA-binding domains to
match the DNA's shape.
2 Helix-loop-helix (HLH) - A basic DNA-binding domain lies adjacent to
two alpha helices (13 & 15 amino acids (AA) long), separated by a loop (5-20
AA). The HLH region mediates oligomer formation between TFs, which can change
the DNA-binding preferences. Several bHLH TFs recognize the sequence CANNTG.
3 Homeodomain is around 60 AA, arranged in a helix-turn-helix
DNA-binding conformation. It came to notice through genetic-molecular studies
of the products of homeotic genes controlling insect development.
POU domain comprises a 75-82 AA POU-specific domain, a variable
link, and a 60 AA POU homeodomain: all involved in binding to DNA.
Why POU? The first TFs where the domain was noticed were Pit-1 (in
pituitary cells), Oct-1 (general) and Oct-2 (B lymphocytes), and a TF
controlling the nematode's gene unc-86. [Genes' names are in italics;
their protein products in roman.] The octamer TFs bind to the 8-nt sequence
4 Zinc-finger, C2-H2 - a zinc ion, tetrahedrally linked to pairs of
appropriately spaced cysteines and histidines, creates short loops of amino
acids (the fingers) to interact with the DNA.
5 Zinc-finger, C2-C2 , is a different (cysteine only), zinc-centred
structure used to construct two fingers, which fold together and help orient
the alpha-helical 'DNA-recognition' domains. The steroid/thyroid/retinoid
receptors employ this motif. Attachment of the hormone ligand brings about the
receptors' dissociation from heat shock protein 90, and movement into the
nucleus, where they bind as dimers.
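The 'every seventh residue' spacing of the leucine zipper in item 1 lends itself to a simple scan of a protein sequence written in one-letter code. The sketch below is a toy: the minimum of four repeats is an arbitrary threshold chosen for the example, the test sequence is invented, and a real motif search would also weigh the residues between the leucines.

```python
def leucine_heptad_runs(protein, min_repeats=4):
    """Find runs of leucine (L) spaced exactly seven residues apart.

    Returns a list of (start_index, repeats), with 0-based start positions.
    min_repeats is an illustrative threshold, not a biochemical rule.
    """
    protein = protein.upper()
    runs = []
    for start, residue in enumerate(protein):
        if residue != "L":
            continue
        # Skip leucines that merely continue a run begun seven residues earlier.
        if start >= 7 and protein[start - 7] == "L":
            continue
        repeats = 1
        pos = start + 7
        while pos < len(protein) and protein[pos] == "L":
            repeats += 1
            pos += 7
        if repeats >= min_repeats:
            runs.append((start, repeats))
    return runs


# Invented sequence: leucines at positions 2, 9, 16, and 23.
print(leucine_heptad_runs("MK" + "LAAAAAA" * 4 + "G"))
```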
2 What controls TFs?
Positive and selective regulation
1 As proteins, their regulation can be at typical places in the
general sequence of protein synthesis, e.g., transcription, alternative
splicing of mRNA, protein stability, etc.
2 Auto-regulation, by the TF activating transcription of its own gene,
e.g., for MyoD1 and Pit-1. This helps maintain and stabilize the phenotype
specified by the TF, and renders the cell less dependent on the outside
stimuli that evoked the phenotype.
3 Dimerization: homo- and heterodimerization of TFs.
4 Ligand activation, e.g., the binding of steroid and thyroid
hormones and retinoids causes their receptors to be moved into the nucleus, and
to activate transcription. The DNA sequence to which the receptor-ligand
complex attaches is a 'something' response element, e.g., oestrogen RE (ERE);
thyroid RE (TRE); and the CRE allows genes to be controlled by the CREB TFs
stimulated by cyclic AMP.
5 Phosphorylation of TFs can induce DNA-binding, e.g., by CREB, or
transcriptional activation, e.g., by Oct-2.
6 Heterodimerization, e.g., the Id factor has a HLH, but no basic
region to bind DNA. When Id forms heterodimers with bHLH TFs, e.g., MyoD,
binding to DNA is blocked.
7 Competitors for DNA binding - competitive inhibition, e.g.,
NF-kappaB binds to the CCAAT box of the foetal gamma-globin gene, obstructing
CP1's activation of the gene.
8 Inactivation by bound protein factors that do not prevent DNA binding -
quenching. NF-kappaB's control of an Ig light chain gene in B cells is
prevented by a cytoplasmic protein IkappaB, which detains NF-kappaB in the
cytoplasm, until the IkappaB is phosphorylated.
9 Non-translation of TF mRNA, e.g., Pit-1 mRNA is made, but not
translated, in corticotrophs and gonadotrophs.
10 A great excess of one factor in solution may so tie up its normal binding
partner, another TF, that the latter is unavailable for participating in the
transcription complex - squelching.
11 The TF itself inhibits transcription as a silencer TF [negative
regulation by, not of the TF], e.g., thyroid hormone receptor alone (without
ligand) can bind to the TRE, causing a repression of transcription.
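The Id example in item 6 can be put as a toy rule: a dimer binds DNA only when both partners supply a basic region. The sketch below assumes exactly that rule and, as a further simplifying assumption, that a bHLH factor pairs with available partners in proportion to their abundance. The class, the generic partner, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HLHFactor:
    name: str
    has_basic_region: bool  # the basic region is what contacts the DNA


def dimer_binds_dna(a, b):
    """Toy rule: both partners must contribute a basic region."""
    return a.has_basic_region and b.has_basic_region


def productive_fraction(partner_amount, inhibitor_amount):
    """Fraction of dimers able to bind DNA, assuming pairing in proportion
    to abundance (an illustrative assumption, not a measured affinity)."""
    total = partner_amount + inhibitor_amount
    return partner_amount / total if total > 0 else 0.0


MYOD = HLHFactor("MyoD", True)
PARTNER = HLHFactor("bHLH partner", True)  # hypothetical generic partner
ID = HLHFactor("Id", False)  # has the HLH, but no basic region

print(dimer_binds_dna(MYOD, PARTNER))
print(dimer_binds_dna(MYOD, ID))
print(productive_fraction(3, 1))
```

Raising the amount of Id in this model lowers the productive fraction without changing the amount of MyoD, which is the sense in which heterodimerization regulates the TF.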
E STEPS IN PROTEIN SYNTHESIS
The following steps provide levels of possible regulation: decision points in
the overall choice of how much of what kind of protein is to be formed by
the cell.
1 Extra- and intracellular signalling, with signalling molecules,
receptors, signal-transduction machinery, binding proteins, and transport into the
nucleus.
2 DNA accessibility to signals, regulatory factors, and the
3 Transcription: pre-initiation, initiation, elongation, and
termination.
4 RNA processing of the primary transcripts to make mRNA.
5 Stabilization of the mRNA.
6 Transport of the mRNA to the ribosomes in the cytoplasm.
7 Use and re-use of the mRNA in translation to protein sequences.
8 Direction of the protein to sites for post-translational modification,
e.g., cleavage, glycosylation, phosphorylation, addition of prosthetic groups,
e.g., haeme to globin.
9 The use of chaperones for the stability and folding of the protein.
10 Intracellular storage or degradation of the product.
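The ten steps above can be read as successive control points, each able to scale down what the next step receives. A minimal sketch of that reading treats each step as an efficiency between 0 and 1 and multiplies them. The multiplicative form, the step labels, and the numbers are assumptions made for illustration, not measured values.

```python
from math import prod

# Short labels for the ten steps listed above.
STEPS = [
    "signalling",
    "DNA accessibility",
    "transcription",
    "RNA processing",
    "mRNA stability",
    "mRNA transport",
    "translation",
    "post-translational modification",
    "folding",
    "storage/degradation",
]


def relative_output(efficiency):
    """Multiply per-step efficiencies (0.0 to 1.0); unlisted steps count as 1.0."""
    unknown = set(efficiency) - set(STEPS)
    if unknown:
        raise ValueError(f"unknown step(s): {sorted(unknown)}")
    return prod(efficiency.get(step, 1.0) for step in STEPS)


# If nothing is transcribed, later settings make no difference.
print(relative_output({"transcription": 0.0, "translation": 0.9}))
print(relative_output({"transcription": 0.5, "mRNA stability": 0.5}))
```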
F LEVELS OF REGULATION
1 The above progression creates a hierarchy of control points: if no primary
RNA is transcribed, post-transcriptional controls are redundant; if a mRNA is
made unstable, post-translational influences are superfluous.
2 For most CTS proteins, the prime control is at transcription.
3 The mechanisms can act in concert, thus as transcription is increased, the
mRNA produced may be made more stable, and translational and
post-translational efficiencies improved.
4 Signals from outside the cell act not only on transcription, but on the
other steps, and upon the intracellular signalling pathways, which include
feedback loops and network interactions.
5 Many cell type-specific products are constructed by means other than
differential transcription: one gene yields more than one protein or
6 Significant examples whereby one gene results in different products:
- (i) The peptide, pro-opiomelanocortin, is cleaved at different
sites to make ACTH, MSH, and/or an opioid.
- (ii) Alternative splicing of exons, from a common primary transcript,
is a frequent device to vary the mRNA and hence the product, e.g., plasma
fibronectin from hepatocytes lacks the ED-A domain that is included in
fibroblastic fibronectin; in fact, about 20 isoforms are derived from this
gene.
- (iii) Switched promoters: for the chick collagen alpha2(I) gene,
transcription in cartilage starts in intron 2, having been switched from the
bone/tendon promoter lying before exon 1, which was used while the cartilage
was still mesenchyme. The resulting transcripts in cartilage are out of phase
for producing alpha2(I) collagen, and none results.
- (iv) Combined use of alternative promoters and alternative
RNA splicing, e.g., the glucokinase gene has a pancreatic promoter and
first exon upstream to the hepatocytic promoter and first exon
[Pp-E1p-Ph-E1h]: in the
pancreas, transcription is started by the upstream promoter and, of the two
exons 1 transcribed, the second, hepatic, one is spliced out for the
pancreatic mRNA. In the liver, transcription starts at the downstream
promoter, so that the pancreatic exon is not transcribed, and splicing
includes the hepatic exon 1.
- (v) Post-translational variations include: glycosylation of
? [I am still looking for a simple example of where a product is glycosylated
so in one cell type, and otherwise in another.
Lymphocyte subtypes offer too complicated instances for this
- (vi) Only in lymphocytes and cancer cells, and then for only certain
genes, is the product varied by rearranging the DNA.
7 Variants of a protein can derive from multiple genes. These can
differ slightly in their coding region, but markedly in how and when they
are regulated, and may be scattered over different chromosomes, e.g.,
non-muscle myosin heavy chains A & B on 22 & 17 respectively. On the other
hand, a family of genes can be close together on the same chromosome,
may share some controls, and be in a developmentally meaningful order 5' to 3',
e.g., the complex of beta globin genes on chromosome 11 is under the
control of a distant upstream 'locus control region'. But genes do not have
to be on the same chromosome to be regulated coordinately.
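The glucokinase example above, with its layout [Pp-E1p-Ph-E1h], can be traced step by step in a few lines. The sketch assumes that transcription covers everything downstream of the promoter in use, and that splicing keeps only the first of the alternative first exons. The shared exons "E2" and "E3" are placeholders added to keep the example short; their number is not taken from the text.

```python
# Elements along the gene, 5' to 3', as given in the text: [Pp-E1p-Ph-E1h],
# followed by placeholder exons common to both tissues (an assumption).
LAYOUT = ["Pp", "E1p", "Ph", "E1h", "E2", "E3"]
PROMOTERS = {"pancreas": "Pp", "liver": "Ph"}
FIRST_EXONS = {"E1p", "E1h"}


def transcribed_exons(tissue):
    """Exons lying downstream of the promoter used in this tissue."""
    start = LAYOUT.index(PROMOTERS[tissue])
    return [item for item in LAYOUT[start + 1:] if item.startswith("E")]


def mrna_exons(tissue):
    """Keep the first transcribed 'exon 1'; splice out the other, if present."""
    exons = transcribed_exons(tissue)
    first = next(e for e in exons if e in FIRST_EXONS)
    return [e for e in exons if e == first or e not in FIRST_EXONS]


for tissue in ("pancreas", "liver"):
    print(tissue, transcribed_exons(tissue), mrna_exons(tissue))
```

In the pancreas both first exons are transcribed and the hepatic one is removed; in the liver the pancreatic exon never enters the transcript.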
G CELL PHENOTYPE: UNTIDINESS OF THE CONCEPT
1 Although it is possible to pick out several abundant luxury proteins on a
two-dimensional electrophoresis gel, the regulation of a protein's synthesis
has to be studied one protein at a time. The underlying assumption is that
far fewer than a hundred proteins can illustrate the general principles
of regulation. Thus, by looking at eight or so CTS proteins in hepatocytes
or skeletal muscle cells, and finding that, say, five are synthesized in
coordination (they appear at the same time in development, and are
extinguished together in de-differentiation) and three are not, one can
conclude that coordinate regulation occurs, but is not obligatory;
and one has to go on examining proteins case by case.
2 What goes on in humans may not be exactly what transpires in animal cells,
or in transformed human ones that are not above living and multiplying in
plastic dishes, but it is close.
3 Cells acquire their identity in stages, controlled by sequences of signals
and cell-cell interactions. Cells continue to respond to their environment as
their activities are controlled to fit in. Where is the line between control
of ongoing activities, and regulation of the phenotype to be maintained as
the means to execute the activities?
4 What is known about cells is patchy, and varies in amount: much for
hepatocytes, far less for pericytes.
5 In considering differentiation, the properties common to cells also have
significance, but attention is seized by the differences. Likewise,
quantitative differences are less inspiring than qualitative ones, although
probably not that much further from the truth of cell differentiation.
6 Ubiquitous cells - fibroblasts, endothelial and smooth muscle cells, and
macrophages - are adapted to the local needs of each organ that they serve
in: there is no single hard-and-fast cell phenotype.
H EXAMPLES OF THE MOLECULAR CONTROL OF CELLULAR IDENTITY
Questions for a given cell type are: What are the CTS proteins? And in what
sense: absolute, isoform, quantitative?
For each, at what stage of synthesis is the primary control? When is it
transcriptional? What are the cis aspects - the regulatory regions
and sequences of DNA? And what are the corresponding trans-acting
factors - the TFs - in terms of: their class (e.g., bHLH vs. homeodomain,
specific versus general), dimerization, regulation, and what is special
about the circumstances, e.g., the role of growth factors.
These questions form the basis for Table 6 presenting a few results
for some cell types. The point is to have a small armamentarium of informed
molecular questions with which to confront issues of cell phenotype.
Viewed in total, there is a daunting jungle of interactions among a host
of sometimes cryptically abbreviated entities. In practice, investigators
take them on one cell type and one gene at a time, and then look for
evidence of coordination.
Table 6 MOLECULAR REGULATION OF PHENOTYPE IN PARTICULAR CELLS

Cell type        DNA: position & sequence             Cell type-specific
& gene                                                transcription factors

fast skeletal    Enhancer internal regulatory         MyoD, myogenin, Myf-5
troponin I       element (IRE) in intron 1, with      bind the MRF
                 25 bps muscle reg.factor-binding

Thyroid follicular cell
thyroglobulin    Sites A, B, C, & K in promoter       TTF-1 binds A, B, & C
                 (-168 to -42); a consensus           TTF-2 binds to K
                 sequence for TTF-1

prolactin        Proximal enhancer has four sites     Pit-1
                 (-200 to -38); distal enhancer
                 also has 4 sites (-1718 to -1386)

? globin         Proximal promoter (+12 to -60)       NF-E1
                 including CCAAT; distal
                 promoter (-252 to -226)

albumin          Promoter (-185 to -74) with CCAAT    HNF-1
                 proximal element (PE) -62 to -45
                 distal element II (-123 to -110)

alpha-           Promoter regions I through V         NF-1 binds IA;
fetoprotein      (-1 to -839); enhancers at -2.5,     C/EBP - IB & V
                 -50, & -6.5 kb;                      HNF-1 & C/EBP to II
                 repressor at -250 to -836            NP-III binds III
                                                      NP-IV binds IV

ApoB-100         Proximal promoter sequences          C/EBP to more distal
lipoprotein      -169 to -152 & -86 to -61            AF1 to more proximal
The full 1992 table included the necessary: animal species for the protein; type of the
CTS TF; general/ubiquitous TFs participating; and references. All will be
given in the coming version.
More points on transcription-factor action
- One gene can have multiple binding sites for one TF.
- One CTS TF can be used in the control of many CTS genes, e.g., hepatocyte
NF-1 for albumin, fibrinogens, alpha1-antitrypsin, alpha-fetoprotein, &
- There can be several different CTS TFs for the activation of one gene; and
one factor can be used before another during development, e.g., MyoD precedes
- Negative regulation by TFs, rather than at the chromatin level, is not
uncommon. One use is to repress expression in the adult cell of an embryonically active gene,
- A so-called cell-type-specific TF can be used by closely related cells,
e.g., in erythrocytes and megakaryocytes.
I FINAL COMMENTS
1 Cell-specific gene regulation goes on under the influence of hormones,
extracellular-matrix components, growth factors, etc. Such factors affect
phenotype, and are not just physiological modulators of levels of activities,
whose nature is specified once and for all when the cell first becomes
differentiated.
2 The regulation of one cell phenotype is very complicated, given the many
CTS genes to be set, the numerous TFs, and the many levels of control for
each protein, including the TFs. It is a little early to recognize the
integrating mechanisms that make the task manageable for the cell, but
they are starting to take shape as temporal and spatial patterns of
homeodomain gene expression.
3 Is this all too high-flown for clinicians? More elaborate versions of
the above table are appearing in the journals of clinical research. For
example, the table in Eckert RL et al. The epidermis: genes on - genes off.
J Invest Dermatol 1997;109:501-509. It covers many genes and
transcription factors, for just keratinocytes of the epidermis, and paves the
way for strategies of diagnosis and treatment, just a few years off.
4 The goal is to target therapy at the molecular controls on the activity
of particular cells. Histology, with its approaches and methods, is there
to show one whether the molecularly corrected cell is also now working
properly in its cell-to-cell and organ contexts.