1. The Central Dogma and Why Transcription Matters
Before diving into the mechanics, it’s worth grounding yourself in the big picture. The central dogma of molecular biology describes the flow of genetic information: DNA → RNA → Protein. Transcription is the middle step — the process by which the genetic information encoded in DNA is “read” and copied into RNA. It is an intermediate step in gene expression, not the final product. Critically, only what the cell needs at a given moment is transcribed, making transcription a major point of regulation.
The three main processes you need to master for this module are:
DNA Replication and Repair — the high-fidelity copying of the entire genome before cell division.
RNA Synthesis and Processing (Transcription) — the selective reading of specific genes into RNA molecules.
Protein Synthesis (Translation) — the decoding of mRNA into a polypeptide chain, the actual functional end product.
2. What is a Gene?
The definition of a gene has evolved substantially over time, and you should know all three versions for your exam.
The classic definition describes a gene simply as a portion of DNA that determines a single phenotype — a trait you can observe.
The Beadle and Tatum definition (1941) refined this to "one gene–one enzyme", later broadened to "one gene–one protein" (or polypeptide): each gene encodes the information needed to produce one protein. This was a landmark insight but turned out to be an oversimplification.
The current definition is the most accurate and the one to learn: a gene is a fragment of DNA (or sometimes RNA) that contains the primary sequence needed to produce a biologically functional gene product, which can be either an RNA molecule or a protein. This is broader because many genes produce functional RNAs that never get translated into protein at all.
3. The Human Genome — Scale and Composition
Understanding the scale of the human genome helps you appreciate why regulation of transcription is so important.
The human genome contains 23 pairs of chromosomes and approximately 3.1 billion base pairs. Within this enormous amount of DNA, there are only 20,000–30,000 protein-coding genes. Remarkably, less than 2% of the genome actually encodes proteins. So what is the rest doing?
Looking at the genomic composition: about 74.9% is intergenic sequence (DNA between genes), 24% is intronic sequence (non-coding sequences within genes), and only 1.1% is exonic (the actual protein-coding portions).
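The percentages above can be converted into absolute amounts with simple arithmetic. A minimal Python sketch, assuming a haploid genome of 3.1 billion bp and the composition figures quoted in this section:

```python
# Haploid human genome size and composition, as quoted in the text
GENOME_BP = 3.1e9
COMPOSITION_PERCENT = {"intergenic": 74.9, "intronic": 24.0, "exonic": 1.1}


def composition_in_megabases(genome_bp: float, percents: dict) -> dict:
    """Convert percentage shares of the genome into megabases (1 Mb = 10^6 bp)."""
    return {name: genome_bp * pct / 100 / 1e6 for name, pct in percents.items()}


# The three categories should account for the whole genome
assert abs(sum(COMPOSITION_PERCENT.values()) - 100.0) < 1e-9

for name, mb in composition_in_megabases(GENOME_BP, COMPOSITION_PERCENT).items():
    print(f"{name:>10}: {mb:,.0f} Mb")
```

On these figures only about 34 Mb of roughly 3,100 Mb is exonic, while introns alone account for about 744 Mb.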
The function of roughly half of all discovered genes is still unknown.
In terms of RNA diversity, the cell produces several types:
Messenger RNA (mRNA) carries the protein-coding message from nucleus to ribosome. It makes up a small fraction of total RNA by mass, but is the most diverse in sequence.
Ribosomal RNA (rRNA) is by far the most abundant, constituting 80–90% of total cellular RNA. It is essential for building ribosomes and catalyzing protein synthesis.
Transfer RNA (tRNA) is the most numerous type by molecule count. Each tRNA carries a specific amino acid to the ribosome, in a sequence dictated by the mRNA template. There is at least one tRNA for each of the 20 amino acids.
Non-coding RNAs (ncRNAs) — there are approximately 80,000 non-redundant non-coding RNA genes in humans, including long non-coding RNAs (lncRNAs), small nuclear RNAs (snRNAs), and microRNAs (miRNAs). These are involved in gene regulation and numerous other cellular processes.
4. RNA — Structure and Chemistry
Before tackling the transcription machinery, you need to understand the product being made.
RNA differs from DNA in three key ways. First, it contains the sugar ribose instead of deoxyribose (ribose has a hydroxyl group at the 2’ carbon, deoxyribose does not). Second, it uses uracil (U) instead of thymine (T) — uracil lacks the methyl group that thymine carries. Third, RNA normally exists as a single strand, not a double helix, although it can fold back on itself to form secondary structures.
RNA is synthesized from ribonucleoside triphosphates (ATP, GTP, CTP, UTP) by the enzyme RNA polymerase, which is DNA-dependent — meaning it reads DNA as a template.
5. The RNA Polymerase Enzyme
RNA polymerase is the central enzyme of transcription. Its complexity varies across organisms.
Viral RNA polymerase is the simplest, consisting of just 1 subunit.
Prokaryotic RNA polymerase is built from five subunits of four different kinds: two α subunits (α₂), one β subunit, one β’ subunit, and one ω subunit. Together these form the core enzyme. When the σ (sigma) subunit joins, you get the holoenzyme (α₂ββ’ωσ, approximately 450 kDa), which is the form capable of recognizing and binding promoters.
The roles of each bacterial subunit are worth knowing:
The α subunits (2 copies) serve as a scaffold for assembling the rest of the complex. Their C-terminal domain (αCTD) interacts with transcription factors and upstream promoter DNA to regulate transcription.
The β subunit forms the active site for RNA synthesis. It binds the DNA template strand and incoming ribonucleoside triphosphates (rNTPs), and contributes to the “claw” structure through which DNA threads.
The β’ subunit plays a similar catalytic role to β — together they constitute the active site. It also stabilizes the RNA-DNA hybrid during elongation.
The ω subunit assists in proper folding and structural integrity of the enzyme, though it is not essential for transcription itself.
Eukaryotic RNA polymerases have at least 12 subunits and come in three distinct forms — Pol I, Pol II, and Pol III — each responsible for transcribing different types of RNA.
RNA Polymerase I transcribes the large ribosomal RNA precursor (pre-rRNA), which is processed into the 18S, 5.8S, and 28S rRNA molecules.
RNA Polymerase II transcribes pre-mRNA (which becomes mRNA), as well as several types of non-coding RNAs including snRNAs and miRNAs. This is the most heavily regulated polymerase.
RNA Polymerase III transcribes tRNA, the small 5S rRNA, and several other small RNAs.
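An important clinical note: in prokaryotes, all RNA types are made by a single polymerase, so inhibiting it shuts down all bacterial RNA synthesis. Because the bacterial enzyme also differs structurally from eukaryotic RNA polymerases, antibiotics like rifamycin can target it selectively, which makes them highly specific and therapeutically useful.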
6. The Biochemistry of RNA Synthesis
The overall reaction is:
(NMP)ₙ + NTP → (NMP)ₙ₊₁ + PPᵢ
Each new nucleotide is added to the 3’ end of the growing RNA chain. Specifically, the 3’-OH group of the last nucleotide in the chain acts as a nucleophile and attacks the α-phosphate of the incoming NTP, releasing pyrophosphate (PPᵢ). The subsequent hydrolysis of PPᵢ by pyrophosphatase drives the reaction forward, making it essentially irreversible.
Two Mg²⁺ ions in the active site are essential to the mechanism. One Mg²⁺ ion promotes the deprotonation of the 3’-OH group, making it a better nucleophile. The second Mg²⁺ ion binds to the incoming NTP and facilitates the departure of the PPᵢ group.
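The reaction above can be checked with simple bookkeeping. Because no primer is used, the first (5’) nucleotide keeps its full triphosphate, and each later addition releases one PPᵢ that pyrophosphatase then splits into two Pᵢ. A small Python sketch of that count (the function name is illustrative):

```python
def phosphate_bookkeeping(n_nucleotides: int) -> dict:
    """NTPs used and PPi/Pi released when an n-nucleotide RNA is made de novo.

    The first (5') NTP keeps its triphosphate. Every later addition,
    (NMP)n + NTP -> (NMP)n+1 + PPi, releases one pyrophosphate, and
    pyrophosphatase hydrolyses each PPi into 2 Pi.
    """
    if n_nucleotides < 1:
        raise ValueError("a transcript needs at least one nucleotide")
    ppi = n_nucleotides - 1
    return {"ntp_consumed": n_nucleotides, "ppi_released": ppi, "pi_after_hydrolysis": 2 * ppi}


print(phosphate_bookkeeping(100))
# {'ntp_consumed': 100, 'ppi_released': 99, 'pi_after_hydrolysis': 198}
```

The retained 5’ triphosphate is also the starting point for the capping reactions described in section 10.1.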
Several key features distinguish RNA synthesis from DNA synthesis:
- The template is a single strand of DNA, read in the 3’ → 5’ direction. The RNA product is synthesized in the 5’ → 3’ direction.
- The substrate is NTPs (ribonucleoside triphosphates), not dNTPs.
- No primer is needed — unlike DNA polymerase, RNA polymerase can initiate synthesis de novo (from scratch). This is a major mechanistic difference.
- The product is a single-stranded RNA molecule (ssRNA).
- Base pairing rules between DNA template and RNA transcript: template A pairs with U in the RNA, template T with A, template G with C, and template C with G.
- The RNA transcript is complementary to the template (antisense) strand and identical in sequence to the coding (sense/non-template) strand, except that T is replaced by U and the sugar is ribose rather than deoxyribose.
- RNA polymerase lacks the dedicated 3’→5’ exonuclease proofreading domain found in DNA polymerases. Its error correction is limited: it can back up (backtrack) and cleave off a misincorporated nucleotide, a reaction stimulated by factors such as GreA/GreB in bacteria and TFIIS in eukaryotes. The error rate is consequently much higher than in DNA replication, but this is tolerated because many copies of each mRNA are made and individual errors have a diluted effect.
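The pairing rules and strand relationships above can be expressed as a short Python sketch. It assumes the template strand is written 3’→5’ and the coding strand 5’→3’, and uses a made-up example sequence:

```python
# DNA template base -> RNA base (Watson-Crick pairing, with U in place of T in RNA)
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'.

    RNA polymerase reads the template 3'->5', so the RNA grows 5'->3'
    in the same left-to-right order.
    """
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5.upper())


def coding_to_rna(coding_5_to_3: str) -> str:
    """The transcript matches the coding (non-template) strand with T replaced by U."""
    return coding_5_to_3.upper().replace("T", "U")


template = "TACGGATCA"   # hypothetical template strand, written 3'->5'
coding = "ATGCCTAGT"     # its complementary coding strand, written 5'->3'
rna = transcribe(template)
print(rna)               # AUGCCUAGU
assert rna == coding_to_rna(coding)
```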
7. The Three Stages of Transcription
Transcription proceeds through three distinct phases: initiation, elongation, and termination.
7.1 Initiation
Initiation is the most tightly regulated step. It requires the polymerase to find the correct starting point on the genome.
Promoters are specific DNA sequences that signal to RNA polymerase where to begin transcription. The transcription start site itself is designated +1 — this is the nucleotide whose sequence corresponds to the first nucleotide incorporated into the RNA. Sequences downstream (toward the 3’ end of the coding strand) are numbered positively (+2, +3, etc.), while sequences upstream are numbered negatively (-1, -2, etc.).
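Note that this convention has no position 0: the nucleotide immediately upstream of +1 is -1. A small Python sketch that labels a coding-strand sequence, assuming you already know the 0-based index of the start site (the example sequence is hypothetical):

```python
def position_label(offset_from_tss: int) -> int:
    """Map a 0-based offset from the start site to promoter numbering.

    Offset 0 is the start site (+1) and offsets 1, 2, ... become +2, +3, ...;
    negative offsets keep their value (-1, -2, ...), so 0 is never used.
    """
    return offset_from_tss + 1 if offset_from_tss >= 0 else offset_from_tss


def label_sequence(coding_seq: str, tss_index: int) -> list:
    """Pair each base of a coding-strand sequence with its position relative to +1."""
    return [(position_label(i - tss_index), base) for i, base in enumerate(coding_seq)]


print(label_sequence("CGTAAGTCA", tss_index=5))
```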
In bacteria, the σ (sigma) subunit of the holoenzyme recognizes two highly conserved consensus sequences on the non-template (coding) strand:
The -10 element (Pribnow Box): located approximately 10 nucleotides upstream of +1. In E. coli, the consensus sequence is TATAAT. This is the most critical element for initial polymerase binding and is recognized directly by the σ factor.
The -35 element: located approximately 35 nucleotides upstream of +1. The consensus sequence here is TTGACA. This region is also important for polymerase binding and promoter recognition.
There is also an upstream promoter element (UP element), an AT-rich region between positions -60 and -40, to which the α subunit of RNA polymerase binds. This further stabilizes the interaction.
The spacing between the -35 and -10 regions matters — 17 bp is optimal for efficient transcription. Spacers of 16 or 18 bp are found but are less efficient.
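The -35/-10 arrangement lends itself to a simple pattern search. The Python sketch below scans a coding-strand sequence for hexamers within a chosen number of mismatches of TTGACA and TATAAT that are separated by 16–18 bp. The mismatch threshold and the demo sequence are assumptions for illustration; exact consensus counting is only a rough stand-in for real promoter prediction.

```python
MINUS35 = "TTGACA"   # -35 consensus (coding strand)
MINUS10 = "TATAAT"   # -10 consensus, Pribnow box (coding strand)


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(coding: str, max_mm: int = 1, spacers=range(16, 19)) -> list:
    """Return (start of -35 hexamer, spacer bp, -35 mismatches, -10 mismatches)
    for every hexamer pair within max_mm mismatches of consensus and separated
    by one of the allowed spacer lengths (0-based indices)."""
    seq = coding.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], MINUS35)
        if mm35 > max_mm:
            continue
        for spacer in spacers:             # spacers are tried in increasing order
            j = i + 6 + spacer             # start of the -10 hexamer
            if j + 6 > len(seq):
                break
            mm10 = mismatches(seq[j:j + 6], MINUS10)
            if mm10 <= max_mm:
                hits.append((i, spacer, mm35, mm10))
    return hits


demo = "GG" + "TTGACA" + "GCGCGCGCGCGCGCGCG" + "TATAAT" + "GG"  # hypothetical, 17 bp spacer
print(find_promoter_candidates(demo))  # [(2, 17, 0, 0)]
```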
Mutations in these consensus sequences have predictable effects:
Up-promoter mutations make the sequences more similar to the consensus, increasing transcription rate.
Down-promoter mutations reduce similarity to consensus, decreasing transcription efficiency.
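Up- and down-promoter mutations can be illustrated by counting matches to the consensus before and after a change. This Python sketch uses the -10 consensus from the text; the example hexamers are hypothetical, and a raw match count is only a crude proxy for promoter strength.

```python
MINUS10 = "TATAAT"  # -10 consensus from the text


def consensus_matches(hexamer: str, consensus: str = MINUS10) -> int:
    """Number of positions at which a hexamer agrees with the consensus."""
    return sum(a == b for a, b in zip(hexamer.upper(), consensus))


def classify_mutation(before: str, after: str, consensus: str = MINUS10) -> str:
    """Label a mutation by whether it moves the element toward or away from consensus."""
    delta = consensus_matches(after, consensus) - consensus_matches(before, consensus)
    if delta > 0:
        return "up-promoter"
    if delta < 0:
        return "down-promoter"
    return "neutral (by this crude count)"


print(classify_mutation("TATGTT", "TATATT"))  # up-promoter
print(classify_mutation("TATAAT", "CATAAT"))  # down-promoter
```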
The initiation sequence of events:
- RNA polymerase (holoenzyme) binds DNA and scans until it reaches a promoter.
- Initial binding near the promoter creates a closed complex — DNA remains double-stranded.
- The polymerase then unwinds approximately 12 base pairs of DNA, creating a transcription bubble and forming the open complex, which is highly stable and ready for synthesis.
Once the open complex forms, RNA polymerase has two nucleotide-binding sites. The initiation site preferentially binds ATP or GTP (purines) with a Km of ~100 μM. This is why RNA transcripts almost always begin with a purine at the 5’ end. The elongation site can bind any of the four ribonucleotides with a Km of ~10 μM (higher affinity).
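The Km values above translate into how full each site is at a given NTP concentration. A minimal Python sketch, treating Km as an apparent dissociation constant so that occupancy is [S]/(Km + [S]); that simplification and the NTP concentrations tried are assumptions:

```python
def fractional_saturation(ntp_uM: float, km_uM: float) -> float:
    """Fraction of a binding site occupied, treating Km as a dissociation constant."""
    return ntp_uM / (km_uM + ntp_uM)


KM_INITIATION_UM = 100.0  # initiation site (ATP/GTP), value from the text
KM_ELONGATION_UM = 10.0   # elongation site (any NTP), value from the text

for ntp in (10.0, 100.0, 1000.0):  # example NTP concentrations in micromolar
    init = fractional_saturation(ntp, KM_INITIATION_UM)
    elong = fractional_saturation(ntp, KM_ELONGATION_UM)
    print(f"[NTP] = {ntp:6.0f} uM  initiation site {init:.2f}  elongation site {elong:.2f}")
```

At 100 μM NTP, for instance, the initiation site is about half occupied while the elongation site is about 91% occupied.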
After the first ~10 nucleotides are synthesized, the σ subunit dissociates from the core enzyme. This transition stabilizes the elongation complex and prevents the polymerase from slipping backward. The σ factor is then recycled to initiate transcription at another promoter.
7.2 Elongation
During elongation, the core RNA polymerase moves along the DNA template in the 3’ → 5’ direction, synthesizing RNA in the 5’ → 3’ direction.
Inside the transcription bubble, the growing RNA strand forms a transient RNA-DNA hybrid of approximately 8 base pairs in length. As the polymerase advances, the DNA ahead is unwound and the DNA behind is rewound. The RNA is displaced from the hybrid as it grows.
When the σ subunit leaves, NusA protein binds the core enzyme; NusA modulates elongation speed (it promotes pausing) and takes part in termination.
The movement of the polymerase along the DNA creates topological stress: positive supercoils form ahead of (downstream of) the transcription bubble and negative supercoils form behind (upstream of) it. In bacteria, topoisomerase I (type I) relaxes the negative supercoils upstream, while DNA gyrase (a type II topoisomerase) removes the positive supercoils downstream.
The polymerase slides along the DNA in two patterns: continuous sliding (one nucleotide at a time) or discontinuous (caterpillar) movement, where it moves in bursts.
7.3 Termination
Termination is the process by which the RNA polymerase stops transcribing and releases both the DNA template and the newly synthesized RNA. In bacteria, there are two mechanisms.
Rho-independent (intrinsic) termination does not require any additional protein factors. The key features are:
The DNA encodes a GC-rich, palindromic (inverted repeat) sequence followed by a run of 4–8 adenines on the template strand (transcribed as 4–8 uracils in the RNA).
As the polymerase transcribes through this region, it slows down because GC-rich DNA is harder to unwind. This pause allows the GC-rich sequence in the RNA to fold back on itself, forming a stem-loop (hairpin) structure. This hairpin destabilizes the RNA-DNA hybrid inside the polymerase.
The subsequent poly-U tail that follows is particularly weak — A-U base pairs are not very stable — further weakening the RNA-DNA interaction and causing the transcript to be released.
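The two sequence features of an intrinsic terminator, a GC-rich hairpin followed by a U run, can be searched for directly. This Python sketch is deliberately crude: it assumes a perfect Watson-Crick stem of fixed length (no G·U wobble, no mismatches), and the stem length, loop range, GC threshold, and demo RNA are all illustrative choices. Real terminator prediction relies on folding free energies.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (Watson-Crick pairs only, no G-U wobble)."""
    return "".join(RNA_PAIR[b] for b in reversed(seq))


def gc_fraction(seq: str) -> float:
    return sum(b in "GC" for b in seq) / len(seq)


def find_intrinsic_terminators(rna: str, stem_len: int = 6, loop_lengths=range(3, 9),
                               min_u: int = 4, min_gc: float = 0.6) -> list:
    """Return (stem_start, loop_len, u_run_start) for every perfect GC-rich hairpin
    that is immediately followed by at least min_u uracils (0-based indices)."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - stem_len + 1):
        left = rna[i:i + stem_len]
        if gc_fraction(left) < min_gc:
            continue
        for loop in loop_lengths:
            j = i + stem_len + loop            # start of the 3' arm of the stem
            right = rna[j:j + stem_len]
            if len(right) == stem_len and right == revcomp_rna(left):
                k = j + stem_len               # first nucleotide after the hairpin
                if rna[k:k + min_u] == "U" * min_u:
                    hits.append((i, loop, k))
    return hits


print(find_intrinsic_terminators("ACGCCCGCUUCGGCGGGCUUUUUUA"))  # [(2, 4, 18)]
```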
Rho-dependent termination involves the ρ (rho) factor, a hexameric protein that functions as an RNA-DNA helicase. The mechanism is less frequent but more complex:
The ρ factor binds to a C-rich, largely unstructured site on the nascent RNA, upstream of the termination point, called the rut site (rho utilization site). Using ATP hydrolysis for energy, rho translocates along the RNA toward the 3’ end, moving faster than RNA polymerase is synthesizing RNA.
Meanwhile, RNA polymerase has paused (the NusA protein is thought to contribute to this pausing). When rho catches up to the stalled polymerase, it unwinds the RNA-DNA hybrid and causes release of the transcript.
8. Transcription in Eukaryotes — Key Differences
Eukaryotic transcription is substantially more complex than prokaryotic transcription. Key differences include:
Transcription occurs in the nucleus, while translation occurs in the cytoplasm — the two processes are spatially separated (unlike in bacteria, where they are coupled).
Three different RNA polymerases handle different RNA types (as described in section 5).
Eukaryotic genes have two major functional parts: the structural gene (transcribed into RNA) and a regulatory portion (promoters and enhancers) that controls when and how much is transcribed.
Eukaryotic promoters recognized by RNA Pol II typically contain:
A TATA box (consensus: TATAAA) located at approximately -25. This is functionally analogous to the bacterial Pribnow box.
A CAAT box (consensus: GGXCAATCT) located at approximately -75, which increases transcription efficiency.
Enhancers — regulatory sequences that can be located thousands of base pairs away from the gene (upstream or downstream) and stimulate transcription when bound by specific transcription factors.
A critical distinction: eukaryotic RNA polymerase II cannot bind the promoter directly. It requires the assembly of general transcription factors (GTFs) to recruit it and position it correctly.
8.1 The Pre-Initiation Complex (PIC)
The assembly of the pre-initiation complex at an RNA Pol II promoter follows a defined order:
- TBP (TATA-Binding Protein) recognizes and binds the TATA box, forming the foundation of the complex. TBP is saddle-shaped and contacts the DNA through the minor groove; phenylalanine side chains, whose flat aromatic rings resemble nitrogenous bases in size and shape, insert between base pairs and bend the DNA sharply.
- TBP is part of the larger TFIID complex, which contains TBP plus multiple TBP-associated factors (TAFs). TFIID interacts with both positive and negative regulatory proteins.
- TFIIA stabilizes the TBP-DNA interaction.
- TFIIB binds to TBP and recruits the RNA Pol II–TFIIF complex.
- TFIIF binds tightly to RNA Pol II and prevents it from binding nonspecific DNA sequences. It brings Pol II to the promoter.
- TFIIE recruits TFIIH and contributes ATPase and helicase activities.
- TFIIH is the final and critical GTF. It unwinds DNA at the promoter (helicase activity) and phosphorylates the CTD of RNA Pol II (kinase activity). This phosphorylation is what triggers the transition from initiation to elongation.
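The assembly order above behaves like a dependency list: each factor needs certain components already on the promoter. The Python sketch below encodes a simplified version of that order and checks whether a proposed order is consistent with it. The prerequisite table is a teaching assumption read off the list, not a kinetic or structural model.

```python
# Simplified prerequisites read off the assembly order in the text (hypothetical table).
PREREQUISITES = {
    "TFIID": set(),                  # TBP (within TFIID) binds the TATA box first
    "TFIIA": {"TFIID"},              # stabilizes TBP-DNA
    "TFIIB": {"TFIID"},              # binds TBP
    "Pol II-TFIIF": {"TFIIB"},       # recruited by TFIIB
    "TFIIE": {"Pol II-TFIIF"},
    "TFIIH": {"TFIIE"},              # recruited by TFIIE; last GTF to join
}


def check_assembly(order: list) -> bool:
    """True if each factor arrives after its prerequisites and the PIC ends up complete."""
    bound = set()
    for factor in order:
        if factor in bound or not PREREQUISITES[factor] <= bound:
            return False
        bound.add(factor)
    return bound == set(PREREQUISITES)


print(check_assembly(["TFIID", "TFIIA", "TFIIB", "Pol II-TFIIF", "TFIIE", "TFIIH"]))  # True
```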
The assembled complex (TBP + TFIIA/B/D/E/F/H + Pol II at the promoter) is the pre-initiation complex (PIC). Importantly, only non-phosphorylated RNA Pol II enters the PIC.
It is worth noting that GTFs alone produce only a basal, low level of transcription. High-level, regulated transcription requires additional specific transcription factors (coactivators/activators).
8.2 Specific Transcription Factors
Beyond the GTFs, gene-specific transcription factors bind to short regulatory sequences (response elements, typically 6–20 bp long) located upstream of the transcription start site. These factors are generally dimers and interact with the PIC (and with each other) to either increase (activators) or decrease (repressors) transcription frequency. They are gene- or gene-group-specific, unlike the general factors.
Many transcription factors respond to external signals — hormones, nutrients, and second messengers — making transcription a key point at which cells respond to their environment.
DNA-binding domains of transcription factors interact with DNA primarily through the major groove of B-DNA. About 80% of known regulatory proteins use one of three structural motifs:
The helix-turn-helix (HTH) motif — first identified in prokaryotic proteins. These proteins bind as dimers to symmetrical DNA sites. Two α-helices are separated by a β-turn loop. The C-terminal helix fits into the major groove and makes base-specific contacts; the N-terminal helix stabilizes the interaction through hydrophobic interactions.
The zinc finger (Zn-finger) motif: a loop of amino acids is stabilized by a central Zn²⁺ ion that is coordinated by cysteine and/or histidine residues. The α-helical portion inserts into the major groove.
The leucine zipper (bZIP) motif — two α-helices dimerize through hydrophobic leucine interactions, with the DNA-binding helices inserting into the major groove.
In addition to their DNA-binding domains, these transcription factors also contain separate domains for protein–protein interactions, allowing them to communicate with the GTFs and the PIC.
8.3 RNA Polymerase II Phosphorylation and Elongation
The C-terminal domain (CTD) of the largest subunit of RNA Pol II contains a repeated heptapeptide, Tyr-Ser-Pro-Thr-Ser-Pro-Ser, present in about 26 copies in yeast and 52 in mammals. This CTD exists in two forms:
Pol IIa — hypophosphorylated form. This is the form that enters the PIC.
Pol IIo — hyperphosphorylated form. This is the active elongating form.
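The heptad repeat makes it easy to see how many serines are available for phosphorylation. A small Python sketch that builds an idealized CTD from perfect consensus repeats (real CTDs, especially in mammals, include many non-consensus repeats) and locates the serines, conventionally called Ser2, Ser5, and Ser7; the repeat count used is just an example:

```python
HEPTAD = "YSPTSPS"  # Tyr-Ser-Pro-Thr-Ser-Pro-Ser in one-letter code


def ctd_sequence(n_repeats: int) -> str:
    """Idealized CTD made only of perfect consensus repeats."""
    return HEPTAD * n_repeats


def serine_positions_in_heptad(heptad: str = HEPTAD) -> list:
    """1-based positions of Ser within one repeat."""
    return [i + 1 for i, aa in enumerate(heptad) if aa == "S"]


print(serine_positions_in_heptad())   # [2, 5, 7]
print(ctd_sequence(26).count("S"))    # 78 serines in an idealized 26-repeat CTD
```

Each consensus repeat contributes three serines, so the number of potential serine phosphorylation sites scales directly with repeat number.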
The transition is driven by TFIIH’s kinase activity, which phosphorylates CTD serine residues using ATP. Once phosphorylated, the CTD serves as a docking platform for factors that act during elongation; elongation factors such as ELL, P-TEFb, SII (TFIIS), and elongin increase the elongation rate by suppressing pausing.
After the synthesis of the first 60–70 nucleotides, TFIIE and then TFIIH are released.
At termination, elongation factors dissociate, the CTD is dephosphorylated by specific phosphatases associated with termination factors, and the polymerase is released. The dephosphorylated Pol IIa can then re-enter the PIC for a new round of transcription.
9. RNA Polymerase as a Therapeutic Target
Several clinically relevant compounds work by targeting RNA polymerase:
Rifampicin (a semisynthetic derivative of rifamycin) binds specifically to the β subunit of bacterial RNA polymerase and blocks transcription at the initiation stage. It does not inhibit eukaryotic RNA polymerases, making it selectively toxic to bacteria. It is a cornerstone of tuberculosis treatment and is also used to prevent meningococcal meningitis in close contacts of patients.
α-Amanitin is a cyclic polypeptide from the deadly fungus Amanita phalloides (death cap mushroom). It is a potent RNA polymerase II inhibitor that blocks the translocation of Pol II during elongation. Ingestion is often fatal because it shuts down mRNA synthesis in hepatocytes, causing liver failure.
Actinomycin D intercalates between G-C base pairs of double-stranded DNA, preventing the DNA from serving as an effective template and blocking RNA chain elongation. Because it acts on the DNA template rather than on a particular polymerase, it inhibits transcription in both bacteria and eukaryotes. It has anti-tumour activity.
3’-Deoxyadenosine (cordycepin) is a nucleoside analogue that lacks the 3’-OH group. Once converted to its triphosphate and incorporated into RNA, it cannot form a phosphodiester bond with the next nucleotide, terminating the chain. It has antifungal activity.
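The chain-termination mechanism of cordycepin can be illustrated with a toy Python model in which each added nucleotide must provide a 3’-OH for the next. The token "3dA" is a hypothetical label for incorporated 3’-deoxyadenosine; the model ignores kinetics and competition with ATP.

```python
def elongate(incoming: list) -> str:
    """Add nucleotides one by one to the 3' end; stop after a 3'-deoxy analogue.

    '3dA' is a hypothetical label for incorporated 3'-deoxyadenosine (cordycepin):
    it is added normally but leaves no 3'-OH, so no further nucleotide can be joined.
    """
    chain = []
    for nt in incoming:
        chain.append(nt)
        if nt == "3dA":  # no 3'-OH available: the chain is terminated here
            break
    return "-".join(chain)


print(elongate(["G", "A", "U", "3dA", "C", "G"]))  # G-A-U-3dA
```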
10. Eukaryotic mRNA Maturation (Post-Transcriptional Processing)
In eukaryotes, RNA is not used directly as it comes off the polymerase. The initial transcript is called pre-mRNA and it must undergo three major processing steps before becoming the mature, translatable mRNA. These modifications serve to protect the RNA, regulate its export from the nucleus, and increase translation efficiency. The three stages are capping, polyadenylation, and splicing.
10.1 Capping (5’ End Processing)
Capping occurs very early — as soon as approximately 25–30 nucleotides have been synthesized, while the RNA is still being transcribed.
The 5’ end of the nascent RNA is sealed by the addition of a 7-methylguanosine (m⁷G) cap through an unusual 5’ to 5’ triphosphate bond (not the typical 3’ to 5’ phosphodiester bond found in the RNA chain itself). The process involves three enzymatic steps:
- RNA triphosphatase removes one phosphate from the 5’-triphosphate end of the RNA, converting it to a diphosphate.
- RNA guanylyltransferase transfers a GMP from GTP to this diphosphate end, creating the unique 5’–5’ triphosphate linkage.
- Guanine-N7-methyltransferase methylates the guanine at position N7, using S-adenosylmethionine (SAM) as the methyl donor. SAM is converted to S-adenosylhomocysteine (SAH) in the process.
- Additionally, 2’-O-methyltransferase can add methyl groups to the 2’-OH positions of the first and second nucleotides adjacent to the cap (creating cap 1 and cap 2 structures).
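The capping steps above can be traced as changes in the notation of the 5’ end. This Python sketch assumes an RNA written as "pppN..." (5’-triphosphate end, 5’→3’) and uses a trailing "m" on a nucleotide to mark 2’-O-methylation; the notation helper is illustrative, not a standard library function.

```python
def capping_steps(rna: str, cap_level: int = 0) -> list:
    """Return the 5'-end notation after each capping step for an RNA given as 'pppN...'.

    cap_level 0 stops at cap 0 (m7GpppN); 1 or 2 add 2'-O-methylation to the
    first one or two transcribed nucleotides (cap 1 / cap 2).
    """
    if not rna.startswith("ppp"):
        raise ValueError("expected a 5'-triphosphate end such as 'pppAUG'")
    body = rna[3:]
    steps = [
        "pp" + body,       # 1. RNA triphosphatase: 5'-ppp -> 5'-pp
        "Gppp" + body,     # 2. guanylyltransferase adds GMP from GTP: 5'-5' triphosphate link
        "m7Gppp" + body,   # 3. guanine-N7-methyltransferase (SAM -> SAH): cap 0
    ]
    nts = list(body)
    for i in range(min(cap_level, 2, len(nts))):   # 4. optional 2'-O-methylation
        nts[i] += "m"
        steps.append("m7Gppp" + "".join(nts))
    return steps


for step in capping_steps("pppAUGCC", cap_level=2):
    print(step)
```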
Once formed, the cap binds to the Cap Binding Complex (CBC), which anchors it to the phosphorylated CTD of RNA Pol II — elegantly linking capping to the act of transcription itself.
The functions of the 5’ cap are:
Protection of the mRNA from degradation by 5’→3’ exonucleases.
A recognition signal for ribosomes during translation initiation — the ribosome scans from the 5’ end and requires the cap to initiate.
Facilitating export of the mRNA from the nucleus.
10.2 Polyadenylation (3’ End Processing)
At the 3’ end, the pre-mRNA undergoes cleavage and addition of a poly(A) tail.
The process is triggered by a specific sequence in the pre-mRNA: AAUAAA (the polyadenylation signal), which is located upstream of the actual cleavage site.
Step 1 — Cleavage: A complex of proteins called the Cleavage and Polyadenylation Specificity Factor (CPSF) and Cleavage Stimulation Factor (CstF) recognize the AAUAAA signal and a GU-rich downstream element respectively. Together with additional cleavage factors, they cut the pre-mRNA at a point 10–30 nucleotides downstream of AAUAAA.
Step 2 — Poly(A) addition: Poly(A) polymerase (PAP) adds adenine nucleotides one at a time to the newly formed 3’-OH end. The reaction is:
RNA + nATP → RNA–(AMP)ₙ + nPPᵢ
Initially about 10 adenosines are added slowly. Then Poly(A) Binding Proteins (PABPs) bind to the growing tail, stimulating PAP to processively add more adenosines. The final tail length is 50–250 adenine residues in mammals.
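The cleavage rule from Step 1 can be sketched as a search for AAUAAA followed by a window of candidate cleavage positions. In this Python sketch, distance is counted from the nucleotide immediately 3’ of the hexamer; that counting convention and the demo sequence are assumptions, and real site choice also depends on the GU-rich downstream element.

```python
PAS = "AAUAAA"  # polyadenylation signal


def candidate_cleavage_regions(pre_mrna: str, min_dist: int = 10, max_dist: int = 30) -> list:
    """For each AAUAAA, return (signal_start, first_index, last_index): the 0-based,
    inclusive stretch lying min_dist..max_dist nucleotides downstream of the signal.

    Distance 1 is the nucleotide immediately 3' of the hexamer (assumed convention).
    """
    seq = pre_mrna.upper()
    regions = []
    start = seq.find(PAS)
    while start != -1:
        after = start + len(PAS)                      # index at distance 1
        first = after + min_dist - 1                  # index at distance min_dist
        last = min(after + max_dist - 1, len(seq) - 1)
        if first <= last:
            regions.append((start, first, last))
        start = seq.find(PAS, start + 1)              # look for further signals
    return regions


demo = "GGGGG" + "AAUAAA" + "C" * 40   # hypothetical 3' region
print(candidate_cleavage_regions(demo))  # [(5, 20, 40)]
```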
The functions of the poly(A) tail:
Stability — protects the mRNA 3’ end from degradation by 3’→5’ exonucleases, extending its half-life in the cytoplasm.
Translation efficiency — promotes efficient translation initiation by helping recruit ribosomal components.
Nuclear export — facilitates transport of the mature mRNA from nucleus to cytoplasm (during which the tail is slightly shortened).
mRNA half-life correlates with poly(A) tail length. For example, the cFOS mRNA (encoding a proto-oncogene transcription factor) has a half-life of only 10–30 minutes and a short poly(A) tail, enabling rapid response and rapid shutdown. Globin mRNA (hemoglobin) has a half-life of ~24 hours and a long poly(A) tail, appropriate for a protein needed in large, sustained quantities.
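Assuming simple first-order (exponential) decay, which real mRNA turnover only approximates, these half-lives imply very different amounts left after transcription stops. A Python sketch; the 20-minute cFOS value is taken from the middle of the quoted 10–30 minute range and the 2-hour time point is arbitrary:

```python
def fraction_remaining(t_min: float, half_life_min: float) -> float:
    """Fraction of an mRNA population left after t minutes of first-order decay."""
    return 0.5 ** (t_min / half_life_min)


T_MIN = 120  # two hours after transcription stops (arbitrary example)
print(f"cFOS   (t1/2 = 20 min): {fraction_remaining(T_MIN, 20):.3f}")
print(f"globin (t1/2 = 24 h):   {fraction_remaining(T_MIN, 24 * 60):.3f}")
```

After two hours the cFOS message is down to about 1.6% (six half-lives), while roughly 94% of the globin message remains.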
In eukaryotes, mRNA degradation usually begins with shortening of the poly(A) tail (deadenylation); the transcript is then either decapped and degraded 5’ → 3’, or degraded 3’ → 5’ from its unprotected end. Bacterial mRNAs have no protective 5’ cap and are degraded rapidly by ribonucleases, including polynucleotide phosphorylase, which removes nucleotides in the 3’ → 5’ direction.
10.3 Splicing — Removal of Introns
This is perhaps the most complex post-transcriptional processing step and the one most unique to eukaryotes.
Introns are non-coding sequences that interrupt the coding regions of genes; they are transcribed into the pre-mRNA and then removed. Human genes have on average 8 introns per gene, and introns can constitute up to 90% of the primary transcript. Introns are rare in prokaryotes and in simple eukaryotes such as yeasts.
Exons are the coding regions — the sequences that will appear in the mature mRNA and ultimately be translated.
The significance of introns is enormous. They increase the energy cost of replication and transcription but provide enormous genetic flexibility through alternative splicing — different combinations of exons can be joined to generate different mRNA molecules and therefore different protein isoforms from a single gene. This is why humans can produce approximately 100,000 different proteins from only ~30,000 genes. Introns can also give rise to miRtrons — intron-derived microRNAs with gene regulatory functions.
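The combinatorial power of alternative splicing can be sketched in Python by enumerating which exons are kept. This toy model assumes exons are either constitutive (always kept) or independent cassette exons (kept or skipped), preserves exon order, and ignores real constraints such as mutually exclusive exons or reading-frame effects; the exon names are hypothetical.

```python
from itertools import product


def isoforms(exons: list) -> list:
    """Enumerate mature mRNAs from an ordered list of (name, kind) exons,
    where kind is 'constitutive' (always kept) or 'cassette' (kept or skipped)."""
    choices = []
    for name, kind in exons:
        if kind == "constitutive":
            choices.append([(name,)])          # always included
        elif kind == "cassette":
            choices.append([(name,), ()])      # included or skipped
        else:
            raise ValueError(f"unknown exon kind: {kind}")
    return ["-".join(n for part in combo for n in part) for combo in product(*choices)]


gene = [("E1", "constitutive"), ("E2", "cassette"), ("E3", "cassette"), ("E4", "constitutive")]
for mrna in isoforms(gene):   # two cassette exons give 2**2 = 4 isoforms
    print(mrna)
```

With k independent cassette exons this model yields 2^k isoforms, which shows how a modest number of genes can encode a much larger number of proteins.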
There are four categories of introns:
Group I introns — self-splicing, requiring no ATP and no protein factors. They use a transesterification reaction in which the 3’-OH of an external guanine nucleoside (guanosine, GMP, GDP, or GTP) acts as the nucleophile, attacking the 5’ splice site phosphodiester bond. The 3’-OH of the freed exon 1 then attacks the 3’ splice site, joining the two exons and releasing the linearized intron, which circularizes.
Group II introns — also self-splicing (no ATP, no protein). The mechanism uses an internal adenosine residue within the intron itself. Its 2’-OH acts as the nucleophile, attacking the 5’ splice site. This creates a characteristic lariat (loop) structure at the branch point, in which the adenosine has three phosphodiester bonds (2’, 3’, and 5’). The 3’-OH of the freed exon then attacks the 3’ splice site, joining exons and releasing the intron lariat. This mechanism is evolutionarily significant — it is essentially identical to the mechanism used by the spliceosome for nuclear pre-mRNA splicing.
Spliceosomal introns — the most common type in eukaryotic nuclear pre-mRNA. These require the spliceosome, a large ribonucleoprotein complex, and ATP hydrolysis. They use the same lariat mechanism as Group II introns, which suggests that the spliceosome machinery may have evolved from Group II introns.
tRNA introns — removed by a dedicated endonuclease, requiring ATP.
The Spliceosome in Detail
The spliceosome is composed of five small nuclear ribonucleoproteins: U1, U2, U4, U5, and U6 snRNPs (pronounced “snurps”). Each snRNP consists of a U-rich snRNA (100–200 nucleotides) and approximately 8 associated proteins.
The spliceosome recognizes specific consensus sequences at the intron boundaries:
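The 5’ splice site: the intron begins with GU (GT in the DNA).
The 3’ splice site: the intron ends with AG (AG in the DNA).
An internal branch point sequence, located upstream of the 3’ splice site, contains the key adenosine that forms the lariat. Between the branch point and the 3’ splice site lies a pyrimidine-rich stretch (the polypyrimidine tract), as in the example CCCCUUUUUUCAGACG, where a run of C and U leads into the intron-terminal AG.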
The splicing mechanism in spliceosomal introns mirrors Group II self-splicing:
- The 2’-OH of the branch point adenosine attacks the 5’ splice site phosphodiester bond (nucleophilic attack). This creates the lariat intermediate and frees the 3’-OH of exon 1.
- The freed 3’-OH of exon 1 attacks the 3’ splice site, joining the two exons via a new phosphodiester bond and releasing the intron lariat.
- The intron lariat is then degraded.
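The net result of the two transesterification steps is that the intron is excised and the flanking exons are joined. A minimal Python sketch, assuming the intron coordinates are already known (0-based, end-exclusive) and checking only the GU...AG boundary rule; real splice-site recognition also depends on the branch point and other sequence signals, and the example pre-mRNA is made up.

```python
def splice(pre_mrna: str, introns: list) -> str:
    """Remove introns given as 0-based, end-exclusive (start, end) coordinates
    and join the flanking exons in order."""
    seq = pre_mrna.upper()
    pieces, prev_end = [], 0
    for start, end in sorted(introns):
        intron = seq[start:end]
        # Spliceosomal introns start with GU and end with AG
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries: {intron}")
        pieces.append(seq[prev_end:start])   # exon upstream of this intron
        prev_end = end
    pieces.append(seq[prev_end:])            # final exon
    return "".join(pieces)


pre = "AUGGCC" + "GUAAGUCUAACUUUUCAG" + "GAUUAA"   # exon 1 + intron + exon 2
print(splice(pre, [(6, 24)]))                     # AUGGCCGAUUAA
```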
The energy cost of spliceosomal splicing is high. While initial commitment complex formation is ATP-independent, all subsequent steps — snRNP recruitment, active spliceosome assembly, and spliceosome disassembly (by helicases) after splicing — require ATP hydrolysis.
Alternative Splicing
Splicing can be constitutive (all exons joined in a fixed order, standard splicing) or regulated (alternative splicing).
In regulated alternative splicing, specific exons are included or skipped depending on cellular signals, developmental stage, or tissue type. A classic example: the troponin T gene can generate two different mRNAs — one in which exon 3 is retained and one in which exon 4 is included — producing different troponin T isoforms in different muscle types.
Introns also have functional significance in disease. A single G→A point mutation in an intron of the β-globin gene creates a cryptic 3’ splice site that is recognized by the spliceosome. The resulting abnormal splicing retains part of the intron at the start of the downstream exon, which disrupts β-globin production; this is one molecular basis of β-thalassemia.
11. Gene Expression Regulation
11.1 The Operon Model
Genes encoding enzymes of the same metabolic pathway are often clustered together on the chromosome in an operon, a unit of coordinated transcription. Operons are characteristic of prokaryotes and are rare in eukaryotes (they do occur, for example, in nematodes).
An operon has a structural gene cluster (the actual coding sequences), controlled by an adjacent operator sequence (a regulatory DNA sequence). Regulatory proteins bind the operator to control transcription of the whole unit. Mutations in a single regulatory gene can therefore affect all proteins encoded by the structural genes simultaneously.
Induction: increased transcription of genes in response to a small molecule (inducer). For example, in the lac operon of E. coli, lactose is converted to allolactose, the actual inducer, which binds the lac repressor and releases it from the operator, so transcription of the lactose metabolism genes proceeds.
Repression: decreased transcription in response to a metabolite (a corepressor). The corepressor activates a repressor protein, which then binds the operator and physically blocks RNA polymerase from transcribing the structural genes; for example, tryptophan acts as the corepressor of the trp operon.
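The induction and repression logic reduces to whether a repressor occupies the operator. A deliberately simplified Python sketch: it ignores other controls on these operons (such as catabolite activation of lac and attenuation of trp), and the trp case is included for contrast.

```python
def lac_operon_on(allolactose_present: bool) -> bool:
    """Inducible operon: the lac repressor blocks the operator unless allolactose binds it."""
    repressor_bound_to_operator = not allolactose_present
    return not repressor_bound_to_operator


def trp_operon_on(tryptophan_abundant: bool) -> bool:
    """Repressible operon: tryptophan (the corepressor) activates the trp repressor,
    which can then bind the operator."""
    repressor_bound_to_operator = tryptophan_abundant
    return not repressor_bound_to_operator


for present in (False, True):
    print(f"small molecule present={present}: lac on={lac_operon_on(present)}, "
          f"trp on={trp_operon_on(present)}")
```

The two functions are mirror images: the same "is the small molecule present?" input switches an inducible operon on and a repressible operon off.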
11.2 Chromatin Remodelling and Epigenetic Regulation (Eukaryotes)
In eukaryotes, transcription is also controlled at the level of chromatin structure — the physical accessibility of DNA.
Euchromatin is loosely packed chromatin where DNA is accessible to transcription machinery — it is transcriptionally active.
Heterochromatin is densely packed chromatin where DNA is inaccessible — it is transcriptionally inactive.
Chromatin consists of DNA wrapped around histone octamers (two copies each of H2A, H2B, H3, and H4) to form nucleosomes. The lysine-rich N-terminal “tails” of histones protrude from the nucleosome and are subject to multiple covalent modifications.
Histone modifications include:
Acetylation — added by Histone Acetyltransferases (HATs). Acetylation of lysine residues on histone tails neutralizes their positive charge, weakening the interaction between histones and the negatively charged DNA backbone. This loosens chromatin (making it more like euchromatin), exposes docking sites for general transcription factors (TAFs), and increases transcription. Conversely, Histone Deacetylases (HDACs) remove acetyl groups, restoring the positive charge, tightening chromatin, and causing gene silencing. HDAC inhibitors (such as trapoxin, trichostatin A, and depudecin) are being investigated as anti-tumour agents because they can induce apoptosis in cancer cells.
Methylation of histones — context-dependent; can activate or repress transcription depending on which residue is methylated and the degree of methylation.
Ubiquitination — attachment of ubiquitin to histones; involved in transcription regulation and DNA repair.
SUMOylation — attachment of SUMO (Small Ubiquitin-like Modifier) proteins, generally associated with transcriptional repression.
Phosphorylation — primarily on H3, associated with chromosome condensation during mitosis and also with transcription activation.
11.3 DNA Methylation
DNA methylation is an essential epigenetic mechanism involving the addition of a methyl group directly to DNA bases, most commonly:
C5-methylcytosine — the predominant form in mammals (important clinically).
N4-methylcytosine and N6-methyladenine — more common in bacteria.
In mammals, methylation occurs specifically on cytosines that are followed by guanines; these CpG dinucleotides tend to cluster in regions called CpG islands, which are often found in gene promoters.
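CpG islands are often identified with simple sequence statistics. One widely used definition (Gardiner-Garden and Frommer) requires a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (number of CpG × length) / (number of C × number of G). A Python sketch of that check; the thresholds come from that convention rather than from this text, and genome-scale tools scan with sliding windows rather than testing one sequence:

```python
def cpg_stats(seq: str) -> tuple:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    s = seq.upper()
    if not s:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # 'CG' cannot overlap itself, so str.count is exact here
    gc = (c + g) / len(s)
    obs_exp = (cpg * len(s)) / (c * g) if c and g else 0.0
    return gc, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200, min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    gc, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and ratio > min_obs_exp


print(looks_like_cpg_island("CCGG" * 50))  # True
print(looks_like_cpg_island("AT" * 100))   # False
```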
The enzyme responsible is DNA methyltransferase (DNMT), which transfers a methyl group from S-adenosylmethionine (SAM) to cytosine, producing 5-methylcytosine (MeC) and releasing S-adenosylhomocysteine (SAH).
Methylation of CpG islands in the promoter region leads to gene silencing. This occurs through two mechanisms: the methyl groups directly inhibit binding of transcription factors, and methylated cytosines recruit methyl-CpG-binding proteins, which in turn recruit HDACs, causing chromatin compaction.
Together, histone modifications (particularly acetylation/deacetylation) and DNA methylation constitute the epigenetic code — heritable changes in gene expression that do not involve changes in DNA sequence. Gene-activating marks include: active (open) chromatin + unmethylated cytosines + acetylated histones. Gene-silencing marks include: condensed chromatin + methylated cytosines + deacetylated histones.
12. Summary of Key Comparisons
To consolidate, here is a side-by-side comparison of prokaryotic vs. eukaryotic transcription features most likely to appear in your exam:
Location: Cytosol (prokaryotes) vs. Nucleus (eukaryotes).
RNA polymerases: 1 type (prokaryotes) vs. 3 types — Pol I, II, III — each for different RNA classes (eukaryotes).
Promoter elements: -10 (Pribnow box, TATAAT) and -35 (TTGACA) (prokaryotes) vs. TATA box at -25 and CAAT box at -75 with distal enhancers (eukaryotes).
Promoter recognition: σ subunit binds directly (prokaryotes) vs. TBP/TFIID and GTFs required; Pol II does not bind promoter directly (eukaryotes).
Primer required: No (both).
Proofreading: No exonuclease proofreading domain; only limited correction by backtracking and transcript cleavage (both).
mRNA processing: None — mRNA is used directly (prokaryotes) vs. 5’ capping, 3’ polyadenylation, splicing all required (eukaryotes).
Coupling of transcription/translation: Simultaneous (prokaryotes) vs. Separated spatially and temporally (eukaryotes).
Termination: Rho-dependent or Rho-independent (prokaryotes) vs. CTD dephosphorylation, cleavage/polyadenylation signal-dependent (eukaryotes).