07 DNA and RNA

1. Introduction & The Central Dogma
2. Functions of Nucleotides
3. Structure of Nucleotides
- 3.1 Nitrogenous Bases
- 3.2 The Pentose Sugar
- 3.3 Nucleosides and Nucleotides
- 3.4 Nomenclature
4. Phosphodiester Bonds and Polynucleotides
5. Primary Structure of Nucleic Acids
6. DNA Secondary Structure: The Double Helix
- 6.1 Historical Background
- 6.2 Chargaff’s Rules
- 6.3 The Watson & Crick Model
- 6.4 Base Complementarity & Geometry
7. Structural Variations of DNA
8. Unusual DNA/RNA Structures
9. DNA Denaturation and Renaturation
10. Nonenzymatic Transformations of Nucleotides
11. DNA Methylation
12. RNA: Structure and Types
13. Nucleases: Degradation of Nucleic Acids
14. Other Functions of Nucleotides
15. Genome, Genes, and Chromosomes
16. Chromatin and Nucleosomes
17. DNA Supercoiling and Topoisomerases
18. SMC Proteins: Cohesins and Condensins
19. Mitochondrial DNA (mtDNA)
20. Polymerase Chain Reaction (PCR)
21. Conclusion

1. Introduction & The Central Dogma

The study of nucleic acids is foundational to understanding life at the molecular level. This module covers the structure and function of DNA and RNA, from their basic chemical building blocks (nucleotides) up to the complex three-dimensional organization of chromosomes.

The Central Dogma of Molecular Biology

The central dogma, first articulated by Francis Crick, describes the flow of genetic information within a biological system:

DNA  →  RNA  →  Protein
    Transcription  Translation

DNA replication: DNA is copied to produce identical DNA molecules.
Transcription: The information in a DNA sequence is copied into messenger RNA (mRNA).
Translation: The mRNA sequence is decoded by ribosomes to synthesize a protein.

An important addition to the dogma:

Reverse transcription: In some viruses (retroviruses, e.g., HIV), RNA is reverse-transcribed back into DNA using the enzyme reverse transcriptase. This is an exception to the classical unidirectional flow.

Context: In a living cell, all three processes often occur simultaneously. In prokaryotes, transcription and translation can occur at the same time (coupled), while in eukaryotes they are spatially separated (transcription in the nucleus, translation in the cytoplasm).
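
The information flow described above can be sketched as simple string operations. This is a minimal illustration, not a full implementation: the function names are hypothetical, and `CODON_TABLE` is a deliberately partial subset of the standard genetic code chosen only to cover the example sequence.

```python
# Minimal sketch of transcription and translation as string operations.
# CODON_TABLE is a partial, illustrative subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding strand (both written 5'->3')."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end of the sequence."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")  # "???" marks codons outside the subset
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

mrna = transcribe("ATGTTTGGCGCTTAA")
peptide = translate(mrna)
print(mrna)     # AUGUUUGGCGCUUAA
print(peptide)  # ['Met', 'Phe', 'Gly', 'Ala']
```

The sketch takes the coding strand as input, so transcription reduces to replacing T with U; a version starting from the template strand would need a reverse complement first.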

2. Functions of Nucleotides

Nucleotides are not only the building blocks of nucleic acids. They serve multiple essential roles in the cell:

Energy currency: ATP (adenosine triphosphate) is the primary energy carrier in metabolic reactions. The hydrolysis of its phosphoanhydride bonds releases free energy (~30.5 kJ/mol per anhydride bond).
Signal transduction: Cyclic nucleotides such as cAMP (cyclic adenosine monophosphate) and cGMP act as second messengers, relaying signals from extracellular hormones and neurotransmitters (acting via G protein-coupled receptors, GPCRs) into intracellular responses.
Enzyme cofactors and metabolic intermediates: Many coenzymes contain nucleotide components:
- NAD⁺/NADH (nicotinamide adenine dinucleotide)
- FAD/FADH₂ (flavin adenine dinucleotide)
- Coenzyme A (contains a 3’-phosphoadenosine diphosphate moiety)
Building blocks of nucleic acids: DNA and RNA are both linear polymers of nucleotides.

3. Structure of Nucleotides

Every nucleotide is composed of three characteristic components:

Component	Description
Nitrogenous base	A purine or pyrimidine ring system
Pentose sugar	A five-carbon sugar (ribose in RNA; deoxyribose in DNA)
Phosphate group	One or more phosphate groups linked to the 5’ carbon of the sugar

Terminology clarification: A nucleotide without the phosphate group is called a nucleoside (base + sugar only).

3.1 Nitrogenous Bases

Nitrogenous bases are heterocyclic aromatic compounds. They are:

Planar (aromatic ring system)
Hydrophobic (contributing to base-stacking interactions in double-stranded nucleic acids)
Basic (contain nitrogen atoms that can accept protons)

There are two structural families:

Purines (bicyclic: pyrimidine ring fused to an imidazole ring)

Adenine (A)
Guanine (G)

Pyrimidines (monocyclic: single six-membered ring)

Cytosine (C) — found in both DNA and RNA
Thymine (T) — found only in DNA
Uracil (U) — found only in RNA (replaces thymine)

Why does DNA use thymine instead of uracil? Cytosine spontaneously deaminates to uracil at a measurable rate. If DNA contained uracil, this deamination would be mutagenic and difficult to correct (since uracil would be indistinguishable from a “normal” base). Instead, DNA uses thymine (5-methyluracil), so any uracil found in DNA is immediately recognized as a deaminated cytosine and repaired by specialized enzymes.

Minor (modified) bases

In addition to the five major bases, nucleic acids—especially tRNA and rRNA—contain modified bases such as:

5-Methylcytidine and N⁶-Methyladenosine (important in epigenetics and RNA modification)
Inosine, Pseudouridine (Ψ), 7-Methylguanosine, 4-Thiouridine
5-Hydroxymethylcytidine (found in certain bacteriophage DNA)

3.2 The Pentose Sugar

The sugar in nucleic acids is always present in its β-D-furanose (closed five-membered ring) form.

RNA contains β-D-ribose (has a hydroxyl group –OH at the 2’ carbon)
DNA contains β-D-2’-deoxyribose (has only –H at the 2’ carbon; lacks the 2’-OH)

Functional importance of the 2’-OH: The 2’-OH in RNA makes it susceptible to alkaline hydrolysis (because it can attack the adjacent phosphodiester bond to form a 2’,3’-cyclic monophosphate intermediate). DNA, lacking this group, is more chemically stable — an advantage for a molecule storing genetic information long-term.

Sugar Puckering (Conformation)

The furanose ring is not planar; four out of five atoms are approximately coplanar, and one carbon is displaced above or below this plane. Two major conformations:

C-2’ endo: C-2’ is on the same side as C-5’ (common in B-form DNA)
C-3’ endo: C-3’ is on the same side as C-5’ (common in A-form DNA and RNA)

3.3 Nucleosides and Nucleotides

Nucleoside: Base + Sugar, connected by an N-β-glycosidic bond between the anomeric carbon (C-1’) of the sugar and nitrogen N-9 of purines or N-1 of pyrimidines.
- This bond does not form spontaneously due to thermodynamic constraints; it requires enzymatic catalysis in vivo.
Nucleotide: Nucleoside + Phosphate group, connected by a phosphoester bond to the 5’-OH of the sugar.
- A nucleotide with one phosphate = nucleoside monophosphate (e.g., AMP)
- Two phosphates = nucleoside diphosphate (e.g., ADP)
- Three phosphates = nucleoside triphosphate (e.g., ATP)

Positions of Phosphate Groups

The phosphate can be attached at the 2’, 3’, or 5’ carbon of the sugar:

Adenosine 5’-monophosphate (AMP): phosphate at C-5’
Adenosine 2’-monophosphate: phosphate at C-2’
Adenosine 3’-monophosphate: phosphate at C-3’
Adenosine 2’,3’-cyclic monophosphate: phosphate bridging both 2’ and 3’ oxygens

3.4 Nomenclature

Ribonucleotides (RNA building blocks)

Base	Nucleoside	Nucleotide	Symbol
Adenine	Adenosine	Adenylate (AMP)	A
Guanine	Guanosine	Guanylate (GMP)	G
Uracil	Uridine	Uridylate (UMP)	U
Cytosine	Cytidine	Cytidylate (CMP)	C

Deoxyribonucleotides (DNA building blocks)

Base	Nucleoside	Nucleotide	Symbol
Adenine	Deoxyadenosine	Deoxyadenylate (dAMP)	dA
Guanine	Deoxyguanosine	Deoxyguanylate (dGMP)	dG
Thymine	Deoxythymidine	Deoxythymidylate (dTMP)	dT
Cytosine	Deoxycytidine	Deoxycytidylate (dCMP)	dC

4. Phosphodiester Bonds and Polynucleotides

Successive nucleotides in a nucleic acid chain are joined by 3’,5’-phosphodiester bonds: a phosphate group bridges the 3’-OH of one nucleotide to the 5’ carbon of the next.

    5' end
    |
    Phosphate
    |
    Sugar (C3'–OH)
    |
    Phosphodiester bond
    |
    Phosphate
    |
    Sugar (C3'–OH)
    |
    ...
    |
    3' end (free –OH)

Key properties:

The backbone of each strand (alternating sugar–phosphate units) runs in a specific direction, giving the strand polarity: 5’ → 3’.
Chains are always written from 5’ end (free phosphate) to 3’ end (free hydroxyl).
The phosphate groups are negatively charged at physiological pH (pKa ~1), making DNA and RNA polyanions. This is why they bind positively charged proteins (like histones) so readily.

RNA hydrolysis in alkaline conditions: Because RNA has a 2’-OH, in basic solution the 2’-oxygen attacks the adjacent phosphorus, forming a 2’,3’-cyclic monophosphate intermediate, which then opens to give a mixture of 2’- and 3’-monophosphates. This is why RNA is labile in alkali while DNA is stable — DNA lacks the 2’-OH needed to initiate this reaction.

5. Primary Structure of Nucleic Acids

The primary structure of a nucleic acid is defined as:

The covalent backbone (sugar–phosphate chain)
The sequence of nitrogenous bases along this backbone

Higher levels of structure:

Secondary structure: Regular, stable structures formed by base-pairing (e.g., the DNA double helix, RNA hairpins)
Tertiary structure: Complex three-dimensional folding of large molecules (e.g., chromosomal looping, tRNA L-shaped structure)

6. DNA Secondary Structure: The Double Helix

6.1 Historical Background

Year	Scientist(s)	Contribution
1869	Friedrich Miescher	Isolated DNA (“nuclein”) from white blood cells
1944	Avery, MacLeod, McCarty	Demonstrated DNA is the genetic material (transformation experiment with S. pneumoniae)
Late 1940s	Erwin Chargaff	Established Chargaff’s rules of base composition
1952	Hershey & Chase	Confirmed DNA (not protein) carries genetic information using radiolabeled bacteriophages
1950–53	Rosalind Franklin & Maurice Wilkins	Produced X-ray diffraction patterns of DNA fibers, revealing its helical structure
1953	James Watson & Francis Crick	Proposed the double-helix model of DNA structure

Franklin’s X-ray data were crucial: the cross-shaped pattern of spots indicated a helical structure, and the heavy bands at the periphery were due to the regularly spaced, stacked bases. The pattern revealed two periodicities: 3.4 Å (distance between adjacent base pairs) and 34 Å (distance for one complete helical turn).

6.2 Chargaff’s Rules

Analysis of DNA base composition from multiple organisms led to four rules:

A = T and G = C (molar equivalence of complementary bases)
Therefore, Purines (A+G) = Pyrimidines (T+C)
The base composition varies between species (species-specific)
The base composition is constant within a species, regardless of tissue type, age, nutritional state, or environment

These rules implied that A pairs specifically with T and G pairs specifically with C — the basis for the antiparallel complementary strands of the double helix.
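
Chargaff's equalities follow directly from complementarity, which a short sketch can make concrete. The helper names and the example sequence below are illustrative assumptions; the point is that a single strand need not have A = T, but the duplex built from it and its complement always does.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the complementary strand, written 5'->3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]

def duplex_composition(strand):
    """Count bases over both strands of the duplex formed by `strand` and its complement."""
    return Counter(strand.upper() + reverse_complement(strand))

single = Counter("AAAGGCAT")              # one strand alone: A=4, T=1, G=2, C=1
counts = duplex_composition("AAAGGCAT")   # both strands together

print(single)
print(counts)
purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
print(purines == pyrimidines)  # True
```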

6.3 The Watson & Crick Model

The double helix model satisfies:

Thermodynamic requirements: Hydrophilic sugar-phosphate backbone faces outward (toward water); hydrophobic bases are inside, protected from water and stabilized by base-stacking interactions (van der Waals forces between aromatic rings) and hydrogen bonds between complementary bases.
Chargaff’s rules: A pairs with T (2 hydrogen bonds); G pairs with C (3 hydrogen bonds)
X-ray data: Correct dimensions and periodicity

Key structural features of B-form DNA (the physiologically relevant form):

Right-handed double helix
Two antiparallel strands: one runs 5’→3’, the other 3’→5’
Base pairs are nearly perpendicular to the helix axis (tilted ~6°)
3.4 Å rise per base pair; 10.5 base pairs per helical turn (36 Å or 3.6 nm per turn) in solution
Helix diameter: ~20 Å (2 nm)
Features a major groove (wider, where most DNA-binding proteins interact) and a minor groove (narrower)
The two grooves arise from the asymmetric positioning of the base pairs relative to the backbone
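
The dimensions listed above are related by simple arithmetic. The following sketch, with illustrative constant and function names, derives the helical pitch and the twist per base pair from the rise and the number of base pairs per turn, and converts a base-pair count into a contour length.

```python
RISE_PER_BP_ANGSTROM = 3.4  # rise per base pair, from the list above
BP_PER_TURN = 10.5          # base pairs per helical turn in solution, from the list above

def helix_length_angstrom(n_bp):
    """Contour length of n_bp base pairs of B-form DNA."""
    return n_bp * RISE_PER_BP_ANGSTROM

def helical_turns(n_bp):
    """Number of helical turns spanned by n_bp base pairs."""
    return n_bp / BP_PER_TURN

pitch = RISE_PER_BP_ANGSTROM * BP_PER_TURN  # about 35.7 Å, quoted as ~36 Å per turn
twist_per_bp = 360 / BP_PER_TURN            # about 34.3 degrees of rotation per base pair

print(f"pitch: {pitch:.1f} Å, twist: {twist_per_bp:.1f}° per bp")
print(f"1000 bp: {helix_length_angstrom(1000) / 10:.0f} nm, {helical_turns(1000):.1f} turns")
```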

6.4 Base Complementarity & Geometry

Watson-Crick base pairing:

A–T: 2 hydrogen bonds
G–C: 3 hydrogen bonds (therefore, G–C pairs are stronger)

A crucial geometric feature: A=T and G≡C base pairs have the same overall geometry (same C-1’–C-1’ distance of ~10.85 Å). This means any sequence of base pairs can be accommodated in the helix without distorting the backbone — the double helix can encode unlimited amounts of information.

Anti vs. Syn conformation: The base can rotate about the N-glycosidic bond relative to the sugar. In the anti conformation (most common in B-DNA), the base projects away from the sugar. In the syn conformation, the base projects over the sugar. In Z-DNA the conformation alternates along the strand: purine residues are syn and pyrimidine residues are anti.

7. Structural Variations of DNA

Under different conditions, DNA can adopt three distinct helical forms:

Feature	A-form	B-form	Z-form
Helical sense	Right-handed	Right-handed	Left-handed
Diameter	~26 Å	~20 Å	~18 Å
Base pairs/turn	11	10.5	12
Rise/base pair	2.6 Å	3.4 Å	3.7 Å
Base tilt	20°	6°	7°
Sugar pucker	C-3’ endo	C-2’ endo	C-2’ endo (pyr); C-3’ endo (pur)
Glycosyl bond	Anti	Anti	Anti (pyr); Syn (pur)
Major groove	Narrow, deep	Wide, accessible	Barely apparent
Minor groove	Wide, shallow	Narrow	Narrow and deep

A-form DNA

Favored in dehydrated (low water) conditions and in RNA–DNA hybrids
Also the most common form observed in DNA crystals
Base pairs are tilted ~20° from perpendicular to the helix axis
Whether A-form occurs significantly in living cells is uncertain

B-form DNA

The physiologically dominant form under normal cellular conditions
The reference structure for all DNA studies
Bases are nearly perpendicular to the helix axis
Most DNA-binding proteins recognize and interact with B-form DNA

Z-form DNA

Left-handed double helix — the backbone follows a zigzag path (hence “Z”)
Favored by sequences with alternating purines and pyrimidines (e.g., 5’-CGCGCG-3’) and by 5-methylcytosine
The major groove is nearly absent; minor groove is narrow and deep
Short stretches of Z-DNA have been detected in both bacteria and eukaryotes
May play a role in regulation of gene expression and genetic recombination (precise role still under investigation)

8. Unusual DNA/RNA Structures

Beyond the standard double helix, nucleic acids can form a variety of unusual structures:

Palindromic Sequences and Inverted Repeats

A palindrome in molecular biology refers to a sequence that reads the same on both strands (in the 5’→3’ direction). Example:

5'–GAATTC–3'
3'–CTTAAG–5'

These are sites recognized by restriction enzymes.

Mirror repeats are sequences that read the same on a single strand in both directions.
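
The two definitions are easy to confuse, so a small sketch may help. A palindrome equals its own reverse complement, while a mirror repeat equals its own reverse on a single strand. The function names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def is_palindrome(seq):
    """True if the sequence reads the same 5'->3' on both strands."""
    return seq.upper() == reverse_complement(seq)

def is_mirror_repeat(seq):
    """True if the sequence reads the same in both directions on one strand."""
    s = seq.upper()
    return s == s[::-1]

print(is_palindrome("GAATTC"), is_mirror_repeat("GAATTC"))            # True False
print(is_palindrome("TTAGCACGATT"), is_mirror_repeat("TTAGCACGATT"))  # False True
```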

Hairpins and Cruciforms

A hairpin forms when a single-stranded DNA or RNA folds back on itself, with complementary sequences pairing to form a stem, and unpaired bases forming a loop.
In double-stranded DNA, palindromic sequences can form cruciform structures (cross-shaped), where each strand forms its own hairpin. These structures are thermodynamically unstable but can form transiently during replication or transcription.

Hoogsteen Base Pairs and Triple Helices

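In standard Watson-Crick pairing, the hydrogen bonds involve N-1 of the purine and its exocyclic groups (N6 of adenine; O6 and N2 of guanine).
In Hoogsteen pairing, the purine instead uses N-7 and its C-6 substituent (N6 of adenine or O6 of guanine). These atoms face the major groove and remain free in a Watson-Crick duplex, which allows a third strand to bind in the major groove. (In an isolated Hoogsteen pair within a duplex, the purine adopts the syn conformation.)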
This enables the formation of triple-stranded DNA (triplex DNA): one purine-rich strand base-pairs with the pyrimidine strand via Watson-Crick bonds AND with the third strand via Hoogsteen bonds.
Triplex DNA has roles in gene regulation and may be exploited for therapeutic purposes.

G-Quadruplexes

Guanosine-rich sequences (especially in telomeres) can form quadruplex structures: four guanines associate via Hoogsteen hydrogen bonds to form a G-quartet plane; multiple G-quartets stack to form a G-quadruplex.
These are stabilized by central metal cations (K⁺ or Na⁺).
Adjacent strands can be parallel or antiparallel relative to each other.
G-quadruplexes are thought to play roles in telomere maintenance, transcriptional regulation, and genome stability.
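
Candidate G-quadruplex-forming sequences are often screened with a simple pattern: four runs of at least three guanines separated by short loops. The pattern below is a commonly used heuristic and an assumption of this sketch, not something stated in these notes; a match only marks a candidate, not a confirmed structure.

```python
import re

# Heuristic motif (assumption): four G-runs of >= 3 guanines, loops of 1-7 nucleotides.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

def find_g4_candidates(seq):
    """Return (start index, matched sequence) for each non-overlapping motif match."""
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(seq.upper())]

telomere = "TTAGGG" * 4  # four copies of the human telomeric repeat
hits = find_g4_candidates(telomere)
print(hits)  # [(3, 'GGGTTAGGGTTAGGGTTAGGG')]
```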

9. DNA Denaturation and Renaturation

Denaturation

DNA denaturation (also called “melting”) is the reversible disruption of:

Hydrogen bonds between complementary base pairs
Base-stacking interactions

…causing the double helix to unwind into two separate single strands. No covalent bonds are broken in this process.

Promoting factors:

High temperature
Extreme pH (very high or very low)
Denaturing chemicals (e.g., urea, formamide)

Hyperchromic Effect and UV Absorption

In the intact double helix, base-stacking interactions decrease UV absorption at 260 nm — this is the hypochromic effect.
Upon denaturation, base stacking is lost, and UV absorption increases (~40% increase) — this is the hyperchromic effect.
Denaturation can be monitored conveniently by measuring A₂₆₀ as a function of temperature.

Melting Temperature (Tₘ)

The melting temperature (Tₘ) is defined as the temperature at which 50% of the DNA is in single-stranded form.

Each DNA species has a characteristic Tₘ.
Tₘ increases with increasing G+C content because:
- G–C pairs have 3 hydrogen bonds (vs. 2 for A–T), requiring more energy to break
- G–C pairs also contribute more to base-stacking interactions
Tₘ also depends on salt concentration (higher [Na⁺] stabilizes DNA by shielding the negatively charged backbone)

Practical application: Knowing the Tₘ is critical for designing PCR experiments — the annealing temperature of primers is calculated from their G+C content.
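
As a rough illustration of how G+C content feeds into a Tₘ estimate, the sketch below uses the Wallace rule, Tₘ ≈ 2 °C × (A+T) + 4 °C × (G+C). This rule is an assumption brought in for the example; it is a crude approximation intended for short oligonucleotides and ignores salt and primer concentration, so dedicated primer-design tools use more detailed models.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)

def wallace_tm(seq):
    """Approximate Tm in degrees Celsius using the Wallace rule (short oligos only)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

# Three hypothetical 18-nt primers with increasing G+C content.
for primer in ("ATATATATATATATATAT", "ATGCATGCATGCATGCAT", "GCGCGCGCGCGCGCGCGC"):
    print(primer, f"GC={gc_fraction(primer):.2f}", f"Tm~{wallace_tm(primer)} C")
```

For equal length, the estimate rises from 36 °C to 52 °C to 72 °C as G+C content goes from 0% to 44% to 100%, which is the trend described above.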

Renaturation (Annealing)

When denatured DNA is slowly cooled below its Tₘ, complementary strands can re-anneal (renature) through base-pair formation. This specificity is the basis for:

PCR (Polymerase Chain Reaction)
Southern and Northern blotting
Nucleic acid hybridization techniques

10. Nonenzymatic Transformations of Nucleotides

DNA damage is unavoidable in living cells. While repair mechanisms exist, some damage escapes correction, leading to mutations. The main types of spontaneous and chemical DNA damage are:

1. Deamination

The spontaneous loss of an exocyclic amino group from a base:

Cytosine → Uracil (~100 events/cell/day in mammals)
- If not repaired, U pairs with A instead of G, causing a C→T transition mutation
5-Methylcytosine → Thymine (particularly problematic because the product thymine is a normal DNA base and harder to recognize as a mutation)
Adenine → Hypoxanthine (pairs with C instead of T)
Guanine → Xanthine

Evolutionary implication: The high rate of 5-methylcytosine deamination to thymine explains why CpG dinucleotides are underrepresented in mammalian genomes; over evolutionary time, methylated CpGs have been converted to TpG dinucleotides.
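
CpG depletion is usually quantified as an observed/expected ratio: the number of CpG dinucleotides found, divided by the number expected from the C and G content alone. The formula and the toy sequence below are assumptions added for illustration, and the model treats every CpG as methylated, which real genomes do not do.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * length) / (C count * G count)."""
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return s.count("CG") * len(s) / (n_c * n_g)

def deaminate_methylated_cpgs(seq):
    """Top-strand outcome if every CpG were methylated and then deaminated: CpG -> TpG."""
    return seq.upper().replace("CG", "TG")

ancestral = "ACGTCGGACGCGTA"  # hypothetical CpG-rich sequence
decayed = deaminate_methylated_cpgs(ancestral)

print(ancestral, round(cpg_obs_exp(ancestral), 2))
print(decayed, round(cpg_obs_exp(decayed), 2))
```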

2. Depurination

Spontaneous hydrolysis of the N-β-glycosidic bond between a purine base and the deoxyribose:

Leaves an abasic (AP = apurinic/apyrimidinic) site in the DNA
~10,000 purines lost per mammalian cell per day
AP sites block replication and transcription unless repaired

3. Thymine Dimers (UV Damage)

UV light induces covalent bonds between adjacent pyrimidines on the same strand:

Cyclobutane thymine dimers: A four-membered ring forms between C-5 and C-6 of adjacent thymines (most common)
6-4 Photoproducts: A bond forms between C-6 of one thymine and C-4 of the next

Both lesions:

Create kinks or bends in the DNA double helix
Block DNA replication and transcription
Are repaired by nucleotide excision repair (NER)

UV and ionizing radiation account for ~10% of all DNA damage from environmental agents.

4. Chemical Mutagens

Agent	Effect
Nitrous acid (HNO₂)	Promotes deamination of C→U, A→hypoxanthine
Bisulfite	Also promotes deamination; used as a food preservative
Dimethylsulfate (alkylating agent)	Methylates G at O6 position → O6-methylguanine cannot pair with C
Nitrosamines	Precursors of nitrous acid; present in processed meats

5. Oxidative Damage

Reactive Oxygen Species (ROS) — including superoxide anion (O₂•⁻), hydrogen peroxide (H₂O₂), and hydroxyl radical (OH•) — are generated by:

Normal aerobic metabolism (especially mitochondria)
Ionizing radiation
Environmental toxins

ROS cause:

Oxidation of bases (e.g., 8-oxoguanine, which mispairs with adenine)
Sugar oxidation leading to strand breaks

Cellular defenses:

Catalase: converts H₂O₂ → H₂O + O₂
Superoxide dismutase (SOD): converts O₂•⁻ → H₂O₂
Glutathione system: eliminates peroxides

Despite these defenses, a fraction of ROS escapes and causes cumulative DNA damage — contributing to aging and cancer.

11. DNA Methylation

Certain bases in DNA are enzymatically methylated after replication. In eukaryotes:

Cytosine and Adenine are most commonly methylated
~5% of all cytosine residues are methylated to 5-methylcytosine in eukaryotic cells
Methylation occurs mainly at CpG dinucleotides. CpG-rich stretches (CpG islands) are common in gene promoter regions, where their methylation state is linked to gene silencing

Key facts:

All known DNA methyltransferases use S-adenosylmethionine (SAM) as the methyl group donor
Different cell types show different methylation patterns (tissue-specific methylation profiles = methylome)
Methylation of promoter regions generally represses transcription (gene silencing)
Methylation patterns can be heritable through cell division without altering the DNA sequence — this is epigenetic regulation

Clinical relevance: Aberrant DNA methylation is a hallmark of cancer. Hypermethylation of tumor suppressor gene promoters silences them; global hypomethylation leads to genomic instability. Analysis of DNA methylation (methylome analysis) is a powerful tool in cancer diagnostics and epigenetics research.

12. RNA: Structure and Types

Types of RNA

Type	Full Name	Primary Function
mRNA	Messenger RNA	Template for protein synthesis (carries the code from DNA to ribosome)
tRNA	Transfer RNA	Adapter molecules that bring amino acids to the ribosome; decode mRNA codons
rRNA	Ribosomal RNA	Structural and catalytic component of ribosomes
Ribozymes	Catalytic RNA	RNA molecules with enzymatic activity (e.g., self-splicing introns, RNase P)

RNA vs. DNA: Key Structural Differences

Feature	DNA	RNA
Sugar	2’-Deoxyribose	Ribose
Bases	A, T, G, C	A, U, G, C
Strands	Usually double-stranded	Usually single-stranded
Stability	More stable (no 2’-OH)	Less stable (susceptible to alkaline hydrolysis)
Helix form	B-form (usually)	A-form (in double-stranded regions)

Secondary Structure of RNA

Although RNA is single-stranded, it extensively folds back on itself to form complex secondary structures through intramolecular base-pairing. This folding is driven by the tendency to maximize hydrogen bonding and hydrophobic base-stacking interactions.

Common RNA secondary structure elements:

Hairpin loops: A stem (base-paired region) and a loop (unpaired region)
Internal loops: Bulges within a double-stranded region where bases are unpaired on one or both sides
Pseudoknots and other complex tertiary structures

Double-stranded RNA regions adopt an A-form right-handed helix (never B-form; Z-form RNA has been produced in the laboratory under extreme conditions but is not physiologically relevant).

Unconventional Base Pairs in RNA

RNA can form base pairs beyond the standard Watson-Crick pairs:

G–U wobble pairs: Particularly important in tRNA anticodon–codon interactions
Hoogsteen pairs and reverse Hoogsteen pairs
Interactions involving three bases simultaneously (base triples)
Interactions involving modified bases (e.g., 7-methylguanosine, inosine)

These non-canonical interactions greatly expand RNA’s ability to fold into complex three-dimensional structures and perform catalytic functions.

tRNA Structure

tRNA is a particularly well-characterized RNA with an elaborate secondary structure:

Approximately 73–93 nucleotides in length
Cloverleaf secondary structure with four arms, plus a variable arm that differs between tRNAs:
- Acceptor arm (AA stem): Formed by the paired 5’ and 3’ ends of the tRNA; the amino acid is attached to the 3’ end (sequence …CCA-3’)
- D arm (D loop): Contains the unusual base dihydrouridine (D); contributes to the overall folding of the tRNA
- Anticodon arm: Contains the anticodon triplet (recognizes the mRNA codon); the wobble position is at the 5’ end of the anticodon
- TΨC arm: Contains the sequence ribothymidine (T), pseudouridine (Ψ), cytidine (C); involved in ribosome interaction
- Variable arm: Located between the TΨC and anticodon arms; variable in size (absent in some tRNAs)

The three-dimensional structure of tRNA is an L-shaped fold (tertiary structure), where the acceptor end and anticodon are at opposite ends of the L.

Pseudouridine (Ψ): This is an unusual isomer of uridine in which uracil is attached to ribose through C-5 (instead of the normal N-1). This modified nucleoside is important for tRNA stability and function.

13. Nucleases: Degradation of Nucleic Acids

Nucleases are enzymes that catalyze the hydrolysis of phosphodiester bonds in nucleic acids.

Classification

By substrate:

Deoxyribonucleases (DNases): degrade DNA
Ribonucleases (RNases): degrade RNA
Pancreatic RNase A specifically cleaves RNA at phosphodiester bonds 3’ to pyrimidine residues, producing 2’,3’-cyclic monophosphate intermediates

By position of cleavage:

Endonucleases: cleave internal phosphodiester bonds, producing fragments
Exonucleases: cleave from one end of the molecule (either 5’→3’ or 3’→5’ direction)

Restriction Endonucleases (Restriction Enzymes)

These are bacterial enzymes (part of the restriction-modification system) that:

Recognize specific palindromic sequences (typically 4–8 bp recognition sites)
Cut both strands of the double-stranded DNA at or near the recognition site

Two types of cuts:

Sticky ends (cohesive ends): Staggered cuts leave short single-stranded overhangs (e.g., EcoRI cuts 5’-G↓AATTC-3’, leaving 5’-AATT overhangs)
Blunt ends: Even cuts leave no overhangs (e.g., SmaI cuts 5’-CCC↓GGG-3’)

Molecular biology applications: Restriction enzymes are indispensable tools for molecular cloning. Sticky ends from compatible restriction enzymes can be ligated together by DNA ligase to create recombinant DNA molecules — the basis of genetic engineering.
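
The cutting behavior described above can be sketched as a search for recognition sites followed by splitting at a fixed offset within each site. This is a simplified model under stated assumptions: the function `digest` and its `cut_offset` parameter are hypothetical names, the DNA is linear, only the top strand is tracked, and the example sequence is invented.

```python
def digest(seq, site, cut_offset):
    """Cut a linear top strand at every occurrence of `site`.

    cut_offset is the number of bases of the site that stay on the upstream fragment.
    """
    s = seq.upper()
    fragments, start = [], 0
    pos = s.find(site)
    while pos != -1:
        fragments.append(s[start:pos + cut_offset])
        start = pos + cut_offset
        pos = s.find(site, pos + 1)
    fragments.append(s[start:])
    return fragments

ECORI = ("GAATTC", 1)  # 5'-G^AATTC-3', staggered cut (sticky ends)
SMAI = ("CCCGGG", 3)   # 5'-CCC^GGG-3', even cut (blunt ends)

dna = "TTGAATTCAAACCCGGGTTTGAATTCGG"
print(digest(dna, *ECORI))  # ['TTG', 'AATTCAAACCCGGGTTTG', 'AATTCGG']
print(digest(dna, *SMAI))   # ['TTGAATTCAAACCC', 'GGGTTTGAATTCGG']
```

Each EcoRI fragment downstream of a cut begins with AATT, which corresponds to the 5’ overhang that makes the ends cohesive; the bottom strand would be cut at the mirror-image position.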

14. Other Functions of Nucleotides

Energy Storage and Transfer

The nucleoside triphosphates, especially ATP, carry chemical energy:

Hydrolysis of a phosphoester bond (from nucleoside monophosphate): releases ~14 kJ/mol
Hydrolysis of a phosphoanhydride bond (the bonds between phosphate groups in ADP/ATP): releases ~30.5 kJ/mol per bond

The two bonds between the three phosphate groups in ATP are phosphoanhydride bonds (high-energy bonds). This energy is used to drive thermodynamically unfavorable reactions.

Coenzymes Containing Nucleotide Components

Many metabolic coenzymes contain an adenosine unit:

NAD⁺/NADH (contains adenosine + nicotinamide; involved in redox reactions in catabolism)
FAD/FADH₂ (contains riboflavin + adenosine; used in the citric acid cycle and electron transport chain)
Coenzyme A (CoA-SH) (contains adenosine + pantothenic acid + mercaptoethylamine; carries acyl groups)

Second Messengers

cAMP (cyclic adenosine 3’,5’-monophosphate): Synthesized from ATP by adenylyl cyclase (activated by GPCRs). Acts as a second messenger in countless hormonal signaling pathways. Activates protein kinase A (PKA).
cGMP (cyclic guanosine 3’,5’-monophosphate): Plays a similar second-messenger role; it is synthesized from GTP by guanylyl cyclases, which are activated by nitric oxide and by natriuretic peptides.

Neurotransmitters and Platelet Signaling

Nucleotides also act extracellularly:

ATP binds to P2X receptors (ligand-gated ion channels) in postsynaptic membranes → involved in taste sensation, inflammation, and smooth muscle contraction
ADP binds to P2Y receptors on platelets → promotes platelet aggregation and blood clotting
- Clinically relevant: Clopidogrel (Plavix) is an antiplatelet drug that irreversibly blocks P2Y₁₂ ADP receptors

Alarmone: ppGpp

Guanosine 5’-diphosphate, 3’-diphosphate (ppGpp), also called guanosine tetraphosphate, is an alarmone produced in bacteria under conditions of nutritional stress (e.g., amino acid starvation). It triggers the stringent response, globally reprogramming bacterial gene expression to shut down growth-related processes.

15. Genome, Genes, and Chromosomes

The Human Genome

The haploid human genome contains approximately 3 billion (3 × 10⁹) base pairs of DNA
Distributed across 23 chromosomes (3,054,815,472 bp with X; 2,963,015,935 bp with Y)
Individual chromosome size: ~50 million to ~250 million bp
Humans are diploid (46 chromosomes in somatic cells): 22 pairs of autosomes + 1 pair of sex chromosomes (XX or XY)
Total length of DNA in a human cell: ~2 meters
Total DNA in the human body (~10¹⁴ cells): ~2 × 10¹¹ km (compare: Earth-Sun distance = 1.5 × 10⁸ km)
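
The length figures above can be checked with a few lines of arithmetic. The sketch assumes the B-form rise of 3.4 Å (0.34 nm) per base pair and the round numbers quoted in the list; the constant names are illustrative.

```python
HAPLOID_BP = 3e9      # base pairs in the haploid genome (from the list above)
RISE_NM = 0.34        # B-form rise per base pair, 3.4 Å
CELLS = 1e14          # approximate number of cells in the body (from the list above)
EARTH_SUN_KM = 1.5e8  # Earth-Sun distance (from the list above)

dna_per_cell_m = 2 * HAPLOID_BP * RISE_NM * 1e-9  # diploid cell, nm converted to m
total_km = dna_per_cell_m * CELLS / 1000          # m converted to km
ratio = total_km / EARTH_SUN_KM

print(f"DNA per diploid cell: {dna_per_cell_m:.2f} m")
print(f"DNA in the whole body: {total_km:.2e} km")
print(f"Equivalent Earth-Sun distances: {ratio:.0f}")
```

The result is about 2 m per cell and about 2 × 10¹¹ km in total, which is more than a thousand times the Earth-Sun distance.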

Visualization: Riccardo Sabatini printed Craig Venter’s genome in 175 volumes totaling 262,000 pages.

The Term “Genome”

The genome refers to the complete nucleotide sequence of an organism, including both coding and non-coding sequences.

Prokaryotic vs. Eukaryotic Genomes

Feature	Prokaryotic Genome	Eukaryotic Genome
Size	Small	Large
Organization	Compact, circular	Linear chromosomes
Nucleus	Absent (cytoplasmic)	Present (membrane-bound)
Extra elements	Plasmids	Telomeres, centromeres
Gene structure	Mostly uninterrupted	Many interrupted (introns/exons)
Repetitive sequences	Few	Many
Transcription & Translation	Coupled (simultaneous)	Separated in space and time

Composition of the Human Genome

Component	Approximate %
Protein-coding genes (exons)	~1.5%
Introns	~26%
LINEs (Long Interspersed Nuclear Elements)	~20%
SINEs (Short Interspersed Nuclear Elements)	~13%
LTR retrotransposons	~8%
Miscellaneous heterochromatin	~8%
Segmental duplications	~5%
Simple sequence repeats	~3%
DNA transposons	~3%
Miscellaneous unique sequences	~12%

Non-coding regions include:

Introns (non-coding segments within genes)
Regulatory elements (promoters, enhancers, silencers)
Non-coding RNAs (rRNA, tRNA, miRNA, lncRNA, etc.)

Gene Definition

A gene is defined as a portion of DNA that encodes the primary sequence of a final gene product, which may be:

A polypeptide (protein-coding gene)
An RNA with structural or catalytic function (rRNA, tRNA, ribozyme genes)

Introns and Exons (Eukaryotic Gene Structure)

Many eukaryotic genes (but few prokaryotic genes) are interrupted by non-coding sequences called introns (intervening sequences):

Exons: The coding segments, which are retained in the mature mRNA
Introns: Non-coding sequences that are removed from the primary transcript (pre-mRNA) during RNA splicing

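Example: The hemoglobin β-subunit gene has three exons and two introns. The larger intron (~850 bp) alone contains more than half the base pairs of the gene, whereas the coding sequence for the 146-residue protein totals only ~440 bp.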

RNA Splicing is the process of precisely removing introns and ligating exons to generate a contiguous mRNA. This is carried out by a large ribonucleoprotein complex called the spliceosome.

Chromosomal Features

A eukaryotic chromosome contains:

Unique sequences (genes) and dispersed repetitive sequences
Multiple replication origins (unlike prokaryotes with a single origin)
Centromere: Site of kinetochore assembly; where spindle fibers attach during cell division
Telomeres: Repetitive sequences (e.g., TTAGGG in humans) at chromosome ends; protect against degradation and maintain chromosomal integrity

Karyotyping

Karyotyping is the visualization and analysis of an organism’s complete set of chromosomes. Clinical applications include detection of:

Down syndrome (Trisomy 21): extra chromosome 21
Klinefelter syndrome (XXY): extra X chromosome in males
Turner syndrome (XO): only one X chromosome in females

16. Chromatin and Nucleosomes

Chromatin

In the eukaryotic nucleus, DNA is not naked; it is tightly associated with proteins to form chromatin:

~90% of chromatin proteins are histones
Also contains significant amounts of non-histone proteins and RNA
During interphase: chromatin is partially decondensed to allow transcription and replication
During mitosis (after replication): chromatin condenses into visible chromosomes

Two functional states:

Euchromatin: Lightly packed, transcriptionally active regions
Heterochromatin: Tightly packed, transcriptionally inactive regions (includes constitutive heterochromatin at centromeres and telomeres, and facultative heterochromatin in silenced gene regions)

Electron microscopy reveals chromatin as “beads on a string” — the beads are nucleosomes.

Histones

Histones are small, highly conserved basic proteins rich in the positively charged amino acids lysine (Lys) and arginine (Arg), which interact with the negatively charged phosphate backbone of DNA.

Five histone classes:

Histone	MW (Da)	Lys (% of residues)	Arg (% of residues)	Role
H1	21,130	29.5	1.3	Linker histone; binds connecting DNA
H2A	13,960	10.9	9.3	Core histone
H2B	13,774	16.0	6.4	Core histone
H3	15,273	9.6	13.3	Core histone
H4	11,236	10.8	13.7	Core histone

Nucleosome Structure

The nucleosome is the fundamental repeating unit of chromatin:

Histone octamer core: 2 copies each of H2A, H2B, H3, and H4
DNA wrapping: ~146 bp of DNA wrapped 1.65 times around the histone octamer
Linker DNA: ~54 bp of DNA connecting adjacent nucleosomes (bound by histone H1)
Repeat unit: ~200 bp total per nucleosome
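
The repeat-unit arithmetic above can be turned into a quick estimate of how many nucleosomes, and how many histone molecules, a stretch of DNA needs. This is a rough sketch: it uses the ~146 bp and ~54 bp figures from the list, and it assumes one H1 per nucleosome, which is a simplification not stated above.

```python
CORE_BP = 146                      # DNA wrapped around the histone octamer
LINKER_BP = 54                     # DNA between adjacent nucleosomes
REPEAT_BP = CORE_BP + LINKER_BP    # ~200 bp per repeat unit

def nucleosome_count(dna_bp: int) -> int:
    """Approximate number of nucleosomes needed to package dna_bp of DNA."""
    return dna_bp // REPEAT_BP

def histone_copies(dna_bp: int) -> dict[str, int]:
    """Histone molecules required: 2 of each core histone per nucleosome.
    One H1 per nucleosome is an assumption made for this estimate."""
    n = nucleosome_count(dna_bp)
    return {"H2A": 2 * n, "H2B": 2 * n, "H3": 2 * n, "H4": 2 * n, "H1": n}

haploid_genome_bp = 3_000_000_000  # ~3 billion bp
print(nucleosome_count(haploid_genome_bp))      # 15000000
print(histone_copies(haploid_genome_bp)["H3"])  # 30000000
```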

Geometry of DNA binding to the nucleosome: The histone core does not bind DNA at random positions; the local sequence of the bound DNA matters:

Runs of two or more consecutive A-T base pairs favor DNA curvature, because they make it easier to compress the minor groove where it faces the histone core
Runs of two or more consecutive G-C base pairs resist this curvature
A-T-rich runs recurring at ~10 bp intervals (one helical turn), and therefore in phase with the helix, help the DNA wrap tightly around the histone octamer
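
The ~10 bp phasing of A-T runs can be checked on a sequence with a few lines of code. This is a deliberately naive sketch on an invented sequence; real nucleosome-positioning analyses use statistical models, and the function names here are hypothetical.

```python
import re

def at_run_starts(seq: str, min_len: int = 2) -> list[int]:
    """Start positions (0-based) of runs of at least min_len consecutive A/T bases."""
    pattern = rf"[AT]{{{min_len},}}"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

def spacings(positions: list[int]) -> list[int]:
    """Distances between consecutive run start positions."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Hypothetical sequence: short A-T runs separated by G-C-rich stretches
seq = "AAT" + "GCGCGCG" + "TTA" + "CGCGCGC" + "ATT" + "GCGCGCG"

starts = at_run_starts(seq)
print(starts)            # [0, 10, 20]
print(spacings(starts))  # [10, 10]
```

Spacings close to 10 bp mean the A-T runs fall on the same face of the helix at each turn, which is the in-phase arrangement described above.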

Histone Modifications (Epigenetic Marks)

The N-terminal “tails” of core histones (which protrude outside the nucleosome core) can be reversibly modified:

Modification	Residue	Effect
Acetylation	Lys	Neutralizes positive charge → loosens DNA–histone interaction → activates transcription
Methylation	Lys, Arg	Can activate or repress transcription depending on position and degree
Phosphorylation	Ser, Thr	Involved in chromosome condensation (mitosis) and DNA repair signaling
Ubiquitination	Lys	Various effects on transcription and DNA repair

These modifications constitute the “histone code”, which is read by regulatory proteins to modulate:

Chromatin structure
Gene transcription
DNA repair
Cell cycle progression

Higher-Order Chromatin Folding

Successive levels of organization compact DNA progressively:

Naked DNA: 2 nm double helix
Nucleosome array (“beads on a string”): 11 nm fiber
30-nm chromatin fiber (solenoid): ~6 nucleosomes per helical turn; requires H1
300-nm loops: Chromatin loops anchored to a protein scaffold
250-nm fiber (further coiling of loops)
Metaphase chromosome: 700–1400 nm; maximally compacted

This hierarchical compaction reduces the effective length of DNA by ~10,000-fold in a metaphase chromosome.
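
The scale of this compaction can be estimated by comparing the contour length of the DNA with the length of the packaged chromosome. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA; the chromosome size and packaged length are hypothetical round numbers chosen for illustration.

```python
RISE_NM_PER_BP = 0.34  # assumed axial rise per base pair in B-form DNA

def contour_length_nm(bp: int) -> float:
    """Length of fully extended B-form DNA, in nanometres."""
    return bp * RISE_NM_PER_BP

def packing_ratio(bp: int, packaged_length_nm: float) -> float:
    """Extended DNA length divided by the length of the packaged structure."""
    return contour_length_nm(bp) / packaged_length_nm

chromosome_bp = 150_000_000   # hypothetical 150 Mbp chromosome
metaphase_length_nm = 5_000   # hypothetical 5 micrometre metaphase chromosome

extended_cm = contour_length_nm(chromosome_bp) / 1e7   # 1 cm = 1e7 nm
ratio = packing_ratio(chromosome_bp, metaphase_length_nm)
print(f"{extended_cm:.1f} cm of DNA, packed about {ratio:,.0f}-fold")
```

With these inputs, about 5 cm of DNA fits into a structure a few micrometres long, which is on the order of the ~10,000-fold figure quoted above.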

Role of lncRNA in Chromatin Structure

Long non-coding RNAs (lncRNAs) contribute to chromosome organization:

(a) lncRNAs interact with DNA-binding proteins to tether distant DNA segments together
(b) lncRNAs interact with specific DNA sequences and recruit gene-regulatory proteins to those regions, activating or suppressing nearby transcription

The most famous example is XIST RNA, which coats one entire X chromosome in female mammals, triggering its inactivation (X-chromosome inactivation).

17. DNA Supercoiling and Topoisomerases

DNA Supercoiling

Supercoiling is the coiling of the double helix upon itself: the axis of the helix is itself coiled in space, forming a superhelix.

The most common source of supercoiling in cells is underwinding of the double helix in a closed circular DNA (fewer helical turns than expected for relaxed B-form DNA)
This creates negative supercoils: the linking number is lower than in the relaxed molecule, so the strain acts in the direction opposite to the right-handed winding of the helix
Negative supercoiling is favorable for processes requiring strand separation (replication, transcription, recombination) because it partially pre-unwinds the helix

Supercoiling is also relevant in linear eukaryotic chromosomes because topological domains are maintained by protein attachment points.
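
Underwinding is usually quantified with the linking number (Lk) and the superhelical density σ = (Lk − Lk0)/Lk0, where Lk0 is the linking number of the relaxed molecule. The sketch below assumes ~10.5 bp per helical turn for relaxed B-form DNA; the plasmid size and its linking number are hypothetical values.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA

def relaxed_linking_number(bp: int) -> float:
    """Lk0: linking number of a relaxed closed-circular DNA of the given size."""
    return bp / BP_PER_TURN

def superhelical_density(lk: float, bp: int) -> float:
    """sigma = (Lk - Lk0) / Lk0; negative values indicate underwinding."""
    lk0 = relaxed_linking_number(bp)
    return (lk - lk0) / lk0

plasmid_bp = 2100   # hypothetical plasmid
lk_observed = 188   # hypothetical linking number of the underwound plasmid

lk0 = relaxed_linking_number(plasmid_bp)
sigma = superhelical_density(lk_observed, plasmid_bp)
print(lk0)              # 200.0
print(round(sigma, 3))  # -0.06
```

A negative σ means the molecule has fewer helical turns than its relaxed form, which is the underwound state described above.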

States of circular DNA:

State	Description
Relaxed	Normal B-form, no supercoiling
Strained (underwound)	Fewer turns than expected
Supercoiled	Strain accommodated by coiling of the helix axis
Strand separated	At very high levels of underwinding

Topoisomerases

Topoisomerases are enzymes that change the topology of DNA (i.e., alter the number of supercoils) by transiently breaking and rejoining phosphodiester bonds.

Type I Topoisomerases

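Mechanism: Transiently break ONE strand of the double helix
1. The active-site tyrosine attacks a phosphodiester bond, cleaving one strand and forming a covalent phosphotyrosyl protein-DNA linkage (to the 5’ phosphate in type IA enzymes, to the 3’ phosphate in type IB enzymes)
2. The unbroken strand passes through the break (type IA), or the broken strand rotates around the intact strand (type IB)
3. The break is religated: the free hydroxyl end (3’-OH in type IA, 5’-OH in type IB) attacks the phosphotyrosyl linkage
Effect: Changes the linking number in steps of 1
ATP not required
Type IA enzymes relax only negative supercoils; type IB enzymes relax both positive and negative supercoils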

Type II Topoisomerases

Mechanism: Transiently break BOTH strands of the double helix
- An intact segment of duplex DNA is passed through the double-strand break
- Both strands are then religated
Effect: Changes the linking number by 2 per catalytic cycle
Requires ATP hydrolysis
Can introduce negative supercoils (DNA gyrase in bacteria), relax supercoils, or decatenate (separate interlinked circular DNA molecules after replication)
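
The difference in step size between the two enzyme classes can be made concrete with a small calculation. This is a bookkeeping sketch only: it counts idealized catalytic cycles, treats every cycle as productive, and the function name and ΔLk values are hypothetical.

```python
def cycles_to_relax(delta_lk: int, step: int) -> tuple[int, int]:
    """Return (cycles, residual) for removing a linking-number difference.

    delta_lk: Lk - Lk0 of the supercoiled molecule (negative if underwound)
    step:     change in Lk per catalytic cycle (1 for type I, 2 for type II)
    residual: |delta Lk| that cannot be removed with this step size
    """
    if step not in (1, 2):
        raise ValueError("step must be 1 (type I) or 2 (type II)")
    cycles, residual = divmod(abs(delta_lk), step)
    return cycles, residual

type_i = cycles_to_relax(-12, step=1)    # (12, 0)
type_ii = cycles_to_relax(-12, step=2)   # (6, 0)
odd_case = cycles_to_relax(-7, step=2)   # (3, 1)
print(type_i, type_ii, odd_case)
```

Because a type II enzyme changes Lk in steps of 2, it cannot by itself bring an odd ΔLk exactly to zero, which is why the last case leaves a residual of 1.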

Summary of Topoisomerase Types

Family	Activity	Mechanism	Distribution
IA	Relaxes (−)	Strand passage	Bacteria, eukaryotes
IB	Swivelase	Strand rotation	Bacteria, eukaryotes
IIA (DNA gyrase)	Introduces (−) supercoils	Strand passage	Bacteria
IIA (Topo IIα, IIβ)	Relaxes (+ or −)	Strand passage	Eukaryotes
IIA (Topo IV)	Decatenase	Strand passage	Bacteria

Topoisomerases as Drug Targets

Topoisomerases are excellent therapeutic targets because they are essential for cell survival and proliferation:

Topoisomerase I Inhibitors (Camptothecins):

Topotecan: Used for ovarian and lung cancer
Irinotecan: Used for colorectal cancer

Topoisomerase II Inhibitors:

Etoposide (VP-16): Used for lung cancer
Doxorubicin (adriamycin): Used for breast cancer (and other cancers)

Mechanism of action: These drugs stabilize the enzyme-DNA cleavage complex (the “cleavable complex”), trapping broken DNA ends. When a replication fork or transcription machinery encounters a drug-stabilized cleavage complex, it triggers double-strand breaks, leading to cell death. Cancer cells, which divide more rapidly, are more susceptible.

Antibiotic targets: Bacterial DNA gyrase (Type IIA) and Topo IV are targets of fluoroquinolone antibiotics (e.g., ciprofloxacin), which are selectively toxic to bacteria because these enzymes differ significantly from their eukaryotic counterparts.

18. SMC Proteins: Cohesins and Condensins

SMC proteins (Structural Maintenance of Chromosomes) are a family of large ATPases essential for maintaining chromosome structure and integrity.

Structure of SMC Proteins

Each SMC protein has a characteristic architecture:

Globular N-terminal and C-terminal domains: Each contributes to an ATPase (ABC-type) active site
Two α-helical coiled-coil regions: Connect the terminal domains to a central hinge domain
SMC proteins function as dimers (forming a V-shaped structure with the hinge at the apex and the ATPase head domains at the tips)

Types of SMC Complexes

Cohesins

Function: Hold sister chromatids together after DNA replication until anaphase
Loaded onto chromosomes during S phase (replication)
Essential for proper chromosome segregation in mitosis and meiosis
Cohesin ring encircles the two sister chromatid DNA molecules

Condensins

Function: Drive chromosome condensation as cells enter mitosis
Essential for compacting the chromatin from its interphase state into short, thick mitotic chromosomes

Cell Cycle Dynamics

During the cell cycle:

S phase: DNA replication; cohesin is deposited, linking sister chromatids
G2 phase: Condensin begins chromosome condensation in preparation for mitosis
Prophase → Metaphase: Maximum condensation; chromosomes align at the metaphase plate
Anaphase: Cohesin is cleaved (by separase), allowing sister chromatids to separate to opposite poles

19. Mitochondrial DNA (mtDNA)

Structure and Properties

Mitochondria contain their own circular double-stranded DNA:

Circular, in contrast to the linear nuclear chromosomes
Multiple copies per cell (each mitochondrion carries several): ~100 copies in leukocytes to ~10,000 in neurons
Replication is independent of the cell cycle (continuous, not just in S phase)

Genetic Content

The human mitochondrial genome encodes:

2 ribosomal RNA (rRNA) genes
22 tRNA genes
13 protein-coding genes — all encoding subunits of the oxidative phosphorylation (OXPHOS) complexes (Complex I, III, IV, and ATP synthase)

The majority of mitochondrial proteins (~1500) are actually encoded by the nuclear genome, synthesized in the cytoplasm, and imported into mitochondria.

Maternal Inheritance

Mitochondria (and their DNA) are inherited exclusively through the mother, because sperm mitochondria are eliminated after fertilization.

Homoplasmy vs. Heteroplasmy

Homoplasmy: All mtDNA copies in a cell have the same sequence
Heteroplasmy: A cell contains two or more distinct populations of mtDNA (e.g., wild-type and mutant)

Clinical significance: Pathogenic mtDNA mutations often need to exceed a threshold level of heteroplasmy (typically ~60–90%) before causing disease. Below this threshold, sufficient wild-type mitochondria compensate.
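
The threshold effect can be expressed as a simple fraction. The sketch below is illustrative only: the copy numbers are invented, and the threshold is passed in explicitly because it depends on the mutation and tissue (the ~60–90% range above).

```python
def heteroplasmy(mutant_copies: int, wild_type_copies: int) -> float:
    """Fraction of mtDNA copies in a cell that carry the mutation."""
    total = mutant_copies + wild_type_copies
    if total == 0:
        raise ValueError("cell has no mtDNA copies")
    return mutant_copies / total

def exceeds_threshold(mutant_copies: int, wild_type_copies: int,
                      threshold: float) -> bool:
    """True if the mutant fraction is above the given disease threshold."""
    return heteroplasmy(mutant_copies, wild_type_copies) > threshold

# Hypothetical cell with 1,000 mtDNA copies, 700 of them mutant
level = heteroplasmy(700, 300)
print(level)                                        # 0.7
print(exceeds_threshold(700, 300, threshold=0.6))   # True
print(exceeds_threshold(700, 300, threshold=0.9))   # False
```

The same 70% mutant load would be expected to cause a phenotype for a mutation with a 60% threshold but not for one with a 90% threshold.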

mtDNA Mutations and Diseases

mtDNA accumulates mutations at a higher rate than nuclear DNA, due to:

High copy number (many replications per cell division)
Less efficient DNA repair compared to the nucleus
Proximity to reactive oxygen species (ROS) generated by the electron transport chain

Common mitochondrial diseases and associated mutations:

Mutation	Disease
m.3243A>G	MELAS (mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes); MIDD
m.8344A>G	MERRF (myoclonic epilepsy with ragged red fibres)
m.14459G>A	MELAS, MILS, cardiomyopathy
m.13513G>A	MELAS, MILS
m.8993T>G	NARP (neurogenic muscle weakness, ataxia, retinitis pigmentosa); MILS
m.8483_13459del	PMPS (Pearson marrow-pancreas syndrome); KSS (Kearns–Sayre syndrome)

mtDNA Release and Inflammation

Damaged or stressed mitochondria can release mtDNA into the cytosol or into the bloodstream:

Cytoplasmic mtDNA: Activates innate immune signaling pathways (e.g., NLRP3 inflammasome, cGAS-STING pathway), triggering inflammation
Circulating cell-free mtDNA (CCF-mtDNA): Released from cells passively (cell damage) or actively (via extracellular vesicles); can be detected in blood as a biomarker of mitochondrial stress, trauma, or systemic inflammation

20. Polymerase Chain Reaction (PCR)

Developed by Kary Mullis in 1983 (Nobel Prize in Chemistry, 1993), PCR is a technique to amplify a specific DNA sequence exponentially in vitro.

Requirements

Target DNA (template)
Two oligonucleotide primers flanking the region to be amplified (one complementary to each strand)
Thermostable DNA polymerase (e.g., Taq polymerase from Thermus aquaticus, which survives high temperatures)
dNTPs (all four deoxyribonucleoside triphosphates)
Buffer with Mg²⁺
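
The requirement for two primers, one complementary to each strand, can be sketched as follows. This is a minimal illustration, not a primer-design tool: the target sequence is invented, the primer length is shortened for readability, and real design also checks melting temperature, GC content, and self-complementarity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def flanking_primers(target: str, length: int = 18) -> tuple[str, str]:
    """Return (forward, reverse) primers, both written 5'->3'.

    target is the top strand of the region to amplify, written 5'->3'.
    The forward primer matches the 5' end of the top strand (it anneals to
    the bottom strand); the reverse primer is the reverse complement of the
    3' end of the top strand (it anneals to the top strand).
    """
    forward = target[:length].upper()
    reverse = reverse_complement(target[-length:])
    return forward, reverse

target = "ATGCGTACCGGATTCAGCTA"   # hypothetical 20-nt target
forward, reverse = flanking_primers(target, length=6)
print(forward)  # ATGCGT
print(reverse)  # TAGCTG
```

Both primers point inward toward each other, so the polymerase copies the region between them on each strand.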

PCR Cycle (Three Steps)

Denaturation (~95°C): Heat separates the two DNA strands
Annealing (~50–65°C, depends on primer Tₘ): Primers bind (hybridize) to their complementary sequences on each strand
Extension (~72°C): Taq polymerase extends the primers in the 5’→3’ direction, synthesizing new DNA

Exponential Amplification

Each cycle doubles the amount of target DNA. After 20 cycles: ~10⁶-fold amplification. After 30 cycles: ~10⁹-fold amplification.
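
The cycle numbers above follow directly from repeated doubling. The sketch below adds an `efficiency` parameter (the fraction of templates copied per cycle) as an assumption for illustration; the text above corresponds to the ideal case of efficiency 1.0.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in target copies after a number of PCR cycles.

    efficiency = 1.0 means perfect doubling every cycle (ideal case);
    lower values model incomplete copying and are an added assumption.
    """
    return (1.0 + efficiency) ** cycles

print(fold_amplification(20))   # 1048576.0    (~10^6)
print(fold_amplification(30))   # 1073741824.0 (~10^9)
print(f"{fold_amplification(30, efficiency=0.9):.2e}")  # lower yield at 90% efficiency
```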

Why does PCR exploit DNA denaturation/renaturation? The very properties of DNA base-pair complementarity and the reversibility of denaturation make PCR possible. The melting temperature concept — calculated from the base composition of the primers — determines the optimal annealing temperature.
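
One common rule of thumb for estimating primer Tₘ from base composition is the Wallace rule, Tₘ ≈ 2 °C × (A+T) + 4 °C × (G+C). It is not taken from the text above and is only a rough approximation for short oligonucleotides; the primer sequence and the 5 °C offset used for the annealing temperature are likewise illustrative assumptions.

```python
def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule:
    2 degrees per A or T, 4 degrees per G or C."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc

def annealing_temperature(primer: str, offset: int = 5) -> int:
    """Rule-of-thumb annealing temperature: a few degrees below the Tm.
    The 5 degree offset is an assumption, not a fixed standard."""
    return wallace_tm(primer) - offset

primer = "ATGCGTACCGGATTCAGC"   # hypothetical 18-nt primer (8 A/T, 10 G/C)
print(wallace_tm(primer))             # 56
print(annealing_temperature(primer))  # 51
```

The G-C pairs contribute more to the estimate than the A-T pairs, which reflects their greater contribution to duplex stability. The resulting annealing temperature falls within the ~50–65 °C range given for the annealing step.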

Applications of PCR

Diagnosis of infectious diseases (e.g., PCR for SARS-CoV-2, HIV, tuberculosis)
Genetic testing and prenatal diagnosis
Forensic science (DNA fingerprinting)
Sequencing (PCR amplification precedes sequencing)
Cloning of genes
Detection and quantification of specific sequences and mutations (using variants such as qPCR, and RT-PCR for RNA templates)

21. Conclusion

This module has provided a comprehensive survey of nucleic acid biochemistry, from the atomic level to the chromosomal level:

Nucleotides — the monomeric building blocks — consist of a nitrogenous base, a pentose sugar, and a phosphate group. Their precise chemistry dictates all higher-order properties of DNA and RNA.
Polynucleotide chains are linked by 3’,5’-phosphodiester bonds, giving strands directionality (polarity).
DNA’s double-helical structure, governed by Watson-Crick base pairing and base-stacking, is the molecular basis of genetic inheritance. The B-form is physiologically dominant, but A- and Z-forms have important roles.
Unusual structures (hairpins, cruciforms, triple helices, G-quadruplexes) expand the functional repertoire of DNA and RNA.
DNA can be denatured and re-annealed — a property exploited in PCR, hybridization, and countless molecular biology techniques.
DNA is subject to damage (deamination, depurination, oxidation, UV damage, chemical mutagenesis). Repair pathways counteract these, but failures lead to mutations.
DNA methylation provides an epigenetic layer of gene regulation, influencing transcription without altering the base sequence.
RNA adopts complex secondary structures enabling its diverse roles as mRNA, tRNA, rRNA, ribozyme, and regulatory RNA.
Nucleases degrade nucleic acids; restriction enzymes are powerful tools in molecular biology.
Nucleotides serve multiple cellular functions beyond being nucleic acid building blocks: energy currency (ATP), signaling (cAMP, cGMP), coenzymes (NAD⁺, FAD, CoA).
The human genome (~3 billion bp) is organized into linear chromosomes, and most of its sequence is non-coding. Genes are often interrupted by introns.
Chromatin and nucleosomes package DNA into the nucleus. Histone modifications and DNA methylation together constitute the epigenome, regulating gene expression across cell types.
Supercoiling and topoisomerases control DNA topology, which is essential for replication, transcription, and chromosome segregation. Topoisomerases are key therapeutic targets.
SMC proteins (cohesins and condensins) maintain chromosome architecture through the cell cycle.
Mitochondrial DNA is a separate, maternally inherited circular genome; its mutations cause a spectrum of metabolic diseases.
PCR harnesses the principles of DNA denaturation and base-pair complementarity to amplify specific sequences — one of the most transformative tools in modern biology and medicine.

Study tip: Focus on understanding the mechanistic reasons for each structural feature — for example, why B-form DNA predominates at physiological conditions, why RNA is less stable than DNA, and why histones are basic proteins. These mechanistic insights will help you apply knowledge to novel questions in your exam.

Reference: David L. Nelson & Michael M. Cox, Lehninger Principles of Biochemistry, 7th or 8th Edition, W.H. Freeman, New York.
