Table of Contents

  1. Introduction & The Central Dogma
  2. Functions of Nucleotides
  3. Structure of Nucleotides
  4. Phosphodiester Bonds and Polynucleotides
  5. Primary Structure of Nucleic Acids
  6. DNA Secondary Structure: The Double Helix
  7. Structural Variations of DNA
  8. Unusual DNA/RNA Structures
  9. DNA Denaturation and Renaturation
  10. Nonenzymatic Transformations of Nucleotides
  11. DNA Methylation
  12. RNA: Structure and Types
  13. Nucleases: Degradation of Nucleic Acids
  14. Other Functions of Nucleotides
  15. Genome, Genes, and Chromosomes
  16. Chromatin and Nucleosomes
  17. DNA Supercoiling and Topoisomerases
  18. SMC Proteins: Cohesins and Condensins
  19. Mitochondrial DNA (mtDNA)
  20. Polymerase Chain Reaction (PCR)
  21. Conclusion

1. Introduction & The Central Dogma

The study of nucleic acids is foundational to understanding life at the molecular level. This module covers the structure and function of DNA and RNA, from their basic chemical building blocks (nucleotides) up to the complex three-dimensional organization of chromosomes.

The Central Dogma of Molecular Biology

The central dogma, first articulated by Francis Crick, describes the flow of genetic information within a biological system:

DNA  →  RNA  →  Protein
    Transcription  Translation
  • DNA replication: DNA is copied to produce identical DNA molecules.
  • Transcription: The information in a DNA sequence is copied into messenger RNA (mRNA).
  • Translation: The mRNA sequence is decoded by ribosomes to synthesize a protein.

An important addition to the dogma:

  • Reverse transcription: In some viruses (retroviruses, e.g., HIV), RNA is reverse-transcribed back into DNA using the enzyme reverse transcriptase. This is an exception to the classical unidirectional flow.

Context: In a living cell, all three processes often occur simultaneously. In prokaryotes, transcription and translation can occur at the same time (coupled), while in eukaryotes they are spatially separated (transcription in the nucleus, translation in the cytoplasm).
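The two information-transfer steps can be sketched in a few lines of Python. This is only an illustration: the function names, the example sequence, and the deliberately partial codon table are assumptions made here, not part of the module.

```python
# Partial codon table (standard genetic code); only the codons used below.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

def transcribe(coding_strand):
    """Return the mRNA matching a DNA coding (sense) strand: T becomes U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna):
    """Read codons from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        # "Xaa" marks a codon missing from the partial table above.
        residue = CODON_TABLE.get(mrna[i:i + 3], "Xaa")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

dna = "ATGTTTGGCAAATAA"          # made-up coding-strand sequence
mrna = transcribe(dna)           # "AUGUUUGGCAAAUAA"
peptide = translate(mrna)        # ["Met", "Phe", "Gly", "Lys"]
print(mrna, "-".join(peptide))
```

The sketch works from the coding strand for simplicity; in the cell, RNA polymerase reads the complementary template strand.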


2. Functions of Nucleotides

Nucleotides are not only the building blocks of nucleic acids. They serve multiple essential roles in the cell:

  1. Energy currency: ATP (adenosine triphosphate) is the primary energy carrier in metabolic reactions. The hydrolysis of its phosphoanhydride bonds releases free energy (~30.5 kJ/mol per anhydride bond).

  2. Signal transduction: Cyclic nucleotides such as cAMP (cyclic adenosine monophosphate) and cGMP act as second messengers, relaying signals from extracellular hormones and neurotransmitters (acting via G protein-coupled receptors, GPCRs) into intracellular responses.

  3. Enzyme cofactors and metabolic intermediates: Many coenzymes contain nucleotide components:

    • NAD⁺/NADH (nicotinamide adenine dinucleotide)
    • FAD/FADH₂ (flavin adenine dinucleotide)
    • Coenzyme A (contains a 3’-phosphoadenosine diphosphate moiety)
  4. Building blocks of nucleic acids: DNA and RNA are both linear polymers of nucleotides.


3. Structure of Nucleotides

Every nucleotide is composed of three characteristic components:

Component | Description
Nitrogenous base | A purine or pyrimidine ring system
Pentose sugar | A five-carbon sugar (ribose in RNA; deoxyribose in DNA)
Phosphate group | One or more phosphate groups linked to the 5’ carbon of the sugar

Terminology clarification: A nucleotide without the phosphate group is called a nucleoside (base + sugar only).

3.1 Nitrogenous Bases

Nitrogenous bases are heterocyclic aromatic compounds. They are:

  • Planar (aromatic ring system)
  • Hydrophobic (contributing to base-stacking interactions in double-stranded nucleic acids)
  • Basic (contain nitrogen atoms that can accept protons)

There are two structural families:

Purines (bicyclic: pyrimidine ring fused to an imidazole ring)

  • Adenine (A)
  • Guanine (G)

Pyrimidines (monocyclic: single six-membered ring)

  • Cytosine (C) — found in both DNA and RNA
  • Thymine (T) — found only in DNA
  • Uracil (U) — found only in RNA (replaces thymine)

Why does DNA use thymine instead of uracil? Cytosine spontaneously deaminates to uracil at a measurable rate. If DNA contained uracil, this deamination would be mutagenic and difficult to correct (since uracil would be indistinguishable from a “normal” base). Instead, DNA uses thymine (5-methyluracil), so any uracil found in DNA is immediately recognized as a deaminated cytosine and repaired by specialized enzymes.

Minor (modified) bases

In addition to the five major bases, nucleic acids—especially tRNA and rRNA—contain modified bases such as:

  • 5-Methylcytidine and N⁶-Methyladenosine (important in epigenetics and RNA modification)
  • Inosine
  • Pseudouridine (Ψ)
  • 7-Methylguanosine
  • 4-Thiouridine
  • 5-Hydroxymethylcytidine (found in certain bacteriophage DNA)

3.2 The Pentose Sugar

The sugar in nucleic acids is always present in its β-D-furanose (closed five-membered ring) form.

  • RNA contains β-D-ribose (has a hydroxyl group –OH at the 2’ carbon)
  • DNA contains β-D-2’-deoxyribose (has only –H at the 2’ carbon; lacks the 2’-OH)

Functional importance of the 2’-OH: The 2’-OH in RNA makes it susceptible to alkaline hydrolysis (because it can attack the adjacent phosphodiester bond to form a 2’,3’-cyclic monophosphate intermediate). DNA, lacking this group, is more chemically stable — an advantage for a molecule storing genetic information long-term.

Sugar Puckering (Conformation)

The furanose ring is not planar; four out of five atoms are approximately coplanar, and one carbon is displaced above or below this plane. Two major conformations:

  • C-2’ endo: C-2’ is on the same side as C-5’ (common in B-form DNA)
  • C-3’ endo: C-3’ is on the same side as C-5’ (common in A-form DNA and RNA)

3.3 Nucleosides and Nucleotides

  • Nucleoside: Base + Sugar, connected by an N-β-glycosidic bond between the anomeric carbon (C-1’) of the sugar and nitrogen N-9 of purines or N-1 of pyrimidines.

    • This bond does not form spontaneously due to thermodynamic constraints; it requires enzymatic catalysis in vivo.
  • Nucleotide: Nucleoside + Phosphate group, connected by a phosphoester bond to the 5’-OH of the sugar.

    • A nucleotide with one phosphate = nucleoside monophosphate (e.g., AMP)
    • Two phosphates = nucleoside diphosphate (e.g., ADP)
    • Three phosphates = nucleoside triphosphate (e.g., ATP)

Positions of Phosphate Groups

The phosphate can be attached at the 2’, 3’, or 5’ carbon of the sugar:

  • Adenosine 5’-monophosphate (AMP): phosphate at C-5’
  • Adenosine 2’-monophosphate: phosphate at C-2’
  • Adenosine 3’-monophosphate: phosphate at C-3’
  • Adenosine 2’,3’-cyclic monophosphate: phosphate bridging both 2’ and 3’ oxygens

3.4 Nomenclature

Ribonucleotides (RNA building blocks)

Base | Nucleoside | Nucleotide | Symbol
Adenine | Adenosine | Adenylate (AMP) | A
Guanine | Guanosine | Guanylate (GMP) | G
Uracil | Uridine | Uridylate (UMP) | U
Cytosine | Cytidine | Cytidylate (CMP) | C

Deoxyribonucleotides (DNA building blocks)

Base | Nucleoside | Nucleotide | Symbol
Adenine | Deoxyadenosine | Deoxyadenylate (dAMP) | dA
Guanine | Deoxyguanosine | Deoxyguanylate (dGMP) | dG
Thymine | Deoxythymidine | Deoxythymidylate (dTMP) | dT
Cytosine | Deoxycytidine | Deoxycytidylate (dCMP) | dC

4. Phosphodiester Bonds and Polynucleotides

Successive nucleotides in a nucleic acid chain are joined by 3’,5’-phosphodiester bonds: a phosphate group bridges the 3’-OH of one nucleotide to the 5’ carbon of the next.

    5' end
    |
    Phosphate
    |
    Sugar (C3'–OH)
    |
    Phosphodiester bond
    |
    Phosphate
    |
    Sugar (C3'–OH)
    |
    ...
    |
    3' end (free –OH)

Key properties:

  • The backbone of each strand (alternating sugar–phosphate units) runs in a specific direction, giving the strand polarity: 5’ → 3’.
  • Chains are always written from 5’ end (free phosphate) to 3’ end (free hydroxyl).
  • The phosphate groups are negatively charged at physiological pH (pKa ~1), making DNA and RNA polyanions. This is why they bind positively charged proteins (like histones) so readily.

RNA hydrolysis in alkaline conditions: Because RNA has a 2’-OH, in basic solution the 2’-oxygen attacks the adjacent phosphorus, forming a 2’,3’-cyclic monophosphate intermediate, which then opens to give a mixture of 2’- and 3’-monophosphates. This is why RNA is labile in alkali while DNA is stable — DNA lacks the 2’-OH needed to initiate this reaction.


5. Primary Structure of Nucleic Acids

The primary structure of a nucleic acid is defined as:

  • The covalent backbone (sugar–phosphate chain)
  • The sequence of nitrogenous bases along this backbone

Higher levels of structure:

  • Secondary structure: Regular, stable structures formed by base-pairing (e.g., the DNA double helix, RNA hairpins)
  • Tertiary structure: Complex three-dimensional folding of large molecules (e.g., chromosomal looping, tRNA L-shaped structure)

6. DNA Secondary Structure: The Double Helix

6.1 Historical Background

Year | Scientist(s) | Contribution
1869 | Friedrich Miescher | Isolated DNA (“nuclein”) from white blood cells
1944 | Avery, MacLeod, McCarty | Demonstrated DNA is the genetic material (transformation experiment with S. pneumoniae)
Late 1940s | Erwin Chargaff | Established Chargaff’s rules of base composition
1952 | Hershey & Chase | Confirmed DNA (not protein) carries genetic information using radiolabeled bacteriophages
1950–53 | Rosalind Franklin & Maurice Wilkins | Produced X-ray diffraction patterns of DNA fibers, revealing its helical structure
1953 | James Watson & Francis Crick | Proposed the double-helix model of DNA structure

Franklin’s X-ray data were crucial: the cross-shaped pattern of spots indicated a helical structure, and the heavy bands at the periphery were due to the regularly spaced, stacked bases. The pattern revealed two periodicities: 3.4 Å (distance between adjacent base pairs) and 34 Å (distance for one complete helical turn).

6.2 Chargaff’s Rules

Analysis of DNA base composition from multiple organisms led to four rules:

  1. A = T and G = C (molar equivalence of complementary bases)
  2. Therefore, Purines (A+G) = Pyrimidines (T+C)
  3. The base composition varies between species (species-specific)
  4. The base composition is constant within a species, regardless of tissue type, age, nutritional state, or environment

These rules suggested that A pairs specifically with T and G pairs specifically with C, which is the basis for the complementary strands of the double helix.
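A short check makes the first two rules concrete. In this sketch (the sequence and helper names are invented for illustration), a single strand has an unbalanced composition, but counting over both strands of the duplex gives A = T, G = C, and purines = pyrimidines.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(strand):
    """Return the complementary strand, written 5'->3'."""
    return strand.translate(COMPLEMENT)[::-1]

def duplex_composition(strand):
    """Count bases over both strands of the duplex formed by `strand`."""
    return Counter(strand + reverse_complement(strand))

top = "AAGCGGCATAAG"                      # arbitrary illustrative sequence
single = Counter(top)                     # composition of one strand only
counts = duplex_composition(top)          # composition of the duplex

purines = counts["A"] + counts["G"]
pyrimidines = counts["T"] + counts["C"]
gc_percent = 100 * (counts["G"] + counts["C"]) / sum(counts.values())

print(dict(single))
print(dict(counts), purines, pyrimidines, gc_percent)
```

The G+C percentage is the quantity that differs between species (rule 3), while the pairwise equalities hold for any double-stranded DNA.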

6.3 The Watson & Crick Model

The double helix model satisfies:

  • Thermodynamic requirements: Hydrophilic sugar-phosphate backbone faces outward (toward water); hydrophobic bases are inside, protected from water and stabilized by base-stacking interactions (van der Waals forces between aromatic rings) and hydrogen bonds between complementary bases.
  • Chargaff’s rules: A pairs with T (2 hydrogen bonds); G pairs with C (3 hydrogen bonds)
  • X-ray data: Correct dimensions and periodicity

Key structural features of B-form DNA (the physiologically relevant form):

  • Right-handed double helix
  • Two antiparallel strands: one runs 5’→3’, the other 3’→5’
  • Base pairs are nearly perpendicular to the helix axis (tilted ~6°)
  • 3.4 Å rise per base pair; 10.5 base pairs per helical turn (36 Å or 3.6 nm per turn) in solution
  • Helix diameter: ~20 Å (2 nm)
  • Features a major groove (wider, where most DNA-binding proteins interact) and a minor groove (narrower)
  • The two grooves arise from the asymmetric positioning of the base pairs relative to the backbone

6.4 Base Complementarity & Geometry

Watson-Crick base pairing:

  • A–T: 2 hydrogen bonds
  • G–C: 3 hydrogen bonds (therefore, G–C pairs are stronger)

A crucial geometric feature: A=T and G≡C base pairs have the same overall geometry (same C-1’–C-1’ distance of ~10.85 Å). This means any sequence of base pairs can be accommodated in the helix without distorting the backbone — the double helix can encode unlimited amounts of information.

Anti vs. Syn conformation: The base can rotate about the N-glycosidic bond relative to the sugar. In the anti conformation (most common in B-DNA), the base projects away from the sugar. In the syn conformation, the base projects over the sugar. In Z-DNA the two alternate: purine residues are syn and pyrimidine residues are anti.


7. Structural Variations of DNA

Under different conditions, DNA can adopt three distinct helical forms:

Feature | A-form | B-form | Z-form
Helical sense | Right-handed | Right-handed | Left-handed
Diameter | ~26 Å | ~20 Å | ~18 Å
Base pairs/turn | 11 | 10.5 | 12
Rise/base pair | 2.6 Å | 3.4 Å | 3.7 Å
Base tilt | 20° | 6° | 7°
Sugar pucker | C-3’ endo | C-2’ endo | C-2’ endo (pyr); C-3’ endo (pur)
Glycosyl bond | Anti | Anti | Anti (pyr); Syn (pur)
Major groove | Narrow, deep | Wide, accessible | Barely apparent
Minor groove | Wide, shallow | Narrow | Narrow and deep

A-form DNA

  • Favored in dehydrated (low water) conditions and in RNA–DNA hybrids
  • Also the most common form observed in DNA crystals
  • Base pairs are tilted ~20° from perpendicular to the helix axis
  • Whether A-form occurs significantly in living cells is uncertain

B-form DNA

  • The physiologically dominant form under normal cellular conditions
  • The reference structure for all DNA studies
  • Bases are nearly perpendicular to the helix axis
  • Most DNA-binding proteins recognize and interact with B-form DNA

Z-form DNA

  • Left-handed double helix — the backbone follows a zigzag path (hence “Z”)
  • Favored by sequences with alternating purines and pyrimidines (e.g., 5’-CGCGCG-3’) and by 5-methylcytosine
  • The major groove is nearly absent; minor groove is narrow and deep
  • Short stretches of Z-DNA have been detected in both bacteria and eukaryotes
  • May play a role in regulation of gene expression and genetic recombination (precise role still under investigation)

8. Unusual DNA/RNA Structures

Beyond the standard double helix, nucleic acids can form a variety of unusual structures:

Palindromic Sequences and Inverted Repeats

A palindrome in molecular biology is a sequence that reads the same on both strands when each strand is read in the 5’→3’ direction. Example:

5'–GAATTC–3'
3'–CTTAAG–5'

Palindromic sequences like this one are typical recognition sites for restriction enzymes (this example is the EcoRI site).

Mirror repeats are sequences that read the same on a single strand in both directions.
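The difference between the two definitions is easy to test in code. In this sketch (function names and the mirror-repeat example are made up for illustration), a palindrome equals its own reverse complement, while a mirror repeat equals its own reverse on a single strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def is_palindrome(seq):
    """True if both strands read the same in the 5'->3' direction."""
    return seq == reverse_complement(seq)

def is_mirror_repeat(seq):
    """True if a single strand reads the same forwards and backwards."""
    return seq == seq[::-1]

for seq in ("GAATTC", "TTAGCACGATT"):
    print(seq, is_palindrome(seq), is_mirror_repeat(seq))
```

Only the palindrome can pair with itself, which is why palindromes, and not mirror repeats, can fold into hairpins or cruciforms.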

Hairpins and Cruciforms

  • A hairpin forms when a single-stranded DNA or RNA folds back on itself, with complementary sequences pairing to form a stem and the unpaired bases between them forming a loop.
  • In double-stranded DNA, palindromic sequences can form cruciform structures (cross-shaped), where each strand forms its own hairpin. These structures are thermodynamically unstable but can form transiently during replication or transcription.

Hoogsteen Base Pairs and Triple Helices

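  • In standard Watson-Crick pairing, the purine hydrogen-bonds through N-1 and the neighboring exocyclic group (the amino group of adenine or the carbonyl oxygen of guanine).
  • In Hoogsteen pairing, the purine instead uses N-7 together with O6 (guanine) or the N6 amino group (adenine). These atoms face the major groove, which allows a third strand to bind there.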
  • This enables the formation of triple-stranded DNA (triplex DNA): one purine-rich strand base-pairs with the pyrimidine strand via Watson-Crick bonds AND with the third strand via Hoogsteen bonds.
  • Triplex DNA has roles in gene regulation and may be exploited for therapeutic purposes.

G-Quadruplexes

  • Guanosine-rich sequences (especially in telomeres) can form quadruplex structures: four guanines associate via Hoogsteen hydrogen bonds to form a G-quartet plane; multiple G-quartets stack to form a G-quadruplex.
  • These are stabilized by central metal cations (K⁺ or Na⁺).
  • Adjacent strands can be parallel or antiparallel relative to each other.
  • G-quadruplexes are thought to play roles in telomere maintenance, transcriptional regulation, and genome stability.

9. DNA Denaturation and Renaturation

Denaturation

DNA denaturation (also called “melting”) is the reversible disruption of:

  • Hydrogen bonds between complementary base pairs
  • Base-stacking interactions

…causing the double helix to unwind into two separate single strands. No covalent bonds are broken in this process.

Promoting factors:

  • High temperature
  • Extreme pH (very high or very low)
  • Denaturing chemicals (e.g., urea, formamide)

Hyperchromic Effect and UV Absorption

  • In the intact double helix, base-stacking interactions decrease UV absorption at 260 nm — this is the hypochromic effect.
  • Upon denaturation, base stacking is lost, and UV absorption increases (~40% increase) — this is the hyperchromic effect.
  • Denaturation can be monitored conveniently by measuring A₂₆₀ as a function of temperature.

Melting Temperature (Tₘ)

The melting temperature (Tₘ) is defined as the temperature at which 50% of the DNA is in single-stranded form.

  • Each DNA species has a characteristic Tₘ.
  • Tₘ increases with increasing G+C content because:
    • G–C pairs have 3 hydrogen bonds (vs. 2 for A–T), requiring more energy to break
    • G–C pairs also contribute more to base-stacking interactions
  • Tₘ also depends on salt concentration (higher [Na⁺] stabilizes DNA by shielding the negatively charged backbone)
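Reading Tₘ off a melting curve can be sketched as follows. The absorbance values are made up, and the sketch assumes that A₂₆₀ rises in proportion to the fraction of denatured DNA, with the first and last readings taken as fully double-stranded and fully single-stranded.

```python
def melting_temperature(temps, absorbances):
    """Interpolate the temperature at which half of the DNA is denatured."""
    a_duplex, a_single = absorbances[0], absorbances[-1]
    # Fraction denatured at each temperature (0 = duplex, 1 = single strands).
    theta = [(a - a_duplex) / (a_single - a_duplex) for a in absorbances]
    for i in range(len(theta) - 1):
        f0, f1 = theta[i], theta[i + 1]
        if f0 < 0.5 <= f1:
            t0, t1 = temps[i], temps[i + 1]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve does not cross 50% denaturation")

temps = [60, 70, 80, 85, 90, 95]              # degrees C (illustrative)
a260 = [1.00, 1.02, 1.10, 1.24, 1.36, 1.40]   # illustrative, ~40% total rise
tm = melting_temperature(temps, a260)
print(f"Estimated Tm: {tm:.1f} C")
```

Linear interpolation between the two readings that bracket 50% is a simplification; real analyses usually fit the whole sigmoidal curve.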

Practical application: Knowing the Tₘ is critical for designing PCR experiments. The annealing temperature is chosen from the estimated Tₘ of the primers, which depends on their length and G+C content.
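For short primers, a widely used rule of thumb (often called the Wallace rule, which is not taken from this module) estimates Tₘ as 2 °C per A or T plus 4 °C per G or C. The sketch below applies it to a hypothetical 18-mer; the primer sequence and the function name are invented for illustration.

```python
def wallace_tm(primer):
    """Rule-of-thumb Tm in degrees C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

primer = "AGCGTTACGGATCCATGC"   # hypothetical 18-mer: 8 A/T, 10 G/C
tm = wallace_tm(primer)         # 2*8 + 4*10 = 56
print(f"Estimated Tm: {tm} C")
```

The rule ignores salt concentration and nearest-neighbor effects, so it is a rough starting point only. Annealing temperatures are commonly set a few degrees below the estimated Tₘ and then refined empirically.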

Renaturation (Annealing)

When denatured DNA is slowly cooled below its Tₘ, complementary strands can re-anneal (renature) through base-pair formation. This specificity is the basis for:

  • PCR (Polymerase Chain Reaction)
  • Southern and Northern blotting
  • Nucleic acid hybridization techniques

10. Nonenzymatic Transformations of Nucleotides

DNA damage is unavoidable in living cells. While repair mechanisms exist, some damage escapes correction, leading to mutations. The main types of spontaneous and chemical DNA damage are:

1. Deamination

The spontaneous loss of an exocyclic amino group from a base:

  • Cytosine → Uracil (~100 events/cell/day in mammals)
    • If not repaired, U pairs with A instead of G, causing a C→T transition mutation
  • 5-Methylcytosine → Thymine (particularly problematic because the product thymine is a normal DNA base and harder to recognize as a mutation)
  • Adenine → Hypoxanthine (pairs with C instead of T)
  • Guanine → Xanthine

Evolutionary implication: The high rate of 5-methylcytosine deamination to thymine explains why CpG dinucleotides are underrepresented in mammalian genomes; over evolutionary time, methylated CpGs have been converted to TpG dinucleotides.
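CpG depletion is usually quantified as an observed/expected ratio: the number of CpG dinucleotides found, divided by the number expected from the C and G content alone. The sketch below uses that common definition; the function name and toy sequences are assumptions for illustration.

```python
def cpg_observed_expected(seq):
    """Observed CpG count divided by the count expected from C and G content.

    expected = (number of C * number of G) / sequence length
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)

cpg_rich = "CGCGCGCG"    # toy sequence with many CpGs
cpg_poor = "CCTGGCAG"    # same C and G content style, but no CpG
print(cpg_observed_expected(cpg_rich), cpg_observed_expected(cpg_poor))
```

A ratio near 1 means CpG occurs about as often as base composition predicts; a ratio well below 1 indicates the kind of depletion described above.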

2. Depurination

Spontaneous hydrolysis of the N-β-glycosidic bond between a purine base and the deoxyribose:

  • Leaves an abasic (AP = apurinic/apyrimidinic) site in the DNA
  • ~10,000 purines lost per mammalian cell per day
  • AP sites block replication and transcription unless repaired

3. Thymine Dimers (UV Damage)

UV light induces covalent bonds between adjacent pyrimidines on the same strand:

  • Cyclobutane thymine dimers: A four-membered ring forms between C-5 and C-6 of adjacent thymines (most common)
  • 6-4 Photoproducts: A bond forms between C-6 of one thymine and C-4 of the next

Both lesions:

  • Create kinks or bends in the DNA double helix
  • Block DNA replication and transcription
  • Are repaired by nucleotide excision repair (NER)

UV and ionizing radiation account for ~10% of all DNA damage from environmental agents.

4. Chemical Mutagens

Agent | Effect
Nitrous acid (HNO₂) | Promotes deamination of C→U, A→hypoxanthine
Bisulfite | Also promotes deamination; used as a food preservative
Dimethylsulfate (alkylating agent) | Methylates G at O6 position → O6-methylguanine cannot pair with C
Nitrosamines | Precursors of nitrous acid; present in processed meats

5. Oxidative Damage

Reactive Oxygen Species (ROS) — including superoxide anion (O₂•⁻), hydrogen peroxide (H₂O₂), and hydroxyl radical (OH•) — are generated by:

  • Normal aerobic metabolism (especially mitochondria)
  • Ionizing radiation
  • Environmental toxins

ROS cause:

  • Oxidation of bases (e.g., 8-oxoguanine, which mispairs with adenine)
  • Sugar oxidation leading to strand breaks

Cellular defenses:

  • Catalase: converts H₂O₂ → H₂O + O₂
  • Superoxide dismutase (SOD): converts O₂•⁻ → H₂O₂
  • Glutathione system: eliminates peroxides

Despite these defenses, a fraction of ROS escapes and causes cumulative DNA damage — contributing to aging and cancer.


11. DNA Methylation

Certain bases in DNA are enzymatically methylated after replication. In eukaryotes:

  • Cytosine and Adenine are most commonly methylated
  • ~5% of all cytosine residues are methylated to 5-methylcytosine in eukaryotic cells
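  • Methylation occurs mainly at CpG dinucleotides. CpG islands (CpG-rich regions, often found in gene promoters) are usually unmethylated when the gene is active; methylation of a promoter CpG island is associated with silencing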

Key facts:

  • All known DNA methyltransferases use S-adenosylmethionine (SAM) as the methyl group donor
  • Different cell types show different methylation patterns (tissue-specific methylation profiles = methylome)
  • Methylation of promoter regions generally represses transcription (gene silencing)
  • Methylation patterns can be heritable through cell division without altering the DNA sequence — this is epigenetic regulation

Clinical relevance: Aberrant DNA methylation is a hallmark of cancer. Hypermethylation of tumor suppressor gene promoters silences them; global hypomethylation leads to genomic instability. Analysis of DNA methylation (methylome analysis) is a powerful tool in cancer diagnostics and epigenetics research.


12. RNA: Structure and Types

Types of RNA

Type | Full Name | Primary Function
mRNA | Messenger RNA | Template for protein synthesis (carries the code from DNA to ribosome)
tRNA | Transfer RNA | Adapter molecules that bring amino acids to the ribosome; decode mRNA codons
rRNA | Ribosomal RNA | Structural and catalytic component of ribosomes
Ribozymes | Catalytic RNA | RNA molecules with enzymatic activity (e.g., self-splicing introns, RNase P)

RNA vs. DNA: Key Structural Differences

Feature | DNA | RNA
Sugar | 2’-Deoxyribose | Ribose
Bases | A, T, G, C | A, U, G, C
Strands | Usually double-stranded | Usually single-stranded
Stability | More stable (no 2’-OH) | Less stable (susceptible to alkaline hydrolysis)
Helix form | B-form (usually) | A-form (in double-stranded regions)

Secondary Structure of RNA

Although RNA is single-stranded, it extensively folds back on itself to form complex secondary structures through intramolecular base-pairing. This folding is driven by the tendency to maximize hydrogen bonding and hydrophobic base-stacking interactions.

Common RNA secondary structure elements:

  • Hairpin loops: A stem (base-paired region) and a loop (unpaired region)
  • Internal loops: Bulges within a double-stranded region where bases are unpaired on one or both sides
  • Pseudoknots and other complex tertiary structures

Double-stranded RNA regions adopt an A-form right-handed helix (never B-form; Z-form RNA has been produced in the laboratory under extreme conditions but is not physiologically relevant).

Unconventional Base Pairs in RNA

RNA can form base pairs beyond the standard Watson-Crick pairs:

  • G–U wobble pairs: Particularly important in tRNA anticodon–codon interactions
  • Hoogsteen pairs and reverse Hoogsteen pairs
  • Interactions involving three bases simultaneously (base triples)
  • Interactions involving modified bases (e.g., 7-methylguanosine, inosine)

These non-canonical interactions greatly expand RNA’s ability to fold into complex three-dimensional structures and perform catalytic functions.

tRNA Structure

tRNA is a particularly well-characterized RNA with an elaborate secondary structure:

  • Approximately 73–93 nucleotides in length
  • Cloverleaf secondary structure with four base-paired arms, plus a variable arm in some tRNAs:
    • Acceptor arm (AA stem): 5’ and 3’ ends of the tRNA; amino acid is attached to the 3’ end (sequence …CCA-3’)
    • D arm (D loop): Contains the unusual base dihydrouridine (D); involved in ribosome interaction
    • Anticodon arm: Contains the anticodon triplet (recognizes the mRNA codon); the wobble position is at the 5’ end of the anticodon
    • TΨC arm: Contains the sequence ribothymidine, pseudouridine (Ψ), cytidine; involved in ribosome interaction
    • Variable arm: Present between TΨC and anticodon arms; variable in size (absent in some tRNAs)

The three-dimensional structure of tRNA is an L-shaped fold (tertiary structure), where the acceptor end and anticodon are at opposite ends of the L.

Pseudouridine (Ψ): This is an unusual isomer of uridine in which uracil is attached to ribose through C-5 (instead of the normal N-1). This modified nucleoside is important for tRNA stability and function.


13. Nucleases: Degradation of Nucleic Acids

Nucleases are enzymes that catalyze the hydrolysis of phosphodiester bonds in nucleic acids.

Classification

By substrate:

  • Deoxyribonucleases (DNases): degrade DNA
  • Ribonucleases (RNases): degrade RNA
  • Pancreatic RNase A specifically cleaves RNA at phosphodiester bonds 3’ to pyrimidine residues, producing 2’,3’-cyclic monophosphate intermediates

By position of cleavage:

  • Endonucleases: cleave internal phosphodiester bonds, producing fragments
  • Exonucleases: cleave from one end of the molecule (either 5’→3’ or 3’→5’ direction)

Restriction Endonucleases (Restriction Enzymes)

These are bacterial enzymes (part of the restriction-modification system) that:

  • Recognize specific palindromic sequences (typically 4–8 bp recognition sites)
  • Cut both strands of the double-stranded DNA at or near the recognition site

Two types of cuts:

  1. Sticky ends (cohesive ends): Staggered cuts leave short single-stranded overhangs (e.g., EcoRI cuts 5’-G↓AATTC-3’, leaving 5’-AATT overhangs)
  2. Blunt ends: Even cuts leave no overhangs (e.g., SmaI cuts 5’-CCC↓GGG-3’)

Molecular biology applications: Restriction enzymes are indispensable tools for molecular cloning. Sticky ends from compatible restriction enzymes can be ligated together by DNA ligase to create recombinant DNA molecules — the basis of genetic engineering.
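A restriction digest of a linear molecule can be sketched as a search for the recognition site followed by a cut at a fixed offset within it. The function name, parameters, and test sequence below are invented for illustration, and only the top strand is modeled.

```python
def digest(seq, site="GAATTC", cut_offset=1):
    """Cut the top strand of a linear sequence at every occurrence of `site`.

    cut_offset is the number of site bases left on the upstream fragment
    (1 for EcoRI, G^AATTC; 3 for SmaI, CCC^GGG).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

dna = "TTAGGAATTCCGATGAATTCAA"            # made-up sequence, two EcoRI sites
print(digest(dna))                         # EcoRI (default)
print(digest("AACCCGGGTT", "CCCGGG", 3))   # SmaI
```

Because the site is palindromic, the bottom strand is cut at the mirrored position. For EcoRI this leaves the 5’-AATT overhangs described above; for SmaI both cuts fall at the same place, giving blunt ends.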


14. Other Functions of Nucleotides

Energy Storage and Transfer

The nucleoside triphosphates, especially ATP, carry chemical energy:

  • Hydrolysis of a phosphoester bond (from nucleoside monophosphate): releases ~14 kJ/mol
  • Hydrolysis of a phosphoanhydride bond (the bonds between phosphate groups in ADP/ATP): releases ~30.5 kJ/mol per bond

The two bonds between the three phosphate groups in ATP are phosphoanhydride bonds (high-energy bonds). This energy is used to drive thermodynamically unfavorable reactions.

Coenzymes Containing Nucleotide Components

Many metabolic coenzymes contain an adenosine unit:

  • NAD⁺/NADH (contains adenosine + nicotinamide; involved in redox reactions in catabolism)
  • FAD/FADH₂ (contains riboflavin + adenosine; used in the citric acid cycle and electron transport chain)
  • Coenzyme A (CoA-SH) (contains adenosine + pantothenic acid + mercaptoethylamine; carries acyl groups)

Second Messengers

  • cAMP (cyclic adenosine 3’,5’-monophosphate): Synthesized from ATP by adenylyl cyclase (activated by GPCRs). Acts as a second messenger in countless hormonal signaling pathways. Activates protein kinase A (PKA).
  • cGMP (cyclic guanosine 3’,5’-monophosphate): Similar role; synthesized from GTP by guanylyl cyclases, which are activated by nitric oxide and by natriuretic peptides.

Neurotransmitters and Platelet Signaling

Nucleotides also act extracellularly:

  • ATP binds to P2X receptors (ligand-gated ion channels) in postsynaptic membranes → involved in taste sensation, inflammation, and smooth muscle contraction
  • ADP binds to P2Y receptors on platelets → promotes platelet aggregation and blood clotting
    • Clinically relevant: Clopidogrel (Plavix) is an antiplatelet drug that irreversibly blocks P2Y₁₂ ADP receptors

Alarmone: ppGpp

Guanosine 5’-diphosphate, 3’-diphosphate (ppGpp), also called guanosine tetraphosphate, is an alarmone produced in bacteria under conditions of nutritional stress (e.g., amino acid starvation). It triggers the stringent response, globally reprogramming bacterial gene expression to shut down growth-related processes.


15. Genome, Genes, and Chromosomes

The Human Genome

  • The haploid human genome contains approximately 3 billion (3 × 10⁹) base pairs of DNA
  • Distributed across 23 chromosomes (3,054,815,472 bp with X; 2,963,015,935 bp with Y)
  • Individual chromosome size: ~50 million to 300 million bp
  • Humans are diploid (46 chromosomes in somatic cells): 22 pairs of autosomes + 1 pair of sex chromosomes (XX or XY)
  • Total length of DNA in a human cell: ~2 meters
  • Total DNA in the human body (~10¹⁴ cells): ~2 × 10¹¹ km (compare: Earth-Sun distance = 1.5 × 10⁸ km)
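The length figures above follow from the 3.4 Å rise per base pair of B-form DNA. A quick check of the arithmetic, using the round numbers from this list (the variable names are chosen here for illustration):

```python
RISE_PER_BP_M = 0.34e-9    # 3.4 angstroms per base pair, in meters
HAPLOID_BP = 3e9           # base pairs in the haploid genome
CELLS_IN_BODY = 1e14       # approximate cell count used in the text
EARTH_SUN_KM = 1.5e8

# A diploid somatic cell carries two copies of the genome.
dna_per_cell_m = 2 * HAPLOID_BP * RISE_PER_BP_M
total_km = dna_per_cell_m * CELLS_IN_BODY / 1000
earth_sun_trips = total_km / EARTH_SUN_KM

print(f"DNA per cell: {dna_per_cell_m:.2f} m")
print(f"DNA per body: {total_km:.2e} km")
print(f"Earth-Sun distances: {earth_sun_trips:.0f}")
```

With these inputs the DNA in one cell comes to about 2 m, and the whole-body total spans the Earth-Sun distance more than a thousand times.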

Visualization: Riccardo Sabatini printed Craig Venter’s genome in 175 volumes totaling 262,000 pages.

The Term “Genome”

The genome refers to the complete nucleotide sequence of an organism, including both coding and non-coding sequences.

Prokaryotic vs. Eukaryotic Genomes

Feature | Prokaryotic Genome | Eukaryotic Genome
Size | Small | Large
Organization | Compact, circular | Linear chromosomes
Nucleus | Absent (cytoplasmic) | Present (membrane-bound)
Extra elements | Plasmids | Telomeres, centromeres
Gene structure | Mostly uninterrupted | Many interrupted (introns/exons)
Repetitive sequences | Few | Many
Transcription & Translation | Coupled (simultaneous) | Separated in space and time

Composition of the Human Genome

Component | Approximate %
Protein-coding genes (exons) | ~1.5%
Introns | ~26%
LINEs (Long Interspersed Nuclear Elements) | ~20%
SINEs (Short Interspersed Nuclear Elements) | ~13%
LTR retrotransposons | ~8%
Miscellaneous heterochromatin | ~8%
Segmental duplications | ~5%
Simple sequence repeats | ~3%
DNA transposons | ~3%
Miscellaneous unique sequences | ~12%

Non-coding regions include:

  • Introns (non-coding segments within genes)
  • Regulatory elements (promoters, enhancers, silencers)
  • Non-coding RNAs (rRNA, tRNA, miRNA, lncRNA, etc.)

Gene Definition

A gene is defined as a portion of DNA that encodes the primary sequence of a final gene product, which may be:

  • A polypeptide (protein-coding gene)
  • An RNA with structural or catalytic function (rRNA, tRNA, ribozyme genes)

Introns and Exons (Eukaryotic Gene Structure)

Many eukaryotic genes (but few prokaryotic genes) are interrupted by non-coding sequences called introns (intervening sequences):

  • Exons: The coding segments, which are retained in the mature mRNA
  • Introns: Non-coding sequences that are removed from the primary transcript (pre-mRNA) during RNA splicing

Example: The hemoglobin β-subunit gene contains three exons separated by two introns; the larger intron (~850 bp) alone accounts for more than half of the gene’s base pairs.

RNA Splicing is the process of precisely removing introns and ligating exons to generate a contiguous mRNA. This is carried out by a large ribonucleoprotein complex called the spliceosome.
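Computationally, splicing amounts to keeping the exon intervals of the primary transcript and joining them in order. The sketch below uses a made-up pre-mRNA with a single intron; the coordinates and function name are illustrative assumptions and say nothing about how the spliceosome locates splice sites.

```python
def splice(pre_mrna, exons):
    """Join exon segments given as 0-based, half-open (start, end) intervals."""
    return "".join(pre_mrna[start:end] for start, end in exons)

exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "AAAUGA"   # toy sequences
pre_mrna = exon1 + intron + exon2
exons = [(0, 6), (18, 24)]          # positions of exon1 and exon2

mature = splice(pre_mrna, exons)
print(mature)                       # AUGGCCAAAUGA
```

The intron is discarded and the two exons become contiguous, so the reading frame that starts in exon 1 continues directly into exon 2.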

Chromosomal Features

A eukaryotic chromosome contains:

  • Unique sequences (genes) and dispersed repetitive sequences
  • Multiple replication origins (unlike prokaryotes with a single origin)
  • Centromere: Site of kinetochore assembly; where spindle fibers attach during cell division
  • Telomeres: Repetitive sequences (e.g., TTAGGG in humans) at chromosome ends; protect against degradation and maintain chromosomal integrity

Karyotyping

Karyotyping is the visualization and analysis of an organism’s complete set of chromosomes. Clinical applications include detection of:

  • Down syndrome (Trisomy 21): extra chromosome 21
  • Klinefelter syndrome (XXY): extra X chromosome in males
  • Turner syndrome (XO): only one X chromosome in females

16. Chromatin and Nucleosomes

Chromatin

In the eukaryotic nucleus, DNA is not naked; it is tightly associated with proteins to form chromatin:

  • ~90% of chromatin proteins are histones
  • Also contains significant amounts of non-histone proteins and RNA
  • During interphase: chromatin is partially decondensed to allow transcription and replication
  • During mitosis (after replication is complete): chromatin condenses into visible chromosomes

Two functional states:

  • Euchromatin: Lightly packed, transcriptionally active regions
  • Heterochromatin: Tightly packed, transcriptionally inactive regions (includes constitutive heterochromatin at centromeres and telomeres, and facultative heterochromatin in silenced gene regions)

Electron microscopy reveals chromatin as “beads on a string” — the beads are nucleosomes.

Histones

Histones are small, highly conserved basic proteins rich in the positively charged amino acids lysine (Lys) and arginine (Arg), which interact with the negatively charged phosphate backbone of DNA.

Five histone classes:

Histone   MW (Da)   Lys (%)   Arg (%)   Role
H1        21,130    29.5       1.3      Linker histone; binds connecting DNA
H2A       13,960    10.9       9.3      Core histone
H2B       13,774    16.0       6.4      Core histone
H3        15,273     9.6      13.3      Core histone
H4        11,236    10.8      13.7      Core histone

Nucleosome Structure

The nucleosome is the fundamental repeating unit of chromatin:

  • Histone octamer core: 2 copies each of H2A, H2B, H3, and H4
  • DNA wrapping: ~146 bp of DNA wrapped 1.65 times around the histone octamer
  • Linker DNA: ~54 bp of DNA connecting adjacent nucleosomes (bound by histone H1)
  • Repeat unit: ~200 bp total per nucleosome
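
The repeat-unit figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below assumes a rise of 0.34 nm per base pair for B-form DNA and a haploid human genome of ~3 billion bp; the constant and function names are illustrative only.

```python
CORE_BP = 146           # DNA wrapped around the histone octamer (from the text)
LINKER_BP = 54          # linker DNA between nucleosomes (from the text)
RISE_NM_PER_BP = 0.34   # assumed rise per base pair for B-form DNA

def nucleosome_count(genome_bp: int) -> int:
    """Approximate number of nucleosomes needed to package genome_bp."""
    return genome_bp // (CORE_BP + LINKER_BP)

def contour_length_m(genome_bp: int) -> float:
    """Length of the fully extended double helix, in metres."""
    return genome_bp * RISE_NM_PER_BP * 1e-9

haploid_bp = 3_000_000_000  # assumed haploid human genome size
print(nucleosome_count(haploid_bp))            # 15000000
print(round(contour_length_m(haploid_bp), 2))  # 1.02
```

Under these assumptions, about 15 million nucleosomes package roughly a metre of DNA per haploid genome, which is why further levels of folding are needed.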

Geometry of DNA binding to the nucleosome: The histone core does not bind randomly. The sequence of the bound DNA matters:

  • Regions with two or more A-T base pairs favor DNA curvature
  • Regions with two or more G-C base pairs resist curvature
  • Alternating A-T-rich regions at ~10 bp intervals (one helical turn) in the correct phase help the DNA to wrap tightly around the nucleosome

Histone Modifications (Epigenetic Marks)

The N-terminal “tails” of core histones (which protrude outside the nucleosome core) can be reversibly modified:

Modification      Residue    Effect
Acetylation       Lys        Neutralizes positive charge → loosens DNA–histone interaction → activates transcription
Methylation       Lys, Arg   Can activate or repress transcription depending on position and degree
Phosphorylation   Ser, Thr   Involved in chromosome condensation (mitosis) and DNA repair signaling
Ubiquitination    Lys        Various effects on transcription and DNA repair

These modifications constitute the “histone code”, which is read by regulatory proteins to modulate:

  • Chromatin structure
  • Gene transcription
  • DNA repair
  • Cell cycle progression

Higher-Order Chromatin Folding

Successive levels of organization compact DNA progressively:

  1. Naked DNA: 2 nm double helix
  2. Nucleosome array (“beads on a string”): 11 nm fiber
  3. 30-nm chromatin fiber (solenoid): ~6 nucleosomes per helical turn; requires H1
  4. 300-nm loops: Chromatin loops anchored to a protein scaffold
  5. 250-nm fiber (further coiling of loops)
  6. Metaphase chromosome: 700–1400 nm; maximally compacted

This hierarchical compaction reduces the effective length of DNA by ~10,000-fold in a metaphase chromosome.

Role of lncRNA in Chromatin Structure

Long non-coding RNAs (lncRNAs) contribute to chromosome organization:

  • (a) lncRNAs interact with DNA-binding proteins to tether distant DNA segments together
  • (b) lncRNAs interact with specific DNA sequences and recruit gene-regulatory proteins to regions, suppressing or activating transcription nearby

The most famous example is XIST RNA, which coats one entire X chromosome in female mammals, triggering its inactivation (X-chromosome inactivation).


17. DNA Supercoiling and Topoisomerases

DNA Supercoiling

The DNA double helix itself can be coiled — this is supercoiling. It arises when the axis of the double helix is itself coiled in space (also called a superhelix).

  • The most common source of supercoiling in cells is underwinding of the double helix in a closed circular DNA (fewer helical turns than expected for relaxed B-form DNA)
  • This creates negative supercoils (the DNA is wound in the opposite direction to the right-handed helix)
  • Negative supercoiling is favorable for processes requiring strand separation (replication, transcription, recombination) because it partially pre-unwinds the helix

Supercoiling is also relevant in linear eukaryotic chromosomes because topological domains are maintained by protein attachment points.

States of circular DNA:

State                   Description
Relaxed                 Normal B-form, no supercoiling
Strained (underwound)   Fewer turns than expected
Supercoiled             Strain accommodated by coiling of the helix axis
Strand separated        Occurs at very high levels of underwinding
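
Underwinding is usually quantified with the linking number Lk and the superhelical density σ = (Lk − Lk₀)/Lk₀, where Lk₀ is the linking number of the relaxed molecule. These symbols are standard conventions rather than definitions taken from this section, and the sketch below assumes a helical repeat of ~10.5 bp per turn for relaxed B-form DNA. The plasmid is hypothetical.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA

def relaxed_linking_number(n_bp: int) -> float:
    """Lk0: linking number of the relaxed closed-circular molecule."""
    return n_bp / BP_PER_TURN

def superhelical_density(lk: float, n_bp: int) -> float:
    """sigma = (Lk - Lk0) / Lk0; negative values mean underwinding."""
    lk0 = relaxed_linking_number(n_bp)
    return (lk - lk0) / lk0

# Hypothetical 2,100-bp plasmid underwound by 12 turns.
n_bp = 2100
lk0 = relaxed_linking_number(n_bp)
sigma = superhelical_density(lk0 - 12, n_bp)
print(lk0, round(sigma, 3))  # 200.0 -0.06
```

A negative σ corresponds to the underwound, negatively supercoiled state described above; a relaxed molecule has σ = 0.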

Topoisomerases

Topoisomerases are enzymes that change the topology of DNA (i.e., alter the number of supercoils) by transiently breaking and rejoining phosphodiester bonds.

Type I Topoisomerases

  • Mechanism: Transiently break ONE strand of the double helix
    1. The active-site tyrosine forms a covalent 5’-phosphotyrosyl protein-DNA linkage (cleaves one strand)
    2. The unbroken strand passes through the break (or the broken strand rotates)
    3. The break is religated (3’-OH attacks the phosphotyrosyl linkage)
  • Effect: Changes the linking number by 1 per catalytic cycle
  • ATP not required
  • Relaxes supercoils: type IA enzymes relax only negative supercoils, whereas type IB enzymes relax both positive and negative supercoils

Type II Topoisomerases

  • Mechanism: Transiently break BOTH strands of the double helix
    • An intact segment of duplex DNA is passed through the double-strand break
    • Both strands are then religated
  • Effect: Changes the linking number by 2 per catalytic cycle
  • Requires ATP hydrolysis
  • Can introduce negative supercoils (DNA gyrase in bacteria), relax supercoils, or decatenate (separate interlinked circular DNA molecules after replication)
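
The different step sizes of the two enzyme classes can be illustrated with a small bookkeeping sketch. It is an idealized model, not a description of enzyme kinetics: it assumes every catalytic cycle moves the linking number toward the relaxed value by exactly 1 (Type I) or 2 (Type II), and the names `LK_STEP` and `relax` are invented for this example.

```python
# Change in linking number per catalytic cycle, as stated in the text.
LK_STEP = {"I": 1, "II": 2}

def relax(delta_lk: int, topo_type: str) -> tuple[int, int]:
    """Return (cycles used, residual delta Lk) when relaxing toward delta Lk = 0."""
    step = LK_STEP[topo_type]
    cycles = abs(delta_lk) // step
    sign = 1 if delta_lk > 0 else -1
    residual = sign * (abs(delta_lk) % step)
    return cycles, residual

print(relax(-12, "I"))   # (12, 0)
print(relax(-12, "II"))  # (6, 0)
print(relax(-5, "II"))   # (2, -1)
```

The last call shows a consequence of the step size: an enzyme that changes the linking number in steps of 2 cannot bring an odd linking difference exactly to zero.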

Summary of Topoisomerase Types

Family                 Activity                    Mechanism         Organisms
IA                     Relaxes (−)                 Strand passage    Bacteria, eukaryotes
IB                     Swivelase                   Strand rotation   Bacteria, eukaryotes
IIA (DNA gyrase)       Introduces (−) supercoils   Strand passage    Bacteria
IIA (Topo IIα, IIβ)    Relaxes (+ or −)            Strand passage    Eukaryotes
IIA (Topo IV)          Decatenase                  Strand passage    Bacteria

Topoisomerases as Drug Targets

Topoisomerases are excellent therapeutic targets because they are essential for cell survival and proliferation:

Topoisomerase I Inhibitors (Camptothecins):

  • Topotecan: Used for ovarian and lung cancer
  • Irinotecan: Used for colorectal cancer

Topoisomerase II Inhibitors:

  • Etoposide (VP-16): Used for lung cancer
  • Doxorubicin (adriamycin): Used for breast cancer (and other cancers)

Mechanism of action: These drugs stabilize the enzyme-DNA cleavage complex (the “cleavable complex”), trapping broken DNA ends. When a replication fork or transcription machinery encounters a drug-stabilized cleavage complex, it triggers double-strand breaks, leading to cell death. Cancer cells, which divide more rapidly, are more susceptible.

Antibiotic targets: Bacterial DNA gyrase and Topo IV (both Type IIA) are the targets of fluoroquinolone antibiotics (e.g., ciprofloxacin), which are selectively toxic to bacteria because these enzymes differ significantly from their eukaryotic counterparts.


18. SMC Proteins: Cohesins and Condensins

SMC proteins (Structural Maintenance of Chromosomes) are a family of large ATPases essential for maintaining chromosome structure and integrity.

Structure of SMC Proteins

Each SMC protein has a characteristic architecture:

  • Globular N-terminal and C-terminal domains: Each contributes to an ATPase (ABC-type) active site
  • Two α-helical coiled-coil regions: Connect the terminal domains to a central hinge domain
  • SMC proteins function as dimers (forming a V-shaped structure with the hinge at the apex and the ATPase head domains at the tips)

Types of SMC Complexes

Cohesins

  • Function: Hold sister chromatids together after DNA replication until anaphase
  • Loaded onto chromosomes during S phase (replication)
  • Essential for proper chromosome segregation in mitosis and meiosis
  • Cohesin ring encircles the two sister chromatid DNA molecules

Condensins

  • Function: Drive chromosome condensation as cells enter mitosis
  • Essential for compacting the chromatin from its interphase state into short, thick mitotic chromosomes

Cell Cycle Dynamics

During the cell cycle:

  • S phase: DNA replication; cohesin is deposited, linking sister chromatids
  • G2 phase: Condensin begins chromosome condensation in preparation for mitosis
  • Prophase → Metaphase: Maximum condensation; chromosomes align at the metaphase plate
  • Anaphase: Cohesin is cleaved (by separase), allowing sister chromatids to separate to opposite poles

19. Mitochondrial DNA (mtDNA)

Structure and Properties

Mitochondria contain their own circular double-stranded DNA:

  • Circular (not linear like nuclear chromosomes)
  • Multiple copies per mitochondrion, giving ~100 copies per cell in leukocytes and up to ~10,000 per cell in neurons
  • Replication is independent of the cell cycle (continuous, not just in S phase)

Genetic Content

The human mitochondrial genome encodes:

  • 2 ribosomal RNA (rRNA) genes
  • 22 tRNA genes
  • 13 protein-coding genes — all encoding subunits of the oxidative phosphorylation (OXPHOS) complexes (Complex I, III, IV, and ATP synthase)

The majority of mitochondrial proteins (~1500) are actually encoded by the nuclear genome, synthesized in the cytoplasm, and imported into mitochondria.

Maternal Inheritance

Mitochondria (and their DNA) are inherited exclusively through the mother, because sperm mitochondria are eliminated after fertilization.

Homoplasmy vs. Heteroplasmy

  • Homoplasmy: All mtDNA copies in a cell have the same sequence
  • Heteroplasmy: A cell contains two or more distinct populations of mtDNA (e.g., wild-type and mutant)

Clinical significance: Pathogenic mtDNA mutations often need to exceed a threshold level of heteroplasmy (typically ~60–90%) before causing disease. Below this threshold, sufficient wild-type mitochondria compensate.
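
The threshold idea can be made concrete with a short calculation. This is only a sketch: the copy numbers are invented, and the threshold is passed in explicitly because the text gives a range (~60–90%) that depends on the mutation and tissue.

```python
def heteroplasmy(mutant_copies: int, total_copies: int) -> float:
    """Fraction of mtDNA copies in a cell that carry the mutation."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    return mutant_copies / total_copies

def exceeds_threshold(mutant_copies: int, total_copies: int, threshold: float) -> bool:
    """True if the mutant load reaches the given disease threshold (0 to 1)."""
    return heteroplasmy(mutant_copies, total_copies) >= threshold

# Hypothetical cell: 700 mutant copies out of 1,000 mtDNA molecules.
print(heteroplasmy(700, 1000))            # 0.7
print(exceeds_threshold(700, 1000, 0.6))  # True
print(exceeds_threshold(700, 1000, 0.9))  # False
```

The same 70% mutant load is above a 60% threshold but below a 90% one, which is one reason identical heteroplasmy levels can produce different outcomes for different mutations.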

mtDNA Mutations and Diseases

Due to:

  • High copy number (many replications per cell division)
  • Less efficient DNA repair compared to the nucleus
  • Proximity to ROS generated by the electron transport chain

mtDNA accumulates mutations at a higher rate than nuclear DNA.

Common mitochondrial diseases and associated mutations:

Mutation            Disease
m.3243A>G           MELAS (mitochondrial encephalomyopathy, lactic acidosis and stroke-like episodes); MIDD
m.8344A>G           MERRF (myoclonic epilepsy with ragged red fibres)
m.14459G>A          MELAS, MILS, cardiomyopathy
m.13513G>A          MELAS, MILS
m.8993T>G           NARP (neurogenic muscle weakness, ataxia, retinitis pigmentosa); MILS
m.8483_13459del     PMPS (Pearson marrow-pancreas syndrome); KSS (Kearns–Sayre syndrome)

mtDNA Release and Inflammation

Damaged or stressed mitochondria can release mtDNA into the cytosol or into the bloodstream:

  • Cytoplasmic mtDNA: Activates innate immune signaling pathways (e.g., the NLRP3 inflammasome and the cGAS-STING pathway), triggering inflammation
  • Circulating cell-free mtDNA (CCF-mtDNA): Released from cells passively (cell damage) or actively (via extracellular vesicles); can be detected in blood as a biomarker of mitochondrial stress, trauma, or systemic inflammation

20. Polymerase Chain Reaction (PCR)

Developed by Kary Mullis in 1983 (Nobel Prize in Chemistry, 1993), PCR is a technique to amplify a specific DNA sequence exponentially in vitro.

Requirements

  • Target DNA (template)
  • Two oligonucleotide primers flanking the region to be amplified (one complementary to each strand)
  • Thermostable DNA polymerase (e.g., Taq polymerase from Thermus aquaticus, which survives high temperatures)
  • dNTPs (all four deoxyribonucleoside triphosphates)
  • Buffer with Mg²⁺

PCR Cycle (Three Steps)

  1. Denaturation (~95°C): Heat separates the two DNA strands
  2. Annealing (~50–65°C, depends on primer Tₘ): Primers bind (hybridize) to their complementary sequences on each strand
  3. Extension (~72°C): Taq polymerase extends the primers in the 5’→3’ direction, synthesizing new DNA

Exponential Amplification

Each cycle doubles the amount of target DNA. After 20 cycles: ~10⁶-fold amplification. After 30 cycles: ~10⁹-fold amplification.
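
The figures above follow directly from repeated doubling (2ⁿ after n cycles). The sketch below reproduces them; the `efficiency` parameter is an added assumption that lets the per-cycle yield fall below perfect doubling, as it generally does in practice.

```python
def fold_amplification(cycles: int, efficiency: float = 1.0) -> float:
    """Fold increase in target copies; efficiency=1.0 means perfect doubling."""
    return (1.0 + efficiency) ** cycles

for n in (20, 30):
    print(n, f"{fold_amplification(n):.2e}")
# 20 1.05e+06
# 30 1.07e+09
```

Calling `fold_amplification(30, efficiency=0.9)` gives a noticeably smaller yield than perfect doubling, so the ~10⁹ figure is best read as an upper bound.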

Why PCR depends on DNA denaturation and renaturation: base-pair complementarity and the reversibility of denaturation are what make PCR possible. The melting temperature (Tₘ) of each primer, estimated from its length and base composition, guides the choice of annealing temperature, which is typically set a few degrees below Tₘ.
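
One common way to estimate a primer's Tₘ from base composition is the Wallace rule, Tₘ ≈ 2 °C × (A+T) + 4 °C × (G+C), which is a rough approximation intended for short oligonucleotides. It is not taken from this module, and the 5 °C offset used for the annealing temperature below is a rule of thumb assumed for illustration. The primer sequences are invented.

```python
def wallace_tm(primer: str) -> int:
    """Estimate Tm (deg C) with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer must contain only A, C, G, T")
    return 2 * at + 4 * gc

def annealing_temp(fwd: str, rev: str, offset: int = 5) -> int:
    """Assumed rule of thumb: anneal `offset` deg C below the lower primer Tm."""
    return min(wallace_tm(fwd), wallace_tm(rev)) - offset

fwd = "ATGCATGCATGCATGCATGC"  # hypothetical forward primer
rev = "GGCCATATGGCCATATGC"    # hypothetical reverse primer
print(wallace_tm(fwd), wallace_tm(rev), annealing_temp(fwd, rev))  # 60 56 51
```

G-C pairs are weighted more heavily than A-T pairs because they contribute more to duplex stability, so GC-rich primers call for higher annealing temperatures.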

Applications of PCR

  • Diagnosis of infectious diseases (e.g., PCR for SARS-CoV-2, HIV, tuberculosis)
  • Genetic testing and prenatal diagnosis
  • Forensic science (DNA fingerprinting)
  • Sequencing (PCR amplification precedes sequencing)
  • Cloning of genes
  • Detection of mutations and measurement of gene expression (e.g., qPCR, RT-PCR)

21. Conclusion

This module has provided a comprehensive survey of nucleic acid biochemistry, from the atomic level to the chromosomal level:

  1. Nucleotides — the monomeric building blocks — consist of a nitrogenous base, a pentose sugar, and a phosphate group. Their precise chemistry dictates all higher-order properties of DNA and RNA.

  2. Polynucleotide chains are linked by 3’,5’-phosphodiester bonds, giving strands directionality (polarity).

  3. DNA’s double-helical structure, governed by Watson-Crick base pairing and base-stacking, is the molecular basis of genetic inheritance. The B-form is physiologically dominant, but A- and Z-forms have important roles.

  4. Unusual structures (hairpins, cruciforms, triple helices, G-quadruplexes) expand the functional repertoire of DNA and RNA.

  5. DNA can be denatured and re-annealed — a property exploited in PCR, hybridization, and countless molecular biology techniques.

  6. DNA is subject to damage (deamination, depurination, oxidation, UV damage, chemical mutagenesis). Repair pathways counteract these, but failures lead to mutations.

  7. DNA methylation provides an epigenetic layer of gene regulation, influencing transcription without altering the base sequence.

  8. RNA adopts complex secondary structures enabling its diverse roles as mRNA, tRNA, rRNA, ribozyme, and regulatory RNA.

  9. Nucleases degrade nucleic acids; restriction enzymes are powerful tools in molecular biology.

  10. Nucleotides serve multiple cellular functions beyond being nucleic acid building blocks: energy currency (ATP), signaling (cAMP, cGMP), coenzymes (NAD⁺, FAD, CoA).

  11. The human genome (~3 billion bp) is organized into linear chromosomes, and most of its sequence is non-coding. Genes are often interrupted by introns.

  12. Chromatin and nucleosomes package DNA into the nucleus. Histone modifications and DNA methylation together constitute the epigenome, regulating gene expression across cell types.

  13. Supercoiling and topoisomerases control DNA topology, which is essential for replication, transcription, and chromosome segregation. Topoisomerases are key therapeutic targets.

  14. SMC proteins (cohesins and condensins) maintain chromosome architecture through the cell cycle.

  15. Mitochondrial DNA is a separate, maternally inherited circular genome; its mutations cause a spectrum of metabolic diseases.

  16. PCR harnesses the principles of DNA denaturation and base-pair complementarity to amplify specific sequences — one of the most transformative tools in modern biology and medicine.


Study tip: Focus on understanding the mechanistic reasons for each structural feature — for example, why B-form DNA predominates at physiological conditions, why RNA is less stable than DNA, and why histones are basic proteins. These mechanistic insights will help you apply knowledge to novel questions in your exam.


Reference: David L. Nelson & Michael M. Cox, Lehninger Principles of Biochemistry, 7th or 8th Edition, W.H. Freeman, New York.