HTML Generation

This chapter explains how pulldown-cmark generates HTML output from Markdown events.

Overview

HTML generation is implemented in the html module and consists of two main components:

  1. The HtmlWriter struct, which manages state and writes HTML tags
  2. Helper functions for converting events to HTML tags and handling special cases

The HTML generation process works by:

  1. Taking an iterator of Markdown events
  2. Converting each event into corresponding HTML tags
  3. Managing state for special cases such as tables, footnotes, and metadata blocks
  4. Writing the HTML tags to the provided output
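The four steps above can be illustrated with a deliberately small model. The `Tag` and `Event` enums and the `render` function below are hypothetical stand-ins rather than pulldown-cmark's real types. The sketch leaves out step 3 (state tracking) so the loop stays visible: events come in, each is mapped to a tag or to escaped text, and the result is written to any `fmt::Write` output.

```rust
use std::fmt::{self, Write};

// Hypothetical, simplified stand-ins for pulldown-cmark's Tag and Event types.
#[derive(Debug, Clone, PartialEq)]
pub enum Tag {
    Paragraph,
    Emphasis,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Start(Tag),
    End(Tag),
    Text(String),
}

/// Consumes an iterator of events and writes the corresponding HTML to `out`.
pub fn render<I, W>(events: I, out: &mut W) -> fmt::Result
where
    I: IntoIterator<Item = Event>,
    W: Write,
{
    for event in events {
        match event {
            // Block element: the closing tag ends the line.
            Event::Start(Tag::Paragraph) => out.write_str("<p>")?,
            Event::End(Tag::Paragraph) => out.write_str("</p>\n")?,
            // Inline element: no newlines around it.
            Event::Start(Tag::Emphasis) => out.write_str("<em>")?,
            Event::End(Tag::Emphasis) => out.write_str("</em>")?,
            // Text is escaped so it cannot be mistaken for markup.
            Event::Text(text) => {
                for c in text.chars() {
                    match c {
                        '&' => out.write_str("&amp;")?,
                        '<' => out.write_str("&lt;")?,
                        '>' => out.write_str("&gt;")?,
                        _ => out.write_char(c)?,
                    }
                }
            }
        }
    }
    Ok(())
}
```

Because `render` only requires `fmt::Write`, the same loop can fill a `String` or any other formatter-backed sink.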

The HtmlWriter

The core type responsible for HTML generation is HtmlWriter:

struct HtmlWriter<'a, I, W> {
    iter: I,        // Iterator supplying events
    writer: W,      // Writer to write to
    end_newline: bool,  // Whether last write ended with newline
    in_non_writing_block: bool,  // In metadata block (no output)
    table_state: TableState,  // Current state for table processing
    table_alignments: Vec<Alignment>,  // Column alignments for current table
    table_cell_index: usize,  // Current cell index in table row
    numbers: HashMap<CowStr<'a>, usize>,  // For footnote numbering
}
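The `numbers` field exists because a footnote can be referenced before it is defined, and the reference and the definition must show the same number. A minimal sketch of the idea, assuming numbers are handed out in order of first appearance. It uses `String` keys instead of `CowStr`, and `FootnoteNumbers` is a hypothetical name, not a library type.

```rust
use std::collections::HashMap;

/// Hypothetical helper: assigns 1-based numbers to footnote labels
/// in the order in which each label is first seen.
#[derive(Default)]
pub struct FootnoteNumbers {
    numbers: HashMap<String, usize>,
}

impl FootnoteNumbers {
    /// Returns the number for `label`, allocating the next free number if the label is new.
    pub fn number_for(&mut self, label: &str) -> usize {
        let next = self.numbers.len() + 1;
        *self.numbers.entry(label.to_string()).or_insert(next)
    }
}
```

Whichever of `[^note]` or `[^note]: ...` is encountered first fixes the number, and every later lookup returns the same value.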

The writer keeps track of:

  • The current table state (head vs body)
  • Table column alignments
  • Current cell index
  • Footnote numbering
  • Whether we're in a non-writing block like metadata
  • Whether the last write ended with a newline

Event Processing

The main event processing loop lives in HtmlWriter::run(). For each event:

  1. The event is matched and dispatched to the appropriate handler
  2. HTML tags are written based on the event type
  3. State is updated as needed

Key event handling patterns:

Block Elements

Block elements such as paragraphs, headings, and lists are wrapped in HTML tags, with a newline after the closing tag:

match event {
    Start(Tag::Paragraph) => self.write("<p>")?,
    End(TagEnd::Paragraph) => self.write("</p>\n")?,
    // etc
}
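The `end_newline` flag listed among the writer's state is what lets block-level tags begin on a fresh line without doubling up newlines. The sketch below shows the technique with a hypothetical `LineTracker` wrapper. It illustrates the idea under that assumption and is not the library's code.

```rust
use std::fmt::{self, Write};

/// Hypothetical writer wrapper that remembers whether its last write ended in '\n'.
pub struct LineTracker<W: Write> {
    pub inner: W,
    end_newline: bool,
}

impl<W: Write> LineTracker<W> {
    pub fn new(inner: W) -> Self {
        // An empty output counts as being at the start of a line.
        LineTracker { inner, end_newline: true }
    }

    /// Writes `s` and records whether it ended with a newline.
    pub fn write(&mut self, s: &str) -> fmt::Result {
        self.inner.write_str(s)?;
        if !s.is_empty() {
            self.end_newline = s.ends_with('\n');
        }
        Ok(())
    }

    /// Opens a block-level tag on a fresh line, inserting '\n' only when needed.
    pub fn start_block(&mut self, open_tag: &str) -> fmt::Result {
        if !self.end_newline {
            self.write("\n")?;
        }
        self.write(open_tag)
    }
}
```

Tracking a single boolean is cheaper than inspecting the output, which may be a file or socket that cannot be read back.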

Inline Elements

Inline elements like emphasis and links are handled similarly but without newlines:

match event {
    Start(Tag::Emphasis) => self.write("<em>")?,
    End(TagEnd::Emphasis) => self.write("</em>")?,
    // etc
}

Text Content

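Text content is HTML-escaped and written directly to the writer, unless the writer is inside a non-writing block such as metadata:

match event {
    Text(text) => {
        // Text inside a metadata block produces no output
        if !self.in_non_writing_block {
            escape_html_body_text(&mut self.writer, &text)?;
            self.end_newline = text.ends_with('\n');
        }
    }
    // etc
}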
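Body-text escaping only has to neutralize `&`, `<` and `>`, since quotes are harmless outside attribute values. A hypothetical `escape_body_text` sketch of that rule follows. It also copies runs of safe text as whole slices rather than writing one character at a time.

```rust
use std::fmt::{self, Write};

/// Hypothetical body-text escaper: replaces `&`, `<` and `>` with entities
/// and copies the unescaped runs between them as whole slices.
pub fn escape_body_text<W: Write>(out: &mut W, text: &str) -> fmt::Result {
    let mut last = 0;
    for (i, b) in text.bytes().enumerate() {
        let entity = match b {
            b'&' => "&amp;",
            b'<' => "&lt;",
            b'>' => "&gt;",
            _ => continue,
        };
        // The special characters are ASCII, so `i` and `i + 1` are char boundaries.
        out.write_str(&text[last..i])?;
        out.write_str(entity)?;
        last = i + 1;
    }
    // Write whatever follows the last special character.
    out.write_str(&text[last..])
}
```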

Complex Elements

More complex elements like tables require managing state:

match event {
    Start(Tag::Table(alignments)) => {
        // Remember the column alignments for the cells that follow
        self.table_alignments = alignments;
        self.write("<table>")?;
    }
    // etc
}
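Table output depends on three pieces of writer state: whether the current row belongs to the head or the body, the column alignments recorded when the table starts, and the index of the current cell. The sketch below models that state machine with hypothetical `TableWriter`, `Alignment` and `TableState` types. It omits the `<thead>`/`<tbody>` wrappers, and the exact `style` attribute text is an assumption.

```rust
use std::fmt::{self, Write};

// Hypothetical, simplified versions of the alignment and table-state types.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Alignment { None, Left, Center, Right }

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum TableState { Head, Body }

pub struct TableWriter {
    pub out: String,
    state: TableState,
    alignments: Vec<Alignment>,
    cell_index: usize,
}

impl TableWriter {
    /// Stores the column alignments, mirroring what happens at the start of a table.
    pub fn start_table(alignments: Vec<Alignment>) -> Self {
        TableWriter { out: String::from("<table>"), state: TableState::Head, alignments, cell_index: 0 }
    }

    /// Starts a row and resets the cell index; header rows use <th> cells.
    pub fn start_row(&mut self, is_head: bool) {
        self.state = if is_head { TableState::Head } else { TableState::Body };
        self.cell_index = 0;
        self.out.push_str("<tr>");
    }

    /// Writes one cell (text assumed already escaped), picking the tag from the
    /// row state and the style from the alignment of the current column.
    pub fn cell(&mut self, text: &str) -> fmt::Result {
        let tag = match self.state { TableState::Head => "th", TableState::Body => "td" };
        let style = match self.alignments.get(self.cell_index) {
            Some(Alignment::Left) => " style=\"text-align: left\"",
            Some(Alignment::Center) => " style=\"text-align: center\"",
            Some(Alignment::Right) => " style=\"text-align: right\"",
            _ => "",
        };
        write!(self.out, "<{tag}{style}>{text}</{tag}>")?;
        self.cell_index += 1;
        Ok(())
    }

    pub fn end_row(&mut self) { self.out.push_str("</tr>"); }

    pub fn finish(mut self) -> String {
        self.out.push_str("</table>");
        self.out
    }
}
```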

HTML Safety

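The escaping functions are used throughout the library to neutralize special characters: escape_html_body_text() for text content, escape_html() where quotes must also be escaped (for example in attribute values), and escape_href() for link and image URLs. They live in the separate pulldown-cmark-escape crate.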
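URLs need different treatment from body text because they end up inside an `href` attribute. The function below is a hypothetical, simplified href escaper written for illustration. Its safe-character set is an assumption and does not reproduce pulldown-cmark-escape's exact rules.

```rust
use std::fmt::{self, Write};

/// Hypothetical, simplified href escaper (not the library's exact rules).
/// ASCII alphanumerics and common URL punctuation pass through unchanged,
/// `&` becomes `&amp;` so the URL is safe inside a double-quoted attribute,
/// and every other byte (quotes, `<`, `>`, spaces, non-ASCII UTF-8 bytes)
/// is percent-encoded.
pub fn escape_href_sketch<W: Write>(out: &mut W, url: &str) -> fmt::Result {
    const SAFE: &[u8] = b"-_.~!*();:@=+$,/?#[]%";
    for &b in url.as_bytes() {
        if b.is_ascii_alphanumeric() || SAFE.contains(&b) {
            out.write_char(b as char)?;
        } else if b == b'&' {
            out.write_str("&amp;")?;
        } else {
            write!(out, "%{:02X}", b)?;
        }
    }
    Ok(())
}
```

Leaving `%` in the safe set keeps URLs that are already percent-encoded from being encoded twice.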

Writer Interface

The HTML writer is generic over the writer type W, allowing output to:

  • Strings via fmt::Write
  • Files/IO via io::Write

This generic design lets users choose the most efficient output method for their use case. For example:

  • Using String is convenient for in-memory processing and testing
  • Using BufWriter<File> is efficient for writing directly to disk
  • Using a network socket allows streaming HTML over a connection
  • Using a custom writer enables special handling like compression or logging

The StrWrite trait provides a common interface to abstract over these different writers:

pub trait StrWrite {
    type Error;
    fn write_str(&mut self, s: &str) -> Result<(), Self::Error>;
}

This abstraction over the writer type means the HTML generation code can focus on correct tag generation and structure without worrying about the specific output destination. It also allows users to easily integrate pulldown-cmark's HTML output into their existing I/O pipelines.
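To see why a trait with an associated `Error` type is useful, here is a sketch using a hypothetical `StrSink` trait of the same shape as `StrWrite`. A `String` implementation can never fail, while an adapter over `io::Write` surfaces `io::Error`, and generic rendering code works unchanged with both. `StrSink`, `IoSink` and `write_paragraph` are illustrative names only.

```rust
use std::convert::Infallible;
use std::io;

/// Hypothetical trait with the same shape as StrWrite.
pub trait StrSink {
    type Error;
    fn write_str(&mut self, s: &str) -> Result<(), Self::Error>;
}

/// Writing into a String cannot fail, so its error type is Infallible.
impl StrSink for String {
    type Error = Infallible;
    fn write_str(&mut self, s: &str) -> Result<(), Infallible> {
        self.push_str(s);
        Ok(())
    }
}

/// Hypothetical adapter forwarding to any io::Write (files, sockets, Vec<u8>, ...).
pub struct IoSink<W>(pub W);

impl<W: io::Write> StrSink for IoSink<W> {
    type Error = io::Error;
    fn write_str(&mut self, s: &str) -> io::Result<()> {
        self.0.write_all(s.as_bytes())
    }
}

/// Generic rendering code sees only the trait, never the concrete destination.
pub fn write_paragraph<S: StrSink>(sink: &mut S, text: &str) -> Result<(), S::Error> {
    sink.write_str("<p>")?;
    sink.write_str(text)?;
    sink.write_str("</p>\n")
}
```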

Public API

The main public API consists of:

// Signatures from the html module (bodies omitted)

// Write HTML to a String
pub fn push_html<'a, I>(s: &mut String, iter: I)
where I: Iterator<Item = Event<'a>>

// Write HTML to an IO writer
pub fn write_html_io<'a, I, W>(writer: W, iter: I) -> io::Result<()>
where I: Iterator<Item = Event<'a>>,
      W: io::Write

// Write HTML to a fmt writer
pub fn write_html_fmt<'a, I, W>(writer: W, iter: I) -> fmt::Result
where I: Iterator<Item = Event<'a>>,
      W: fmt::Write

Performance Considerations

HTML generation aims to be efficient by:

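  1. Minimizing string allocations by writing string slices directly to the output
  2. Leaving buffering to the caller, who can wrap unbuffered writers in BufWriter
  3. Processing events in a single iterative loop instead of recursing over the document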

Note:

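use std::fs::File;
use std::io::{BufWriter, Write};
use pulldown_cmark::{html, Parser};

fn main() -> std::io::Result<()> {
    let parser = Parser::new("# Title\n\nSome *Markdown* text.");
    // Unbuffered writers (like File) are slow for many small writes;
    // wrap them in BufWriter for better performance
    let mut file = BufWriter::new(File::create("output.html")?);
    html::write_html_io(&mut file, parser)?;
    // Flush explicitly so write errors are reported instead of ignored on drop
    file.flush()?;
    Ok(())
}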

Buffering matters most for large documents, which produce a very large number of small writes.
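The effect of buffering can be measured without touching the disk. This sketch uses a hypothetical `CountingWriter` that stands in for an unbuffered file and records how many `write` calls reach it. Feeding it the many small writes typical of HTML output, first directly and then through `std::io::BufWriter`, shows buffering collapsing them into a few large writes.

```rust
use std::io::{self, Write};

/// Hypothetical io::Write that only counts calls and bytes, standing in for
/// an unbuffered File where each write() call could be a system call.
#[derive(Debug, Default)]
pub struct CountingWriter {
    pub calls: usize,
    pub bytes: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Emits `n` paragraphs as three tag-sized writes each, like an HTML writer would.
pub fn emit_small_writes<W: Write>(w: &mut W, n: usize) -> io::Result<()> {
    for _ in 0..n {
        w.write_all(b"<p>")?;
        w.write_all(b"text")?;
        w.write_all(b"</p>\n")?;
    }
    Ok(())
}
```

Written directly, 1,000 paragraphs cost 3,000 calls on the inner writer; through a default-sized `BufWriter` the same 12,000 bytes arrive in only a handful of calls.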

Customization

The HTML output can be customized by:

  1. Using a custom writer implementation
  2. Preprocessing the event stream
  3. Post-processing the HTML output
  4. Using the parser options to enable/disable features
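Preprocessing the event stream is ordinary iterator composition: an adapter sits between the parser and the HTML writer and rewrites events on the way through. With the real crate this can be done with a `map` over the parser, since the parser is an iterator of events. The sketch below shows the same idea on a hypothetical, cut-down `Event` enum, turning every soft line break into a hard break.

```rust
/// Hypothetical, cut-down event type; the real Event enum has many more variants.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Text(String),
    SoftBreak,
    HardBreak,
}

/// An iterator adapter placed between the parser and the HTML writer.
/// Every soft line break is replaced by a hard break; other events pass through.
pub fn hard_breaks<I>(events: I) -> impl Iterator<Item = Event>
where
    I: IntoIterator<Item = Event>,
{
    events.into_iter().map(|event| match event {
        Event::SoftBreak => Event::HardBreak,
        other => other,
    })
}
```

Because the adapter is lazy, it adds no buffering of its own: each event is rewritten just before the writer consumes it.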