Markdown and HTML Conversion Best Practices

Toolsbase Editorial Team
Tags: Markdown, HTML, CommonMark, GFM, Static Site, XSS

The Problem Markdown Spent a Decade Not Solving

John Gruber created Markdown in 2004 with a deliberate philosophy: readable plain text that maps cleanly to HTML. His Markdown.pl Perl script served as the reference implementation, and its informal syntax description left many edge cases undefined.

The result: GitHub's Markdown, Reddit's Markdown, Stack Overflow's Markdown, and every blogging platform's Markdown behaved slightly differently. The same input could render differently depending on the parser. "Markdown" was more of a family of dialects than a standard.

CommonMark's formalization and GitHub Flavored Markdown's adoption have largely resolved this. But understanding which Markdown variant you're targeting still matters.


CommonMark: Formalizing the Standard

History

Around 2012, John MacFarlane (author of pandoc) and collaborators began work on a rigorously specified Markdown standard, accounting for every edge case. It launched as CommonMark in 2014.

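The CommonMark Spec contains 652 specification examples, each with defined expected output. Conforming parsers include cmark (the C reference implementation), commonmark.js (the JavaScript reference implementation), and commonmark.py (Python). marked (JavaScript) also targets the spec, though it tracks it closely rather than passing every example.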
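Because each spec example pairs a Markdown input with its expected HTML, conformance can be checked mechanically. The following is a minimal sketch, assuming the examples are available as objects with `example`, `markdown`, and `html` fields; `runSpecExamples`, `sampleExamples`, and `toyRender` are hypothetical names introduced for illustration, and a real run would load the full set of spec examples and pass in the parser under test.

```javascript
// Run a renderer against spec-style examples and collect the failures.
// Assumption: each example looks like { example: 1, markdown: "...", html: "..." }.
function runSpecExamples(examples, render) {
  const failures = [];
  for (const { example, markdown, html } of examples) {
    const actual = render(markdown);
    if (actual !== html) {
      failures.push({ example, expected: html, actual });
    }
  }
  return {
    total: examples.length,
    passed: examples.length - failures.length,
    failures,
  };
}

// Hypothetical two-example sample, not copied from the spec.
const sampleExamples = [
  { example: 1, markdown: "hello\n", html: "<p>hello</p>\n" },
  { example: 2, markdown: "*hi*\n", html: "<p><em>hi</em></p>\n" },
];

// Toy renderer that only wraps text in <p>, so the emphasis example fails.
const toyRender = (md) => `<p>${md.trim()}</p>\n`;

const report = runSpecExamples(sampleExamples, toyRender);
console.log(`${report.passed}/${report.total} passed`); // 1/2 passed
```

Comparing strings exactly is the simplest choice; a more forgiving harness might normalize insignificant whitespace in the HTML before comparing.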

CommonMark Syntax

# Heading 1
## Heading 2

Paragraphs are separated by blank lines.
A line break within a paragraph is part of the same paragraph.

**bold**, *italic*

- Unordered list
- Another item
  - Nested item (two spaces or tab)

1. Ordered list
2. Second item

[Link text](https://example.com)

![Alt text](image.png)

> Blockquote

`inline code`

```javascript
// Fenced code block
const x = 1;
```

CommonMark's Strict Rules

CommonMark explicitly resolves the ambiguities that Markdown.pl left undefined. One notable example:

## List continuation

- Item 1

  This paragraph is inside the list item.
  (must be indented to align with the item's text)

- Item 2

CommonMark uses a "continuation indent" rule (aligning with the first content character of the list item) rather than a fixed 4-space indent, which produces different results from Markdown.pl in some cases.
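The continuation indent can be computed from the list item's first line. The sketch below is a simplified reading of the rule, not the spec's full algorithm: it ignores tabs and nested containers, and `continuationIndent` is a hypothetical helper name.

```javascript
// Compute the column at which continuation paragraphs must start for a list item.
// Returns null when the line does not open a list item.
function continuationIndent(line) {
  // Up to 3 leading spaces, then a bullet (-, +, *) or 1-9 digits followed by . or )
  const match = /^( {0,3})([-+*]|\d{1,9}[.)])( *)(.*)$/.exec(line);
  if (!match) return null;
  const [, lead, marker, spaces, rest] = match;
  // "-text" has no space after the marker, so it is not a list item
  if (spaces.length === 0 && rest.length > 0) return null;
  // 1-4 spaces after the marker count toward the indent; an empty item, or
  // 5+ spaces (which start an indented code block), count as a single space.
  const padding =
    spaces.length >= 1 && spaces.length <= 4 && rest.length > 0
      ? spaces.length
      : 1;
  return lead.length + marker.length + padding;
}

console.log(continuationIndent("- Item 1"));   // 2
console.log(continuationIndent("10. Item"));   // 4
console.log(continuationIndent("Not a list")); // null
```

Under this rule a paragraph continuing `- Item 1` needs two spaces of indentation, while one continuing `10. Item` needs four, which is why a fixed 4-space convention does not always agree with CommonMark.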


GFM: GitHub Flavored Markdown

GitHub Flavored Markdown (GFM) extends CommonMark with features needed for GitHub's use cases. The GFM Spec is published by GitHub and has become the de facto Markdown standard for developer-facing content.

Key GFM Extensions

Tables

CommonMark has no table syntax. GFM adds it:

| Algorithm | Speed | Security |
|-----------|-------|----------|
| SHA-256   | Medium | High    |
| BLAKE3    | High   | High    |

Task Lists

- [x] Completed task
- [ ] Pending task
- [ ] Another pending task
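For tooling that needs to read task state out of a document, the syntax is simple enough to match line by line. This is a minimal sketch with a hypothetical `parseTaskItems` helper; it handles single-line items only and is not a substitute for a real GFM parser.

```javascript
// Parse GFM task list items into { checked, text } objects.
function parseTaskItems(markdown) {
  const items = [];
  for (const line of markdown.split("\n")) {
    // List marker, whitespace, then [ ] or [x] / [X], whitespace, and the item text
    const match = /^\s*(?:[-+*]|\d{1,9}[.)])\s+\[([ xX])\]\s+(.*)$/.exec(line);
    if (match) {
      items.push({ checked: match[1] !== " ", text: match[2] });
    }
  }
  return items;
}

const tasks = parseTaskItems(
  "- [x] Completed task\n- [ ] Pending task\n- [ ] Another pending task"
);
console.log(tasks.filter((t) => !t.checked).length); // 2
```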

Autolinks

CommonMark requires angle brackets for autolinks (<https://example.com>). GFM also links bare URLs that begin with http://, https://, or www.

https://github.com   → automatically linked (GFM only)
<https://github.com> → works in CommonMark too

Strikethrough

~~strikethrough~~ → <del>strikethrough</del>

Fenced Code Blocks with Language Identifier

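Fenced code blocks with an info string are part of CommonMark itself, so both specs support them. The first word of the info string conventionally names the language, which GitHub and most renderers use to select syntax highlighting.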

```python
def hello():
    print("Hello, World!")
```

How Markdown-to-HTML Conversion Works

The Conversion Pipeline

Markdown text
  ↓ tokenize (lexical analysis)
token tree (AST)
  ↓ render
raw HTML string
  ↓ sanitize ← CRITICAL
safe HTML
  ↓ inject into DOM / SSR
rendered page


Omitting the sanitization step is an XSS vulnerability.
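One way to keep the sanitize step from being skipped is to make it a required part of the pipeline's construction. The sketch below assumes nothing about a particular parser: `createMarkdownPipeline` is a hypothetical helper, and the three stages are stand-ins that only record the order in which they run. In real use the stages would wrap a parser and a sanitizer library.

```javascript
// Compose the stages in a fixed order and refuse to build a pipeline
// when any stage, including the sanitizer, is missing.
function createMarkdownPipeline({ tokenize, render, sanitize }) {
  for (const [name, stage] of Object.entries({ tokenize, render, sanitize })) {
    if (typeof stage !== "function") {
      throw new TypeError(`Missing pipeline stage: ${name}`);
    }
  }
  return (markdown) => sanitize(render(tokenize(markdown)));
}

// Stand-in stages that only record the order they ran in.
const trace = [];
const toHtml = createMarkdownPipeline({
  tokenize: (text) => {
    trace.push("tokenize");
    return [{ type: "paragraph", text }];
  },
  render: (tokens) => {
    trace.push("render");
    return tokens.map((t) => `<p>${t.text}</p>`).join("");
  },
  sanitize: (html) => {
    trace.push("sanitize");
    return html; // placeholder: a real sanitizer would strip unsafe markup here
  },
});

toHtml("Hello");
console.log(trace.join(" -> ")); // tokenize -> render -> sanitize
```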

JavaScript Implementation

import { marked } from 'marked';
import DOMPurify from 'dompurify';

// ❌ Dangerous: user input directly inserted as HTML
const rawHtml = marked.parse(userInput);
element.innerHTML = rawHtml; // XSS risk

// ✅ Safe: sanitize before insertion
const safeHtml = DOMPurify.sanitize(marked.parse(userInput));
element.innerHTML = safeHtml;

// marked configuration
import { marked } from 'marked';

marked.setOptions({
  gfm: true,     // GitHub Flavored Markdown
  breaks: false, // don't convert single newlines to <br> (CommonMark-compliant)
});

const html = marked.parse('## Hello\n\n**World**');

XSS Vulnerabilities in Markdown Conversion

Markdown-to-HTML conversion is a common path to XSS in content management features, because developers often underestimate how much control Markdown gives the author.

Attack Vectors

1. Inline HTML Pass-Through

Most Markdown parsers pass inline HTML through unchanged:

Normal paragraph.

<script>alert('XSS!')</script>

<img src="x" onerror="alert('XSS!')">

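<a href="javascript:alert('XSS')">Click me</a>

2. javascript: URLs in Links

Markdown link syntax accepts any URL scheme unless the parser or sanitizer filters it:

[Click here](javascript:alert('XSS'))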

This renders as a legitimate-looking anchor tag that executes JavaScript on click.
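A scheme allowlist is the usual defense against this vector. The following is a minimal sketch using the standard `URL` class; `isSafeHref`, the `SAFE_PROTOCOLS` set, and the placeholder base URL are assumptions made for illustration, and a check like this complements a full HTML sanitizer rather than replacing one.

```javascript
// Allowlist of URL schemes considered safe for links (an assumption; adjust per application).
const SAFE_PROTOCOLS = new Set(["http:", "https:", "mailto:"]);

// Return true when href is relative or uses an allowlisted scheme.
function isSafeHref(href) {
  let url;
  try {
    // The placeholder base lets relative URLs such as "/docs" or "#top" parse.
    url = new URL(href, "https://placeholder.invalid/");
  } catch {
    return false; // unparseable input is rejected
  }
  return SAFE_PROTOCOLS.has(url.protocol);
}

console.log(isSafeHref("javascript:alert('XSS')")); // false
console.log(isSafeHref("https://example.com"));     // true
console.log(isSafeHref("/docs/intro"));             // true
```

Parsing with `URL` instead of matching on the raw string means mixed case and embedded whitespace are normalized before the check, so variants such as `JaVaScRiPt:` are rejected as well.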

3. Event Handler Injection

<div onmouseover="alert('XSS')">Hover over me</div>

Mitigation: DOMPurify

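DOMPurify parses the input with the browser's own HTML parser (or a DOM implementation such as jsdom on the server), then removes every element and attribute that is not on its allowlist.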

import DOMPurify from 'dompurify';

// Default: allows common safe HTML tags and attributes
const clean = DOMPurify.sanitize(dirty);

// Stricter: explicit allowlist
const strictClean = DOMPurify.sanitize(dirty, {
  ALLOWED_TAGS: [
    'p', 'br', 'strong', 'em', 'del', 'a', 'ul', 'ol', 'li',
    'code', 'pre', 'blockquote', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'table', 'thead', 'tbody', 'tr', 'th', 'td', 'img',
  ],
  ALLOWED_ATTR: ['href', 'src', 'alt', 'title', 'class', 'id'],
  ALLOW_DATA_ATTR: false, // disallow data-* attributes
});

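// Open external links in a new tab and add rel="noopener noreferrer"
DOMPurify.addHook('afterSanitizeAttributes', (node) => {
  if (node.tagName === 'A') {
    const href = node.getAttribute('href') || '';
    // Fragments and root-relative paths are internal; protocol-relative URLs (//host) are not
    const isInternal =
      href.startsWith('#') || (href.startsWith('/') && !href.startsWith('//'));
    if (href && !isInternal) {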
      node.setAttribute('target', '_blank');
      node.setAttribute('rel', 'noopener noreferrer');
    }
  }
});

Server-Side Sanitization (Node.js)

const sanitizeHtml = require('sanitize-html');
const { marked } = require('marked');

function markdownToSafeHtml(markdown) {
  const raw = marked.parse(markdown);
  return sanitizeHtml(raw, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat([
      'h1', 'h2', 'h3', 'del', 'img',
    ]),
    allowedAttributes: {
      ...sanitizeHtml.defaults.allowedAttributes,
      a: ['href', 'name', 'target', 'rel'],
      img: ['src', 'alt', 'width', 'height'],
    },
    allowedSchemes: ['http', 'https', 'mailto'],  // excludes javascript:
  });
}

Convert and preview Markdown with the Markdown-HTML Converter.


HTML-to-Markdown Conversion

Reversing the conversion — HTML to Markdown — is inherently lossy. HTML has constructs for which Markdown has no equivalent.

Information Loss Table

| HTML | What Happens in Markdown |
| --- | --- |
| <span style="color: red"> | Style lost |
| <div class="alert"> | Class lost |
| <figure><figcaption> | Becomes plain paragraph |
| <table colspan> | Merged cells cannot be represented |
| <details>/<summary> | May convert partially (GFM extended parsers) |

turndown Library (JavaScript)

const TurndownService = require('turndown');
const { gfm } = require('@joplin/turndown-plugin-gfm');

const td = new TurndownService({
  headingStyle: 'atx',       // # style
  codeBlockStyle: 'fenced',  // ``` style
  bulletListMarker: '-',
});

td.use(gfm); // enable table and strikethrough conversion

const markdown = td.turndown(`
  <h1>Hello</h1>
  <p>This is a <strong>test</strong>.</p>
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Alice</td><td>28</td></tr>
  </table>
`);
// # Hello\n\nThis is a **test**.\n\n| Name | Age |\n| --- | --- |\n| Alice | 28 |

Static Site Generators: Production Patterns

Next.js with the unified Ecosystem

The unified / remark / rehype ecosystem processes Markdown at the AST level, enabling flexible transformation pipelines.

// lib/markdown.js
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import remarkRehype from 'remark-rehype';
import rehypeSanitize from 'rehype-sanitize'; // XSS mitigation
import rehypeHighlight from 'rehype-highlight';
import rehypeStringify from 'rehype-stringify';

export async function markdownToHtml(markdown) {
  const result = await unified()
    .use(remarkParse)
    .use(remarkGfm)
    .use(remarkRehype)
    .use(rehypeSanitize)      // always include this
    .use(rehypeHighlight)
    .use(rehypeStringify)
    .process(markdown);

  return String(result);
}

// app/blog/[slug]/page.tsx
import { markdownToHtml } from '@/lib/markdown';
// getPost is assumed to be your own data-access helper; it is not defined in this article

export default async function BlogPost({ params }) {
  const post = await getPost(params.slug);
  const content = await markdownToHtml(post.body);

  return (
    <article
      dangerouslySetInnerHTML={{ __html: content }}
      className="prose prose-neutral max-w-none"
    />
  );
}

Front Matter Parsing

---
title: "Article Title"
date: "2026-04-15"
tags: ["Markdown", "HTML"]
---

# Body

Article content begins here.

import matter from 'gray-matter';
import { readFileSync } from 'fs';

const fileContent = readFileSync('post.md', 'utf-8');
const { data: frontmatter, content: body } = matter(fileContent);

console.log(frontmatter.title); // "Article Title"
console.log(frontmatter.date);  // "2026-04-15" (a string, because the value is quoted; unquoted YAML dates parse to Date objects)
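To make the delimiting convention concrete, here is a minimal sketch of how the front matter block can be separated from the body. `splitFrontMatter` is a hypothetical helper that only splits the text; it does not parse the YAML and does not handle an empty front matter block, both of which are better left to a library such as gray-matter.

```javascript
// Split a Markdown file into its raw front matter text and body.
function splitFrontMatter(source) {
  // Front matter starts on the first line with "---" and ends at the next line
  // consisting of "---" (optionally followed by spaces or tabs).
  const pattern = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)([\s\S]*)$/;
  const match = pattern.exec(source);
  if (!match) return { frontMatter: null, body: source };
  return { frontMatter: match[1], body: match[2] };
}

const sampleFile = '---\ntitle: "Article Title"\n---\n\n# Body\n';
const parts = splitFrontMatter(sampleFile);
console.log(parts.frontMatter); // title: "Article Title"
```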

Best Practices

1. Declare Which Dialect You Use

Document whether your project uses CommonMark, GFM, or another variant. This prevents subtle rendering differences when multiple authors contribute.

2. Use a Linter

markdownlint enforces consistent Markdown style. A VS Code extension is available.

// .markdownlint.jsonc (the .jsonc extension permits comments)
{
  "MD013": false,  // disable line length limit
  "MD033": false,  // allow inline HTML when needed
  "MD041": true    // require h1 at start of file
}

3. Always Specify Language in Code Blocks

```javascript    ← specify language
const x = 1;
```
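A check for unlabeled fences is easy to automate (markdownlint covers it with rule MD040). As an illustration of what such a check does, the sketch below tracks opening and closing fences line by line; `findUnlabeledFences` is a hypothetical helper and ignores edge cases such as fences nested inside blockquotes or list items.

```javascript
// Return the 1-based line numbers of opening code fences that have no info string.
function findUnlabeledFences(markdown) {
  const unlabeled = [];
  let open = null; // { char, length } of the currently open fence, if any
  markdown.split("\n").forEach((line, index) => {
    const match = /^ {0,3}(`{3,}|~{3,})(.*)$/.exec(line);
    if (!match) return;
    const [, fence, rest] = match;
    if (open === null) {
      // Opening fence: remember it, and flag it when nothing follows the marker
      open = { char: fence[0], length: fence.length };
      if (rest.trim() === "") unlabeled.push(index + 1);
    } else if (
      fence[0] === open.char &&
      fence.length >= open.length &&
      rest.trim() === ""
    ) {
      open = null; // closing fence: same character, at least as long, nothing after it
    }
  });
  return unlabeled;
}

const doc = "```javascript\nconst x = 1;\n```\n\n```\nconst y = 2;\n```";
console.log(findUnlabeledFences(doc)); // [ 5 ]
```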

4. Write Alt Text for Images

![Diagram showing the request lifecycle](diagram.png)  ← good

![](diagram.png)  ← missing alt text: accessibility and SEO issue

The WHATWG HTML Living Standard specifies that alt is a required attribute on img elements (decorative images use alt=""). Screen readers rely on it.
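Images with empty alt text can be listed for review with a short script. This is a minimal sketch that handles inline image syntax only; `findImagesMissingAlt` is a hypothetical helper, and because empty alt text is legitimate for decorative images, its results are candidates to check rather than definite errors.

```javascript
// List the image sources in a Markdown string whose alt text is empty.
function findImagesMissingAlt(markdown) {
  const missing = [];
  // Matches inline images: ![alt](src "optional title")
  const imagePattern = /!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)/g;
  for (const [, alt, src] of markdown.matchAll(imagePattern)) {
    if (alt.trim() === "") missing.push(src);
  }
  return missing;
}

const page =
  "![Diagram showing the request lifecycle](diagram.png)\n\n![](chart.png)";
console.log(findImagesMissingAlt(page)); // [ 'chart.png' ]
```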

5. Sanitize User-Generated Content — Without Exception

If users can write Markdown that gets rendered as HTML, XSS protection is mandatory. No sanitization library is perfect, but DOMPurify and rehype-sanitize are actively maintained and widely audited.


Summary

| Topic | Recommendation |
| --- | --- |
| Base spec | CommonMark |
| GitHub / developer content | GFM (CommonMark + tables, task lists, etc.) |
| Markdown → HTML | marked or unified; always sanitize |
| User-generated content | DOMPurify (client) or rehype-sanitize (server) |
| HTML → Markdown | turndown (understand information loss) |
| Static sites | unified ecosystem (remark + rehype) |

Markdown's strength is its balance between ease of authorship and expressive output. Understanding the CommonMark/GFM distinction and treating sanitization as non-negotiable ensures safe, consistent content handling across any platform.

