Markdown and HTML Conversion Best Practices
The Problem Markdown Spent a Decade Not Solving
John Gruber created Markdown in 2004 with a deliberate philosophy: readable plain text that maps cleanly to HTML. His Markdown.pl Perl script served as the reference implementation, and the accompanying syntax description left many edge cases undefined.
The result: GitHub's Markdown, Reddit's Markdown, Stack Overflow's Markdown, and every blogging platform's Markdown behaved slightly differently. The same input could render differently depending on the parser. "Markdown" was more of a family of dialects than a standard.
CommonMark's formalization and GitHub Flavored Markdown's adoption have largely resolved this. But understanding which Markdown variant you're targeting still matters.
CommonMark: Formalizing the Standard
History
Around 2012, John MacFarlane (author of pandoc) and collaborators began work on a rigorously specified Markdown standard, accounting for every edge case. It launched as CommonMark in 2014.
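The CommonMark Spec contains 652 specification examples in recent versions, each with defined expected output. Conforming parsers include cmark (the C reference implementation), commonmark.js (the JavaScript reference implementation), and commonmark.py. marked (JavaScript) targets CommonMark and GFM but does not claim to pass every spec example.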
CommonMark Syntax
````markdown
# Heading 1
## Heading 2

Paragraphs are separated by blank lines.
A line break within a paragraph is part of the same paragraph.

**bold**, *italic*

- Unordered list
- Another item
  - Nested item (two spaces or tab)

1. Ordered list
2. Second item

[Link text](https://example.com)

> Blockquote

`inline code`

```javascript
// Fenced code block
const x = 1;
```
````
CommonMark's Strict Rules
CommonMark explicitly resolves the ambiguities that Markdown.pl left undefined. One notable example:
```markdown
## List continuation

- Item 1

  This paragraph is inside the list item.
  (it must align with the first content character of the item)

- Item 2
```
CommonMark uses a "continuation indent" rule (aligning with the first content character of the list item) rather than a fixed 4-space indent, which produces different results from Markdown.pl in some cases.
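The continuation indent can be computed from the list item's first line. The following is a minimal sketch, not a parser: `continuationIndent` is a hypothetical helper name, it ignores empty list items, and it only looks at a single line.

```javascript
// Returns the column at which continuation lines must start to remain
// inside the list item, or null if the line does not open a list item.
function continuationIndent(line) {
  // Up to 3 leading spaces, a bullet or ordered marker, then spaces and content.
  const match = /^( {0,3})([-+*]|\d{1,9}[.)])( +)(?=\S)/.exec(line);
  if (!match) return null;
  const [, lead, marker, gap] = match;
  // Five or more spaces after the marker start an indented code block,
  // so only one of them counts toward the content offset.
  const spaces = gap.length >= 5 ? 1 : gap.length;
  return lead.length + marker.length + spaces;
}

console.log(continuationIndent('- Item 1'));        // 2
console.log(continuationIndent('1. Ordered list')); // 3
console.log(continuationIndent('10. Tenth item'));  // 4
```

Under a fixed 4-space rule all three items would need the same indent; here the required indent follows the marker width.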
GFM: GitHub Flavored Markdown
GitHub Flavored Markdown (GFM) extends CommonMark with features needed for GitHub's use cases. The GFM Spec is published by GitHub and has become the de facto Markdown standard for developer-facing content.
Key GFM Extensions
Tables
CommonMark has no table syntax. GFM adds it:
| Algorithm | Speed | Security |
|-----------|-------|----------|
| SHA-256 | Medium | High |
| BLAKE3 | High | High |
Task Lists
- [x] Completed task
- [ ] Pending task
- [ ] Another pending task
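As a rough illustration of how a renderer might recognize these items, here is a small sketch. `parseTaskItem` is a hypothetical helper, and it only handles bullet list items on a single line (GFM also allows task items in ordered lists).

```javascript
// Parses one line; returns { checked, text } for a task list item, else null.
function parseTaskItem(line) {
  const match = /^\s*[-+*]\s+\[([ xX])\]\s+(.*)$/.exec(line);
  if (!match) return null;
  return { checked: match[1] !== ' ', text: match[2] };
}

console.log(parseTaskItem('- [x] Completed task'));
console.log(parseTaskItem('- [ ] Pending task'));
console.log(parseTaskItem('- Plain list item')); // null
```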
Autolinks
CommonMark requires angle brackets for autolinks (<https://example.com>). GFM's extended autolinks also link bare URLs beginning with http://, https://, or www.
https://github.com → automatically linked (GFM only)
<https://github.com> → works in CommonMark too
Strikethrough
~~strikethrough~~ → <del>strikethrough</del>
Fenced Code Blocks with Language Identifier
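Both CommonMark and GFM support fenced code blocks with an info string after the opening fence. Neither spec mandates how the info string is rendered, but its first word is typically emitted as a language-* class on the code element, which GitHub uses to select a syntax highlighter.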
```python
def hello():
    print("Hello, World!")
```

---
## How Markdown-to-HTML Conversion Works
### The Conversion Pipeline
    Markdown text
        ↓ tokenize (lexical analysis)
    token tree (AST)
        ↓ render
    raw HTML string
        ↓ sanitize   ← CRITICAL
    safe HTML
        ↓ inject into DOM / SSR
    rendered page
Omitting the sanitization step is an XSS vulnerability.
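To make the stages concrete, here is a toy, dependency-free sketch. It is not how any real parser works: it supports only ATX headings and paragraphs, and all function names are hypothetical. Instead of a separate sanitize stage it escapes every piece of text at render time, which is only viable because this toy supports no inline HTML at all.

```javascript
// Escape characters that are significant in HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Stage 1: split the source into block tokens.
function tokenize(markdown) {
  return markdown
    .split(/\n{2,}/)
    .map((block) => block.trim())
    .filter((block) => block.length > 0)
    .map((block) => {
      const heading = /^(#{1,6})\s+(.*)$/.exec(block);
      return heading
        ? { type: 'heading', depth: heading[1].length, text: heading[2] }
        : { type: 'paragraph', text: block };
    });
}

// Stage 2: render tokens to HTML, escaping all text on the way out.
function render(tokens) {
  return tokens
    .map((t) =>
      t.type === 'heading'
        ? `<h${t.depth}>${escapeHtml(t.text)}</h${t.depth}>`
        : `<p>${escapeHtml(t.text)}</p>`
    )
    .join('\n');
}

function toyMarkdownToHtml(markdown) {
  return render(tokenize(markdown));
}

console.log(toyMarkdownToHtml('## Hello\n\n<script>alert(1)</script>'));
// <h2>Hello</h2>
// <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Real parsers pass inline HTML through, so escaping at render time is not an option for them; that is why the pipeline needs a dedicated sanitize step after rendering.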
### JavaScript Implementation
```javascript
import { marked } from 'marked';
import DOMPurify from 'dompurify';

// ❌ Dangerous: user input directly inserted as HTML
const rawHtml = marked.parse(userInput);
element.innerHTML = rawHtml; // XSS risk

// ✅ Safe: sanitize before insertion
const safeHtml = DOMPurify.sanitize(marked.parse(userInput));
element.innerHTML = safeHtml;
```

```javascript
// marked configuration
import { marked } from 'marked';

marked.setOptions({
  gfm: true,     // GitHub Flavored Markdown
  breaks: false, // don't convert single newlines to <br> (CommonMark-compliant)
});

const html = marked.parse('## Hello\n\n**World**');
```
XSS Vulnerabilities in Markdown Conversion
Markdown-to-HTML conversion is a common path to XSS in content management features, because developers often underestimate how much control Markdown gives the author over the resulting HTML.
Attack Vectors
1. Inline HTML Pass-Through
Most Markdown parsers pass inline HTML through unchanged:
Normal paragraph.
<script>alert('XSS!')</script>
<img src="x" onerror="alert('XSS!')">
<a href="javascript:alert('XSS')">Click me</a>
2. javascript: Protocol in Links
[Click here](javascript:alert('XSS'))
This renders as a legitimate-looking anchor tag that executes JavaScript on click.
3. Event Handler Injection
<div onmouseover="alert('XSS')">Hover over me</div>
Mitigation: DOMPurify
DOMPurify parses the markup with the browser's own HTML parser (or jsdom on the server), then strips every element and attribute that is not on its allowlist.
```javascript
import DOMPurify from 'dompurify';

// Register hooks before calling sanitize(); they only affect later calls.
// Add rel="noopener noreferrer" to external links
DOMPurify.addHook('afterSanitizeAttributes', (node) => {
  if (node.tagName === 'A') {
    const href = node.getAttribute('href') || '';
    // Note: this simple check also treats protocol-relative URLs (//host) as internal.
    if (href && !href.startsWith('#') && !href.startsWith('/')) {
      node.setAttribute('target', '_blank');
      node.setAttribute('rel', 'noopener noreferrer');
    }
  }
});

// Default: allows common safe HTML tags and attributes
const clean = DOMPurify.sanitize(dirty);

// Stricter: explicit allowlist
const strictClean = DOMPurify.sanitize(dirty, {
  ALLOWED_TAGS: [
    'p', 'br', 'strong', 'em', 'del', 'a', 'ul', 'ol', 'li',
    'code', 'pre', 'blockquote', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'table', 'thead', 'tbody', 'tr', 'th', 'td', 'img',
  ],
  ALLOWED_ATTR: ['href', 'src', 'alt', 'title', 'class', 'id'],
  ALLOW_DATA_ATTR: false, // disallow data-* attributes
});
```
Server-Side Sanitization (Node.js)
```javascript
const sanitizeHtml = require('sanitize-html');
const { marked } = require('marked');

function markdownToSafeHtml(markdown) {
  const raw = marked.parse(markdown);
  return sanitizeHtml(raw, {
    allowedTags: sanitizeHtml.defaults.allowedTags.concat([
      'h1', 'h2', 'h3', 'del', 'img',
    ]),
    allowedAttributes: {
      ...sanitizeHtml.defaults.allowedAttributes,
      a: ['href', 'name', 'target', 'rel'],
      img: ['src', 'alt', 'width', 'height'],
    },
    allowedSchemes: ['http', 'https', 'mailto'], // excludes javascript:
  });
}
```
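The scheme allowlist is what blocks `javascript:` links. As a rough sketch of the idea (not how sanitize-html implements it), a URL can be parsed with the standard `URL` class and its protocol compared against an allowlist. `isSafeHref`, `ALLOWED_PROTOCOLS`, and the placeholder base URL are all assumptions made for this illustration.

```javascript
const ALLOWED_PROTOCOLS = new Set(['http:', 'https:', 'mailto:']);

// Returns true if the href resolves to an allowlisted scheme.
function isSafeHref(href) {
  try {
    // The base only lets relative links resolve; its host is never used.
    const url = new URL(href, 'https://placeholder.invalid/');
    return ALLOWED_PROTOCOLS.has(url.protocol);
  } catch {
    return false; // unparseable URLs are rejected
  }
}

console.log(isSafeHref('https://example.com'));      // true
console.log(isSafeHref('/docs/intro'));              // true
console.log(isSafeHref("javascript:alert('XSS')"));  // false
console.log(isSafeHref(' JaVaScRiPt:alert(1)'));     // false
```

Parsing before comparing matters: the URL parser normalizes case and strips leading whitespace and embedded tabs, which are the usual tricks for slipping past a naive `startsWith('javascript:')` check.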
HTML-to-Markdown Conversion
Reversing the conversion — HTML to Markdown — is inherently lossy. HTML has constructs for which Markdown has no equivalent.
Information Loss Table
| HTML | What Happens in Markdown |
|---|---|
| `<span style="color: red">` | Style lost |
| `<div class="alert">` | Class lost |
| `<figure>` / `<figcaption>` | Becomes plain paragraph |
| `<td colspan="2">` | Merged cells cannot be represented |
| `<details>` / `<summary>` | May convert partially (GFM extended parsers) |
turndown Library (JavaScript)
```javascript
const TurndownService = require('turndown');
const { gfm } = require('@joplin/turndown-plugin-gfm');

const td = new TurndownService({
  headingStyle: 'atx',      // # style
  codeBlockStyle: 'fenced', // ``` style
  bulletListMarker: '-',
});
td.use(gfm); // enable table and strikethrough conversion

const markdown = td.turndown(`
  <h1>Hello</h1>
  <p>This is a <strong>test</strong>.</p>
  <table>
    <tr><th>Name</th><th>Age</th></tr>
    <tr><td>Alice</td><td>28</td></tr>
  </table>
`);
// Roughly: # Hello\n\nThis is a **test**.\n\n| Name | Age |\n| --- | --- |\n| Alice | 28 |
```
Static Site Generators: Production Patterns
Next.js with the unified Ecosystem
The unified / remark / rehype ecosystem processes Markdown at the AST level, enabling flexible transformation pipelines.
```javascript
// lib/markdown.js
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import remarkRehype from 'remark-rehype';
import rehypeSanitize from 'rehype-sanitize'; // XSS mitigation
import rehypeHighlight from 'rehype-highlight';
import rehypeStringify from 'rehype-stringify';

export async function markdownToHtml(markdown) {
  const result = await unified()
    .use(remarkParse)
    .use(remarkGfm)
    .use(remarkRehype)
    .use(rehypeSanitize) // always include this
    .use(rehypeHighlight)
    .use(rehypeStringify)
    .process(markdown);
  return String(result);
}
```

```tsx
// app/blog/[slug]/page.tsx
import { markdownToHtml } from '@/lib/markdown';

// getPost is assumed to be defined elsewhere in the project.
export default async function BlogPost({ params }) {
  const post = await getPost(params.slug);
  const content = await markdownToHtml(post.body);
  return (
    <article
      dangerouslySetInnerHTML={{ __html: content }}
      className="prose prose-neutral max-w-none"
    />
  );
}
```
Front Matter Parsing
```markdown
---
title: "Article Title"
date: "2026-04-15"
tags: ["Markdown", "HTML"]
---
# Body
Article content begins here.
```

```javascript
import matter from 'gray-matter';
import { readFileSync } from 'fs';

const fileContent = readFileSync('post.md', 'utf-8');
const { data: frontmatter, content: body } = matter(fileContent);
console.log(frontmatter.title); // "Article Title"
console.log(frontmatter.date);  // "2026-04-15": a string because the value is quoted; an unquoted YAML date is parsed into a Date object
```
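For intuition about what a front matter parser does, here is a dependency-free sketch. It is a toy under stated assumptions, not a YAML parser and not gray-matter's implementation: `splitFrontMatter` is a hypothetical name, only flat `key: value` lines are read, and every value stays a string (the `tags` array is left unparsed).

```javascript
// Splits a document into front matter fields and body.
function splitFrontMatter(source) {
  // Front matter must start on the first line and end with a --- line.
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) return { data: {}, content: source };

  const data = {};
  for (const line of match[1].split(/\r?\n/)) {
    const pair = /^([A-Za-z_][\w-]*):\s*(.*)$/.exec(line);
    if (!pair) continue;
    // Strip one pair of surrounding double quotes; no type conversion.
    data[pair[1]] = pair[2].replace(/^"(.*)"$/, '$1');
  }
  return { data, content: match[2] };
}

const post = splitFrontMatter(
  '---\ntitle: "Article Title"\ndate: "2026-04-15"\n---\n# Body\nArticle content begins here.\n'
);
console.log(post.data.title); // Article Title
console.log(post.content);    // "# Body\nArticle content begins here.\n"
```

In production a real YAML parser is the safer choice, since YAML's quoting, nesting, and type rules go far beyond what this sketch covers.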
Best Practices
1. Declare Which Dialect You Use
Document whether your project uses CommonMark, GFM, or another variant. This prevents subtle rendering differences when multiple authors contribute.
2. Use a Linter
markdownlint enforces consistent Markdown style. A VS Code extension is available.
```jsonc
// .markdownlint.jsonc (the .jsonc variant permits comments)
{
  "MD013": false, // disable line length limit
  "MD033": false, // allow inline HTML when needed
  "MD041": true   // require h1 at start of file
}
```
3. Always Specify Language in Code Blocks
````markdown
```javascript   ← specify language
const x = 1;
```
````
4. Write Alt Text for Images
`![Descriptive alt text](image.png)` ← good
`![](image.png)` ← missing alt text: accessibility and SEO issue
The WHATWG HTML Living Standard requires an alt attribute on img elements in almost all cases (decorative images use alt=""). Screen readers rely on it.
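A check for empty alt text is easy to automate. The sketch below uses a regular expression over the raw Markdown; `findImagesMissingAlt` is a hypothetical helper, and because it does not parse the document it will also flag image syntax that appears inside code blocks.

```javascript
// Returns every inline image whose alt text is empty or whitespace-only.
function findImagesMissingAlt(markdown) {
  const pattern = /!\[\s*\]\([^)\n]*\)/g;
  return markdown.match(pattern) ?? [];
}

console.log(
  findImagesMissingAlt('Intro ![](chart.png) and ![Sales chart](sales.png)')
);
// [ '![](chart.png)' ]
```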
5. Sanitize User-Generated Content — Without Exception
If users can write Markdown that gets rendered as HTML, XSS protection is mandatory. No sanitization library is perfect, but DOMPurify and rehype-sanitize are actively maintained and widely audited.
Summary
| Topic | Recommendation |
|---|---|
| Base spec | CommonMark |
| GitHub / developer content | GFM (CommonMark + tables, task lists, etc.) |
| Markdown → HTML | marked or unified; always sanitize |
| User-generated content | DOMPurify (client) or rehype-sanitize (server) |
| HTML → Markdown | turndown (understand information loss) |
| Static sites | unified ecosystem (remark + rehype) |
Markdown's strength is its balance between ease of authorship and expressive output. Understanding the CommonMark/GFM distinction and treating sanitization as non-negotiable ensures safe, consistent content handling across any platform.
References
- CommonMark Spec (MacFarlane et al.)
- GitHub Flavored Markdown Spec
- WHATWG HTML Living Standard: img element, alt attribute
- DOMPurify on GitHub (Mario Heiderich / Cure53)
- unified / remark / rehype ecosystem
- Daring Fireball: Markdown (John Gruber, 2004)
