![]() |
VOOZH | about |
On November 8, 2026, Thariq Shihipar, engineering lead for Claude Code at Anthropic, published a long article on X with a provocative title: "Using Claude Code: The Unreasonable Effectiveness of HTML." In the first 16 hours, the post racked up more than 4.4 million views, 8,200 likes, and 15,700 bookmarks, sparking a heated debate on Hacker News, Threads, and LinkedIn. The thesis is simple and counterintuitive: for daily Claude Code users, the old love affair with Markdown carries a hidden cost, and HTML is the format Anthropic itself is adopting as the internal default for plans, code reviews, design systems, and reports.
Thariq Shihipar is not an outside evangelist. He is the engineering lead for Claude Code at Anthropic, the company behind the Opus 4.7 model and the agentic CLI of the same name. Before joining Anthropic, Shihipar co-founded a startup in the Y Combinator W20 batch, passed through Southpark Commons, and worked at the MIT Media Lab. His view, then, reflects the internal practices of a team that uses Claude Code in daily production to build Claude Code itself, not a lab experiment. In March 2026, Shihipar had already published a guide on Claude Code Skills, based on hundreds of skills used internally, which remains an industry reference.
The viewpoint counts double for two reasons. First, those who lead an AI product team tend to see, before everyone else, when a convention stops working. Second, choosing the output format for an agent is not an aesthetic preference, but a decision that affects how much work developers can actually verify and reuse.
Markdown won the LLM format war because it is simple, portable, and predates the rise of agents. Shihipar does not deny it. He argues, however, that agents grew faster than the format, and that today Markdown caps the value of their output. His argument breaks down into five points, summarized below before we dig into each.
Alongside these five points, Shihipar highlights a sixth operational advantage: Claude Code ingests context from many sources (filesystem, MCP servers like Slack or Linear, browser via Claude in Chrome, git history). Turning that context into an HTML artifact is almost always more expressive than producing a Markdown summary.
The most tangible difference is the information density a format can carry per line of text. Markdown handles paragraphs, lists, links, code blocks, and simple tables well. It stops there. Everything else, from diagrams to charts to margin notes to UI component states, requires workarounds like ASCII art, references to external images, or actual HTML blocks embedded in Markdown, losing the portability that was its strength.
HTML, by contrast, can render in a single self-contained file: complex tables with sticky headers, vector illustrations in SVG, executable code snippets inside script tags, JavaScript interactions tied to the DOM, spatial layouts with absolute positioning or canvas, inline images in base64, and stylesheets scoped to the document. Shihipar shows a screenshot in which Claude Code, forced to "represent colors" in Markdown, tries to approximate them with Unicode characters, a solution as creative as it is symptomatic of the format's limits.
This density is not an aesthetic whim. When a developer needs to vet the implementation plan for a complex feature, see two mockup variants side by side with different palettes, or quickly grasp where a PR shifts the data flow, every extra visual element trims the time it takes to form an informed opinion. The savings are measurable in minutes per task, and they multiply across weekly task counts.
Shihipar's second observation is both statistical and personal: nobody really reads a Markdown file longer than a hundred lines, himself included. As an engineer, he acknowledges and states it openly. The problem is that Claude, since it began operating with the 1 million token context window introduced with Opus 4.7, regularly produces implementation plans and technical specs running to many hundreds of lines.
An HTML document survives that length because the model can structure internal navigation with tabs, side tables of contents, collapsible sections, and responsive layouts. On the desktop, readers flip through eight sections. On the phone, the document reorganizes itself for the vertical display, and in no case does the reader have to scroll through an undifferentiated wall of text. Claude Code's May 2026 updates, from the plugin marketplace to multi-worktree support, have made long, complex tasks more frequent: the output format must be able to carry that complexity.
Markdown has a practical problem that anyone working on a team discovers quickly: browsers don't render it natively well. A .md file shared by email or message requires either a proper editor, manual conversion, or an attachment. HTML, by contrast, ships pre-rendered in any browser. Uploading the file to S3 or to a corporate CDN produces a link a coworker opens with a click, from any device, with no prerequisites.
Shihipar notes that the odds someone actually reads the spec, report, or PR writeup rise nonlinearly when the format is HTML. It's the same logic by which a well-laid-out article gets read in full more often than a draft in monospace font. For anyone leading a distributed team, that's a hard argument to dismiss.
The fourth advantage is what Shihipar calls "two-way interaction." HTML is not just an output format, it's also an input format. With a few lines of JavaScript, Claude Code can produce a document containing sliders to vary an animation parameter, knobs to tune a button's easing curve, and buttons that turn the current UI state into a prompt to paste into a new Claude session.
This interaction changes the nature of the workflow. The developer no longer receives a static artifact to approve or reject, but a small, custom tool for exploring the decision space. The pattern is similar to the one described in Decoupling Brain from Hands for agent architectures, but applied to a single output document. Logic stays in the model, while the exploration interface is delivered to the user in a manipulable format.
To prove the thesis, Shihipar published a companion site with twenty self-contained HTML files, all generated by Claude Code, each one illustrating a real use case. The page is publicly available at thariqs.github.io/html-effectiveness. The categories cover nine areas worth a closer look, because they account for a large share of the real work of a modern software engineer.
Three examples: a side-by-side comparison of three different approaches to the same problem, mockups of four visual directions for an empty-state UI, and a complete implementation plan with timeline, data-flow diagrams, and risk table. This is the use case Shihipar suggests starting with, because the difference from a Markdown plan is immediately visible.
Three examples that cover the moment code has to be understood by someone else. A PR review with the diff rendered directly, margin annotations, and color-coded severity tags; a PR writeup with motivation, before/after, and a file-by-file tour; and a module map that visualizes the package as a box-and-arrow diagram. Shihipar says he attaches an HTML explanation file to every PR he opens, replacing the default GitHub diff view for the trickier passages.
Four examples spanning living design systems (with color tokens, typography, and spacing as clickable swatches), component variants with all states and intents, animation sandboxes with duration and easing sliders, and clickable flows of four linked screens. The HTML design system is especially interesting because it works as a single reference file for other outputs, ensuring visual consistency across documents generated in different sessions.
Two examples: a sheet of SVG figures to drop into blog posts, and an annotated flowchart of a deploy pipeline with clickable steps. The same pattern applies to architecture diagrams, sequence diagrams, and data hierarchies. It's an alternative to Mermaid or external tools, with the upside that the model can iterate on the output without leaving the editor.
A keyboard-navigable slide deck, a feature explainer with a TL;DR, collapsible request-path steps, and configuration snippets in tabs, plus a concept explainer with a live visualization and glossary. The HTML deck is a strong alternative to PowerPoint or Keynote for internal technical presentations: zero installs, shareable by link, editable in a few prompts.
Weekly status reports with charts, incident post-mortems with minute-by-minute timelines, and three single-purpose editors: a drag-and-drop board for ticket triage, a feature-flag editor with warnings and diff export, and a prompt tuner with live re-rendering. This last category best illustrates the "throwaway UI" philosophy: Claude Code is not building a product, it's building a small, single-use tool for a single task.
The Hacker News thread on the article topped a thousand points, a sign of a polarizing debate. The three most upvoted critiques deserve a hearing before adopting the approach blindly.
The first critique is editorial: HTML makes human co-authoring hard. A heavily upvoted comment, signed by tmhrtly, sums it up: "if it is a spec sheet of something complex, I want to be able to go in and edit what was produced. With an HTML document, that is much harder than with a Markdown one." It's a fair point. Markdown can be edited in any text editor, while serious HTML requires dedicated tools or front-end skills.
The second critique concerns tokens and lock-in. HTML is significantly less token-efficient than Markdown. The comment from ryandsilva notes that this could benefit Anthropic's ecosystem, nudging users toward Claude's proprietary tools. Shihipar counters that with the 1 million token window of Opus 4.7, the extra cost is not noticeable in context, but it remains a real monetary cost for those paying by the token.
The third critique is ironic: the article arguing that Markdown is unreadable beyond 100 lines is itself a long article, written in elaborate HTML. Planktonne observes that the real issue is not the format, it's bloated content. A point that applies to every discussion of document length and format.
The token critique deserves empirical verification, not blanket defense. For a repeatable test, I asked Claude Opus 4.7 for the same code review writeup (a PR with 280 modified lines, 4 files touched, 3 findings of mixed severity) in three distinct formats. Tokens counted with the official Anthropic tokenizer, independent sessions to avoid caching effects.
| Output format | Output tokens | % of 1MM window | Opus output cost |
|---|---|---|---|
| Plain Markdown | ~1,140 | 0.11% | $0.017 |
| Lean semantic HTML | ~2,760 | 0.28% | $0.041 |
| Full HTML (inline CSS, rendered diff, severity badges) | ~5,480 | 0.55% | $0.082 |
Read in absolute terms, the numbers deflate the criticism. The full HTML version takes up just over half a percentage point of Opus 4.7's context window and costs eight cents in output. For Pro or Max subscribers, the cost is already included in the flat fee. For pay-as-you-go API users, the monthly difference across thirty complex artifacts stays under three dollars. The cost worth tracking is not the price, it is generation time: a rich HTML takes five times the seconds of an equivalent MD, and in conversational flows that latency is felt.
The most technical critique of HTML, the one about unreadable Git diffs, gets resolved through a clean separation pattern. The idea is simple: the HTML template lives in one file, the data that varies between versions in a second JSON file, and rendering happens at open time in the browser through a few lines of JavaScript. The diff becomes clean because it only modifies the payload, leaving the structure intact.
<script type="application/json" id="data">
{ "title": "Audit Q2", "rows": [...] }
</script>
<template id="row">
<tr><td class="k"></td><td class="v"></td></tr>
</template>
<script>
const d = JSON.parse(document.getElementById('data').textContent);
</script>
The operational advantage is twofold. First, Git review focuses on data, exactly as you would do with a YAML config. Second, the same template serves to generate a hundred different reports just by swapping the JSON block, and Claude can iterate on data without rewriting the structure.
While Anthropic debates internally whether to make HTML the default, the community has already started packaging the approach into reusable skills. The project dogum/html-artifacts (3 GitHub stars at the time of publication, Apache 2.0 license) implements a skill that teaches Claude to recognize when a request benefits from a self-contained HTML artifact instead of a Markdown response.
The skill's structure is exemplary for anyone who wants to understand how Claude Code Skills work in production. The main SKILL.md contains a recognition heuristic and a set of explicit carve-outs for cases where Markdown remains appropriate (short conversations, code snippets, genuinely concise content). Eight reference files dedicated to single use categories load selectively when needed, keeping context consumption low. It's the same pattern described in the workflow for turning NotebookLM sources into permanent Claude skills, applied to a different domain.
In his article, Shihipar advises against turning the idea into a skill too early. His recommendation is iterative prompts like "make me an HTML file" or "make me an HTML artifact," and only building a skill after gaining concrete experience with which cases work and which don't.
Early months of experience have led the community to coalesce around a set of constraints to specify explicitly in the prompt, to prevent Claude from generating HTML that works today and breaks in six months. The guiding principle is single-file architecture: one HTML file, zero external dependencies, autonomous forever.
<style> in the <head>, never linked external sheets or CDNs<script>, no imports from unpkg or jsDelivrsrc attribute, or inline SVG, never remote URLsThese five constraints, written once in an html-rules.md file and referenced in CLAUDE.md or in a personal skill, ensure that artifacts generated today remain readable two years from now even if CDNs disappear or libraries break in major versions. Same principle by which a PDF stays openable for decades: self-sufficiency matters more than technical elegance.
An aspect the Hacker News debate touched without going deep concerns security. HTML artifacts generated by Claude Code can contain JavaScript, and that JavaScript executes when the file opens in the browser. For documents generated on innocuous tasks, the risk is theoretical. It becomes concrete in three operational scenarios.
First is refeed: opening a new Claude Code session and passing it the previous HTML file as context. If that file contains hidden instructions in comments or data attributes, the agent ingests them as part of the prompt. The model is trained to ignore this kind of injection, but the risk grows when the file passes hand to hand without review.
The second scenario concerns shared reports: uploading HTML to an authenticated corporate domain means executing arbitrary code in a context with access to that domain's cookies. Standard mitigation is to serve artifacts from a separate sandbox subdomain, or force the browser to render them without JavaScript via Content-Security-Policy: script-src 'none'.
The third is subtlest: HTML used as input to downstream pipelines (parsers, scrapers, RAG) inherits the trust you would give to Markdown. If HTML comes from an agent, treat it as untrusted input, sanitize and validate before entering any automatic flow. Operational reference is the threat model published by the Anthropic team in Best practices for skills: clean separation between generated content and execution channel.
For those who want to test the approach without sinking time into a custom skill, here are three prompts adapted from the examples Shihipar published, tailored to common scenarios.
I haven't decided on the cut for the onboarding screen yet. Generate 6 markedly different approaches, vary layout, tone and information density. Show them in a single HTML file in a grid, so I can compare them side by side. Label each one with the trade-off it is making.
Help me review this PR by creating an HTML artifact that describes it. I am not familiar with the streaming and backpressure logic: focus the analysis there. Render the actual diff with inline margin annotations, color-code findings by severity and add whatever you need to convey the concept clearly.
I do not really understand how our rate limiter works. Read the relevant code and produce an HTML explainer page: a token-bucket flow diagram, the 3-4 key code snippets annotated, and a "gotchas" section at the bottom. Optimize it for a one-time read.
The practical rule emerging in teams that adopt HTML seriously is the companion pattern. For every significant feature, the PR contains two artifacts: feature.md, a versioned source readable in Git diff and editable by anyone, and feature.html, the visual execution of the document, generated by Claude reading the MD and enriching it with tables, diagrams, and interactive flows. The first is source of truth for technical review, the second is the vehicle for non-technical stakeholders.
The git flow becomes: edit the MD, regenerate the HTML with a claude regen feature command, commit both. Git review stays clean because the MD diff is readable, the HTML is a noisy artifact but not what gets looked at in code review. Solves the editorial dilemma raised in the Hacker News thread without sacrificing the format's narrative advantages.
A use case Shihipar's examples don't directly cover, but with immediate impact for any consulting or professional services firm, is generating documents destined for both digital and paper channels. Technical audit reports, articulated proposals for public tenders, contract specifications, due diligence for mergers and acquisitions: all documents that live first as PDF email attachments and then as printed signed copies.
The HTML pattern with dedicated @media print resolves the dual-purpose in a single file. The same document opens fluidly on desktop with scrollable tables and interactive links, and produces a print-ready A4 PDF with header, footer, page numbering, and controlled breaks when you launch print or export from Chrome with Ctrl+P. Difference with a Word-generated PDF is that the HTML source remains editable via prompt, and Claude can regenerate the document after each brief change without restarting layout from scratch.
After weeks of testing on real projects, my position is clear on one point: the value of HTML grows with task complexity and with the number of people involved in the review. For a solo developer working on a few-line fix, Markdown remains faster and easier to edit. For a distributed team that has to align designers, backend engineers, and product managers on a fifteen-file feature, the HTML artifact is a superior communication tool, and the extra token cost is more than offset by the drop in clarification rounds.
The sticking point is co-authoring. Markdown edits anywhere, while serious HTML requires front-end skills or a round-trip through Claude Code for changes. For my workflows on Claude skills for mobile development and on technical assessment quotes, I'm adopting a practical rule: HTML when the output is meant to be read by others, Markdown when I'll edit it myself several times before delivery. It's a thin line, but it holds in practice.
Shihipar's enthusiasm should not obscure the contexts where Markdown remains clearly superior, and where forcing HTML produces only overhead. An honest article on the topic must list them, because anyone adopting the format blindly ends up paying costs without real benefits.
The operational rule summarizing these carve-outs is simple: HTML when the document has a third-party reader who will not modify it, Markdown when the document is collaborative, indexed, or destined to be consumed by automatic pipelines.
The debate on agent output format touches territory the community has explored over the last five years with different tools. Worth positioning HTML against the two most mature alternatives: computational notebooks (Jupyter, Observable, Marimo) and hybrid formats (MDX, Astro Markdoc, AsciiDoc).
Notebooks offer maximum interactivity, because they execute code live and show runtime output. The price is the kernel: whoever opens the file needs a Python environment, a Jupyter server, or an Observable connection. To share an implementation plan with a product manager, the kernel is a barrier. Static HTML wins for portability.
Hybrid formats (MDX in particular) combine narrative Markdown with inline React components. They are superior to raw HTML for documentation portals and static sites, but require a build toolchain (Vite, Astro, Next.js) to produce final output. For throwaway artifacts generated by an agent, the toolchain is overhead. Raw Claude HTML wins for zero-toolchain and zero-deploy.
The pattern emerging in teams that work heavily with AI is durability stratification: raw HTML for artifacts that live hours or days, MDX in repository for product documentation that lives months, notebooks for interactive data exploration when the audience has the environment. Not alternatives in conflict, different layers of the same workflow.
No. Shihipar himself advises against it for early attempts. Just write in the prompt "create an HTML file" or "create an HTML artifact" and specify what it should contain. Skills come later, once you've figured out which HTML patterns work for your recurring use cases.
Between 2 and 4 times as much in generation time. Cost varies by complexity (an implementation plan with SVG and tables consumes far more than a text explainer). With the 1M token window of Opus 4.7, usage doesn't saturate the context, but it remains a real expense if you pay by the token.
By opening it in a browser. On Mac, just type open plan.html after Claude Code writes it; on Linux, xdg-open. To share with a coworker, upload it to S3, Cloudflare R2, or any corporate CDN, and send the link. GitHub Pages also works if the file is public.
This is the real weak point of the approach. HTML diffs are noisy and hard to review in classic code reviews. For files headed to the repository, it's better to keep Markdown as the source, or to accept that the HTML review happens visually on the rendered artifact rather than on the git diff.
Create a design-system.html file that captures the brand tokens (colors, typography, base components) and use it as a reference in subsequent prompts. Claude Code, instructed to read it, will keep visual consistency across documents generated in different sessions, even weeks apart.
Yes, if there are at least three stakeholders. The probability that the document is actually read, according to Shihipar's reported experience, rises nonlinearly when the format is HTML. For a personal memo or notes meant for yourself, Markdown remains more practical.
Only if you ask explicitly. AI-generated HTML, left to itself, often lacks ARIA attributes, descriptive alt text, and consistent tab order. Markdown has free baseline accessibility because standard converters always produce semantic headings and mandatory image alts. For HTML artifacts compliant with WCAG 2.2 AA, add a clause to the prompt: "respect WCAG 2.2 AA, descriptive alt text, minimum 4.5:1 color contrast, logical focus order". Without that explicit constraint, the risk of delivering a document that excludes screen reader users is high.
Yes, but with two caveats. First, HTML costs many more input tokens than an MD summary of the same content, so if the file is long it pays to have an MD synthesis regenerated before refeeding. Second, security: if the file comes from an uncontrolled source, instructions hidden in comments or attributes can be interpreted as prompts. For artifacts generated in the current session, the risk is null. For files downloaded from third parties, strip comments and scripts before refeed.
For data exploration, notebooks win. Jupyter, Observable, and Marimo execute code live, maintain variable state across cells, and let you iterate on data without regenerating the whole document. Static HTML is the final delivery, not the investigation tool. Recommended pattern: use the notebook to explore and build the analysis, ask Claude to generate a standalone HTML report of results for the final stakeholder presentation.
A historical perspective clarifies the debate better than any benchmark. In 2012 the New York Times published Snow Fall, a longform feature integrating text, video, interactive maps, and parallax scrolling in a single web document. It was hailed as spectacular but unrepeatable: producing it took a team of six for two weeks. Ten years later any newsroom has pipelines producing interactive articles in a single day. The leap in quality did not come from content, it came from industrialization of the medium. AI agent HTML outputs are walking the same arc: today they are remarkable artifacts, in a few years they will be the silent baseline of any deliverable that goes beyond three readers.
HTML as the default in Claude Code is not an aesthetic move, it's an operational decision. For those who lead distributed teams, build complex skills on the Claude Code harness, or publish artifacts meant to be read more than edited, Thariq Shihipar's approach delivers a tangible gain in clarity and adoption.
The next step is to try the three starter prompts published in this guide. A test session on a real task says far more than a thousand articles on the topic. To stay current on Anthropic's internal practices, it's worth following Shihipar's official account and Karpathy's reference CLAUDE.md notes on GitHub trending.
If you have technical questions, prompt examples you'd like to see analyzed, or specific use cases for your company, send a message through the contact form at the bottom of this page. I respond within 48 hours with an analysis tailored to your real workflow.
Subscribe to the newsletter to receive new articles directly in your inbox.
Subscribe to the newsletter to receive new articles directly in your inbox.
3.4k readers worldwide, every Saturday