URL Encoding vs HTML Encoding: What's the Difference?

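Introduction

URL encoding and HTML encoding are two foundational encoding techniques in web development. They serve a similar purpose, representing special characters safely, but they operate in entirely different contexts and follow different rules. Confusing the two can lead to broken links, rendering glitches, and even security vulnerabilities such as XSS (cross-site scripting).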

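What Is URL Encoding?

URL encoding (also called percent-encoding) converts characters into a format that can be transmitted safely in a URL. A URL may contain only a limited set of ASCII characters, so any character outside that set, and any reserved character used as data, must be encoded as one or more %XX sequences, where XX is the hexadecimal value of one byte of the character's UTF-8 encoding. For ASCII characters this is simply the hexadecimal ASCII code.

For example, a space becomes %20 and an ampersand (&) becomes %26.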

// URL encoding in JavaScript
const url = 'https://example.com/search?q=hello world&lang=en';
const encoded = encodeURI(url);
console.log(encoded);
// "https://example.com/search?q=hello%20world&lang=en"

// Encoding a query parameter value
const query = 'price < $100 & free shipping';
const encodedQuery = encodeURIComponent(query);
console.log(encodedQuery);
// "price%20%3C%20%24100%20%26%20free%20shipping"
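
To make the %XX rule concrete, here is a minimal sketch that percent-encodes every byte of a string's UTF-8 form. The helper name percentEncodeAllBytes is hypothetical and exists only for illustration. Unlike encodeURIComponent, it also encodes letters and digits, so it is not a replacement for the built-in functions.

```javascript
// Hypothetical helper (not a built-in): percent-encode every UTF-8 byte.
function percentEncodeAllBytes(text) {
  const bytes = new TextEncoder().encode(text); // string -> UTF-8 bytes
  return Array.from(
    bytes,
    (b) => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeAllBytes(' '));  // "%20"
console.log(percentEncodeAllBytes('é'));  // "%C3%A9" (two UTF-8 bytes)
console.log(encodeURIComponent('é'));     // "%C3%A9"
console.log(encodeURIComponent('中'));    // "%E4%B8%AD" (three UTF-8 bytes)
```

A non-ASCII character therefore expands to several %XX sequences, one per byte.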

encodeURI vs encodeURIComponent

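JavaScript provides two URL encoding functions, and understanding the difference between them is essential:

Function	Encodes	Leaves unencoded	Use case
`encodeURI`	Most special characters	`A-Z a-z 0-9 ; / ? : @ & = + $ , # - _ . ! ~ * ' ( )`	Complete URLs
`encodeURIComponent`	Nearly all special characters	`A-Z a-z 0-9 - _ . ! ~ * ' ( )`	Query parameter values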

// encodeURI preserves the characters that give a URL its structure
encodeURI('https://example.com/path?query=value&other=123');
// "https://example.com/path?query=value&other=123"

// encodeURIComponent encodes every character that could break a URL
encodeURIComponent('https://example.com/path?query=value&other=123');
// "https%3A%2F%2Fexample.com%2Fpath%3Fquery%3Dvalue%26other%3D123"

Use encodeURI when you have a complete URL, and encodeURIComponent when you are encoding an individual parameter value.
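
As a rough sketch of that rule, the following assembles a URL piece by piece and applies encodeURIComponent to each dynamic part. The helper buildSearchUrl, the base URL, and the parameter names are assumptions made up for this example.

```javascript
// Hypothetical helper: encodes one path segment and each query key/value.
function buildSearchUrl(base, category, params) {
  const path = encodeURIComponent(category); // a single path segment
  const query = Object.entries(params)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join('&');
  return `${base}/${path}?${query}`;
}

const searchUrl = buildSearchUrl('https://example.com', 'home & garden', {
  q: 'price < $100',
  lang: 'en',
});
console.log(searchUrl);
// "https://example.com/home%20%26%20garden?q=price%20%3C%20%24100&lang=en"
```

Only the dynamic pieces are encoded. The separators (/, ?, &, =) are added afterwards, so they keep their structural meaning.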

Try the URL encoder/decoder tool on CodeKit to encode or decode URLs instantly.

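What Is HTML Encoding?

HTML encoding (also called HTML entity encoding) converts special characters into HTML entities so they can be displayed safely in an HTML document. Without encoding, characters such as <, >, and & are interpreted as HTML markup rather than text content.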

Common HTML entities:

Character	Entity name	Entity number
`<`	`&lt;`	`&#60;`
`>`	`&gt;`	`&#62;`
`&`	`&amp;`	`&#38;`
`"`	`&quot;`	`&#34;`
`'`	`&apos;`	`&#39;`
`©`	`&copy;`	`&#169;`
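
The numeric form of an entity is the character's Unicode code point written in decimal. A small sketch, using a hypothetical helper named toNumericEntity, shows where those numbers come from:

```javascript
// Hypothetical helper: build the decimal numeric entity for one character.
function toNumericEntity(ch) {
  return `&#${ch.codePointAt(0)};`;
}

console.log(toNumericEntity('<')); // "&#60;"
console.log(toNumericEntity('&')); // "&#38;"
console.log(toNumericEntity("'")); // "&#39;"
console.log(toNumericEntity('©')); // "&#169;"
```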

// HTML encoding in JavaScript (browser only: relies on the DOM)
function htmlEncode(str) {
  const div = document.createElement('div');
  div.textContent = str;
  return div.innerHTML;
}

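console.log(htmlEncode('<script>alert("xss")</script>'));
// '&lt;script&gt;alert("xss")&lt;/script&gt;'
// Note: innerHTML escapes <, >, and & in text but leaves quotes untouched,
// so this helper is safe for text content only, not for attribute values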

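// HTML decoding
// A textarea decodes entities but never turns its content into child elements,
// so markup in untrusted input is not instantiated
function htmlDecode(str) {
  const textarea = document.createElement('textarea');
  textarea.innerHTML = str;
  return textarea.value;
}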
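
The DOM-based helper only runs in a browser, and serializing through innerHTML leaves quotes unescaped. Where quotes matter (for example inside attribute values) or where no DOM exists (for example on the server), a plain string replacement is a common alternative. The following is one possible sketch; the names escapeHtml and HTML_ESCAPES are assumptions, not part of any library.

```javascript
// Hypothetical DOM-free helper: escapes the five characters that are
// significant in HTML text and attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("xss")</script>'));
// "&lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;"
console.log(escapeHtml("Tom & Jerry's"));
// "Tom &amp; Jerry&#39;s"
```

Because the whole string is processed in a single pass, the & inside a freshly produced entity is never escaped a second time.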

Use the HTML encoder/decoder tool on CodeKit to encode and decode HTML entities quickly.

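Key Differences

Aspect	URL encoding	HTML encoding
Context	URLs and query strings	HTML document content
Format	`%XX` (percent sign plus hexadecimal)	`&name;` or `&#NNN;`
Primary goal	Safe transmission in URLs	Safe rendering in HTML
Space handling	`%20` or `+`	`&nbsp;` or a literal space
Security role	Prevents URL injection	Prevents XSS attacks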
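
The two space encodings on the URL side come from different serializers. As a brief illustration (the query text is an arbitrary example), the standard URLSearchParams API uses form encoding and writes a space as +, while encodeURIComponent writes %20:

```javascript
const params = new URLSearchParams({ q: 'coffee & tea' });
console.log(params.toString());                  // "q=coffee+%26+tea"
console.log(encodeURIComponent('coffee & tea')); // "coffee%20%26%20tea"

// Decoding differs too: URLSearchParams turns "+" back into a space,
// while decodeURIComponent leaves it as a literal plus sign.
console.log(new URLSearchParams('q=coffee+%26+tea').get('q')); // "coffee & tea"
console.log(decodeURIComponent('coffee+%26+tea'));             // "coffee+&+tea"
```

Mixing the two conventions is a common source of stray plus signs, so it helps to decode with the same mechanism that did the encoding.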

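When to Use Which Encoding

Use URL encoding when:

Building query strings that contain user input
Constructing URLs dynamically
Sending data in HTTP requests
Encoding path segments in a URL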

// Correct: URL-encode user input in a query parameter
const userInput = 'coffee & tea';
const url = `https://api.example.com/search?q=${encodeURIComponent(userInput)}`;
// "https://api.example.com/search?q=coffee%20%26%20tea"

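Use HTML encoding when:

Displaying user-generated content on a web page
Rendering text that may contain HTML characters
Preventing XSS in server-side rendered templates
Showing code examples in HTML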

// Correct: HTML-encode user input before displaying it
const comment = 'I love <b>bold</b> text & more!';
const safe = htmlEncode(comment);
// "I love &lt;b&gt;bold&lt;/b&gt; text &amp; more!"
document.getElementById('output').innerHTML = safe;

Common Mistakes

1. Double Encoding

Encoding data that has already been encoded produces corrupted output:

// Wrong: double encoding
const encoded = encodeURIComponent('hello world'); // "hello%20world"
const doubleEncoded = encodeURIComponent(encoded); // "hello%2520world" (the % itself became %25)

// Correct: encode only once
const correct = encodeURIComponent('hello world'); // "hello%20world"
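
Double encoding often creeps in when a value is encoded by hand and then passed to an API that encodes again. The sketch below illustrates this with the standard URLSearchParams class; the variable names are arbitrary.

```javascript
const original = 'hello world';
const once = encodeURIComponent(original); // "hello%20world"
const twice = encodeURIComponent(once);    // "hello%2520world"

// One decode undoes one encode, so a double-encoded value needs two decodes.
console.log(decodeURIComponent(once) === original);                      // true
console.log(decodeURIComponent(twice));                                  // "hello%20world"
console.log(decodeURIComponent(decodeURIComponent(twice)) === original); // true

// Accidental double encoding: URLSearchParams already encodes its values.
const viaParams = new URLSearchParams({ q: once }).toString();
console.log(viaParams); // "q=hello%2520world"
```

A practical rule is to keep values raw in application code and encode exactly once, at the point where the URL is assembled.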

2. Using the Wrong Encoding Type

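// Wrong: HTML-encoding a URL
const url = htmlEncode('https://example.com?q=a&b=c');
// "https://example.com?q=a&amp;b=c"
// Assigned to an href from JavaScript or passed to fetch, this breaks:
// the second parameter is now named "amp;b" instead of "b"

// Wrong: URL-encoding HTML content
const html = encodeURIComponent('<div>Hello</div>');
// "%3Cdiv%3EHello%3C%2Fdiv%3E": the browser shows the percent sequences literally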
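
Sometimes both encodings are needed, each at its own layer: when a URL containing user input is written into HTML markup, the parameter value is URL-encoded first and the finished URL is then HTML-encoded for the attribute. The following is a minimal sketch under that assumption. The escapeHtml helper is a hypothetical DOM-free escaper defined here for the example.

```javascript
// Hypothetical DOM-free HTML escaper (also escapes quotes for attributes).
function escapeHtml(text) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => map[ch]);
}

const userQuery = 'coffee & tea';

// Layer 1: URL-encode the parameter value while building the URL.
const href = `https://example.com/search?q=${encodeURIComponent(userQuery)}&lang=en`;

// Layer 2: HTML-encode the finished URL (and the link text) for the markup.
const link = `<a href="${escapeHtml(href)}">${escapeHtml(userQuery)}</a>`;
console.log(link);
// <a href="https://example.com/search?q=coffee%20%26%20tea&amp;lang=en">coffee &amp; tea</a>
```

The browser undoes the HTML layer when it parses the attribute, which leaves the correctly percent-encoded URL to be requested.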

3. Forgetting to Encode User Input

This is the most dangerous mistake, because it opens the door to injection attacks:

// Dangerous: unencoded user input
element.innerHTML = `Hello, ${userInput}!`; // XSS vulnerability!

// Safe: HTML-encode first
element.innerHTML = `Hello, ${htmlEncode(userInput)}!`;
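
When the output is plain text with no markup of its own, another option is to skip innerHTML entirely and assign textContent, which never parses its value as HTML. A small sketch follows. The renderGreeting helper and the output element id are assumptions for illustration.

```javascript
// Hypothetical helper: `element` is any DOM element.
function renderGreeting(element, name) {
  // textContent stores the string as text, so tags in `name` are not parsed.
  element.textContent = `Hello, ${name}!`;
}

// In a browser this might be called as:
// renderGreeting(document.getElementById('output'), userInput);
```

With this approach no manual encoding step is needed, because the browser treats the value as text from the start.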

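Summary

URL encoding and HTML encoding each have their own role in web development. URL encoding ensures that data can travel safely inside a URL, and HTML encoding ensures that data can be displayed safely on a web page. Using the right encoding in the right context is essential for both functionality and security.

Need to encode or decode URLs or HTML entities? Try the URL encoder and HTML encoder tools on CodeKit. All processing happens locally in your browser.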