Why is <meta charset="utf-8"> important?

I'm currently participating in the #100DaysOfCode challenge and documenting my journey on Twitter. So far, I've been reviewing the holy trifecta of web development: HTML, CSS, and JavaScript. On Day 4, I shared that one of the things I reviewed was the importance of including <meta charset="utf-8"> in an HTML file.

{% twitter 1316600310782140427 %}

I got a response asking me to explain why. As I was typing my answer, I found that I had too much to say to fit into one tweet, so it would be easier to write up a blog post.

What is <meta charset="utf-8">?

Let's break down the line <meta charset="utf-8"> to derive its meaning:

  • <meta> is an HTML tag that holds metadata about a web page: information for browsers and search engines that describes the page but is not displayed as part of its content.
  • charset is an HTML attribute that declares which character encoding the browser should use to decode the page's content.
  • utf-8 is a specific character encoding.

In other words, <meta charset="utf-8"> tells the browser to use the UTF-8 character encoding when translating the bytes of the file into the human-readable text displayed in the browser, and when translating text back into bytes (for example, when submitting a form).
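To make "translating" concrete, here is a small Python sketch (Python is used only as a convenient way to inspect bytes, and the variable names are made up for illustration). It encodes a string into UTF-8 bytes, which is what gets saved to disk or sent over the network, and then decodes those bytes back into text, which is roughly what the browser does once it knows the charset.

```python
# Text as a person reads it.
text = "Héllo"

# Encoding turns characters into bytes for storage or transfer.
raw_bytes = text.encode("utf-8")
print(list(raw_bytes))  # [72, 195, 169, 108, 108, 111]

# Decoding turns those bytes back into characters.
decoded = raw_bytes.decode("utf-8")
print(decoded == text)  # True
```

Notice that "é" takes two bytes (195 and 169) while the other letters take one each. The bytes alone don't say which encoding produced them, which is why the page has to declare it.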

Why 'utf-8'?

Today, more than 90% of all websites use UTF-8. Before UTF-8 became the standard, ASCII was used. Unfortunately, ASCII defines only 128 characters, covering unaccented English letters, digits, and basic punctuation, so text in languages whose alphabets go beyond those characters couldn't be displayed properly on your screen.
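ASCII's limit is easy to check for yourself. In this Python sketch (the helper name fits_in_ascii is hypothetical), encoding succeeds for plain English text and fails for an Arabic greeting, because ASCII has no code for any Arabic letter.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text can be encoded as ASCII."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(fits_in_ascii("Hello World!"))   # True
print(fits_in_ascii("مرحبا بالعالم"))  # False
```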

For example, say I wanted to display some Arabic text that says "Hello World!" on a screen using the following snippet of code with the charset set equal to ascii:

<!DOCTYPE html>
<html>
<head>
  <meta charset="ascii"> <!-- char encoding is set equal to ASCII -->
</head>
<body>
  <h1>!مرحبا بالعالم</h1>
</body>
</html>

Now, if you open the file in your browser, you'll see that the text is displayed as gibberish 🥴:

"Hello World!" in Arabic using ASCII charset
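The gibberish can be approximated outside the browser. This sketch rests on two assumptions: that the HTML file was saved as UTF-8, and that the browser treats the ascii label as windows-1252, which is how the WHATWG Encoding Standard maps that label. cp1252 is Python's name for that encoding.

```python
greeting = "مرحبا"

# What the editor writes to disk when the file is saved as UTF-8.
file_bytes = greeting.encode("utf-8")

# Wrong decoder: every single byte is turned into its own character.
# errors="replace" covers the few byte values cp1252 leaves undefined.
garbled = file_bytes.decode("cp1252", errors="replace")

# Right decoder: the original text comes back.
restored = file_bytes.decode("utf-8")

print(garbled)
print(len(greeting), len(garbled))  # 5 10
print(restored == greeting)         # True
```

Five Arabic letters become ten unrelated symbols because each letter is stored as two bytes in UTF-8, and a single-byte encoding reads each byte as a separate character.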

However, if we change the charset to utf-8, the code is as follows:

<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8"> <!-- char encoding is set equal to UTF-8 -->
</head>
<body>
  <h1>!مرحبا بالعالم</h1>
</body>
</html>

The text is now displayed properly 🥳:

"Hello World!" in Arabic using UTF-8 charset

Thus, UTF-8 was created to address ASCII's shortcomings and can encode the characters of almost every written language in the world. Because of this and its backward compatibility with ASCII, almost all browsers support UTF-8.
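Backward compatibility with ASCII means that plain English text produces exactly the same bytes in both encodings, while other characters simply take more bytes in UTF-8. A quick Python sketch to illustrate (the sample characters are arbitrary):

```python
english = "Hello World!"

# ASCII text is byte-for-byte identical in UTF-8.
same_bytes = english.encode("ascii") == english.encode("utf-8")
print(same_bytes)  # True

# UTF-8 is variable-width: each character takes 1 to 4 bytes.
widths = {char: len(char.encode("utf-8")) for char in ["A", "é", "م", "€", "🥳"]}
for char, width in widths.items():
    print(char, width)
```

Here "A" takes 1 byte, "é" and "م" take 2, "€" takes 3, and "🥳" takes 4.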

What if I forget to include <meta charset="utf-8"> in my HTML file?

Don't worry, your page will often still render fine, but that isn't guaranteed. 🦸

The HTML standard requires HTML documents to be encoded as UTF-8, but <!DOCTYPE html> on its own doesn't tell the browser which encoding to use. If no encoding is declared in the HTTP Content-Type header, in a byte order mark, or in a <meta> tag, the browser has to guess.

That guess may turn out to be UTF-8, but it can also be a legacy encoding such as windows-1252, depending on the browser and the user's locale. Because UTF-8 is not guaranteed, it's better to just include a character encoding specification using the <meta> tag in your HTML file.
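If you want to check your own files, a rough Python sketch like the one below can flag a missing declaration. The helper name declared_charset and the regular expression are made up for illustration and are a simplification, not the browser's actual detection algorithm. The sketch only looks for the <meta charset="..."> form, and only in the first 1024 bytes, which is where the HTML standard expects the declaration to appear.

```python
import re

# Simplified pattern for <meta charset="...">; quotes are optional in HTML.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?([\w-]+)""", re.IGNORECASE)


def declared_charset(html_bytes: bytes):
    """Return the charset named in a <meta charset> tag, or None if absent."""
    match = META_CHARSET.search(html_bytes[:1024])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


with_meta = b'<!DOCTYPE html><html><head><meta charset="UTF-8"></head></html>'
without_meta = b"<!DOCTYPE html><html><head></head></html>"

print(declared_charset(with_meta))     # utf-8
print(declared_charset(without_meta))  # None
```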

There you have it. 🎉 Feel free to leave any comments or thoughts below. If you want to follow my #100DaysOfCode journey, follow me on Twitter at @maggiecodes_. Happy coding!