Quantcast
Channel: What is the difference between UTF-8 and Unicode? - Stack Overflow
Viewing all articles
Browse latest Browse all 22

Answer by Dimos for What is the difference between UTF-8 and Unicode?

$
0
0

They are the same thing, aren't they?

No, they aren't.


I think the first sentence of the Wikipedia page you referenced gives a nice, brief summary:

UTF-8 is a variable width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes.

To elaborate:

  • Unicode is a standard, which defines a map from characters to numbers, the so-called code points, (like in the example below). For the full mapping, you can have a look here.

    ! -> U+0021 (21),  " -> U+0022 (22),  \# -> U+0023 (23)
  • UTF-8 is one of the ways to encode these code points in a form a computer can understand, aka bits. In other words, it's a way/algorithm to convert each of those code points to a sequence of bits or convert a sequence of bits to the equivalent code points. Note that there are a lot of alternative encodings for Unicode.


Joel gives a really nice explanation and an overview of the history here.


Viewing all articles
Browse latest Browse all 22

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>