Channel: What is the difference between UTF-8 and Unicode? - Stack Overflow

Answer by InGeek for What is the difference between UTF-8 and Unicode?


This article explains all the details: http://kunststube.net/encoding/

WRITING TO BUFFER

If you write the symbol あ to a 4-byte buffer using UTF-8 encoding, your binary will look like this:

00000000 11100011 10000001 10000010

If you write the symbol あ to a 4-byte buffer using UTF-16 encoding (big-endian here), your binary will look like this:

00000000 00000000 00110000 01000010

As you can see, the encoding you choose affects memory usage, and how much depends on the language of your content.

e.g. for this particular symbol, UTF-16 is more efficient: it needs 2 bytes instead of 3, so we have 2 spare bytes to use for the next symbol. But that doesn't mean you must use UTF-16 for Japanese text.
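The two buffers above can be reproduced with a short Python sketch. The helper name `to_buffer` is made up for illustration, and the sketch assumes the encoded bytes are right-aligned in the buffer with zero padding in front, as in the binary shown above.

```python
# Encode one character two ways and right-align each result in a 4-byte buffer.
symbol = "\u3042"  # HIRAGANA LETTER A, the symbol used in this answer


def to_buffer(data: bytes, size: int = 4) -> str:
    """Pad `data` on the left with zero bytes up to `size`, then show it as binary octets."""
    padded = data.rjust(size, b"\x00")
    return " ".join(f"{byte:08b}" for byte in padded)


utf8_bytes = symbol.encode("utf-8")       # 3 bytes: E3 81 82
utf16_bytes = symbol.encode("utf-16-be")  # 2 bytes: 30 42 (big-endian, no BOM)

print(to_buffer(utf8_bytes))   # 00000000 11100011 10000001 10000010
print(to_buffer(utf16_bytes))  # 00000000 00000000 00110000 01000010
```

Note that `"utf-16-be"` is used on purpose: plain `"utf-16"` in Python prepends a byte order mark, which would not match the buffer above.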

READING FROM BUFFER

Now if you want to read the above bytes, you have to know which encoding they were written in, and decode them with that same encoding.

e.g. if you decode 00000000 11100011 10000001 10000010 as UTF-16, you will end up with the wrong characters, not あ.
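A minimal sketch of that mismatch in Python, assuming the 4-byte buffer holds one zero padding byte followed by the UTF-8 bytes, and that the wrong decoder is big-endian UTF-16:

```python
# The 4-byte buffer from the UTF-8 example: one padding byte, then E3 81 82.
raw = bytes([0x00, 0xE3, 0x81, 0x82])

# Correct: skip the padding byte and decode the payload with the encoding it was written in.
right = raw[1:].decode("utf-8")
print(right == "\u3042")  # True

# Wrong: treat all four bytes as big-endian UTF-16 code units.
wrong = raw.decode("utf-16-be")
code_points = [f"U+{ord(ch):04X}" for ch in wrong]
print(len(wrong), code_points)  # 2 ['U+00E3', 'U+8182']
```

The decoder does not fail here. It quietly returns two unrelated characters, which is why the encoding has to be known in advance rather than guessed from the bytes.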

Note: an encoding and Unicode are two different things. Unicode is the big table in which each symbol is mapped to a unique code point, e.g. the symbol (letter) あ has the code point U+3042, i.e. 30 42 (hex). An encoding, on the other hand, is an algorithm that converts code points into a sequence of bytes suitable for storing on hardware.

30 42 (hex) -> UTF-8 encoding -> E3 81 82 (hex), which is the first result above in binary.

30 42 (hex) -> UTF-16 encoding -> 30 42 (hex), which is the second result above in binary.
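To make the "encoding is an algorithm" point concrete, here is a rough Python sketch of the bit packing that turns 30 42 into E3 81 82. The function name `utf8_three_byte` is hypothetical, and it only handles the 3-byte range of UTF-8 (U+0800 to U+FFFF, excluding surrogates), not the full encoding.

```python
def utf8_three_byte(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF with the 3-byte UTF-8 pattern."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("this sketch only covers the 3-byte UTF-8 range")
    return bytes([
        0xE0 | (code_point >> 12),          # 1110xxxx: top 4 bits
        0x80 | ((code_point >> 6) & 0x3F),  # 10xxxxxx: middle 6 bits
        0x80 | (code_point & 0x3F),         # 10xxxxxx: low 6 bits
    ])


code_point = 0x3042  # the table entry for the symbol

utf8 = utf8_three_byte(code_point)
utf16 = code_point.to_bytes(2, "big")  # in this range, UTF-16 stores the code point as-is

print(utf8.hex(" "))   # e3 81 82
print(utf16.hex(" "))  # 30 42
```

Both results agree with Python's built-in codecs (`"utf-8"` and `"utf-16-be"`) for this symbol.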


