My explanation, after reading numerous posts and articles about this topic:
1 - The Unicode Character Table
"Unicode" is a giant table, that is 21bits wide, these 21bits provide room for 1,114,112 code points / values / fields / places to store characters in.
Out of those 1,114,112 code points, 1,111,998 are able to store Unicode characters, because 2,048 code points are reserved as surrogates and 66 code points are reserved as noncharacters. So there are 1,111,998 code points that can each store a unique character, symbol, emoji, etc.
However, as of now (Unicode 14.0), only 144,697 out of those 1,114,112 code points have been assigned. These 144,697 code points contain the characters that cover all of the languages, as well as symbols, emojis, etc.
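If you want to double-check these numbers yourself, here is a tiny Python sketch that just does the arithmetic (the surrogate range U+D800–U+DFFF and the 66 noncharacters are defined by the Unicode standard):

```python
total_code_points = 0x10FFFF + 1          # U+0000 .. U+10FFFF -> 1,114,112
surrogates        = 0xDFFF - 0xD800 + 1   # U+D800 .. U+DFFF   -> 2,048
noncharacters     = 32 + 17 * 2           # U+FDD0..U+FDEF plus the last two
                                          # code points of each of the 17 planes -> 66

print(total_code_points)                                  # 1114112
print(total_code_points - surrogates - noncharacters)     # 1111998
```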
Each character in the "Unicode" is assigned to a specific code point aka has a specific value / Unicode number. For Example the character "❤", uses exactly one code point out of the 1,114,112 code points. It has the value (aka Unicode number) of "U+2764". This is a hexadecimal code point consisting of two bytes, which in binary is represented as 00100111 01100100. But to represent this code point, UTF-8 encoding uses 3 bytes (24 bits), which is represented in binary as 11100010 10011101 10100100 (without the two empty space characters, each of which is using 1 bit, and I have added them for visual purposes only, in order to make the 24bits more readable, so please ignore them).
Now, how is our computer supposed to know whether those 3 bytes "11100010 10011101 10100100" should be read separately or together? If each of those 3 bytes is read as a separate single-byte character (for example with the legacy DOS code page 850) and then converted to characters, the result would be "Ô, Ø, ñ", which is quite the difference compared to our heart emoji "❤".
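Just to make this concrete, here is the same "❤" example in Python (purely illustrative; any language with Unicode strings would work the same way):

```python
heart = "\u2764"                                # ❤, code point U+2764

print(hex(ord(heart)))                          # 0x2764 -> its Unicode number
print(heart.encode("utf-8"))                    # b'\xe2\x9d\xa4' -> 11100010 10011101 10100100

# Reading those 3 bytes one-by-one with a legacy single-byte
# code page (here: DOS code page 850) gives "ÔØñ" instead of "❤":
print(heart.encode("utf-8").decode("cp850"))    # ÔØñ
```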
2 - Encoding Standards (UTF-8, ISO-8859, Windows-1251, etc.)
In order to solve this problem, people invented encoding standards, the most popular one being UTF-8 (since 2008). UTF-8 accounts for an average of 97.6% of all web pages, which is why we will use UTF-8 for the example below.
2.1 - What is Encoding?
Encoding, simply said, means to convert something from one thing to another. In our case we are converting data, more specifically bytes, to the UTF-8 format. I would also like to rephrase that sentence as "converting bytes to UTF-8 bytes", although it might not be technically correct.
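In code, that conversion is literally a pair of function calls. Here is a minimal Python illustration (any language works the same way conceptually):

```python
text = "hello ❤"

data = text.encode("utf-8")   # text  -> UTF-8 bytes (encoding)
back = data.decode("utf-8")   # bytes -> text again  (decoding)

print(data)                   # b'hello \xe2\x9d\xa4'
print(back == text)           # True
```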
2.2 Some information about the UTF-8 format, and why it's so important
UTF-8 uses a minimum of 1 byte to store a character and a maximum of 4 bytes. Thanks to the UTF-8 format, we can have characters that take more than 1 byte of information.
This is very important, because if it were not for the UTF-8 format, we would not be able to have such a vast diversity of alphabets, since the letters of some alphabets can't fit into 1 byte. We also wouldn't have emojis at all, since each one requires at least 3 bytes (most actually need 4). I am pretty sure you got the point by now, so let's continue forward.
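To see this variable width in action, here is a quick Python sketch that measures how many UTF-8 bytes a few characters need:

```python
# An ASCII letter, an accented letter, a Chinese character and an emoji:
for ch in ["A", "é", "汉", "😀"]:
    print(ch, len(ch.encode("utf-8")), "byte(s)")

# A 1 byte(s)
# é 2 byte(s)
# 汉 3 byte(s)
# 😀 4 byte(s)
```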
2.3 Example of Encoding a Chinese character to UTF-8
Now, let's say we have the Chinese character "汉".
This character takes exactly 16 binary bits, "01101100 01001001". Thus, as we discussed above, the computer can not read this character correctly unless we encode it to UTF-8, because it has no way of knowing whether these 2 bytes should be read separately or together.
Converting this "汉" character's 2 bytes into, as I like to call it UTF-8 bytes, will result in the following:
(Normal Bytes) "01101100 01001001" -> (UTF-8 Encoded Bytes) "11100110 10110001 10001001"
Now, how did we end up with 3 bytes instead of 2? How is that supposed to be UTF-8 Encoding, turning 2 bytes into 3?
In order to explain how the UTF-8 encoding works, I am going to literally copy the reply of @MatthiasBraun, a big shoutout to him for his terrific explanation.
2.4 How does the UTF-8 encoding actually work?
What we have here is the template for Encoding bytes to UTF-8. This is how Encoding happens, pretty exciting if you ask me!
Now, take a good look at the table below and then we are going to go through it together.
Binary format of bytes in sequence:

    1st Byte    2nd Byte    3rd Byte    4th Byte    Number of Free Bits    Maximum Expressible Unicode Value
    0xxxxxxx                                        7                      007F hex (127)
    110xxxxx    10xxxxxx                            (5+6)=11               07FF hex (2047)
    1110xxxx    10xxxxxx    10xxxxxx                (4+6+6)=16             FFFF hex (65535)
    11110xxx    10xxxxxx    10xxxxxx    10xxxxxx    (3+6+6+6)=21           10FFFF hex (1,114,111)
The "x" characters in the table above represent the number of "FreeBits", those bits are empty and we can write to them.
The other bits are reserved for the UTF-8 format, they are used asheaders / markers. Thanks to these headers, when the bytes are beingread using the UTF-8 encoding, the computer knows, which bytes to readtogether and which separately.
The byte size of your character, after being encoded using the UTF-8 format,depends on how many bits you need to write.
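Put differently, you pick the first row of the table whose free bits can hold your code point. Here is a small, purely illustrative Python sketch of that decision (the helper name utf8_byte_count is just made up for this example; the thresholds 7F / 7FF / FFFF come straight from the table):

```python
def utf8_byte_count(code_point: int) -> int:
    """How many UTF-8 bytes a code point needs, per the table above."""
    if code_point <= 0x7F:      # fits into 7 free bits
        return 1
    if code_point <= 0x7FF:     # fits into 11 free bits
        return 2
    if code_point <= 0xFFFF:    # fits into 16 free bits
        return 3
    return 4                    # needs up to 21 free bits (max U+10FFFF)

print(utf8_byte_count(ord("汉")))      # 3
print(len("汉".encode("utf-8")))       # 3 -> Python's built-in encoder agrees
```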
In our case the "汉" character is exactly 2 bytes or 16bits:
"01101100 01001001"
thus the size of our character, after being encoded to UTF-8, will be 3 bytes or 24 bits
"11100110 10110001 10001001"
because "3 UTF-8 bytes" have 16 Free Bits, which we can write to
- Solution, step by step below:
2.5 Solution:
    Header    Placeholder    Fill in our binary    Result
    1110      xxxx           0110                  11100110
    10        xxxxxx         110001                10110001
    10        xxxxxx         001001                10001001
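Those three rows can be written as three lines of bit fiddling. Here is a Python sketch of exactly that step (it only covers the 3-byte shape; a real encoder would also handle the 1-, 2- and 4-byte shapes):

```python
cp = ord("汉")                                  # 0x6C49 = 0110110001001001

byte1 = 0b11100000 | (cp >> 12)                 # header 1110 + top 4 bits  (0110)
byte2 = 0b10000000 | ((cp >> 6) & 0b111111)     # header 10   + next 6 bits (110001)
byte3 = 0b10000000 | (cp & 0b111111)            # header 10   + last 6 bits (001001)

print(format(byte1, "08b"), format(byte2, "08b"), format(byte3, "08b"))
# 11100110 10110001 10001001
print(bytes([byte1, byte2, byte3]) == "汉".encode("utf-8"))   # True
```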
2.6 Summary:
    A Chinese character:        汉
    its Unicode value:          U+6C49
    convert 6C49 to binary:     01101100 01001001
    encode 6C49 as UTF-8:       11100110 10110001 10001001
3 - The difference between UTF-8, UTF-16 and UTF-32
Original explanation of the difference between the UTF-8, UTF-16 and UTF-32 encodings: https://javarevisited.blogspot.com/2015/02/difference-between-utf-8-utf-16-and-utf.html
The main difference between UTF-8, UTF-16, and UTF-32 character encodings is how many bytes they require to represent a character in memory:
UTF-8 uses a minimum of 1 byte, but if the character needs more room, it can use 2, 3 or 4 bytes. UTF-8 is also backward compatible with the ASCII table.
UTF-16 uses a minimum of 2 bytes. UTF-16 can not take 3 bytes; it takes either 2 or 4 bytes. UTF-16 is not compatible with the ASCII table.
UTF-32 always uses 4 bytes.
Remember: UTF-8 and UTF-16 are variable-length encodings, where UTF-8 can take 1 to 4 bytes, while UTF-16 takes either 2 or 4 bytes. UTF-32 is a fixed-width encoding; it always takes 32 bits (4 bytes).
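If you want to see those size differences with your own eyes, here is a short Python comparison (the little-endian codecs are used so that Python does not prepend a BOM and skew the byte counts):

```python
for ch in ["A", "汉", "😀"]:
    print(ch,
          len(ch.encode("utf-8")),      # 1 / 3 / 4 bytes
          len(ch.encode("utf-16-le")),  # 2 / 2 / 4 bytes
          len(ch.encode("utf-32-le")))  # 4 / 4 / 4 bytes
```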