Let me use an example to illustrate this topic:
A Chinese character: 汉
Its Unicode value: U+6C49
Convert 6C49 to binary: 01101100 01001001
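If you would like to check these three values yourself, here is a small Python sketch. The variable names are just illustrative; `ord` and format specifiers are standard Python.

```python
ch = "汉"

# ord() returns the Unicode code point of a single character as an integer.
code_point = ord(ch)

# Format the code point the way Unicode charts write it: U+ followed by hex digits.
hex_value = f"U+{code_point:04X}"

# Format the same number as 16 binary digits, then split into two groups of 8.
bits = f"{code_point:016b}"
grouped = " ".join(bits[i:i + 8] for i in range(0, 16, 8))

print(hex_value)  # U+6C49
print(grouped)    # 01101100 01001001
```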
Nothing magical so far; it's very simple. Now, let's say we decide to store this character on our hard drive. To do that, we need to store the character in binary format. We could simply store it as is: '01101100 01001001'. Done!
But wait a minute, is '01101100 01001001' one character or two characters? You know it is one character because I told you, but when a computer reads these two bytes, it has no idea. So we need some sort of encoding that tells the computer to treat them as one character.
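To make the ambiguity concrete, here is a small Python sketch that reads the same two raw bytes in two different ways. Using the `ascii` and `utf-16-be` codecs here is my own choice for illustration: the first treats every byte as its own character, the second treats the two bytes as one 16-bit unit.

```python
# The two raw bytes from above: 01101100 01001001 (hex 6C 49).
raw = bytes([0b01101100, 0b01001001])

# Read byte by byte: each byte is a separate ASCII character.
as_two_chars = raw.decode("ascii")

# Read as a single 16-bit unit (big-endian): one character.
as_one_char = raw.decode("utf-16-be")

print(as_two_chars)  # lI
print(as_one_char)   # 汉
```

The bytes are identical in both cases; only the reading rule differs, which is why the rule has to be agreed on in advance.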
This is where the rules of UTF-8 come in: https://www.fileformat.info/info/unicode/utf8.htm
Binary format of bytes in sequence:

1st Byte | 2nd Byte | 3rd Byte | 4th Byte | Number of Free Bits | Maximum Expressible Unicode Value
0xxxxxxx |          |          |          | 7                   | 007F hex (127)
110xxxxx | 10xxxxxx |          |          | (5+6)=11            | 07FF hex (2047)
1110xxxx | 10xxxxxx | 10xxxxxx |          | (4+6+6)=16          | FFFF hex (65535)
11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx | (3+6+6+6)=21        | 10FFFF hex (1,114,111)
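The "Maximum Expressible Unicode Value" column is what decides which row applies to a given character. A minimal Python sketch of that lookup follows; the function name `utf8_length` is made up for this example, and for simplicity it does not treat the surrogate range U+D800 to U+DFFF specially, which a real encoder would reject.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes the table assigns to a code point."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a valid Unicode code point")
    if code_point <= 0x7F:      # row 1: 7 free bits
        return 1
    if code_point <= 0x7FF:     # row 2: 11 free bits
        return 2
    if code_point <= 0xFFFF:    # row 3: 16 free bits
        return 3
    return 4                    # row 4: 21 free bits, capped at 10FFFF

for ch in ["A", "é", "汉", "😀"]:
    # Compare the table lookup with what Python's own encoder produces.
    print(ch, utf8_length(ord(ch)), len(ch.encode("utf-8")))
```

For each of the four sample characters, the two numbers printed on a line should agree.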
According to the table above, if we want to store this character using the UTF-8 format, we need to prefix our character with some 'headers'. Written as two bytes, our Chinese character is 16 bits long (count the binary digits yourself; 15 if you drop the leading zero). That is more than the 11 free bits on row 2, so we use the format on row 3, which provides 16 free bits:
Header | Placeholder | Fill in our Binary | Result
1110   | xxxx        | 0110               | 11100110
10     | xxxxxx      | 110001             | 10110001
10     | xxxxxx      | 001001             | 10001001
Writing out the result in one line:
11100110 10110001 10001001
This is the UTF-8 binary value of the Chinese character! See for yourself: https://www.fileformat.info/info/unicode/char/6c49/index.htm
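The header-and-fill steps for row 3 can also be sketched in Python. This is a minimal illustration rather than a full encoder: the function name `encode_three_bytes` is made up for this example, it covers only the three-byte row, and it does not reject the surrogate range U+D800 to U+DFFF as a real encoder would. The last line compares the result with Python's built-in `str.encode`.

```python
def encode_three_bytes(code_point: int) -> bytes:
    """Pack a code point from U+0800..U+FFFF into 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("this sketch only covers the three-byte row")
    byte1 = 0b11100000 | (code_point >> 12)              # header 1110 + top 4 bits
    byte2 = 0b10000000 | ((code_point >> 6) & 0b111111)  # header 10 + middle 6 bits
    byte3 = 0b10000000 | (code_point & 0b111111)         # header 10 + low 6 bits
    return bytes([byte1, byte2, byte3])

encoded = encode_three_bytes(0x6C49)
bit_string = " ".join(f"{b:08b}" for b in encoded)

print(bit_string)                        # 11100110 10110001 10001001
print(encoded == "汉".encode("utf-8"))   # True
```

The shifts and masks do exactly what the fill-in table does by hand: split the 16 bits into groups of 4, 6, and 6, then put each group behind its header.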
Summary
A Chinese character: 汉
Its Unicode value: U+6C49
Convert 6C49 to binary: 01101100 01001001
Encode 6C49 as UTF-8: 11100110 10110001 10001001