...
Unicode is preferred to ASCII because it permits the inclusion of accents, scientific symbols and characters used in
...
Characters in the
...
.
...
byte 0 |
---|
0 + bits 0-6 |
Characters in the
...
Table 20. Two byte encoding

byte 0 | byte 1 |
---|---|
110 + bits 6-10 | 10 + bits 0-5 |
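
The two byte layout can be illustrated with a minimal Python sketch. The function name `encode_two_byte` is hypothetical, and the range check assumes the standard UTF-8 two-byte range of U+0080 through U+07FF.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Pack an 11-bit code point into the layout 110xxxxx 10xxxxxx."""
    # Assumption: the two-byte form covers U+0080 through U+07FF.
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("two-byte form covers U+0080 through U+07FF only")
    byte0 = 0b11000000 | (code_point >> 6)        # 110 + bits 6-10
    byte1 = 0b10000000 | (code_point & 0b111111)  # 10 + bits 0-5
    return bytes([byte0, byte1])


encoded = encode_two_byte(0x00AE)  # U+00AE, the registered sign
print(" ".join(f"{b:08b}" for b in encoded))  # 11000010 10101110
```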
Characters in the
...
byte 0 | byte 1 | byte 2 |
---|---|---|
1110 + bits 12-15 | 10 + bits 6-11 | 10 + bits 0-5 |
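
Reading the three byte layout in the opposite direction, a decoder strips the marker bits and reassembles the 16-bit value. The sketch below is illustrative only: `decode_three_byte` is a hypothetical name, and it checks the marker bits but does not reject overlong forms or surrogate code points.

```python
def decode_three_byte(data: bytes) -> int:
    """Recover a code point from the layout 1110xxxx 10xxxxxx 10xxxxxx."""
    if len(data) != 3:
        raise ValueError("expected exactly three bytes")
    b0, b1, b2 = data
    # Verify the role markers: 1110 on the first byte, 10 on the other two.
    if b0 >> 4 != 0b1110 or b1 >> 6 != 0b10 or b2 >> 6 != 0b10:
        raise ValueError("bytes do not match the three-byte layout")
    # bits 12-15, then bits 6-11, then bits 0-5
    return ((b0 & 0b1111) << 12) | ((b1 & 0b111111) << 6) | (b2 & 0b111111)


# The euro sign, U+20AC, is stored as the bytes E2 82 AC.
print(f"{decode_three_byte(bytes([0xE2, 0x82, 0xAC])):04X}")  # 20AC
```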
...
The leading bits of each byte indicate the role of the byte, and the first zero bit terminates this role information. The possible byte values are therefore:
Table 22. UTF-8 Encoding Rules

Bits | Byte value | Role |
---|---|---|
0??????? | 000-127 | Single byte encoding of a character |
10?????? | 128-191 | Continuation of a multi-byte encoding |
110????? | 192-223 | First byte of a two byte character encoding |
1110???? | 224-239 | First byte of a three byte character encoding |
1111???? | 240-255 | Invalid in UTF-8 |
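
As a rough illustration of these rules, the role of a byte can be determined by comparing it against the range boundaries. The function name `byte_role` and its return strings are hypothetical; the sketch follows the ranges given above, including the treatment of 240-255 as invalid.

```python
def byte_role(value: int) -> str:
    """Classify a byte by its leading bits, following the ranges above."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in the range 0-255")
    if value < 0b10000000:   # 0???????  (000-127)
        return "single byte character"
    if value < 0b11000000:   # 10??????  (128-191)
        return "continuation byte"
    if value < 0b11100000:   # 110?????  (192-223)
        return "first byte of a two byte character"
    if value < 0b11110000:   # 1110????  (224-239)
        return "first byte of a three byte character"
    return "invalid"         # 1111????  (240-255), as stated in the text


for b in (0x53, 0xC2, 0xAE, 0xE2, 0xFF):
    print(f"{b:08b} {byte_role(b)}")
```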
...
Character | S | C | T | ® | ③ |
---|---|---|---|---|---|
Code point (hex) | 0053 | 0043 | 0054 | 00AE | 2462 |
Bytes | 01010011 | 01000011 | 01010100 | 11000010 10101110 | 11100010 10010001 10100010 |
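
A byte listing of this kind can be reproduced with Python's built-in `str.encode`. The helper name `utf8_bit_strings` is hypothetical and the sample string is only an assumption about the characters being illustrated.

```python
def utf8_bit_strings(text: str) -> list[str]:
    """Return, for each character, its UTF-8 bytes written as 8-bit strings."""
    return [
        " ".join(f"{byte:08b}" for byte in ch.encode("utf-8"))
        for ch in text
    ]


sample = "SCT\u00ae\u2462"  # S, C, T, U+00AE, U+2462
for ch, bits in zip(sample, utf8_bit_strings(sample)):
    print(f"U+{ord(ch):04X}  {bits}")
```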