Unicode is preferred to ASCII because it permits the inclusion of accents, scientific symbols and characters used in
...
- ASCII characters are encoded as a single byte;
- Greek, Hebrew, Arabic and most accented European characters are encoded as two bytes;
- All other characters up to U+FFFF are encoded as three bytes (characters above U+FFFF take four bytes).

The individual characters are encoded according to the following rules.
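The byte counts listed above can be checked with a minimal Python sketch. The helper name `utf8_length` and the sample characters are illustrative assumptions, not part of the SNOMED CT specification.

```python
def utf8_length(ch: str) -> int:
    """Return how many bytes UTF-8 uses for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    return len(ch.encode("utf-8"))


# Illustrative samples (chosen for this sketch, not taken from the source).
samples = [
    ("A", "ASCII letter"),
    ("\u00e9", "accented European letter (e with acute)"),
    ("\u03b1", "Greek alpha"),
    ("\u05d0", "Hebrew alef"),
    ("\u0627", "Arabic alef"),
    ("\u4e2d", "CJK ideograph"),
]

for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_length(ch)} byte(s)")
```

The ASCII letter should report one byte, the accented, Greek, Hebrew and Arabic letters two bytes each, and the CJK ideograph three bytes.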
...
Characters in the range U+0000 to U+007F (7 significant bits) are encoded as a single byte:
byte 0 | |
---|---|
0 | bits 0-6 |
...
Characters in the range U+0080 to U+07FF (up to 11 significant bits) are encoded as two bytes:
byte 0 | | | | byte 1 | | |
---|---|---|---|---|---|---|
1 | 1 | 0 | bits 6-10 | 1 | 0 | bits 0-5 |
...
Characters in the range U+0800 to U+FFFF (up to 16 significant bits) are encoded as three bytes:
byte 0 | | | | | byte 1 | | | byte 2 | | |
---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 0 | bits 12-15 | 1 | 0 | bits 6-11 | 1 | 0 | bits 0-5 |
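The three bit layouts can be sketched as a small encoder. This is an illustration under stated assumptions rather than a reference implementation: the function name `encode_code_point` is hypothetical, it handles only code points up to U+FFFF (the range the layouts cover), and it rejects surrogate code points, which are not valid in UTF-8.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one code point (U+0000..U+FFFF) using the 1-, 2- and 3-byte layouts."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only covers code points up to U+FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp <= 0x7F:
        # Single byte: 0 followed by bits 0-6.
        return bytes([cp])
    if cp <= 0x7FF:
        # Two bytes: 110 + bits 6-10, then 10 + bits 0-5.
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    # Three bytes: 1110 + bits 12-15, 10 + bits 6-11, 10 + bits 0-5.
    return bytes([0b11100000 | (cp >> 12),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


for cp in (0x0053, 0x00AE, 0x2462):
    encoded = encode_code_point(cp)
    print(f"U+{cp:04X} ->", " ".join(f"{b:08b}" for b in encoded))
```

In practice Python's built-in `str.encode("utf-8")` does the same job; the hand-written version only makes the bit manipulation visible.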
...
The first bits of each byte indicate the role of the byte. A zero bit terminates this role information. Thus possible byte values are:
Bits | Byte value | Role |
---|---|---|
0??????? | 000-127 | Single byte encoding of a character |
10?????? | 128-191 | Continuation of a multi-byte encoding |
110????? | 192-223 | First byte of a two byte character encoding |
1110???? | 224-239 | First byte of a three byte character encoding |
11110??? | 240-247 | First byte of a four byte character encoding (characters above U+FFFF; only 240-244 occur in valid UTF-8) |
11111??? | 248-255 | Invalid in UTF-8 |
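Reading the role of a byte from its leading bits can be sketched as follows. The function name `byte_role` and the returned labels are assumptions made for this example; byte values from 240 upward are simply reported as lying outside the one- to three-byte scheme described in this section.

```python
def byte_role(value: int) -> str:
    """Classify a byte value by its leading bits, up to the first zero bit."""
    if not 0 <= value <= 255:
        raise ValueError("a byte value must be in the range 0-255")
    if value < 0b10000000:   # 0xxxxxxx
        return "single byte character"
    if value < 0b11000000:   # 10xxxxxx
        return "continuation byte"
    if value < 0b11100000:   # 110xxxxx
        return "first byte of a two byte character"
    if value < 0b11110000:   # 1110xxxx
        return "first byte of a three byte character"
    return "outside the one- to three-byte scheme"  # 1111xxxx


# Roles of the bytes in the UTF-8 encoding of U+00AE (registered sign).
for b in "\u00ae".encode("utf-8"):
    print(f"{b:08b} ({b}): {byte_role(b)}")
```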
...
UTF-8 is an IETF Internet Standard (it was initially adopted by the IETF in 1996 and revised to restrict some code values in 1998 and 2003). In 2008 UTF-8 became the most widely used form of encoding in web pages.
Note that SNOMED CT does not use, or require use of, the Byte Order Mark (BOM) specified by the Unicode standard because all SNOMED CT release files use UTF-8.
of characters in
...
Character | S | C | T | ® | ... |
---|---|---|---|---|---|
Unicode (hex) | 0053 | 0043 | 0054 | 00AE | 2462 |
Bytes | 01010011 | 01000011 | 01010100 | 11000010 10101110 | 11101111 10111111 ... |
...
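The bit patterns for the characters S, C, T and ® in the example can be reproduced with a short sketch. The helper name `utf8_bits` is an assumption for illustration; it relies only on Python's built-in UTF-8 encoder.

```python
def utf8_bits(text: str) -> list:
    """Return, for each character, its UTF-8 bytes as space-separated bit strings."""
    return [" ".join(f"{b:08b}" for b in ch.encode("utf-8")) for ch in text]


# "\u00ae" is the registered sign (U+00AE).
for ch, bits in zip("SCT\u00ae", utf8_bits("SCT\u00ae")):
    print(f"U+{ord(ch):04X}: {bits}")
```

The three ASCII letters each yield one byte, while U+00AE yields the two bytes 11000010 10101110.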