Unicode is preferred to ASCII because it permits the inclusion of accents, scientific symbols and characters used in
...
- ASCII characters are encoded as a single byte;
- Greek, Hebrew, Arabic and most accented European characters are encoded as two bytes;
- All other characters up to U+FFFF are encoded as three bytes.

The individual characters are encoded according to the following rules.
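The byte counts listed above can be checked with a short Python sketch. The function name `utf8_length` is hypothetical and used only for illustration. The four-byte branch covers characters above U+FFFF, which fall outside the ranges described in this section.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    cp = ord(ch)
    if cp <= 0x7F:      # ASCII range
        return 1
    if cp <= 0x7FF:     # Greek, Hebrew, Arabic, most accented European characters
        return 2
    if cp <= 0xFFFF:    # remaining characters covered by this section
        return 3
    return 4            # above U+FFFF, outside the ranges described here


# Compare the sketch with Python's built-in encoder for a few sample characters.
for ch in "A", "é", "Ω", "א", "€":
    assert utf8_length(ch) == len(ch.encode("utf-8"))
```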
Characters in the range U+0000 to U+007F are encoded as a single byte.

Byte | Fixed bits | Character bits
---|---|---
byte 0 | 0 | bits 0-6
Characters in the range U+0080 to U+07FF are encoded as two bytes.

Byte | Fixed bits | Character bits
---|---|---
byte 0 | 110 | bits 6-10
byte 1 | 10 | bits 0-5
Characters in the range U+0800 to U+FFFF are encoded as three bytes.

Byte | Fixed bits | Character bits
---|---|---
byte 0 | 1110 | bits 12-15
byte 1 | 10 | bits 6-11
byte 2 | 10 | bits 0-5
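The three bit layouts can be sketched as a small encoder. This is a minimal illustration rather than a reference implementation. The name `encode_utf8` is hypothetical, and rejecting the surrogate range U+D800 to U+DFFF is an assumption taken from standard UTF-8, not from the rules stated here.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point in U+0000..U+FFFF using the one- to three-byte layouts."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only covers U+0000 to U+FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        # Assumption: surrogates are not encodable, as in standard UTF-8.
        raise ValueError("surrogate code points cannot be encoded")
    if cp <= 0x7F:
        # 0 + bits 0-6
        return bytes([cp])
    if cp <= 0x7FF:
        # 110 + bits 6-10, then 10 + bits 0-5
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0b111111)])
    # 1110 + bits 12-15, then 10 + bits 6-11, then 10 + bits 0-5
    return bytes([0b11100000 | (cp >> 12),
                  0b10000000 | ((cp >> 6) & 0b111111),
                  0b10000000 | (cp & 0b111111)])


# The sketch agrees with Python's built-in encoder on these samples.
assert encode_utf8(0x0053) == "S".encode("utf-8")
assert encode_utf8(0x00AE) == "\u00ae".encode("utf-8")
assert encode_utf8(0x20AC) == "\u20ac".encode("utf-8")
```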
The leading bits of each byte indicate the role of the byte, and the first zero bit terminates this role information. The possible byte values are therefore:

Bits | Byte value | Role
---|---|---
0??????? | 0-127 | Single byte encoding of a character
10?????? | 128-191 | Continuation of a multi-byte encoding
110????? | 192-223 | First byte of a two byte character encoding
1110???? | 224-239 | First byte of a three byte character encoding
1111???? | 240-255 | Not used by the one to three byte encodings above (in full UTF-8, 240-244 begin a four byte encoding and 245-255 are invalid)
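As a rough illustration of how the leading bits identify a byte's role, the following Python sketch classifies a byte value by comparing it against the range boundaries in the table. The function name `classify_byte` and the returned labels are hypothetical.

```python
def classify_byte(b: int) -> str:
    """Return the role of a byte value, based on its leading bits."""
    if not 0 <= b <= 255:
        raise ValueError("a byte value must be between 0 and 255")
    if b < 0b10000000:   # 0???????  (0-127)
        return "single byte character"
    if b < 0b11000000:   # 10??????  (128-191)
        return "continuation byte"
    if b < 0b11100000:   # 110?????  (192-223)
        return "first byte of a two byte encoding"
    if b < 0b11110000:   # 1110????  (224-239)
        return "first byte of a three byte encoding"
    # 1111????  (240-255)
    return "not used by the one to three byte encodings"


# Roles of the two bytes that encode U+00AE (11000010 10101110).
print([classify_byte(b) for b in "\u00ae".encode("utf-8")])
```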
UTF-8 is an IETF Internet Standard (it was initially adopted by the IETF in 1996 and revised to restrict some code values in 1998 and 2003). In 2008 UTF-8 became the most widely used form of encoding in web pages.
Note that SNOMED CT does not use, or require use of, the Byte Order Mark (BOM) specified by the Unicode standard because all SNOMED CT release files use UTF-8.
of characters in

Character | S | C | T | ... | ...
---|---|---|---|---|---
... | 0053 | 0043 | 0054 | 00AE | 2462
Bytes | 01010011 | 01000011 | 01010100 | 11000010 10101110 | 11101111 10111111 ...
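The bit patterns for the first four columns can be reproduced with Python's built-in encoder. This is a minimal sketch and the helper name `utf8_bits` is hypothetical. It covers only the code points 0053, 0043, 0054 and 00AE, whose bytes appear in full in the table.

```python
def utf8_bits(text: str) -> list[str]:
    """Return each byte of the UTF-8 encoding of text as an 8-character bit string."""
    return [format(b, "08b") for b in text.encode("utf-8")]


for ch in "SCT\u00ae":
    print(f"{ord(ch):04X}", " ".join(utf8_bits(ch)))
# prints:
# 0053 01010011
# 0043 01000011
# 0054 01010100
# 00AE 11000010 10101110
```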