The SCTID includes a check-digit, which is generated using Verhoeff's dihedral check. This section explains the algorithm used and includes sample source code for generating and checking the check-digit in Java Script and Microsoft Visual Basic.
Verhoeff's Dihedral Group D5 Check
The mathematical description of this technique may appear complex but in practice it can be reduced to a pair of two-dimensional arrays, a single dimensional inverse array and a simple computational procedure. These three arrays are shown in the following tables.
- The first array contains the result of "Dihedral D5" multiplication;
- The second array consists of 8 rows of which two are defined while the rest are derived by applying the following formula: F(i, j) = F(i - 1, F(1, j)) ;
- The third array consists of a single row containing the inverse of the Dihedral D5 array it identifies the location of all the zero values in the first array.
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 2 | 3 | 4 | 0 | 6 | 7 | 8 | 9 | 5 |
2 | 2 | 3 | 4 | 0 | 1 | 7 | 8 | 9 | 5 | 6 |
3 | 3 | 4 | 0 | 1 | 2 | 8 | 9 | 5 | 6 | 7 |
4 | 4 | 0 | 1 | 2 | 3 | 9 | 5 | 6 | 7 | 8 |
5 | 5 | 9 | 8 | 7 | 6 | 0 | 4 | 3 | 2 | 1 |
6 | 6 | 5 | 9 | 8 | 7 | 1 | 0 | 4 | 3 | 2 |
7 | 7 | 6 | 5 | 9 | 8 | 2 | 1 | 0 | 4 | 3 |
8 | 8 | 7 | 6 | 5 | 9 | 3 | 2 | 1 | 0 | 4 |
9 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 1 | 5 | 7 | 6 | 2 | 8 | 3 | 0 | 9 | 4 |
2 | 5 | 8 | 0 | 3 | 7 | 9 | 6 | 1 | 4 | 2 |
3 | 8 | 9 | 1 | 6 | 0 | 4 | 3 | 5 | 2 | 7 |
4 | 9 | 4 | 5 | 3 | 1 | 2 | 6 | 8 | 7 | 0 |
5 | 4 | 2 | 8 | 6 | 5 | 7 | 3 | 9 | 0 | 1 |
6 | 2 | 7 | 9 | 3 | 8 | 0 | 6 | 4 | 1 | 5 |
7 | 7 | 0 | 4 | 6 | 9 | 1 | 3 | 2 | 5 | 8 |
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 4 | 3 | 2 | 1 | 5 | 6 | 7 | 8 | 9 |
The identifier is checked by starting at the rightmost digit of the identifier (the check-digit itself) and proceeding to the left processing each digit as follows:
- Check = ArrayDihedralD5 ( Check, ArrayFunctionF(( Position Modulus 8), Digit ))
Check = the running value of the check-sum (starts at zero and modified by each step).
Position = the position of the digit (counted from the right starting at zero).
Digit = the value of the digit.
The final value of Check should be zero. Otherwise the check has failed.
When calculating the check-digit the same process is applied with a minor variation:
Position is the position that the digit will have when the check-digit has been appended.
The final value of Check is applied to the Inverse D5 array to find the correct check-digit.
Check-digit= ArrayInverseD5 ( Check ).
Sample Java Script for computing Verhoeff's Dihedral Check
A live version of an HTML form and JavaScript is available in section 6.4.1 SNOMED CT Identifier Check.
Sample Visual Basic for computing Verhoeff's Dihedral Check
Reasons for using a check-digit
Although a user should rarely type the SCTID, experience suggests that from time to time this will happen. A user may also copy and paste an SCTID. There is a significant risk of errors in these processes and inclusion of a check-digit is intended to reduce the risk of such errors passing undetected. The choice of check-digit algorithm has been made to maximize the detection of common typographical errors. These have been analyzed by in a paper by J. Verhoeff ("Error Detecting Decimal Codes", Mathematical Center Tract 29, The Mathematical Center, Amsterdam, 1969) and subsequently cited in Wagner and Putter, ("Error Detecting Decimal Digits", CACM, Vol 32, No. 1, January 1989). These papers give a detailed categorization of the sorts of errors humans make in dealing with decimal numbers, based on a study of 12000 errors:
- single errors: a becomes b (60% to 95% of all errors).
- omitting or adding a digit (10% to 20%).
- adjacent transpositions: ab becomes ba (10% to 20%).
- twin errors: aa becomes bb (0.5% to 1.5%).
- jump transpositions: acb becomes bca (0.5% to 1.5%).
- jump twin errors: aca becomes bcb (below 1%).
- phonetic errors: a0 becomes 1a -similar pronunciation e.g. thirty or thirteen (0.5% to 1.5%).
In the explanations above, a is not equal to b, but c can be any decimal digit.
A brief comparison of check-digit effectiveness
The IBM Check
The check-sums used for credit cards (the IBM check) picks up the most common errors but miss some adjacent transpositions and many jump transpositions. Assuming the pattern of errors described above, on average it will miss between 4% and 5% of expected errors.
The ISBN Check (Modulus 11)
The ISBN modulus 11 (used for UK NHS number) picks up more errors than the IBM checksum. Leaving 2% to 3% of errors undetected. However, it generates a check-sum value of 0 to 10 and thus cannot be represented as a single check-digit in about 9% of cases. The ISBN convention is to use "X" to represent the check-digit value 10 but this is incompatible with an Integer representation. The UK NHS number uses this check-sum but regards and number generating a check-sum of 10 as an invalid identifier. This approach could be applied to the SCTID but this would render 9% of possible values unusable in each partition and namespace. This would prevent a simple sequence of values from being allocated as the item identifier within any namespace. More significantly the unusable item identifier would differ in each namespace or partition and this would prevent simple transpositions of item identifiers between partitions and namespaces.
Partitions could be a useful way of distinguishing developmental and released components and revising the partition and recalculating the check-digit would then be an elegant way to activate these components for a distribution version. It seems unwise to prevent future development and maintenance by using a check-sum that will prevent this.
Verhoeff's Check
Verhoeff's check catches all single errors, all adjacent transpositions, over 95% of twin errors, over 94% of jump transpositions and jump twin errors, and most phonetic errors. Therefore, like modulus 11, the Verhoeff check reduces the undetected error rate to 2% or 3%. Unlike modulus 11, it does this using a single decimal check-digit and without limiting the range of valid numbers.
The majority of the undetected errors with both modulus 11 and Verhoeff result from additions or omissions of digits. Any check-digit method is likely to miss 10% of such errors and since these comprise 10% to 20%. The Verhoeff scheme also misses four jump twin errors involving digits with a difference of 5 (i.e. 050 vs. 505, 161 vs. 616, 272 vs. 727, and 494 vs. 949).
Feedback