The semantics of historical associations, quality of the corpus of already authored associations, and future editorial guidance in their authoring and end-user processing.
5 Comments
Jeremy Rogers
As a follow-up to the April 2019 agenda item discussion on historical associations, I was tasked with giving examples of already authored associations that might be questionable. The attached spreadsheet is a preliminary contribution, which I'm submitting now in case I don't get time to make it less preliminary! Its original goal was to support an exercise in crudely quantifying the error rate within the existing corpus of historical associations and, perhaps, in beginning to identify some error patterns and thereby a language for talking about and understanding the problem.
The spreadsheet (which so far I've only managed to half complete) is the concatenation of four lists, each of 100 inactive codes paired with one of their historical associations. Each of the four lists was drawn randomly from one of four defined fragments of the space of all inactive concepts; which fragment each concept in the list of 400 comes from is indicated by the TEST value in column 1 of the table.
The four fragments are defined as:
The set of all inactive concepts for which at least one nominated active substitute can be determined, where the preferred terms of the inactive code and the new active substitute are NOT lexically identical, and where the claimed or apparent semantics of the stated association(s) traversed between the inactive code and its nominated substitute are:
Test1: the inactive code and its active substitute are exactly identical in meaning (SAME_AS, REPLACED_BY, MOVED_TO/FROM)
Test2: the active substitute corresponds to only one possible interpretation of the meaning of an intrinsically ambiguous inactive code. Although alternative interpretations therefore also exist, none of the others is or can be represented in SNOMED (solitary MAY_BE)
Test3: no concept corresponding to the exact meaning of the inactive code should ever in fact exist in SNOMED. The nominated active substitute is therefore (one of) the nearest proximal parent(s) that it is still legitimately possible to represent in SNOMED. It expresses all parts of the meaning of the original inactive code that can be represented in SNOMED (WAS_A)
Test4: the substitute corresponds to only one possible interpretation of the meaning of an intrinsically ambiguous inactive code. Alternative interpretations therefore also exist, and in fact all alternates that can also be represented in SNOMED are actually represented there, and so are nominated elsewhere (but very likely not in this table) as possible alternate substitutes for the inactive code (multiple MAY_BE)
My expectation was that the error rates would most likely be very different across these four specific fragments as defined above - and generally rather higher than the error rate for another fragment NOT presented here, where the preferred terms for the inactive code and its substitute are lexically identical. My somewhat hurried trawl through Fragments 1 and 2 (below) suggests that this expectation is correct.
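For anyone wanting to reproduce or extend the sampling, here is a rough sketch of the kind of stratified draw described above. It assumes a flattened extract of the historical associations (the file name and column names below are purely illustrative), classifies each inactive code / association pair into Test1-Test4 from the association type and a preferred-term comparison, and then draws 100 rows per fragment at random. It is an illustration of the approach, not the script actually used to build the spreadsheet.

import csv
import random
from collections import defaultdict

# Assumed input: one row per (inactive concept, historical association), with
# hypothetical columns inactiveId, inactivePT, associationType, targetId, targetPT.
ROWS_PER_FRAGMENT = 100

def classify(assocs):
    """Assign each association of one inactive concept to Test1..Test4,
    mirroring the fragment definitions above; pairs whose preferred terms
    are lexically identical are excluded from all four fragments."""
    may_be = [a for a in assocs if a["associationType"] == "MAY_BE"]
    out = []
    for a in assocs:
        if a["inactivePT"].strip().lower() == a["targetPT"].strip().lower():
            continue
        if a["associationType"] in ("SAME_AS", "REPLACED_BY", "MOVED_TO", "MOVED_FROM"):
            out.append(("Test1", a))
        elif a["associationType"] == "WAS_A":
            out.append(("Test3", a))
        elif a["associationType"] == "MAY_BE":
            # solitary MAY_BE -> Test2, multiple MAY_BE -> Test4 (approximation)
            out.append(("Test2" if len(may_be) == 1 else "Test4", a))
    return out

by_concept = defaultdict(list)
with open("historical_associations.csv", newline="") as f:  # illustrative file name
    for row in csv.DictReader(f):
        by_concept[row["inactiveId"]].append(row)

by_fragment = defaultdict(list)
for assocs in by_concept.values():
    for test, row in classify(assocs):
        by_fragment[test].append(row)

random.seed(2019)  # fixed seed so the draw is repeatable
sample = []
for test in ("Test1", "Test2", "Test3", "Test4"):
    drawn = random.sample(by_fragment[test], ROWS_PER_FRAGMENT)
    sample.extend({"TEST": test, **row} for row in drawn)

The resulting 400-row sample could then be written back out with the TEST value as column 1, in the same shape as the attached table.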
Jim Case
Jeremy Rogers, the attached sheet does not appear to be in spreadsheet format and is a bit difficult to digest. Possibly a different file format?
Jeremy Rogers
Jim Case That's because the site wouldn't let me upload files to a message. So I pasted an image grab of the relevant data. I'll try again later today.
Jeremy Rogers
Jim Case Paul Amos Here's an updated version of the original spreadie, but this time as a ZIP to stop Confluence mangling it into something else.
The original list is still there as Sheet1, but I've tweaked the selection algorithm to screen out all the cases where the inactive code was originally an Extinct code from CTV3, for which it will always be a challenge to nominate a sensible active substitute. The revised list of 400 is therefore probably a more representative sampling of the problem space.
History Association QA.zip
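In case it helps to see where the tweak slots in, the screening step amounts to something like the line below, assuming the working rows carry a flag (here arbitrarily named ctv3ExtinctLegacy) marking inactive codes that originated as CTV3 Extinct codes; the flag name is an assumption, since how that provenance is recorded will depend on the extract used.

# Assumed flag: ctv3ExtinctLegacy marks codes that were originally Extinct codes in CTV3.
# Dropping those rows before the stratified draw produces the revised list of 400.
rows = [r for r in rows if r.get("ctv3ExtinctLegacy") not in ("1", "true", "True")]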
Paul Amos
Hi Jeremy Rogers, many thanks. I will copy this across into the subproject page.
PS - Anne is away on annual leave for 3 weeks so she has suggested we progress without her. I will send round a further Doodle poll for the next couple of weeks.