Welcome and agenda | | Please note that the SLPG will be meeting in London on Sunday 5th April (9am to 12:30pm) - see schedule |
Concrete values | Linda Bird | SCG, ECL, STS, ETL - Ready for publication - on hold until after MAG meeting in April confirming requirement for Boolean datatype. |
URIs | | Please review updates to the URI specification, and raise any comments in the next 2 weeks. Draft URI standard for review - URI Standard - 2.6 URIs for Language Syntaxes
- 2.7 URIs for Language Instances
- 2.8 URIs for Modelling Resources
- 3.1 Resolving SNOMED CT URIs
|
Expression Constraint Language | Linda Bird | NEXT STEP FOR ECL: - Agreement in Malaysia - ECL will add the following (no regex - just wild card and word prefix any order):
{{ term = [ termSearchType : ] "String", language = <langCode> }} - Example - {{ term = "heart att", language = es }}
- Question - Do we want to reconsider including optional parameters for 'type', 'dialect' and 'acceptability'
- typeId = 900000000000013009 ; type = <synonym | fsn>
- dialectId = 900000000000508004 ; dialect = <en-GB | en-AU | en-Patient | de-CardioSpecialist>
- dialectId = 900000000000508004 + 900000000000509007 ; dialect = en-GB + en-US
- acceptabilityId = 900000000000549004 ; acceptability = <acceptable | preferred >
Term Search Type - Wild Card Match (collation) - e.g.
- {{ term = wild:"*heart*" }}
- {{ term = wild (sv):"*hjärta*" }}
- Word Prefix Any Order - e.g.
- {{ term = match:"hear att" }}
- Default (word prefix any order) - e.g.
- {{ term = "hear att" }}
- {{ term = "*heart*" }}
Potential Examples - << 64572001 |Disease| {{ term = "heart"}}
- << 64572001 |Disease| {{ term = "heart", language = "en"}}
- << 64572001 |Disease| {{ term = "heart", language = "en"}} AND << 64572001 |Disease| {{ term = "hjärta", language = "sv"}}
- << 64572001 |Disease| {{ term = "heart", language = "en"}} {{ term = "hjärta", language = "sv"}}
- << 64572001 |Disease| {{ term = "heart", language = "en"}} OR << 64572001 |Disease| {{ term = "hjärta", language = "sv"}}
- << 64572001 |Disease| {{ (term = "heart", language = "en") OR (term = "hjärta", language = "sv")}}
- (<< 64572001 |Disease|: |Associated morphology| = *) {{ term = "heart", language = "en" }} {{ term = "hjärta", language = "sv"}}
- (<< 64572001 |Disease| {{ term = "*cardio*" }}) MINUS (<< 64572001 |Disease| {{ term != "*heart*" }})
- Recommendation to be made on (based on investigation of grammar):
- << 64572001 |Disease| {{ term = "heart", language = "en"}} AND {{ term = "hjärta", language = "sv"}}
- << 64572001 |Disease| ( {{ term = "heart", language = "en"}} OR {{ term = "hjärta", language = "sv"}} )
- << 64572001 |Disease| ( {{ term = "heart", language = "en"}} MINUS {{ term = "hjärta", language = "sv"}} )
Use Cases - Intensionally define a reference set for chronic disease. Starting point was ECL with modelling; this misses concepts modelled using the pattern you would expect. So important in building out that reference set.
- Authors quality assuring names of concepts
- Checking translations, retranslating. Queries for a concept that has one word in Swedish, another word in English
- AU use case would have at most 3 or 4 words in match
- Consistency of implementation in different terminology services
- Authoring use cases currently supported by description templates
- A set of the "*ectomy"s and "*itis"s
Questions - Do we include 'typeId' - e.g. << 64572001 |Disease| {{ D.term = "*heart*", typeId = 900000000000013009 |Synonym| }}
- Do we include 'type' - e.g. << 64572001 |Disease| {{ D.term = "*heart*", D.type = synonym }}
- Do we include 'languageCode' - e.g. << 64572001 |Disease| {{ D.term = "*heart*", D.type = synonym, D.languageCode = "en" }}
- Do we include 'caseSignificanceId' - e.g. << 64572001 |Disease| {{ D.term = "*Heart*", D.caseSignificanceId = 900000000000017005 |case sensitive|}}
- Do we include 'caseSignificance' - e.g. << 64572001 |Disease| {{ D.term = "*Heart*", D.caseSignificance = sensitive }}
- Do we include 'language' and 'version' - e.g. << 64572001 |Disease| {{ term = "*heart*" }} VERSION = http://…, LANGUAGE = (999001881000000108|Gastro LRS|, |GB English|)
- Do we include syntactic sugar - e.g.
- << 64572001 |Disease| {{ preferredTerm = "*heart*", languageRefSet = en-gb}}
- << 64572001 |Disease| {{ fullySpecifiedTerm = "*heart*", languageRefSet = en-gb}}
- << 64572001 |Disease| {{ acceptableTerm = "*heart*", languageRefSet = en-gb}}
- << 64572001 |Disease| {{ preferredTerm = "*heart*"}} FROM version = X, language = Y
- NO
- Do we use/require the "D" at the start of "term"?
- Packaging - How do we package this extension to ECL
- A new version of ECL - version 1.5
|
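The two term search types agreed for ECL above (wild card match and word prefix any order) could behave roughly as in the following sketch. It is only an illustration: the case-insensitive comparison, the whitespace tokenization and the function names are assumptions, not part of any agreed specification.

```python
import re


def wildcard_match(pattern: str, term: str) -> bool:
    """Wild card match: '*' stands for any run of characters (possibly empty)."""
    # Escape each literal fragment, then join the fragments with '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, term, flags=re.IGNORECASE | re.DOTALL) is not None


def word_prefix_any_order(search: str, term: str) -> bool:
    """Every search token must be a prefix of some word in the term, in any order."""
    words = term.lower().split()
    return all(
        any(word.startswith(token) for word in words)
        for token in search.lower().split()
    )
```

Under these assumptions, "hear att" and "att hear" both match the term "Heart attack" by word prefix, while the wild card pattern "heart" (with no asterisks) does not, because it would have to match the whole term.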
Querying Refset Attributes | Linda Bird | Proposed syntax to support querying and return of alternative refset attributes (To be included in the SNOMED Query Language) - Example use cases
- Execution of maps from international substance concepts to AMT substance concepts
- Find the anatomical parts of a given anatomy structure concept (in |Anatomy structure and part association reference set|)
- Find potential replacement concepts for an inactive concept in record
- Find the order of a given concept in an Ordered component reference set
- Find a concept with a given order in an Ordered component reference set
- Potential syntax to consider (brainstorming ideas)
- SELECT ??
- SELECT 123 |referenced component|, 456 |target component| FROM 799 |Anatomy structure and part association refset| WHERE 123 |referenced component| = (< 888 |Upper abdomen structure| {{ term = "*heart*" }} )
- SELECT id, moduleId FROM concept WHERE id IN (< |Clinical finding|) AND definitionStatus = |primitive|
- SELECT id, moduleId FROM concept, ECL("< |Clinical finding|") CF WHERE concept.id = CF.sctid AND definitionStatus = |primitive|
- SELECT ??? |id|, ??? |moduleId| FROM concept ( < |Clinical finding| {{ term = "*heart*" }} {{ definitionStatus = |primitive| }} )
- Question - Can we assume some table joins - e.g. Concept.id = Description.conceptId etc ??
- Examples
- Try to recast relationships table as a Refset table → + graph-based extension
- Find primitive concepts in a hierarchy
- ROW ... ?
- ROWOF (|Anatomy structure and part association refset|) ? (|referenced component| , |target component|)
- same as: ^ |Anatomy structure and part association refset|
- ROWOF (|Anatomy structure and part association refset|) . |referenced component|
- same as: ^ |Anatomy structure and part association refset|
- ROWOF (|Anatomy structure and part association refset|) {{ |referenced component| = << |Upper abdomen structure|}} ? |targetComponentId|
- ROWOF (< 900000000000496009|Simple map type reference set| {{ term = "*My hospital*"}}) {{ 449608002|Referenced component| = 80581009 |Upper abdomen structure|}} ? 900000000000505001 |Map target|
- (ROW (< 900000000000496009|Simple map type reference set| {{ term = "*My hospital*"}}) : 449608002|Referenced component| = 80581009 |Upper abdomen structure| ).900000000000505001 |Map target|
- # ... ?
- # |Anatomy structure and part association refset| ? |referenced component|
- # (|Anatomy structure and part association refset| {{ |referenced component| = << |Upper abdomen structure| }}) ? |targetComponentId|
- ? notation + Filter refinement
- |Anatomy structure and part association refset| ? |targetComponentId|
- |Anatomy structure and part association refset| ? |referencedComponent| (Same as ^ |Anatomy structure and part association refset|)
- (|Anatomy structure and part association refset| {{ |referencedComponent| = << |Upper abdomen structure| }} ) ? |targetComponentId|
- ( |Anatomy structure and part association refset| {{ |targetComponentId| = << |Upper abdomen structure| }} ) ? |referencedComponent|
- ( |My ordered component refset|: |Referenced component| = |Upper abdomen structure| ) ? |priority order|
- ? |My ordered component refset| {{ |Referenced component| = |Upper abdomen structure| }} . |priority order|
- ? |My ordered component refset| . |referenced component|
- equivalent to ^ |My ordered component refset|
- ? (<|My ordered component refset|) {{ |Referenced component| = |Upper abdomen structure| }} . |priority order|
- ? (<|My ordered component refset| {{ term = "*map"}} ) {{ |Referenced component| = |Upper abdomen structure| }} . |priority order|
- REFSETROWS (<|My ordered component refset| {{ term = "*map"}} ) {{ |Referenced component| = |Upper abdomen structure| }} SELECT |priority order|
- Specify value to be returned
- ? 449608002 |Referenced component|? 734139008 |Anatomy structure and part association refset|
- ^ 734139008 |Anatomy structure and part association refset| (Same as previous)
- ? 900000000000533001 |Association target component|? 734139008 |Anatomy structure and part association refset|
- ? 900000000000533001 |Association target component|? 734139008 |Anatomy structure and part association refset| : 449608002 |ReferencedComponent| = << |Upper abdomen structure|
- ? 900000000000533001 |Association target component|? 734139008 |Anatomy structure and part association refset| {{ 449608002 |referencedComponent| = << |Upper abdomen structure| }}
- (? 900000000000533001 |Association target component|? 734139008 |Anatomy structure and part association refset| : 449608002 |ReferencedComponent| = (<< |Upper abdomen structure|) : |Finding site| = *)
|
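Whatever surface syntax is chosen, the brainstormed forms above all describe the same operation: filter the rows of a refset on one attribute and return another. A minimal sketch of that operation follows, assuming refset rows are held as dictionaries keyed by RF2-style column names and that any ECL filter (such as << |Upper abdomen structure|) has already been evaluated to a set of ids. The function name and row layout are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional, Set


def select_from_refset(
    rows: Iterable[Mapping[str, str]],
    refset_id: str,
    return_field: str,
    where: Optional[Dict[str, Set[str]]] = None,
) -> Set[str]:
    """Return the values of return_field from the rows of refset_id.

    where maps a column name to the set of values allowed in that column,
    e.g. the pre-evaluated result of an ECL expression constraint.
    """
    where = where or {}
    return {
        row[return_field]
        for row in rows
        if row["refsetId"] == refset_id
        and all(row[field] in allowed for field, allowed in where.items())
    }
```

With no filter and return_field set to the referenced component column, this returns the same set as the existing member-of operator (^), which matches the "same as" notes above.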
Returning Attributes | Michael Lawley | Proposal (by Michael) for discussion - Currently ECL expressions can match (return) concepts that are either the source or the target of a relationship triple (target is accessed via the 'reverse' notation or 'dot notation'), but not the relationship type (i.e. attribute name) itself.
For example, I can write:
<< 404684003|Clinical finding| : 363698007|Finding site| = <<66019005|Limb structure|
<< 404684003|Clinical finding| . 363698007|Finding site|
But I can't get all the attribute names that are used by << 404684003|Clinical finding| - Perhaps something like:
- ? R.type ? (<< 404684003 |Clinical finding|)
- This could be extended to, for example, return different values - e.g.
- ? |Simple map refset|.|maptarget| ? (^|Simple map refset| AND < |Fracture|)
|
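The intent of the proposal can be sketched over a plain list of relationship triples. This assumes the concept set (for example the result of << 404684003 |Clinical finding|) has already been evaluated. The function name and the triple layout are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (sourceId, typeId, destinationId)


def attribute_types_used(
    relationships: Iterable[Triple], source_concepts: Set[str]
) -> Set[str]:
    """Return the relationship types (attribute names) used by the given concepts."""
    return {
        type_id
        for source_id, type_id, _destination_id in relationships
        if source_id in source_concepts
    }
```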
Reverse Member Of | Michael Lawley | Proposal for discussion: What refsets is a given concept (e.g. 421235005 |Structure of femur|) a member of? - Possible new notation for this:
- ^ . 421235005 |Structure of femur|
- ? X ? 421235005 |Structure of femur| = ^ X
|
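Operationally, "reverse member of" is a lookup from a referenced component back to the refsets that contain it. A minimal sketch, assuming refset rows are dictionaries with RF2-style refsetId and referencedComponentId columns (the function name is hypothetical):

```python
from typing import Iterable, Mapping, Set


def refsets_containing(rows: Iterable[Mapping[str, str]], concept_id: str) -> Set[str]:
    """Return the ids of all refsets that have concept_id as a referenced component."""
    return {
        row["refsetId"]
        for row in rows
        if row["referencedComponentId"] == concept_id
    }
```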
Expression Templates | | - ON HOLD WAITING FOR IMPLEMENTATION FEEDBACK FROM INTERNAL TECH TEAM
- WIP version - https://confluence.ihtsdotools.org/display/WIPSTS/Template+Syntax+Specification
- Added a 'default' constraint to each replacement slot - e.g. default (72673000 |Bone structure (body structure)|)
- Enabling 'slot references' to be used within the value constraint of a replacement slot - e.g. [[ +id (<< 123037004 |Body structure| MINUS << $findingSite2) @findingSite1]]
- Allowing repeating role groups to be referenced using an array - e.g. $rolegroup[1] or $rolegroup[!=SELF]
- Allow reference to 'SELF' in role group arrays
- Adding 'sameValue' and 'allOrNone' constraints to information slots - e.g. sameValue ($site), allOrNone ($occurrence)
- See changes in red here: 5.1. Normative Specification
Examples: [[+id]]: [[1..*] @my_group sameValue(morphology)] { |Finding site| = [[ +id (<<123037004 |Body structure (body structure)| MINUS << $site[! SELF ] ) @site ]] , |Associated morphology| = [[ +id @my_morphology ]] } - Implementation feedback on draft updates to Expression Template Language syntax
- Use cases from the Quality Improvement Project:
- Multiple instances of the same role group, with some attributes the same and others different. Eg same morphology, potentially different finding sites.
Note that QI Project is coming from a radically different use case. Instead of filling template slots, we're looking at existing content and asking "exactly how does this concept fail to comply with this template?" For discussion:
[[0..1]] { [[0..1]]
246075003 |Causative agent|
= [[+id (<
410607006 |Organism|
) @Organism]] }
Is it correct to say either one of the cardinality blocks is redundant? What are the implications of 1..1 on either side? This is less obvious for the self grouped case.
Road Forward for SI
- Generate the parser from the ABNF and implement in the Template Service
- User Interface to a) allow users to specify template at runtime b) tabular (auto-completion) lookup → STL
- Template Service to allow multiple templates to be specified for alignment check (aligns to none-off)
- Output must clearly indicate exactly what feature of concept caused misalignment, and what condition was not met.
Additional note: QI project is no longer working in subhierarchies. Every 'set' of concepts is selected via ECL. In fact most reports should now move to this way of working since a subhierarchy is the trivial case. For a given template, we additionally specify the "domain" to which it should be applied via ECL. This is much more specific than using the focus concept which is usually the PPP eg Disease. FYI Michael Chu |
Description Templates | Kai Kewley | - ON HOLD
- Previous discussion (in Malaysia)
- Overview of current use
- Review of General rules for generating descriptions
- Removing tags, words
- Conditional removal of words
- Automatic case significance
- Generating PTs from target PTs
- Reordering terms
- Mechanism for sharing general rules - inheritance? include?
- Description Templates for translation
- Status of planned specification
|
Query Language - Summary from previous meetings
| | FUTURE WORK Examples: version and dialect Notes
- Allow nested where, version, language
- Scope of variables is inner query
|
| Examples: where Notes - Allow nested variable definitions, but recommend that people don't due to readability
- Scope of variables is the inner query
- No recursion e.g X WHERE X = 1234 MINUS X
- ie can't use a variable in its own definition
- ie X is only known on the left of the corresponding WHERE, and not on the right of the WHERE
|
Keywords for Term-based searching: - D.term
- D.term = "*heart*"
- D.term = wild:"*heart*"
- D.term = regex:".*heart.*"
- D.term = match:"hear att"
- D.term = (sv) wild: "*heart*"
- D.languageCode
- D.languageCode = "en"
- D.languageCode = "es"
- D.caseSignificanceId
- D.caseSignificanceId = 900000000000448009 |entire term case insensitive|
- D.caseSignificanceId = 900000000000017005 |entire term case sensitive|
- D.caseSignificanceId = 900000000000020002 |only initial character case insensitive|
- D.caseSignificance
- D.caseSignificance = "insensitive"
- D.caseSignificance = "sensitive"
- D.caseSignificance = "initialCharInsensitive"
- D.typeId
- D.typeId = 900000000000003001 |fully specified name|
- D.typeId = 900000000000013009 |synonym|
- D.typeId = 900000000000550004 |definition|
- D.type
- D.type = "FSN"
- D.type = "fullySpecifiedName"
- D.type = "synonym"
- D.type = "textDefinition"
- D.acceptabilityId
- D.acceptabilityId = 900000000000549004 |acceptable|
- D.acceptabilityId = 900000000000548007 |preferred|
- D.acceptability
- D.acceptability = "acceptable"
- D.acceptability = "preferred"
Additional Syntactic Sugar - FSN
- FSN = "*heart"
- D.term = "*heart", D.type = "FSN"
- D.term = "*heart", D.typeId = 900000000000003001 |fully specified name|
- FSN = "*heart" LANGUAGE X
- D.term = "*heart", D.type = "FSN", D.acceptability = * LANGUAGE X
- D.term = "*heart", D.typeId = 900000000000003001 |fully specified name|, acceptabilityId = * LANGUAGE X
- synonym
- synonym = "*heart"
- D.term = "*heart", D.type = "synonym"
- D.term = "*heart", D.typeId = 900000000000013009 |synonym|
- synonym = "*heart" LANGUAGE X
- D.term = "*heart", D.type = "synonym", D.acceptability = * LANGUAGE X
- D.term = "*heart", D.typeId = 900000000000013009 |synonym|, (D.acceptabilityId = 900000000000549004 |acceptable| OR D.acceptabilityId = 900000000000548007 |preferred|) LANGUAGE X
- synonymOrFSN
- synonymOrFSN = "*heart"
- synonym = "*heart" OR FSN = "*heart"
- D.term = "*heart", (D.type = "synonym" OR D.type = "fullySpecifiedName")
- synonymOrFSN = "*heart" LANGUAGE X
- synonym = "*heart" OR FSN = "*heart" LANGUAGE X
- D.term = "*heart", (D.type = "synonym" OR D.type = "fullySpecifiedName"), D.acceptability = * LANGUAGE X
- textDefinition
- textDefinition = "*heart"
- D.term = "*heart", D.type = "definition"
- D.term = "*heart", D.typeId = 900000000000550004 |definition|
- textDefinition = "*heart" LANGUAGE X
- D.term = "*heart", D.type = "definition", D.acceptability = * LANGUAGE X
- D.term = "*heart", D.typeId = 900000000000550004 |definition|, D.acceptabilityId = * LANGUAGE X
- Unacceptable Terms
- (D.term = "*heart") MINUS (D.term = "*heart", D.acceptability = * LANGUAGE X)
|
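The syntactic sugar listed above is a mechanical rewrite into the long-hand D.term / D.type form. A minimal sketch of that expansion follows, producing the string forms shown in the examples. The function name and lookup table are hypothetical, and only the D.type variants (not the D.typeId variants) are generated.

```python
from typing import Optional

# Long-hand type filter for each sugar keyword, as written in the examples above.
SUGAR_TYPES = {
    "FSN": 'D.type = "FSN"',
    "synonym": 'D.type = "synonym"',
    "synonymOrFSN": '(D.type = "synonym" OR D.type = "fullySpecifiedName")',
    "textDefinition": 'D.type = "definition"',
}


def expand_sugar(keyword: str, pattern: str, language: Optional[str] = None) -> str:
    """Expand e.g. FSN = "*heart" [LANGUAGE X] into its long-hand filter string."""
    parts = [f'D.term = "{pattern}"', SUGAR_TYPES[keyword]]
    if language is not None:
        # Any acceptability in the given language reference set.
        parts.append(f"D.acceptability = * LANGUAGE {language}")
    return ", ".join(parts)
```

For example, expand_sugar("FSN", "*heart", "X") yields D.term = "*heart", D.type = "FSN", D.acceptability = * LANGUAGE X.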
Language preferences using multiple language reference sets
- LRSs that use the same language tend to use 'Addition' - i.e. child LRS only includes additional acceptable terms, but can override the preferred term
- E.g. Regional LRS that adds local dialect to a National LRS
- E.g. Specialty-specific LRS
- E.g. Irish LRS that adds local preferences to the en-GB LRS
- LRSs that define a translation to a different language tend to use 'Replacement' - i.e. child LRS replaces set of acceptable and preferred terms for any associated concept
|
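The difference between 'Addition' and 'Replacement' can be sketched as two ways of combining a parent and a child LRS. The data layout (concept id mapped to a preferred term and a set of acceptable terms) and the function name are hypothetical. One behaviour is an assumption not stated in the notes: when an 'Addition' child overrides the preferred term, the parent's preferred term is kept as acceptable.

```python
from typing import Dict


def combine_lrs(parent: Dict[str, dict], child: Dict[str, dict], mode: str) -> Dict[str, dict]:
    """Combine two LRSs; each maps conceptId -> {"preferred": str or None, "acceptable": set}."""
    if mode not in ("addition", "replacement"):
        raise ValueError(f"unknown mode: {mode}")
    # Copy the parent so neither input is mutated.
    result = {
        cid: {"preferred": e["preferred"], "acceptable": set(e["acceptable"])}
        for cid, e in parent.items()
    }
    for cid, entry in child.items():
        if mode == "replacement" or cid not in result:
            # The child entry fully replaces (or introduces) the terms for this concept.
            result[cid] = {"preferred": entry["preferred"], "acceptable": set(entry["acceptable"])}
            continue
        merged = result[cid]
        merged["acceptable"] |= entry["acceptable"]
        if entry["preferred"] is not None and entry["preferred"] != merged["preferred"]:
            if merged["preferred"] is not None:
                # Assumption: the overridden parent preferred term stays acceptable.
                merged["acceptable"].add(merged["preferred"])
            merged["preferred"] = entry["preferred"]
        merged["acceptable"].discard(merged["preferred"])
    return result
```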
Confirm next meeting date/time | | Next meeting is scheduled for Wednesday 11th March 2020 at 20:00 UTC. |
2 Comments
Ed Cheetham
A few thoughts on item 4 – 'search'...
I’m sure it’s achievable (and desirable) to introduce a ‘minimal’ search filter to the ECL whilst deferring the ability to specify various parameters. However in order to maximise the chance of a search filter producing the ‘intended’ results wherever it is used, it looks as though a number of (processing) assumptions need to be declared rather than left implicit for each implementation. Eventually many of these default ‘assumptions’ could themselves become configurable.
To start, and in a slightly different spirit, looking at the search string candidate syntax itself:
I presume the existing stringValue = 1*(anyNonEscapedChar / escapedChar) would be used to specify a search string.
The current definition of escapedChar would allow quotation marks to be used in a search and thus distinguish between "ghosts" and ghosts:
However escapedChar would also need to be extended to include an escape for the asterisk in order to test for terms containing *, e.g.:
Regarding a default ‘description substrate’: this seems to include some inevitable assumptions. At least one language reference set (LRS) is ‘essential’ [1], [2] in any SNOMED CT build, and thus in any default ECL substrate (“the set of active components from the snapshot release (in distribution normal form) of the SNOMED CT versioned edition currently loaded into the given tool” [3]). However if the ability to specify the actual LRS (or set of LRSs) is deferred, then only specifying the languageCode could lead to inconsistent results in implementations that use the same Descriptions table but filter it by different LRSs. Clearly this is better than nothing, but the potential for variation needs to be articulated.
Tokenization.
I know we touched on this on the call - I feel it’s significant. I am unaware of any truly ‘standard’ tokenizer for indexing descriptions – if somebody knows of a suitable one then this specification should cite it as the assumed position. Anne and Fadi mention a set of candidate separating characters in the Search and data entry guide [4], specifically item 4 here [5], and I suspect that we all use something like this.
I know I’m personally guilty of tokenising differently on different occasions. My RF1-based implementation uses " ""'.,:;!?()[]{}<>^-+/#~*&%$£", more recently I see that I used ".,/=-+><^?@~{}[]'\"*():;#" in an unrelated project, and my basic RF2 snapshot build uses Sqlite’s FTS3/4 simple tokenizer [6]. All have behaved ‘well enough’ but I suspect they are sufficiently different for the occasional variation in search/matching behaviour to arise.
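To make the potential for variation concrete, here is a small sketch that tokenizes the same string with the two separator sets quoted above. The doubled quote in the first set is read as a single escaped quotation mark, and whitespace is always treated as a separator; both are assumptions, as are the function name, the lower-casing and the test string.

```python
import re

# Separator sets as quoted in the comment above (reading "" as one quote character).
RF1_SEPARATORS = " \"'.,:;!?()[]{}<>^-+/#~*&%$£"
OTHER_SEPARATORS = ".,/=-+><^?@~{}[]'\"*():;#"


def tokenize(text: str, separators: str) -> list:
    """Split text on any run of separator characters or whitespace, lower-casing tokens."""
    pattern = "[" + re.escape(separators) + r"\s]+"
    return [token.lower() for token in re.split(pattern, text) if token]
```

For the illustrative string "pH=7.4 & rising", the first set keeps "ph=7" as one token and drops the ampersand, while the second splits on "=" and keeps "&" as a token, so a word prefix search for "ph" followed by "7" would behave differently on the two indexes.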
Building on Daniel’s comment on the call that many non-English languages that use compound words perform poorly with a ‘word prefix’ search convention… There is a glimpse of this also in English terms with their idiosyncratic use of hyphens or compounding, and consequently some implementations treat hyphens as special case token separators, concatenating rather than splitting tokens to try and ‘normalise’ these variations, thus indexing these two concepts to appear ‘the same’ for search purposes:
Likewise, some implementations treat periods/full stops differently if they seem to be separating the letters of an acronym, and for the following examples would give both a key of LE rather than the second one L and E:
I'm not saying that either variant would or should be specified in a 'default' tokenizer, just that some implementations may already use them and may need to be aware that they are not 'assumed' in the creation of the search string.
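The two special cases described above (concatenating across hyphens, and collapsing dotted acronyms) might look like this in an indexer. It is only a sketch of the kind of behaviour some implementations may have: the function name, the flags, the acronym pattern and the example strings are all hypothetical.

```python
import re

# Two or more single letters each followed by a full stop, e.g. "L.E."
DOTTED_ACRONYM = re.compile(r"\b(?:[A-Za-z]\.){2,}")


def index_keys(text: str, join_hyphens: bool = True, collapse_acronyms: bool = True) -> list:
    """Return the search keys for text under the chosen special-case handling."""
    if collapse_acronyms:
        # "L.E." becomes "LE" instead of the two tokens "L" and "E".
        text = DOTTED_ACRONYM.sub(lambda m: m.group(0).replace(".", ""), text)
    # Either concatenate across hyphens or treat them as ordinary separators.
    text = text.replace("-", "" if join_hyphens else " ")
    # Remaining punctuation acts as a separator.
    return re.sub(r"[^\w\s]", " ", text).lower().split()
```

With both options on, "L.E. cell" and "LE cell" index to the same keys, as do "non-specific" and "nonspecific". With them off, the first of each pair yields extra or shorter tokens, which is exactly the kind of difference a search string author cannot assume either way.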
Normalisation.
Here I confess I get simultaneously out of my depth, English-centric and ‘my search use case’-centric. For me, search normalisation has predominantly been about making the token and candidate text as much like the basic ASCII character set (and thus my keyboard) as possible, to minimise false negatives in search (esp. diacritic removal & case-independence).
However I recognise that non-English languages will have different requirements, and indeed there are times when I want to retain diacritic or case sensitivity.
What I feel I'm bumping up against is a tension between my use case and published language-sensitive normalisation options to present as a 'default'. W3C’s string matching WG Note [7] and its referenced Unicode standards come close, but I’m scratching my head to understand if any of the ‘appropriate normalization step’ choices at [8] or referenced from the Unicode collation document [9] correspond to my (I hope not unreasonable) desire for a search based on ‘match:“SJOGR SYN” language=en’ to include 762303003 | Pediatric onset Sjögren syndrome (disorder) | in its return set, even though all ‘en’ terms associated with this concept include the accented Sjögren form. Maybe it is unreasonable and I (and other 'en' users) will instead have to specify all diacritic-sensitive search variants - and therefore need to know them in advance!
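One common way to get the behaviour described above is to fold both the search string and the candidate term before matching: decompose to NFD, drop combining marks, and case-fold. The sketch below illustrates that approach only; it is not a normalisation step prescribed by [7], [8] or [9], and the function names and whitespace tokenization are assumptions.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip combining marks (after NFD decomposition) and case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def folded_word_prefix_match(search: str, term: str) -> bool:
    """Word prefix any order, applied to the folded forms of both strings."""
    words = fold(term).split()
    return all(
        any(word.startswith(token) for word in words)
        for token in fold(search).split()
    )
```

Under this folding, "SJOGR SYN" matches "Pediatric onset Sjögren syndrome (disorder)". Letters with no canonical decomposition, such as ø, pass through unchanged, so this is not a general mapping to ASCII, and it removes the option of a diacritic-sensitive search unless folding is made configurable.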
Clearly there are plenty of other things to consider regarding ‘search’, but the points above seemed like a reasonable set to kick off a discussion.
Kind regards
Ed
[1] https://confluence.ihtsdotools.org/display/DOCEXTPG/4.3.2.4.1+Language+Reference+Set
[2] https://confluence.ihtsdotools.org/display/DOCTSG/4.2+Essential+Reference+Sets
[3] https://confluence.ihtsdotools.org/display/DOCECL/3.2+Expression+Constraint+and+Query+Requirements
[4] https://confluence.ihtsdotools.org/display/DOCSEARCH/Search+and+Data+Entry+Guide
[5] https://confluence.ihtsdotools.org/display/DOCSEARCH/Appendix+B+-+Future+Additions+to+this+Guide
[6] https://www.sqlite.org/fts3.html#tokenizer
[7] http://www.w3.org/TR/charmod-norm/
[8] http://www.w3.org/TR/charmod-norm/#performNorm
[9] http://www.unicode.org/reports/tr10/#Step_1
Linda Bird
Wow! A lot of really important points here Ed. Let's work through these at the next few meetings.
Kind regards,
Linda.