How should the cardinality of a reverse attribute be interpreted? Does the cardinality apply to the source or to the destination of the relationship? Just to clarify!

8 Comments

  1. Interesting question, Daniel!

    If we consider an example:

       < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *

    Then I assume your question is, should this be read as:

    1. Descendants of Substance which are the active ingredient of exactly 3 products, OR
    2. Descendants of Substance which are the active ingredient of a product containing exactly 3 active ingredients

    So if executed over a substrate of:

    • X has active ingredient S1
    • X has active ingredient S2
    • X has active ingredient S3
    • Y has active ingredient S1
    • Z has active ingredient S1

    Approach 1 would result in the set {S1} and approach 2 would result in the set {S1, S2, S3}.

    To me it would make sense to apply the 'R' first, before the cardinality ... which means you would be applying the cardinality of [3..3] to the substrate:

    • S1 is active ingredient of X
    • S2 is active ingredient of X
    • S3 is active ingredient of X
    • S1 is active ingredient of Y
    • S1 is active ingredient of Z

    In which case, the answer would be {S1} - interpretation 1 - and in terms of your original question, the cardinality would apply to the source of the relationship (for each selected destination).
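The two readings can be checked against the example substrate with a small sketch. This is only an illustration in plain Python, not how any terminology server evaluates ECL; the function names and the `(product, substance)` tuple layout are assumptions made for this example.

```python
from collections import defaultdict

# Toy substrate from the example above: each pair reads
# "product has active ingredient substance".
RELATIONSHIPS = [
    ("X", "S1"), ("X", "S2"), ("X", "S3"),
    ("Y", "S1"),
    ("Z", "S1"),
]


def interpretation_1(rels, n):
    """Substances that are the active ingredient of exactly n products."""
    products_of = defaultdict(set)
    for product, substance in rels:
        products_of[substance].add(product)
    return {s for s, products in products_of.items() if len(products) == n}


def interpretation_2(rels, n):
    """Substances found in any product that has exactly n active ingredients."""
    ingredients_of = defaultdict(set)
    for product, substance in rels:
        ingredients_of[product].add(substance)
    return {
        substance
        for ingredients in ingredients_of.values()
        if len(ingredients) == n
        for substance in ingredients
    }


print(sorted(interpretation_1(RELATIONSHIPS, 3)))  # ['S1']
print(sorted(interpretation_2(RELATIONSHIPS, 3)))  # ['S1', 'S2', 'S3']
```

Interpretation 1 groups by the destination (the substance) and counts sources; interpretation 2 groups by the source (the product) and counts destinations before collecting the substances.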

    However, I would be interested to know if others agree or disagree with this. This certainly looks like an area in which we should improve the documentation.

    P.S. Interestingly, based on a very quick analysis, I think that SnoQuery may use interpretation 2 and Ontoserver may use interpretation 1 (as did I).

    1. For what it's worth, Snow Owl/the IHTSDO terminology server are using interpretation 1.

      1. Thanks Brandon! Perfect!

  2. This is a SQL interpretation of Approach 1 applied to < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = *

    SELECT DISTINCT relationships.destinationId FROM relationships
    WHERE
    relationships.active = 1 AND
    relationships.destinationId IN (SELECT SubtypeId FROM transitiveclosure WHERE SupertypeId = 105590001 AND PathLength > 0) AND # substance
    relationships.typeId = 127489000 # active ingredient
    GROUP BY relationships.destinationId
    HAVING count(relationships.Id) = 3

    Can we get agreement on this interpretation?
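To see what the query above returns on the small substrate from the first comment, it can be run against an in-memory SQLite database. This is a rough sketch only: the table layouts are guessed from the column names used in the query, the product and substance ids (901-903, 801-803) are made-up placeholders, and the `#` comments are written as `--` because SQLite does not accept `#`.

```python
import sqlite3

SUBSTANCE = 105590001              # |Substance|
HAS_ACTIVE_INGREDIENT = 127489000  # |Has active ingredient|

# Hypothetical placeholder ids for products X, Y, Z and substances S1, S2, S3.
X, Y, Z = 901, 902, 903
S1, S2, S3 = 801, 802, 803

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        id INTEGER PRIMARY KEY, sourceId INTEGER, destinationId INTEGER,
        typeId INTEGER, active INTEGER);
    CREATE TABLE transitiveclosure (
        SubtypeId INTEGER, SupertypeId INTEGER, PathLength INTEGER);
""")
conn.executemany(
    "INSERT INTO relationships (sourceId, destinationId, typeId, active) "
    "VALUES (?, ?, ?, 1)",
    [(X, S1, HAS_ACTIVE_INGREDIENT), (X, S2, HAS_ACTIVE_INGREDIENT),
     (X, S3, HAS_ACTIVE_INGREDIENT), (Y, S1, HAS_ACTIVE_INGREDIENT),
     (Z, S1, HAS_ACTIVE_INGREDIENT)],
)
# Assume every toy substance is a direct subtype of |Substance|.
conn.executemany(
    "INSERT INTO transitiveclosure VALUES (?, ?, 1)",
    [(s, SUBSTANCE) for s in (S1, S2, S3)],
)

APPROACH_1 = """
    SELECT DISTINCT relationships.destinationId FROM relationships
    WHERE relationships.active = 1
      AND relationships.destinationId IN (
          SELECT SubtypeId FROM transitiveclosure
          WHERE SupertypeId = ? AND PathLength > 0)  -- substance
      AND relationships.typeId = ?                   -- active ingredient
    GROUP BY relationships.destinationId
    HAVING count(relationships.id) = 3
"""
rows = conn.execute(APPROACH_1, (SUBSTANCE, HAS_ACTIVE_INGREDIENT)).fetchall()
approach_1_result = {row[0] for row in rows}
print(approach_1_result)  # {801}, i.e. only S1
```

Because the query groups by `destinationId`, the count is taken per substance over the products that point to it, which is the "cardinality applies to the source" reading.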

  3. Personally, I think interpretation #2 might actually be the more correct, though perhaps less obviously useful, interpretation in the specific domain of drug ingredients. Given that we don't actually have inverse attributes, #2 arguably fits better with the original idea of the R operator and notation as I understood it, namely needing to be able to ask:

    For the set of all * that satisfies:

    *:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|

    ...what is the non-redundant set of values that we encounter in the 105590001|Substance| slot?

    I.e., for the set of all drugs with exactly three ingredients, what is the set of substances found?

    Of course, this then raises the question of what notation would be equivalent to interpretation #1, since that is itself also a perfectly valid question to ask!

  4. So, do we need both versions?

       < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = * - Approach #1

       < 105590001 |Substance|: R [3..3] 127489000 |Has active ingredient| = * -  Approach #2

    That is not very clear, though. And what would the corresponding dot notation be?

  5. Approach #2 could be written as:

    <105590001|Substance|: R 127489000 |Has active ingredient|= (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|)

    Then approach #1 would be the preferred interpretation, wouldn't it?

    Interestingly, in SnoQuery, this and the original query give the same results (at least the same number and, by manual inspection, the same concepts).
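For comparison with the SQL given earlier for approach #1, the nested expression for approach #2 could be sketched in SQL along the following lines. This is an assumption about how one might translate it, not a query taken from any of the tools mentioned; the table layouts are guessed from the earlier query and the ids 901-903 and 801-803 are made-up placeholders, run here in an in-memory SQLite database.

```python
import sqlite3

SUBSTANCE = 105590001              # |Substance|
HAS_ACTIVE_INGREDIENT = 127489000  # |Has active ingredient|

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relationships (
        id INTEGER PRIMARY KEY, sourceId INTEGER, destinationId INTEGER,
        typeId INTEGER, active INTEGER);
    CREATE TABLE transitiveclosure (
        SubtypeId INTEGER, SupertypeId INTEGER, PathLength INTEGER);
""")
# Hypothetical ids: products X=901, Y=902, Z=903; substances S1=801, S2=802, S3=803.
conn.executemany(
    "INSERT INTO relationships (sourceId, destinationId, typeId, active) "
    "VALUES (?, ?, ?, 1)",
    [(901, 801, HAS_ACTIVE_INGREDIENT), (901, 802, HAS_ACTIVE_INGREDIENT),
     (901, 803, HAS_ACTIVE_INGREDIENT), (902, 801, HAS_ACTIVE_INGREDIENT),
     (903, 801, HAS_ACTIVE_INGREDIENT)],
)
conn.executemany(
    "INSERT INTO transitiveclosure VALUES (?, ?, 1)",
    [(s, SUBSTANCE) for s in (801, 802, 803)],
)

APPROACH_2 = """
    SELECT DISTINCT r.destinationId FROM relationships r
    WHERE r.active = 1
      AND r.typeId = :attr
      AND r.destinationId IN (
          SELECT SubtypeId FROM transitiveclosure
          WHERE SupertypeId = :substance AND PathLength > 0)
      AND r.sourceId IN (
          -- inner expression: concepts with exactly 3 substance ingredients
          SELECT p.sourceId FROM relationships p
          WHERE p.active = 1
            AND p.typeId = :attr
            AND p.destinationId IN (
                SELECT SubtypeId FROM transitiveclosure
                WHERE SupertypeId = :substance AND PathLength > 0)
          GROUP BY p.sourceId
          HAVING count(p.id) = 3)
"""
params = {"attr": HAS_ACTIVE_INGREDIENT, "substance": SUBSTANCE}
approach_2_result = {row[0] for row in conn.execute(APPROACH_2, params)}
print(sorted(approach_2_result))  # [801, 802, 803], i.e. S1, S2 and S3
```

Here the count is grouped by `sourceId` in the inner query, so the cardinality constrains the product, and the outer query then collects every substance those products point to.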

  6. I agree that both approaches could be useful, and as you suggest Daniel:

    • Approach #1 could be written as: 
      • < 105590001 |Substance|: [3..3] R 127489000 |Has active ingredient| = * 
    • Approach #2 could be written as:
      • <105590001|Substance|: R 127489000 |Has active ingredient|= (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|)

    It's interesting to consider that approach 2 can also be written using dot notation as:

      • (*:[3..3] 127489000 |Has active ingredient|= <105590001|Substance|).127489000 |Has active ingredient|

    However, I can't think of an even mildly intuitive way of representing approach 1 using dot notation. Do we need one?

    Kind regards,
    Linda.