Opened 4 years ago

Closed 2 years ago

#78 closed enhancement (fixed)

shared GPC RxNORM/NDFRT medications ontology

Reported by: campbell Owned by: mish
Priority: major Milestone: data-domains3
Component: data-stds Keywords: onc-stds
Cc: nateapathy, Aaron.Meyer@…, dlyoshih Blocked By:
Blocking: #181

Description

Obtain RXNorm from NLM and NDRFT from VA
Develop, validate and test schema Negotiate and install metadata for RxNORM/NDFRT
Distribute schema enhancement and update GPC i2b2 ontology database who: UNMC;KU when: 4/1/2014-5/15/2014

Attachments (2)

summarize_terms.sql (493 bytes) - added by dconnolly 2 years ago.
gpc-terms-summary.csv (3.3 KB) - added by dconnolly 2 years ago.

Download all attachments as: .zip

Change History (24)

comment:1 Changed 4 years ago by dconnolly

  • Milestone set to data-domains2

comment:2 Changed 4 years ago by dconnolly

  • Priority changed from major to minor

low priority until Apr 1 start date.

comment:3 Changed 4 years ago by ngraham

...In progress towards shared GPC RxNorm/NDFRT medication ontology...

Here's an Oracle SQL query that builds an i2b2 compatible VA Class hierarchy from just RxNorm. It does not yet include the Semantic clinical drug and form (SCDF)/Semantic branded drug and form (SBDF).

In other words, Disulfiram Oral Tablet which is an SCDF isn't included in the ontology created by the query below. However, the VA hierarchy above it is created \i2b2\Drug Products by VA Class\[AD000] ANTIDOTES,DETERRENTS AND POISON CONTROL\[AD100] ALCOHOL DETERRENTS\.

with 
va_classes as (
    select 
    con1.str parent_name, con1.rxcui parent_rxcui, con1.rxaui parent_rxaui, 
    rel.rel, 
    con2.str child_name, con2.rxcui child_rxcui, con2.rxaui child_rxaui
    from rxnorm.rxnrel rel
    join rxnorm.rxnconso con1 on con1.rxaui=rel.rxaui1
    join rxnorm.rxnconso con2 on con2.rxaui=rel.rxaui2
    join rxnorm.rxnsat sat2 on sat2.code = con2.scui
    where rel.sab = 'NDFRT' and rel.rel = 'CHD' and sat2.atv = 'VA Class'
    
    union all
    
    -- Make "Drug Products by VA Class" (RxAUI = 3149154) the top of the hierarchy
    select 
      null parent_name, null parent_rxcui, null parent_rxaui,
      'CHD' rel,
      con.str child_name, con.rxcui child_rxcui, con.rxaui child_rxaui
    from rxnorm.rxnconso con
    where con.rxaui = 3149154
    ),
paths as (
    select level as lvl, 
    -- Use the name as the path rather than rxaui: SYS_CONNECT_BY_PATH(child_name, '\') || '\' as concept_path
    '\i2b2' || SYS_CONNECT_BY_PATH(child_rxaui, '\') || '\' as concept_path, 
    vac.*
    from va_classes vac
    start with parent_rxaui is null
    connect by prior child_rxaui = parent_rxaui
    )  
select lvl c_hlevel, concept_path c_fullname, child_name as c_name, 'N' c_synonym_cd, 'FA' c_visualattributes,
    null c_totalnum, 'RXCUI:' || child_rxcui c_basecode, null c_metadataxml, 'concept_cd' c_facttablecolumn,
    'concept_dimension' c_tablename, 'concept_path' c_columnname, 'T' c_columndatatype, 'LIKE' c_operator,
    concept_path c_dimcode, null c_comment, null c_tooltip, '@' m_applied_path, sysdate as update_date, 
    sysdate as download_date, sysdate as import_date, 'RxNORM' as sourcesystem_cd, null valuetype_cd,
    null m_exclusion_cd, null c_path, null c_symbol
from paths;

comment:4 Changed 4 years ago by ngraham

The following query builds on the one from comment:3 by adding the SCDF/SBDF terms from RxNorm under the VA classes. I've added the result to babel (see "GPC: Drug Products by VA Class").

This new hierarchy should mostly match the KUMC hierarchy (without the Epic med id leaves). I've gone about building the hierarchy for GPC directly from RxNorm (see query below). For KUMC ETL code, I originally wrote it sorta backwards - I started with the Epic med ID and worked my way up to VA class.

Note that none of the SCDF/SBDF items linked directly as a child to the VA classes. So, I used the RxNorm relationships table to link them indirectly using the 'isa' (read "is a") relationships. For example, "DISULFIRAM 500MG TAB [VA Product]" (which is from NDFRT and links to a VA class) is a "Disulfiram Oral Tablet" (which is the RxNorm SCDF I display under the VA class).

with 
va_classes as (
    select 
    con1.str parent_name, con1.rxcui parent_rxcui, con1.rxaui parent_rxaui, 
    rel.rel, 
    con2.str child_name, con2.rxcui child_rxcui, con2.rxaui child_rxaui
    from rxnorm.rxnrel rel
    join rxnorm.rxnconso con1 on con1.rxaui=rel.rxaui1
    join rxnorm.rxnconso con2 on con2.rxaui=rel.rxaui2
    join rxnorm.rxnsat sat2 on sat2.code = con2.scui
    where rel.sab = 'NDFRT' and rel.rel = 'CHD' and sat2.atv = 'VA Class'
    
    union all
    
    -- Make "Drug Products by VA Class" (RxAUI = 3149154) the top of the hierarchy
    select 
      null parent_name, null parent_rxcui, null parent_rxaui,
      'CHD' rel,
      con.str child_name, con.rxcui child_rxcui, con.rxaui child_rxaui
    from rxnorm.rxnconso con
    where con.rxaui = 3149154
    ),
scdf_sbdf as (
  select
    con1.str sdf_name, con1.rxcui sdf_rxcui, con1.rxaui sdf_rxaui, con1.sab sdf_sab,
    con2.str isa_name, con2.rxcui isa_rxcui, con2.rxaui isa_rxaui, con2.sab isa_sab
    /* None of the concepts marked as SCDF are direct children of any of the VA classes.  So, utilze the 'isa' 
    relationship.
    */
  from rxnorm.rxnrel rel 
  join rxnorm.rxnconso con1 on con1.rxcui=rel.rxcui1
  join rxnorm.rxnconso con2 on con2.rxcui=rel.rxcui2
  where rel.rela = 'isa' and (con1.tty = 'SCDF' or con1.tty = 'SBDF')
  ),
sdf_children as (
  select distinct --Lots of things match the 'isa' criteria - just need the distinct SCDF/SBDF
      va.child_name va_name, va.child_rxcui va_rxcui, va.child_rxaui va_rxaui, 
      rel.rel, 
      sdf.sdf_name, sdf.sdf_rxcui, sdf.sdf_rxaui--, sdf.sdf_sab
  from rxnorm.rxnrel rel
  join va_classes va on va.child_rxaui=rel.rxaui1
  join scdf_sbdf sdf on sdf.isa_rxaui = rel.rxaui2 and rel.rel = 'CHD'
  -- where sdf.sdf_sab = 'RXNORM' --Note, all SCDF/SBDF that join on rxaui are from RXNORM
  ),
paths as (
    select level as lvl, 
    -- Use the name as the path rather than rxaui: SYS_CONNECT_BY_PATH(child_name, '\') || '\' as concept_path
    '\i2b2' || SYS_CONNECT_BY_PATH(child_rxaui, '\') || '\' as concept_path, 
    va_sdf.*
    from (
      select * from va_classes va
      union all
      select * from sdf_children sdf
      ) va_sdf
    start with parent_rxaui is null
    connect by prior child_rxaui = parent_rxaui
    )  
select lvl c_hlevel, concept_path c_fullname, child_name as c_name, 'N' c_synonym_cd, 'FA' c_visualattributes,
    null c_totalnum, 'RXCUI:' || child_rxcui c_basecode, null c_metadataxml, 'concept_cd' c_facttablecolumn,
    'concept_dimension' c_tablename, 'concept_path' c_columnname, 'T' c_columndatatype, 'LIKE' c_operator,
    concept_path c_dimcode, null c_comment, null c_tooltip, '@' m_applied_path, sysdate as update_date, 
    sysdate as download_date, sysdate as import_date, 'RxNORM' as sourcesystem_cd, null valuetype_cd,
    null m_exclusion_cd, null c_path, null c_symbol
from paths
;

comment:5 Changed 4 years ago by ngraham

  • Priority changed from minor to major
  • Status changed from new to accepted

Now that we've passed the 4/1 start date, this is not minor priority. This is in progress.

comment:6 Changed 4 years ago by ngraham

  • Owner changed from ngraham to mhoag
  • Status changed from accepted to assigned

I've been working on rewriting the medication mapping code to be cleaner and more portable little bits at a time over the past couple of weeks. I'm at a point where I'm fairly happy with the results. Matt has agreed to review - see med_ontology_g78 branch ending with changeset [ed36d5e947c2]. See also code review notes.

I changed the paths (c_fullname) to be constructed of the RxAUIs and Clarity medication IDs (rather than a combination of names/RxCUIs). This is much cleaner than all the string manipulation I did before to keep paths unique and short enough to fit in the column. So, to preserve saved queries at KU I plan to load the old medication ontology with all items hidden. That way future queries will use the new paths but old ones will still work.

For KU data, it looks like the medication concepts that get mapped somewhere in the hierarchy make up over 99% of medication facts (orders, administrations, dispenses, etc). I then put the top 500 medications (by fact count) in a folder called Other Medication Concepts. I've gone to some work to manually curate some of the medications - see load_med_map_manual_curation paver task.

comment:7 Changed 4 years ago by mhoag

  • Owner changed from mhoag to ngraham

Checklist

Is the use-case clear?

Mostly, but there is little to no documentation about how to construct the per-site curated data for med_map_manual_curation.csv.

Is the use-case reasonably well captured in automated tests?

Yes

Is it clear how to run the tests? which end of the code is up?

e.g. with a README

README could use some notes/pointers on how to create med_map_manual_curation.csv.

Is the code something you would be happy to maintain? (cf. WritingQualityCode?)

Is the code DRY?

See by-file comments

Could we expect our peers in the informatics community to use this code?

With a little more documentation we could expect our peers to use this code and generate their own manual med mapping (though it may be painful)

Is the code secure? Is the flow of authority clear?

Most the addition of SQL code, python code adheres to our authority model.

Are external design constraints clear?

For example, is it clear which names were made up and which come from elsewhere?
Are sources cited?

Naming is explicit and
Sources are cited (RxNorm docs and paper on how to mashup RxNorm and NDF-RT Hierarchy)

Is the copyright notice clear and correct?

If we're using anyone else's work, is it clearly licensed and acknowledged?

Copyright is clear on all relevant files

File by File Review

epic_etl.py and heron_build.py

I guess I don't really see the need for the task load_med_map_curated_data since there is no need to enforce order dependencies. The most concise representation would be to put all three of the med mapping curated data tasks in the needs block of load_source_concepts.

epic_med_mapping.sql

45, 73: Should we give a hint here about what we do now with multiple GCNs? Seems we allow for multiple GCNs to map to a single med_id

46, 74: What is a "suppressed" column (which one is it)?

43-93: This code seems awfully similar… is it a candidate for DRY? The more I look at it the more I think maybe not.

124-131: I like this strategy! But KUH|MEDICATION_ID is a magic string that is referenced outside this file (e.g. it could be referenced in epic_concepts_load.sql). If you are willing this would be a good time to start a string constants sql file. Then just cross join your med_constants view with the string constants one, and it will work w/o any other code changes.

158: A comment as to why 3257489 is excluded would be nice.

380-398: It seems like there are other ways to organize the "other" bucket rather than just treating it as a flat 500 items. Maybe, organize the "others" further in alphabetical buckets or by fact count buckets (less desirable because they could jump from bucket to bucket from release to release). I'm sure we could come up with numerous desirable ways to organize them.

424: Was this change (removal of select *) motivated by a SQL error? i.e. was there a change to rxnorm_terms?

487: Add a comment stating that the initial med_map_manual_curation.csv can be generated from the query below. Also include a description of the columns (how they are used, what they reference), and their constraints (part of the primary key, can be null, not used programmatically, etc.) So if other institutions wanted to manually map meds they could follow their nose to this starting point.

Clarity_table_extract_script.txt

88: zc_pharm_class - what is this used for?
NG: Was supposed to be for mapping meds with pharm class synonyms. Didn't end up going that way.

med_map_manual_curation.csv

Looking at this table, it is not immediately clear what kind of information is stored in each column, and which columns are used in the code and which are used as reference. That is, the curated data is too complicated to be self explanatory and needs documentation somewhere explain its use and constraints.

Further, it seems that this curated table was initially generated via some script (over 7000 rows, with fact and patient counts for each medication…), but it is not immediately obvious where/how it was generated. And since this table is site specific, that is an important detail.

As I have discovered (through discussion with Nathan), the query to initially generate this curated data can be found in epic_med_mapping.sql 490-532, but it does not describe how the columns are used, their constraints, or what to do with the curated table after it has been generated. I would add comments to that effect here, and maybe even a README level pointer.

Additionally I would reorganize the columns to aid in readability.

  • 1st and 2nd columns should be CLARITY_MEDICATION_ID, CLARITY_NAME, since those are the most interesting columns to someone editing the table (maybe also sort on one of the two columns so that there is a visual indication that a med maps to two locations in the hierarchy?)
  • 3rd and 4th columns should be VA_RXAUI, VA_NAME, since that is the top level mapping. Also, I believe that these columns should ALWAYS be populated (constraint) since they only add in readability.
  • 5th and 6th columns should be SDF_RXAUI, SDF_NAME, respectively (order is switched currently) and optionally blank if the med is mapped to the VA instead of the SDF.
  • 7th and 8th columns should be FACTS, PATIENTS counts, since they are just for reference (not used as a part of the ETL, just a column to sort on when editing the mapping).

med_map_like_name.csv

removing the extra column seemed like a good idea. I was about to suggest book-ending the like names with '%' in the SQL so that the curated data was agnostic to SQL, but it appears (for good reason?) that some of the terms have '%' embedded in them (e.g. %ALBUTEROL%ACTUATION%)

heron_load/curated_data/med_map_exclude_modifiers.csv

I wonder about using the full concept_cd... Maybe a truncated version would be better with just med id (e.g. 28705 instead of KUH|MEDICATION_ID:28705). In my mind the KUH|MEDICATION_ID is sort of implied by curated data file name. Also was the "Name" column going to be used programmatically at any point or is that just a convenience when looking at the file on its own?

comment:8 follow-up: Changed 3 years ago by dconnolly

I'm inclined to close this once the .csv file is shared; addressing points from the description:

Obtain RXNorm from NLM and NDRFT from VA

As noted in comment:3, we downloaded the RXNorm files from NLM. It's updated weekly; the copy we grabbed was RxNorm_full_06032013.zip (md5sum b9079f7963bf473867b1b74c410ad9f1).

Develop, validate and test schema Negotiate and install metadata for RxNORM/NDFRT

It's installed on babel as GPC: Drug Products by VA Class.

The code is yet to be integrated into the default branch; see #152.

Distribute schema enhancement and update GPC i2b2 ontology database

Nathan, please upload the .csv file to CDT. (cf #76)

comment:9 in reply to: ↑ 8 Changed 3 years ago by ngraham

  • Resolution set to fixed
  • Status changed from assigned to closed

Replying to dconnolly:

I'm inclined to close this once the .csv file is shared

I've uploaded rxnorm.csv to the CDT.

audit notes

select count(*)
from gpc_terms
where c_fullname like '%\Medications%'
;
-- 5805

select sum(h_int(replace(c_fullname, '\i2b2', '')))
from gpc_terms
where c_fullname like '%\Medications%'
;
-- -21923471739

update gpc_terms
set update_date = timestamp '2015-04-21 12:40:16' --date of philip's msg to gpc-dev
where c_fullname like '%\Medications%'
;
Last edited 3 years ago by dconnolly (previous) (diff)

comment:10 Changed 3 years ago by bos

UTHSCSA has merged med_ontology_g78 into our default, but we temporarily lowered the 99% threshold of mapped meds. We were unable to generate med_map_manual_curation.csv due to a missing clarity table (zc_pharm_class) in our data source. we achieved 65% mapping using the KU csv file, and we have requested a refresh of source data from UTMed which will allow us to try the data curation and increase our mapping completeness. A related (UTHSCSA ticket) has details on loading RXNORM from NIH.

comment:11 Changed 3 years ago by dconnolly

  • Blocking 181 added

comment:12 follow-up: Changed 3 years ago by ngraham

  • Resolution fixed deleted
  • Status changed from closed to reopened

We've changed the way the fullnames are created for the medication ontology. See #152. So, let's load those on Babel. See also #181.

comment:13 in reply to: ↑ 12 Changed 3 years ago by ngraham

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to ngraham:

We've changed the way the fullnames are created for the medication ontology. See #152. So, let's load those on Babel.

Done. See ticket:181#comment:4.

comment:14 follow-up: Changed 3 years ago by dconnolly

  • Resolution fixed deleted
  • Status changed from closed to reopened

Nathan,

What do you make of the list Nate sent?

comment:15 in reply to: ↑ 14 Changed 3 years ago by ngraham

Replying to dconnolly:

What do you make of the list Nate sent?

Response Mon Apr 20 15:40:08 CDT 2015. I called Nate and he's going to send me Aaron's phone number so we can hopefully work through some of these questions in real-time.

All this reminded that the KUMC medication mapping code (epic_med_mapping.sql) uses terms marked as "suppressed" (see RxNorm documentation for definition). Perhaps we shouldn't be?

comment:16 Changed 3 years ago by ngraham

  • Cc nateapathy Aaron.Meyer@… added
  • Resolution set to fixed
  • Status changed from reopened to closed

I chatted with Aaron on the phone and I think we're on the same page with respect to the SCDF/SBDF being the lowest (most granular) level of the current GPC medications ontology.

Aaron noted that he's not concerned with the current design and he doesn't have any questions at this time. He said that ~30% of their order volume (CMH I suppose? I didn't think to clarify on the phone) is currently mapped to the GPC ontology. For the remaining ~70%, they have the RxCUIs (via NDC) but the code to map these CUI's to the parent SCDF/SBDF via the RxNorm relationship table(s) hasn't been completed yet.

I suggested that (epic_med_mapping.sql) might be a useful reference. It's likely not directly reusable of course - for one thing, at KU most of our medications map to RxNorm via GCN rather than NDC (in fact, the KU code doesn't even bother with NDC normalization).

I've created #272 to keep track of the question from comment:14 regarding obsolete terms.

comment:17 Changed 2 years ago by dconnolly

  • Milestone changed from data-domains2 to data-domains3
  • Resolution fixed deleted
  • Status changed from closed to reopened

I have added the Semantic Clinical Drug level concepts to the hierarchy which should make it easier to map to.

-- Reeder 19 Oct

Perhaps that's not a substantive change to the ontology, but I don't know enough to be sure.

comment:18 Changed 2 years ago by dconnolly

  • Blocked By 390 added

comment:19 Changed 2 years ago by ngraham

During the GPC-dev call on 2015.10.20, we discussed Phillip's changes and it looks like the resulting hierarchy isn't completely compatible.

For example, running a query for ACAMPROSATE CA 333MG TAB,EC [VA Product] on Babel using Phillip's changes wouldn't work at KU.

  • Ontology location on Babel: GPC: Medications #78 \ [AD000] ANTIDOTES,DETERRENTS AND POISON CONTROL \ [AD900] ANTIDOTES/DETERRENTS,OTHER \ Acetylcysteine Injectable Solution \ ACETYLCYSTEINE 200MG/ML (PF) SOLN,INJ [VA Product]
  • i2b2 path: \GPC\Medications\RXAUI:3149154\RXAUI:3257490\RXAUI:3257495\RXAUI:1582116\RXAUI:3095805\

At KU, we don't show ACETYLCYSTEINE 200MG/ML (PF) SOLN,INJ [VA Product] - the RxNorm based hierarchy stops one level higher (Acetylcysteine Injectable Solution) and under that we have local KU Hospital/Epic terminology (ACETYLCYSTEINE 200 MG/ML (20 %) IV SOLN, for example).

  • KU/HERON i2b2 path: \i2b2\Medications\RXAUI:3257490\RXAUI:3257495\RXAUI:1582116\MEDICATION_ID:xxxxx\

comment:20 Changed 2 years ago by dconnolly

  • Cc dlyoshih added
  • Owner changed from ngraham to mish
  • Status changed from reopened to assigned

WISC took the ball 20 Oct

Tom / Wisc: currently working on standard meds and will consider including the RXCUI level (good timing!)

Changed 2 years ago by dconnolly

Changed 2 years ago by dconnolly

comment:21 Changed 2 years ago by dconnolly

  • Blocked By 390 removed

comment:22 Changed 2 years ago by dconnolly

  • Resolution set to fixed
  • Status changed from assigned to closed

I gather WISC had positive experience with this update (experience report perhaps to follow) so let's call it done.

To be precise on what it is, since local variation in columns such as c_basecode and c_dimcode are out of scope of this decision, let's be clear on what's critically in scope, i.e. the paths:

c_hlevel terms path_hash
1 1 499612130
2 32 -6222070501
3 331 4198142981
4 3396 -120819493038
5 9639 -84534248949
6 6577 -130940847385

attachment:summarize_terms.sql was used to compute these numbers.
attachment:gpc-terms-summary.csv has the corresponding numbers for all of gpc_terms.

For reference:

Note: See TracTickets for help on using tickets.