GPC HackathonTwo Jan 22-23 2015 in San Antonio

Hosted by the Clinical Informatics Research Division, UT Health Science Center in San Antonio.

The gpc-dev group is invited, and the whole GPC community is welcome. GPC is not sponsoring travel.

We plan to facilitate remote participation in technical topics.

The agenda was discussed at the Dec 18 GPC Global Call, though suggestions are still welcome; send to gpc-dev@….


  • Alex Bokov (bokov@…), hosting contact
  • Dan Connolly (dconnolly@… 913-945-6741) and Russ Waitman, chairs

Registration, Attendance, Sign-in

Please fill out the attendance survey.

The host also needs us to sign in on site.

attendance survey admin access

Agenda: 8am-5pm Jan 22, 23

Note: Phase 2 planning runs in parallel as needed all day Thu and Fri.

Wed, Jan 21 (optional)

  • 17:00 Meet and Greet Social at 210ceviche (9502 Interstate 10 Frontage Road #101, San Antonio, TX 78230)

Thu, Jan 22

Dinner: 18:30 to 20:00 at Texas Roadhouse (16915 San Pedro Avenue, Hollywood Park, TX 78232)

Fri, Jan 23

For those who aren't flying out right away, let's find a place to hang out afterward.

Technical topics: Hacking, Discussion

We would like site team members to bring laptops and have connectivity so that we can complete specific joint development capabilities and set ourselves up for phase 2.

PCORNet DRN (CDM) query from GPC i2b2

Breast Cancer Survey Finder File

  • Hacking Session led by Connolly @ DevTeams#kumc (and UIOWA?)
  • resolve remaining issues with DataSecurity#bc-survey-sop
    • the query and data elements #204
    • "freezing" corresponding MRNs, date shifts #110
    • installing data builder or setting up a work-alike process (#202, #205)
    • submitting the resulting data set (#211)
    • QA, verification with tumor registrar, etc. as detailed in DataSecurity#bc-wp1
    • discussion of wp2, 3, 4
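The "freezing" item above can be pictured as follows. This is only a minimal sketch, not the procedure in DataSecurity#bc-survey-sop: it assumes that freezing means drawing one study ID and one date shift per MRN, once, and storing that mapping so the same values are reused (and can be reversed) later. The function names `freeze_mapping` and `deidentify`, the 365-day maximum shift, and the sample MRNs are all hypothetical.

```python
import secrets
from datetime import date, timedelta


def freeze_mapping(mrns, max_shift_days=365):
    """Assign each MRN a study ID and a date shift, once, so they can be reused."""
    rng = secrets.SystemRandom()      # OS-backed randomness, not a seeded PRNG
    unique = sorted(set(mrns))
    rng.shuffle(unique)               # study IDs should not follow MRN order
    return {
        mrn: {"study_id": i, "shift_days": rng.randint(1, max_shift_days)}
        for i, mrn in enumerate(unique, start=1)
    }


def deidentify(rows, mapping):
    """Replace (mrn, event_date) pairs with (study_id, shifted_date) pairs."""
    out = []
    for mrn, event_date in rows:
        entry = mapping[mrn]
        shifted_date = event_date - timedelta(days=entry["shift_days"])
        out.append((entry["study_id"], shifted_date))
    return out


# Hypothetical sample data: two events for one patient, one for another.
rows = [("A1", date(2014, 3, 1)), ("A1", date(2014, 3, 11)), ("B2", date(2014, 5, 5))]
mapping = freeze_mapping(mrn for mrn, _ in rows)
shifted = deidentify(rows, mapping)
```

Because the shift is constant per patient, intervals between one patient's events are preserved, and rerunning `deidentify` with the stored mapping gives the same output.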

Building Analytic Data Sets

  • design discussion led by Connolly @ DevTeams#kumc, Bokov @ DevTeams#uthscsa
  • experience with handling data builder results, leading to refinements of the data builder design (#106)
  • demo of how and why to pull I2B2 data in a domain-specific, one-row-per-visit manner (and how to integrate this code into Data Builder)
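As a rough illustration of the one-row-per-visit idea, the sketch below pivots fact-style rows into one row per (patient, encounter) with one column per concept code. It is not the Data Builder or datafinisher code; it assumes input shaped like i2b2 `observation_fact` rows (`patient_num`, `encounter_num`, `concept_cd`, `nval_num`, `tval_char`), and the function name and sample concept codes are hypothetical.

```python
from collections import defaultdict


def one_row_per_visit(facts):
    """Pivot fact rows into one dict per (patient_num, encounter_num)."""
    visits = defaultdict(dict)
    for fact in facts:
        key = (fact["patient_num"], fact["encounter_num"])
        # Prefer the numeric value; fall back to the text value.
        value = fact["nval_num"] if fact["nval_num"] is not None else fact["tval_char"]
        visits[key].setdefault(fact["concept_cd"], []).append(value)

    # Every concept seen anywhere becomes a column, so all rows share a shape.
    columns = sorted({code for by_concept in visits.values() for code in by_concept})

    table = []
    for (patient_num, encounter_num), by_concept in sorted(visits.items()):
        row = {"patient_num": patient_num, "encounter_num": encounter_num}
        for code in columns:
            values = by_concept.get(code)
            if not values:
                row[code] = None          # concept not recorded at this visit
            elif len(values) == 1:
                row[code] = values[0]
            else:
                row[code] = values        # repeated facts are kept as a list
        table.append(row)
    return table


# Hypothetical sample facts: two visits for one patient.
facts = [
    {"patient_num": 1, "encounter_num": 10, "concept_cd": "WEIGHT", "nval_num": 70.0, "tval_char": None},
    {"patient_num": 1, "encounter_num": 10, "concept_cd": "SEX", "nval_num": None, "tval_char": "F"},
    {"patient_num": 1, "encounter_num": 11, "concept_cd": "WEIGHT", "nval_num": 71.5, "tval_char": None},
]
visit_table = one_row_per_visit(facts)
```

How repeated facts within a visit should be collapsed (first, last, mean, list) is a domain-specific choice; keeping a list here simply defers that decision.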

Integrating Text Notes

  • Hacking sessions led by DevTeams#mcw
  • Would like to have a couple sites work with Jay, Glenn, or George at MCW and Matt Hoag at KUMC to replicate the MCW note de-id pipeline into i2b2.
  • PCORI's slides describing phase 2 highlight notes, NLP, and computable phenotypes. The ability to de-identify notes is foundational.
  • Slides are located here
  • If you are interested in going through the process of loading and running the software, please bring the following to the session:
  • If you want to bring some test data, please do!

EMR Integration

  • experience report, design discussion led by McClay? @ DevTeams#unmc
  • Embedding Research in clinical practice: EMR integration for Prospective Trials
  • focus on the specific PCORnet "plays" we need to be able to execute for prospective trials in the clinical workflow with the EMR.
    • data collection options (clinician, patient, PROs, EMR flowsheets, notes, REDCap)
    • trial recruitment alerting
    • intervention selection/ordering and randomization

Support for ALS, Obesity surveys

  • ALS Survey post-mortem?
  • weight cohort data elements (for cohort characterization) #33
    • assessment of obstacles to implementing BMI percentiles #210
  • what we have learned so far about what is needed for i2b2 to fully support the needs of the cohort queries.

GPC Ontology v1

HERON Code Sharing

One reason that we're finding data alignment challenging is that not all sites share HERON code (see DevTeams#kumc). This looks to be a point of emphasis for GPC going into phase 2.

  1. What are people's obstacles to using HERON code? Would sites use more of it if it weren't an all-or-none commitment? How much of the barrier is due to infrastructure differences?
  2. How can HERON code be better modularized so that there is a clean, documented way to incorporate (and merge changes to) bits and pieces of it in your ETL process, without copy-pasting by hand each time you want to incorporate an ETL feature that another site implemented?
    • UTHSCSA's special_needs module for modular Heron flexibility and heron_load crash recovery (#125)
    • UTHSCSA's SID/service_name feature (#38)
    • UTHSCSA's mock i2b2 version config
    • suggestion: use dynamic generated code (for strings and one-off test cases)
      • UTHSCSA currently does post-processing to strip KU strings from heron_terms for babel, which leaves our local i2b2 in KU-mode. We'd like to fix this, preferably in a configurable/sharable way that helps others, but we need time/resources to do so.
    • suggestion: make hard coded one-off test cases generic (or use dynamic code generation templates) (#88)
    • code standard suggestion: (perhaps an unspoken rule already) add a threshold and 1/0 error for generic test cases that cause records to be omitted or null-ed
    • data staging suggestion: most recent site helps next (or updates the docs?). Documentation on staging data from external sources (UMLS, NIH, CDC, WHO, etc.) leaves much to be desired.
    • data staging suggestion: centralize documentation of curated data how-tos
  3. What conventions can we come up with so that site-specific code that doesn't benefit KUMC can be incorporated in a manner that does not impact KUMC or add to their workload or risk?
    • example: UTHSCSA's naacr mrn_mapping (ask Angela)
    • how to separate sharable/non-sharable code?
    • how to flag existing portability issues (e.g. continue creating gpc-trac tickets with "portable" in the summary, or do something else?)
  4. What kind of post ETL analysis/verification do people do? Do people review the upload_status table? What thresholds for non-uploaded records give you pause? How do you find out when you’re potentially missing data?
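To make the threshold suggestion and the question about non-uploaded records concrete, here is a minimal sketch of a check that fails loudly when too many records are omitted. It is not HERON code: the names `DataCheckError` and `check_omitted`, the 1% threshold, and the sample counts are hypothetical, and raising an exception stands in for the "1/0 error", which we assume means a deliberate divide-by-zero used to abort an SQL script.

```python
class DataCheckError(Exception):
    """Raised when a data check exceeds its configured threshold."""


def check_omitted(check_name, omitted, total, max_fraction):
    """Return the omitted fraction, or raise if it exceeds max_fraction."""
    if total <= 0:
        raise ValueError(f"{check_name}: total must be positive, got {total}")
    fraction = omitted / total
    if fraction > max_fraction:
        # Failing here stops the ETL run rather than silently dropping records.
        raise DataCheckError(
            f"{check_name}: {omitted} of {total} records omitted "
            f"({fraction:.2%}), threshold is {max_fraction:.2%}"
        )
    return fraction


# Hypothetical counts: 5 of 1000 records dropped, with a 1% site threshold.
ok_fraction = check_omitted("facts_with_unknown_concepts", 5, 1000, max_fraction=0.01)
```

Passing the threshold in as a parameter, rather than hard-coding it, lets each site configure what is acceptable for its own data.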

For reference, tickets with "portable" in the summary:

Ticket | Summary | Owner | Type | Status | Priority | Milestone
#44 | portable HERON ETL for NAACCR | reeder | enhancement | assigned | major | tumor-reg-18
#55 | source of idx_table not portable | Nathan Graham | problem | closed | major |
#66 | all_facts_use_known_concepts test in HERON ETL are not portable | Nathan Graham | problem | closed | major |
#71 | portable HERON ETL for Epic | Matt Hoag | enhancement | assigned | minor |
#88 | HERON ETL SQL data check thresholds are not portable and not always applicable | Nathan Graham | design-issue | closed | minor |
#104 | ETL for MRN from enterprise master patient index is not portable | Hubert Hickman | defect | assigned | minor |
#129 | parameterize i2b2 and CPT terminology tables since I2B2METADATA2 isn't portable | Alex F. Bokov | enhancement | assigned | minor |
#179 | Data Builder not portable to i2b2 1.7 VM due to lack of postgresql support | Dan Connolly | problem | closed | major | data-agg1

Infrastructure Discussion

  • Best practices for using Jenkins and other deployment/automation tools (requested by UTHSCSA)
  • comparing notes on performance improvements: where to look for problems with ETL tasks that gobble up temp space or run forever.
  • Does anyone have any ideas about estimating query run-time? (Angela/UTHSCSA)

Phase 2 planning

By this meeting we should have finalized our partners for our phase 2 letter of intent and will need to be writing our phase 2 proposal. Russ and select site PIs will work on creating a compelling proposal that also transitions the network to serve CER research for PCORnet, our universities, and our health systems. Hopefully, we'll have the aspirin trial submission behind us.

  • Discuss requirements for Phase II operations
    • responsive to PCORI PFA but also aligned with CTSA PI slides highlighting RICs and TICs
    • The PCORnet phase specifically calls for CTSA collaboration, so ideally we position the GPC to lead in this area of aligning recruitment and regulatory contracting needs for NCATS
  • Review best practices and SOPs for these three areas:
    • Embedding Research in clinical practice
    • Capturing patient reported outcomes
    • Engaging clinicians
  • Aspirin trial proposal post mortem? (due Jan 31)

For reference:

Meeting Notes

We normally take notes on technical discussions and share them publicly. See #12 for meeting record norms.

Tickets discussed at this meeting (in progress):

Ticket | Summary | Owner | Type | Status | Priority | Milestone
#23 | how to query for BMI and other core clinical measurements? | Laurel Verhagen | design-issue | reopened | minor | snow-shrine-3
#109 | represent PCORI CDM terminology as i2b2 metadata | Nathan Graham | enhancement | closed | major | drn-basic-query
#110 | securely de-identify and re-identify patient data | Steve Fennel | enhancement | closed | major | bc-survey-cohort-def
#145 | transform (ETL) GPC i2b2 data to PCORNet CDM | Nathan Graham | enhancement | closed | major | drn-basic-query
#146 | map CDM-in-i2b2 to existing i2b2 data and/or ontologies | Nathan Graham | enhancement | closed | minor | drn-basic-query
#158 | usable view of LOINC lab terms | Angela Bos | enhancement | closed | major | data-domains3
#159 | GPC REDCap Service for data sharing | Bhargav Srinivas Adagarla | enhancement | closed | major | data-quality3
#174 | federated login for GPC data store | Bhargav Srinivas Adagarla | enhancement | closed | major | data-sec-check
#201 | Align ACT, ARCH/SCHILS c_table_cd in table_access (item_key) across sites | Lav Patel | design-issue | closed | major | snow-shrine-2
#210 | Query by BMI percentile among children. | Hubert Hickman | enhancement | closed | major | obesity-survey-def
#217 | data builder failed to work with to MS SQL Server at MCRF | Joe Finamore | problem | closed | major | bc-survey-cohort-def
#221 | Weight and Health survey data request plan | sschlacter | design-issue | closed | major | obesity-survey-def
#227 | Data QA for Breast Cancer Finder File | Tamara | task | closed | major | bc-survey-cohort-def
#228 | refine DataBuilder output for traditional analysis (UTHSCSA-CIRD/datafinisher) | Alex F. Bokov | enhancement | closed | major | cohort-char1
#229 | Enrollment terms based on Catchment Area, encounters, etc. | preeder | design-issue | assigned | minor | snow-shrine-3

Meeting Location and Lodging

Perhaps we can use two or three minivan cabs to get there from the Marriott.



This campus map indicates visitor parking.

UT Police have been informed of visitors on campus for those days. Anyone driving here can park in visitor parking lot 8; gate security will advise you where to go. When you check in at the entry security gates on the Main Campus, give security event number 9411 to gain access to the facilities. They may be calling the event the Greater Plains Collaborative Meeting rather than HackathonTwo, but as long as you have this event number you shouldn't have any problems.

Note: SA drivers must follow the hands-free cell phone law starting Jan. 1.

Meeting Room and Facilities

There is a front entrance, which has stairs going up into the building, and a back entrance, which has a large concrete ramp leading up to a few stairs into the building. If you take the front entrance, walk in and take a left; you will immediately see the double glass doors at the entrance to the library. From there, continue in through the metal bars at the indoor entrance, continue straight, and look to your right; you will immediately see the elevators. This level is the 3rd floor, so take the elevator down to the 2nd floor. As you exit the elevator, walk straight through the double glass doors directly in front of you, which read "library classrooms". Continue straight through to the very back, where you will see the last hallway, and walk it to your right. The classrooms, numbers 2.039 and 2.048, are adjacent to each other on the left side of this hallway.

If you park at visitor parking lot 8, the library is to the right as you face the Dental School Building. If you walk in through the back entrance, everything is the same except that as you first step into the building you take an immediate right and then follow the directions above.

There will be wifi and power cords.

Remote Participation

Use the attendance survey to register your interest in remote participation in a topic so we know which sessions will have remote participation and so we can help you get connected.

The agenda schedule calls for up to three sessions in parallel; we'll allocate these facilities to sessions as necessary:

  1. GoToMeeting Meeting ID and Access Code: 476-096-581
  2. GoToMeeting Meeting ID and Access Code: 164-693-557
  3. HackathonTwo Google+ Event with hangout for voice, video, chat

Meanwhile if you want to catch up either in near-real-time or after the fact, see:

Another possibly useful facility:

Hotel Accommodations

Courtyard Medical Center

8585 Marriott Dr
San Antonio, TX 78229-3217

Use UTHSCSA Special Code: UTXL

Go to the Courtyard Medical Center website. On the left side, select the dates, click on the Corporate/promotional code radio button, type in UTX, and click Find. Special rates: $98 per night for king or double beds.

Last modified on Feb 2, 2015 4:14:56 PM
