This page will hold notes and curriculum for the course for Fall 2016 at the University of Kansas Medical Center.

Week 1. August 23rd and 25th

Given the schedule, Tamara will lecture and run the lab for the first week and cover:

  • basic HERON login
  • VPN access
  • Wiki access
  • Fileserver access
  • Fundamentals of inclusion/exclusion criteria and discrete math (Venn diagrams instantiated in queries)
  • review of background readings by Shortliffe
  • basic fishing in HERON/i2b2 (but not temporal queries).
    • review videos
  • Homework assignment on basic query for review on Thursday
  • discussion of office hours versus appointments

Student Pages: HaileyBaker EberBeck SamanthaKing AlyssaRollando JimmyNguyen

Steps to access files (Windows).

  1. Open the My Computer folder.
  2. Right-click.
  3. Click "Add a network location".
  4. Click Next.
  5. Choose a custom network location and enter the fileserver address.
  6. Enter your KUMC login/password.

Thur 8/25

  • All students completed HERON searches, but we only had time to review searches by Jimmy, Alyssa, and Eber
  • All students are able to search HERON, access the necessary file folder, and update trac.
  • Homework: read the Shortliffe chapter.

Week 1. August 23rd and 25th Tamara lectures. Introduction to i2b2 and system access logistics.

Week 2 Tuesday 8/30 and Thurs 9/1

Russ lectures.

  • Lecture materials for today draw from the following presentations

Data systems as a representation of clinical and administrative activity

  • medication delivery cycle
  • orders to results to bills to claims
  • diagnoses and procedures
  • review of data sources currently in HERON and other integrated data repositories
  • what's possible with health system based data versus an idealized omniscient data source.

If time,

  • query homework review.
  • introduce discussion of potential projects for this semester

Building queries, defining cohorts and i2b2 as a pragmatic set theory laboratory.

Week 3 September 6 and 8th Potential Research Projects discussion and more i2b2 query laboratory.

Also, log into i2b2

Create your consort diagram and timelines for your research projects.

Week 4 September 13 and 15th

Timelines are now working. Will review their utility briefly on Tuesday.

You also have all been granted access to reuse Jasmin's Breast Cancer de-identified data request project from last year. We will briefly review this project on Tuesday and introduce SQL.

We reviewed SQL and then had everyone access Jasmin's de-identified data request project in REDCap, extract the CSV files into a SQLite database, and link the patient table to the data table.
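As a sketch of that lab step, the CSV-to-SQLite load and the patient-to-data link can be done with Python's sqlite3 module. The rows and column names below (patient_num, concept_cd, and so on) are made-up stand-ins, not the actual layout of the REDCap export:

```python
import csv
import io
import sqlite3

# Hypothetical miniature CSV exports; the real files and column
# names in Jasmin's project will differ.
patient_csv = """patient_num,sex,birth_date
1,F,1955-03-02
2,F,1948-11-17
"""
data_csv = """patient_num,concept_cd,start_date
1,ICD9:174.9,2010-06-01
2,ICD9:401.9,2011-02-14
2,ICD9:174.9,2011-03-01
"""

con = sqlite3.connect(":memory:")  # use a file path to keep the database
cur = con.cursor()
cur.execute("CREATE TABLE patient (patient_num INTEGER, sex TEXT, birth_date TEXT)")
cur.execute("CREATE TABLE data (patient_num INTEGER, concept_cd TEXT, start_date TEXT)")

# Load each CSV: first row is the header, the rest are data rows.
for table, text in (("patient", patient_csv), ("data", data_csv)):
    rows = list(csv.reader(io.StringIO(text)))
    marks = ",".join("?" * len(rows[0]))
    cur.executemany("INSERT INTO %s VALUES (%s)" % (table, marks), rows[1:])

# Link the patient table to the data table on patient_num.
linked = cur.execute("""
    SELECT p.patient_num, p.sex, d.concept_cd
    FROM patient p JOIN data d ON p.patient_num = d.patient_num
    ORDER BY p.patient_num, d.concept_cd
""").fetchall()
print(linked)
```

With real exports you would read the files with `open(...)` instead of `io.StringIO`; everything else is the same pattern.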

Thursday will be a further lab with SQL and the REDCap project.

Week 5 September 20 and 22nd

Distributed Cohort Discovery Exam. Due next Wednesday.

Review of projects and cohort definitions.

See the project specifics in the student wikis as attachments

Note: to review the blog articles on HERON releases, you need to be on campus or use the VPN

Week 6 September 27 and 29th

Project specifics continued. Need to start with Eber.

Query refinement and data sets: based upon Tamara's last class, let's shift focus from the course outline and get your computable phenotypes and analytic datasets defined so when we do SQL, you can be manipulating the data for your projects.

Tuesday: Discussed, reviewed and refined each person's computable phenotype and desired data elements for their data request.

Thursday: Thank you all for submitting your homework on time. We have graded all the exams and I am deciding on scoring for partial credit. It is worth noting that no one was 100% accurate, though people answered many questions correctly.

Goal for Thursday will be finalizing data requests.

Each data request needs:

  • title
  • description
  • additional file describing the project
  • persons who will be accessing this project

For next week, Tamara will go over the exam and work more on SQL with your data sets.

Week 7 October 4 and 6th Exam discussion.

Russ was able to submit the data requests. Students were able to access their REDCap projects and load their data into SQLite.

Week 8 October 11 and 13th No class on the 11th due to Fall break

October 13th: attendance at the Greater Plains Collaborative Learning Engagement Conference at the Kauffman Foundation and the sessions regarding:

  • State of PCORnet and the GPC
  • Youth Patient Panel for Research

Week 9 October 18 and 20th

Brief discussion of clinical data research networks, PCORnet, and patient engagement based upon attending the GPC learning engagement conference last Thursday.

Data transformation from SQL and REDCap to address your research hypotheses.

  • REDCap
  • SQL tables and temp tables
  • We will work through hypotheses from your projects and think of the kinds of transformations we need to conduct to create the analytic dataset(s).

We used Samantha's and Hailey's projects as examples:

  • ejection fraction as a proxy for adequate cardiac monitoring pre and post cancer treatment
  • TIPS placement and revisions over time

A common need across both of these efforts is to define an event (TIPS placement for Samantha; exposure to trastuzumab and to anthracyclines, i.e. antibiotic antineoplastics, for Hailey) and then calculate the occurrence of a fact before or after that event.

  • are revisions after TIPS occurring, and if so, when?
  • do we have a baseline ejection fraction before chemotherapy treatment? Do we see those ejection fractions after treatment so we know proper monitoring for cardiotoxicity is occurring?

A key approach is to start small and work incrementally to develop your analytic set.

  • What is the smallest combination of variables that you can use to answer your first question, test a single hypothesis, or generate the most basic descriptive statistics for your data?
  • Always comment your code as you assemble the steps which construct your data set.
  • It's useful to toggle between querying the source table you're extracting from and querying your specific dataset as you develop it.
  • Use "SELECT COUNT(your_field_of_interest)" while you develop your analytic set. Do the numbers make sense? Check for cases where you might be creating duplicate rows in your analytic dataset.
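A minimal sketch of that sanity check, run here through Python's sqlite3 against a made-up extract table (the column names and rows are hypothetical): comparing the raw row count to the distinct patient count flags duplicate facts early.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# Toy extract table standing in for the HERON data table (invented values).
cur.executescript("""
CREATE TABLE data (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO data VALUES
  (1, 'ICD9:174.9', '2010-06-01'),
  (1, 'ICD9:174.9', '2010-06-01'),   -- duplicate row lurking in the extract
  (2, 'ICD9:401.9', '2011-02-14');
""")

# Compare raw row counts to distinct patients: a gap flags duplicates.
n_rows = cur.execute("SELECT COUNT(*) FROM data").fetchone()[0]
n_patients = cur.execute("SELECT COUNT(DISTINCT patient_num) FROM data").fetchone()[0]
print(n_rows, n_patients)  # more rows than patients -> check for duplicate facts
```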

Demonstrated the following techniques:

  • CREATE TABLE to create slices of data from the data_view or data_table.
    • then use INSERT INTO ... SELECT ... FROM to pull data from the existing table of HERON-extracted data into the new table you are using to build an analytic dataset.
  • DROP TABLE to delete working copies of your analytic set if you are scripting the whole analysis in a single file.
    • warning: be careful not to drop the original tables provided from HERON! DROP TABLE is very powerful: it empties and removes the table. Fortunately, you can recreate your database from the original files; also consider saving backups of your database.
  • using LIKE and IN in your WHERE clause to select the appropriate codes and then inserting them into a merged table that you build up as your analytic set.
  • using MIN() and MAX() functions to find the first or last values of interest.
    • don't forget to include GROUP BY when using aggregate functions
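The techniques above can be sketched in one small script. This runs SQLite through Python's sqlite3 with invented rows and codes, not the actual HERON extract:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Invented extract table and rows (hypothetical stand-ins).
CREATE TABLE data (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO data VALUES
  (1, 'ICD9:174.9', '2010-06-01'),
  (1, 'ICD9:401.9', '2012-01-05'),
  (2, 'ICD9:174.2', '2011-03-01'),
  (2, 'ICD9:250.0', '2009-08-20');

-- A working slice: breast cancer diagnoses selected with LIKE,
-- pulled into a new table via INSERT INTO ... SELECT.
CREATE TABLE bc_dx (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO bc_dx
  SELECT patient_num, concept_cd, start_date
  FROM data
  WHERE concept_cd LIKE 'ICD9:174%';
""")

# First breast cancer diagnosis per patient: MIN() needs GROUP BY.
first_dx = cur.execute("""
    SELECT patient_num, MIN(start_date)
    FROM bc_dx
    GROUP BY patient_num
    ORDER BY patient_num
""").fetchall()
print(first_dx)

cur.execute("DROP TABLE bc_dx")  # drop only working copies, never the source tables
```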

For next class:

  • how to calculate a date difference so you can define the times from your main exposure event to the preceding or subsequent events you would like to include in your analysis.
  • Eber, Jimmy, or Alyssa will present their starting points for their analytic data sets.

Week 10 October 25 and 27th

Reviewed SQL against the Jasmin Breast Cancer database. We used date-difference and date-between capabilities to define windows around variables of interest; specifically, we looked at hypertension diagnoses relative to the first breast cancer diagnosis. Some people had difficulty properly importing their tables. We created tables, dropped tables, joined tables, and used aggregate functions.
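In SQLite, that date arithmetic can be done with julianday(). This sketch uses invented diagnoses to compute days from the first breast cancer diagnosis to each hypertension diagnosis; negative values fall before the index event:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Invented rows for one hypothetical patient.
CREATE TABLE dx (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO dx VALUES
  (1, 'ICD9:174.9', '2010-06-01'),   -- first breast cancer dx (index event)
  (1, 'ICD9:401.9', '2010-08-15'),   -- hypertension after the index event
  (1, 'ICD9:401.9', '2009-01-10');   -- hypertension before the index event
""")

# Days from the first breast cancer dx to each hypertension dx.
rows = cur.execute("""
    WITH index_dx AS (
        SELECT patient_num, MIN(start_date) AS bc_date
        FROM dx WHERE concept_cd LIKE 'ICD9:174%'
        GROUP BY patient_num
    )
    SELECT h.patient_num,
           CAST(julianday(h.start_date) - julianday(i.bc_date) AS INTEGER)
             AS days_from_bc
    FROM dx h JOIN index_dx i ON h.patient_num = i.patient_num
    WHERE h.concept_cd = 'ICD9:401.9'
    ORDER BY days_from_bc
""").fetchall()
print(rows)
```

Adding a WHERE on days_from_bc (for example, BETWEEN -365 AND 0) then defines a window around the variable of interest.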

Distributed the SQL exam and used those exercises as a model for generating an analytic set.

Jimmy informed us he was able to use his data to submit an abstract over the weekend. He will give an overview of his abstract for next class and we will also discuss other projects as well as any questions regarding analytic dataset preparation using dates as outlined in today's class.

Reviewed SQL exam questions and also reviewed Jimmy's abstract and resulting SQL. Considerable use of create table statements.

Decided the take home SQL exam will be due next Thursday.

Week 11 November 1st and 3rd


  • reviewed Alyssa's project: challenges with drain output flowsheet attribution
  • reviewed Eber's project: consider recasting to look at the population where the cognitive testing is done (MoCA in neurology; MMSE in the Alzheimer's clinic) and then diabetes, instead of starting with diabetics with neurologic complications


  • review Hailey's project: challenges with mini HDMI adapter....

Week 12 November 8th and 10th Tuesday:

  • discussed SQL Exam.
  • Worked on temporal relationships for TIPS and Breast Cancer projects.
  • discussed challenges with Alyssa's project and tying the drain output truly to output from the fistula
  • Can also get her notes data and then search the text for indications of "healing" via a data pull by Maren, after the identified data request is approved by Stephen Waller
  • Example:

select * from nightherondata.observation_fact nof where nof.CONCEPT_CD='KUMC|REPORTS|NOTETYPES:1' and patient_num in (put the mrns in here) ;

select * from nightherondata.concept_dimension ncd where ncd.CONCEPT_PATH like '%Visit Notes%Note Types%\%Progress%' ;

Can then link these note extractions into new tables in SQLite
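Once the note facts are exported, linking them to the patient table in SQLite might look like this sketch; the rows are invented, and only the note-type concept code mirrors the query above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Hypothetical rows standing in for note facts pulled from HERON.
CREATE TABLE note_fact (patient_num INTEGER, concept_cd TEXT, start_date TEXT);
INSERT INTO note_fact VALUES
  (1, 'KUMC|REPORTS|NOTETYPES:1', '2012-03-04'),
  (1, 'KUMC|REPORTS|NOTETYPES:1', '2012-05-20'),
  (2, 'KUMC|REPORTS|NOTETYPES:1', '2012-07-11');

CREATE TABLE patient (patient_num INTEGER, sex TEXT);
INSERT INTO patient VALUES (1, 'F'), (2, 'F');
""")

# Notes per patient after linking the extraction to the patient table.
notes_per_patient = cur.execute("""
    SELECT p.patient_num, COUNT(*) AS n_notes
    FROM patient p JOIN note_fact n ON p.patient_num = n.patient_num
    GROUP BY p.patient_num
    ORDER BY p.patient_num
""").fetchall()
print(notes_per_patient)
```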

  • To do for next class: have your specification for the analytic file(s) for your project defined in Excel/CSV

-- clearly define your destination


  • review analytic file specifications
  • further discussion of Eber's project since the approach changed.

Week 13 November 15th and 17th


  • review analytic files
  • Deeper discussion of Jimmy's project and his refactored code and results when properly targeting the actual hospital length of stay.


  • any final review of analytic files and questions raised.

Week 14 November 22nd and 24th Tuesday:

  • review drafts of abstracts and analytic files.


  • celebrate Thanksgiving and be thankful for REDCap and SQLite.

Week 15 November 29th and December 1st


  • Abstract presentations and discussion by Jimmy, Alyssa, and Hailey

For final written wrap-ups of your projects, reference last year's class, update your wiki page, and prepare your materials:

  • abstract
  • your presentation from this week
  • code to create your analytic file with comments
  • place your final analytic files back as file attachments in your REDCap project for future reference.
  • also go ahead and attach your abstract, code, and presentation in your REDCap project for future reference


  • Abstract presentations and discussion by Eber and Samantha

Week 16 December 6th and December 8th Tuesday:

  • Other data models and integrated data repository schemas.
  • Reference the attached presentation: Biomedical Informatics Driven Clinical Research: Closing Observations on Data Networks and Observational Studies PVRM868ClosingObservationsonDataNetworksDec2016.pdf
  • Reminder: polish up your wiki pages with all the files and presentations on the wiki and in your REDCap projects. Make it so future students enjoy your work and potentially extend or collaborate with you.


  • Course evaluations
  • feedback for MS-CR
  • feedback for future classes and student preparation

For reference, you can discuss how you are now facile in computable phenotyping using i2b2/HERON, and reference this introduction to computable phenotyping by Dr. Califf, the current FDA commissioner.

Last modified on Dec 8, 2016 9:17:06 PM

Attachments (5)