Julia for Data Science

In [1]:
# for reproducibility
versioninfo()
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin21.4.0)
  CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8
  JULIA_EDITOR = code

In the previous two tutorials, we practiced a few essential data wrangling steps in R and Python:

  • Pipes

  • Data ingestion

  • Data filtering (rows) and selection (columns)

  • Data sorting and ranking

  • Data merging (joins)

  • Mutate (dplyr) or transform (Julia)

  • Pivot (tidyr) or stack/unstack (Julia)

  • Group by

  • Data summaries

  • Visualization

The Julia package DataFrames.jl is the analog of dplyr and data.table in R and pandas in Python.

Optional reading: Comparison of DataFrames.jl with Python/R/Stata.

In [2]:
using AlgebraOfGraphics, CairoMakie, CSV, DataFrames, Dates, Pipe
In [3]:
# path to MIMIC data
mimic_path = Sys.islinux() ? "/home/shared/1.0" : "/Users/huazhou/Desktop/mimic-iv-1.0"
Out[3]:
"/Users/huazhou/Desktop/mimic-iv-1.0"
In [4]:
# for printing all columns of DataFrame
ENV["COLUMNS"] = 1000
Out[4]:
1000

Data ingestion

Plain text files can be parsed by the CSV.jl package.

icustays_tbl

We use the dateformat argument to correctly parse intime and outtime as DateTime.
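
As a small illustration of what the dateformat pattern does, the sketch below parses a single timestamp with the Dates standard library only. The raw string is an example value written in the space-separated layout that the pattern describes; it is an assumption about how the file stores timestamps, not a line copied from the CSV.

```julia
using Dates

# Same pattern string as the one passed to CSV.File in this tutorial.
# In Julia date formats, lowercase "mm" is the month and uppercase "MM" is the minute.
fmt = dateformat"yyyy-mm-dd HH:MM:SS"

# Example timestamp in the assumed raw layout (date and time separated by a space).
raw = "2154-03-03 04:11:00"
parsed = DateTime(raw, fmt)

# Once parsed, accessors such as year and hour work directly on the value.
parsed_year, parsed_hour = year(parsed), hour(parsed)
```

Without a matching pattern such a column would typically be read as strings, so later comparisons between timestamps would not behave as date comparisons.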

In [5]:
icustays_tbl = CSV.File(
    mimic_path * "/icu/icustays.csv.gz",
    dateformat = "yyyy-mm-dd HH:MM:SS"
    ) |> DataFrame
Out[5]:

76,540 rows × 8 columns

subject_idhadm_idstay_idfirst_careunitlast_careunitintimeouttimelos
Int64Int64Int64StringStringDateTimeDateTimeFloat64
1178674022452853431793211Trauma SICU (TSICU)Trauma SICU (TSICU)2154-03-03T04:11:002154-03-04T18:16:561.58745
2144359962896096431983544Trauma SICU (TSICU)Trauma SICU (TSICU)2150-06-19T17:57:002150-06-22T18:33:543.02562
3176099462738589733183475Trauma SICU (TSICU)Trauma SICU (TSICU)2138-02-05T18:54:002138-02-15T12:42:059.74172
4189667702348302134131444Trauma SICU (TSICU)Trauma SICU (TSICU)2123-10-25T10:35:002123-10-25T18:59:470.350544
5127767352081752534547665Neuro StepdownNeuro Stepdown2200-07-12T00:33:002200-07-13T16:44:401.67477
6102151592428359334569476Trauma SICU (TSICU)Trauma SICU (TSICU)2124-09-20T15:05:292124-09-21T22:06:581.2927
7144890522651639035056286Trauma SICU (TSICU)Trauma SICU (TSICU)2118-10-26T10:33:562118-10-26T20:28:100.412662
8159147632890602036909804Trauma SICU (TSICU)Trauma SICU (TSICU)2176-12-14T12:00:002176-12-17T11:47:012.99098
9162562262001329039289362Neuro StepdownNeuro Stepdown2150-12-20T16:09:082150-12-21T14:58:400.951065
10191944492164199939387567Coronary Care Unit (CCU)Coronary Care Unit (CCU)2123-11-12T02:53:352123-11-12T13:52:030.457269
11155372372747276939467232Neuro IntermediateNeuro Intermediate2156-02-28T17:38:002156-02-29T16:57:080.97162
12153329762976219239883649Trauma SICU (TSICU)Trauma SICU (TSICU)2165-08-16T15:20:482165-08-17T17:09:471.07568
13168412802634026832550034Trauma SICU (TSICU)Trauma SICU (TSICU)2149-01-14T09:24:002149-01-15T23:56:411.60603
14129745632961805732563675Neuro StepdownNeuro Stepdown2138-11-13T23:30:012138-11-15T16:25:191.70507
15185992122853822633267162Trauma SICU (TSICU)Trauma SICU (TSICU)2129-06-01T16:27:392129-06-06T17:01:335.02354
16146092182060618934947848Neuro StepdownNeuro Stepdown2174-06-28T21:13:002174-07-05T17:01:326.82537
17103907322217753535370343Trauma SICU (TSICU)Trauma SICU (TSICU)2147-06-20T19:40:572147-06-22T11:47:381.67131
18176750162923570636961856Trauma SICU (TSICU)Trauma SICU (TSICU)2173-04-23T17:59:302173-05-10T16:06:4716.9217
19126871122613266737445058Neuro StepdownNeuro Stepdown2162-05-31T18:08:452162-06-04T10:16:133.67185
20184231512775319330073725Trauma SICU (TSICU)Trauma SICU (TSICU)2169-07-14T19:04:062169-07-15T13:21:120.761875
21172163132256319530201049Trauma SICU (TSICU)Trauma SICU (TSICU)2180-03-03T16:20:102180-03-04T18:37:421.09551
22175303042177616030655167Trauma SICU (TSICU)Trauma SICU (TSICU)2129-03-21T07:09:002129-03-22T15:26:481.34569
23122075932279520930000646Coronary Care Unit (CCU)Coronary Care Unit (CCU)2194-04-29T01:39:222194-05-03T18:23:484.69752
24106561732577876030001555Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2177-09-27T11:23:132177-09-28T18:26:001.2936
25143115222462251230002548Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2111-08-17T13:13:432111-08-18T18:50:311.23389
26102084682579641430002925Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2134-06-05T03:37:002134-06-05T22:45:150.797396
27106820022003589230003087Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2132-12-01T20:58:252132-12-07T21:18:196.01382
28162359112895656030003306Surgical Intensive Care Unit (SICU)Surgical Intensive Care Unit (SICU)2188-06-05T23:38:192188-06-08T00:32:172.03748
29125097992589722330004530Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2165-07-31T09:40:352165-08-03T16:29:093.28373
30188602332797800430007216Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2191-03-10T12:33:002191-03-11T18:15:591.23818

admissions_tbl

In [6]:
admissions_tbl = CSV.File(
    mimic_path * "/core/admissions.csv.gz",
    dateformat = "yyyy-mm-dd HH:MM:SS"
    ) |> DataFrame
Out[6]:

523,740 rows × 15 columns

subject_idhadm_idadmittimedischtimedeathtimeadmission_typeadmission_locationdischarge_locationinsurancelanguagemarital_statusethnicityedregtimeedouttimehospital_expire_flag
Int64Int64DateTimeDateTimeDateTime?String31String?String31?String15String7String15?String31DateTime?DateTime?Int64
114679932210383622139-09-26T14:16:002139-09-28T11:30:00missingELECTIVEmissingHOMEOtherENGLISHSINGLEUNKNOWNmissingmissing0
215585972249410862123-10-07T23:56:002123-10-12T11:22:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
311989120219651602147-01-14T09:00:002147-01-17T14:25:00missingELECTIVEmissingHOMEOtherENGLISHmissingUNKNOWNmissingmissing0
417817079247098832165-12-27T17:33:002165-12-31T21:18:00missingELECTIVEmissingHOMEOtherENGLISHmissingOTHERmissingmissing0
515078341232721592122-08-28T08:48:002122-08-30T12:32:00missingELECTIVEmissingHOMEOtherENGLISHmissingBLACK/AFRICAN AMERICANmissingmissing0
619124609205172152169-03-14T12:44:002169-03-20T19:15:00missingELECTIVEmissingHOMEOtherENGLISHmissingUNKNOWNmissingmissing0
717301855297327232140-06-06T14:23:002140-06-08T14:25:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
817991012242988362181-07-10T20:28:002181-07-12T15:49:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
916865435232169612185-07-19T02:12:002185-07-21T11:50:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1013693648216407252111-01-30T23:43:002111-02-02T13:03:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1110803182224380702168-01-24T21:14:002168-01-27T11:36:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1210733959251085612175-08-08T08:56:002175-08-10T12:30:00missingELECTIVEmissingHOMEOtherENGLISHmissingBLACK/AFRICAN AMERICANmissingmissing0
1313246095256262922186-01-25T10:52:002186-01-28T12:41:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1418802685260358832143-02-01T16:02:002143-02-03T15:29:00missingELECTIVEmissingHOMEOtherENGLISHSINGLEWHITEmissingmissing0
1516942914235626392152-07-12T14:46:002152-07-16T14:25:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1615266824283539292153-02-02T23:26:002153-02-05T11:48:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1711586661269867172182-06-05T22:51:002182-06-08T13:30:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
1819197569297802702185-04-01T01:09:002185-04-05T11:20:00missingELECTIVEmissingHOMEMedicaidENGLISHmissingWHITEmissingmissing0
1916865105298068792189-01-01T13:05:002189-01-04T11:00:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2019011229292365572134-04-27T22:37:002134-05-04T13:10:00missingELECTIVEmissingACUTE HOSPITALMedicaid?missingUNKNOWNmissingmissing0
2117433076240488272124-06-16T18:37:002124-06-19T16:29:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2216211263214946172115-03-09T15:43:002115-03-11T12:58:00missingELECTIVEmissingHOMEMedicaidENGLISHmissingWHITEmissingmissing0
2319170987284643072187-01-06T18:23:002187-01-10T12:48:00missingELECTIVEmissingHOMEMedicaidENGLISHmissingWHITEmissingmissing0
2410707837285134022118-12-14T04:37:002118-12-16T11:44:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2512564481227015562170-08-05T10:38:002170-08-09T10:24:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2618465134223779252171-04-18T08:57:002171-04-19T16:22:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2710293374267396222155-08-28T14:30:002155-08-31T15:00:00missingELECTIVEmissingHOMEOtherENGLISHmissingWHITEmissingmissing0
2811562038291808642146-05-03T13:06:002146-05-05T12:53:00missingELECTIVEmissingHOMEMedicaidENGLISHSINGLEASIANmissingmissing0
2915987366212921412138-08-24T20:06:002138-10-02T11:30:00missingELECTIVEmissingHOME HEALTH CAREOtherENGLISHSINGLEASIANmissingmissing0
3010318660240524972119-11-14T07:26:002119-11-23T18:25:00missingELECTIVEmissingACUTE HOSPITALOtherENGLISHmissingWHITEmissingmissing0

patients_tbl

In [7]:
patients_tbl = CSV.File(mimic_path * "/core/patients.csv.gz") |> DataFrame
Out[7]:

382,278 rows × 6 columns

 Row │ subject_id  gender   anchor_age  anchor_year  anchor_year_group  dod
     │ Int64       String1  Int64       Int64        String15           Date?
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ 10000048    F        23          2126         2008 - 2010        missing
   2 │ 10002723    F        0           2128         2017 - 2019        missing
   3 │ 10003939    M        0           2184         2008 - 2010        missing
   4 │ 10004222    M        0           2161         2014 - 2016        missing
   5 │ 10005325    F        0           2154         2011 - 2013        missing
   6 │ 10007338    F        0           2153         2017 - 2019        missing
   7 │ 10008101    M        0           2142         2008 - 2010        missing
   8 │ 10009872    F        0           2168         2014 - 2016        missing
   9 │ 10011333    F        0           2132         2014 - 2016        missing
  10 │ 10011879    M        0           2158         2014 - 2016        missing
  11 │ 10012663    F        0           2171         2011 - 2013        missing
  12 │ 10012691    F        0           2165         2011 - 2013        missing
  13 │ 10013428    M        0           2142         2011 - 2013        missing
  14 │ 10014536    F        0           2113         2008 - 2010        missing
  15 │ 10017072    M        0           2180         2008 - 2010        missing
  16 │ 10018724    F        0           2124         2008 - 2010        missing
  17 │ 10018726    M        0           2182         2014 - 2016        missing
  18 │ 10019105    M        0           2152         2008 - 2010        missing
  19 │ 10020370    F        0           2170         2011 - 2013        missing
  20 │ 10020442    M        0           2170         2011 - 2013        missing
  21 │ 10020546    M        0           2112         2008 - 2010        missing
  22 │ 10018928    F        31          2125         2008 - 2010        missing
  23 │ 10022764    F        0           2143         2008 - 2010        missing
  24 │ 10022951    F        0           2137         2008 - 2010        missing
  25 │ 10021917    M        54          2147         2017 - 2019        missing
  26 │ 10025573    M        0           2123         2017 - 2019        missing
  27 │ 10025785    F        0           2155         2008 - 2010        missing
  28 │ 10029477    F        0           2111         2014 - 2016        missing
  29 │ 10035753    F        0           2127         2017 - 2019        missing
  30 │ 10033879    F        28          2173         2011 - 2013        missing

chartevents_tbl

We use the dateformat argument to correctly parse charttime as DateTime.

In [8]:
@time chartevents_tbl = CSV.File(
    mimic_path * "/icu/chartevents_filtered_itemid.csv.gz", 
    dateformat = "yyyy-mm-dd HH:MM:SS"
    ) |> 
    DataFrame
  1.804059 seconds (32.94 k allocations: 777.766 MiB, 2.18% gc time, 3.87% compilation time)
Out[8]:

8,394,031 rows × 6 columns

 Row │ subject_id  hadm_id   stay_id   charttime            itemid  valuenum
     │ Int64       Int64     Int64     DateTime             Int64   Float64
─────┼───────────────────────────────────────────────────────────────────────
   1 │ 10003700    28623837  30600691  2165-04-24T05:30:00  220045  65.0
   2 │ 10003700    28623837  30600691  2165-04-24T05:38:00  223761  97.6
   3 │ 10003700    28623837  30600691  2165-04-24T06:00:00  220045  56.0
   4 │ 10003700    28623837  30600691  2165-04-24T06:09:00  220045  55.0
   5 │ 10003700    28623837  30600691  2165-04-24T07:00:00  220045  57.0
   6 │ 10003700    28623837  30600691  2165-04-24T07:00:00  223761  97.8
   7 │ 10003700    28623837  30600691  2165-04-24T08:00:00  220045  56.0
   8 │ 10004235    24181354  34100191  2196-02-24T16:39:00  220045  136.0
   9 │ 10004235    24181354  34100191  2196-02-24T17:00:00  220045  134.0
  10 │ 10004235    24181354  34100191  2196-02-24T17:16:00  220045  144.0
  11 │ 10004235    24181354  34100191  2196-02-24T17:48:00  220045  133.0
  12 │ 10004235    24181354  34100191  2196-02-24T18:00:00  220045  124.0
  13 │ 10004235    24181354  34100191  2196-02-24T19:00:00  220045  113.0
  14 │ 10004235    24181354  34100191  2196-02-24T20:00:00  220045  105.0
  15 │ 10004235    24181354  34100191  2196-02-24T21:00:00  220045  110.0
  16 │ 10004235    24181354  34100191  2196-02-24T22:00:00  220045  104.0
  17 │ 10004235    24181354  34100191  2196-02-24T23:00:00  220045  101.0
  18 │ 10004235    24181354  34100191  2196-02-25T00:00:00  220045  107.0
  19 │ 10004235    24181354  34100191  2196-02-25T01:00:00  220045  106.0
  20 │ 10004235    24181354  34100191  2196-02-25T02:02:00  220045  110.0
  21 │ 10004235    24181354  34100191  2196-02-25T03:00:00  220045  108.0
  22 │ 10004235    24181354  34100191  2196-02-25T04:00:00  220045  114.0
  23 │ 10004235    24181354  34100191  2196-02-25T05:00:00  220045  111.0
  24 │ 10004235    24181354  34100191  2196-02-25T06:00:00  220045  117.0
  25 │ 10004235    24181354  34100191  2196-02-25T07:00:00  220045  118.0
  26 │ 10004235    24181354  34100191  2196-02-25T08:00:00  220045  122.0
  27 │ 10004235    24181354  34100191  2196-02-25T09:00:00  220045  123.0
  28 │ 10004235    24181354  34100191  2196-02-25T10:00:00  220045  115.0
  29 │ 10004235    24181354  34100191  2196-02-25T12:00:00  220045  115.0
  30 │ 10004235    24181354  34100191  2196-02-25T13:00:00  220045  114.0

Let's visualize the heart rate readings for a specific stay.

In [9]:
#filter(row -> row.stay_id == 30600691 && row.itemid == 220045, chartevents_tbl) |> 
chartevents_subset = @pipe chartevents_tbl |> 
    filter(row -> row.stay_id == 30600691 && row.itemid == 220045, _) |> 
    select(_, [:charttime, :valuenum]) |>
    DataFrame
Out[9]:

5 rows × 2 columns

 Row │ charttime            valuenum
     │ DateTime             Float64
─────┼───────────────────────────────
   1 │ 2165-04-24T05:30:00  65.0
   2 │ 2165-04-24T06:00:00  56.0
   3 │ 2165-04-24T06:09:00  55.0
   4 │ 2165-04-24T07:00:00  57.0
   5 │ 2165-04-24T08:00:00  56.0
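
The filter-then-select pattern used for chartevents_subset can be tried on a toy table first. The sketch below is a minimal illustration, assuming DataFrames.jl is installed; the table toy and its values are made up and only mimic the column names of chartevents_tbl.

```julia
using DataFrames

# Toy stand-in for chartevents_tbl; the values are invented for illustration.
toy = DataFrame(
    stay_id  = [1, 1, 1, 2],
    itemid   = [220045, 223761, 220045, 220045],
    valuenum = [65.0, 97.6, 56.0, 136.0],
)

# Keep the heart-rate rows (itemid 220045) of stay 1, then keep a single column.
# With Pipe.jl the same steps read:
#   @pipe toy |> filter(keep, _) |> select(_, :valuenum)
keep(row) = row.stay_id == 1 && row.itemid == 220045
hr = select(filter(keep, toy), :valuenum)
```

The @pipe macro only substitutes the left-hand value for each underscore, so the nested call and the piped form should give the same result.
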
In [10]:
# be patient: time-to-first-plot is long!
x = chartevents_subset[!, :charttime]
y = chartevents_subset[!, :valuenum]
df = (; x, y)
plt = data(df) *
    mapping(:x, [:y] .=> "heart rate") *
    visual(Scatter)
draw(plt)
Out[10]:

Target cohort (from R session)

Let's continue with the task we did in R. We aim to develop a predictive model that estimates the chance of dying within 30 days of ICU stay intime, based on the following baseline features:

  • first_careunit
  • age at intime
  • gender
  • ethnicity
  • first measurement of the following vitals since ICU stay intime
    • 220045 for heart rate
    • 223761 for Temperature Fahrenheit

We restrict to the first ICU stays of each unique patient.

Wrangling and merging data frames

Our strategy is:

  1. Identify and keep the first ICU stay of each patient.

  2. Identify and keep the first vital measurements during the first ICU stay of each patient.

  3. Join four data frames into a single data frame.

Important data wrangling concepts: group_by, sort, slice, joins, and pivot.

Step 1: restrict to the first ICU stay of each patient

icustays_tbl has 76,540 rows, which are reduced to 53,150 first ICU stays, one per unique patient.

In [11]:
icustays_tbl_1ststay = @pipe icustays_tbl |>
    sort(_, [:subject_id, :intime]) |>
    unique(_, :subject_id)
Out[11]:

53,150 rows × 8 columns

subject_idhadm_idstay_idfirst_careunitlast_careunitintimeouttimelos
Int64Int64Int64StringStringDateTimeDateTimeFloat64
1100000322907903439553978Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2180-07-23T14:00:002180-07-23T23:50:470.410266
2100009802691386539765666Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2189-06-27T08:42:002189-06-27T20:38:270.497535
3100012172459701837067082Surgical Intensive Care Unit (SICU)Surgical Intensive Care Unit (SICU)2157-11-20T19:18:022157-11-21T22:08:001.11803
4100017252556303131205490Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2110-04-11T15:52:222110-04-12T23:59:561.33859
5100018842618483437510196Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2131-01-11T04:20:052131-01-20T08:27:309.17182
6100020132358154139060235Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2160-05-18T10:00:532160-05-19T17:33:331.31435
7100021552382239533685454Coronary Care Unit (CCU)Coronary Care Unit (CCU)2129-08-04T12:45:002129-08-10T17:02:386.17891
8100022232249457039638202Trauma SICU (TSICU)Trauma SICU (TSICU)2158-01-15T08:01:492158-01-16T15:19:241.30388
9100023482272546032610785Neuro IntermediateNeuro Intermediate2112-11-30T23:24:002112-12-10T18:25:139.79251
10100024282866222533987268Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2156-04-12T16:24:182156-04-17T15:57:084.98113
11100024302629531838392119Coronary Care Unit (CCU)Coronary Care Unit (CCU)2129-06-13T00:43:082129-06-15T22:51:402.92259
12100024432132902135044219Coronary Care Unit (CCU)Coronary Care Unit (CCU)2183-10-18T00:47:002183-10-20T18:48:032.75073
13100024952498242636753294Coronary Care Unit (CCU)Coronary Care Unit (CCU)2141-05-22T20:18:012141-05-27T22:24:025.08751
14100025272911269637121704Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2136-03-24T10:24:082136-03-25T20:55:361.43852
15100026182219206435080100Trauma SICU (TSICU)Trauma SICU (TSICU)2173-12-03T09:30:002173-12-03T18:17:110.3661
16100027602809481331831386Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2141-04-20T13:20:462141-04-21T14:26:491.04587
17100029302569664437049133Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2196-04-14T13:40:002196-04-15T16:54:441.13523
18100030192277435930676350Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2175-10-08T18:58:002175-10-09T11:59:160.709213
19100030462604842935514836Trauma SICU (TSICU)Trauma SICU (TSICU)2154-01-02T15:57:152154-01-04T15:19:561.97409
20100034002021499432128372Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2137-02-25T23:37:192137-03-10T21:29:3612.9113
21100035022901126935796366Coronary Care Unit (CCU)Coronary Care Unit (CCU)2169-08-26T21:30:322169-08-27T22:27:211.03946
22100037002862383730600691Trauma SICU (TSICU)Trauma SICU (TSICU)2165-04-24T05:43:002165-04-24T09:13:200.146065
23100042352418135434100191Coronary Care Unit (CCU)Medical Intensive Care Unit (MICU)2196-02-24T17:07:002196-02-29T15:58:024.95211
24100044012998860132773003Medical Intensive Care Unit (MICU)Trauma SICU (TSICU)2144-01-26T22:28:042144-02-06T13:44:1510.6362
25100044222125540032155744Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2111-01-17T09:44:502111-01-23T18:18:466.3569
26100044572325135231494479Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2141-12-17T10:24:252141-12-18T14:16:171.16102
27100046062924215130213599Neuro Surgical Intensive Care Unit (Neuro SICU)Neuro Surgical Intensive Care Unit (Neuro SICU)2159-02-20T16:10:032159-02-25T20:09:145.1661
28100047202208155035009126Surgical Intensive Care Unit (SICU)Medical Intensive Care Unit (MICU)2186-11-12T19:55:002186-11-17T21:15:555.05619
29100047332741187639635619Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2174-12-04T11:28:242174-12-12T20:03:018.35737
30100047642481756332104791Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2168-04-12T09:34:522168-04-14T21:19:232.48925
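
The sort-then-unique idiom used in Step 1 relies on unique keeping the first row it encounters for each key. A minimal sketch on a made-up table (toy_stays and its values are hypothetical, and DataFrames.jl is assumed to be installed):

```julia
using DataFrames, Dates

# Two patients with two ICU stays each, deliberately out of order.
toy_stays = DataFrame(
    subject_id = [2, 1, 1, 2],
    stay_id    = [20, 11, 10, 21],
    intime     = DateTime.([2150, 2141, 2140, 2151], 1, 1),
)

# Sort by patient and admission time, then keep the first row per patient.
sorted_stays = sort(toy_stays, [:subject_id, :intime])
first_stays = unique(sorted_stays, :subject_id)
```

Here first_stays should contain stays 10 and 20, the earliest stay of each toy patient.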

Step 2: restrict to the first vital measurements during the ICU stay

Key data wrangling concepts: select, left_join, right_join, group_by, arrange, pivot.

In [12]:
@time chartevents_tbl_1ststay = @pipe chartevents_tbl |>
    # pull in the intime/outtime of each ICU stay
    rightjoin(_, select(icustays_tbl_1ststay, :stay_id, :intime, :outtime), on = :stay_id) |> 
    # only keep measurements charted during this ICU stay (between intime and outtime)
    filter(row -> ismissing(row.charttime) ? false : (row.charttime >= row.intime && row.charttime <= row.outtime), _) |>
    # only keep the first charttime for each stay_id x item
    sort(_, [:stay_id, :itemid, :charttime]) |>
    unique(_, [:stay_id, :itemid]) |>
    # do not need charttime, intime and outtime anymore
    select(_, Not([:charttime, :intime, :outtime])) |>
    # pivot_wider (R) or unstack (Julia)
    unstack(_, [:subject_id, :hadm_id, :stay_id], :itemid, :valuenum) |>
    # more informative column names
    rename(_, Dict(
        "220045" => "heart_rate", 
        "223761" => "temp_f",
        ))
  8.693733 seconds (96.13 M allocations: 4.301 GiB, 12.29% gc time, 58.83% compilation time)
Out[12]:

53,135 rows × 5 columns

 Row │ subject_id  hadm_id   stay_id   heart_rate  temp_f
     │ Int64?      Int64?    Int64     Float64?    Float64?
─────┼──────────────────────────────────────────────────────
   1 │ 12466550    23998182  30000153  104.0       99.1
   2 │ 12207593    22795209  30000646  100.0       98.8
   3 │ 12980335    23552849  30001148  80.0        95.6
   4 │ 12168737    29283664  30001336  65.0        98.5
   5 │ 17371178    24502166  30001396  86.0        98.8
   6 │ 16513856    24463832  30001446  82.0        98.1
   7 │ 19609454    24188515  30001656  99.0        98.9
   8 │ 15904173    23836605  30001947  105.0       97.9
   9 │ 17921898    28841024  30002415  80.0        97.6
  10 │ 17938576    20818145  30002498  81.0        97.8
  11 │ 14311522    24622512  30002548  80.0        98.5
  12 │ 10208468    25796414  30002925  70.0        98.6
  13 │ 10682002    20035892  30003087  74.0        98.4
  14 │ 16165135    24791729  30003125  95.0        99.2
  15 │ 11423795    20012928  30003226  89.0        99.0
  16 │ 14895375    24753602  30003275  88.0        98.0
  17 │ 11206784    22308094  30003372  106.0       98.4
  18 │ 15332791    20683754  30003598  75.0        96.9
  19 │ 11307058    25946296  30003729  70.0        98.4
  20 │ 18300445    26541280  30003746  103.0       98.9
  21 │ 12227720    29396704  30003749  67.0        missing
  22 │ 10369174    24697158  30004144  65.0        97.5
  23 │ 17220323    25700666  30004242  60.0        98.1
  24 │ 17580058    25858979  30004306  115.0       98.5
  25 │ 14335301    27088506  30004462  100.0       95.2
  26 │ 12509799    25897223  30004530  68.0        97.6
  27 │ 11553072    24760680  30004568  64.0        97.3
  28 │ 19272232    28173870  30004576  91.0        97.9
  29 │ 12844527    27959182  30004627  81.0        97.9
  30 │ 12098571    20553314  30004798  82.0        98.5
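
The pivot at the end of Step 2 turns one row per (stay, item) into one row per stay with one column per item. The sketch below shows this on a made-up long table (the name long and its values are hypothetical; DataFrames.jl is assumed to be installed).

```julia
using DataFrames

# Long format: one row per stay and vital sign. Stay 2 has no temperature reading.
long = DataFrame(
    stay_id  = [1, 1, 2],
    itemid   = [220045, 223761, 220045],
    valuenum = [65.0, 97.6, 136.0],
)

# unstack(df, rowkeys, colkey, value): rowkeys identify a row of the wide table,
# colkey supplies the new column names, value supplies the cell contents.
wide = unstack(long, [:stay_id], :itemid, :valuenum)

# The new column names are the itemid values converted to strings.
rename!(wide, "220045" => "heart_rate", "223761" => "temp_f")
```

Combinations that never occur in the long table, such as the temperature of stay 2 here, become missing in the wide table. This is why heart_rate and temp_f in the output above have element type Float64?.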

Step 3: merge data frames

New data wrangling concept: mutate.

In [13]:
@time mimic_icu_cohort = @pipe icustays_tbl_1ststay |>
    # merge data frames
    leftjoin(_, admissions_tbl, on = [:subject_id, :hadm_id]) |>
    leftjoin(_, patients_tbl, on = [:subject_id]) |>
    leftjoin(_, chartevents_tbl_1ststay, on = [:stay_id, :subject_id, :hadm_id]) |>
    # age_intime is the age at ICU stay intime
    insertcols!(_, :age_intime => _.anchor_age .+ year.(_.intime) .- _.anchor_year) |>
    # whether the patient died within 30 days of ICU stay intime
    insertcols!(_, :hadm_to_death => _.deathtime .- _.intime) |>
    insertcols!(_, :thirty_day_mort => _.hadm_to_death .≤ Millisecond(2592000000))
# a missing thirty_day_mort means the patient did not die; recode it as false
replace!(mimic_icu_cohort.thirty_day_mort, missing => false)
mimic_icu_cohort
  6.223667 seconds (9.67 M allocations: 1.919 GiB, 3.31% gc time, 67.03% compilation time)
Out[13]:

53,150 rows × 31 columns

subject_idhadm_idstay_idfirst_careunitlast_careunitintimeouttimelosadmittimedischtimedeathtimeadmission_typeadmission_locationdischarge_locationinsurancelanguagemarital_statusethnicityedregtimeedouttimehospital_expire_flaggenderanchor_ageanchor_yearanchor_year_groupdodheart_ratetemp_fage_intimehadm_to_deaththirty_day_mort
Int64Int64Int64StringStringDateTimeDateTimeFloat64DateTime?DateTime?DateTime?String31?String?String31?String15?String7?String15?String31?DateTime?DateTime?Int64?String1?Int64?Int64?String15?Date?Float64?Float64?Int64Millisec…?Bool?
1100189282252375135050109Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2125-02-27T10:42:002125-02-28T15:44:001.209722125-02-27T08:58:002125-03-10T17:00:00missingEW EMER.EMERGENCY ROOMHOMEOtherENGLISHSINGLEBLACK/AFRICAN AMERICAN2125-02-27T07:50:002125-02-27T10:42:000F3121252008 - 2010missing90.099.231missing0
2100765432451925433461770Trauma SICU (TSICU)Trauma SICU (TSICU)2187-03-21T13:55:102187-03-22T16:18:501.099772187-03-14T19:25:002187-03-28T18:00:00missingEW EMER.EMERGENCY ROOMREHABMedicareENGLISHWIDOWEDWHITE2187-03-14T16:55:002187-03-14T20:20:000F7821872008 - 2010missing58.096.378missing0
3100984282488650633389745Trauma SICU (TSICU)Trauma SICU (TSICU)2119-05-12T04:21:002119-05-15T16:38:193.512032119-05-12T03:28:002119-05-17T14:15:00missingEW EMER.EMERGENCY ROOMSKILLED NURSING FACILITYOther?SINGLEHISPANIC/LATINO2119-05-11T23:51:002119-05-12T04:21:000F8521192008 - 2010missing107.099.485missing0
4101271852792058331735272Coronary Care Unit (CCU)Coronary Care Unit (CCU)2141-10-31T10:41:522141-10-31T15:11:450.1874192141-10-31T10:41:002141-11-05T14:45:00missingURGENTTRANSFER FROM HOSPITALHOME HEALTH CAREOtherENGLISHMARRIEDUNABLE TO OBTAINmissingmissing0M6021412008 - 2010missing58.097.060missing0
5101487102244444338369458Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2148-01-27T11:18:142148-02-02T17:38:426.264212148-01-27T00:00:002148-02-11T17:25:00missingSURGICAL SAME DAY ADMISSIONPHYSICIAN REFERRALSKILLED NURSING FACILITYMedicareENGLISHDIVORCEDWHITEmissingmissing0M6721372008 - 2010missing80.098.778missing0
6101564862519430730065290Trauma SICU (TSICU)Trauma SICU (TSICU)2124-04-06T21:29:322124-04-09T17:40:472.841152124-04-06T21:28:002124-04-18T16:18:00missingURGENTTRANSFER FROM HOSPITALACUTE HOSPITALOtherENGLISHMARRIEDWHITEmissingmissing0F7521242017 - 2019missing97.098.075missing0
7101595852511826138341580Coronary Care Unit (CCU)Coronary Care Unit (CCU)2154-04-09T13:52:002154-04-16T19:18:167.226572154-04-09T10:31:002154-04-16T13:49:002154-04-16T13:49:00EW EMER.EMERGENCY ROOMDIEDMedicareENGLISHMARRIEDBLACK/AFRICAN AMERICAN2154-04-09T08:34:002154-04-09T13:52:001M5921462008 - 20102154-04-1695.097.867604620000 milliseconds1
8101715252126349534714152Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2115-12-03T22:54:002115-12-05T00:51:091.081352115-12-03T21:07:002115-12-13T17:30:00missingURGENTTRANSFER FROM HOSPITALPSYCH FACILITYMedicaidENGLISHmissingUNKNOWN2115-12-03T20:05:002115-12-03T22:54:000F2721152014 - 2016missing113.098.127missing0
9101963602779092432506152Coronary Care Unit (CCU)Coronary Care Unit (CCU)2122-04-10T21:36:542122-04-12T15:51:141.759952122-04-10T14:45:002122-04-12T15:25:00missingSURGICAL SAME DAY ADMISSIONPHYSICIAN REFERRALHOMEOtherENGLISHMARRIEDWHITEmissingmissing0M5721182011 - 2013missing94.097.761missing0
10102278232036651433191549Trauma SICU (TSICU)Trauma SICU (TSICU)2156-02-20T03:31:002156-02-20T18:37:180.6293752156-02-20T02:28:002156-02-22T22:30:00missingEW EMER.EMERGENCY ROOMHOMEOtherENGLISHmissingWHITE2156-02-19T23:45:002156-02-20T03:31:000M4521562014 - 2016missing73.098.545missing0
11103188162282739834338479Neuro StepdownNeuro Stepdown2184-08-12T06:31:002184-08-13T15:52:401.390052184-08-12T05:39:002184-08-13T15:00:00missingEW EMER.TRANSFER FROM HOSPITALHOMEOtherENGLISHSINGLEBLACK/AFRICAN AMERICAN2184-08-12T00:40:002184-08-12T06:31:000F2921842017 - 2019missing88.097.729missing0
12103355462289641733604820Neuro StepdownNeuro Stepdown2128-06-06T16:35:442128-06-08T22:20:412.239552128-06-05T23:04:002128-06-13T13:38:00missingURGENTTRANSFER FROM HOSPITALREHABOtherENGLISHSINGLEWHITEmissingmissing0M5021282014 - 2016missing93.098.550missing0
13103907322627214939439439Surgical Intensive Care Unit (SICU)Cardiac Vascular Intensive Care Unit (CVICU)2143-07-29T10:55:072143-07-31T12:08:472.051162143-07-26T19:43:002143-08-25T13:30:00missingURGENTTRANSFER FROM HOSPITALCHRONIC/LONG TERM ACUTE CAREOtherENGLISHmissingOTHERmissingmissing0M4821432011 - 2013missing69.096.348missing0
14104426032364464031663173Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2125-02-25T15:33:432125-02-26T18:13:131.110762125-02-25T15:33:002125-02-26T15:05:002125-02-26T15:05:00URGENTTRANSFER FROM HOSPITALDIEDMedicareENGLISHmissingUNKNOWNmissingmissing1M6721252008 - 20102125-02-26111.096.56784677000 milliseconds1
15104718372611852739287232Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2168-10-31T00:18:002168-11-03T17:29:443.716482168-10-30T22:57:002168-11-03T17:00:00missingEW EMER.EMERGENCY ROOMHOSPICEMedicare?MARRIEDASIAN2168-10-30T13:06:002168-10-31T00:18:000M7321682008 - 2010missing82.098.373missing0
16105342452232052632333965Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2155-12-06T02:32:002155-12-15T14:33:039.500732155-12-06T00:35:002156-02-06T17:05:00missingEW EMER.EMERGENCY ROOMHOME HEALTH CAREOtherENGLISHSINGLEBLACK/AFRICAN AMERICAN2155-12-05T22:38:002155-12-06T02:32:000M4321552014 - 2016missing100.036.543missing0
17105457472502040934649276Coronary Care Unit (CCU)Coronary Care Unit (CCU)2152-09-17T23:48:262152-09-27T17:46:139.748462152-09-17T23:47:002152-09-27T15:55:002152-09-27T15:55:00URGENTTRANSFER FROM HOSPITALDIEDMedicareENGLISHMARRIEDWHITEmissingmissing1M7421522014 - 20162152-09-27100.096.574835594000 milliseconds1
18106434342805591237011708Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2177-04-22T11:32:222177-04-27T12:51:015.054622177-04-22T08:00:002177-04-29T17:00:00missingSURGICAL SAME DAY ADMISSIONPHYSICIAN REFERRALREHABMedicareENGLISHMARRIEDWHITEmissingmissing0M6321752008 - 2010missing61.095.265missing0
19106516162263662835413275Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2182-03-16T10:29:032182-03-17T14:54:311.184352182-03-11T16:33:002182-03-23T16:53:00missingURGENTTRANSFER FROM HOSPITALSKILLED NURSING FACILITYOtherENGLISHSINGLEWHITEmissingmissing0M6721822014 - 2016missing80.097.567missing0
20107001302735455735513209Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2198-09-09T01:35:092198-09-09T13:34:200.4994332198-09-08T20:17:002198-09-14T17:00:00missingEW EMER.EMERGENCY ROOMHOME HEALTH CAREMedicareENGLISHWIDOWEDBLACK/AFRICAN AMERICAN2198-09-08T15:52:002198-09-08T22:25:000F9121982008 - 2010missing87.098.891missing0
21107225312022368130357477Surgical Intensive Care Unit (SICU)Surgical Intensive Care Unit (SICU)2161-04-15T14:29:032161-04-19T00:44:403.427512161-04-15T07:15:002161-04-19T14:20:00missingURGENTPHYSICIAN REFERRALHOMEOtherENGLISHMARRIEDWHITEmissingmissing0F5921612014 - 2016missing93.098.059missing0
22107859482091820030484339Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2122-12-10T11:26:102122-12-13T23:31:313.503722122-12-10T04:22:002122-12-16T16:20:00missingSURGICAL SAME DAY ADMISSIONPHYSICIAN REFERRALHOME HEALTH CAREOtherENGLISHDIVORCEDWHITEmissingmissing0F7721222017 - 2019missing80.097.377missing0
23108992252339783933578147Trauma SICU (TSICU)Trauma SICU (TSICU)2146-10-06T13:42:002146-10-08T14:58:022.05282146-10-06T12:27:002146-10-12T13:36:00missingEW EMER.EMERGENCY ROOMSKILLED NURSING FACILITYOtherENGLISHMARRIEDWHITE2146-10-06T10:53:002146-10-06T13:42:000M7021462011 - 2013missing51.098.070missing0
24109156972202745332843393Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2162-04-29T03:37:002162-04-29T21:34:180.7481252162-04-29T02:31:002162-05-07T13:35:00missingEW EMER.EMERGENCY ROOMSKILLED NURSING FACILITYMedicareENGLISHMARRIEDWHITE2162-04-29T01:25:002162-04-29T03:37:000M8621612011 - 2013missing124.098.987missing0
25109774232719789437017406Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2147-02-12T18:19:002147-02-13T14:06:010.8243172147-02-12T17:09:002147-02-14T14:22:00missingEW EMER.EMERGENCY ROOMHOMEOtherENGLISHMARRIEDWHITE2147-02-12T13:16:002147-02-12T18:19:000M5621452008 - 2010missing89.099.158missing0
26110095452024264137755726Cardiac Vascular Intensive Care Unit (CVICU)Cardiac Vascular Intensive Care Unit (CVICU)2150-10-21T21:36:222150-10-25T20:27:063.95192150-10-21T19:19:002150-10-26T18:00:00missingURGENTTRANSFER FROM HOSPITALHOME HEALTH CAREMedicareENGLISHSINGLEHISPANIC/LATINOmissingmissing0M7221502011 - 2013missing60.098.572missing0
27110574642346755033868624Medical Intensive Care Unit (MICU)Medical Intensive Care Unit (MICU)2165-03-24T01:56:002165-03-28T18:40:114.697352165-03-24T00:22:002165-04-05T17:00:00missingEW EMER.EMERGENCY ROOMSKILLED NURSING FACILITYMedicaidENGLISHSINGLEWHITE2165-03-23T21:22:002165-03-24T01:56:000F5721652011 - 2013missing94.097.957missing0
28110627552340726338273332Surgical Intensive Care Unit (SICU)Surgical Intensive Care Unit (SICU)2174-10-25T23:45:002174-10-29T20:39:233.87112174-10-25T22:36:002174-11-06T13:10:00missingEW EMER.EMERGENCY ROOMHOME HEALTH CAREMedicareENGLISHMARRIEDWHITE2174-10-25T20:05:002174-10-25T23:45:000M7621742008 - 2010missing76.098.676missing0
29110836552013717738699030Medical/Surgical Intensive Care Unit (MICU/SICU)Medical/Surgical Intensive Care Unit (MICU/SICU)2181-04-09T22:30:002181-04-11T04:58:311.26982181-04-09T20:34:002181-04-11T01:05:002181-04-11T01:05:00EW EMER.EMERGENCY ROOMDIEDMedicareENGLISHWIDOWEDBLACK/AFRICAN AMERICAN2181-04-09T17:10:002181-04-09T22:30:001M7521802008 - 20102181-04-1185.097.27695700000 milliseconds1
30110967682742264330520948Surgical Intensive Care Unit (SICU)Surgical Intensive Care Unit (SICU)2149-01-01T00:22:292149-01-06T14:32:555.590582149-01-01T00:20:002149-01-10T11:10:00missingEW EMER.PACUHOMEMedicareENGLISHSINGLEWHITEmissingmissing0M4421492017 - 2019missing119.098.544missing0
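
The two derived columns can be checked by hand with the Dates standard library. The sketch below re-computes age_intime for row 5 and hadm_to_death for row 7 of the output above, and shows why the constant 2592000000 corresponds to 30 days. The variable names are chosen for this illustration only.

```julia
using Dates

# Row 5 of the output: anchor_age 67, anchor_year 2137, intime in 2148.
anchor_age, anchor_year = 67, 2137
intime = DateTime(2148, 1, 27, 11, 18, 14)
age_intime = anchor_age + year(intime) - anchor_year

# Row 7 of the output: subtracting two DateTime values gives a Millisecond period.
to_death = DateTime(2154, 4, 16, 13, 49) - DateTime(2154, 4, 9, 13, 52)

# Thirty days expressed in the same unit.
thirty_days = convert(Millisecond, Day(30))

# A comparison involving missing returns missing; coalesce maps it to false.
gaps = [to_death, missing]
died_30d = coalesce.(gaps .<= thirty_days, false)
```

Using coalesce while building the column is one alternative to the replace! call in the cell above. Unlike replace!, which keeps the column's Bool? element type, it returns a plain Bool vector.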

Data visualization

It is always a good idea to visualize data as much as possible before any statistical analysis.

Remember we want to model:

thirty_day_mort ~ first_careunit + age_intime + gender + ethnicity + heart_rate + temp_f

Let's start with a numerical summary of variables of interest.

In [14]:
@pipe mimic_icu_cohort |>
    select(_, [
        :first_careunit, 
        :gender, 
        :ethnicity, 
        :age_intime, 
        :heart_rate, 
        :temp_f, 
        :thirty_day_mort
        ]) |> 
    describe(_)
Out[14]:

7 rows × 7 columns

 Row │ variable         mean       min                                           median  max                  nmissing  eltype
     │ Symbol           Union…     Any                                           Union…  Any                  Int64     Type
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ first_careunit              Cardiac Vascular Intensive Care Unit (CVICU)          Trauma SICU (TSICU)  0         String
   2 │ gender                      F                                                     M                    0         Union{Missing, String1}
   3 │ ethnicity                   AMERICAN INDIAN/ALASKA NATIVE                         WHITE                0         Union{Missing, String31}
   4 │ age_intime       64.4705    18                                            66.0    102                  0         Int64
   5 │ heart_rate       87.4667    0.0                                           85.0    941.0                15        Union{Missing, Float64}
   6 │ temp_f           98.0343    0.0                                           98.1    106.0                954       Union{Missing, Float64}
   7 │ thirty_day_mort  0.0996802  0                                             0.0     1                    0         Union{Missing, Bool}
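
The quantities that describe reports can also be computed directly, which helps when following up on suspicious entries such as the minimum of 0.0 for both vitals in the summary above. A minimal sketch using only the Statistics standard library, on a made-up vector standing in for a column like temp_f:

```julia
using Statistics

# Invented readings with one missing value.
temp = [98.6, missing, 97.9, 99.1]

# Number of missing entries, as in the nmissing column.
n_missing = count(ismissing, temp)

# skipmissing drops the missing entries before summarizing.
avg = mean(skipmissing(temp))
lo, hi = extrema(skipmissing(temp))
```

Calling mean(temp) without skipmissing would return missing, since missing propagates through arithmetic.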

Univariate summaries

Bar plot of first_careunit.

In [15]:
@pipe mimic_icu_cohort |> 
    groupby(_, :first_careunit) |> 
    combine(_, nrow) |>
    barplot(
        _.first_careunit.refs, 
        _.nrow,
        axis = (xticks = (1:size(_, 1), _.first_careunit), title = "First Care Unit", xticklabelrotation = 45.0)
)
Out[15]:
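
The counts behind the bar plot come from the groupby and combine steps. The sketch below isolates that part on a made-up column (toy and its unit labels are hypothetical; DataFrames.jl is assumed to be installed).

```julia
using DataFrames

# Invented care-unit labels.
toy = DataFrame(first_careunit = ["MICU", "CCU", "MICU", "TSICU", "MICU"])

# One row per group; passing nrow to combine adds a column named nrow.
counts = combine(groupby(toy, :first_careunit), nrow)

# Collect the result into a Dict so it can be inspected independently of group order.
count_by_unit = Dict(counts.first_careunit .=> counts.nrow)
```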

Bivariate summaries

Tally of thirty_day_mort vs first_careunit.

In [16]:
@pipe mimic_icu_cohort |> 
    groupby(_, [:first_careunit, :thirty_day_mort]) |> 
    combine(_, nrow) |>
    disallowmissing(_, :thirty_day_mort) |>
    barplot(
        _.first_careunit.refs, 
        _.nrow, 
        stack = _.thirty_day_mort,
        color = _.thirty_day_mort,
        axis = (xticks = (1:size(_, 1), _.first_careunit), title = "First Care Unit", xticklabelrotation = 45.0)
    )
Out[16]:
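
A two-way tally uses the same pattern with two grouping columns, and the mean of a Bool column gives the proportion of true values per group. A minimal sketch on made-up data (toy and its values are hypothetical; DataFrames.jl is assumed to be installed):

```julia
using DataFrames, Statistics

# Invented outcomes: 1 death among 4 MICU stays, 1 death among 2 CCU stays.
toy = DataFrame(
    first_careunit  = ["MICU", "MICU", "MICU", "MICU", "CCU", "CCU"],
    thirty_day_mort = [true, false, false, false, true, false],
)

# Count rows for every (unit, outcome) combination that occurs.
tally = combine(groupby(toy, [:first_careunit, :thirty_day_mort]), nrow)

# Mortality proportion per unit.
rates = combine(groupby(toy, :first_careunit), :thirty_day_mort => mean => :mort_rate)
rate_by_unit = Dict(rates.first_careunit .=> rates.mort_rate)
```

Comparing proportions rather than raw counts makes units of very different sizes easier to compare.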

Pros and Cons of Julia

Pros

  • Julia solves the notorious two-language problem in scientific computing: it combines the functionality and ease of use of Python, R, Matlab, SAS, and Stata with the speed of C/C++ and Java. News: Julia Joins Petaflop Club.

  • As a new language, Julia integrates well with modern hardware (GPUs, parallel and distributed computing).

  • Excels in domains such as differential equations, automatic differentiation, and optimization.

  • Interoperability with other languages (Python, R, Matlab, C, C++, Fortran).

Cons

  • Smaller ecosystem? Not anymore. On the contrary, some ecosystems (e.g., plotting, automatic differentiation, deep learning) are so rich that users may find it confusing to choose among them.

  • Smaller user base, compared to Python and R.

  • Lack of IDEs as feature-rich as RStudio.

  • Compilation time of some packages (Plots.jl, LoopVectorization.jl, etc.) can be long. This is the time-to-first-plot issue.