Dealing with massive lab reports
This is roughly the opposite problem from handling large lab reports or other data that has been broken into smaller parts (see continuation messages). Here the concern is the performance cost of handling a really massive amount of data within a single message.
The hl7.parse{} routine is in essence a 'DOM' style parser. A DOM style parse takes the entire tree of a message and loads it into memory all at once. It’s very convenient to deal with data in this manner, since a DOM style parse gives complete random access to any element of data in the message.
Where it runs into problems is when the amount of data becomes very large. This can often happen when you have very large lab results or binary data. These can consist of reams and reams of OBX segments which all have a very similar structure.
For this type of problem it may be desirable to switch to more of a SAX style parse. That is, we start at the first OBX segment and then move through the segments sequentially, processing them one by one and consuming the data as we go.
For the rest of the message a DOM style parse is still quite likely to be useful; it’s a matter of chopping out the large repetitive part of the message first.
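As a rough illustration of the sequential idea, here is a minimal sketch in plain Lua. It assumes segments are separated by carriage returns and that the report text sits in OBX-5; `ForEachObx` and its callback are hypothetical names used for illustration, not part of any Iguana API.

```lua
-- Sketch: visit each OBX segment in order without building a parse tree.
-- Assumption: segments are separated by '\r' and the text lives in OBX-5.
-- ForEachObx is a hypothetical helper, not an Iguana function.
function ForEachObx(Data, Callback)
   local Count = 0
   -- '[^\r]+' yields one whole segment per iteration
   for Segment in Data:gmatch('[^\r]+') do
      if Segment:sub(1, 4) == 'OBX|' then
         -- Capture OBX-1 (set ID) and OBX-5 (observation value)
         local SetId, Text =
            Segment:match('^OBX|(%d*)|[^|]*|[^|]*|[^|]*|([^|]*)')
         Count = Count + 1
         Callback(SetId, Text)
      end
   end
   return Count
end

-- Usage: collect the text of each OBX as it streams past
local Sample = 'MSH|^~\\&|RESULT\rOBX|1|TX|||First line\r' ..
               'OBX|2|TX|||Second line\rOBR|||X\r'
local Lines = {}
local N = ForEachObx(Sample, function(SetId, Text)
   Lines[#Lines + 1] = Text
end)
print(N, table.concat(Lines, ' / '))
```

The callback sees one segment at a time, so in principle each line can be written out or discarded as soon as it has been handled rather than being held in a tree.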
This can give you the best of both worlds:
- DOM style parse for the field level data for convenient clean APIs
- SAX style parse for the body of the lab report for speed
SAX parsing is one possibility. Another cool trick is to leverage Lua’s fast global pattern substitution (string.gsub), for which we give an example below.
If you are interested in seeing a fully worked example, let us know and we can put it into the wiki here.
Example lab message with a large textual component
Say you have a message like this:
MSH|^~\&|RESULT|CHRS:CLA:CFC^CHRS:CLA^CFC|||20100819141949||ORU^R01|ORU-R01-20100819141949830|^PK.E|2.3|4.3.2. MOBILIZER_4_3_2_2-20090417_0946 094605 CLA1
PID|||dsf^^^CHRS:CLA^BACKENDID^^dfsd:201008180900 HPL^true|^^^^MRN^^^true||||||||||||||MF00380390:MF286640:20100817.1015:IN:AF0000676886^^^CHRS:CLA:CFC^BACKENDID^^MF286640:961424734201008181BENROR1JONGAP1^true
PV1
OBR|||R:758481^CHRS:CLA:CFC^^^z7-3961495615SignedRADCFCRADLEMMONS,GARLAND JR:19320417:MF00380390:|Radiology:RADIOLOGY REPORT^RADIOLOGY REPORT^dddf:CLA:CFC^Radiology|||20100817165600||||||||||||||||||Signed^^sdfds:CLA:CFC
OBX|1|TX|||Big honking report...
OBX|2|TX|||Big honking report...
OBX|3|TX|||Big honking report...
OBX|4|TX|||Big honking report...
OBX|5|TX|||Big honking report...
OBX|6|TX|||Big honking report...
:
OBX|93634|TX|||Order: 0817-0137
OBX|93635|TX||| Procedure: CHEST 2 VIEWS
OBX|93636|TX||| Ordered by: dsfd,34sdfs M.D.
OBX|93637|TX|||
OBX|93638|TX|||Dictated: esd,gdfdf S M.D. Aug 17, 2010 4:56pm
OBX|93639|TX|||Transcribed: sdf S rdfdf Aug 17, 2010 5:02pm
OBX|93640|TX|||Signed: erwss,ersds S M.D. Aug 19, 2010 10:06am
OBX|93641|TX|||
OBX|93642|TX|||SECTION: 1 Final
OBX|93643|TX|||TWO VIEW CHEST DATED 08-17-2010
OBX|93644|TX|||CLINICAL HISTORY OF PRE OPERATIVE FOR CORONARY ARTERY BY-PASS GRAFT.
OBX|93645|TX|||THERE IS NO PRIOR EXAMINATION AVAILABLE FOR COMPARISON.
OBX|93646|TX|||FINDINGS:
OBX|93647|TX|||THERE ARE SOME CALCIFIED DENSITIES SEEN OVERLYING BOTH HEMITHORACES. SOME OF THESE MAY
OBX|93648|TX|||REPRESENT CALCIFIED GRANULOMAS. OTHER AREAS MAY REPRESENT SOME CALCIFIED PLAQUES. NO
OBX|93649|TX|||INFILTRATE IS SEEN. HEART IS NORMAL IN APPEARANCE. AORTA IS ECTATIC. PULMONARY
OBX|93650|TX|||VASCULARITY IS UNREMARKABLE. BONES AND JOINTS SHOW NO ACUTE ABNORMALITY.
OBX|93651|TX|||IMPRESSION:
OBX|93652|TX|||MULTIPLE CALCIFIED DENSITIES SEEN OVERLYING BOTH LUNGS, POSSIBLY REPRESENTING PREVIOUS
OBX|93653|TX|||GRANULOMATOUS DISEASE EXPOSURE. SOME AREAS MAY REPRESENT CALCIFIED PLAQUES, AS WELL.
OBR|||R:758481^CHRS:CLA:CFC^^^z7-dfsdfds,dsfd JR:19320417:MF00380390:|Radiology:RADIOLOGY REPORT^RADIOLOGY REPORT^CHRS:CLA:CFC^Radiology|||20100817165600||||||||||||||||||Signed^^CHRS:CLA:CF
Then we can write code which follows this algorithm:
- Find the data before the first OBX segment.
- Find the data after the last OBX segment.
- Concatenate the two to make a very small message which can be parsed using hl7.parse{}.
- Take all the OBX data and use Lua’s global pattern replacing to convert it into just the text.
This turns out to be a very fast algorithm since Lua’s pattern replacing is extremely fast and we avoid the overhead of creating an entire tree of OBX segments. Here’s the code:
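function main(Data)
   -- Split off the OBX block; Data now holds only the small remainder.
   local Data, LabReport = ObxChop(Data)
   -- Replace the prefix of every OBX segment with a newline, leaving the text.
   LabReport = LabReport:gsub('OBX|%d+|TX|||', '\n')
   -- Drop the leading newline produced by the first substitution.
   LabReport = LabReport:sub(2, #LabReport)
   local Msg = hl7.parse{vmd='example/demo.vmd', data=Data}
   print(Msg.PATIENT.PID[18][1]:nodeValue())
   print(LabReport)
end

function ObxChop(Data)
   -- Position of the '\r' that precedes the first OBX segment.
   local First = Data:find('OBX')-1
   -- Scan backwards for the start of the last OBX segment.
   local LastObx
   for i=#Data, 1, -1 do
      if Data:sub(i,i+3) == '\rOBX' then
         LastObx = i+1
         break
      end
   end
   -- End of the last OBX segment, or end of data if it has no trailing '\r'.
   local E = Data:find('\r', LastObx) or #Data+1
   -- Return the message without its OBX segments, then the OBX block itself.
   return Data:sub(1, First-1)..Data:sub(E), Data:sub(First+1, E)
end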
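To try the chop-and-substitute idea outside Iguana, the following is a self-contained variant in plain Lua. It is only a sketch under stated assumptions: segments end in '\r', at least one segment precedes the first OBX, and every OBX has the `OBX|n|TX|||text` shape shown in the sample. `ChopObx` and `ObxToText` are hypothetical names. It differs from the code above in that it finds the last OBX with a greedy pattern instead of a loop, and strips the trailing '\r' from each line.

```lua
-- Hypothetical plain-Lua variant of the chop step (no hl7.parse needed).
-- Assumption: '\r' separates segments and the OBX segments are contiguous.
function ChopObx(Data)
   -- '\r' just before the first OBX segment (plain find, no patterns)
   local First = Data:find('\rOBX|', 1, true)
   if not First then return Data, '' end
   -- Greedy '.*' backs off from the end, so this ends at the last '\rOBX|'
   local _, LastEnd = Data:find('.*\rOBX|')
   -- '\r' that terminates the last OBX, or one past the end of the data
   local E = Data:find('\r', LastEnd, true) or #Data + 1
   local Rest = Data:sub(1, First) .. Data:sub(E + 1)
   local Obx = Data:sub(First + 1, E)
   return Rest, Obx
end

-- Turn the OBX block into plain text, one report line per OBX segment.
function ObxToText(Obx)
   -- The extra parentheses discard gsub's second result (the match count)
   return (Obx:gsub('OBX|%d+|TX|||([^\r]*)\r?', '%1\n'))
end

local Sample = 'MSH|^~\\&|RESULT\rPID|||123\r' ..
               'OBX|1|TX|||First line\rOBX|2|TX|||Second line\r' ..
               'OBR|||X\r'
local Rest, Obx = ChopObx(Sample)
print((Rest:gsub('\r', '\n')))   -- MSH, PID and OBR only
print(ObxToText(Obx))            -- the two report lines
```

In this sketch `Rest` is what would be handed to hl7.parse{}, while the text from `ObxToText` never goes through the tree parser at all.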