Working with HL7

Dealing with massive lab reports

This is roughly the opposite problem from handling large lab reports or other data that has been broken into smaller parts (see continuation messages). Here the issue is performance when a single message carries a really massive amount of data.

The hl7.parse{} routine is in essence a ‘DOM’ style parser. A DOM style parse takes the entire message and parses it into a tree in memory all at once. It’s very convenient to deal with data in this manner, since a DOM style parse gives complete random access to any element of data in the message.

Where it runs into problems is when the amount of data becomes very large.  This can often happen when you have very large lab results or binary data.  These can consist of reams and reams of OBX segments which all have a very similar structure.

For this type of problem it may be desirable to switch to more of a SAX style parse. That is, we start at the first OBX segment and move through the segments sequentially, processing them one by one and consuming the data as we go.
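
As a rough sketch of what such a sequential pass could look like in plain Lua, string.gmatch can walk the segments one at a time without building a tree. This is not an Iguana API: the helper name forEachObxText and the sample data are made up for illustration, and the sketch assumes '\r' segment terminators and OBX segments of type TX with the text in the fifth field.

```lua
-- Hypothetical helper: calls Handler(SetId, Text) for each OBX|n|TX||| segment, in order.
-- Assumes '\r' segment terminators and the report text in the fifth field.
local function forEachObxText(Data, Handler)
   local Count = 0
   -- Append a terminator so the last segment also ends in '\r'.
   for Segment in (Data..'\r'):gmatch('([^\r]*)\r') do
      -- '|' is a literal character in Lua patterns, so it needs no escaping.
      local SetId, Text = Segment:match('^OBX|(%d+)|TX|||(.*)$')
      if SetId then
         Count = Count + 1
         Handler(tonumber(SetId), Text)
      end
   end
   return Count
end

-- Usage with made-up data: collect the lines as they stream past.
local Sample = 'MSH|^~\\&|RESULT\rOBX|1|TX|||First line\rOBX|2|TX|||Second line\rOBR|||X'
local Lines = {}
local Count = forEachObxText(Sample, function(SetId, Text)
   Lines[#Lines + 1] = Text
end)
print(Count, table.concat(Lines, '\n'))
```

The handler sees one segment at a time, so memory use stays proportional to a single segment rather than to the whole report.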

For the rest of the message it is still likely to be useful to process the data using the DOM style parse. It’s a matter of chopping out the large repetitive part of the message first.

This can give you the best of both worlds:

  • DOM style parse for the field level data for convenient clean APIs
  • SAX style parse for the body of the lab report for speed

SAX parsing is one possibility. Another cool trick is to leverage Lua’s fast global pattern substitution (string.gsub), for which we give an example below.

If you are interested in seeing a fully worked example, let us know and we can put it into the wiki here.

Example lab message with a large textual component

Say you have a message like this:

MSH|^~\&|RESULT|CHRS:CLA:CFC^CHRS:CLA^CFC|||20100819141949||ORU^R01|ORU-R01-20100819141949830|^PK.E|2.3|4.3.2. MOBILIZER_4_3_2_2-20090417_0946  094605 CLA1
PID|||dsf^^^CHRS:CLA^BACKENDID^^dfsd:201008180900 HPL^true|^^^^MRN^^^true||||||||||||||MF00380390:MF286640:20100817.1015:IN:AF0000676886^^^CHRS:CLA:CFC^BACKENDID^^MF286640:961424734201008181BENROR1JONGAP1^true
PV1
OBR|||R:758481^CHRS:CLA:CFC^^^z7-3961495615SignedRADCFCRADLEMMONS,GARLAND JR:19320417:MF00380390:|Radiology:RADIOLOGY REPORT^RADIOLOGY REPORT^dddf:CLA:CFC^Radiology|||20100817165600||||||||||||||||||Signed^^sdfds:CLA:CFC
OBX|1|TX|||Big honking report...
OBX|2|TX|||Big honking report...
OBX|3|TX|||Big honking report...
OBX|4|TX|||Big honking report...
OBX|5|TX|||Big honking report...
OBX|6|TX|||Big honking report...
:
OBX|93634|TX|||Order: 0817-0137
OBX|93635|TX||| Procedure: CHEST 2 VIEWS
OBX|93636|TX||| Ordered by: dsfd,34sdfs M.D.
OBX|93637|TX||| 
OBX|93638|TX|||Dictated: esd,gdfdf S M.D.   Aug 17, 2010 4:56pm
OBX|93639|TX|||Transcribed: sdf S rdfdf   Aug 17, 2010 5:02pm
OBX|93640|TX|||Signed: erwss,ersds S M.D.   Aug 19, 2010 10:06am
OBX|93641|TX||| 
OBX|93642|TX|||SECTION: 1  Final
OBX|93643|TX|||TWO VIEW CHEST DATED 08-17-2010 
OBX|93644|TX|||CLINICAL HISTORY OF PRE OPERATIVE FOR CORONARY ARTERY BY-PASS GRAFT. 
OBX|93645|TX|||THERE IS NO PRIOR EXAMINATION AVAILABLE FOR COMPARISON. 
OBX|93646|TX|||FINDINGS: 
OBX|93647|TX|||THERE ARE SOME CALCIFIED DENSITIES SEEN OVERLYING BOTH HEMITHORACES.  SOME OF THESE MAY 
OBX|93648|TX|||REPRESENT CALCIFIED GRANULOMAS.  OTHER AREAS MAY REPRESENT SOME CALCIFIED PLAQUES.  NO 
OBX|93649|TX|||INFILTRATE IS SEEN.  HEART IS NORMAL IN APPEARANCE. AORTA IS ECTATIC.  PULMONARY 
OBX|93650|TX|||VASCULARITY IS UNREMARKABLE.  BONES AND JOINTS SHOW NO ACUTE ABNORMALITY.
OBX|93651|TX|||IMPRESSION:  
OBX|93652|TX|||MULTIPLE CALCIFIED DENSITIES SEEN OVERLYING BOTH LUNGS, POSSIBLY REPRESENTING PREVIOUS 
OBX|93653|TX|||GRANULOMATOUS DISEASE EXPOSURE.  SOME AREAS MAY REPRESENT CALCIFIED PLAQUES, AS WELL.
OBR|||R:758481^CHRS:CLA:CFC^^^z7-dfsdfds,dsfd JR:19320417:MF00380390:|Radiology:RADIOLOGY REPORT^RADIOLOGY REPORT^CHRS:CLA:CFC^Radiology|||20100817165600||||||||||||||||||Signed^^CHRS:CLA:CF

Then we can write code which follows this algorithm:

  1. Find the data before the first OBX segment.
  2. Find the data after the last OBX segment.
  3. Concatenate the two to make a very small message which can be parsed using hl7.parse{}
  4. Take all the OBX data and use Lua’s global pattern replacing to convert it into just the text.

This turns out to be a very fast algorithm since Lua’s pattern replacing is extremely fast and we avoid the overhead of creating an entire tree of OBX segments. Here’s the code:

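function main(Data)
   -- Split the message into a small part (everything except the OBX block)
   -- and the raw OBX block.
   local Small, LabReport = ObxChop(Data)
   -- Replace each OBX prefix, plus the '\r' before it, with a newline.
   -- '|' is a literal character in Lua patterns, so it needs no escaping.
   LabReport = LabReport:gsub('\r?OBX|%d+|TX|||', '\n')
   -- Drop the leading newline produced by the first OBX segment.
   LabReport = LabReport:sub(2)
   local Msg = hl7.parse{vmd='example/demo.vmd', data=Small}
   print(Msg.PATIENT.PID[18][1]:nodeValue())
   print(LabReport)
end

function ObxChop(Data)
   -- Position of the '\r' before the first OBX segment (plain find, no pattern matching).
   local First = Data:find('\rOBX|', 1, true)
   if not First then return Data, '' end  -- no OBX segments, nothing to chop
   -- Scan backwards for the start of the last OBX segment.
   local LastObx
   for i = #Data - 3, First, -1 do
      if Data:sub(i, i+3) == '\rOBX' then
         LastObx = i + 1
         break
      end
   end
   -- End of the last OBX segment; it may be the final segment with no trailing '\r'.
   local E = Data:find('\r', LastObx, true) or #Data + 1
   -- Return the message without the OBX block, then the OBX block itself.
   return Data:sub(1, First-1)..Data:sub(E),
          Data:sub(First+1, E-1)
end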
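
To try the substitution step on its own, outside Iguana, here is a minimal sketch on a made-up OBX block. The sample data is hypothetical, and the pattern used here also consumes the '\r' that terminates the previous segment, so only newlines are left in the output.

```lua
-- Made-up OBX block with '\r' segment terminators.
local Block = 'OBX|1|TX|||Order: 0817-0137\rOBX|2|TX||| Procedure: CHEST 2 VIEWS\rOBX|3|TX|||'
-- '|' is a literal in Lua patterns; '\r?' swallows the terminator of the previous segment.
-- gsub returns the new string and the number of substitutions made.
local Text, Count = Block:gsub('\r?OBX|%d+|TX|||', '\n')
-- Drop the leading newline produced by the first OBX segment.
Text = Text:sub(2)
print(Count)
print(Text)
```

The second value returned by gsub is the number of segments it rewrote, which can serve as a cheap sanity check against the set ID of the last OBX segment.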