This code is designed to handle well formed CSV files. However CSV implementations can vary, so you should test it using realistic samples.
Please contact support at support@interfaceware.com if you need further help.
Sample Code [top]
You can download the Parse_CSV_To_Translator.zip project file (code and test messages) or copy the code from below.
Code for main()
:
local csv = require 'csv' function main(Data) local Csv = csv.parseCsv(Data) -- comma separated (default) --local Csv = csv.parseCsv(Data, '\t') -- tab separated (sample message 11) --local Csv = csv.parseCsv(Data, '|') -- bar separated (sample message 12) trace(Csv) -- Examples of what you can do: -- 1) Use in a To Translator and add code -- for saving patients to a database -- 2) Use in a Filter Component and map to -- XML/JSON then queue for further processing end
Code for the module:
-- module is likely to be re-used so use a descriptive name like "csv" local function parseCsvLine (line,sep) local res = {} local pos = 1 sep = sep or ',' while true do local c = string.sub(line,pos,pos) if (c == "") then break end local posn = pos local ctest = string.sub(line,pos,pos) trace(ctest) while ctest == ' ' do -- handle space(s) at the start of the line (with quoted values) posn = posn + 1 ctest = string.sub(line,posn,posn) if ctest == '"' then pos = posn c = ctest end end if (c == '"') then -- quoted value (ignore separator within) local txt = "" repeat local startp,endp = string.find(line,'^%b""',pos) txt = txt..string.sub(line,startp+1,endp-1) pos = endp + 1 c = string.sub(line,pos,pos) if (c == '"') then txt = txt..'"' -- check first char AFTER quoted string, if it is another -- quoted string without separator, then append it -- this is the way to "escape" the quote char in a quote. example: -- value1,"blub""blip""boing",value3 will result in blub"blip"boing for the middle elseif c == ' ' then -- handle space(s) before the delimiter (with quoted values) while c == ' ' do pos = pos + 1 c = string.sub(line,pos,pos) end end until (c ~= '"') table.insert(res,txt) trace(c,pos,i) if not (c == sep or c == "") then error("ERROR: Invalid CSV field - near character "..pos.." in this line of the CSV file: \n"..line, 3) end pos = pos + 1 posn = pos ctest = string.sub(line,pos,pos) trace(ctest) while ctest == ' ' do -- handle space(s) after the delimiter (with quoted values) posn = posn + 1 ctest = string.sub(line,posn,posn) if ctest == '"' then pos = posn c = ctest end end else -- no quotes used, just look for the first separator local startp,endp = string.find(line,sep,pos) if (startp) then table.insert(res,string.sub(line,pos,startp-1)) pos = endp + 1 else -- no separator found -> use rest of string and terminate table.insert(res,string.sub(line,pos)) break end end end return res end ------------------------------------ ---- Module Interface functions ---- ------------------------------------ local csv = {} function csv.parseCsv(Data, Separator) -- handle '\r\n\' as line separator Data = Data:gsub('\r\n','\n') -- handle '\r' (bad form) as line separator Data = Data:gsub('\r','\n') local Result={} for Line in Data:gmatch("([^\n]+)") do local ParsedLine = parseCsvLine(Line, Separator) table.insert(Result, ParsedLine) end return Result end return csv
Using the code [top]
- This code would usually be used in a Filter or Destination component script, for example:
- You could save patient details to a database table in a Destination component
- You could map patient details to XML/JSON in Filter and then queue for further processing (either in the Destination or by forwarding to another channel)
- You will need to write From Translator code to read your CSV files and push them onto the Iguana queue for processing in your Filter or To Translator component
- Because the code is a good candidate for re-use by other channels, give it a descriptive name, we used “csv”
Note: If you already have a module called “csv” you can add this code (or use a different module name).
How it works [top]
- The code reads CSV messages from the queue and then parses them using the
csv.parseCsv()
function - There are definite advantages loading and queuing the CSV data files in the From Component, and then processing the CSV messages in a Filter or Destination component:
- The main advantage is that you can reprocess CSV messages using the Iguana log rather than having to reload the original files
- Separating the file handling and parsing/processing also keeps the code cleaner and easier to maintain
- The parsing code is based on this code from Lua user’s wiki, with the following additions:
- We changed the assert to an if statement to give a more informative error
- We changed the code to allow for spaces before/after separators when using quoted fields
- We changed the code to allow for spaces at the start of a line when using quoted fields
- We wrapped the
parseCsvLine()
function in theparseCsv()
function that accepts multiple lines
- You can use different field separators by passing them as the Separator (2nd) parameter to the
parseCsv()
function, for example:- To use a tab separator:
parseCsv(Data, '\t')
- To use a “|” separator:
parseCsv(Data, '|')
- To use a tab separator:
- The parse will handle the following:
- Quoted and unquoted fields
- Separators within quoted fields
- Escaped quotes (“”) within quoted fields
Note: See the “Speedie” nickname in the 2nd row of 9th sample message - The parse will raise an error for an un-escaped (“) quote in quoted field
Note: The 10th sample message raises an error for this reason
- The parse will handle spaces before/after separators when using quoted fields
- The parse will handle spaces at the start of a line when using quoted fields
- Limitation: The parse does not handle quotes in an unquoted field, it just reads them like any other character
Note: Technically this violates CSV rules but I have seen it occasionally- A un-escaped quote (“) will be shown as a single quote (“), when it could be considered an error
- An escaped quote (“”) will be show as two quotes (“”) when it should probably be shown as one quote (“)
Best Practices [top]
- Test the code to make sure it will handle the CSV files you receive correctly, and adjust it if necessary
Note: There a CSV standard (RFC 4180) but it is not always followed exactly, so parsing CSV is never completely cut and dried
What not to do [top]
- Don’t expect the code to handle all possible CSV file content (you need to test)
Note: As mentioned in best practices CSV files are not truly standard so parsing them is not an exact science