This code is designed to handle well formed CSV files. However CSV implementations can vary, so you should test it using realistic samples.
Please contact support at support@interfaceware.com if you need further help.
Sample Code [top]
You can download the Parse_CSV_To_Translator.zip project file (code and test messages) or copy the code from below.
Code for main():
local csv = require 'csv' function main(Data) local Csv = csv.parseCsv(Data) -- comma separated (default) --local Csv = csv.parseCsv(Data, '\t') -- tab separated (sample message 11) --local Csv = csv.parseCsv(Data, '|') -- bar separated (sample message 12) trace(Csv) -- Examples of what you can do: -- 1) Use in a To Translator and add code -- for saving patients to a database -- 2) Use in a Filter Component and map to -- XML/JSON then queue for further processing end
Code for the module:
-- module is likely to be re-used so use a descriptive name like "csv"
local function parseCsvLine (line,sep)
local res = {}
local pos = 1
sep = sep or ','
while true do
local c = string.sub(line,pos,pos)
if (c == "") then break end
local posn = pos
local ctest = string.sub(line,pos,pos)
trace(ctest)
while ctest == ' ' do
-- handle space(s) at the start of the line (with quoted values)
posn = posn + 1
ctest = string.sub(line,posn,posn)
if ctest == '"' then
pos = posn
c = ctest
end
end
if (c == '"') then
-- quoted value (ignore separator within)
local txt = ""
repeat
local startp,endp = string.find(line,'^%b""',pos)
txt = txt..string.sub(line,startp+1,endp-1)
pos = endp + 1
c = string.sub(line,pos,pos)
if (c == '"') then
txt = txt..'"'
-- check first char AFTER quoted string, if it is another
-- quoted string without separator, then append it
-- this is the way to "escape" the quote char in a quote. example:
-- value1,"blub""blip""boing",value3 will result in blub"blip"boing for the middle
elseif c == ' ' then
-- handle space(s) before the delimiter (with quoted values)
while c == ' ' do
pos = pos + 1
c = string.sub(line,pos,pos)
end
end
until (c ~= '"')
table.insert(res,txt)
trace(c,pos,i)
if not (c == sep or c == "") then
error("ERROR: Invalid CSV field - near character "..pos.." in this line of the CSV file: \n"..line, 3)
end
pos = pos + 1
posn = pos
ctest = string.sub(line,pos,pos)
trace(ctest)
while ctest == ' ' do
-- handle space(s) after the delimiter (with quoted values)
posn = posn + 1
ctest = string.sub(line,posn,posn)
if ctest == '"' then
pos = posn
c = ctest
end
end
else
-- no quotes used, just look for the first separator
local startp,endp = string.find(line,sep,pos)
if (startp) then
table.insert(res,string.sub(line,pos,startp-1))
pos = endp + 1
else
-- no separator found -> use rest of string and terminate
table.insert(res,string.sub(line,pos))
break
end
end
end
return res
end
------------------------------------
---- Module Interface functions ----
------------------------------------
local csv = {}
function csv.parseCsv(Data, Separator)
-- handle '\r\n\' as line separator
Data = Data:gsub('\r\n','\n')
-- handle '\r' (bad form) as line separator
Data = Data:gsub('\r','\n')
local Result={}
for Line in Data:gmatch("([^\n]+)") do
local ParsedLine = parseCsvLine(Line, Separator)
table.insert(Result, ParsedLine)
end
return Result
end
return csv
Using the code [top]
- This code would usually be used in a Filter or Destination component script, for example:
- You could save patient details to a database table in a Destination component
- You could map patient details to XML/JSON in Filter and then queue for further processing (either in the Destination or by forwarding to another channel)
- You will need to write From Translator code to read your CSV files and push them onto the Iguana queue for processing in your Filter or To Translator component
- Because the code is a good candidate for re-use by other channels, give it a descriptive name, we used “csv”
Note: If you already have a module called “csv” you can add this code (or use a different module name).
How it works [top]
- The code reads CSV messages from the queue and then parses them using the
csv.parseCsv()function - There are definite advantages loading and queuing the CSV data files in the From Component, and then processing the CSV messages in a Filter or Destination component:
- The main advantage is that you can reprocess CSV messages using the Iguana log rather than having to reload the original files
- Separating the file handling and parsing/processing also keeps the code cleaner and easier to maintain
- The parsing code is based on this code from Lua user’s wiki, with the following additions:
- We changed the assert to an if statement to give a more informative error
- We changed the code to allow for spaces before/after separators when using quoted fields
- We changed the code to allow for spaces at the start of a line when using quoted fields
- We wrapped the
parseCsvLine()function in theparseCsv()function that accepts multiple lines
- You can use different field separators by passing them as the Separator (2nd) parameter to the
parseCsv()function, for example:- To use a tab separator:
parseCsv(Data, '\t') - To use a “|” separator:
parseCsv(Data, '|')
- To use a tab separator:
- The parse will handle the following:
- Quoted and unquoted fields
- Separators within quoted fields
- Escaped quotes (“”) within quoted fields
Note: See the “Speedie” nickname in the 2nd row of 9th sample message - The parse will raise an error for an un-escaped (“) quote in quoted field
Note: The 10th sample message raises an error for this reason
- The parse will handle spaces before/after separators when using quoted fields
- The parse will handle spaces at the start of a line when using quoted fields
- Limitation: The parse does not handle quotes in an unquoted field, it just reads them like any other character
Note: Technically this violates CSV rules but I have seen it occasionally- A un-escaped quote (“) will be shown as a single quote (“), when it could be considered an error
- An escaped quote (“”) will be show as two quotes (“”) when it should probably be shown as one quote (“)
Best Practices [top]
- Test the code to make sure it will handle the CSV files you receive correctly, and adjust it if necessary
Note: There a CSV standard (RFC 4180) but it is not always followed exactly, so parsing CSV is never completely cut and dried
What not to do [top]
- Don’t expect the code to handle all possible CSV file content (you need to test)
Note: As mentioned in best practices CSV files are not truly standard so parsing them is not an exact science