Parse a CSV file

Introduction

This example shows how to use the csv_parse.lua module to parse CSV files. The code is designed to handle well-formed CSV files. However, CSV implementations vary, so you should test it with realistic samples of your own data.

If you have any questions, please contact us at support@interfaceware.com.

Using the Code

  • Import the Parse a CSV file channel from the Builtin: Iguana Files repository
  • Experiment with the code to find out how it works
  • Then add the module to your Translator project
  • Copy the require statement from the channel and add it at the top of your script
    Note: This module uses require to return a single function
  • Adapt the code to your own requirements
  • Use the parseCsv() function to parse CSV data
  • You will need to write From Translator code to read your CSV files and push them onto the Iguana queue for processing in your Filter or To Translator component
  • Interactive scripting help is included for this module

The code for the main module is available on GitHub.

How it works

  • The parsing code is based on code from the Lua users wiki, with the following changes:
    • We changed the assert to an if statement to give a more informative error
    • We changed the code to allow for spaces before/after separators when using quoted fields
    • We changed the code to allow for spaces at the start of a line when using quoted fields
    • We wrapped the parseCsvLine() function in the parseCsv() function that accepts multiple lines
  • You can use different field separators by passing them as the Separator (2nd) parameter to the parseCsv() function, for example:
    • To use a tab separator: parseCsv(Data, '\t')
    • To use a “|” separator: parseCsv(Data, '|')
  • The parser handles the following:
    • Quoted and unquoted fields
    • Separators within quoted fields
    • Escaped quotes ("") within quoted fields
      Note: See the “Speedie” nickname in the 2nd row of the 9th sample message
    • Spaces before or after separators when using quoted fields
    • Spaces at the start of a line when using quoted fields
  • The parser raises an error for an unescaped quote (") in a quoted field
    Note: The 10th sample message raises an error for this reason
  • Limitation: The parser does not handle quotes in an unquoted field; it reads them like any other character
    Note: Technically this violates CSV rules, but we have seen it occasionally
    • An unescaped quote (") is kept as a single quote ("), although it could be considered an error
    • An escaped quote ("") is kept as two quotes ("") when it should probably be shown as one quote (")
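
The behaviour described above can be illustrated with a small standalone sketch. This is not the csv_parse.lua source: the names parseLine and parse are hypothetical, the separator is assumed to be a single character, and line breaks inside quoted fields are not handled. It is only meant to show one way the rules in this section (quoted fields, escaped quotes, spaces around quoted fields, the error case, and the unquoted-field limitation) could fit together.

```lua
-- Illustrative sketch only; NOT the csv_parse.lua implementation.
-- parseLine and parse are hypothetical names.

-- Parse one line into a table of fields. sep is assumed to be one character.
local function parseLine(line, sep)
   sep = sep or ','
   local fields = {}
   local pos = 1
   while true do
      -- Optional spaces followed by a quote start a quoted field.
      -- This also covers spaces at the start of a line.
      local quoteAt = line:match('^ *()"', pos)
      if quoteAt then
         local buf = {}
         local i = quoteAt + 1
         while true do
            local q = line:find('"', i, true)
            if not q then
               error('Unterminated quoted field in line: ' .. line)
            end
            buf[#buf + 1] = line:sub(i, q - 1)
            if line:sub(q + 1, q + 1) == '"' then
               buf[#buf + 1] = '"'   -- "" is an escaped quote
               i = q + 2
            else
               i = q + 1             -- closing quote
               break
            end
         end
         fields[#fields + 1] = table.concat(buf)
         -- Allow spaces between the closing quote and the separator.
         i = line:match('^ *()', i)
         local c = line:sub(i, i)
         if c == '' then break end
         if c ~= sep then
            -- Typically caused by an unescaped quote inside a quoted field.
            error('Unexpected text after closing quote in line: ' .. line)
         end
         pos = i + 1
      else
         -- Unquoted field: read up to the next separator (plain-text find).
         -- Quotes in here are treated like any other character.
         local s = line:find(sep, pos, true)
         if s then
            fields[#fields + 1] = line:sub(pos, s - 1)
            pos = s + #sep
         else
            fields[#fields + 1] = line:sub(pos)
            break
         end
      end
   end
   return fields
end

-- Parse multi-line data into a table of rows; blank lines are skipped.
local function parse(data, sep)
   local rows = {}
   for line in data:gmatch('[^\r\n]+') do
      rows[#rows + 1] = parseLine(line, sep)
   end
   return rows
end

local rows = parse('id, "name" ,nick\n1,"Smith, John","""Speedie"""\n')
print(rows[2][2])                      --> Smith, John
print(rows[2][3])                      --> "Speedie"
print(parse('a|"b|c"|d', '|')[1][2])   --> b|c
```

In this sketch an unescaped quote inside a quoted field, such as "John "Speedie" Smith", ends the field early and the text that follows triggers the error, while a quote in an unquoted field is simply copied through unchanged.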

More information