Code: Prevent errors reading from an empty XML element

This post was originally written for Iguana 5, so it contains version 5 screenshots and may contain out-of-date references.

This is similar to, but distinct from, the XML Oddity that was raised in our forums; perhaps we should call it “XML Oddity II”.

The issue is a simple one: if an XML element contains an empty string, then reading it with X.Element[1]:nodeValue() raises an “Index 1 is out of bounds.” error, when the desired behaviour is to return an empty TEXT node.

We present these options:

  1. Best Practice: Use the node:text() function, from XML Techniques in the Protocols Repo, to return an empty TEXT node.
  2. Recommended: Create your own node function node:text() that will return an empty TEXT node.
  3. Use the # operator to check if a node has children before reading.

Sample Code

Option One

Code for main():

require 'xml'

local Example=[[
<Person>
<FirstName>Fred</FirstName>
<LastName></LastName>
</Person>]]

function main()
   local X = xml.parse{data=Example}
   
   -- NOTE: We recommend using :text() here too as you might receive data with a missing FirstName
   local FirstName = X.Person.FirstName[1]:nodeValue()
   
   -- Read an empty element safely
   -- Reads the *first* TEXT element
   -- If there is no TEXT element then it appends an empty one
   local LastName = X.Person.LastName:text()
end

Code for the xml module: Import the module code from XML Techniques in the Protocols Repo.

Option Two

Code for main():

require 'xml'

local Example=[[
<Person>
<FirstName>Fred</FirstName>
<LastName></LastName>
</Person>]]

function main()
   local X = xml.parse{data=Example}
   
   -- NOTE: We recommend using :text() here too as you might receive data with a missing FirstName
   local FirstName = X.Person.FirstName[1]:nodeValue()
   
   -- Read an empty element safely
   -- Returns the *first* child node of the element
   -- If there is no child node then it appends an empty TEXT node and returns it
   -- Only works when the first child is a TEXT node
   trace(X)
   local LastName = X.Person.LastName:text()
end

Code for the xml module:

-- return the first child element
-- if no child element create and return a TEXT element 
-- only works when the first child is a TEXT element
function node.text(X)
   if #X > 0 then
      return X[1]
   end
   return X:append(xml.TEXT, '')
end

Option Three

Code for main():

local Example=[[
<Person>
<FirstName>Fred</FirstName>
<LastName></LastName>
</Person>]]

function main()
   local X = xml.parse{data=Example}
   
   -- Use the # operator to check for children before reading
   -- You need to know the index of the desired element
   -- Code is inelegant but useful for "manual" reading
   local FirstName=''
   if #X.Person.FirstName > 0 then
      FirstName = X.Person.FirstName[1]:nodeValue()
   end
   local LastName=''  
   if #X.Person.LastName > 0 then
      LastName = X.Person.LastName[1]:nodeValue()
   end
end

Using the code

 

  • This code would probably be used in a To Translator, Filter or From Translator component script
  • Choose the option you prefer:
    • node:text(): Uses the node:text() function, from XML Techniques in the Protocols Repo
    • Custom node:text(): Create your own node:text() function. Because the code is a good candidate for re-use, put it in a module with a descriptive name (we used “xml”)
      Note: If you already have an “xml” module you can add this code (or use a different name).
    • Use the # operator: This is inline code so no module is needed

 

How it works

The error occurs because xml.parse{} does not create a TEXT child node for an empty XML element. The element node itself exists, but it has no children, so there is nothing at index 1 to read.

The issue is with the index, not the :nodeValue() function: indexing [1] on an element that has no children raises the error before :nodeValue() is ever called.
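As a minimal sketch of this point, the snippet below indexes the empty element inside pcall without ever calling :nodeValue(). The function name demoIndexError is hypothetical, and the sketch assumes it runs in an Iguana Translator script, where xml.parse{} is available.

```lua
-- Hypothetical helper: shows that the index raises the error, not :nodeValue()
function demoIndexError()
   local X = xml.parse{data='<Person><LastName></LastName></Person>'}

   -- The element node itself exists, but it has no children
   local LastName = X.Person.LastName
   local Count = #LastName

   -- Index the first child only; :nodeValue() is never called here
   local Ok, Err = pcall(function() return LastName[1] end)

   return Count, Ok, Err
end
```

If the behaviour matches the description above, Count should be 0 and Ok should be false, with Err holding the out-of-bounds error.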

Options One and Two

Both of these options produce the same result when reading an empty element. However, the option one node:text() function is “smart”: it loops through the child nodes to find the first existing TEXT node. The option two node:text() function is “dumb” in the sense that it just grabs the first child node, assuming it is a TEXT node. Choose the function that matches your requirements, or customize the behaviour if neither is suitable.

In our example the “expected” empty TEXT element is missing which causes the index out of bounds error. Using node:text() adds the “missing” empty element into the node tree, and then returns it.

The node:text() function adds an empty TEXT node under the specified element (in this case “LastName”) and returns it.
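To illustrate the difference, here is a rough sketch of what a “smart” function of the kind described above might look like. It is not the Protocols Repo implementation. The name node.firstText is hypothetical (chosen so it does not replace an existing node:text()), and the sketch assumes that node:nodeType() returns 'text' for XML TEXT nodes. The demo function shows the element's child count before and after the call.

```lua
-- Hypothetical name; a sketch, not the Protocols Repo code
function node.firstText(X)
   for i = 1, #X do
      -- Assumption: nodeType() returns 'text' for XML TEXT nodes
      if X[i]:nodeType() == 'text' then
         return X[i]
      end
   end
   -- No TEXT child was found: append an empty one and return it
   return X:append(xml.TEXT, '')
end

-- Hypothetical demo: child count before and after reading an empty element
function demoFirstText()
   local X = xml.parse{data='<Person><LastName></LastName></Person>'}
   local Before = #X.Person.LastName
   local Value = X.Person.LastName:firstText():nodeValue()
   local After = #X.Person.LastName
   return Before, Value, After
end
```

Note that reading the value changes the node tree: the empty TEXT node stays in place after the call, so later reads of the same element find an existing child.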

Option Three

This option checks for child nodes and reads the first one. If no children exist then it takes no action, so the variable keeps its default empty string. Once again, check that this option matches your needs and customize the behaviour if necessary. For example, you might want to add an else statement and do some other processing (like logging an error or warning) when there is no child node.
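One possible way to add that extra processing is to wrap the check in a small helper that logs a warning when the element is empty. This is only a sketch: the name readOrWarn is hypothetical, and it assumes the script runs in Iguana, where iguana.logWarning() and node:nodeName() are available.

```lua
-- Hypothetical helper: read the first child, or warn and return ''
function readOrWarn(Element)
   if #Element > 0 then
      return Element[1]:nodeValue()
   else
      -- No child node: log a warning instead of raising an error
      iguana.logWarning('Empty XML element: '..Element:nodeName())
      return ''
   end
end
```

It would be called as readOrWarn(X.Person.LastName). Like the inline code, it only reads the first child, so it has the same limitation when the first child is not a TEXT node.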

Best Practices

  • You need to make sure that the option you choose matches your requirements
  • Test the code to make sure it works correctly with the type of XML data you receive
  • Using a node function makes the code easier to read and the behaviour change is limited to one place

What not to do

  • Assume the function will do what you need without understanding how it works
  • Put the module into production without testing against representative sample data
