Retrying unreliable external resources

Introduction

It is quite common for an interface to use retry logic when it encounters unreliable external resources. Such situations may include the following:

  • An unreliable database connection
  • An application that for some reason cannot listen on a port as directed
  • An application that returns a negative ACK due to its own resource issues

These, and other conditions, can be identified by the errors that occur. Some errors will indicate transitory problems which can be overcome with enough retries. Other errors could indicate fatal conditions requiring the interface to stop. Iguana gives you complete control over identifying errors and implementing your desired response.

How It Works

The retry module used on this page is quite simple. The retry.call() function retries the specified function call whenever an error occurs. By default, a fatal error is raised if the retried function returns false; alternatively, you can supply an error function for custom error handling.

The retry.call() function takes a table of arguments: the function to be called, the number of retries, the pause between retries (in seconds), and zero or more arguments for the function. Multiple returns from the function are also supported.
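As a rough illustration of the idea (a minimal sketch, not the shipped retry module), a retry loop can be built around pcall(). The name simpleRetry and the two-argument limit are assumptions made for this example; the pause uses util.sleep() when running inside Iguana and is skipped elsewhere.

```lua
-- Hypothetical sketch of a retry loop; the real retry module may differ.
local unpack = table.unpack or unpack   -- works on Lua 5.1 and later

-- collect all returns, remembering the count so nil values survive
local function pack(...)
   return {n = select('#', ...), ...}
end

function simpleRetry(P)
   local Max   = P.retry or 3
   local Pause = P.pause or 0            -- seconds
   local LastError
   for Attempt = 1, Max do
      -- pcall() traps any error raised by the function
      local R = pack(pcall(P.func, P.arg1, P.arg2))
      if R[1] and R[2] ~= false then
         return unpack(R, 2, R.n)        -- success: pass every return through
      end
      LastError = R[1] and 'function returned false' or tostring(R[2])
      -- util.sleep() only exists inside Iguana, so skip the pause elsewhere
      if Attempt < Max and util and util.sleep then
         util.sleep(Pause * 1000)
      end
   end
   error('Gave up after '..Max..' attempts: '..LastError, 0)
end
```

A call such as simpleRetry{func=MyFunction, arg1='value', retry=5} would run MyFunction('value') up to five times and hand back whatever it returns on the first successful attempt.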

Tip: In the real world we would probably not want to simply retry for all errors: some could be considered fatal, and some might trigger different actions. The same applies to function returns.

The last page in this section explains how to customize the retry logic using an error function for both scenarios.

Additional Information

We introduced the retry module, which can be used to retry any call that generates an error. Our examples show how to retry a database and also an LLP connection.

If you have any requirements not met by this script, please contact support at support@interfaceware.com and we’ll see if we can help.

Database Connection

Our first example shows how we can retry an ODBC query to a MySQL database. If an error occurs we simply retry the query.

Preparation

You will need to have a MySQL ODBC database connection for use with the sample code.

If you match the credentials used in the code (user root, an empty password, and a database named test), it should work immediately.

Alternatively you could change the db.connect() parameters to connect to a different database.

The database will also need to contain a Patient table so that the query in DoInsert() (SELECT * FROM patient) will work.

If you do not have a Patient table then substitute another SQL query that works against your database.

Tip: Some APIs, like the ODBC connection API to SQL Server, can take a long time to time out. The default Iguana script timeout is 5 minutes, so if a database timeout takes longer than this, the channel will be stopped immediately after the database call returns.

If you want to prevent the channel being stopped you can use iguana.setTimeout() to set a longer timeout for the script.

How it works

We apply the retry module by performing the following steps:

  1. Add require 'retry' to the shared module source code, as shown below.
  2. Invoke retry.call() with a table containing these entries:
    • func = The function you wish to call
    • arg1/2/3…N = Multiple arguments to pass in to the function
    • retry (optional) = The maximum number of failed retries
    • pause (optional) = The pause between each retry attempt (in seconds)
    • funcname (optional) = The name of the function (informational for errors and logging)
  3. Place the call to db.connect() in the DoInsert() function so it runs every time Iguana polls.

    Warning: If db.connect() is outside DoInsert() it will raise an error before retry.call() is called, and therefore the retry logic will never be invoked.
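The warning above can be illustrated without a real database. In this minimal sketch, fakeConnect() is a hypothetical stand-in for db.connect() that fails twice before succeeding, and a bare pcall() loop (runWithRetries, also hypothetical) stands in for retry.call(). Because the connection is opened inside DoInsert(), the failures are trapped and retried.

```lua
-- Hypothetical stand-in for db.connect(): fails on the first two calls.
local ConnectCalls = 0
local function fakeConnect()
   ConnectCalls = ConnectCalls + 1
   if ConnectCalls < 3 then
      error('cannot connect to database')
   end
   -- minimal connection object with a query method
   return {query = function(Self, Sql) return 'rows for: '..Sql end}
end

-- The connection is opened INSIDE the retried function, so a failed
-- connect is raised where the retry logic can trap it.
function DoInsert()
   local Conn = fakeConnect()
   return Conn:query('SELECT * FROM patient')
end

-- A bare pcall() loop standing in for retry.call()
function runWithRetries(Func, Max)
   for Attempt = 1, Max do
      local Ok, Result = pcall(Func)
      if Ok then return Result, Attempt end
   end
   error('all attempts failed', 0)
end
```

Calling runWithRetries(DoInsert, 5) returns the query result on the third attempt. Had fakeConnect() been called at file scope instead, its error would have been raised before the loop ever started.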

If the MySQL database is unavailable then you will get a connection error and retry message. We stopped the MySQL database service/daemon to simulate a connection error.

Tip: If you want to customize the error messages, simply change the generic messages near the top of the retry module.

Testing The Code

We ran the code in a From Translator component, and then verified that it worked as expected by stopping and starting the MySQL database. The channel dashboard light changes as follows, and the errors are recorded in the Iguana logs:

  • The orange light comes on when an error occurs.
  • The green light comes on when we recover from the error.

Warning: Be aware that the retry module will retry for all MySQL errors (not just connection errors), which is probably not what you want. For example, a query that misspells the patient table name would be retried even though it can never succeed.

This can be solved by creating an error function that prevents retries for everything except database connection errors. The error function is then passed as a parameter to retry.call{}:

function myError(Success, ErrMsgOrReturnCode)
   local funcSuccess
   if Success then
      -- successfully read the data
      funcSuccess = true -- don't retry
   else
      -- these are MySQL error codes - they will be different for other databases
      if ErrMsgOrReturnCode.code == 2002 or ErrMsgOrReturnCode.code == 2006 then
         -- retry *only* for failed connection (error 2002 or 2006)
         iguana.logInfo('Retrying DB connection: '..tostring(ErrMsgOrReturnCode))
         funcSuccess = false -- retry
      else
         -- raise an error for all other DB issues (any code other than 2002 or 2006)
         error(tostring(ErrMsgOrReturnCode))
      end
   end
   return funcSuccess
end

See the last page in this section, “Customize the retry logic using an error function”, for instructions on using an error function with the retry module.

Adapting the code for your own needs

You can easily adapt the code we supplied to meet your needs. If you are retrying a database (a very common scenario) then you can just add your own database code to the DoInsert() function. If you want to retry another operation then replace DoInsert() with your own function. For example, to access a RESTful web service you could create a DoHttpGet() function that calls net.http.get() from our net interface module.
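A minimal sketch of such a function follows. It assumes net.http.get{} returns the response body followed by the HTTP status code, and it adds a hypothetical second parameter (Get) so the function can be exercised outside Iguana with a stub; check both assumptions against your Iguana version.

```lua
-- Hypothetical DoHttpGet(): raises an error for any non-200 response so that
-- the failure can be retried (for example with retry.call{func=DoHttpGet, arg1=Url}).
function DoHttpGet(Url, Get)
   -- inside Iguana the default is net.http.get; Get exists only for testing
   Get = Get or net.http.get
   local Body, Code = Get{url=Url, live=true}
   if Code ~= 200 then
      error('HTTP GET failed with status '..tostring(Code))
   end
   return Body
end
```

Raising an error for unexpected status codes, rather than returning them, means the default retry behaviour applies without needing an error function.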

Sample Code

Import the Retry_Database_From_Translator.zip project into a From Translator component; it contains all the necessary code.

Otherwise you can get the code for “How to retry a database connection” from our repository.
Note: If you don’t already have the retry module you can get it from our repository server.

LLP Connection

Our second example shows the versatility of the retry module by resending messages to an LLP client. If the client sends back a negative acknowledgment code (an AR: Application Reject or an AE: Application Error), then we resend the message. In this case the LLP client we are talking to has a MySQL Database as a back-end. So when we want to generate negative acknowledgements we can simply stop the MySQL database service.

Note: Where the messages returned from retry.call() refer to “database”, they could easily be changed to refer to “LLP client” if desired.

The steps to use the retry module are similar to the previous example:

  1. Add require 'retry' to the shared module source code, as shown below.
  2. Invoke retry.call() with a table containing these entries:
    • func = The function you wish to call
    • arg1 = The argument to pass in
      Note: Multiple arguments can be passed in to the function
    • retry (optional) = The maximum number of failed retries
    • pause (optional) = The pause between each retry attempt (in seconds)
    • funcname (optional) = The name of the function (informational for errors and logging)

Testing The Code

We ran it in an LLP -> Translator channel and used the HL7 Listener to receive the messages sent.

We ran the following tests:

  • No LLP Connection: Stopped the HL7 Listener
  • No ACK: Deselected Send ACKnowledgement back to sender
  • Invalid ACK: Changed the if condition in the ValidateAck() function to simulate an invalid ACK

Tip: If you want to test the third option more fully, you can create a channel to receive the LLP messages. Use a channel that creates custom ACK messages in a From LLP script. The script will need to return some AR/AE ACK messages, to test the condition.

For more information, see: Using the Translator as an LLP Client.

Sample Code

Code for main:

local llp   = require 'llp'
local retry = require 'retry'

function main(Data)
   retry.call{func=SendMessage, arg1=Data, retry=30, pause=2}
end

function SendMessage(Data)
   local s = llp.connect{host='10.211.55.25',port=5351,timeout=2, live=true}
   s:send(Data)
   local Ack = s:recv()
   trace(Ack)
   -- close the connection before validating, so that a rejected
   -- message does not leave the connection open on every retry
   s:close()
   if not ValidateAck(Ack) then
      error('Did not receive a positive ACK message')
   end
end

function ValidateAck(Data)
   local Ack = hl7.parse{vmd='ack.vmd',data=Data}
   if Ack.MSA[1]:S() ~= 'AA' then
   --if Ack.MSA[1]:S() ~= 'BB' then -- change from 'AA' to simulate invalid ACK
      return false
   end
   return true
end

Download the llp.lua module from our code repository.

Using multiple parameters and multiple returns

The retry.call() function handles multiple arguments and multiple returns.

Multiple Arguments

The retry.call() function supports multiple arguments, in the form arg1, arg2, … argN.
Note: Arguments with other names will raise an error.
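One way such a check could work is sketched here. This is an illustration only, not the module's actual code: collectArgs and the Allowed table are hypothetical names, and the list of recognised parameters is taken from this page.

```lua
-- Parameter names described on this page; anything else must be argN.
local Allowed = {func=true, retry=true, pause=true, funcname=true, errorfunc=true}

-- Hypothetical helper: gathers arg1..argN into an array and rejects
-- any parameter name that is neither recognised nor of the form argN.
function collectArgs(P)
   local Highest = 0
   for Key in pairs(P) do
      if not Allowed[Key] then
         local Digits = type(Key) == 'string' and Key:match('^arg(%d+)$')
         if not Digits then
            error('Unexpected parameter: '..tostring(Key), 0)
         end
         local Index = tonumber(Digits)
         if Index > Highest then Highest = Index end
      end
   end
   local Args = {}
   for i = 1, Highest do
      Args[i] = P['arg'..i]   -- stays nil if a number was skipped
   end
   return Args, Highest
end
```

Returning the highest index alongside the array lets the caller pass the right number of arguments even when one of them was left out.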

Multiple Returns

Lua functions can return multiple values, and retry.call() supports this language feature.

Tip: The code that achieves the multiple returns is a bit subtle.

The returns from pcall() are assigned to a table “R”, which is then unpacked when it is returned from retry.call().

Note: unpack() is a standard Lua function that returns the elements from a table.
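In plain Lua the pattern looks roughly like this. It is a sketch of the technique rather than the module's own source, and callAndReturnAll is a hypothetical name; the first line only picks whichever unpack function the running Lua version provides.

```lua
local unpack = table.unpack or unpack    -- Lua 5.1 has unpack, 5.2+ has table.unpack

function callAndReturnAll(Func, ...)
   -- pcall() returns a success flag followed by the returns (or the error)
   local R = {pcall(Func, ...)}
   if not R[1] then
      error(R[2], 0)                     -- re-raise the trapped error unchanged
   end
   -- skip the flag in R[1] and hand back everything else
   -- (caveat: returns containing nil may be truncated, because the
   -- length of a table with holes is not well defined)
   return unpack(R, 2)
end
```

The second argument to unpack() is the starting index, which is what drops the success flag from the values handed back to the caller.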

Customize the retry logic using an error function

Using an error function is very flexible: it enables you to completely customize the behaviour to match your needs. The examples on this page expand on the database code from the second page. They use the errorfunc parameter of retry.call{} to supply an error function, which processes the returns and/or errors from the retried function.

The first example is a simple but useful function that overcomes a shortcoming of the original code: it retries for any error returned, not just connection errors. For example, consider a scenario where you are generating an SQL query and under some (unforeseen) circumstances the query is malformed. With the current code this malformed query would be retried 1000 times (with no chance of success), where the appropriate action could be to raise an error and stop the channel, or to log an error and continue processing the next query (or perform some other action…).

The second example is designed to show the power and flexibility of using an error function. It extends the first example, by retrying for some other errors, retrying for some non-error conditions by testing a business rule (patient does not exist yet), and allowing processing to continue (rather than retrying or raising an error) for non-fatal errors.

How the error function works

There are two types of “error” that can occur when a function is called:

  1. The function does not complete and raises an error
  2. The function completes and returns false or an error code

The retry.call() function uses pcall() to trap errors raised by the “retried function”; this catches the first error type. The second error type is “caught” by checking the function return(s). By default retry.call() only does a basic check to see whether the retried function returns false; for more sophisticated checking you need an error function. So that the error function can handle both types of errors, retry.call() always calls the error function if one is supplied.

The error function needs to do the following things:

  1. Mandatory: Handle both types of errors from the retried function, by doing one of the following:
    • Raising an error for “fatal” errors (which will stop the channel)
    • Allowing the function to be retried for “recoverable” errors
    • Skipping processing the message for non-fatal “non-recoverable” errors
  2. Mandatory: Return Success, a boolean that indicates whether the function should be retried:
    • Retry: For “recoverable” errors you would return false
    • Don’t retry: For non-fatal “non-recoverable” errors return true
  3. Best Practice: Create appropriate log entries, errors (iguana.logError) or informational (iguana.logInfo).
  4. Best Practice: Email an administrator to deal with issues, for example:
    • Before raising an error for “fatal” errors.
    • To follow up on skipped messages (non-fatal “non-recoverable” errors).
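The contract above can be sketched in plain Lua as follows. This is an assumption-laden illustration, not the retry module's source: retryWithErrorFunc and timeoutOnly are hypothetical names, and the error function shown treats any message containing “timeout” as recoverable.

```lua
local unpack = table.unpack or unpack    -- Lua 5.1 and later

-- Hypothetical retry loop that always defers to the error function.
function retryWithErrorFunc(P)
   for Attempt = 1, P.retry do
      -- R[1] is the pcall() success flag; R[2].. are the returns or the error
      local R = {pcall(P.func, P.arg1)}
      -- the error function decides: true = stop retrying, false = retry;
      -- it may also raise an error itself for fatal conditions
      if P.errorfunc(unpack(R)) then
         return unpack(R, 2)
      end
   end
   error('Gave up after '..P.retry..' attempts', 0)
end

-- Example error function: retry timeouts, treat everything else as fatal.
function timeoutOnly(Success, ErrMsgOrReturnCode)
   if Success then
      return true                         -- don't retry
   elseif tostring(ErrMsgOrReturnCode):find('timeout', 1, true) then
      return false                        -- recoverable: retry
   end
   error(tostring(ErrMsgOrReturnCode), 0) -- fatal: stop
end
```

Because the error function receives the pcall() success flag as its first argument, the same function can inspect both raised errors and ordinary return values.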

Usage scenarios for error functions

There are basically two scenarios. The first is a simple case where we only want to retry disconnection errors; the second allows for retrying other errors as well (errors that can occur without causing a disconnection).

Simple scenario: Retry only lost connections:

Because we are only retrying for lost connections there is no need to check for open connections (as we do in the second scenario). This means that the retried function can simply open and close a connection each time it is called.

The error function then does the following:

  1. If the retried function ran successfully then it prevents retries
  2. Otherwise it checks if the connection was broken and initiates a retry
  3. In all other cases it raises an error

Complex scenario: Retry various error types:

Because we are retrying for various errors (not just disconnections) we may not need to open a new connection each time. This means that the retried function must check the connection and only open a new one if the old one has died. It is also best practice to close the open connection at the end of main().

The error function then does the following:

  1. If the retried function ran successfully then it prevents retries
  2. Otherwise it checks if the connection was broken and initiates a retry
  3. Then it does custom error handling based on the function return
  4. In all other cases it raises an error

Simple scenario: Retry only lost connections

This example overcomes a shortcoming of the original database retry code: it retries for any error returned, not just connection errors. For example, consider a scenario where you are generating an SQL query and under some (unforeseen) circumstances the query is malformed. With the current code this malformed query would be retried 1000 times (with no chance of success), where the appropriate action would be to raise an error and stop the channel, or to log an error and continue processing the next query (or perform some other action…).

Import the Retry_Simple_From_Translator.zip project into a From Translator component; it contains all the necessary code.

Otherwise you can paste in the code below for the main() module:
Note: If you don’t already have the retry module you can download it from our code repository.

local retry = require 'retry'

function main(Data)
   local R, R2, M = retry.call{func=DoInsert, retry=1000, pause=10, funcname='DoInsert', errorfunc=myError}
end

function DoInsert(T)
   -- call db.connect each time Iguana polls
   -- (local, because the connection is opened and closed on every call)
   local conn = db.connect{
      api=db.MY_SQL,
      user='root',
      password='', -- no password
      name='test',
      live=true
   }
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient')

   -- housekeeping (more efficient than garbage collection)
   if conn and conn:check() then conn:close() end 
   
   return R
end

function myError(Success, ErrMsgOrReturnCode)
   local funcSuccess
   if Success then
      -- successfully read the data
      funcSuccess = true -- don't retry
   else
      -- these are MySQL error codes - they will be different for other databases
      if ErrMsgOrReturnCode.code == 2002 or ErrMsgOrReturnCode.code == 2006 then
         -- retry *only* for failed connection (error 2002 or 2006)
         iguana.logInfo('Retrying DB connection: '..tostring(ErrMsgOrReturnCode))
         funcSuccess = false -- retry
      else
         -- raise an error for all other DB issues (any code other than 2002 or 2006)
         error(tostring(ErrMsgOrReturnCode))
      end
   end
   return funcSuccess
end

Complex scenario: Retry various error types

For this example we are dealing with an (imaginary) legacy system that sometimes sends patient updates or requests before inserting patient records, so we need to retry if a patient record is not found. The system is also able to create multiple patients with the same Id; in this case we raise an error (because user intervention is required to identify the patient).

To demonstrate customized behaviour for both errors and returns we created two similar DoInsert functions (DoInsert_error and DoInsert_return), and used a single error function (myError) to handle both (you could use two error functions if desired).

To demonstrate a successful function call that requires a retry, the DoInsert_return function returns false if the patient cannot be identified (zero or multiple patients found). Because patient updates and requests can be sent before the patient is inserted, we need to retry this call when no patient is found.

We also use an (admittedly contrived) example to demonstrate how to trap a minor error and continue processing. Unfortunately the Addams family is not welcome at our facility. To identify them, DoInsert_error raises an error if the surname “Addams” is detected; the error function checks for this error, makes a log entry indicating they are not welcome, and then allows processing to continue on to the next message. Perhaps an error alert for a potential terrorist might be a more realistic example.

Import the Retry_Complex_From_Translator.zip project into a From Translator component; it contains all the necessary code.

Otherwise you can paste in the code below for the main() module:
Note: If you don’t already have the retry module you can download it from our code server.

local retry = require 'retry'

function main(Data)
   local R, R2, M = retry.call{func=DoInsert_return, retry=1000, pause=10,
      funcname='DoInsert_return', arg1=1, errorfunc=myError}
   
   local R, R2, M = retry.call{func=DoInsert_error, retry=1000, pause=10,
      funcname='DoInsert_error', arg1=1, errorfunc=myError}

   -- housekeeping close open connection if it exists
   -- (more efficient than waiting for garbage collection)
   if conn and conn:check() then conn:close() end
end

function DoInsert_return(Id)
   -- only create new connection if old is dead
   if not conn or not conn:check() then
      conn = db.connect{
         api=db.MY_SQL,
         user='root',
         password='', -- no password
         name='test',
         live=true
      }
   end
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient WHERE Id ='..Id)
   
   if #R==1 then
      -- if one patient is found then return patient data
      return true, R
   else
      -- ERROR if no patient or more than one patient is found
      -- return false and count of rows found
      return false, #R, Id
   end
end


-- errors raised are "intentionally obscure" and are
-- "translated" to more friendly messages in myError()
function DoInsert_error(Id)
   -- only create new connection if old is dead
   if not conn or not conn:check() then
      conn = db.connect{
         api=db.MY_SQL,
         user='root',
         password='', -- no password
         name='test',
         live=true
      }
   end
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient WHERE Id ='..Id)
   
   if #R==1 then
      -- if one patient is found then return patient data
      if R[1].LastName:nodeValue() == 'Addams' then
         -- level 0 omits the "file:line:" prefix that Lua normally adds,
         -- so myError() can compare the message text exactly
         error('ERROR: 2', 0)
      end
      return true, R
   elseif #R==0 then
      -- ERROR if the patient is not found
      error('ERROR: 0', 0)
   elseif #R>1 then
      -- ERROR if more than one patient is found
      error('ERROR: 1', 0)
   end
end

-- handles errors raised and function returns (could split into two functions)
-- if a Patient is not found retries are allowed 
-- (because this legacy system allows updates before adding a patient)
-- if multiple Patients are found an ERROR is raised
-- (because this legacy system allows multiple patients with the same id)
function myError(Success, ErrMsgOrReturnCode, Result, Id)
   local funcSuccess
   if Success then
      funcSuccess = true -- don't retry
      -- Function call did not throw an error
      -- but we still have to handle the function returning false
      if not ErrMsgOrReturnCode then
         if Result == 0 then
            iguana.logInfo('WARNING: Patient '..Id..' not found')
            if not iguana.isTest() then
               util.sleep(5000)
            end
            -- allow retries (we expect the patient will be added soon)
            funcSuccess = false -- retry ***NOTICE HOW RETRY IS REQUIRED FOR SUCCESSFUL FUNCTION EXECUTION***
         elseif Result > 1 then
            -- email administrator or do other processing etc.
            error('ERROR: Multiple Patients found with Id = '..Id)
         end
      else
         if Result[1].LastName:nodeValue() == 'Addams' then
            -- minor error = continue processing next message
            -- email administrator or do other processing etc.
            iguana.logError('ERROR: Addams family not welcome here')
            funcSuccess = true -- don't retry
         end
      end
   else
      if ErrMsgOrReturnCode.code == 2002 or ErrMsgOrReturnCode.code == 2006 then
         -- retry *only* for failed connection (error 2002 or 2006)
         iguana.logInfo('Retrying DB connection: '..tostring(ErrMsgOrReturnCode))
         funcSuccess = false -- retry
      elseif ErrMsgOrReturnCode == 'ERROR: 0' then
         iguana.logInfo('WARNING: Patient not found')
         if not iguana.isTest() then
            util.sleep(5000)
         end
         -- allow retries (we expect the patient will be added soon)
         funcSuccess = false -- retry
      elseif ErrMsgOrReturnCode == "ERROR: 1" then
         -- email administrator or do other processing etc.
         error('ERROR: Multiple Patients found')
      elseif ErrMsgOrReturnCode == "ERROR: 2" then
         -- minor error = continue processing next message
         -- email administrator or do other processing etc.
         iguana.logError('ERROR: Addams family not welcome here')
         funcSuccess = true -- don't retry
      else
         error('ERROR: Unknown error from function')
      end
   end
   return funcSuccess
end