How do I handle retry logic for unreliable external resources?

Customize the retry logic using an error function


Using an error function is very flexible: it lets you completely customize the retry behaviour to match your needs. The examples on this page expand on the database code from the second page. They use the errorfunc parameter of retry.call{} to supply an error function, which processes the returns and/or errors from the retried function.

The first example is a simple but useful function that overcomes a shortcoming of the original code: it retries for any error returned, not just connection errors. For example, consider a scenario where you are generating an SQL query and under some (unforeseen) circumstances the query is malformed. With the original code this malformed query would be retried 1000 times (with no chance of success), whereas the appropriate action could be to raise an error and stop the channel, or to log an error and continue processing the next query (or perform some other action…).

The second example shows the power and flexibility of using an error function. It extends the first example by retrying for some other errors, retrying for some non-error conditions by testing a business rule (patient does not exist yet), and allowing processing to continue (rather than retrying or raising an error) for non-fatal errors.

How the error function works

There are two types of “error” that can occur when a function is called:

  1. The function does not complete and raises an error
  2. The function completes and returns false or an error code

The retry.call() function uses pcall() to trap errors raised by the “retried function”, which catches the first error type. The second error type is “caught” by checking the function’s return value(s). On its own, retry.call() only does a basic check to see if the retried function returns false; for more sophisticated checking you need an error function. So that the error function can handle both types of errors, retry.call() always calls the error function if one is supplied.
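
The interplay between pcall(), the basic false check and the error function can be sketched in a few lines. This is a simplified, hypothetical stand-in written for illustration only: simpleRetry and pack are invented names, it is not the code of the real retry module, and pausing and logging are left out. Raising an error once the attempts run out is also an assumption.

```lua
-- Hypothetical, simplified stand-in for retry.call{}; NOT the real retry module.
-- Pausing between attempts and logging are left out.
local unpack = table.unpack or unpack  -- Lua 5.2+ / Lua 5.1

-- collect all return values, keeping a count so nils are not lost
local function pack(...)
   return {n = select('#', ...), ...}
end

function simpleRetry(P)
   for attempt = 1, P.retry do
      -- pcall() traps raised errors (type 1): it returns false plus the error
      local results = pack(pcall(P.func, P.arg1))
      local done
      if P.errorfunc then
         -- an error function, if supplied, is always consulted
         done = P.errorfunc(unpack(results, 1, results.n))
      else
         -- basic check (type 2): no error raised and no false returned
         done = results[1] and results[2] ~= false
      end
      if done then
         return attempt, unpack(results, 2, results.n)
      end
   end
   -- assumption: give up by raising an error once the attempts run out
   error('simpleRetry: gave up after '..P.retry..' attempts', 0)
end

-- demo: fails twice with a raised error, then succeeds
local calls = 0
local function flaky()
   calls = calls + 1
   if calls < 3 then error('connection lost', 0) end
   return true, 'row data'
end

local attempts = simpleRetry{func = flaky, retry = 5,
   errorfunc = function(Success) return Success end}
print(attempts) -- 3
```

Note that the error function receives the results of pcall() unchanged: the first argument says whether an error was raised, and the remaining arguments are either the error value or the retried function's own returns.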

The error function needs to do the following things:

  1. Mandatory: Handle both types of errors from the retried function, by doing one of the following:
    • Raising an error for “fatal” errors (which will stop the channel)
    • Allowing the function to be retried for “recoverable” errors
    • Skipping processing the message for non-fatal “non-recoverable” errors
  2. Mandatory: Return a boolean “success” value to indicate whether the function should be retried:
    • Retry: For “recoverable” errors you would return false
    • Don’t retry: For non-fatal “non-recoverable” errors return true
  3. Best Practice: Create appropriate log entries, errors (iguana.logError) or informational (iguana.logInfo).
  4. Best Practice: Email an administrator to deal with issues, for example:
    • Before raising an error for “fatal” errors.
    • To follow up on skipped messages (non-fatal “non-recoverable” errors).
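
The responsibilities listed above can be sketched as a small factory that separates the decision (classify) from the action taken. This is a minimal sketch, not part of the retry module: makeErrorFunc, classify, log and the 'unwelcome patient' message are hypothetical names invented for the example. Inside Iguana the log argument could be iguana.logInfo, and an email step could be added before the final error() call.

```lua
-- Hypothetical helper: builds an error function from two caller-supplied parts.
-- classify(Success, ...) must return 'ok', 'retry', 'skip' or 'fatal'.
-- log(message) records what happened.
function makeErrorFunc(classify, log)
   return function(Success, ...)
      local verdict = classify(Success, ...)
      local detail = tostring((...))   -- first value after Success
      if verdict == 'ok' then
         return true                   -- worked: don't retry
      elseif verdict == 'retry' then
         log('Retrying: '..detail)
         return false                  -- recoverable: retry
      elseif verdict == 'skip' then
         log('Skipping message: '..detail)
         return true                   -- non-fatal, non-recoverable: don't retry
      end
      error('Fatal: '..detail, 0)      -- fatal: raise an error
   end
end

-- demo with the MySQL connection error codes used on this page
local messages = {}
local errFunc = makeErrorFunc(
   function(Success, Err)
      if Success and Err ~= false then return 'ok' end
      if type(Err) == 'table' and (Err.code == 2002 or Err.code == 2006) then
         return 'retry'
      end
      if Err == 'unwelcome patient' then return 'skip' end
      return 'fatal'
   end,
   function(msg) messages[#messages + 1] = msg end)

print(errFunc(true, true))                  -- true
print(errFunc(false, {code = 2006}))        -- false
print(errFunc(false, 'unwelcome patient'))  -- true
print(#messages)                            -- 2
```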

Usage scenarios for error functions

There are two scenarios. The first is a simple case where we only want to retry disconnection errors; the second allows for retrying other errors as well (errors that can occur without causing a disconnection).

Simple scenario: Retry only lost connections:

Because we are only retrying for lost connections there is no need to check for open connections (as we do in the second scenario). This means that the retried function can simply open and close a connection each time it is called.

The error function then does the following:

  1. If the retried function ran successfully then it prevents retries
  2. Otherwise it checks if the connection was broken and initiates a retry
  3. In all other cases it raises an error

Complex scenario: Retry various error types:

Because we are retrying for various errors (not just disconnections), we may not need to open a new connection each time. This means that the retried function must check the connection and only open a new one if the old one has died. It is also best practice to close the open connection at the end of main().

The error function then does the following:

  1. If the retried function ran successfully then it prevents retries
  2. Otherwise it checks if the connection was broken and initiates a retry
  3. Then it does custom error handling based on the function return
  4. In all other cases it raises an error
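
The connection handling described for the complex scenario (reuse a live connection, reconnect only when it has died) can be tried outside Iguana with a stub. In this sketch fakeConnect, openCount and getConnection are hypothetical names; fakeConnect merely imitates the check() and close() methods that the code on this page calls on a real connection.

```lua
-- Hypothetical stand-in for db.connect{} so the sketch runs outside Iguana
openCount = 0
function fakeConnect()
   openCount = openCount + 1
   local c = {alive = true}
   function c:check() return self.alive end
   function c:close() self.alive = false end
   return c
end

conn = nil -- shared between calls to the retried function

function getConnection()
   -- only create a new connection if the old one is missing or dead
   if not conn or not conn:check() then
      conn = fakeConnect()
   end
   return conn
end

getConnection()
getConnection()   -- reuses the live connection
print(openCount)  -- 1
conn:close()      -- simulate a lost connection
getConnection()   -- detects the dead connection and reconnects
print(openCount)  -- 2
```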

Simple scenario: Retry only lost connections

This example overcomes the shortcoming of the original database retry code described in the introduction: it retries for any error returned, not just connection errors. A malformed SQL query, for instance, would be retried 1000 times with no chance of success, whereas the error function below retries only lost connections and raises an error for everything else.

Import the Retry_Simple_From_Translator.zip project into a From Translator component; it contains all the necessary code.

Otherwise you can paste in the code below for the main() module:
Note: If you don’t already have the retry module you can download it from our code repository.

local retry = require 'retry'

function main(Data)
   local R, R2, M = retry.call{func=DoInsert, retry=1000, pause=10, funcname='DoInsert', errorfunc=myError}
end

function DoInsert(T)
   -- call db.connect each time Iguana polls
   -- (local: the connection is opened and closed within this call)
   local conn = db.connect{
      api=db.MY_SQL,
      user='root',
      password='', -- no password
      name='test',
      live=true
   }
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient')

   -- housekeeping (more efficient than garbage collection)
   if conn and conn:check() then conn:close() end 
   
   return R
end

function myError(Success, ErrMsgOrReturnCode)
   local funcSuccess
   if Success then
      -- successfully read the data
      funcSuccess = true -- don't retry
   else
      -- these are MySQL error codes - they will be different for other databases
      if ErrMsgOrReturnCode.code == 2002 or ErrMsgOrReturnCode.code == 2006 then
         -- retry *only* for failed connection (error 2002 or 2006)
         iguana.logInfo('Retrying DB connection: '..tostring(ErrMsgOrReturnCode))
         funcSuccess = false -- retry
      else
         -- raise an error for all other DB issues (error ~= 2002 or 2006)
         error(tostring(ErrMsgOrReturnCode))
      end
   end
   return funcSuccess
end

Complex scenario: Retry various error types

For this example we are dealing with an (imaginary) legacy system that sometimes sends patient updates or requests before inserting patient records, so we need to retry if a patient record is not found. The system is also able to create multiple patients with the same Id; in this case we raise an error (because user intervention is required to identify the patient).

To demonstrate customized behaviour for both errors and returns we created two similar DoInsert functions (DoInsert_error and DoInsert_return), and used a single error function (myError) to handle both (you could use two error functions if desired).

To demonstrate a successful function call that requires a retry, the DoInsert_return function returns false if the patient cannot be identified (zero or multiple patients found). Because patient updates and requests can be sent before the patient is inserted, we need to retry the call when no patient is found.

We also use an (admittedly contrived) example to demonstrate how to trap a minor error and continue processing. Unfortunately the Addams family is not welcome at our facility. To identify them, DoInsert_error raises an error if the surname “Addams” is detected. The error function checks for this error, makes a log entry indicating they are not welcome, and then allows processing to continue on to the next message. Perhaps an error alert for a potential terrorist might be a more realistic example.

Import the Retry_Complex_From_Translator.zip project into a From Translator component; it contains all the necessary code.

Otherwise you can paste in the code below for the main() module:
Note: If you don’t already have the retry module you can download it from our code server.

local retry = require 'retry'

function main(Data)
   local R, R2, M = retry.call{func=DoInsert_return, retry=1000, pause=10,
      funcname='DoInsert_return', arg1=1, errorfunc=myError}
   
   local R, R2, M = retry.call{func=DoInsert_error, retry=1000, pause=10,
      funcname='DoInsert_error', arg1=1, errorfunc=myError}

   -- housekeeping close open connection if it exists
   -- (more efficient than waiting for garbage collection)
   if conn and conn:check() then conn:close() end
end

function DoInsert_return(Id)
   -- only create new connection if old is dead
   if not conn or not conn:check() then
      conn = db.connect{
         api=db.MY_SQL,
         user='root',
         password='', -- no password
         name='test',
         live=true
      }
   end
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient WHERE Id ='..Id)
   
   if #R==1 then
      -- if one patient is found then return patient data
      return true, R
   else
      -- ERROR if no patient or more than one patient is found
      -- return false and count of rows found
      return false, #R, Id
   end
end


-- errors raised are "intentionally obscure" and are
-- "translated" to more friendly messages in myError()
function DoInsert_error(Id)
   -- only create new connection if old is dead
   if not conn or not conn:check() then
      conn = db.connect{
         api=db.MY_SQL,
         user='root',
         password='', -- no password
         name='test',
         live=true
      }
   end
   
   -- NOTE: query for testing purposes only
   -- replace the query with your select/insert/update code
   local R = conn:query('SELECT * FROM patient WHERE Id ='..Id)
   
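   -- errors are raised with level 0 so that Lua does not prepend position
   -- information and myError() can compare the messages with ==
   if #R==1 then
      -- if one patient is found then return patient data,
      -- unless the surname is 'Addams'
      if R[1].LastName:nodeValue() == 'Addams' then
         error('ERROR: 2', 0)
      end
      return true, R
   elseif #R==0 then
      -- ERROR if the patient is not found
      error('ERROR: 0', 0)
   elseif #R>1 then
      -- ERROR if more than one patient is found
      error('ERROR: 1', 0)
   end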
end

-- handles errors raised and function returns (could split into two functions)
-- if a Patient is not found retries are allowed 
-- (because this legacy system allows updates before adding a patient)
-- if multiple Patients are found an ERROR is raised
-- (because this legacy system allows multiple patients with the same id)
function myError(Success, ErrMsgOrReturnCode, Result, Id)
   local funcSuccess
   if Success then
      funcSuccess = true -- don't retry
      -- Function call did not throw an error
      -- but we still have to handle the function returning false
      if not ErrMsgOrReturnCode then
         if Result == 0 then
            iguana.logInfo('WARNING: Patient '..Id..' not found')
            if not iguana.isTest() then
               util.sleep(5000)
            end
            -- allow retries (we expect the patient will be added soon)
            funcSuccess = false -- retry ***NOTICE HOW RETRY IS REQUIRED FOR SUCCESSFUL FUNCTION EXECUTION***
         elseif Result > 1 then
            -- email administrator or do other processing etc.
            error('ERROR: Multiple Patients found with Id = '..Id)
         end
      else
         if Result[1].LastName:nodeValue() == 'Addams' then
            -- minor error = continue processing next message
            -- email administrator or do other processing etc.
            iguana.logError('ERROR: Addams family not welcome here')
            funcSuccess = true -- don't retry
         end
      end
   else
      if ErrMsgOrReturnCode.code == 2002 or ErrMsgOrReturnCode.code == 2006 then
         -- retry *only* for failed connection (error 2002 or 2006)
         iguana.logInfo('Retrying DB connection: '..tostring(ErrMsgOrReturnCode))
         funcSuccess = false -- retry
      elseif ErrMsgOrReturnCode == 'ERROR: 0' then
         iguana.logInfo('WARNING: Patient not found')
         if not iguana.isTest() then
            util.sleep(5000)
         end
         -- allow retries (we expect the patient will be added soon)
         funcSuccess = false -- retry
      elseif ErrMsgOrReturnCode == "ERROR: 1" then
         -- email administrator or do other processing etc.
         error('ERROR: Multiple Patients found')
      elseif ErrMsgOrReturnCode == "ERROR: 2" then
         -- minor error = continue processing next message
         -- email administrator or do other processing etc.
         iguana.logError('ERROR: Addams family not welcome here')
         funcSuccess = true -- don't retry
      else
         error('ERROR: Unknown error from function')
      end
   end
   return funcSuccess
end
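
A note on comparing error messages: tests such as ErrMsgOrReturnCode == 'ERROR: 0' rely on the message reaching the error function unchanged. In standard Lua, error() called with a string and the default level prepends position information to the message, while level 0 passes it through as is. Whether the Translator or the retry module alters messages is not covered here, so treat the following as a sketch to try in your own environment. The helper isErrorCode is a hypothetical name.

```lua
-- default level: standard Lua prepends "chunk:line:" to a string message
local _, withPosition = pcall(function() error('ERROR: 0') end)
-- level 0: the message is passed through unchanged
local _, plain = pcall(function() error('ERROR: 0', 0) end)

print(plain == 'ERROR: 0')  -- true

-- Hypothetical helper: true when Msg is a string ending with Code
function isErrorCode(Msg, Code)
   return type(Msg) == 'string' and Msg:sub(-#Code) == Code
end

print(isErrorCode(withPosition, 'ERROR: 0'))   -- true
print(isErrorCode(plain, 'ERROR: 0'))          -- true
print(isErrorCode({code = 2006}, 'ERROR: 0'))  -- false
```

Matching on the end of the message works with or without the position prefix, and the type check keeps the helper safe when the error value is a database error table rather than a string.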