Tuning a large-scale Iguana install


Many of our customers start off small and grow organically until their installations become very large.

At that point they often need to take a serious look at how they architect their overall Iguana solution for running many busy interfaces.

This tip covers several areas where I often see customers run into trouble, along with some good ways to make things work better.

Tools

The key to good tuning is empirical measurement. No two Iguana installs are exactly alike, and tuning requires looking at how Iguana performs under your specific loads, how you write your interfaces, and so on.

There are some diagnostic tools that you should be aware of. Iguana has a /debug URL that gives you access to a number of useful diagnostic screens. For example, if you have an Iguana instance on localhost with its web interface on port 6543, the URL is:

http://localhost:6543/debug

One of these is a particularly good screen for socket diagnostics:

http://localhost:6543/socket_diagnostic.html

Also try the monitor graph utility. It is a handy example of how you can collect runtime statistics about an Iguana instance over time, which can be very helpful in understanding where attention needs to be given.
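As a rough illustration of collecting statistics over time (this is not the monitor graph utility itself), here is a minimal Python sketch that periodically requests a page from an Iguana instance and records the HTTP status and response time. It assumes the example instance at http://localhost:6543 and uses the /debug URL purely as a responsiveness probe; the function names are hypothetical, and a real monitor would record richer statistics.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

# One sample: (wall-clock timestamp, HTTP status or None if unreachable, elapsed seconds).
# Helper names are hypothetical; this is not the monitor graph utility.
Sample = Tuple[float, Optional[int], float]


def fetch_status(url: str, timeout: float = 5.0) -> int:
    """Request url and return the HTTP status code of the response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def collect_samples(url: str, count: int, interval: float,
                    fetch: Callable[[str], int] = fetch_status) -> List[Sample]:
    """Poll url `count` times, `interval` seconds apart, recording status and latency."""
    samples: List[Sample] = []
    for i in range(count):
        start = time.monotonic()
        status: Optional[int]
        try:
            status = fetch(url)
        except urllib.error.HTTPError as err:
            status = err.code  # the server answered, but with an error status
        except OSError:
            status = None  # refused, timed out, DNS failure, etc.
        elapsed = time.monotonic() - start
        samples.append((time.time(), status, elapsed))
        if i < count - 1:
            time.sleep(interval)  # no sleep after the last sample
    return samples
```

For example, `collect_samples("http://localhost:6543/debug", count=60, interval=60)` takes one sample per minute for an hour. If the page requires you to log in, the recorded status will reflect that, so treat the numbers as a rough responsiveness signal rather than a full health check.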

Using an Iguana Cluster

The nice thing about Iguana is that it has next to no dependencies. It does not use Apache or IIS, and its logging/queuing system is self-contained. This makes it very easy to install. For production installs we always recommend doing a manual install, even if you are using Windows: it gives you much better control and makes controlled upgrades much easier to manage.

There are many benefits to breaking up an install into a set of smaller Iguana instances, each one siloing a group of interfaces. For example, instead of cramming 134 interfaces into one Iguana process, you can break them out across 3 Iguana instances.

You can use the Iguana global dashboard to manage the cluster and monitor all of the instances from one screen.

Smaller silos mean less activity in each process, and therefore far less risk of hitting limits imposed by the underlying operating system. You can scale out horizontally in this manner with as many Iguana servers as you need. I recommend reading Setting up a staged deployment environment using Git, which demonstrates how to promote interfaces from development into test and then into production in a way that supports high availability (you may also find Using a public repository for community collaboration helpful).

A good silo size is 50 channels per instance. It is also a good idea to apply separation of concerns. Say you have banks of LLP -> Translator channels and File -> LLP Client channels. It makes good sense to reduce the number of variables in play by keeping only one type of channel on a given instance, i.e., put all the LLP -> Translator channels on one instance and all the File -> LLP Client channels on another.

Separating channel types this way plays well with high availability strategies, since it reduces the likelihood of a problem with one type of channel affecting other, unrelated channels. If you are doing something different from what you normally do in production, such as using a different kind of transport, then it makes sense to put that into a dedicated Iguana instance.
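To make the sizing rule concrete, here is a small Python sketch that plans a silo layout: it groups channels by type (separation of concerns) and then splits each group into instances of at most 50 channels. The function name, channel names, and type labels are hypothetical, and a real plan should also weigh message volume per channel, not just channel count.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MAX_CHANNELS_PER_INSTANCE = 50  # silo size suggested above


def plan_silos(channels: List[Tuple[str, str]],
               max_per_instance: int = MAX_CHANNELS_PER_INSTANCE) -> List[List[str]]:
    """Group (channel_name, channel_type) pairs by type, then chunk each group.

    Each returned list is the set of channels for one Iguana instance; no
    instance mixes channel types or exceeds max_per_instance channels.
    (Hypothetical planning helper, not an Iguana feature.)
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for name, channel_type in channels:
        by_type[channel_type].append(name)

    instances: List[List[str]] = []
    for channel_type in sorted(by_type):  # deterministic order by type label
        names = by_type[channel_type]
        for start in range(0, len(names), max_per_instance):
            instances.append(names[start:start + max_per_instance])
    return instances
```

With 90 LLP -> Translator channels and 44 File -> LLP Client channels (134 in total), this yields three instances holding 44, 50 and 40 channels, each with a single channel type.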

Also, as you scale, it becomes important to separate your production, test and development instances of Iguana.

Note: If you are using Iguana version 5 (or earlier) you can use the Channel Manager (Iguana 5 documentation) for moving channels between servers in a staged deployment environment.

Socket management

Iguana runs as a single process, so if it is configured in a way that lets the number of open sockets grow too large, you can run into problems. First, it can slow overall performance, since the networking code in Iguana has to multiplex between all of these open socket connections. But what can really kill you is exceeding the limit on sockets allowed for the process, which is defined by the operating system.

One symptom you may see is error messages containing the words FD_SETSIZE=2048 reached.

There is some simple tuning you can do to make things work smoothly in this area. Take a look at how your LLP Listener components are configured. In Iguana 5.6.6 and earlier versions, the LLP Listener's default is never to time out socket connections; see the Connection timeout setting:

[Screenshot: the LLP Listener Connection timeout setting]

Now if you set the Connection timeout to:

[Screenshot: the Connection timeout set to 10 minutes]

Then Iguana will automatically close connections that have been inactive for 10 minutes. This stops badly behaved counterparties from holding too many open socket connections into your Iguana instance. From Iguana 5.6.7 onwards, this timeout behavior is the default.
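Conceptually, an idle timeout is a periodic sweep over open connections that closes any connection whose last activity is older than the limit. The Python sketch below models that policy with the 10-minute value from the example. It illustrates the idea only and is not Iguana's implementation; the connection IDs, timestamps and function names are hypothetical.

```python
from typing import Dict, List

IDLE_TIMEOUT_SECONDS = 10 * 60  # 10 minutes, as in the example above


def connections_to_close(last_activity: Dict[str, float], now: float,
                         idle_timeout: float = IDLE_TIMEOUT_SECONDS) -> List[str]:
    """Return the IDs of connections idle for at least idle_timeout seconds.

    last_activity maps a (hypothetical) connection ID to the time, in
    seconds, at which it last sent or received data.
    """
    return sorted(conn for conn, last in last_activity.items()
                  if now - last >= idle_timeout)


def sweep(last_activity: Dict[str, float], now: float,
          idle_timeout: float = IDLE_TIMEOUT_SECONDS) -> List[str]:
    """Drop idle connections from the table and return their IDs.

    A real server would also close the underlying sockets here.
    """
    closed = connections_to_close(last_activity, now, idle_timeout)
    for conn in closed:
        del last_activity[conn]
    return closed
```

Without a timeout, a counterparty that opens a connection and then goes silent keeps its socket forever; with the sweep, its socket is reclaimed after at most 10 minutes of inactivity.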

Another area where sockets can give you grief is HTTP(S). HTTP supports persistent connections, which can result in sockets being used up if counterparties do not close these connections. To counteract this, if you have a lot of HTTP(S) activity, we recommend using at least Iguana 5.6.3. In fact, if you are using HTTPS with Iguana, you should definitely be on Iguana 5.6.5 or above because of the Heartbleed issue with OpenSSL. Any customer using HTTPS should be ready to upgrade to patch releases, since ongoing patches are likely to be required for new versions of OpenSSL.

In Iguana 5.6.3 and above there is an environment variable called IGUANA_WEB_SOCKET_IDLE_TIMEOUT_SECONDS, which has a default value of 120 seconds. You can set this variable as low as 1 second, which might be needed in a very busy transactional environment.
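A back-of-the-envelope way to see why a shorter idle timeout helps is Little's law: the average number of open sockets is roughly the rate of new connections multiplied by how long each connection stays open. If counterparties leave persistent connections open, each socket lives for roughly the request time plus the idle timeout. The Python sketch below applies that estimate. The connection rate and request time are illustrative assumptions, the function names are hypothetical, and the 2048 limit is simply taken from the FD_SETSIZE error message mentioned earlier.

```python
FD_SETSIZE_LIMIT = 2048  # from the "FD_SETSIZE=2048 reached" error message


def estimated_open_sockets(new_connections_per_second: float,
                           request_seconds: float,
                           idle_timeout_seconds: float) -> float:
    """Little's law: open sockets ~= arrival rate x time each socket stays open.

    Assumes every client leaves its connection open until the server's idle
    timeout closes it (the worst case for persistent HTTP connections).
    """
    return new_connections_per_second * (request_seconds + idle_timeout_seconds)


def max_connection_rate(limit: int, request_seconds: float,
                        idle_timeout_seconds: float) -> float:
    """Highest sustained new-connection rate that stays under the socket limit."""
    return limit / (request_seconds + idle_timeout_seconds)
```

For example, at 20 new connections per second with half-second requests, the default 120-second idle timeout gives an estimate of about 2,410 open sockets, already over 2,048, while a 5-second timeout brings the estimate down to about 110. LLP listeners and other channels share the same process, so leave plenty of headroom below the limit.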

Even a short idle timeout may not be enough if you have extremely high transaction volumes. Say, for instance, you had 300,000 HTTPS transactions going through one Iguana instance. At that point it is pushing the limit of what a single process can manage in terms of open sockets and so on. A technique that can be used here is to divide the load into smaller silos, for example:

  1. Use the channel management features in Iguana 6 to deploy the same production channel across multiple server instances; see Setting up a staged deployment environment using Git (you may also find Using a public repository for community collaboration helpful).
    Note: If you are using an earlier version of Iguana you can use the Channel Manager (Iguana 5 documentation).
  2. Push the inbound traffic through a high-speed web proxy such as nginx, and distribute it round-robin across that Iguana cluster.

This is a good design for scaling out a very high-volume installation with a lot of HTTPS transactions.
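Round-robin distribution itself is simple: each new request goes to the next backend in a fixed rotation. The Python sketch below shows only that rotation, to illustrate what the proxy does for you; in practice nginx handles this, and the backend names and function names here are hypothetical.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in a repeating, fixed order."""
    if not backends:
        raise ValueError("need at least one backend")
    return itertools.cycle(backends)


def assign(requests: Iterable[str], backends: List[str]) -> List[Tuple[str, str]]:
    """Pair each incoming request with the backend that would serve it."""
    rotation = round_robin(backends)
    return [(req, next(rotation)) for req in requests]


# Hypothetical Iguana instances, each running the same production channel.
BACKENDS = ["iguana-a", "iguana-b", "iguana-c"]
```

Plain round-robin works here because step 1 deploys the same channel on every instance, so any instance can handle any request; it does assume the instances have roughly equal capacity.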

Optimal settings for log file sizes

Another area where we have seen problems is when a single Iguana server has too much aggregate data going through it. As the log/queue files grow into the gigabytes, they can create operational headaches. With too much data, you start to reach the limits of how much the operating system can flush out to disc. If you need to restart the server and it has to re-index the log files, that can take a long time. In addition, searching massive logs can cause spikes in memory usage, which can themselves cause problems such as exhausting system memory.

The solution is simple: divide into smaller silos. Distribute your interfaces over multiple Iguana instances. This means less log/queue data per instance, so you avoid the problems that arise when the log/queue data on any single instance grows too large. Pay careful attention to the physical storage that the log files are kept on; you will have problems if all the instances are flushing to a single physical disc.
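To keep an eye on this, you can periodically measure how much log/queue data each instance holds. The Python sketch below sums the file sizes under each instance's log directory and flags any instance above a threshold. The instance names, directory paths, function names and the 5 GB default threshold are all hypothetical; check where your own instances keep their logs and choose a threshold from your own measurements.

```python
import os
from typing import Dict, List

GIGABYTE = 1024 ** 3


def directory_size_bytes(path: str) -> int:
    """Total size of all files under path, recursively (0 if path is missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file rotated or removed while we were scanning
    return total


def oversized_instances(log_dirs: Dict[str, str],
                        threshold_bytes: int = 5 * GIGABYTE) -> List[str]:
    """Return the names of instances whose log directory exceeds threshold_bytes.

    log_dirs maps a (hypothetical) instance name to its log directory.
    """
    return sorted(name for name, path in log_dirs.items()
                  if directory_size_bytes(path) > threshold_bytes)
```

An instance that keeps showing up in this list is a candidate for having some of its interfaces moved to a new silo.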

How does Iguana use multiple cores?

This is a question that comes up often when clients are deciding which servers to buy. Iguana does make some use of threads. The following components run on their own threads:

  1. Each translator instance.
  2. To/From Legacy Database components.
  3. To/From File components.

So more cores let Iguana achieve some parallelization. However, the main thread in Iguana is responsible for coordinating messaging between the threads, so it can become a bottleneck in some circumstances. One way to make more use of the CPU resources on such a machine is to run more than one Iguana instance on it.
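One way to reason about why extra cores give diminishing returns when a single coordinating thread is involved is Amdahl's law: if a fraction s of the work must pass through one serial thread, the speedup on n cores is at most 1 / (s + (1 - s) / n). This is a general model borrowed to illustrate the point, not a measurement of Iguana; the serial fraction used below is a made-up value, and you would need to measure your own workload.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when serial_fraction of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


def speedup_limit(serial_fraction: float) -> float:
    """Speedup bound as the number of cores grows without limit: 1 / serial_fraction."""
    return float("inf") if serial_fraction == 0 else 1.0 / serial_fraction
```

With an assumed serial fraction of 0.2, going from 4 to 16 cores raises the bound only from 2.5x to 4x, and it can never exceed 5x. Running several Iguana instances sidesteps this, since each instance has its own main thread.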

Server optimization will vary greatly depending on the nature of the interfaces and type of data you are running through the machine.

The monitor graph utility mentioned earlier is helpful for measuring the actual performance of an Iguana instance and getting a feel for where the bottlenecks are. In practice we have found that Iguana is rarely CPU bound; it is more important to spend money on high-quality discs to reduce disc contention.
