Duplicate message filter

If you ever need to write an efficient filter to eliminate duplicate messages, this may be what you are looking for.

The dup module was put together by Jon Scalise with some suggestions from me on making a more efficient algorithm than an earlier duplicate script I wrote.

It requires Iguana 5.6.12 or above.

When the script starts up it queries the Iguana log for the last N messages, where N is a number you specify. From then on it maintains a list of MD5 hashes for the last N messages in first in, first out (FIFO) order. A message is a duplicate if its MD5 hash matches the MD5 hash of a previous message in that list.

It’s very efficient. There is no disk I/O other than the log query when the script starts up, which happens when you begin running the channel. It uses a linked list to maintain the FIFO buffer, so adding a new hash and dropping the oldest one cost the same no matter how many messages are being tracked. A hash table is used for the lookups, so on average checking a message takes roughly constant time regardless of N.

The only real overhead is memory usage, which isn’t too bad since we are storing MD5 hashes rather than the full text of each message.

The script may not exactly fit your purposes if you have messages which are similar in content but not identical, since any difference at all in the text produces a different MD5 hash.
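To see why near-duplicates slip through, here is a small Python illustration of the hashing step on its own. The two sample messages are made up for this example and differ only in a timestamp; `md5_hex` is a hypothetical helper, not part of the dup module.

```python
import hashlib


def md5_hex(message):
    # Hex MD5 digest of the message text (always 32 characters long).
    return hashlib.md5(message.encode("utf-8")).hexdigest()


# Made-up sample messages: same content, different send time.
first = "PATIENT=Smith;TEST=CBC;SENT=10:00:01"
resend = "PATIENT=Smith;TEST=CBC;SENT=10:00:02"

same_text_matches = md5_hex(first) == md5_hex(first)
near_duplicate_matches = md5_hex(first) == md5_hex(resend)
```

Here `same_text_matches` is true and `near_duplicate_matches` is false, so a filter that compares hashes of the whole message treats the resend as a new message.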

You can import the example channel using the Channel Manager or just go and grab the dup module from our code repository, or directly from the community GitHub repository:

https://github.com/interfaceware/iguana-web-apps/blob/master/shared/dup.lua

This file shows how to use the module in your own channel:

https://github.com/interfaceware/iguana-web-apps/blob/master/Duplicate%20filter_filter/main.lua

You can discuss this module in this forum thread:

http://help.interfaceware.com/forums/topic/easy-way-to-filter-out-duplicate-messages