Using rxsub()
Tip: It is very easy to adapt PHP examples for Iguana because rxsub() corresponds to preg_filter and preg_replace.
- The first two examples are very simple: replace a word or a phrase. As you can see, in this case there is no difference between the string.gsub() and string.rxsub() syntax.
- Replace a word:
```lua
-- match and replace a word

-- using gsub
local s = "hello world my name is ..."
x = s:gsub("world", "everyone")
--> hello everyone my name is ...

-- using rxsub
local s = "hello world my name is ..."
x = s:rxsub("world", "everyone")
--> hello everyone my name is ...
```
- Replace a phrase:
```lua
-- match and replace a phrase

-- using gsub
local s = "hello everyone my name is ... "
x = s:gsub("hello everyone", "Hello World")
--> Hello World my name is ...

-- using rxsub
local s = "hello everyone my name is ... "
x = s:rxsub("hello everyone", "Hello World")
--> Hello World my name is ...
```
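One detail is easy to miss in the gsub examples above: standard Lua's string.gsub() returns two values, the new string and the number of substitutions made. The following is a minimal sketch in plain Lua. It covers gsub only, because this page does not state what rxsub() returns beyond the string, and the variable names are purely illustrative.

```lua
local s = "hello world my name is ..."

-- Assigning to a single variable keeps the string and drops the count
local x = s:gsub("world", "everyone")

-- Assigning to two variables also captures the number of substitutions
local y, n = s:gsub("o", "0")

-- Wrapping the call in parentheses truncates it to one value, which
-- matters when the result is passed directly to another function
local only = (s:gsub("world", "everyone"))
```

The parenthesised form is handy when the result goes straight into a call such as a logging or trace function, where the extra count would otherwise be passed as a second argument.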
- This next example shows how to collapse repeated spaces or whitespace characters into a single space.
Note: A space is just the space character " ", whereas whitespace includes all spacing characters (space, tab, newline, carriage return, vertical tab).
```lua
-- remove duplicate spaces

-- using gsub
local s = "hello   world  I was    here"
x = s:gsub(" +", " ")       -- replace multiple spaces with a single space
--> hello world I was here

local s = "hello world I \t was \r\n here"
x = s:gsub("%s+", " ")      -- replace *any* run of whitespace characters with a single space
--> hello world I was here

-- using rxsub
local s = "hello   world  I was    here"
x = s:rxsub(" +", " ")      -- replace multiple spaces with a single space
--> hello world I was here

local s = "hello world I \t was \r\n here"
x = s:rxsub("\\s+", " ")    -- replace *any* run of whitespace characters with a single space
--> hello world I was here
```
- This example collapses multiple spaces or whitespace characters after a full stop (period) at the end of a sentence.
```lua
-- collapse multiple spaces after a full stop

-- using gsub
local s = "Hello world.   I was here."
x = s:gsub("%. +", ". ")        -- replace multiple spaces with a single space
--> Hello world. I was here.

local s = "Hello world. \t\r\n I was here."
x = s:gsub("%.%s+", ". ")       -- replace *any* run of whitespace characters with a single space
--> Hello world. I was here.

-- using rxsub
local s = "Hello world.   I was here. "
x = s:rxsub("\\. +", ". ")      -- replace multiple spaces with a single space
--> "Hello world. I was here. "

local s = "Hello world. \t\r\n I was here. "
x = s:rxsub("\\.\\s+", ". ")    -- replace *any* run of whitespace characters with a single space
--> "Hello world. I was here. "
```
- This example demonstrates the use of a capture to duplicate words. To create a capture, enclose a phrase or pattern in parentheses; you can then refer to it as $1 to $9 (regex) or %1 to %9 (Lua pattern). You can also use $0 or %0 to refer to the whole match. If the pattern has no explicit capture, string.gsub() treats the whole match as %1 (equivalent to %0 in this case). We prefer %0 because it is more obvious and is consistent with regex (see the example below).
```lua
-- using gsub
local s = "hello world"
x = s:gsub("(%w+)", "%1 %1")     -- %1 = first capture
x = s:gsub("(%w+)", "%0 %0")     -- %0 = whole match
x = s:gsub("(%w+)", "%1 %0")     -- mixed = same result
x = s:gsub("%w+", "%0 %0")       -- %0 = whole match
x = s:gsub("%w+", "%1 %1")       -- no explicit capture: %1 behaves like %0
                                 -- (the equivalent regex below fails, so this is not recommended)
--> x="hello hello world world"

-- using rxsub
local s = "hello world"
x = s:rxsub("(\\w+)", "$1 $1")   -- $1 = first capture
x = s:rxsub("(\\w+)", "$0 $0")   -- $0 = whole match
x = s:rxsub("\\w+", "$0 $0")     -- $0 = whole match
--> x="hello hello world world"
x = s:rxsub("\\w+", "$1 $1")     -- $1 fails without an explicit capture
                                 -- (the equivalent Lua pattern above works)

-- notice how %0 or $0 differs from %1, %2 or $1, $2 when using multiple captures

-- using gsub
local s = "hello world"
x = s:gsub("(%w+) (%w+)", "%0 %0")       -- %0 = whole match
--> hello world hello world
x = s:gsub("(%w+) (%w+)", "%1 %1")       -- %1 = first capture
--> hello hello

-- using rxsub
local s = "hello world"
x = s:rxsub("(\\w+) (\\w+)", "$0 $0")    -- $0 = whole match
--> hello world hello world
x = s:rxsub("(\\w+) (\\w+)", "$1 $1")    -- $1 = first capture
--> hello hello
```
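Captures are not limited to replacement strings. In standard Lua, string.gsub() also accepts a function or a table as the replacement, and the captures (or the whole match when there are none) are passed to it. The sketch below uses plain Lua only; whether rxsub() accepts anything other than a string is not covered on this page, so treat this as a gsub-only technique. The variable names are illustrative.

```lua
local s = "hello world"

-- Function replacement: each capture arrives as a separate argument
local capitalised = s:gsub("(%a)(%w*)", function(first, rest)
  return first:upper() .. rest
end)

-- Table replacement: the whole match is used as the lookup key,
-- and words with no entry in the table are left unchanged
local swapped = s:gsub("%w+", { hello = "goodbye" })
```

Here `capitalised` holds "Hello World" and `swapped` holds "goodbye world".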
- A more useful application of captures is removing duplicate words. Notice that string.rxsub() can remove a word repeated any number of times but string.gsub() cannot: PCRE allows a group containing a back-reference to be quantified (with * or +), whereas Lua patterns do not allow quantifiers on captures.
```lua
-- remove duplicate words

-- using gsub
local s = "hello hello world world I was here"
x = s:gsub("(%w+)%s+%1", "%1")       -- removes a single duplicate only (not multiple repeats like PCRE below)
--> hello world I was here
x = s:gsub("(%w+)(%s+%1)+", "%1")    -- Lua cannot quantify a capture so THIS DOES NOT WORK

-- using rxsub
local s = "hello hello hello world world I was here"
x = s:rxsub("(\\b\\w+\\b)(\\W+\\1)+", "$1")   -- using a word boundary \b
--> hello world I was here
-- the PCRE regex removes multiple repeats (unlike gsub)
```
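If only string.gsub() is available, one possible workaround for the limitation described above is to apply the single-duplicate pattern repeatedly until nothing changes. This is a sketch rather than an established recipe: the function name `remove_repeated_words` is hypothetical, and it assumes the frontier pattern `%f` is available (it is documented from Lua 5.2 and present but undocumented in Lua 5.1). The frontiers stand in for the `\b` word boundaries in the regex.

```lua
-- Hypothetical helper: removes words repeated any number of times
local function remove_repeated_words(s)
  local count
  repeat
    -- %f[%w] and %f[%W] mark the start and end of a word, so that
    -- "this is" or "the then" are not mistaken for duplicates
    s, count = s:gsub("%f[%w](%w+)%s+%1%f[%W]", "%1")
  until count == 0
  return s
end

local cleaned = remove_repeated_words("hello hello hello world world I was here")
--> hello world I was here
```

Each pass removes one repetition per run of duplicates, so the loop runs once more than the longest run needs and then stops when gsub reports zero substitutions.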
- Suppose we occasionally receive HL7 messages with an extra "|" character before the encoding characters ("MSH||^~\&"). We can remove it by using a "^" anchor to check and fix the start of the message.
Note: You might think the anchor is overkill, but it will prevent matching things like embedded HL7 messages.
```lua
-- Remove extra bar "|" character
-- Note: The use of the [[<string>]] syntax to reduce escaping
--       i.e., for gsub() [[MSH|^~\&|]] rather than 'MSH|^~\\&|'

-- using gsub
local s = [[MSH||^~\&|iNTERFACEWARE|Lab|]]    -- partial HL7 message for brevity
x = s:gsub([[^MSH||^~\&|]], [[MSH|^~\&|]])
--> MSH|^~\&|iNTERFACEWARE|Lab|

-- using rxsub
local s = [[MSH||^~\&|iNTERFACEWARE|Lab|]]    -- partial HL7 message for brevity
x = s:rxsub([[^MSH\|\|\^~\\&\|]], [[MSH|^~\\&|]])
--> MSH|^~\&|iNTERFACEWARE|Lab|
```
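The gsub pattern above works unescaped because none of the HL7 delimiters happen to be magic characters in the positions where they appear. When the literal text is not known in advance, a small helper can escape it first. The sketch below is plain Lua; `escape_pattern` is a hypothetical name, not an Iguana function, and it relies on the Lua rule that `%` followed by any punctuation character matches that character literally.

```lua
-- Hypothetical helper: make arbitrary text safe to use as a Lua pattern
local function escape_pattern(text)
  -- prefix every punctuation character with "%"
  return (text:gsub("%p", "%%%0"))
end

local bad_header = [[MSH||^~\&|]]
local pattern = "^" .. escape_pattern(bad_header)   -- anchor added after escaping

local s = [[MSH||^~\&|iNTERFACEWARE|Lab|]]           -- partial HL7 message for brevity
local fixed = s:gsub(pattern, [[MSH|^~\&|]])
--> MSH|^~\&|iNTERFACEWARE|Lab|
```

The anchor is concatenated after escaping so that it keeps its special meaning. The replacement string needs no escaping here because it contains no `%` character.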
- We can also use a "$" anchor to add a full stop (period) at the end of a string.
```lua
-- put a full stop at the end of the string

-- using gsub
local s = "Hello world I was here"
x = s:gsub('[^.]$', '%0.')
--> Hello world I was here.

-- using rxsub
local s = "Hello world I was here"
x = s:rxsub('[^.]$', '$0.')
--> Hello world I was here.
```
- This example demonstrates the use of multiple captures to reverse the order of consecutive words. Notice the use of the POSIX class [:alpha:] with string.rxsub() to match alphabetic characters.
```lua
-- reverse the order of two consecutive words

-- using gsub
local s = "one two three four 5 6"
x = s:gsub("(%w+)%s*(%w+)", "%2 %1")        -- %w for alphanumeric, %s* matches multiple spaces
--> x="two one four three 6 5"              -- but multiple spaces are not included in the result

local s = "one two three four 5 6"
x = s:gsub("(%w+)(%s*)(%w+)", "%3%2%1")     -- 3 captures will include spaces in the result
--> x="two one four three 6 5"

local s = "one two three four 5 6"
x = s:gsub("(%a+)%s*(%a+)", "%2 %1")        -- %a for alphabetic only
--> x="two one four three 5 6"

-- using rxsub
local s = "one two three four 5 6"
x = s:rxsub("(\\w+)\\s*(\\w+)", "$2 $1")    -- \w for alphanumeric, \s* matches multiple spaces
--> x="two one four three 6 5"

local s = "one two three four 5 6"
x = s:rxsub("(\\w+)(\\s*)(\\w+)", "$3$2$1") -- 3 captures will include spaces in the result
--> x="two one four three 6 5"

local s = "one two three four 5 6"
x = s:rxsub("([[:alpha:]]+)\\s*([[:alpha:]]+)", "$2 $1")  -- [:alpha:] (POSIX class) for alphabetic only
--> x="two one four three 5 6"
```
- Convert URLs in text to hyperlinks:
Note: This will not find "shorthand" URLs like "www.google.com"; a URL needs to start with https:// (or http://, ftp:// or ftps://).
```lua
-- convert URLs to hyperlinks
local r = [[(http|https|ftp|ftps)\://[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(/[^\s,;:(?:. )]*)?]]  -- URL regex

-- text to match and update
local s = 'We should match https://css-tricks.com/snippets/php/find-urls-in-text-make-links/, '..
          'this https://gist.github.com/dperini/729294, this http://php.net/manual/en/book.pcre.php '..
          'and this https://www.google.com, but not this www.google.com (without the https://)'

if (s:rxmatch(r)) then
   x = s:rxsub(r, "<a href=$0>$0</a>")   -- create hyperlinks
end
```
- Replace words in another script, such as Greek or Cyrillic, by using Unicode script properties.
Note: You could call a translation web service rather than substituting "<Greek word>".
```lua
-- Replace words in a foreign language
local s = "Hello world in Greek Γειά σου Κόσμε (from google translate)"
x = s:rxsub([[\p{Greek}+]], '<Greek word>', 'u')   -- use a call to a translation web service instead of "<Greek word>"
trace(x)
```
Continue: PCRE Samples