Introduction
Lua patterns give a small set of characters, called magic characters, a special meaning. They let you express repetition, anchoring, captures, and sets of characters when pattern matching.
Task
Explain how to use magic characters for Lua pattern matching.
Implementation
The magic characters are:
( ) . % + - * ? [ ] ^ $
In addition, Lua provides the following character classes, each introduced by the magic character %:
- %a letters
- %c control characters
- %d digits
- %l lowercase letters
- %p punctuation characters
- %s whitespace characters
- %u uppercase letters
- %w alphanumeric characters
- %x hexadecimal digits
- %z the character \000 (Lua 5.1; deprecated from Lua 5.2 onward, where a pattern may contain \0 directly)
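As a rough illustration of the classes listed above, the sketch below runs a few of them through the standard string.find, string.match, and string.gsub functions. The sample string and the variable names are invented for this example.

```lua
-- Sample input (made up for illustration)
local s = "Lua 5.1 was released in 2006!"

-- %d matches a single digit; string.find returns the start and end positions
local first_digit_pos = string.find(s, "%d")

-- %u matches a single uppercase letter
local first_upper = string.match(s, "%u")

-- %p matches a single punctuation character
local first_punct = string.match(s, "%p")

-- %s matches a single whitespace character; the extra parentheses
-- discard the replacement count that gsub returns as a second value
local underscored = (string.gsub(s, "%s", "_"))

print(first_digit_pos, first_upper, first_punct)
print(underscored)
```

Each class matches exactly one character; the repetition characters described in the next section are what let a class match a longer run.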
How it works
This section explains what each of the magic characters does. It also explains how to work with sets of characters.
The magic characters:
- “( )”
- Represents what is called a capture. Enclosing a sub-pattern in parentheses makes the text it matched available as a separate result
- “.”
- Represents any single character
- If you want the literal . character then you have to escape it with the % character: %.
- “%”
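- This is the escape character. Followed by a letter it introduces a character class (for example %d), and followed by a magic character it makes that character literal (for example %.)
- To match a literal % character you must write %% in the pattern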
- “+”
- Matches 1 or more repetitions of the class. This will always match the longest possible chain.
- Example Usage: %w+
- “-”
- Matches 0 or more repetitions of the class. This will always match the shortest possible chain
- Example Usage: %d-
- “*”
- Matches 0 or more repetitions of the class. This will always match the longest possible chain
- Example Usage: %l*
- “?”
- Matches 0 or 1 occurrence of the class
- Example Usage: %a?
- “^”
- Outside of a set, this is only a magic character when it is at the beginning of a pattern.
- When it is at the beginning of a pattern it forces the pattern to match at the start of the string
- Example Usage: ^A.+ will match a string which begins with the character A followed by at least one more character
- “$”
- This is only a magic character when it is at the end of a pattern.
- When it is at the end of a pattern it forces the pattern to match at the end of the string
- Example Usage: %w%.$ will match an alphanumeric character followed by a . character, but only when that . is the last character of the string
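The behaviours described above can be sketched with the standard string.match and string.find functions. This is only a minimal illustration; the sample strings and variable names are assumptions made for the example.

```lua
-- Captures: each pair of parentheses returns the text it matched
local key, value = string.match("key = value", "(%w+)%s*=%s*(%w+)")

-- "*" is greedy (longest match), "-" is lazy (shortest match)
local html = "<b>bold</b>"
local greedy = string.match(html, "<.*>")   -- runs to the last ">"
local lazy   = string.match(html, "<.->")   -- stops at the first ">"

-- "?" makes the preceding character optional
local us = string.match("color", "colou?r")
local uk = string.match("colour", "colou?r")

-- "^" anchors the match to the start of the string
local anchored_start = string.find("Apple", "^A.+")
local not_at_start   = string.find("An Apple", "^pple")

-- "$" anchors the match to the end of the string
local at_end     = string.match("end.", "%w%.$")
local not_at_end = string.match("end. more", "%w%.$")

-- "%" escapes magic characters: %% is a literal %, %. is a literal .
local percent   = string.match("50% off", "%d+%%")
local extension = string.match("file.txt", "%.%a+")

print(key, value)
print(greedy, lazy)
print(us, uk)
print(anchored_start, not_at_start)
print(at_end, not_at_end)
print(percent, extension)
```

When a pattern fails to match, string.match and string.find return nil, which is how the two anchored counter-examples show up in the output.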
Fun with Sets:
- The “[” and “]” symbols are used to represent sets:
- The “[” character denotes the start of a set, and a “]” shows the end
- A set is a class which is the union of all of the characters and/or classes which appear in the set
- Example Usage: [%d%l] will match any digit or any lowercase letter
- Example Usage: [%dabc] will match any digit or the characters a, b, or c
- Sets can be negated by placing the “^” character immediately after the opening “[”:
- This will make the set match any character except those listed inside the brackets
- Example Usage: [^%l] will match anything but a lowercase letter
- Use the “-” (dash) character between two characters to indicate a range of values:
- Example Usage: [1-5] will match any single digit from 1 through 5 inclusive
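The set rules above can be tried out with a short sketch like the following, again using only the standard string.match and string.gsub functions. The inputs and variable names are made up for illustration.

```lua
-- A set is the union of the characters and classes inside the brackets
local mixed = string.match("xx1a2b3cz", "[%dabc]+")

-- A set can combine a class with ranges; here it captures hexadecimal digits
local hex = string.match("color: #1aF3", "#([%da-fA-F]+)")

-- "^" directly after "[" negates the set: remove everything
-- that is not a lowercase letter
local only_lower = (string.gsub("Hello World", "[^%l]", ""))

-- A range matches any single character between its two endpoints
local in_range = string.match("id 3456", "[1-5]+")

print(mixed, hex, only_lower, in_range)
```

A set still matches just one character on its own, so the examples add “+” to match a run of characters from the set.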