Date/time conversion: Using the fuzzy date/time parser

Complete documentation for the fuzzy date/time parser

If you look at the source code for the parser, you’ll see that the node.D() function invokes the string.D() function, which in turn invokes the actual implementation, dateparse.parse().

The dateparse.parse() function recognizes the format and returns two values: a Lua time value (t) and a table with the date/time components (d). The time value (t) can be compared with the Lua routine os.difftime() or formatted as text with os.date(). If you want to add to or compare date/times, you will probably need to use this interface.
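
A minimal sketch of the comparison and formatting calls in plain Lua. The dateparse.parse() call itself is not repeated; t1 and t2 are stand-ins built with os.time(), assuming they hold the same kind of Lua time value the parser returns.

```lua
-- Stand-ins for two time values returned by dateparse.parse(),
-- built with os.time() so the sketch runs without the module.
local t1 = os.time{ year = 1977, month = 10, day = 20, hour = 21, min = 30, sec = 3 }
local t2 = os.time{ year = 1977, month = 10, day = 21, hour = 9, min = 30, sec = 3 }

-- os.difftime(t2, t1) returns the number of seconds from t1 to t2.
local elapsed = os.difftime(t2, t1)
print(string.format("%.1f hours apart", elapsed / 3600)) --> 12.0 hours apart

-- os.date() formats a time value as text.
print(os.date("%Y-%m-%d %H:%M:%S", t1)) --> 1977-10-20 21:30:03
```

Comparing through os.difftime() rather than subtracting the values directly keeps the code correct on platforms where time values are not plain seconds.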

The component table (d) has the fields used by Lua’s own date routines (year, month, day, hour, min, sec and isdst), as well as fields for partial seconds (sec_fraction) and time zones (tz and tz_offset). As you can see in the example, we can compute the point in time exactly fourteen days later just by adding 14 to d.day. Modifying the table (d) does not, of course, affect the time value (t), but you can use os.time() to compute the new future timestamp.

Now that we have our new time value (t), we use os.date() to format it HL7-style.
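
The add-fourteen-days step can be sketched in plain Lua. The table d below is a hand-built stand-in for the component table returned by dateparse.parse() (only the fields os.time() uses are filled in); everything else is standard os.time() and os.date().

```lua
-- Stand-in for the component table d returned by dateparse.parse().
local d = { year = 1977, month = 10, day = 20, hour = 21, min = 30, sec = 3 }

-- Fourteen days later: bump the day and let os.time() normalize the
-- overflow (October 34th becomes November 3rd).
d.day = d.day + 14
local t = os.time(d)

-- Format the new time value HL7-style (YYYYMMDDHHMMSS).
local hl7 = os.date("%Y%m%d%H%M%S", t)
print(hl7) --> 19771103213003
```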

If we wanted to format it differently, say in an XML format, we could either:

  1. Make a new method, such as node.XD(), that uses the new format.
  2. Write your own copy of node.D() in the main module to override the implementation in dateparse.lua.
  3. Edit the dateparse.lua module itself. Not recommended, but it is your code.
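
For the first option, the formatting itself can be a single os.date() call. A minimal sketch, assuming the XML format you want is the XML Schema dateTime form (YYYY-MM-DDThh:mm:ss, no time zone suffix); xmlDateTime is a hypothetical helper that a node.XD() method could wrap, not something dateparse provides.

```lua
-- Hypothetical helper: render a Lua time value in XML Schema
-- dateTime form, without a time zone suffix.
local function xmlDateTime(t)
   return os.date("%Y-%m-%dT%H:%M:%S", t)
end

local t = os.time{ year = 1977, month = 3, day = 5, hour = 14, min = 0, sec = 0 }
print(xmlDateTime(t)) --> 1977-03-05T14:00:00
```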

It’s Not All Roses

Take care when testing the fuzzy parser, as it sometimes gets things wrong. Suppose a message contained 01/10/07 or 07-Oct-01, meaning October 1st, 2007. The fuzzy parser would get both wrong, interpreting them as January 10th, 2007, and October 7th, 2001, respectively.
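
To see why no parser can settle this from the string alone, here is a small plain-Lua illustration (it uses string.match, not the dateparse parser) that reads “01/10/07” under both field orders. Both readings are valid dates, so only knowledge of the sender’s convention can pick the right one.

```lua
-- Split "01/10/07" into its three two-digit fields.
local s = "01/10/07"
local a, b, c = s:match("^(%d%d)/(%d%d)/(%d%d)$")
local year = 2000 + tonumber(c)

-- Month/day/year: how the fuzzy parser reads it.
local mdy = os.time{ year = year, month = tonumber(a), day = tonumber(b), hour = 12 }
-- Day/month/year: what the sender meant (October 1st, 2007).
local dmy = os.time{ year = year, month = tonumber(b), day = tonumber(a), hour = 12 }

print(os.date("%Y-%m-%d", mdy)) --> 2007-01-10
print(os.date("%Y-%m-%d", dmy)) --> 2007-10-01
```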

It tries to match common formats, but there are a ton of ways to write dates, so it can’t always get it right. You can always specify the format explicitly (described below), but remember that you have the source of the module: make it your own! We tried to make it easy to read and modify, so skim it for the list of known formats and edit them to your heart’s content.

Custom Formats

The dateparse.parse() function is pretty smart, but it can’t predict everything. Suppose you get a more free-form date/time in a message, like “2PM March 5th ’77”. Out of the box, dateparse.parse() doesn’t understand that format. The module contains a list of known formats, which you can add to. Alternatively, you can pass the format to dateparse.parse() itself, as in the following example.

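require 'dateparse'
require 'node'

function main(Data)
   -- Supply the format explicitly as the second argument.
   -- t is the Lua time value; d is the date/time component table.
   local t, d = dateparse.parse(
      "2PM March 5th '77",
      "H[:MM]tt mmmm dw [']yy[yy]")
   -- Format the time value HL7-style.
   local Ts = os.date('%Y%m%d%H%M%S', t) --> 19770305140000
end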

This is an excellent approach when you encounter very odd date/time formats, and you don’t expect them to be found elsewhere. No need to complicate the dateparse module for one-off problems.

In the format we used, H matches a one-digit or two-digit hour; [:MM] matches a colon and a two-digit minute, if present (the brackets make the whole part optional); and tt matches AM or PM. For the date part, mmmm matches the month name (full or abbreviated), d matches the day (one or two digits), w matches the “th” (in fact it will match and ignore any word), and [']yy[yy] matches a two-digit or four-digit year, optionally preceded by an apostrophe.

Obviously this pattern matches more than the one date/time we are working with, but with hand-written date/times, it’s best to be flexible. Any format you specify must match the entire date/time string you supply. If you expect any extra words, numbers or punctuation, you need to specify them in the pattern; just make them [optional]. The full pattern language is detailed in the Date/time Formats section below.

Time Zones

The dateparse.parse() function returns two values: a Lua time value (t) and a date/time component table (d). Time zone information is understood by the parser, but not by Lua. The Lua time value (t) returned will not be adjusted when time zone information is present. Instead, we add a couple of fields to the component table (d), so that you can adjust the time value yourself if you wish.

When a time zone abbreviation (e.g., EST) is recognized in a date/time string, d.tz_offset is given the offset in minutes from Coordinated Universal Time (UTC), and the abbreviation is stored in d.tz (in uppercase). E.g., with “EST”, d.tz_offset would be set to -300 (EST is UTC-05:00).

When an offset itself is present as (+/-)HH:MM (or without the colon, as in HL7 timestamps), the d.tz_offset field is computed from this value, and d.tz is set to “UTC(+/-)HH:MM” (always with a colon). E.g., if the offset -0500 were read in an HL7 timestamp, d.tz and d.tz_offset would be set to “UTC-05:00” and -300, respectively.
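
The offset-to-fields rule above can be sketched in plain Lua. parseOffset is a hypothetical helper written for illustration, not the code inside dateparse.lua; it accepts the offset with or without the colon and returns values in the same shape as d.tz and d.tz_offset.

```lua
-- Hypothetical helper (not the dateparse implementation): turn an offset
-- such as "-0500" or "+05:30" into a d.tz / d.tz_offset style pair.
local function parseOffset(s)
   local sign, hh, mm = s:match("^([%+%-])(%d%d):?(%d%d)$")
   if not sign then
      return nil -- not an offset
   end
   local minutes = tonumber(hh) * 60 + tonumber(mm)
   if sign == "-" then
      minutes = -minutes
   end
   -- The label always uses a colon, whatever the input had.
   return string.format("UTC%s%s:%s", sign, hh, mm), minutes
end

local tz, tz_offset = parseOffset("-0500")
print(tz)        --> UTC-05:00
print(tz_offset) --> -300
```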

When writing your own formats, “zzzz” can be used to match time zone offsets (including the +/- part), and “ZZZ” can be used to match abbreviations. See the note on formats below for details.
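
Because the time value is never adjusted, converting to UTC is up to you. A minimal sketch in plain Lua, assuming a component table with the fields described above; toUtcComponents is a hypothetical helper, not part of dateparse. It subtracts d.tz_offset from the minutes and lets os.time() normalize the overflow, which assumes os.time() and os.date() round-trip wall-clock times in your local zone (true except right at a daylight-saving change).

```lua
-- Hypothetical helper, not part of dateparse: shift a component table
-- from its stated zone to UTC wall-clock time.
local function toUtcComponents(d)
   if not d.tz_offset then
      return d -- no zone information; leave untouched
   end
   local u = {
      year = d.year, month = d.month, day = d.day,
      hour = d.hour, min = d.min - d.tz_offset, sec = d.sec,
   }
   -- os.time() normalizes out-of-range fields (here min = 330), and
   -- os.date("*t", ...) reads the normalized fields back.
   return os.date("*t", os.time(u))
end

-- 21:30:03 at UTC-05:00 on 1977-10-20 is 02:30:03 UTC on 1977-10-21.
local d = { year = 1977, month = 10, day = 20, hour = 21, min = 30, sec = 3,
            tz = "UTC-05:00", tz_offset = -300 }
local u = toUtcComponents(d)
print(string.format("%04d%02d%02d%02d%02d%02d",
   u.year, u.month, u.day, u.hour, u.min, u.sec)) --> 19771021023003
```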

Partial Seconds

Some timestamps can be incredibly accurate. In an HL7 timestamp, for instance, you can specify values accurate to 1/10th of a millisecond (four decimal places). As with time zones, Lua time values cannot contain this information, but we provide it in the d.sec_fraction field in the example above. E.g., with the HL7 timestamp “19771020213003.1415-0500”, the d.sec_fraction value would be 0.1415.

When writing your own formats, “ssss” can be used to match partial second values (it does not match the decimal point itself, so include that in the pattern). E.g., “HHMMSS[.ssss]”.
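
Since the time value holds only whole seconds, sub-second comparisons have to bring d.sec_fraction back in. A minimal sketch in plain Lua; preciseDiff is a hypothetical helper, and the two component tables are hand-built stand-ins for what dateparse.parse() would return for two HL7 timestamps.

```lua
-- Hypothetical helper: seconds from (t1, d1) to (t2, d2), where each pair
-- is a time value and its component table. os.difftime() supplies the
-- whole seconds; d.sec_fraction supplies the rest.
local function preciseDiff(t1, d1, t2, d2)
   return os.difftime(t2, t1) + ((d2.sec_fraction or 0) - (d1.sec_fraction or 0))
end

-- Stand-ins for two parsed timestamps a little over a second apart.
local d1 = { year = 1977, month = 10, day = 20, hour = 21, min = 30, sec = 3, sec_fraction = 0.1415 }
local d2 = { year = 1977, month = 10, day = 20, hour = 21, min = 30, sec = 4, sec_fraction = 0.5 }
local t1, t2 = os.time(d1), os.time(d2)

print(string.format("%.4f", preciseDiff(t1, d1, t2, d2))) --> 1.3585
```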

Date/time Formats

The following table lists the patterns allowed in date/time format strings. Note that unless your format contains spaces, spaces in date/time strings won’t be accepted by the parser. E.g., the HL7 format “yyyymmddHHMM” will not match “1977 10 20 21 30” because it contains spaces not found in the pattern.

Note, however, that having spaces (or commas) in your pattern does not mean that they are required. E.g., “mmmm dw, yyyy” will still match “October20th1977”. This allows for more flexibility without more effort from you. When spaces are really required, as in patterns like “yyyy/m/d H:MM” (between the d and the H), their absence will make the pattern fail to match. For a value like “1977/1/115:30”, even a human would have to guess whether it means day 1 at 15:30 or day 11 at 5:30.

Pattern    Matches What?
yy         Year, two digits ('69–'99 are placed in the 1900s).
yyyy       Year, four digits.
m          Month, one or more digits.
mm         Month, two digits.
mmm        Month, three-character abbreviation.
mmmm       Month, abbreviation or full name.
d          Day, one or more digits.
dd         Day, two digits.
ddd        Weekday, three-character abbreviation (ignored).
dddd       Weekday, abbreviation or full name (ignored).
H          Hour, one or more digits.
HH         Hour, two digits.
M          Minute, one or more digits.
MM         Minute, two digits.
S          Second, one or more digits.
SS         Second, two digits.
ssss       Second fraction, one or more digits (e.g., “SS[.ssss]”).
t          “A” or “P” (for AM and PM, respectively).
tt         “AM” or “PM”.
zzzz       Time zone offset, e.g., “-05:00” (colon optional).
ZZZ        Time zone abbreviation, one word.
(space)    Any amount of whitespace (including none).
,          A comma (optional; may be preceded by whitespace).
w          Any word (ignored).
n          Any number (ignored; -/+ are not matched).
(other)    Any other character except a letter; matched exactly as-is.

The dateparse Module

The source code for the dateparse module can be found in our code repository.
