Parsing formatted text
How do we build a simple function to extract information from a plain text file?
Say we wish to split by empty lines, indicating new paragraphs:
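A minimal sketch of the paragraph split, written in Python for illustration; it is not the code shipped in the project. The function name `split_paragraphs` and the sample text are hypothetical, and the sketch assumes a "blank line" may contain spaces or tabs.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines."""
    # Normalise Windows and old Mac line endings so the pattern stays simple.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a newline followed by one or more lines that
    # contain only spaces or tabs (assumption: such lines count as empty).
    parts = re.split(r"\n(?:[ \t]*\n)+", text)
    # Trim each piece and discard pieces that are empty after trimming.
    return [p.strip() for p in parts if p.strip()]

# Hypothetical sample text, not the project's test data.
sample = "First paragraph,\nsecond line.\n\nSecond paragraph.\n\n\nThird paragraph.\n"
paragraphs = split_paragraphs(sample)
```

Single line breaks inside a paragraph are kept, so `paragraphs` holds three entries and the first one still spans two lines.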
Or we can split by a substring:
- Search for the header string, for example “PAST MEDICAL HISTORY”
- Extract the information after the colon
- Use each paragraph as the value of, say, one of several OBX-5 fields in an HL7 message
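The three steps above could look roughly like this in Python. This is a sketch for illustration only, not the project's code: the function names and the sample report are hypothetical, and it assumes a section ends where the next all-caps heading followed by a colon begins (or at the end of the text).

```python
import re

def extract_section(text, header):
    """Return the text after 'header:' up to the next heading, or None."""
    start = text.find(header)
    if start == -1:
        return None
    colon = text.find(":", start + len(header))
    if colon == -1:
        return None
    body = text[colon + 1:]
    # Assumption: the next section starts on a new line with an
    # all-caps heading that ends in a colon.
    nxt = re.search(r"^[A-Z][A-Z ]+:", body, flags=re.MULTILINE)
    if nxt:
        body = body[:nxt.start()]
    return body.strip()

def section_paragraphs(text, header):
    """Split the section under 'header' into paragraphs at blank lines."""
    body = extract_section(text, header)
    if not body:
        return []
    return [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]

# Hypothetical sample report, not the project's test data.
report = (
    "PATIENT NAME: Jane Doe\n\n"
    "PAST MEDICAL HISTORY: Hypertension since 2010.\n\n"
    "Appendectomy in 1998.\n\n"
    "MEDICATIONS: Aspirin, Lisinopril\n"
)
obx5_values = section_paragraphs(report, "PAST MEDICAL HISTORY")
```

Each element of `obx5_values` is one paragraph of the section, ready to be placed in its own OBX-5 field.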
Or we can split by several substrings:
- Search for the header string, for example “MEDICATIONS”
- Extract the medicine information after the colon
- Use the medicine names as values for HL7 messages
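As a rough Python illustration of splitting on several substrings at once (again, not the project's code): the separators used here (comma, semicolon and the word "and") are an assumption, as are the function names, the sample text, and the idea that the medication list sits on the same line as its header.

```python
import re

def split_by_any(text, separators):
    """Split text on any of the given separator substrings, trimming each piece."""
    # Escape each separator so it is matched literally; try longer ones first
    # so that a short separator never pre-empts a longer one.
    ordered = sorted(separators, key=len, reverse=True)
    pattern = "|".join(re.escape(s) for s in ordered)
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]

def medication_names(text, header="MEDICATIONS"):
    """Return the medicine names listed after 'header:'."""
    start = text.find(header)
    if start == -1:
        return []
    colon = text.find(":", start + len(header))
    if colon == -1:
        return []
    # Assumption: the list occupies the rest of the header's line.
    line = text[colon + 1:].split("\n", 1)[0]
    # Assumed separators; adjust to match the real documents.
    return split_by_any(line, [",", ";", " and "])

# Hypothetical sample text, not the project's test data.
sample = "ALLERGIES: None known\nMEDICATIONS: Aspirin, Lisinopril; Metformin and Atorvastatin\n"
names = medication_names(sample)
```

Building one alternation pattern keeps the split to a single pass, whatever mix of separators appears in the line.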
The project can be imported from the zip file Parse_text_file.zip; the test data is included.
The functions string.split() and string.trimRAnyC() could be included in any module; we included them in stringutil, as it seems like a good candidate.