PushTagParser

Program status

Program created: June 2008
Program last touched: June 2008
Status: Finished, some fixes planned for later

Abstract

  • XML-like parser
  • SAX-like behaviour: generates events as things are processed
  • Uses "data-push" approach
  • Uses transition tables

    PushTagParser is a Delphi class for parsing XML-like documents. Any document with tags enclosed within angle brackets (< and >), optional attributes in opening tags and <!-- / --> comments can be parsed. That includes, but is not limited to, XML files, HTML files, MS ASX files, etc.

    So what makes this parser special? Probably its architecture. A typical XML parser expects a complete stream, memory block or disk file to parse. PTP, in contrast, doesn't care where the data comes from: it exposes an entry point, Push(Data: pointer; DataSize: integer), through which you can feed data to the parser at any time, in chunks of arbitrary size. Data of virtually any total size can therefore be processed.

    There is no limit on the total amount of data that can be processed. There is, however, a limit on the data itself: any opening tag, together with its attributes, must fit into available memory. This limit is very loose and is unlikely to be an obstacle for current applications.
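
    To illustrate the data-push idea, here is a minimal sketch. It is not the PushTagParser source: TToyTagLister is a hypothetical toy class that only borrows the Push(Data, DataSize) signature quoted above. It keeps its state in fields, so a tag may begin in one chunk and end in another.

```pascal
program PushSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical toy class, NOT the real TPushTagParser. It only shows how
  // state kept in fields lets a tag span several Push() calls.
  TToyTagLister = class
  private
    FInTag: Boolean;      // True while between '<' and '>'
    FPending: AnsiString; // tag text collected so far, survives across calls
  public
    procedure Push(Data: Pointer; DataSize: Integer);
  end;

procedure TToyTagLister.Push(Data: Pointer; DataSize: Integer);
var
  P: PAnsiChar;
  I: Integer;
begin
  P := PAnsiChar(Data);
  for I := 0 to DataSize - 1 do
  begin
    if FInTag then
    begin
      if P[I] = '>' then
      begin
        // The tag is complete: report it and forget its text
        WriteLn('tag: ', FPending);
        FPending := '';
        FInTag := False;
      end
      else
        FPending := FPending + P[I];
    end
    else if P[I] = '<' then
      FInTag := True;
    // Characters outside tags are simply skipped in this toy
  end;
end;

const
  // One small document cut at awkward places, inside tag names
  Chunk1: AnsiString = '<root><it';
  Chunk2: AnsiString = 'em>text</item></ro';
  Chunk3: AnsiString = 'ot>';

var
  Lister: TToyTagLister;
begin
  Lister := TToyTagLister.Create;
  try
    Lister.Push(PAnsiChar(Chunk1), Length(Chunk1));
    Lister.Push(PAnsiChar(Chunk2), Length(Chunk2));
    Lister.Push(PAnsiChar(Chunk3), Length(Chunk3));
  finally
    Lister.Free;
  end;
end.
```

    The toy reports root, item, /item and /root regardless of where the chunks are cut. The only thing it buffers is the text of the tag currently being read, which is the kind of per-tag memory requirement described above.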

    The application interface resembles SAX. Events are generated as the data is processed, reporting tag starts, tag ends, plain characters, comments and errors. The parser itself neither validates the data nor checks it for conformity. An upper layer may add these features, at the cost of speed, extra memory and additional limits on the data.
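
    As an example of such an upper layer, the sketch below checks that tags are properly nested. Everything in it is an assumption made for illustration: TNestingChecker, its method names and their parameters are hypothetical, and the real event handler signatures of PushTagParser may differ. The events are fired by hand here instead of coming from a parser.

```pascal
program NestingSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Hypothetical upper layer. The handler names and parameters are
  // assumptions; the real event signatures may differ.
  TNestingChecker = class
  private
    FOpen: TStringList; // names of currently open tags, innermost last
    FErrors: Integer;   // number of mismatched closing tags seen so far
  public
    constructor Create;
    destructor Destroy; override;
    procedure TagStart(const Name: string);
    procedure TagEnd(const Name: string);
    function IsBalanced: Boolean;
  end;

constructor TNestingChecker.Create;
begin
  inherited Create;
  FOpen := TStringList.Create;
end;

destructor TNestingChecker.Destroy;
begin
  FOpen.Free;
  inherited Destroy;
end;

procedure TNestingChecker.TagStart(const Name: string);
begin
  FOpen.Add(Name); // remember the tag until its closing tag arrives
end;

procedure TNestingChecker.TagEnd(const Name: string);
begin
  if (FOpen.Count > 0) and (FOpen[FOpen.Count - 1] = Name) then
    FOpen.Delete(FOpen.Count - 1)
  else
    Inc(FErrors); // closing tag does not match the innermost open tag
end;

function TNestingChecker.IsBalanced: Boolean;
begin
  // No mismatches and nothing left open
  Result := (FErrors = 0) and (FOpen.Count = 0);
end;

var
  Checker: TNestingChecker;
begin
  Checker := TNestingChecker.Create;
  try
    // Events as they might arrive for the malformed input <a><b></a>
    Checker.TagStart('a');
    Checker.TagStart('b');
    Checker.TagEnd('a');
    WriteLn('balanced: ', Checker.IsBalanced);
  finally
    Checker.Free;
  end;
end.
```

    The checker reports this sequence as unbalanced. Note the price: the list of open tags grows with the nesting depth, so the check introduces a memory requirement that the bare parser does not have.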

    From a programming point of view it may be interesting to compare transition tables, on which PTP is built, with the classic static-code approach. I believe transition tables to be slightly faster, since they replace conditional branches with table lookups, and a mispredicted branch is among the more expensive operations on modern processors. However, I know of no test where this was actually compared.
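
    For readers unfamiliar with the technique, here is a minimal sketch of a table-driven scanner. It is not taken from PTP: the two states, the three character classes and all names are hypothetical, and the real tables are certainly larger. It shows only the form of the approach and says nothing about its speed. The inner loop contains no if or case; both the next state and the action come from array lookups.

```pascal
program TableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TState = (stText, stTag);               // outside a tag / inside a tag
  TCharClass = (ccOther, ccOpen, ccClose); // anything else / '<' / '>'

const
  // NextState[current state, class of input character] is the next state.
  // Hypothetical two-state table, for illustration only.
  NextState: array[TState, TCharClass] of TState = (
    { stText } (stText, stTag, stText),
    { stTag  } (stTag,  stTag, stText)
  );
  // TagsDone[...] is 1 exactly where a tag is completed: '>' seen in stTag.
  TagsDone: array[TState, TCharClass] of Integer = (
    { stText } (0, 0, 0),
    { stTag  } (0, 0, 1)
  );

var
  ClassOf: array[AnsiChar] of TCharClass; // per-character lookup, filled once

function CountTags(const S: AnsiString): Integer;
var
  State: TState;
  C: TCharClass;
  I: Integer;
begin
  Result := 0;
  State := stText;
  for I := 1 to Length(S) do
  begin
    C := ClassOf[S[I]];              // classify the character by lookup
    Inc(Result, TagsDone[State, C]); // action by lookup, adds 0 or 1
    State := NextState[State, C];    // transition by lookup
  end;
end;

var
  Ch: AnsiChar;
begin
  // Build the classification table: everything is ccOther except < and >
  for Ch := Low(AnsiChar) to High(AnsiChar) do
    ClassOf[Ch] := ccOther;
  ClassOf['<'] := ccOpen;
  ClassOf['>'] := ccClose;

  WriteLn(CountTags('<a><b>x</b></a>')); // four tags in this input
end.
```

    A static-code version of the same scanner would express the transitions as nested case or if statements on the state and the character. Whether the table form is really faster would have to be measured.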

    Usage

    To use PushTagParser, add its unit to the uses clause of your unit, construct an instance of the TPushTagParser class, passing the event handlers to the constructor, and start feeding data to it with the Push() procedure:

    uses
    	uPushTagParser;
    ...
    	// The constructor takes the event handlers
    	Parser := TPushTagParser.Create(OnTagStart, OnTagEnd, OnCharacters, OnError, OnComment);
    	try
    		// Feed the data in as many chunks as needed
    		Parser.Push(Buffer1, Buffer1Size);
    		Parser.Push(Buffer2, Buffer2Size);
    	finally
    		Parser.Free();
    	end;
    ...
    

    Downloads

    PushTagParser source (PAS) 15 KiB


    Generated: 9.8.2011 15:05:01 by WebComposer.