NAME XML::Driver::HTML - SAX Driver for non wellformed HTML. SYNOPSIS use XML::Driver::HTML; $driver = new XML::Driver::HTML( 'Handler' => $some_sax_filter_or_handler, 'Source' => $some_PerlSAX_like_hash ); $driver->parse(); or use XML::Driver::HTML; $driver = new XML::Driver::HTML(); $driver->parse( 'Handler' => $some_sax_filter_or_handler, 'Source' => $some_PerlSAX_like_hash ); $driver->parse( 'Handler' => $some_other_sax_filter_or_handler, 'Source' => $some_other_source ); DESCRIPTION XML::Driver::HTML is a SAX Driver for HTML. There is no need for the HTML input to be weel formed, as XML::Driver::HTML is generating its SAX events by walking a HTML::TreeBuilder object. The simplest kind of use, is a filter from HTML to XHTML using XML::Handler::YAWriter as a SAX Handler. my $ya = new XML::Handler::YAWriter( 'Output' => new IO::File ( ">-" ), 'Pretty' => { 'NoWhiteSpace'=>1, 'NoComments'=>1, 'AddHiddenNewline'=>1, 'AddHiddenAttrTab'=>1, } ); my $html = new XML::Driver::HTML( 'Handler' => $ya, 'Source' => { 'ByteStream' => new IO::File ( "<-" ) } ); $html->parse(); METHODS new Creates a new XML::Driver::HTML object. Default options for parsing, described below, are passed as key-value pairs or as a single hash. Options may be changed directly in the object. parse Parses a document. Options, described below, are passed as key-value pairs or as a single hash. Options passed to parse() override the default options in the parser object for the duration of the parse. OPTIONS The following options are supported by XML::Driver::HTML : Handler Default SAX Handler to receive events Source Hash containing the input source for parsing. The `Source' hash may contain the following parameters: ByteStream The raw byte stream (file handle) containing the document. String A string containing the document. SystemId The system identifier (URL) of the document. Encoding A string describing the character encoding. If more than one of `ByteStream', `String', or `SystemId', then preference is given first to `ByteStream', then `String', then `SystemId'. NOTES XML::Driver::HTML requires Perl 5.6 to convert from ISO-8859-1 to UTF-8. BUGS not yet implemented: Interpretation of SystemId as being an URI XHTML document type other bugs: HTML::Parser and HTML::TreeBuilder bugs concerning DOCTYPE and CSS. Perl handling of UFT8 is compatible between different versions. So you need exactly Perl 5.6.0, not lower not higher. AUTHOR Michael Koehne, Kraehe@Copyleft.De (c) 2001 GNU General Public License SEE ALSO the XML::Parser::PerlSAX manpage and the HTML::TreeBuilder manpage