ProAnswers.org

Looking for reliable and efficient C# HTML parser?

I would like to extract the structure of the HTML document - so the tags are more important than the content. Ideally, it would be able to cope reasonably with badly-formed HTML to some extent also.

Anyone know of a reliable and efficient parser?

This is an agile HTML parser that builds a read/write DOM and supports plain XPATH or XSLT (you actually don’t HAVE to understand XPATH nor XSLT to use it, don’t worry…). It is a .NET code library that allows you to parse “out of the web” HTML files. The parser is very tolerant with “real world” malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).