ProAnswers.org

Parsing a Webpage in C#?

I saved an entire webpage's HTML to a string, and now I want to grab the "href" values from the links, preferably in a way that lets me store each one in its own string later. What's the best way to do this?

Most HTML pages can't be parsed using standard HTML techniques because, as you've found out, most of them don't validate.



You could spend the time trying to integrate [HTML Tidy](http://tidy.sourceforge.net/) or a similar tool, but it would be much faster to just build the regex you need.
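
As a rough sketch of the regex approach, something like the following might work for typical pages. The class name `HrefExtractor`, the method name `ExtractHrefs`, and the pattern itself are my own illustration, not a standard API, and the pattern assumes the `href` value is wrapped in single or double quotes. Unquoted values, links inside comments or scripts, and other odd markup will slip through or produce false hits, so treat it as a starting point to adapt.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from any library.
public static class HrefExtractor
{
    // Looks for an <a ...> tag and captures a quoted href value into "url".
    // Assumption: the value is enclosed in double or single quotes.
    private static readonly Regex HrefPattern = new Regex(
        @"<a\b[^>]*?\shref\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)')",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns every captured href value, in document order.
    public static List<string> ExtractHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            hrefs.Add(match.Groups["url"].Value);
        }
        return hrefs;
    }
}
```

Calling `HrefExtractor.ExtractHrefs(html)` on the string you already saved gives you a `List<string>`, so each link is available as its own string by index, or via `foreach` if you want to process them one at a time.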