Regular expression for parsing links from a webpage? *

Question

I'm looking for a .NET regular expression extract all the URLs from a webpage but haven't found one to be comprehensive enough to cover all the different ways you can specify a link.

And a side question:

Is there 'one regex to rule them all'? Or am I better off using a series of less complicated regular expressions and just using mutliple passes against the raw HTML? (Speed vs. Maintainability)

Answer

((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)

I took this from regexlib.com

[editor's note: the {1} has no real function in this regex; see this post]

< br > via < a class="StackLink" href=" http://stackoverflow.com/questions/6173/" >Regular expression for parsing links from a webpage?< /a>
Share on Google Plus

About Cinema Guy

This is a short description in the author block about the author. You edit it by entering text in the "Biographical Info" field in the user admin panel.
    Blogger Comment
    Facebook Comment

0 comments:

Post a Comment