Question
I'm looking for a .NET regular expression to extract all the URLs from a webpage, but I haven't found one comprehensive enough to cover all the different ways a link can be specified.
And a side question:
Is there 'one regex to rule them all'? Or am I better off using a series of less complicated regular expressions and making multiple passes over the raw HTML? (Speed vs. Maintainability)
Answer
((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)
I took this from regexlib.com.
[editor's note: the {1} has no real function in this regex; see this post]
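To see what the pattern above actually captures, here is a minimal sketch in Python (chosen only for brevity; the same pattern string should work unchanged with .NET's Regex.Matches). The sample markup and the helper name extract_urls are made up for illustration and are not part of the original answer. It also checks the editor's note by running the pattern with and without the {1}.

```python
import re

# Pattern quoted in the answer, unchanged.
URL_PATTERN = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")

# Same pattern with the {1} quantifier removed, to check the editor's note.
URL_PATTERN_NO_QUANTIFIER = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://)\S+)")


def extract_urls(html, pattern=URL_PATTERN):
    """Return the full matched text (group 0) of every match in html.

    Hypothetical helper, written for this illustration only.
    """
    return [match.group(0) for match in pattern.finditer(html)]


# Hypothetical sample markup, invented for this example.
sample = (
    '<a href="http://example.com/page">home</a> '
    '<a href="mailto:someone@example.com">mail</a> '
    'plain ftp://files.example.com/pub text'
)

urls = extract_urls(sample)
for url in urls:
    print(url)
# prints:
# http://example.com/page">home</a>
# mailto:someone@example.com">mail</a>
# ftp://files.example.com/pub

# Dropping {1} does not change what is matched.
print(extract_urls(sample, URL_PATTERN_NO_QUANTIFIER) == urls)  # prints True
```

Because \S+ runs to the next whitespace character, a URL found inside an attribute value comes back with the closing quote and the markup that follows it, while the bare ftp:// link in running text is captured cleanly. So the pattern finds where links start, but the matches would likely need trimming (or a stricter character class) before use on raw HTML.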
via Regular expression for parsing links from a webpage? (http://stackoverflow.com/questions/6173/)