@tott
Created December 21, 2011 14:26
Find URLs in text and do something with them
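<?php
// $text is assumed to hold the input string to scan; it is not defined in this snippet.
$domains = array(); // collected domains, keyed by domain name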
if ( preg_match_all( '/(?P<protocol>(?:(?:f|ht)tp|https):\/\/)?(?P<domain>(?:(?!-)(?P<sld>[a-zA-Z\d\-]+)(?<!-)[\.]){1,2}(?P<tld>(?:[a-zA-Z]{2,}\.?){1,}){1,}|(?P<ip>(?:(?(?<!\/)\.)(?:25[0-5]|2[0-4]\d|[01]?\d?\d)){4}))(?::(?P<port>\d{2,5}))?(?:\/(?P<script>[~a-zA-Z\/.0-9-_]*)?(?:\?(?P<parameters>[=a-zA-Z+%&\&amp;\'\(\)0-9,.\/_ -]*))?)?(?:\#(?P<anchor>[=a-zA-Z+%&0-9._]*))?/x', $text, $data ) ) {
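	// $data[0] holds the full matches; the named groups share the same indexes.
	foreach ( $data[0] as $url_key => $url ) {
		// Matches from the IP branch of the pattern have no sld/tld, so skip them.
		if ( '' === $data['sld'][$url_key] ) {
			continue;
		}
		$domain = $data['sld'][$url_key] . '.' . $data['tld'][$url_key];
		$host   = $data['domain'][$url_key];
		$script = $data['script'][$url_key];
		// For now we only want domains.
		// $urls[] = array( 'url' => $url, 'host' => $host, 'domain' => $domain, 'script' => $script );
		// Keying by domain ensures only one entry per domain/context.
		$domains[$domain] = 0;
	}
}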