@DHuckaby
Created October 19, 2012 18:44
Extract URLs from plain text
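import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Extractor {

    // Matches links that start with http://, https://, or www., optionally preceded by "(".
    // The last character class leaves out '.', ',', ';', ':', '!' and '?', so trailing
    // sentence punctuation is not captured as part of the link.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\(?\\b(https?://|www[.])[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    public static ArrayList<String> extract(String text) {
        ArrayList<String> links = new ArrayList<String>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            String urlStr = m.group();
            // Drop the wrapping parentheses when the whole link was parenthesized.
            if (urlStr.startsWith("(") && urlStr.endsWith(")")) {
                urlStr = urlStr.substring(1, urlStr.length() - 1);
            }
            links.add(urlStr);
        }
        return links;
    }
}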
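
A minimal usage sketch, to show what the extraction returns for links with trailing punctuation and wrapping parentheses. The class name ExtractorDemo, the sample sentence, and the example.com/.org/.net hosts are assumptions made for illustration. The extraction pattern (accepting both http:// and https://) is restated inside the class only so the snippet compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the gist.
class ExtractorDemo {

    // Links start with http://, https://, or www. and may be preceded by "(".
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\(?\\b(https?://|www[.])[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    static List<String> extract(String text) {
        List<String> links = new ArrayList<String>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            String url = m.group();
            // Remove the parentheses when they wrap the whole link.
            if (url.startsWith("(") && url.endsWith(")")) {
                url = url.substring(1, url.length() - 1);
            }
            links.add(url);
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample input: one link followed by a comma, one in
        // parentheses, and one followed by the sentence's final period.
        String text = "See http://example.com/docs, (www.example.org) "
                + "and http://example.net/a?b=1.";
        for (String link : extract(text)) {
            System.out.println(link);
        }
        // prints:
        // http://example.com/docs
        // www.example.org
        // http://example.net/a?b=1
    }
}
```

The trailing comma and period are dropped because the pattern's final character class does not include them, and the parentheses around the second link are removed by the startsWith/endsWith check.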