@poulter7
Created February 24, 2011 16:37
URL connection stuff!
URLConnection connection = url.openConnection();
// set up the connection to be polite: identify the crawler and state what it accepts
// connection.setRequestProperty("Accept", "image/*"); // image of any type
connection.setRequestProperty("Accept", "text/html");
connection.setRequestProperty("Accept-Charset","utf-8");
// connection.setRequestProperty("Accept-Encoding","gzip, deflate");
connection.setRequestProperty("Accept-Language", "en-GB, en-US, en-CA, en");
connection.setRequestProperty("Cache-Control","no-cache");
connection.setRequestProperty("Connection","Keep-Alive");
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
// connection.setRequestProperty("Date","");
connection.setRequestProperty("From","pha07jrp@shef.ac.uk");
// connection.setRequestProperty("If-Modified-Since","");
// connection.setRequestProperty("Referer","");
connection.setRequestProperty("User-Agent", "SquidCrawler "+ version +" (UoS COM4230 module; pha07jrp@dcs.shef.ac.uk)"); // UA string
System.out.println("nom");
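// Read the content type once; it is null when the server does not send one.
String contentType = connection.getContentType();
if (contentType != null && !contentType.toLowerCase().startsWith("text/")) {
    // Only text/* responses are crawled; mark anything else as processed and skip it.
    getWorkloadWaiting().remove(url);
    getWorkloadProcessed().add(url);
    log("Not processing because content type is: " + contentType);
    return;
}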
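
The snippet above is a fragment from inside a crawler method. As a rough, self-contained sketch of the same two steps (setting request headers, then filtering on content type), something like the following could be used. The class name `PoliteConnectionSketch` and the helpers `openPolitely` and `shouldProcess` are hypothetical names made up for illustration and are not part of the original crawler; only a subset of the headers is repeated here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical wrapper class; not part of the original crawler.
class PoliteConnectionSketch {

    // Opens a connection object and sets a few of the request headers
    // used in the snippet above. url.openConnection() does not contact
    // the server yet; that happens on connect() or the first read.
    static URLConnection openPolitely(URL url, String userAgent) throws IOException {
        URLConnection connection = url.openConnection();
        connection.setRequestProperty("Accept", "text/html");
        connection.setRequestProperty("Accept-Charset", "utf-8");
        connection.setRequestProperty("Accept-Language", "en-GB, en-US, en-CA, en");
        connection.setRequestProperty("User-Agent", userAgent);
        return connection;
    }

    // Mirrors the filter above: a response is processed when the server
    // sent no content type at all, or when the type starts with "text/".
    static boolean shouldProcess(String contentType) {
        return contentType == null
                || contentType.toLowerCase(Locale.ROOT).startsWith("text/");
    }
}
```

A caller would pass `connection.getContentType()` to `shouldProcess` and skip the URL when it returns false, which matches the early `return` in the original fragment.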