Created February 7, 2015 05:14
# fetch list of WET files
# see http://blog.commoncrawl.org/2015/01/december-2014-crawl-archive-available/
$ wget https://aws-publicdatasets.s3.amazonaws.com/common-crawl/crawl-data/CC-MAIN-2014-52/wet.paths.gz
# grab first one
$ zcat wet.paths.gz | head -n1
common-crawl/crawl-data/CC-MAIN-2014-52/segments/1418802764752.1/wet/CC-MAIN-20141217075244-00000-ip-10-231-17-201.ec2.internal.warc.wet.gz
# download one of the listed files locally (140M); note this is file 00006, not the 00000 shown above
$ s3cmd get s3://aws-publicdatasets/common-crawl/crawl-data/CC-MAIN-2014-52/segments/1418802764752.1/wet/CC-MAIN-20141217075244-00006-ip-10-231-17-201.ec2.internal.warc.wet.gz
# how many lines? (note: the count includes WARC record headers, not just page text)
$ zcat CC-MAIN-20141217075244-00006-ip-10-231-17-201.ec2.internal.warc.wet.gz | wc -l
9428923
# grep for big data
$ zcat CC-MAIN-20141217075244-00006-ip-10-231-17-201.ec2.internal.warc.wet.gz | grep -i "big data" > bigdata.txt
# how many?
$ wc -l bigdata.txt
434 bigdata.txt # <--- now THAT's big data right there!
$ shuf bigdata.txt | head
You need know your target customer intimately, so the research tools you use will need to be highly quantitative such as measuring emotional effect of your marketing through social media. Companies such as Crimson Hexagon http://www.crimsonhexagon.com/ can provide your marketing team with business intelligence from Big Data & social media. And don’t forget the follow-ups including direct emails, Facebook groups, special offers and customer loyalty programs. A good measure of how well you succeed is how well you have succeeded in changing your customers buying behaviour.
Social Meets Big Data: Get ReadyYou may not agree with Marc Benioff that Facebook looks like the future of the Web. But you'd better be ready for the mountain of data social media produces. "Everything I want in a consumer OS is in Facebook," Salesorce.com CEO Marc Benioff told the audience at the Web 2.0 Summit on Monday.
There is little doubt that Big Data solutions will have an increasing role in the Enterprise IT mainstream over time. Get a jump on that rapidly evolving trend at Big Data Expo, which we are introducing in June at
November- Really Big Data
Big Data
All Big Data
The Cloud Expo series is the fastest-growing Enterprise IT event in the past 10 years, devoted to every aspect of delivering massively scalable enterprise IT as a service.Reads: 11,031World's Top Expert Named "DevOps Summit 2014" Tech ChairBy Liz McMillanUlitzer.com announced "the World's 30 most influential Cloud bloggers," who collectively generated more than 24 million Ulitzer page views. Ulitzer's annual "most influential Cloud bloggers" list was announced at Cloud Expo, which drew more delegates than all other Cloud-related events put together worldwide. "The world's 50 most influential Cloud bloggers 2010" list will be announced at the Cloud Expo 2010 East, which will take place April 19-21, 2010, at the Jacob Javitz Convention Center, in New York City, with more than 5,000 expected to attend.Reads: 13,510How Context Will Solve the Big Data Problem for Sales & MarketingBy Howard BrownIt's a simple fact that the better sales reps understand their prospects' intentions, preferences and pain points during calls, the more business they'll close. Each day, as your prospects interact with websites and social media platforms, their behavioral data profile is expanding. It's now possible to gain unprecedented insight into prospects' content preferences, product needs and budget. We hear a lot about how valuable Big Data is to sales and marketing teams. But data itself is only valuable when it's part of a bigger story, made visible in the right context.Reads: 6,482Cloud Expo Names Larry Carvalho Tech ChairBy Elizabeth WhiteCloud Expo, Inc. has announced today that Larry Carvalho has been named Tech Chair of Cloud Expo® 2014.
Managing and taking advantage of “big data” in a way that will be cost-effective, yet add value to state agencies, will also be a focus for ASET in the future.
Transforming Operations - Part 1: Managing Outsourced Development in Telecommunications Catch the Security Breach Before It’s Out of Reach Step Up Your Game in Loan Operations in 2014 Advanced Threat Protection For Dummies ebook and Using Big Data Security Analytics to Identify Advanced Threats Webcast Defense Against the Dark Arts More Webcasts>>
“Big data” approaches to communication scholarship/the practice of public relations
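
The line count above mixes WARC record headers with page text, so it says little about how many pages the file holds. As a rough sketch (not part of the original session), the two helpers below count records and matching lines separately. The function names `count_records` and `count_matches` are hypothetical, and the record count assumes each extracted page starts with a `WARC-Type: conversion` header line, which is how WET files are normally laid out; a page whose text happens to begin a line with that string would be counted twice.

```shell
#!/bin/sh
# Hypothetical helpers for a gzipped WET file; names are illustrative only.

# count_records FILE
# Prints the number of extracted-text records, assuming one
# "WARC-Type: conversion" header line per record.
count_records() {
    gzip -dc "$1" | grep -c '^WARC-Type: conversion'
}

# count_matches FILE PATTERN
# Prints the number of lines that contain PATTERN, ignoring case.
# This is the same count as "grep -i PATTERN | wc -l" in the session above.
count_matches() {
    gzip -dc "$1" | grep -ci -- "$2"
}
```

With these defined, `count_matches CC-MAIN-20141217075244-00006-ip-10-231-17-201.ec2.internal.warc.wet.gz "big data"` should report the same figure as `wc -l bigdata.txt`, while `count_records` on the same file gives the number of pages that count came from.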