@maheshcr
Created February 14, 2012 19:44
Access Pingar API via Python + Suds
Here is the latest Twitter scam I’ve heard about this week. Consider two fictitious media outlets, the Gazette and the Tribune, operating in the same market, targeting the same demographics, and competing for the same online eyeballs (and the brains behind them). Our two online papers rely on four key traffic drivers:
Their own editorial efforts, aimed at building the brand and establishing a trusted relationship with the readers. Essential but, by itself, insufficient to reach the critical mass needed to lure advertisers.
Getting in bed with Google, with a two-stroke tactic: Search Engine Optimization (SEO), which helps a site climb to the top of the search results page; and Search Engine Marketing (SEM), in which a brand buys keywords to position its ads in the best possible context.
An audience acquisition strategy that artificially grows page views as well as the unique visitor count. Some sites will aggregate audiences only remotely related to their core product, but which dress them up better for the advertising market (more on this in a forthcoming column).
An intelligent use of social media such as Facebook, Twitter and LinkedIn, as well as of the apps ecosystem.
Coming back to the Tribune vs. Gazette competition, let’s see how they deal with the latter item.
For both, Twitter is a reasonable source of audience, worth a few percentage points. More importantly, Twitter is a strong promotional vehicle. With 27,850 followers, the Tribune lags behind the Gazette and its 40,000 followers. Something must be done. The Tribune decides to work with a social media specialist. Over a couple of months, the firm gets the Tribune to follow (in the Twitter sense) most of the individuals who are already Gazette followers. This mechanically translates into a “follow-back” effect powered by implicit flattery: ‘Wow, I’ve been spotted by the Tribune, I must have a voice of some sort…’ In doing so, the Tribune will be able to vacuum up about a quarter to a third of the Gazette followers (a credible follow-back rate). Later, the Tribune will “unfollow” the defectors to cover its tracks.
Compared to other, more juvenile shenanigans, that’s a rather sophisticated scam. After all, in our example, one outlet is exploiting its competitor’s audience the way it would buy a database of prospects. It’s not ethical, but it’s not illegal. And it’s effective: a significant share of the followers thus “converted” to the Tribune are likely to stick with it, as the two media cover the same beat.
Sometimes, only size matters. Last December, the French blogger Cyroul (also a digital media consultant) uncovered a scam performed by Fred & Farid, one of the hippest advertising agencies. In his post (in French), Cyroul explained how the ad agency gained 5,000 followers in a matter of five days. As in the previous example, the trick relies on mass following but, this time, it has nothing to do with recruiting some form of “qualified” audience. Fred & Farid arranged to follow robots that, in turn, followed their account. The result is a large number of new followers from Japan or China, all sharing the same characteristic: their following/followed ratio is about one, which is, Cyroul says, the signature of bot-driven mass following. Pathetic indeed. His conclusion:
One day, your “influence” will be measured against real followers or fans as opposed to bot-induced or artificial accounts. Then, brands will weep as their fan pages will be worth nothing; ad agencies will cry as well when they realize that Twitter is worth nothing.
But wait, it gets cruder still: type “increase Facebook fans” into Google and you’ll get swamped with offers. Wading through the search results, I spotted one vendor carrying a wide range of products: 10,000 views on YouTube for €189; 2,000 Facebook “Likes” for €159; 10,000 followers on Twitter for €890, etc. You provide your URL, you pay on a secure server, it all stays anonymous, and the goods are delivered within 5 to 30 days.
The private sector is now allocating huge resources to fight the growing business of internet scams. Sometimes, it has to be done in an opaque way: one of the reasons Google says so little about its ranking algorithm is, precisely, to prevent fraud.
As for Apple, its application ecosystem faces the same problem. Over time, its ranking system became questionable as bots and download farms joined the fray. In a nutshell, as with Facebook fan harvesting, the more you were willing to pay, the more notoriety you got, thanks to inflated rankings and bogus reviews. Last week, Apple issued this warning to its developer community:
Adhering to Guidelines on Third-Party Marketing Services
Feb 6, 2012
Once you build a great app, you want everyone to know about it. However, when you promote your app, you should avoid using services that advertise or guarantee top placement in App Store charts. Even if you are not personally engaged in manipulating App Store chart rankings or user reviews, employing services that do so on your behalf may result in the loss of your Apple Developer Program membership.
Evidently, Apple has a reliability issue with how its half million apps are ranked and evaluated by users. Eventually, it could affect its business, as the App Store could become a bazaar in which the true value of a product gets lost in a quagmire of mediocre apps. This, by the way, is a push in favor of the Apple-curated guide Jean-Louis described in the Monday Note (see Why Apple Should Follow Michelin). In the UK, several print publishers have spotted the need for independent reviews; there, newsstands carry a dozen app review magazines, covering not only Apple but the Android market as well.
Obviously there is a market for that.
Because they depend heavily on advertising, preventing scams is critical for social networks such as Facebook or Twitter. In Facebook’s pre-IPO filing, I saw no mention of scams in the Risk Factors section, except in the vaguest of terms. As for Twitter, all we know is that the true audience is much smaller than the company claims: Business Insider calculated that, out of the 175 million accounts claimed by Twitter, 90 million have zero followers.
For now, the system still holds up. Brands remain convinced that their notoriety is directly tied to the number of fans or followers they claim, or that their ad agency has been able to channel to them. But how effective is this, really? How large is the proportion of bogus audiences? Today, there appears to be no reliable metric to assess the value of a fan or a follower. And if there is, no one wants to know.
import logging
from suds.client import Client
from suds import WebFault
# Setup basic logging
logging.basicConfig(level=logging.INFO)
logging.getLogger('suds.client').setLevel(logging.DEBUG)
url = "http://api3.pingar.com/PingarAPIService.asmx?WSDL"
# Get WSDL config
client = Client(url)
# Ensure we pick port required from many available
client.set_options(port='PingarAPIServiceSoap')
# Create appropriate types, setup values
apiRequest = client.factory.create("PingarAPIRequest")
apiRequest.AppID = "<register for AppID>"
apiRequest.AppKey = "<register for AppKey>"
langCode = client.factory.create("LanguageCodes")
apiRequest.Language = langCode.en
# Set up vars for EntityExtraction Request
entityExtractRequest = client.factory.create("EntityExtractionRequest")
documentsArray = client.factory.create("ArrayOfString")
# Get sample content
content1 = open("samplecontent-input.txt").read()
content1 = content1.decode("utf-8")
documentsArray.string.append(content1)
entityExtractRequest.Documents = documentsArray
documentFormat = client.factory.create("DocumentFormat")
entityExtractRequest.DocumentsFormat = documentFormat.Text
# Need docs on what each param below means
entityExtractRequest.IncludeSingleSiblings = True
entityExtractRequest.NumberOfKeywords = 50
entityExtractRequest.WikifyLinkDensity = 0.50
apiRequest.EntityExtraction = entityExtractRequest
try:
    # Execute the actual request
    response = client.service.GetEntities(apiRequest)
    # line below gives raw xml received
    output = client.last_received()
    # Log the raw xml for reference
    resp = open("entity-extraction-result.txt", 'w')
    resp.write(str(output))
    resp.close()
except WebFault, e:
    print e
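An optional follow-up, not part of the original gist: besides dumping the raw XML, the object suds returns from the call is already parsed, and printing it renders the whole response tree. That is an easy way to discover the field names the Pingar service actually returns before writing any parsing code. The line below would sit inside the try block, right after the GetEntities call.
# Assumption: placed inside the try block above, after GetEntities returns.
# suds prints the full parsed response tree, showing the entity fields
# returned by the service without having to read the raw XML dump.
print response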
@OrestisRo
Good evening Sir/Madam.
I'd like some advice on this script, because I couldn't figure it out by reading the documentation. When testing the code above, I receive: "Server raised fault: 'Server was unable to process request. ---> Object reference not set to an instance of an object.'"
I should mention that I have filled in the AppID, AppKey and the sample document to fit my needs.
Thanks in advance, Orestes.
