@JamesChevalier
Last active November 28, 2022 12:58
How to run Newspaper in an AWS Lambda function
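# lambda_function.py
from newspaper import Article


def lambda_handler(event, context):
    # The invoking event is expected to carry the target URL under 'url'
    url = event['url']
    article = Article(url)
    article.download()  # fetch the page's HTML
    article.parse()     # extract the article body into article.text
    return {
        'content': article.text
    }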
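One possible hardening of the handler above, offered as a sketch rather than as part of the original gist: reject events that lack a usable 'url' before doing any network work. The error dictionary is a hypothetical shape of my own choosing, and Article is imported inside the function (instead of at module level) only so the validation path can run without newspaper installed, for example in a quick local test.

```python
# Hypothetical variant of lambda_function.py: same download/parse flow as the
# handler above, plus a check that the event actually carries a 'url'.


def lambda_handler(event, context):
    # Treat a missing or empty event the same as one without a URL.
    url = (event or {}).get('url')
    if not url:
        # Hypothetical error shape; the original gist does not define one.
        return {'error': "event must include a non-empty 'url'"}

    # Imported here so the check above works even without newspaper present;
    # after the first import Python caches the module in sys.modules.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page over the network
    article.parse()     # extract the article body into article.text
    return {'content': article.text}
```

In the Lambda console, a test event such as {"url": "https://example.com/some-article"} exercises the normal path, while {} should return the error dictionary without touching the network.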

How to run Newspaper (the Python 2.7 version) in an AWS Lambda function:

  • Start a new EC2 instance with the Amazon Linux AMI, so that compiled dependencies such as lxml are built on the same OS that Lambda runs on
  • sudo yum install gcc gcc-c++ libjpeg-devel zlib-devel libevent-devel libxml2-devel libxslt-devel libpng-devel
  • sudo yum install python27-devel python27-pip
  • virtualenv env
  • source env/bin/activate
  • sudo /usr/bin/easy_install lxml
  • pip install newspaper
  • nano env/local/lib/python2.7/site-packages/newspaper/settings.py
    • change the DATA_DIRECTORY value to '/tmp/.newspaper_scraper', since /tmp is the only writable location inside a Lambda function
  • From your home directory (where lambda_function.py, the handler shown above, should live): zip -9 bundle.zip lambda_function.py
  • cd $VIRTUAL_ENV/lib/python2.7/site-packages
  • zip -r9 ~/bundle.zip *
  • cd $VIRTUAL_ENV/lib64/python2.7/site-packages
  • zip -r9 ~/bundle.zip *
  • Upload the bundle.zip file to your Lambda function
    • This assumes the function's Handler is left at the default, lambda_function.lambda_handler
  • Delete your EC2 instance
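If you would rather script the bundling than type the zip commands, here is a rough Python equivalent of those zip steps using only the standard library's zipfile module. It is a sketch under a few assumptions: the function names are hypothetical, ZIP_DEFLATED uses zlib's default level rather than zip's -9, and unlike the shell glob * it also picks up top-level dotfiles. Because a virtualenv's lib64 is sometimes just a symlink to lib, directories that resolve to the same real path are added only once.

```python
import os
import zipfile


def add_file(bundle_path, src):
    """Append one file at the archive root, like `zip -9 bundle.zip file`."""
    with zipfile.ZipFile(bundle_path, 'a', zipfile.ZIP_DEFLATED) as zf:
        zf.write(src, os.path.basename(src))


def add_tree(bundle_path, root):
    """Append every file under root, stored relative to root,
    like `cd root && zip -r9 bundle.zip *`."""
    with zipfile.ZipFile(bundle_path, 'a', zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                zf.write(full_path, os.path.relpath(full_path, root))


def build_bundle(bundle_path, handler_file, site_packages_dirs):
    """Add the handler first, then each existing site-packages dir once."""
    add_file(bundle_path, handler_file)
    seen = set()
    for directory in site_packages_dirs:
        real = os.path.realpath(directory)
        # Skip missing directories and ones already added through a symlink.
        if os.path.isdir(real) and real not in seen:
            seen.add(real)
            add_tree(bundle_path, real)
```

On the EC2 instance, with the virtualenv active, you might call it as build_bundle(os.path.expanduser('~/bundle.zip'), 'lambda_function.py', [os.path.join(os.environ['VIRTUAL_ENV'], 'lib', 'python2.7', 'site-packages'), os.path.join(os.environ['VIRTUAL_ENV'], 'lib64', 'python2.7', 'site-packages')]). Start from a fresh bundle.zip, because appending a name that is already in the archive adds a duplicate entry rather than replacing it.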
@rakeshtembhurne

I have a few questions. Why was a new EC2 instance created? Can't this be done on an Ubuntu machine or a Mac?
Do you have any suggestions for the newspaper3k version?

@DucNguyenVan

@will3216

will3216 commented May 2, 2018

Really wish I had found this yesterday lol... That said, I had been stuck on the DATA_DIR issue for a couple of hours, so I am glad I found this!

Thanks for posting it!
