Official prerender.io nginx.conf for nginx

# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change example.com (server_name) to your website url
# Change /path/to/your/root to the correct value

server {
    listen 80;
    server_name example.com;
    root /path/to/your/root;
    index index.html;

    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        #proxy_set_header X-Prerender-Token YOUR_TOKEN;

        set $prerender 0;
        if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~ "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff)") {
            set $prerender 0;
        }

        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;

        if ($prerender = 1) {
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
            set $prerender "service.prerender.io";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}

This condition:

    if ($uri ~ ".js|.css|.xml|.less|.png|.jpg|.jpeg|.gif|.pdf|.doc|.txt|.ico|.rss|.zip|.mp3|.rar|.exe|.wmv|.doc|.avi|.ppt|.mpg|.mpeg|.tif|.wav|.mov|.psd|.ai|.xls|.mp4|.m4a|.swf|.dat|.dmg|.iso|.flv|.m4v|.torrent") {

is incorrect, because

    $uri ~ ".ico"

means "any single character followed by 'ico'", which also matches the URL /icon/, and that is wrong.

I suggest this one:

    if ($uri ~ "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent)$") {

which means "the string ends with a dot followed by one of 'js', 'css', etc."

Owner

You're right, thanks. I'll update that.

How about the html5mode of angularjs? Does prerender support html5mode? Would it be enough to add googlebot to the if statement?

        if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|googlebot") {
            set $prerender 1;
        }
Owner
thoop commented

Google's recommended way is to use escaped_fragment. In theory, yes we should add it to the user agent check, but in reality we wouldn't want Google to request the same URL using another user agent and then think the user is cloaking because the content is different. So we try to stay on the safe side.

Confused... why isn't googlebot listed?

@sansmischevia : because googlebot uses the escaped_fragment, which is listed.

If users are using HTML5 pushState, surely Google will request the URLs without escaped_fragment?

Search bots look for <meta name="fragment" content="!" /> in the head tag. Read https://developers.google.com/webmasters/ajax-crawling/docs/specification
Pages without hash fragments
It may be impossible or undesirable for some pages....

I'm not too sure on

  rewrite .* /$scheme://example.com$request_uri? break;

Am I only replacing example.com? Could we not use the $server_name variable for this?

  rewrite .* /$scheme://$server_name$request_uri? break;

If our "location /" block is where we typically specify a reverse proxy with proxy_pass, should I assume we would essentially add that to the "if ($prerender = 0)" section?

Owner

@sentient good idea :)

@akoumjian yes, in that case you would do your own proxy_pass inside "if ($prerender = 0)"
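Roughly, that variant would look like this (a sketch only; the backend address 127.0.0.1:8080 is a placeholder for your own upstream):

```nginx
location @prerender {
    #proxy_set_header X-Prerender-Token YOUR_TOKEN;
    set $prerender 0;
    # ... same user agent / _escaped_fragment_ / extension checks as above ...

    resolver 8.8.8.8;
    if ($prerender = 1) {
        set $prerender "service.prerender.io";
        rewrite .* /$scheme://$host$request_uri? break;
        proxy_pass http://$prerender;
    }
    if ($prerender = 0) {
        # instead of "rewrite .* /index.html break;", proxy to your own backend
        proxy_pass http://127.0.0.1:8080;
    }
}
```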

Owner

I added this since we were seeing issues where nginx was caching IPs and hitting servers that might have been taken out of our load balancer rotation:

    #resolve using Google's DNS server
    resolver 8.8.8.8;

    #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
    set $prerender "service.prerender.io";
    rewrite .* /$scheme://$server_name$request_uri? break;
    proxy_pass http://$prerender;

With nginx 1.4.6 (Ubuntu) I get an error when I try to restart the nginx service or reload the configuration:

    nginx: [emerg] "resolver" directive is not allowed here in /etc/nginx/sites-enabled/nginx.conf

I had to move resolver out of the conditional for it to work but this isn't ideal. Any ideas?
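For reference, what I ended up with looks roughly like this (resolver moved up to the location level, where nginx does allow it, with only the variable set and proxy inside the if):

```nginx
location @prerender {
    # "resolver" is valid at http/server/location level, but not inside "if"
    resolver 8.8.8.8;

    set $prerender 0;
    # ... user agent / _escaped_fragment_ checks ...

    if ($prerender = 1) {
        set $prerender "service.prerender.io";
        rewrite .* /$scheme://$host$request_uri? break;
        proxy_pass http://$prerender;
    }
}
```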

I'm wondering if $host isn't a better choice than $server_name in

    rewrite .* /$scheme://$server_name$request_uri? break;

There can be multiple $server_name values in a single server {...}, but $host is the one from the HTTP request (and falls back to $server_name anyway).

@thoop @patrickng I too have the same issue ('resolver' directive is not allowed here)

Owner

Sorry, I wish github would send notifications for comments on gists :(

@fergusg we used to have it set to $host but recently changed it to $server_name.

I'll look into a better solution for the resolver in the if-conditional.

The Vkontakte social network uses "Mozilla/5.0 (compatible; vkShare; +http://vk.com/dev/Share)" as the UserAgent for its sharing functionality, so you need to add 'vkShare' to the detection list:

    pinterest|vkShare

See the embedding docs. Sharing uses the same userAgent.

If behind a load balancer, the $scheme variable may not be set correctly: if the LB is doing SSL termination, the scheme on the machine behind it may be http. The prerender service would then try to access http://, but get bounced to https://, which in my case did not work. I had to hard-code https:// in there.

Any update on @patrickng resolver issue? I took out the conditional as well.

Owner

@leorue can you try the new nginx config? I just updated it to move the resolver outside of the "if" statement.

Owner

I just changed $server_name back to $host. Hopefully that clears up any issues with the server name not being the actual url of your site.

Hi, do I have to install something on my webserver running nginx as the frontend HTTP server, or do I simply need to add this snippet to my vhost .conf?

Owner

You should just be able to add this snippet to your .conf file. Email me at todd@prerender.io if you're having any problems with it. Github doesn't send notifications on gists so I'll be able to help you more quickly over email.

http://wiki.nginx.org/IfIsEvil

Directive if has problems when used in location context, in some cases it doesn't do what you expect but something completely different instead. In some cases it even segfaults. It's generally a good idea to avoid it if possible.

The only 100% safe things which may be done inside if in location context are:

return ...;
rewrite ... last;
Anything else may possibly cause unpredictable behaviour, including potential SIGSEGV.

It is important to note that the behaviour of if is not inconsistent: given two identical requests, it will not randomly fail on one and work on the other. With proper testing and understanding, ifs can be used. The advice to use other directives where available still very much applies, though.

Google doesn't use ?escaped_fragment= for all of its services. It might do for its indexer, but for instance when I use "Fetch as Google" from Webmaster Tools, it correctly renders the page while the HTML it received was from before rendering:

66.249.75.21 - - [08/Dec/2014:12:11:51 +0000] "GET /owner/ HTTP/1.1" 200 6292 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.75.37 - - [08/Dec/2014:12:11:51 +0000] "GET /owner/?_escaped_fragment_= HTTP/1.1" 200 9401 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

In fact, I am ALWAYS seeing two requests from Googlebot: one with and one without the escaped fragment.
I'm seeing other services without the escaped fragment as well: Google Page Speed Insights, Google-StructuredDataTestingTool, Google Web Preview.

A few that I added myself: (tumblr|slackbot|xml-sitemaps|google-structureddatatestingtool). Definitely need tumblr :) It's a shame we have to go with a whitelist approach, because there are so many services where my site is just invisible.

@intellix I guess that in the first request Google finds the <meta name="fragment" content="!" /> tag and then does a second request with escaped_fragment. Before the first request it did not know that it needed to use escaped_fragment.

Owner

@varuzhnikov if we could refactor to remove if statements, that would be ideal. Any idea on the best way to do that? We haven't had any problems with this configuration yet.

@intellix @csbenjamin is correct. The first request sees your meta tag and re-requests the page with the escaped fragment parameter. "Fetch as Google" does not follow the escaped fragment, and that is a known bug. If you are seeing services that do not send two requests (one with the escaped fragment), then we should add them to the whitelist of user agents.
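For what it's worth, one possible way to cut down on the ifs is nginx's map directive (a sketch only, not tested against this exact config; the proxy branch still needs one if or a split into named locations, and the static extension check would need its own map on $uri):

```nginx
# http {} context: derive the prerender flag without "if"
map $http_user_agent $prerender_ua {
    default 0;
    "~*baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|pinterest|slackbot" 1;
    "~Prerender" 0;  # never prerender requests coming from the prerender service itself
}
# _escaped_fragment_ in the query string also triggers prerendering;
# otherwise fall back to the user agent result above
map $args $prerender {
    default $prerender_ua;
    "~_escaped_fragment_" 1;
}
```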

I use self-hosted prerender and an nginx server with an angularjs html application, but when I try to share content on Facebook or LinkedIn I always get the root page (index.html) /

nginx conf:

    set $prerender "85.10.211.83:4000";

    rewrite .* /$scheme://$host$request_uri? break;
    proxy_pass http://$prerender;

just for review:
http://85.10.211.83:4000/http://novi.bktvnews.com:3030/#!/
and single pages are the same:
http://85.10.211.83:4000/http://novi.bktvnews.com:3030/#!/grafika-iz-dalarne-na-putu-vas

in app.js
I have

$locationProvider.hashPrefix('!');

and in index.html template

we had an AWS nginx proxy setup and were having timeout issues: service.prerender.io could not be resolved (60: Operation timed out). We were able to get around this by using an nginx upstream:

upstream pre-render {
   server service.prerender.io;
}

and the if in location block changed to:

if ($prerender = 1) {
   rewrite ^(.*)$ /https://$server_name..... break;
   proxy_pass http://pre-render;
}

I use the official server "service.prerender.io" and it works, but when I try to use my own server, it doesn't. I have tested it with http://myserver.com/http://www.google.com and it works, which I think means the server itself is working. But when I replace "service.prerender.io" with "myserver.com", it doesn't work and I get a 502 Bad Gateway error. Anyone know why?

I use nginx like a reverse proxy and i would like to use prerender. I don't know how to adapt the nginx.conf especially this section:

    location / {
        try_files $uri @prerender;
    }

I can't use the "try_files" instruction because I use the "proxy_pass" instruction.

Do you know how to do this? Thank you

@creatorkuang it'd be hard to know without seeing your configuration. myserver.com needs to be running the open-source prerender node server and on the right port (by default it's 3000), and your nginx prerender middleware has to be configured to proxy there if the prerender variable = 1. E.g. proxy_pass http://myserver.com; But I've only tried the open-source server from localhost:3000.

@geolart I think some people above commented that you can do that by moving your proxy_pass line into the location @prerender block if the prerender variable = 0 (instead of rewrite .* /index.html break;). Let me know if it doesn't work for you though.

@rkulla Thanks to you, it works for me!
Thank you @rkulla!

When also using nginx as a reverse proxy via proxy_pass http://myupstream, I had an issue where if I do proxy_set_header X-Prerender-Token XXX in the location @prerender block, it reset my other proxy_set_header lines, causing 'http://myupstream' to be treated as a literal URL. However, things work fine if I redefine my proxy_set_header 'Host', etc in the same block as X-Prerender-Token -- either all in the location / or all in the location @prerender, but not divided.
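Roughly, what worked (myupstream is a placeholder for your own upstream name):

```nginx
location @prerender {
    # nginx drops inherited proxy_set_header lines as soon as any
    # proxy_set_header appears at this level, so redefine them all here
    proxy_set_header X-Prerender-Token XXX;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    # ... prerender checks and proxy_pass http://myupstream as before ...
}
```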

Owner

Send me an email at support@prerender.io if anyone has any issues. I don't get notified when someone comments on this gist.

As per Facebook's official documentation, you should also add the user agent 'facebot' to the $http_user_agent check:

https://developers.facebook.com/docs/sharing/best-practices#crawl

I have monitored many requests that come with Googlebot and AdsBot-Google-Mobile as the User-Agent, so I added them to my list.

Is there an exhaustive list of those User-Agents somewhere?

You might want to add 'svg' to the list of extensions not prerendered.

@evandhoffman we have the same issue when hosting on Heroku, which in turn seems to be using Amazon ELB. $scheme is always HTTP. As an alternative to hardcoding we decided to use $http_x_forwarded_proto instead of $scheme.
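In other words, roughly (this assumes the load balancer sets the X-Forwarded-Proto header on forwarded requests):

```nginx
if ($prerender = 1) {
    set $prerender "service.prerender.io";
    # $http_x_forwarded_proto carries the original client scheme through the LB
    rewrite .* /$http_x_forwarded_proto://$host$request_uri? break;
    proxy_pass http://$prerender;
}
```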

Owner
thoop commented

@mrgamer you don't want to add Googlebot (or any other crawler that supports the escaped fragment protocol) to the user agent list. You could get penalized for cloaking. You want Google to continue using the escaped fragment protocol.

@ermakovich great idea!

