```
# Change YOUR_TOKEN to your prerender token
# Change example.com (server_name) to your website url
# Change /path/to/your/root to the correct value

server {
    listen 80;
    server_name example.com;

    root  /path/to/your/root;
    index index.html;

    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        proxy_set_header X-Prerender-Token YOUR_TOKEN;

        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }

        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;

        if ($prerender = 1) {
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
            set $prerender "service.prerender.io";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}
```
+1 to @jeerbl for pointing out that this config file is missing Google Plus. The Google+ Snippet Fetcher user agent is:
`Google (+https://developers.google.com/+/web/snippet/)`
For my part, I added it to my config file's user-agent regex as `developers\.google\.com`.
Also, if you want to use the Structured Data Testing Tool, you will want to add `Google-Structured-Data-Testing-Tool` to the regex as well.
Final example:
if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|developers\.google\.com|Google-Structured-Data-Testing-Tool") {
set $prerender 1;
}
Please add `redditbot` and `Discordbot` to the list of user agents.
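For anyone who wants the same thing before it lands in the gist, here is a sketch of the user-agent check with those two appended (the rest of the list is unchanged from the config above; `~*` already makes the match case-insensitive, so `discordbot` covers `Discordbot` too):

```
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp|redditbot|discordbot") {
    set $prerender 1;
}
```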
If we use a proxy for caching requests, can we use `proxy_cache_bypass` based on the user agent? The goal is that any refresh request made by Prerender always gets the latest version of the content. Any suggestions?
You may have already found a solution for this, but I just ran into it too. Rather than caching based on user agent (which didn't seem efficient, since there are lots of possible UA strings), I use the same `$prerender` variable as the input to `proxy_cache_bypass`.
Edit: I also added `proxy_no_cache $prerender` so that the response is not added to the cache in these cases either.
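A minimal sketch of that idea, assuming a proxy_cache zone (here named `app_cache`, a hypothetical name) is already defined via `proxy_cache_path` in the http block:

```
location @prerender {
    # ... the usual if blocks set $prerender to 0 or 1 here ...

    proxy_cache app_cache;  # hypothetical zone defined elsewhere

    # proxy_cache_bypass skips the cache lookup whenever the value is
    # non-empty and not "0", i.e. exactly when $prerender was set to 1
    # (or later to "service.prerender.io").
    proxy_cache_bypass $prerender;

    # proxy_no_cache keeps those responses out of the cache as well.
    proxy_no_cache $prerender;
}
```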
I tried adding this config and it didn't work right away because of my try_files:
`location / { try_files $uri @prerender $uri/ =404; }`
I ended up having to remove `$uri/ =404` and it worked (nginx only allows a named location as the last parameter of try_files), but now non-existent pages returned index.html! I don't want this behavior, and narrowed it down to removing this pesky line:
`if ($prerender = 0) { rewrite .* /index.html break; }`
Why does this even exist here? Totally confusing.
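For reference, a shape nginx does accept keeps the named location last; a minimal sketch using the gist's @prerender fallback. (The rewrite to /index.html exists to give single-page apps a fallback for non-bot requests; a later comment below walks through this in detail.)

```
location / {
    # A named location may only appear as the LAST try_files parameter,
    # so $uri/ has to come before @prerender, and =404 (which must also
    # be last) can't be combined with it.
    try_files $uri $uri/ @prerender;
}
```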
I am getting this error with nginx as a sidecar in ECS:
`2019/02/06 18:03:19 [error] 7#7: *17 service.prerender.io could not be resolved (110: Operation timed out), client: 10.0.101.66, server: , request: "GET / HTTP/1.1"`
My nginx conf is below.
Command: `curl -vvv --header "User-Agent:googlebot" https://example.com`
```
upstream backend {
    server example:3000;
}

map $http_upgrade $connection_upgrade {
    default upgrade;
    '' close;
}

server {
    listen 80;
    add_header X-Whom $hostname;

    proxy_buffering off;
    proxy_buffer_size 128k;
    proxy_buffers 4 256k;
    proxy_busy_buffers_size 256k;

    location / {
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade; #for websockets
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        access_log /dev/stdout apm;
        error_log stderr notice;
        try_files $uri @prerender;
    }

    location @prerender {
        proxy_set_header X-Prerender-Token "TOKEN_VALUE";
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade; #for websockets
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host $host;
        access_log /dev/stdout apm;
        error_log stderr notice;

        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|whatsapp|Slack-ImgProxy|Slackbot-LinkExpanding|Site Analyzer|SiteAnalyzerBot|Viber|Telegram") {
            set $prerender 1;
        }

        resolver 8.8.8.8;

        if ($prerender = 1) {
            set $prerender "service.prerender.io";
            #rewrite .* /https://$host$request_uri? break;
            rewrite .* $host$request_uri? break;
            proxy_pass http://$prerender;
        }
        if ($prerender = 0) {
            proxy_pass http://backend;
        }
    }
}
```
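One guess, since outbound DNS to 8.8.8.8 is often blocked from inside a VPC (a later comment below hit the same thing on another network): point `resolver` at the VPC-provided DNS instead. The address below is AWS's link-local resolver; whether it applies to your task's networking mode is an assumption:

```
# Sketch: use the VPC-provided resolver instead of Google's public DNS,
# which may be unreachable from inside the VPC / ECS task.
resolver 169.254.169.253;
```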
```
location ~* /article/([a-z]|[A-Z]|[0-9]|[-]) {
    rewrite ^/article/((([a-z]|[A-Z]|[0-9]|[-])+))?$ /article.html?$1 break;
}
```
How can I get this route to render with try_files? I want this rewrite for a webpack URL, but I can't reach it from Prerender. Any ideas?
How come DuckDuckBot isn't in the list of crawlers?
`doc` appears twice in the extension regexp.
For anybody having issues setting up their own Prerender server with NGINX and/or getting 502 errors: check your NGINX HTTP and HTTPS logs for more information on the issue.
If a "502 Bad Gateway" error is thrown on CentOS when NGINX proxy-passes to an API gateway URL, run the following command to solve the issue:
`sudo setsebool -P httpd_can_network_connect 1`
If you have any issues with an AWS load balancer doing the HTTPS redirect, you will need to force the rewrite to https:
```
#rewrite .* /$scheme://$host$request_uri? break;
rewrite .* /https://$host$request_uri? break;
```
Will it work with Cloudflare DNS?
I was using Prerender with WordPress over HTTPS, and I had to consult Prerender.io to get it to work. I had to remove the `$http_user_agent ~ "Prerender"` condition to be able to use try_files, as can be seen in this gist: https://gist.github.com/torava/324af0b11fafe6c5e1cc8e02121af608
"502 Bad Gateway" - check resolve dns 8.8.8.8 in your network. It was forbidden on my network and this did not work.
For my nginx config, I was unable to include a try_files inside an if block; it fails with "not allowed":
```
if ($prerender = 1) {
    #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
    set $prerender "service.prerender.io";
    rewrite .* /$scheme://$host$request_uri? break;
    proxy_pass http://$prerender;
}
if ($prerender = 0) {
    try_files $uri $uri/ /index.php?$query_string;
}
```
I have excluded the `if ($prerender = 0) { ... }` block and now just do the try_files after the first if:
```
if ($prerender = 1) {
    #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
    set $prerender "service.prerender.io";
    rewrite .* /$scheme://$host$request_uri? break;
    proxy_pass http://$prerender;
}
try_files $uri $uri/ /index.php?$query_string;
```
Is this fine?
Thank you for this @thoop! Similar to `slackbot`, I request you to please add `discordbot` to the list of bots: https://developers.whatismybrowser.com/useragents/parse/572332-discord-bot
I have this site: https://estimationpoker.de. It is written in Angular. Unfortunately, all subpages, such as /impressum, can be crawled by Google, except the root page. How is this possible? I am using the nginx config from here.
Hi all, I use Prerender in my project for SEO. My nginx.conf is the following:

```
location / {
    root /dist;
    try_files $uri @prerender;
    # try_files $uri $uri/ /index.html;
    # index index.html index.htm;
}

location @prerender {
    # proxy_set_header X-Prerender-Token YOUR_TOKEN;

    set $prerender 0;
    if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
        set $prerender 1;
    }
    if ($args ~ "_escaped_fragment_") {
        set $prerender 1;
    }
    if ($http_user_agent ~ "Prerender") {
        set $prerender 0;
    }
    if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
        set $prerender 0;
    }

    #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
    resolver 8.8.8.8;

    if ($prerender = 1) {
        #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
        set $prerender "127.0.0.1:3000";
        rewrite .* /$scheme://$host$request_uri? break;
        proxy_pass http://$prerender;
        break;
    }
    try_files $uri $uri/ /index.html;
}
```
When I use the command `curl http://localhost:3000/render?url=http://localhost`, the result is right. But when I use `curl http://localhost:3000/render?url=http://localhost/help`, the result is a 404. Does anyone know the reason? Please tell me. Thanks in advance!
Hi, can someone help me? I added GTmetrix to my $http_user_agent regex, but it is not working. Google and the others work, though.
```
location @prerender {
    proxy_set_header X-Prerender-Token XXXXXXXXX;

    set $prerender 0;
    if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|GTmetrix|screaming frog seo spider|screamingfrogseospider|chrome-lighthouse") {
        set $prerender 1;
    }
    if ($args ~ "_escaped_fragment_|prerender=1") {
        set $prerender 1;
    }
    if ($http_user_agent ~ "Prerender") {
        set $prerender 0;
    }

    if ($prerender = 1) {
        rewrite .* /?url=$scheme://$host$request_uri? break;
        proxy_pass http://prerender.freyagriculturalproducts.com;
    }
    if ($prerender = 0) {
        proxy_pass http://127.0.0.1:5080;
    }
}
```
Has anyone figured out how to handle relative URLs with nginx? I am serving a SPA over nginx, and some paths that webpack builds, like CSS, are relative, so the site does not render properly.
Is docsearch-scraper supported?
For those struggling with the nginx configuration on SPAs (React, Angular, Vue.js), here's an overview of how it should work.

How the @prerender location works 🥇:

Given we have a default location (/) with `try_files $uri @prerender`, here's what happens:
- First, nginx will try to find a real file in the directory, such as an image, js, or css file.
- If there is a match, nginx will return that file.
- If no real file is found, nginx will try the @prerender location:
  - If the user agent is a bot (Google, Twitter, etc.), set a variable $prerender to 1.
  - If the user agent is Prerender itself, set it back to 0.
  - If the URI is for a file such as js, css, or images, set it to 0 again.
- Now, if after all of these conditions we have $prerender == 1, we proxy-pass the request to Prerender and return the cached HTML file.
- If we have $prerender == 0, we default to our own index.html. Since this is a SPA, all URIs that are not a real file will be rewritten to index.html.
Caveats 🚨
- The `if` directive in nginx can be tricky (please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/). In summary, we should never use try_files and if in the same location block.
- This is the reason why we have to use rewrite in the @prerender location block instead of try_files.
Final nginx conf with SSL 🔒
It's very similar to the first example; I just needed to add a root path in each location:
```
# MANAGED BY PUPPET
server {
    listen *:443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/ssl/example.crt;
    ssl_certificate_key /etc/nginx/ssl/example.key;

    access_log /var/log/nginx/example.log combined_cloudflare;
    error_log  /var/log/nginx/ssl-example.error.log;

    add_header "X-Clacks-Overhead" "GNU Terry Pratchett";

    location / {
        root /usr/local/example/webapp-build;
        try_files $uri @prerender;
        add_header Cache-Control "no-cache";
    }

    # How the @prerender location works:
    # Given we have a default location (/) with "try_files $uri @prerender", here's what happens:
    # First, nginx will try to find a real file on the directory, such as images, js or css files.
    # If there is a match, nginx will return that file.
    # If no real file is found, nginx will try the @prerender location:
    #   If the user agent is a bot (google, twitter, etc), set a variable $prerender to 1
    #   If the user agent is prerender itself, set it to 0 back again
    #   If the uri is for a file such as js, css, imgs, set it to 0 again
    # Now, if after all of these conditions we have $prerender==1, we will:
    #   Proxy pass the request to prerender and return the cached html file
    # If we have $prerender==0:
    #   Default to our own index.html.
    #   Since this is a SPA, all uris that are not a real file will be redirected to index.html
    #
    # **** CAVEATS ****
    # - The `if` directive in nginx can be tricky (please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/)
    # - In summary, we should never use try_files and ifs in the same location block.
    # - This is the reason why we have to use rewrite on the @prerender location block instead of try_files.
    location @prerender {
        root /usr/local/example/webapp-build;
        proxy_set_header X-Prerender-Token "YOUR_TOKEN_GOES_HERE";

        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }

        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;

        if ($prerender = 1) {
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesnt play well with load balancing
            set $prerender "service.prerender.io";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
            break;
        }
        rewrite .* /index.html break;
    }
}
```
If you are using Puppet to manage your infrastructure... 🤖
I preferred creating a template file with the `@prerender` location for readability. Additionally, this config should only be used in production, so I added some conditionals as well:
```
if $environment == 'production' {
    # prerender.io is a SEO tool used to process javascript websites into a robot-friendly page.
    # Please read the details on profile/webapp/prerender_nginx.erb
    $nginx_try_files  = ['$uri', '@prerender'] # the $uri must be single quoted!
    $nginx_raw_append = template('profile/webapp/prerender_nginx.erb') # Only the @prerender location goes in this block
} else {
    $nginx_try_files  = ['$uri', '/index.html'] # the $uri must be single quoted!
    $nginx_raw_append = []
}

nginx::resource::server { $name:
    ensure              => present,
    server_name         => $server_name,
    listen_port         => 443,
    ssl_port            => 443,
    ssl                 => true,
    ssl_cert            => $ssl_cert,
    ssl_key             => $ssl_key,
    format_log          => 'combined_cloudflare',
    www_root            => $www_root,
    index_files         => $index_files,
    error_pages         => $error_pages,
    try_files           => $nginx_try_files,
    location_cfg_append => $location_cfg_append,
    raw_append          => $nginx_raw_append,
}
```
Hi everyone!
There is an updated, official set of nginx configuration files here:
https://github.com/prerender/prerender-nginx
Does anyone know what the UA string for Google Sites (https://sites.google.com) should be? Perhaps they use something other than 'googlebot', since I can't get link unfurling to work when using their "Embed from the web" widget.
From "Deprecating our AJAX crawling scheme" (https://webmasters.googleblog.com/2017/12/rendering-ajax-crawling-pages.html)
So what do you think? Did you guys do some tests with "escaped_fragment" condition?