@thoop
Last active November 12, 2024 19:08
Official prerender.io nginx.conf for nginx
# Change YOUR_TOKEN to your prerender token
# Change example.com (server_name) to your website url
# Change /path/to/your/root to the correct value
server {
    listen 80;
    server_name example.com;

    root /path/to/your/root;
    index index.html;

    location / {
        try_files $uri @prerender;
    }

    location @prerender {
        proxy_set_header X-Prerender-Token YOUR_TOKEN;

        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }

        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;

        if ($prerender = 1) {
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
            set $prerender "service.prerender.io";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
        }
        if ($prerender = 0) {
            rewrite .* /index.html break;
        }
    }
}
@celiovasconcelos

celiovasconcelos commented May 13, 2018

Why try_files?

With try_files, the @prerender location will never be reached!
I was looking for the following behaviour:

If a page is requested by a browser, nginx serves it statically/locally. If the same page is requested by a crawler, it is served by the proxy.

The official snippet does not seem to do that.

Am I crazy?

@jindrichbartek

jindrichbartek commented Jul 7, 2018

From "Deprecating our AJAX crawling scheme" (https://webmasters.googleblog.com/2017/12/rendering-ajax-crawling-pages.html)

Given these advances, in the second quarter of 2018, we'll be switching to rendering these pages on Google's side, rather than on requiring that sites do this themselves. In short, we'll no longer be using the AJAX crawling scheme.

So what do you think? Has anyone run tests with the "_escaped_fragment_" condition?

@mindtonic

mindtonic commented Jul 11, 2018

+1 to @jeerbl for pointing out that this config file is missing Google Plus. The Google+ Snippet Fetcher User Agent is:

Google (+https://developers.google.com/+/web/snippet/)

For my part I added it to my config file UserAgent regex as developers\.google\.com

Also, if you want to use the Structured Data Testing Tool, you will want to add Google-Structured-Data-Testing-Tool to the regex as well.

Final example:

if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|developers\.google\.com|Google-Structured-Data-Testing-Tool") {
  set $prerender 1;
 }

@maxklenk

Please add redditbot and Discordbot to the list of user agents.
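
A minimal sketch of what that change could look like, assuming both crawlers send user-agent strings containing "redditbot" and "discordbot" (the `~*` operator matches case-insensitively). The rest of the list is copied from the gist above; check each bot's documentation for its exact user-agent string before relying on this.

```nginx
# Inside location @prerender, replacing the original user-agent check.
# "redditbot" and "discordbot" are appended to the gist's list; the exact
# tokens are an assumption about what those crawlers send.
if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp|redditbot|discordbot") {
    set $prerender 1;
}
```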

@ddereszewski

If we use a proxy to cache requests, can we use proxy_cache_bypass based on the user agent?

The goal is for any refresh request made by Prerender to always get the latest version of the content. Any suggestions?

@theChumpus

theChumpus commented Aug 22, 2018

@ddereszewski

You may have already found a solution for this but I just ran into this too. Rather than cache based on user-agent (which didn't seem efficient as there are lots of possible UA strings), I use the same $prerender variable as the input to proxy_cache_bypass.

Edit:
Also added proxy_no_cache $prerender so that the response is not added to the cache in these cases either.
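
A rough sketch of that approach, assuming a cache zone has been defined elsewhere. The zone name `app_cache` and the cache path are placeholders made up for this example; only `proxy_cache_bypass $prerender` and `proxy_no_cache $prerender` come from the comment above.

```nginx
# http context. The zone name "app_cache" and the path are placeholders.
proxy_cache_path /var/cache/nginx/app_cache keys_zone=app_cache:10m;

# Inside location @prerender, alongside the gist's existing directives.
# $prerender is "0" for ordinary requests and non-zero once a crawler is detected.
proxy_cache app_cache;

# Skip the cache lookup when $prerender is non-empty and not "0".
proxy_cache_bypass $prerender;

# Do not store the response in the cache in those cases either.
proxy_no_cache $prerender;
```

Both directives treat an empty value or "0" as "off", so the value "service.prerender.io" that the gist assigns to $prerender for crawler requests also counts as "on".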

@dimitarz

I tried adding this config and it didn't work right away because of my try_files:

location / { try_files $uri @prerender $uri/ =404; }

I ended up having to remove $uri/ =404 and it worked, but now non-existent pages returned index.html! I don't want this behavior, and narrowed it down to removing that pesky

if ($prerender = 0) { rewrite .* /index.html break; }

Why does this even exist here? Totally confusing.
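
Some context on why the first try_files line did not work: nginx treats a named location such as @prerender as a fallback only when it is the last parameter of try_files; every earlier parameter is checked as a file path. A minimal sketch contrasting the two forms (the commented-out line is the one from this comment, shown only for comparison):

```nginx
# @prerender is not the last parameter here, so nginx looks for a file
# literally named "@prerender" relative to root and never jumps to the
# named location.
# location / { try_files $uri @prerender $uri/ =404; }

# Form used by the gist: serve the file if it exists, otherwise hand the
# request to the named location.
location / {
    try_files $uri @prerender;
}
```

The `rewrite .* /index.html break;` in the gist looks like the usual single-page-app fallback: any non-crawler request that is not a real file gets index.html so the client-side router can handle the path. That is also why unknown URLs no longer return 404 from nginx.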

@ethicalmohit

I am getting the following error with nginx running as a sidecar in ECS:

2019/02/06 18:03:19 [error] 7#7: *17 service.prerender.io could not be resolved (110: Operation timed out), client: 10.0.101.66, server: , request: "GET / HTTP/1.1"

My nginx conf is below.

Command: curl -vvv --header "User-Agent:googlebot" https://example.com

upstream backend {
  server example:3000;
}

map $http_upgrade $connection_upgrade {

  default upgrade;
  '' close;
}

server {

  listen 80;

  add_header X-Whom $hostname;
  proxy_buffering off;
  proxy_buffer_size 128k;
  proxy_buffers 4 256k;
  proxy_busy_buffers_size 256k;

location / {

    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade; #for websockets
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;

    access_log /dev/stdout apm;
    error_log stderr notice;

    try_files $uri @prerender;
  }
  location @prerender {

    proxy_set_header X-Prerender-Token "TOKEN_VALUE";
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade; #for websockets
    proxy_set_header Connection $connection_upgrade;
    proxy_set_header X-Forwarded-For $remote_addr;
    proxy_set_header Host $host;
    
    access_log /dev/stdout apm;
    error_log stderr notice;

    set $prerender 0;
    if ($http_user_agent ~* "googlebot|bingbot|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|WhatsApp|Twitterbot|showyoubot|outbrain|vkShare|Slack-ImgProxy|Slackbot-LinkExpanding|Site Analyzer|SiteAnalyzerBot|Viber|Whatsapp|Telegram|W3C_Validator") {

      set $prerender 1;
    }


    resolver 8.8.8.8;

    if ($prerender = 1) {

      set $prerender "service.prerender.io";
      #rewrite .* /https://$host$request_uri? break;
      rewrite .* $host$request_uri? break;
      #proxy_pass http://$prerender;
      proxy_pass http://$prerender;
    }
    if ($prerender = 0) {

      proxy_pass http://backend;
    }

  }

}

@luigink

luigink commented Jul 23, 2019

location ~* /article/([a-z]|[A-Z]|[0-9]|[-]) {
rewrite ^/article/((([a-z]|[A-Z]|[0-9]|[-])+))?$ /article.html?$1 break;
}

How can I render with try_files inside this route? I want this rewrite for a webpack URL, but I can't access it from Prerender. Any ideas?

@vesper8

vesper8 commented Oct 10, 2019

how come DuckDuckBot isn't in the list of crawlers?

@eirikm

eirikm commented Oct 30, 2019

doc appears twice in the extension regexp

@Trench94

Trench94 commented Jan 7, 2020

For anybody having issues setting up their own Prerender server with NGINX and/or getting 502 errors:

Check your NGINX HTTP and HTTPS logs for more information on the issue.

If a "502 Bad Gateway" error is thrown on CentOS for an API gateway URL used in proxy_pass on NGINX, run the following command to solve the issue (it sets the SELinux boolean that lets NGINX make outbound network connections):
sudo setsebool -P httpd_can_network_connect 1

@kaioken

kaioken commented Jan 14, 2020

If you have any issues with HTTPS redirects behind an AWS load balancer, you will need to force the scheme to https:

#rewrite .* /$scheme://$host$request_uri? break;
rewrite .* /https://$host$request_uri? break;
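
An alternative sketch that avoids hard-coding the scheme, assuming the load balancer sets the X-Forwarded-Proto request header (AWS load balancers do this for HTTP/HTTPS listeners). The variable name `$prerender_scheme` is made up for this example and is not part of the gist.

```nginx
# Inside location @prerender, before the "if ($prerender = 1)" block.
# $prerender_scheme is a hypothetical helper variable, not part of the gist.
set $prerender_scheme $scheme;
if ($http_x_forwarded_proto = "https") {
    set $prerender_scheme "https";
}

if ($prerender = 1) {
    set $prerender "service.prerender.io";
    # Build the URL for Prerender from the scheme the client actually used.
    rewrite .* /$prerender_scheme://$host$request_uri? break;
    proxy_pass http://$prerender;
}
```

The scheme check sits outside the `if ($prerender = 1)` block because nginx does not allow nested `if` directives.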

@konrad-ch

Will it work with cloudflare DNS?

@torava

torava commented Jun 9, 2020

I was using Prerender with WordPress over HTTPS and had to consult Prerender.io to get it working. I had to remove the $http_user_agent ~ "Prerender" condition to be able to use try_files, as can be seen in this gist: https://gist.github.com/torava/324af0b11fafe6c5e1cc8e02121af608

@Psh777

Psh777 commented Jul 8, 2020

"502 Bad Gateway": check that the DNS resolver 8.8.8.8 is reachable from your network. It was blocked on my network, so this configuration did not work.

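If 8.8.8.8 is not reachable, the resolver directive can point at a DNS server that is. A minimal sketch, where 10.0.0.2 is a placeholder for whatever resolver your network provides (for example, an address listed in /etc/resolv.conf on the host):

```nginx
# Inside location @prerender, replacing "resolver 8.8.8.8;".
# 10.0.0.2 is a placeholder; substitute a DNS server reachable from this host.
resolver 10.0.0.2 valid=30s;

# Give up on a lookup after 5 seconds instead of the default 30.
resolver_timeout 5s;
```

A "could not be resolved (110: Operation timed out)" entry in the error log is a typical sign that the configured resolver cannot be reached.
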
@roarkmccolgan

roarkmccolgan commented Jul 23, 2020

In my nginx config I was unable to put try_files inside an if block; it fails with "not allowed":

if ($prerender = 1) {
    #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
    set $prerender "service.prerender.io";
    rewrite .* /$scheme://$host$request_uri? break;
    proxy_pass http://$prerender;
}
if ($prerender = 0) {
    try_files $uri $uri/ /index.php?$query_string;
}

So I removed the if ($prerender = 0) {...} block.

Now I just call try_files after the first if:

if ($prerender = 1) {
    #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
    set $prerender "service.prerender.io";
    rewrite .* /$scheme://$host$request_uri? break;
    proxy_pass http://$prerender;
}

try_files $uri $uri/ /index.php?$query_string;

Is this fine?

@chinnichaitanya

Thank you for this @thoop! Similar to slackbot, I request you to please add discordbot to the list of bots - https://developers.whatismybrowser.com/useragents/parse/572332-discord-bot

@MeMeMax

MeMeMax commented Jan 14, 2021

I have this site: https://estimationpoker.de. It is written in Angular. Unfortunately, Google can crawl all of the subpages, such as /impressum, but not the root page. How is this possible? I am using the nginx config from here.

@ghfkui

ghfkui commented Mar 25, 2021

Hi all, I use Prerender in my project for SEO. My nginx.conf is as follows:
`location / {
root /dist;
try_files $uri @prerender;
# try_files $uri $uri/ /index.html;
# index index.html index.htm;

    }

    location @prerender {
        # proxy_set_header X-Prerender-Token YOUR_TOKEN;
        
        set $prerender 0;
        if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
            set $prerender 1;
        }
        if ($args ~ "_escaped_fragment_") {
            set $prerender 1;
        }
        if ($http_user_agent ~ "Prerender") {
            set $prerender 0;
        }
        if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
            set $prerender 0;
        }
        
        #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
        resolver 8.8.8.8;

        if ($prerender = 1) {
            #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
            set $prerender "127.0.0.1:3000";
            rewrite .* /$scheme://$host$request_uri? break;
            proxy_pass http://$prerender;
            break;
        }
        try_files $uri $uri/ /index.html;
    }`

When I run the command: curl http://localhost:3000/render?url=http://localhost
the result is correct. But when I run the command: curl http://localhost:3000/render?url=http://localhost/help
the result is a 404. Does anyone know the reason? Thanks in advance!

@normancpleitez

normancpleitez commented Jun 29, 2021

Hi, can someone help me, I added GTmetrix to my http_user_agent, but it is not working. Google and others work though.

`
location @prerender {
proxy_set_header X-Prerender-Token XXXXXXXXX;

     set $prerender 0;
     if ($http_user_agent ~* "baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator|GTmetrix|screaming frog seo spider|screamingfrogseospider|chrome-lighthouse") {
         set $prerender 1;
     }
     if ($args ~ "_escaped_fragment_|prerender=1") {
         set $prerender 1;
     }
     if ($http_user_agent ~ "Prerender") {
         set $prerender 0;
     }

     if ($prerender = 1) {
         rewrite .* /?url=$scheme://$host$request_uri? break;
         proxy_pass http://prerender.freyagriculturalproducts.com;
     }

     if ($prerender = 0) {
         proxy_pass http://127.0.0.1:5080;
     }
 }

`

@viperfx

viperfx commented Aug 1, 2021

Has anyone been able to figure out how to handle relative URLs with nginx? I am serving a SPA site over nginx and some paths that webpack builds like CSS are relative. So the site does not render properly.

@mantou132

@thiagoferreiraw

thiagoferreiraw commented Nov 18, 2021

For those struggling with the nginx configuration on SPAs (react, angular, vue.js), here's an overview on how it should work:

How the @prerender location works 🥇 :

Given we have a default location (/) with "try_files $uri @prerender", here's what happens:
First, nginx will try to find a real file on the directory, such as images, js or css files.
  If there is a match, nginx will return that file.
If no real file is found, nginx will try the @prerender location:
  If the user agent is a bot (google, twitter, etc), set a variable $prerender to 1
  If the user agent is prerender itself, set it back to 0
  If the uri is for a file such as js, css, imgs, set it to 0 again
  Now, if after all of these conditions we have $prerender==1, we will:
      Proxy pass the request to prerender and return the cached html file
  If we have prerender=0:
      Default to our own index.html.
      Since this is a SPA, all uris that are not a real file will be redirected to index.html

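Caveats 🚨

- The `if` directive in nginx can be tricky (please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/)
- In summary, we should never use try_files and ifs in the same location block.
- This is the reason why we have to use rewrite in the @prerender location instead of try_files.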

Final nginx conf with SSL 🔒

It's very similar to the first example; I just needed to add a root path in each location:

# MANAGED BY PUPPET
server {
  listen       *:443 ssl;


  server_name  example.com;

  ssl_certificate           /etc/nginx/ssl/example.crt;
  ssl_certificate_key       /etc/nginx/ssl/example.key;

  access_log            /var/log/nginx/example.log combined_cloudflare;
  error_log             /var/log/nginx/ssl-example.error.log;
  add_header "X-Clacks-Overhead" "GNU Terry Pratchett";


  location / {
    root      /usr/local/example/webapp-build;
    try_files $uri @prerender;
    add_header Cache-Control "no-cache";
  }
  # How the @prerender location works:
  # Given we have a default location (/) with "try_files $uri @prerender", here's what happens:
  # First, nginx will try to find a real file on the directory, such as images, js or css files.
  #   If there is a match, nginx will return that file.
  # If no real file is found, nginx will try the @prerender location:
  #   If the user agent is a bot (google, twitter, etc), set a variable $prerender to 1
  #   If the user agent is prerender itself, set it back to 0
  #   If the uri is for a file such as js, css, imgs, set it to 0 again
  #   Now, if after all of these conditions we have $prerender==1, we will:
  #       Proxy pass the request to prerender and return the cached html file
  #   If we have prerender=0:
  #       Default to our own index.html.
  #       Since this is a SPA, all uris that are not a real file will be redirected to index.html
  # **** CAVEATS ****
  # - The `if` directive in nginx can be tricky (please read https://www.nginx.com/resources/wiki/start/topics/depth/ifisevil/)
  # - In summary, we should never use try_files and ifs in the same location block.
  # - This is the reason why we have to use rewrite in the @prerender location instead of try_files.

  location @prerender {
    root      /usr/local/example/webapp-build;
    proxy_set_header X-Prerender-Token "YOUR_TOKEN_GOES_HERE";

    set $prerender 0;
    if ($http_user_agent ~* "googlebot|bingbot|yandex|baiduspider|twitterbot|facebookexternalhit|rogerbot|linkedinbot|embedly|quora link preview|showyoubot|outbrain|pinterest\/0\.|pinterestbot|slackbot|vkShare|W3C_Validator|whatsapp") {
        set $prerender 1;
    }
    if ($args ~ "_escaped_fragment_") {
        set $prerender 1;
    }
    if ($http_user_agent ~ "Prerender") {
        set $prerender 0;
    }
    if ($uri ~* "\.(js|css|xml|less|png|jpg|jpeg|gif|pdf|doc|txt|ico|rss|zip|mp3|rar|exe|wmv|doc|avi|ppt|mpg|mpeg|tif|wav|mov|psd|ai|xls|mp4|m4a|swf|dat|dmg|iso|flv|m4v|torrent|ttf|woff|svg|eot)") {
        set $prerender 0;
    }

    #resolve using Google's DNS server to force DNS resolution and prevent caching of IPs
    resolver 8.8.8.8;

    if ($prerender = 1) {
        #setting prerender as a variable forces DNS resolution since nginx caches IPs and doesn't play well with load balancing
        set $prerender "service.prerender.io";
        rewrite .* /$scheme://$host$request_uri? break;
        proxy_pass http://$prerender;
        break;
    }

    rewrite .* /index.html break;
  }
}

If you are using puppet to manage your infrastructure... 🤖

I preferred creating a template file with the @prerender location for readability. Additionally, this config should only be done in production, so I added some conditionals as well:

  if $environment == 'production' {
    # prerender.io is a SEO tool used to process javascript websites into a robot-friendly page.
    # Please read the details on profile/webapp/prerender_nginx.erb
    $nginx_try_files = ['$uri', '@prerender']  # the $uri must be single quoted!
    $nginx_raw_append = template('profile/webapp/prerender_nginx.erb')  # Only the @prerender location goes in this block
  } else {
    $nginx_try_files = ['$uri', '/index.html']  # the $uri must be single quoted!
    $nginx_raw_append = []
  }

nginx::resource::server { $name:
    ensure                => present,
    server_name           => $server_name,
    listen_port           => 443,
    ssl_port              => 443,
    ssl                   => true,
    ssl_cert              => $ssl_cert,
    ssl_key               => $ssl_key,
    format_log            => 'combined_cloudflare',
    www_root              => $www_root,
    index_files           => $index_files,
    error_pages           => $error_pages,
    try_files             => $nginx_try_files,
    location_cfg_append   => $location_cfg_append,
    raw_append            => $nginx_raw_append,
  }

@varrocs

varrocs commented Nov 18, 2021

Hi everyone!
there is an updated, official set of nginx configuration files here:
https://github.com/prerender/prerender-nginx

@zehawki

zehawki commented Mar 17, 2022

Does anyone know what should be the UA string for Google Sites (https://sites.google.com)? Perhaps they use something else other than 'googlebot' since I can't get link unfurling to work when using their "Embed from the web" widget.
