Official prerender.io .htaccess for Apache.

View .htaccess
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
 
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "YOUR_TOKEN"
</IfModule>
 
<IfModule mod_rewrite.c>
RewriteEngine On
 
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
</IfModule>
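
To check offline which request paths the extension filter in the last RewriteRule would let through, the negative lookahead can be tried in Python. This is only a rough sketch: the extension list is shortened, `would_proxy` is a hypothetical helper name, and Python's `re` module stands in for Apache's PCRE engine.

```python
import re

# Shortened copy of the extension list from the RewriteRule above (assumption:
# the full list behaves the same way, it is only longer).
STATIC_EXTENSIONS = ["js", "css", "xml", "png", "jpg", "gif", "ico", "ttf", "woff"]

# Same shape as the rule: ^(?!.*?(\.js|\.css|...))(.*)
PATTERN = re.compile(
    r"^(?!.*?(" + "|".join(r"\." + ext for ext in STATIC_EXTENSIONS) + r"))(.*)"
)


def would_proxy(path):
    """Return True if the rule pattern matches, i.e. the path would be proxied."""
    return PATTERN.match(path) is not None


print(would_proxy("user/123"))       # no listed extension anywhere in the path
print(would_proxy("assets/app.js"))  # contains ".js"
print(would_proxy("data.json"))      # also contains ".js", so it is skipped too
```

Note that the alternatives are not anchored to the end of the path, so any path that merely contains one of the listed extensions (such as `data.json`) is excluded as well.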

A warning to anyone using CodeIgniter (or any PHP script that warrants a RewriteRule on index.php): you will need to add a capture group to the proxy rule, for index.php.

The line will read as follows:

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(index\.php)?(.*) http://service.prerender.io/%{REQUEST_SCHEME}://%{HTTP_HOST}/$3 [P,L]
Owner

Thanks @benceg!

I had to remove an extra "/" from the proxy target, so that it reads:

http://service.prerender.io/http://example.com$2

This is because the pattern being matched has a leading '/' already.

There is an issue where the service is caching the / home page for all URLs. Any idea what might be causing this?

Thanks

Owner
thoop commented

@baki250 just saw your comment. Following up with you via email.

ioloie commented

@baki250 I had a similar issue except it was always caching /index.html. The cause was that I had another rewrite for pushstate that always returned index.html:

        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule .* index.html [L]

The problem was that if this rule is carried out first, then the second rule receives /index.html as its URI. So prerender is always being asked for example.com/index.html. The solution is to switch the order around.
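
The effect of the rule order can be illustrated with a toy simulation in Python. It is a simplified sketch, not a model of Apache itself: the two function names are hypothetical, the requested path is assumed not to exist on disk, and `example.com` is the placeholder host from the gist.

```python
PRERENDER = "http://service.prerender.io/http://example.com/"


def pushstate_first(path, is_crawler):
    """Catch-all pushstate rewrite runs before the Prerender rule."""
    path = "index.html"          # RewriteRule .* index.html [L]
    if is_crawler:
        return PRERENDER + path  # the proxy rule now only sees index.html
    return path


def prerender_first(path, is_crawler):
    """Prerender rule runs before the catch-all pushstate rewrite."""
    if is_crawler:
        return PRERENDER + path  # [P,L] proxies the original path and stops
    return "index.html"          # normal visitors still get the app shell


print(pushstate_first("user/123", is_crawler=True))
print(prerender_first("user/123", is_crawler=True))
```

With the pushstate rule first, every crawler request ends up asking Prerender for `index.html`; with the Prerender rule first, the original path is preserved.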

Owner
thoop commented

Thanks @iolo-matchbook!

You're missing "visionutils" and "Facebot" for the Facebook user agents.

The unescaped "quora link preview" on line 12 causes the error "RewriteCond: bad flag delimiters". Escaping the spaces fixes the problem: quora\ link\ preview

Owner

Thanks @ianmstew!

Hey @thoop, I ran into an issue with this .htaccess that had me stumped for days, until I realized it was the issue with proxying the server root specifically using mod_rewrite, described here and here. Both solutions recommended using the ProxyPass directive instead, which does not work in our case because we need special rewrite conditions.

I discovered a workaround, however. This solves the server root case where, by the time mod_rewrite rules are evaluated, the incoming blank ("/") root request has already been converted to "index.html" by Apache. I would love to find a more elegant solution, but for now this has solved my problem.

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} ^/index\.html$
# Proxy the server root
RewriteRule .* http://service.prerender.io/http://example.com/ [P,L]

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
Owner

Hmm, interesting. Is this just because you have your server behind a path and not the root?

@baki250 @homerjam Thanks! I've been stumped on this for the past week and both of your solutions have helped me greatly! It's now indexing not just the index.

I couldn't get it working the first time round, for several reasons. The instructions on most sites were for HTML5 push state, while we used #!. So I tested pretty much every setting and configuration, which resulted in the following overview and solutions for both #! and HTML5 push state.

SEO / Prerender.io (Angular) etc.

How the solution works in general:

  1. Search engine notices that your page is rendered using JavaScript instead of server side.
  2. Search engine requests your pages with a modified URL instead of the original one (_escaped_fragment_).
  3. You return the prerendered HTML (Thanks to prerender.io) to the crawler.
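
The decision behind these steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `should_prerender` is a hypothetical name, and the user agent list is copied from the .htaccess at the top of this gist rather than from any official source.

```python
import re
from urllib.parse import urlsplit

# User agents copied from the RewriteCond in the gist; matching is case-insensitive
# like the [NC] flag.
CRAWLER_UA = re.compile(
    r"baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|"
    r"quora link preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator",
    re.IGNORECASE,
)


def should_prerender(url, user_agent):
    """True if the request looks like it comes from a crawler that needs static HTML."""
    query = urlsplit(url).query
    # Either condition is enough, mirroring the [OR] between the two RewriteConds.
    return bool(CRAWLER_UA.search(user_agent)) or "_escaped_fragment_" in query


print(should_prerender("http://www.example.com/user/123", "Twitterbot/1.0"))
print(should_prerender("http://www.example.com/?_escaped_fragment_=/user/123", "Googlebot/2.1"))
print(should_prerender("http://www.example.com/user/123", "Mozilla/5.0"))
```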

Instructions depending on your routing setup:

Option 1: # Hash routing

Example:

  # http://www.example.com/#/user/123
  #! http://www.example.com/#!/user/123

Problem:
Nothing after the # (hash) in the url gets sent to your server.

Solution:
In Angular:

      $locationProvider.hashPrefix('!');
      $locationProvider.html5Mode(false);

In the HTML, remove the following meta tag if it is present:

    <head>
      <!-- REMOVE --> <meta name="fragment" content="!"> <!-- /REMOVE -->
    </head>

Every time a search engine finds a URI like this:

  http://www.example.com/#!/user/123

It will send a request like this:

http://www.example.com/?_escaped_fragment_=/user/123
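
The mapping from a #! URL to the crawler's request can be reproduced with a small Python sketch. `to_escaped_fragment` is a hypothetical helper, and the escaping is simplified (it uses `urllib.parse.quote`, which escapes more characters than the AJAX crawling scheme strictly requires).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def to_escaped_fragment(url):
    """Rewrite http://host/#!/path to http://host/?_escaped_fragment_=/path."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # plain # fragments are never sent to the server
    value = quote(parts.fragment[1:], safe="/")
    # Keep any existing query parameters in front of the new one.
    query = parts.query + "&" if parts.query else ""
    query += "_escaped_fragment_=" + value
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


print(to_escaped_fragment("http://www.example.com/#!/user/123"))
```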

Configure Apache:

  RewriteEngine On
    # If requested resource exists as a file or directory
      # (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
        RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
        RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
      # Only exception is /index.htm, /index.html
        RewriteCond %{REQUEST_URI} !/index\.html?
      # Go to it as is
        RewriteRule ^ - [L]
    # If non existent
      # If path ends with / and is not just a single /, redirect to without the trailing /
        RewriteCond %{REQUEST_URI} ^.*/$
        RewriteCond %{REQUEST_URI} !^/$
        RewriteRule ^(.*)/$ $1 [R,QSA,L]
      # If path that is not empty or / or /index.htm or /index.html, redirect to /#!/path
        RewriteCond %{REQUEST_URI} !(/index\.html?|/|)$
        RewriteRule ^(.*)$ /#!$1 [R,QSA,NE,L]
      # If not /, redirect to it.
        RewriteCond %{REQUEST_URI} !^/$
        RewriteRule ^ / [R,QSA,L]

  # Handle Prerender.io
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"

    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    RewriteCond %{QUERY_STRING} _escaped_fragment_=([^&]*)

  # Proxy the request
    RewriteRule ^ http://service.prerender.io/http://%{HTTP_HOST}/?_escaped_fragment_=%1 [P,L]

Option 2: HTML5 push state routing

Example:

http://www.example.com/user/123

Problem:
You need to tell the search engine that your HTML5 push state page uses JavaScript to generate its content.

Solution:

In Angular:

$locationProvider.html5Mode(true);

In the HTML, add this meta tag:

<head>
    <meta name="fragment" content="!">
</head>

Every time a search engine finds a URI like this:

http://www.example.com/user/123

It will send a request like this:

http://www.example.com/user/123?_escaped_fragment_= 

Configure Apache:

  RewriteEngine On

  # If requested resource exists as a file or directory
    # (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
    # Go to it as is
    RewriteRule ^ - [L]

  # If non existent
    # If path ends with / and is not just a single /, redirect to without the trailing /
    RewriteCond %{REQUEST_URI} ^.*/$
    RewriteCond %{REQUEST_URI} !^/$
    RewriteRule ^(.*)/$ $1 [R,QSA,L]

  # Handle Prerender.io
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"

    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_

    # Proxy the request
    RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}$1 [P,L]

  # If non existent
    # Accept everything on index.html
    RewriteRule ^ /index.html

How prerender.io requests the page from your server once a URL has been submitted to it:

http://service.prerender.io/http://www.example.com/user/123 -> http://www.example.com/user/123
http://service.prerender.io/http://www.example.com/?_escaped_fragment_=/user/123 -> http://www.example.com/index.html#!/user/123
http://service.prerender.io/http://www.example.com/?_escaped_fragment_=/user/123&var1=val1&val2=val2 -> http://www.example.com/index.html?var1=val1&val2=val2#!/user/123
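
The reverse mapping shown in these lines can be approximated in Python. This is only a sketch: `from_escaped_fragment` is a hypothetical name, the real service's behavior may differ in details, and the sketch keeps the path unchanged, so it does not model the /index.html that appears on the right-hand side above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def from_escaped_fragment(url):
    """Move the _escaped_fragment_ value back into a #! fragment."""
    parts = urlsplit(url)
    fragment = None
    rest = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == "_escaped_fragment_" and fragment is None:
            fragment = value
        else:
            rest.append((key, value))  # other parameters stay in the query
    if fragment is None:
        return url
    # An empty value (push state case) means there is no #! part to restore.
    new_fragment = "!" + fragment if fragment else ""
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(rest), new_fragment)
    )


print(from_escaped_fragment(
    "http://www.example.com/?_escaped_fragment_=/user/123&var1=val1&val2=val2"
))
print(from_escaped_fragment("http://www.example.com/user/123?_escaped_fragment_="))
```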

Other resources

http://www.prerender.io
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io

You may want to omit .ttf and .woff files as well (fonts). I noticed font-awesome was being requested from the prerender service.

I'm having trouble getting Googlebot (and maybe others) to cache my pages.

I'm using EmberJS and Prerender.io.

Here is my Apache .htaccess

AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript


RequestHeader set X-Prerender-Token "FyKfYC3YYBXiBJoUBmlY"


RewriteEngine On

<IfModule mod_proxy_http.c>
    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_

    # Only proxy the request to Prerender if it's a request for HTML
    RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://dev6.lovelionstudio.com/$2 [P,L]
</IfModule>

And here is my index file head:



...

I have included the sitemap.xml file that Prerender generates, and have cached all the main pages of this site.

Any ideas on why this isn't working?
Here is the site for reference http://lovelionstudio.com/

Thanks for your help ahead of time.

Owner

Send me an email at support@prerender.io if anyone has any issues. I don't get notified when someone comments on this gist.

Why is Googlebot not included?

Because I'm using CodeIgniter, I needed the URL that's passed to prerender.io to not include the hashbang, and include the _escaped_fragment_ parameter in the URL itself.

In other words, instead of pages looking like the following in "Cached Pages"

http://www.example.com/#!/user/123

I needed them to look like this

http://www.example.com/user/123

I used this:

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_=(\%2F|/)*(.*)

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://%{HTTP_HOST}/%2? [P,NE,L]
  • The first RewriteCond checks whether one of the listed user agents is requesting the page
  • The second RewriteCond checks for _escaped_fragment_ and puts its value, excluding any prefixed forward-slashes, into %2
  • The RewriteRule sends the request to http://service.prerender.io/https://HOST_NAME/VALUE_OF_ESCAPED_FRAGMENT and removes any query string
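
The capture described in these bullets can be tried out in Python. This sketch only covers the case where the query string contains _escaped_fragment_; `prerender_target` is a hypothetical helper name, and Python's `re` is assumed to behave like Apache's regex engine for this pattern.

```python
import re

# Same pattern as the second RewriteCond; group 2 corresponds to %2.
ESCAPED_FRAGMENT = re.compile(r"_escaped_fragment_=(%2F|/)*(.*)")


def prerender_target(host, query_string):
    """Build the proxy target from the fragment value, without leading slashes."""
    match = ESCAPED_FRAGMENT.search(query_string)
    if match is None:
        return None
    return "http://service.prerender.io/https://" + host + "/" + match.group(2)


print(prerender_target("www.example.com", "_escaped_fragment_=/user/123"))
print(prerender_target("www.example.com", "_escaped_fragment_=%2Fuser%2F123"))
```

Only the leading slashes (literal or encoded as %2F) are stripped; any later %2F in the value is passed through unchanged.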

For reference, my full .htaccess for a site with CodeIgniter + AngularJS is as follows:

<IfModule mod_headers.c>
    # Change YOUR_TOKEN to your prerender token
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"
</IfModule>

<IfModule mod_rewrite.c>
    RewriteEngine On
    # !IMPORTANT! Set your RewriteBase here and don't forget trailing and leading
    # slashes.
    # If your page resides at
    # http://www.example.com/mypage/test1
    # then use
    # RewriteBase /mypage/test1/
    RewriteBase /

    RewriteRule ^index\.php/(.*)$ /$1 [R=302,L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]

    <IfModule mod_proxy_http.c>
        RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_=(\%2F|/)*(.*)

        # Only proxy the request to Prerender if it's a request for HTML
        RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://%{HTTP_HOST}/%2? [P,NE,L]
    </IfModule>
</IfModule>

<IfModule !mod_rewrite.c>
    # If we don't have mod_rewrite installed, all 404's
    # can be sent to index.php, and everything works as normal.
    # Submitted by: ElliotHaughin

    ErrorDocument 404 /index.php
</IfModule>

I have my location provider set as:

$locationProvider.hashPrefix('!');
$locationProvider.html5Mode(true);

I'm currently working on a project and this is my first time using AngularJS. I think it's pretty powerful, but there's a problem: SEO.
When I first used Angular, I didn't know that its content is not visible to search engines.

Right now I'm hosting my own Prerender service, but whenever I request a URL that has something after the #!, for example

(I write http-:// instead of http:// below because I am limited in how many links I can post.)

http-://prerender.host.com/http-://host.com/#!/something

it always renders the http-://host.com/ route and not http-://host.com/#!/something.

Please help! I don't really know what's happening. I also tried html5Mode with a fragment meta tag, but the result is the same.

SETUP
HASHBANG

http-://host.com/#!/

ROUTES

I do have hashPrefix set and I explicitly turn off html5Mode, even though it is off by default (I thought that was the problem).

$routeProvider
    .when('/', {
        templateUrl: '../views/tutorials.html',
        controller: 'TutorialController',


    })
    .when('/tutorial/:id', {
        templateUrl: '../views/tutorial_detail.html',
        controller: 'TutorialDetailController',

    })
    .when('/add-tutorial', {
        templateUrl: '../views/tutorial_add.html',
        controller: 'TutorialAddController',

    })

.otherwise('/');
$locationProvider.hashPrefix('!');
$locationProvider.html5Mode(false);

APACHE

I'm using the settings from the Prerender Apache guide.

<IfModule mod_rewrite.c>
RewriteEngine On



<IfModule mod_proxy_http.c>
    # Enable prerendering for .html and directory index files
    RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    RewriteRule ^ http://prerender.em0xi0nx.com/http://tutorial.em0xi0nx.com/%{REQUEST_URI} [P,L]
</IfModule>
</IfModule>

Do I need to do something else, or is there anything wrong with my setup? I've been trying to figure this out for 3 days and now I really need help.

Any suggestions? I really need to finish this for our defense.

Owner
thoop commented

Googlebot, bingbot, and many others are not included in the config because they support the _escaped_fragment_ parameter, which is checked for in the config. More information can be found here: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

Google Snippet (for sharing on Google+) is missing.

Add this: Google\ \(\+https:\/\/developers.google.com\/\+\/web\/snippet\/\)
