@thoop /.htaccess
Last active Sep 15, 2016

Official prerender.io .htaccess for Apache.
# Change YOUR_TOKEN to your prerender token and uncomment that line if you want to cache urls and view crawl stats
# Change http://example.com (at the end of the last RewriteRule) to your website url
<IfModule mod_headers.c>
#RequestHeader set X-Prerender-Token "YOUR_TOKEN"
</IfModule>
<IfModule mod_rewrite.c>
RewriteEngine On
<IfModule mod_proxy_http.c>
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
</IfModule>
</IfModule>
@benceg

A warning to anyone using CodeIgniter (or any PHP script that warrants a RewriteRule on index.php): you will need to add a capture group to the proxy rule, for index.php.

The line will read as follows:

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(index\.php)?(.*) http://service.prerender.io/%{REQUEST_SCHEME}://%{HTTP_HOST}/$3 [P,L]
@thoop
Owner

Thanks @benceg!

@dobesv

For me, I had to change the proxy target to remove an extra "/", so

http://service.prerender.io/http://example.com$2

This is because the pattern being matched has a leading '/' already.
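
Whether the extra "/" is needed seems to depend on where the rule lives. In a .htaccess file (per-directory context) mod_rewrite strips the directory prefix before matching, so the captured path has no leading slash; in server or VirtualHost context the leading slash is kept. A minimal sketch that sidesteps the difference by using %{REQUEST_URI}, which always starts with "/". The shortened user-agent and extension lists and the example.com host are placeholders, not the gist's full lists:

```apache
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Skip a few static asset types (illustrative list only)
RewriteCond %{REQUEST_URI} !\.(js|css|png|jpg|gif|ico)$ [NC]
# %{REQUEST_URI} already begins with "/", so no extra slash is added.
# The original query string is passed along because the target has none.
RewriteRule ^ http://service.prerender.io/http://example.com%{REQUEST_URI} [P,L]
```

One caveat: in a .htaccess file, %{REQUEST_URI} may already reflect an earlier internal rewrite (for example to index.html), so this rule should still come before any such rewrite.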

@baki250

There is an issue where the service is caching the / home page for all URLs. Any idea what might be the issue?

Thanks

@thoop
Owner

@baki250 just saw your comment. Following up with you via email.

@ioloie

@baki250 I had a similar issue except it was always caching /index.html. The cause was that I had another rewrite for pushstate that always returned index.html:

        RewriteCond %{REQUEST_FILENAME} !-f
        RewriteCond %{REQUEST_FILENAME} !-d
        RewriteRule .* index.html [L]

The problem was that if this rule is carried out first, then the second rule receives /index.html as its URI. So prerender is always being asked for example.com/index.html. The solution is to switch the order around.
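
The ordering described above can be sketched as follows. This is only an illustration under assumptions: the user-agent and extension lists are shortened, and example.com stands in for your own site.

```apache
RewriteEngine On

# 1. Crawler requests go to Prerender first, while the original path is intact.
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://example.com/$2 [P,L]

# 2. Anything else that is not a real file or directory falls back to the app shell.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* index.html [L]
```

With the pushstate fallback last, the proxy rule sees /user/123 rather than /index.html.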

@thoop
Owner

Thanks @iolo-matchbook!

@andykillen

You're missing "visionutils" and "Facebot" for Facebook user agents.

@ianmstew

quora link preview on line 12 is causing error: RewriteCond: bad flag delimiters. Escaping the spaces fixes the problem: quora\ link\ preview

@thoop
Owner

Thanks @ianmstew!

@ianmstew

Hey @thoop, I ran into an issue with this .htaccess that had me stumped for days, until I realized it was the issue proxying the server root specifically using mod_rewrite described here and here. Both solutions recommended using the ProxyPass directive instead, which does not work in our case requiring special rewrite conditions.

I discovered a workaround, however. This solves the server root case where, by the time mod_rewrite rules are evaluated, the incoming blank ("/") root request has already been converted to "index.html" by Apache. I would love to find a more elegant solution, but for now this has solved my problem.

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteCond %{REQUEST_URI} ^/index\.html$
# Proxy the server root
RewriteRule .* http://service.prerender.io/http://example.com/ [P,L]

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
@thoop
Owner

Hmm, interesting. Is this just because you have your server behind a path and not the root?

@davidkcyip

@baki250 @homerjam Thanks! I've been stumped on this for the past week and both of your solutions have helped me greatly! It's now indexing not just the index.

@vinceMeens

I couldn't get it working the first time round, for several reasons. The instructions on most sites were for HTML5 push state, while we used #!. So I tested pretty much every setting and configuration, which resulted in the following overview and solutions for both #! and HTML5 push state.

SEO / Prerender.io (Angular) etc.

How the solution works in general:

  1. Search engine notices that your page is rendered using Javascript instead of server side.
  2. Search engine requests your pages with a modified url instead of the original one (_escaped_fragment_).
  3. You return the prerendered HTML (Thanks to prerender.io) to the crawler.

Instructions depending on your routing setup:

Option 1: # Hash routing

Example:

  # http://www.example.com/#/user/123
  #! http://www.example.com/#!/user/123

Problem:
Nothing after the # (hash) in the url gets sent to your server.

Solution:
In angular:

      $locationProvider.hashPrefix('!');
      $locationProvider.html5Mode(false);

In html remove the following meta header if there:

    <head>
      <!-- REMOVE --> <meta name="fragment" content="!"> <!-- /REMOVE -->
    </head>

Every time a search engine finds a URI like this:

  http://www.example.com/#!/user/123

It will send a request like this:

http://www.example.com/?_escaped_fragment_=/user/123

Configure Apache:

  RewriteEngine On
    # If requested resource exists as a file or directory
      # (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
        RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
        RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
      # Only exception is /index.htm, /index.html
        RewriteCond %{REQUEST_URI} !/index\.html?
      # Go to it as is
        RewriteRule ^ - [L]
    # If non existent
      # If path ends with / and is not just a single /, redirect to without the trailing /
        RewriteCond %{REQUEST_URI} ^.*/$
        RewriteCond %{REQUEST_URI} !^/$
        RewriteRule ^(.*)/$ $1 [R,QSA,L]
      # If path that is not empty or / or /index.htm or /index.html, redirect to /#!/path
        RewriteCond %{REQUEST_URI} !(/index\.html?|/|)$
        RewriteRule ^(.*)$ /#!$1 [R,QSA,NE,L]
      # If not /, redirect to it.
        RewriteCond %{REQUEST_URI} !^/$
        RewriteRule ^ / [R,QSA,L]

  # Handle Prerender.io
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"

    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    RewriteCond %{QUERY_STRING} _escaped_fragment_=([^&]*)

  # Proxy the request
    RewriteRule ^ http://service.prerender.io/http://%{HTTP_HOST}/?_escaped_fragment_=%1 [P,L]

Option 2: HTML5 push state routing

Example:

http://www.example.com/user/123

Problem:
You need to tell the search engine that your HTML 5 state page uses javascript to generate content

Solution:

In angular:

$locationProvider.html5Mode(true);

In html, add this meta header:

<head>
    <meta name="fragment" content="!">
</head>

Every time a search engine finds a URI like this:

http://www.example.com/user/123

It will send a request like this:

http://www.example.com/user/123?_escaped_fragment_= 

Configure Apache:

  RewriteEngine On
# If requested resource exists as a file or directory
  # (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
    # Go to it as is
    RewriteRule ^ - [L]

  # If non existent
    # If path ends with / and is not just a single /, redirect to without the trailing /
      RewriteCond %{REQUEST_URI} ^.*/$
      RewriteCond %{REQUEST_URI} !^/$
      RewriteRule ^(.*)/$ $1 [R,QSA,L]      

  # Handle Prerender.io
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"

    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_

    # Proxy the request
    RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}$1 [P,L]

  # If non existent
    # Accept everything on index.html
    RewriteRule ^ /index.html

How prerender.io requests the page from your server again once a URL is submitted to it:

http://service.prerender.io/http://www.example.com/user/123 -> http://www.example.com/user/123
http://service.prerender.io/http://www.example.com/?_escaped_fragment_=/user/123 -> http://www.example.com/index.html#!/user/123
http://service.prerender.io/http://www.example.com/?_escaped_fragment_=/user/123&var1=val1&val2=val2 -> http://www.example.com/index.html?var1=val1&val2=val2#!/user/123

Other resources

http://www.prerender.io
http://scotch.io/tutorials/javascript/angularjs-seo-with-prerender-io
@amergin

You may want to omit .ttf and .woff files as well (fonts). I noticed font-awesome was being requested from the prerender service.

@stevendeeds

Having trouble getting Google Bot (maybe others) to cache my pages.

I'm using EmberJS and Prerender.io.

Here is my Apache .htaccess

AddOutputFilterByType DEFLATE text/plain
AddOutputFilterByType DEFLATE text/html
AddOutputFilterByType DEFLATE text/xml
AddOutputFilterByType DEFLATE text/css
AddOutputFilterByType DEFLATE application/xml
AddOutputFilterByType DEFLATE application/xhtml+xml
AddOutputFilterByType DEFLATE application/rss+xml
AddOutputFilterByType DEFLATE application/javascript
AddOutputFilterByType DEFLATE application/x-javascript


RequestHeader set X-Prerender-Token "FyKfYC3YYBXiBJoUBmlY"


RewriteEngine On

<IfModule mod_proxy_http.c>
    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_

    # Only proxy the request to Prerender if it's a request for HTML
    RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent))(.*) http://service.prerender.io/http://dev6.lovelionstudio.com/$2 [P,L]
</IfModule>

And here is my index file head:



...

I have included the sitemap.xml file that Prerender generates, and have cached all the main pages of this site.

Any ideas on why this isn't working?
Here is the site for reference http://lovelionstudio.com/

Thanks for your help ahead of time.

@thoop
Owner

Send me an email at support@prerender.io if anyone has any issues. I don't get notified when someone comments on this gist.

@PdotRudy

Why is google bot not included?

@SystemDisc

Because I'm using CodeIgniter, I needed the URL that's passed to prerender.io to not include the hashbang, and include the _escaped_fragment_ parameter in the URL itself.

In other words, instead of pages looking like the following in "Cached Pages"

http://www.example.com/#!/user/123

I needed them to look like this

http://www.example.com/user/123

I used this:

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_=(\%2F|/)*(.*)

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://%{HTTP_HOST}/%2? [P,NE,L]
  • The first RewriteCond checks to make sure one of the user agents listed is requesting the page
  • The second RewriteCond checks for _escaped_fragment_ and puts its value, excluding any prefixed forward-slashes, into %2
  • The RewriteRule sends the request to http://service.prerender.io/https://HOST_NAME/VALUE_OF_ESCAPED_FRAGMENT and removes any query string

For reference, my full .htaccess for a site with CodeIgniter + AngularJS is as follows:

<IfModule mod_headers.c>
    # Change YOUR_TOKEN to your prerender token
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"
</IfModule>

<IfModule mod_rewrite.c>
    RewriteEngine On
    # !IMPORTANT! Set your RewriteBase here and don't forget trailing and leading
    # slashes.
    # If your page resides at
    # http://www.example.com/mypage/test1
    # then use
    # RewriteBase /mypage/test1/
    RewriteBase /

    RewriteRule ^index.php/(.*)$ /$1 [R=302,L]

    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^(.*)$ index.php?/$1 [L]

    <IfModule mod_proxy_http.c>
        RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_=(\%2F|/)*(.*)

        # Only proxy the request to Prerender if it's a request for HTML
        RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://%{HTTP_HOST}/%2? [P,NE,L]
    </IfModule>
</IfModule>

<IfModule !mod_rewrite.c>
    # If we don't have mod_rewrite installed, all 404's
    # can be sent to index.php, and everything works as normal.
    # Submitted by: ElliotHaughin

    ErrorDocument 404 /index.php
</IfModule>

I have my location provider set as:

$locationProvider.hashPrefix('!');
$locationProvider.html5Mode(true);
@em0xi0nx

I'm currently working on a project, and this is my first time using AngularJS. I think it's pretty powerful, but there's a problem: SEO. When I first used Angular I didn't know that its content is not visible to search engines.

Right now I'm hosting my own Prerender service, but whenever I request a URL with something after the #! it renders the wrong route.

(I write http-:// instead of http:// below because I'm limited in how many links I can post.)

http-://prerender.host.com/http-://host.com/#!/something

It always renders the http-://host.com/ route and not http-://host.com/#!/something.

Please help me! I don't really know what's happening. I also tried html5mode with a fragment meta tag, but the result is the same.

SETUP
HASHBANG

http-://host.com/#!/

ROUTES

I do have hashPrefix set, and I'm explicitly turning off html5Mode even though I know it's off by default. I thought that was the problem :))

$routeProvider
    .when('/', {
        templateUrl: '../views/tutorials.html',
        controller: 'TutorialController',


    })
    .when('/tutorial/:id', {
        templateUrl: '../views/tutorial_detail.html',
        controller: 'TutorialDetailController',

    })
    .when('/add-tutorial', {
        templateUrl: '../views/tutorial_add.html',
        controller: 'TutorialAddController',

    })

.otherwise('/');
$locationProvider.hashPrefix('!');
$locationProvider.html5Mode(false);

APACHE

I'm using the settings from the Prerender Apache guide.

<IfModule mod_rewrite.c>
RewriteEngine On



<IfModule mod_proxy_http.c>
    # Enable prerendering for .html and directory index files
    RewriteCond %{HTTP_USER_AGENT} Googlebot|bingbot|Googlebot-Mobile|Baiduspider|Yahoo|YahooSeeker [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    RewriteRule ^ http://prerender.em0xi0nx.com/http://tutorial.em0xi0nx.com/%{REQUEST_URI} [P,L]
</IfModule>
</IfModule>

Do I need to do something else, or is there anything wrong with my setup? I've been trying to figure this out for 3 days and now I really need help.

Any suggestions? I really need to finish this for our defense.

@thoop
Owner

Googlebot, bingbot, and many others are not included in the config because they support the _escaped_fragment_ parameter, which is checked for in the config. More information can be found here: https://developers.google.com/webmasters/ajax-crawling/docs/getting-started

@sgasser

Google Snippet (for sharing on Google+) is missing.

Add this: Google\ \(\+https:\/\/developers.google.com\/\+\/web\/snippet\/\)

@avin77

Sorry if this question seems stupid.
I have deployed Prerender on my localhost and am able to open localhost:3000/http://www.abc.com, but if I try to open http://www.abc.in/?_escaped_fragment_=/ or http://www.abc.in/?_escaped_fragment_= it redirects to the same home page with exactly the same HTML.

Can anyone tell me what to write in the RewriteRule in .htaccess for a localhost deployment?

@sirviejo

Just something I learned today while trying to make this work with an Angular app: Apache by default tries to redirect your request to /index.php, so you will need to set DirectoryIndex index.html before the request is proxied. Hope this saves others some time.

This is how we have it now:

DirectoryIndex index.html

RequestHeader set X-Prerender-Token "MYPRERENDERIOTOKEN"


RewriteEngine On

<IfModule mod_proxy_http.c>
    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_

    # Only proxy the request to Prerender if it's a request for HTML
    RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://myprittieangularapp.com$2 [P,L]
</IfModule>

@philippbosch

You might want to add applebot to the list like here

@thekosmix

Hey Guys,

I have access to httpd.conf (/etc/httpd/conf/httpd.conf). Can I add those lines to this file directly? If yes, do I need to change any other configuration as well, such as AllowOverride?

I have other conf files with VirtualHost configuration:
<VirtualHost *:80>
<Proxy *>
Order deny,allow
Allow from all
</Proxy>

ProxyPass / http://localhost:8080/ retry=0
ProxyPassReverse / http://localhost:8080/
ProxyPreserveHost on

LogFormat "%h (%{X-Forwarded-For}i) %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""
ErrorLog /var/log/httpd/elasticbeanstalk-error_log
TransferLog /var/log/httpd/elasticbeanstalk-access_log
</VirtualHost>

PS: it's an elastic beanstalk machine with tomcat stack and httpd server (httpd server is just there to forward request on port 80 to port 8080 of tomcat)

@D0xzen

Hi guys, I'm new and need help. I built an Angular website, but when I test it with crawler tools I get the same problem every time: the raw {{ description }} placeholder is shown, for example. Changing my .htaccess and setting the token in it gave me the same result. How can I solve this?

@IMPMAC

Wanted to let anyone know. I had a problem where this wasn't working on a fresh server. The fix was to enable the mod_proxy_http module in apache.
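
This failure is easy to miss because the gist wraps the proxy rule in <IfModule mod_proxy_http.c>, so the rule is silently skipped when the module is not loaded. A sketch of the relevant lines in the main server config; the module paths are an assumption and vary by distribution:

```apache
# Paths are typical for a source or RHEL-style install and may differ on your system.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule headers_module modules/mod_headers.so
```

On Debian and Ubuntu the same result is usually reached with a2enmod proxy proxy_http rewrite headers followed by an Apache restart.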

@TensaZangetsu

I can't seem to get my Apache config working. When I go to prerender.io it tells me to install the token. Can anyone help? I have this code:

RewriteEngine On

# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]

# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]

# Handle Prerender.io
RequestHeader set X-Prerender-Token "TOKEN"

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_

# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}$1 [P,L]

# If non existent
# Accept everything on index.html
RewriteRule ^ /index.html#!

@samvloeberghs

@thelakshya @thoop
It is actually better to put these configs in the Apache config directly, if you can.
Using .htaccess slows down Apache quite a bit.

http://httpd.apache.org/docs/current/howto/htaccess.html
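
A minimal sketch of what that could look like in a VirtualHost block. The ServerName, the shortened user-agent and extension lists, and the token value are placeholders:

```apache
<VirtualHost *:80>
    ServerName example.com

    <IfModule mod_headers.c>
        RequestHeader set X-Prerender-Token "YOUR_TOKEN"
    </IfModule>

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    # In VirtualHost context the matched path keeps its leading "/",
    # so the target has no extra slash before $2.
    RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://example.com$2 [P,L]
</VirtualHost>
```

AllowOverride only controls what .htaccess files may do, so rules placed directly in the server config do not depend on it.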

@fabiowitt

thanks @vinceMeens . Your solution worked great for me!

@tenzopro

This SEO prerender.io is such a mess. I have Laravel 5.1 and AngularJS. So what should my .htaccess configuration look like? I'm using the .htaccess that comes with Laravel. I have the hashbang #! set in my Angular routes and the meta fragment in my layout.blade.php like so: ?escaped_fragment= ... So what's next? My prerender.io key is in my .env. I'm totally confused.

@jscontrust

.svg and .svgz should be added to the excluded file extensions
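
One way to do this without touching the long lookahead pattern is an extra RewriteCond after the existing two; the result is (user agent OR _escaped_fragment_) AND not one of the extra extensions. A sketch with shortened lists and the placeholder example.com:

```apache
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Extra exclusions, kept separate from the main extension list
RewriteCond %{REQUEST_URI} !\.(svg|svgz|ttf|woff)$ [NC]
RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
```

Appending |\.svg|\.svgz to the existing list inside the RewriteRule pattern should work just as well.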

@mmbfreitas

@fabiowitt In fact we changed the order of the rules, because with the order from @vinceMeens our home page did not render even with _escaped_fragment_.

We moved

#If requested resource exists as a file or directory
#(REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
#Go to it as is
RewriteRule ^ - [L]

below

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
#Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}/$1 [P,L]

This way my home page was found in Prerender. Is there any problem with this change, @vinceMeens?

@JosephELKHOURY

Thanks @vinceMeens 👍 Your solution worked perfectly!

@stevenbitner

The _escaped_fragment_ aspect of this config only gets applied if your routes utilize parameter strings. If you use "pretty" URLs like www.mysite.com/foo/bar, then that rewrite condition will not trigger for Google, Bing and others unless you add their user agents yourself.
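
Adding them yourself could look like the sketch below. The googlebot, bingbot and yandex entries are additions to the gist's list, and example.com is a placeholder; the alternative mentioned earlier in the thread is the <meta name="fragment" content="!"> tag, which makes those crawlers send _escaped_fragment_ on their own.

```apache
RewriteCond %{HTTP_USER_AGENT} googlebot|bingbot|yandex|baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
```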

@ballaevi

I am new with .htaccess and I am having some trouble with it
Following the default config setting I have this:

<IfModule mod_headers.c>
    RequestHeader set X-Prerender-Token "TOKEN"
</IfModule>

<IfModule mod_rewrite.c>
    <IfModule mod_negotiation.c>
        Options -MultiViews
    </IfModule>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
        RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_

        # Only proxy the request to Prerender if it's a request for HTML
        RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://MY_URL/$2 [P,L]
    </IfModule>

</IfModule>

I am using Laravel 5 as backend and AngularJs for frontend.
My Angular config is:

$locationProvider.html5Mode(true);
$locationProvider.hashPrefix('!');

In index.php I have:

  <base href="/"></base>
  <meta name="fragment" content="!">
@CosyStudios

Trying to implement vinceMeens' solution with Angular, but Prerender seems to be receiving requests that are missing a / somewhere and ends up with a 504.

Prerender's crawl stats look like:
504 0.243 s http://songsaboutanimals.co.ukindex.html.var/ 14 minutes ago Twitter 1.0
504 1.733 s http://songsaboutanimals.co.ukstories/ an hour ago Firefox 6.0
504 0.056 s http://songsaboutanimals.co.ukalbum/ an hour ago Firefox 6.0
504 0.109 s http://songsaboutanimals.co.ukassets/book/1.jpg an hour ago Facebook 1.1

When evidently it should be querying songsaboutanimals.co.uk/album or /assets/book/1.jpg

htaccess is setup as Vince suggests, except I've tried moving the bit that replaces the / to after the prerender handling in hope it might make a difference

rewrite portion looks like this:

Options +FollowSymlinks
RewriteEngine On
RewriteBase /

# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]

# Handle Prerender.io
DirectoryIndex index.html
RequestHeader set X-Prerender-Token "XXX replaced XXX"

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|bingbot|Baiduspider|Yahoo|YahooSeeker|quora\ link\ preview|showyoubot|outbrain|pinterest|applebot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_

# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}$1 [P,L]

# If path ends with / and is not just a single /, redirect to without the trailing /
  RewriteCond %{REQUEST_URI} ^.*/$
  RewriteCond %{REQUEST_URI} !^/$
  RewriteRule ^(.*)/$ $1 [R,QSA,L]      

# Accept everything on index.html
RewriteRule ^ index.html [L]

Then in my Angular route controller:
$locationProvider.hashPrefix('!');
$locationProvider.html5Mode(true);

And in index.html:

Going to songaboutanimals.co.uk/stories/ successfully routes to songaboutanimals.co.uk/stories (minus slash)
Going to songaboutanimals.co.uk/#!/stories successfully routes to songaboutanimals.co.uk/stories
BUT songaboutanimals.co.uk/#!/stories/ does not and defaults back to the home page.

Site is live and viewable at songsaboutanimals.co.uk

@CosyStudios

Solution appears to work once I force an additional slash into the line

RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}{ADDITIONAL / HERE}$1 [P,L]

@CosyStudios

The only thing I can't get working now is for Facebook etc. to read my dynamically generated meta tags.

It's fine when sharing the base URL because that has defaults set, but for any other URL, Angular replaces the meta per page using
https://github.com/jvandemo/angular-update-meta

... but nothing is returned for Facebook from the Prerender shot.

Definitely coming through though...
200 0.065 s http://songsaboutanimals.co.uk/news 5 minutes ago Twitter 1.0
200 0.028 s http://songsaboutanimals.co.uk/stories 8 minutes ago Facebook 1.1
200 0.127 s http://songsaboutanimals.co.uk/stories 8 minutes ago Facebook 1.1

@fantaJinMode

@vinceMeens I tried your solution. But its not working. Can anyone help me figure out this problem. (Drupal 7 + Angular JS)
My htaccess is shown below

RewriteEngine On

# If requested resource exists as a file or directory
# (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
# Go to it as is
RewriteRule ^ - [L]

# If non existent
# If path ends with / and is not just a single /, redirect to without the trailing /
RewriteCond %{REQUEST_URI} ^.*/$
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^(.*)/$ $1 [R,QSA,L]

# Handle Prerender.io
DirectoryIndex /qld/qld_angular/index.html
RequestHeader set X-Prerender-Token "My token"

RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_

# Proxy the request
RewriteRule ^(.*)$ http://service.prerender.io/http://%{HTTP_HOST}$1 [P,L]

# If non existent
# Accept everything on index.html
RewriteRule ^ /qld/qld_angular/index.html

@mehtapn

Hello all,

I am new in angular js. I have created a site using angularjs and for rendering I have used prerender.io services.
I want to prevent /MYURL_TYPE/a/b/c/ pages from being sent to prerender.io service and being rendered for crawlers

How can I do that?
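
One possible approach, sketched under the assumption that the pages all live under the /MYURL_TYPE/ prefix from the question: add a negated RewriteCond on the path after the two existing conditions. The user-agent and extension lists are shortened and example.com is a placeholder.

```apache
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Do not send anything under /MYURL_TYPE/ to Prerender
RewriteCond %{REQUEST_URI} !^/MYURL_TYPE/
RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
```

Requests under that prefix then fall through to whatever rules follow, exactly as they do for normal visitors.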

@htor

@jscontrust Good observation.

Is there any way to make the RewriteRule matching only HTML to work without listing all possible file endings on your site? This is in my opinion a maintenance issue and I'd rather avoid it if possible.
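
A possible sketch, not something confirmed in this thread: treat any path that ends in a dot followed by a short extension as a static asset, and proxy only paths without one. The shortened user-agent list and example.com are placeholders.

```apache
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# Skip any path that ends in ".something" (1 to 8 letters or digits)
RewriteCond %{REQUEST_URI} !\.[A-Za-z0-9]{1,8}$
RewriteRule ^(.*)$ http://service.prerender.io/http://example.com/$1 [P,L]
```

The trade-off is that routes ending in a dotted segment (for example /user/john.doe) and pages ending in .html are skipped as well, so this only fits sites with extensionless routes.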

@Alphavader

I have a problem: Prerender can't cache links correctly.
If I request http://example.de/register, it caches http://example.de/.

Does this have something to do with $urlRouterProvider.otherwise('/'); ?
Page is full Angular JS.

Also have:
$httpProvider.defaults.withCredentials = true;
$locationProvider.html5Mode(true).hashPrefix('!');

and in the index:

Thanks a lot

@toioski

I was having problems getting Prerender to work and I want to share my solution.
After some hard hours of debugging I finally solved it by moving the Prerender section of the configuration to just after RewriteEngine On.

This is my .htaccess file, in case it helps someone in the future:

<IfModule mod_rewrite.c>
    Options +FollowSymlinks
    RewriteEngine On

    # Handle Prerender.io
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"
    RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|bingbot|Baiduspider|Yahoo|YahooSeeker|quora\ link\ preview|showyoubot|outbrain|pinterest|applebot [NC,OR]
    RewriteCond %{QUERY_STRING} _escaped_fragment_
    # Proxy the request
    RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://%{HTTP_HOST}/$2 [L]

    # (REQUEST_FILENAME is only relative in virtualhost context, so not usable)
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
    # Go to it as is
    RewriteRule ^ - [L]

    # If path ends with / and is not just a single /, redirect to without the trailing /
    RewriteCond %{REQUEST_URI} ^.*/$
    RewriteCond %{REQUEST_URI} !^/$
    RewriteRule ^(.*)/$ $1 [R,QSA,L]

    # Accept everything on index.php
    RewriteRule ^ index.php [L]
</IfModule>
@gurkan0791

@toioski thanks, it worked. But images and CSS are not working in the rendered page.
When I look at the console I see: The stylesheet was not loaded because its MIME type, "text/html", is not "text/css".
Folder --> bee-angular/view/css/. Help me please. Thanks.

@simonpeters

@toioski your solution works for me! Thanks!

The only issue I still have with this htaccess solution is that the Facebook crawler sees my canonical URL as "http://service.prerender.io/https://www.domain.com", and this is also shown below a Facebook post.

Anyone know a solution for this problem?
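
A possible cause, assuming the rule from @toioski's file was copied as posted: its RewriteRule ends in [L] without the P flag. Without P, mod_rewrite answers an absolute URL on a different host with an external redirect, so the crawler is sent to service.prerender.io and records that address. A sketch of the rule with the proxy flag in place (shortened lists; mod_proxy and mod_proxy_http must be loaded):

```apache
RewriteCond %{HTTP_USER_AGENT} facebookexternalhit|twitterbot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_
# [P] proxies the request internally, so the crawler keeps seeing your own URL.
RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://%{HTTP_HOST}/$2 [P,L]
```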

@sashatexb
sashatexb commented Jun 23, 2016 edited
<IfModule mod_headers.c>
    RequestHeader set X-Prerender-Token "YOUR_TOKEN"
  </IfModule>

  <IfModule mod_rewrite.c>
    RewriteEngine on

    RewriteCond %{HTTP_HOST} ^www\.(.+)$  [NC]
    RewriteRule ^(.*)$ http://%1/$1 [L,R=301]

    # Don't rewrite files or directories
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^ - [L]

    # If the requested pattern is file and file doesn't exist, send 404
    RewriteCond %{REQUEST_URI} ^(\/[a-z_\-\s0-9\.]+)+\.[a-zA-Z]{2,4}$
    RewriteRule ^ - [L,R=404]

    # Prerender.io stuff
    <IfModule mod_proxy_http.c>
        RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|MailRuConnect|Rambler|OdklBot|outbrain|msnbot|ia_archiver|pinterest|slackbot|Yahoo|Bingbot|vkShare|Feedfetcher-Google|W3C_Validator [NC,OR]
        RewriteCond %{HTTP_USER_AGENT} Yandex(Bot|Images|Video|Media) [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_=

        # Only proxy the request to Prerender if it's a request for HTML
        RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://%{HTTP_HOST}/$2 [P,QSA,L]

    </IfModule>

    # otherwise use history router
    RewriteRule ^ /index.html [L]

</IfModule>
@vijayrevature
vijayrevature commented Aug 16, 2016 edited

RewriteCond %{HTTP_USER_AGENT} (facebookexternalhit/[0-9]|Twitterbot|Pinterest|Slackbot\s[0-9.]|Slackbot-LinkExpanding\s[0-9.]|Slack-ImgProxy\s[0-9.]|Google.*snippet)

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://example.com/$2 [P,L]

This is the code I have placed in my .htaccess file, but it works only for Facebook and not for the other social bots (facebook, LinkedIn, Slack, etc.).
Can someone please correct this if anything is wrong here, or suggest another solution if one is available?

Thanks,
-Vijay

@jakecolour

@simonpeters I'm having the same problem with the canonical URL appearing below a facebook post.
Have you had any luck with getting it to show the fetched URL?

@viking2917

With Google having deprecated the _escaped_fragment_ syntax (https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html), anecdotally I see a lot of requests from Google in my Apache logs that arrive without the _escaped_fragment_ parameter. I added |googlebot| to the user-agent list in my .htaccess so those requests get sent to Prerender, and my cache hits went way up. Has anyone else had a similar experience? Is there any particular reason NOT to have googlebot explicitly listed?
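
For illustration, a change along those lines might look like the sketch below. It assumes the stock rule from the top of this gist; the user-agent and extension lists are shortened for readability, and example.com is a placeholder for your own site URL.

```apache
<IfModule mod_rewrite.c>
    RewriteEngine On
    <IfModule mod_proxy_http.c>
        # googlebot is listed explicitly so its requests are proxied even
        # when they arrive without the _escaped_fragment_ parameter
        RewriteCond %{HTTP_USER_AGENT} googlebot|baiduspider|facebookexternalhit|twitterbot|linkedinbot [NC,OR]
        RewriteCond %{QUERY_STRING} _escaped_fragment_
        # Only proxy requests that do not look like static assets (list shortened)
        RewriteRule ^(?!.*?(\.js|\.css|\.png|\.jpg|\.gif|\.ico))(.*) http://service.prerender.io/http://example.com/$2 [P,L]
    </IfModule>
</IfModule>
```

Because the two conditions are joined with [OR], either a matching user agent or the query parameter is enough to trigger the proxy rule.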

@mustivar

Hi, I have an application built with Spring Boot + AngularJS.
I serve it through Apache with ProxyPass.
With the suggested rewrite rule I can send requests to prerender.io, but unfortunately the Angular views are not rendered.
In the rendered body all I can see is the top-level divs. It seems the views for those divs cannot be loaded. (I use ui-router for views.)
I also cannot see any script or .js includes in the rendered page (I think that is how Prerender works).

I tried every solution suggested here and elsewhere on the web but could not get it to work.

Do you have any suggestions for me? Thanks.

@russellwark
russellwark commented Aug 31, 2016 edited

I'm having some issues with my .htaccess file for a site I've been working on - basically, it's a WordPress site with an Angular theme, hence the need for Prerender.io. What's happening is that the site is throwing a 404 every time the page is accessed, yet the page still displays and can be refreshed. Since it's throwing a 404, Prerender.io isn't picking it up for caching. This is only in HTML5 'pretty URLs' mode - it doesn't throw a 404 if you access the page with the hashbang.

Here's my .htaccess file - any suggestions would be very much appreciated.

<IfModule mod_deflate.c>
    <IfModule mod_headers.c>
        Header append Vary User-Agent env=!dont-vary
    </IfModule>
        AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon application/json
    <IfModule mod_mime.c>
        AddOutputFilter DEFLATE js css htm html xml
    </IfModule>
</IfModule>

Options -MultiViews

RequestHeader set X-Prerender-Token "{prerender_token}"

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} baiduspider|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|bingbot|Baiduspider|Yahoo|YahooSeeker|quora\ link\ preview|showyoubot|outbrain|pinterest|applebot [NC,OR]
RewriteCond %{QUERY_STRING} _escaped_fragment_

RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/http://%{HTTP_HOST}/$2 [P,L]
</IfModule>

RewriteEngine On  
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -f [OR]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} -d
RewriteRule ^ - [L]

RewriteRule ^ /index.php [L]

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

RewriteCond %{HTTP_USER_AGENT} libwww-perl.* 
RewriteRule .* ? [F,L]

</IfModule>

<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 year"
ExpiresByType image/jpeg "access 1 year"
ExpiresByType image/gif "access 1 year"
ExpiresByType image/png "access 1 year"
ExpiresByType text/css "access 1 month"
ExpiresByType text/html "access 1 month"
ExpiresByType application/pdf "access 1 month"
ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresDefault "access 1 month"
</IfModule>
@maa42

Thanks everyone for posting on this forum. I'm on AngularJS and Apache, with #! URLs. I noticed today that Google was not crawling my site, so I made the following changes. Following other comments, I changed the query-string condition for the #! URLs to RewriteCond %{QUERY_STRING} escaped_fragment=(\%2F|/)(.). I also added the Google bots to the user-agent list, because my test searches did not work until I added them. Finally, my site is served over HTTPS, so the URL at the end of the RewriteRule has to use https! Good luck everyone.

DirectoryIndex index.html

RequestHeader set X-Prerender-Token "Your Token"


RewriteEngine On


RewriteCond %{HTTP_USER_AGENT} baiduspider|googlebot|googlebot-mobile|bingbot|facebookexternalhit|twitterbot|rogerbot|linkedinbot|embedly|quora\ link\ preview|showyoubot|outbrain|pinterest|slackbot|vkShare|W3C_Validator [NC,OR]
RewriteCond %{QUERY_STRING} escaped_fragment=(\%2F|/)(.)

# Only proxy the request to Prerender if it's a request for HTML
RewriteRule ^(?!.*?(\.js|\.css|\.xml|\.less|\.png|\.jpg|\.jpeg|\.gif|\.pdf|\.doc|\.txt|\.ico|\.rss|\.zip|\.mp3|\.rar|\.exe|\.wmv|\.doc|\.avi|\.ppt|\.mpg|\.mpeg|\.tif|\.wav|\.mov|\.psd|\.ai|\.xls|\.mp4|\.m4a|\.swf|\.dat|\.dmg|\.iso|\.flv|\.m4v|\.torrent|\.ttf|\.woff))(.*) http://service.prerender.io/https://www.leaguescience.com/$2 [P,L]

