@laradevitt
Created September 19, 2018 16:16
(Drupal 7) Programmatically add <meta name="robots" content="noindex, nofollow"> to certain pages.
<?php

// I was getting a lot of "Indexed, though blocked by robots.txt" coverage
// errors on search/ pages from Google. Since robots.txt only prevents
// crawling, not indexing, I will allow crawling and prohibit indexing via
// the "robots" meta tag instead.

function MYTHEME_preprocess_html(&$variables) {
  MYTHEME_add_meta_robots_noindex();
}

/**
 * Helper function: Add <meta name="robots" content="noindex, nofollow"> to some
 * pages. Any pages you add here should be removed from 'Disallow' rules in
 * robots.txt, which prevents crawling.
 */
function MYTHEME_add_meta_robots_noindex() {
  $robots_noindex = array(
    '/search/*',
  );
  $nofollow = false;

  // Get current URI and remove the trailing slash.
  $uri = request_uri();
  $uri = preg_replace('{/$}', '', $uri);

  foreach ($robots_noindex as $url) {
    if (substr($url, -1) === '*') {
      // This url contains a wildcard; match on the prefix.
      $url_trimmed = substr($url, 0, -1);
      if (substr($uri, 0, strlen($url_trimmed)) === $url_trimmed) {
        $nofollow = true;
      }
    }
    else {
      if ($url == $uri) {
        $nofollow = true;
      }
    }
  }

  if ($nofollow) {
    $meta = array(
      '#type' => 'html_tag',
      '#tag' => 'meta',
      '#attributes' => array(
        'name' => 'robots',
        'content' => 'noindex, nofollow',
      ),
    );
    drupal_add_html_head($meta, 'robots_noindex');
  }
}
@awesomebloging

Hello, what about Drupal 8/9? How do I use this code?

@laradevitt
Author

Hi. :) Sorry, I only have a Drupal 7 version, as that was what I was working on at the time. I'll think about adding a version for 8/9 if I can find the time. It should be pretty easy to port.
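
For what it's worth, here is a rough, untested sketch of what a port to a Drupal 8/9 custom module might look like, using hook_page_attachments() in place of drupal_add_html_head(). The module name "mymodule" is just a placeholder:

<?php

/**
 * Implements hook_page_attachments().
 *
 * Rough Drupal 8/9 sketch (untested) of the Drupal 7 code above. Drop into
 * mymodule.module and adjust the module name and paths.
 */
function mymodule_page_attachments(array &$attachments) {
  // Paths to noindex; a trailing '*' acts as a prefix wildcard, as in the
  // Drupal 7 version.
  $robots_noindex = [
    '/search/*',
  ];

  // Current request URI without the trailing slash.
  $uri = \Drupal::request()->getRequestUri();
  $uri = preg_replace('{/$}', '', $uri);

  $noindex = FALSE;
  foreach ($robots_noindex as $url) {
    if (substr($url, -1) === '*') {
      // This url contains a wildcard; match on the prefix.
      $url_trimmed = substr($url, 0, -1);
      if (strpos($uri, $url_trimmed) === 0) {
        $noindex = TRUE;
      }
    }
    elseif ($url == $uri) {
      $noindex = TRUE;
    }
  }

  if ($noindex) {
    $attachments['#attached']['html_head'][] = [
      [
        '#type' => 'html_tag',
        '#tag' => 'meta',
        '#attributes' => [
          'name' => 'robots',
          'content' => 'noindex, nofollow',
        ],
      ],
      'robots_noindex',
    ];
    // Pages are render-cached in 8/9, so make sure the result varies by path.
    $attachments['#cache']['contexts'][] = 'url.path';
  }
}

Same caveat as the Drupal 7 version: any path you list here should not also be disallowed in robots.txt, or crawlers will never see the meta tag.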

@team-community

Hi there. I would also be interested in this.
Maybe the module could read robots.txt and apply noindex to the pages listed there, instead of hard-coding them.

@laradevitt
Author

Hi @team-community - You could do that, as long as you also update robots.txt to allow the pages to be crawled.
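
If you wanted to go that route, something along these lines could be a starting point. Illustration only and untested; the helper name is made up, and it assumes the rules you care about live in the robots.txt at the Drupal root:

<?php

/**
 * Illustration only (untested): read Disallow rules from robots.txt so they
 * can be fed into the $robots_noindex list above. Any path handled this way
 * still has to be removed from (or allowed in) robots.txt afterwards, or
 * crawlers will never see the meta tag.
 */
function MYTHEME_robots_noindex_patterns_from_robots_txt() {
  $patterns = array();
  $robots_txt = DRUPAL_ROOT . '/robots.txt';
  if (!is_readable($robots_txt)) {
    return $patterns;
  }
  foreach (file($robots_txt, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
    // Disallow rules are prefix matches, so treat each one as a wildcard.
    if (preg_match('/^Disallow:\s*(\S+)/i', $line, $matches)) {
      $patterns[] = $matches[1] . '*';
    }
  }
  return $patterns;
}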
