SEO/SEM guide. Keywords: seo sem ux ttfb noindex robots.txt sitemap robots bundle


Google publishes the best documentation on SEO practices. It has 2 sections:

  1. Beginner SEO
  2. Advanced SEO

If you're only interested in performance, please refer to the Web Vitals document.




SERP stands for Search Engine Results Page.


Optimizing a web page for keywords

Once you've found the list of keywords you want a web page to rank for, place them in the following HTML tags (sorted by order of importance). Ideally, target only a few keywords: each technique supports only one keyword at a time, so too many different keywords will dilute your results.

  1. title tag in the HTML head.
  2. meta description tag in the HTML head.
  3. canonical link tag in the HTML head (e.g., <link rel="canonical" href=""/>). The Google bot hates duplicate content. This tag tells the GBot which page is the one and only page that should receive SEO love from it. Use it even if you think you don't have any duplicate pages, because in reality, you do. Indeed, as far as the GBot is concerned, two URLs that differ only slightly but serve the same content are duplicate pages.
  4. h1(1) tag in the HTML body.
  5. h2(1) tag in the HTML body.
  6. h3(1) tag in the HTML body.
  7. image alt attributes in the HTML body. Many web pages are missing alt attributes on their images, which is a missed opportunity to rank.
  8. anchor text. Make sure that the text you use in your a tag describes the link as clearly as possible. If that link points to an external website, that website's domain authority will benefit from your good description. The same applies to an internal link. The Google bot loves organized content.

(1) A typical mistake is to use an H2 with no H1 because the H1 looks too big. The issue is that H1 tags are worth more than H2s when it comes to SEO. If the content of your header contains keywords you wish to rank for, try to use an H1. Use CSS to change its style so it matches your design.
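
The checklist above can be partly automated. The sketch below is only an illustration, not an official tool: it uses Python's standard html.parser to report which of the listed tags are missing from a page. The class name SeoTagAudit, the function audit and the wording of the messages are all hypothetical.

```python
from html.parser import HTMLParser


class SeoTagAudit(HTMLParser):
    """Collects the keyword-bearing tags of a single HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self.canonical = None
        self.headings = {"h1": 0, "h2": 0, "h3": 0}
        self.images_without_alt = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag in self.headings:
            self.headings[tag] += 1
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Counts images whose alt attribute is absent or empty.
            self.images_without_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def audit(html):
    """Returns a list of human-readable problems found in the page."""
    page = SeoTagAudit()
    page.feed(html)
    page.close()
    problems = []
    if not page.title.strip():
        problems.append("missing title")
    if not page.description:
        problems.append("missing meta description")
    if not page.canonical:
        problems.append("missing canonical link")
    if page.headings["h1"] == 0:
        problems.append("no h1 heading")
    if page.images_without_alt:
        problems.append(f"{page.images_without_alt} image(s) without alt text")
    return problems
```

Calling audit on the HTML of a page returns an empty list when every tag is present. It does not check whether the keywords themselves appear in those tags.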

Finding the keywords ranking for a domain

To figure out which keywords a specific domain ranks for, use the Organic Keywords Trend chart. To see that chart:

  1. Log in to Semrush.
  2. Select Domain Analytics/Overview, enter the domain in the top input and click the Search button.
  3. Select Organic Research in the left pane to see the Organic Keywords Trend chart.

Understanding the Organic Keywords Trend chart

The top horizontal bar can be read as follows:

  • Keywords: Current number of keywords that rank within the top 100 Google search results.
  • Traffic: Estimated number of users that those keywords have brought to your website this month.
  • Traffic cost: Estimated cost of getting the same traffic through paid search ads.

The Organic Keywords Trend can be read as follows:

  • The legend shows the colors that represent the keyword categories based on how they rank in the SERP (e.g., Top 3 are keywords that make your domain rank in the top 3 Google results).
  • Each vertical bar is a snapshot of the keyword rankings. For example, in the image above, hovering over the March 20 bar shows that 33 keywords in total ranked your website inside the top 100 results. Amongst those 33 keywords:
    • 0 ranked in the top 3 results.
    • 5 ranked between the 4th and 10th result.
    • 4 ranked between the 11th and 20th result.
    • 11 ranked between the 21st and 50th result.
    • 13 ranked between the 51st and 100th result.

When you click on that bar, you can see the details of those keywords.
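
To make the breakdown above concrete, here is a minimal sketch that groups keyword ranks into the same five categories and counts them. It assumes each keyword comes with a single numeric rank between 1 and 100. Only the Top 3 label comes from the chart legend; the other labels and the function names are hypothetical.

```python
from collections import Counter

# (lowest rank, highest rank, label) for each category of the chart.
BUCKETS = [
    (1, 3, "Top 3"),
    (4, 10, "4-10"),
    (11, 20, "11-20"),
    (21, 50, "21-50"),
    (51, 100, "51-100"),
]


def bucket_for(rank):
    """Returns the category label for a rank, or None if it is outside 1-100."""
    for low, high, label in BUCKETS:
        if low <= rank <= high:
            return label
    return None


def summarize(ranks):
    """Counts how many keywords fall into each category."""
    counts = Counter(bucket_for(rank) for rank in ranks)
    return {label: counts[label] for _, _, label in BUCKETS}
```

Summing the values of the returned dictionary gives the total shown when hovering over a bar (33 in the example above).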

Finding the historical keywords ranking for a domain

This is achieved by following the same steps as in the Finding the keywords ranking for a domain section. However, you'll need at least the Semrush Guru tier (almost USD 200/month). In the Organic Keywords Trend chart, click on any bar to see the keyword ranking details for that point in time.

Crawl budget or how to use robots.txt sitemap.xml and noindex

Tl;dr Those three techniques aim to optimize your crawl budget:

  • Use a robots.txt to prevent non-marketing pages from consuming your crawl budget.
  • Use one or many sitemap.xml files to make sure that marketing pages are crawled, so that you make the best of your crawl budget.
  • Use the noindex value in the HTML head to keep certain pages that can't be listed in the robots.txt out of the index. This technique is used to:
    • Prevent duplicate content.
    • Deal with faceted navigation.
    • Deal with soft 404 error pages (i.e., pages that return a 200 status code while saying that the page is not found, instead of an explicit 404 status).
    • Deal with infinite spaces (e.g., calendar pages where the URL contains the date).

What is the crawl budget

WARNING: Optimizing the crawl budget is only worth it if your website contains at least a few thousand web pages. Otherwise, it is a waste of time. That being said, nurturing good SEO habits doesn't hurt and will make it easier to grow.

Crawling your website is not effortless. This means that search engine companies don't allocate an infinite amount of resources to crawl your precious website. Instead, they allocate it a specific budget called the crawl budget. This budget is usually expressed as the number of pages that the search engine will crawl. It depends on many factors that are left to the discretion of each search engine company, though some factors have become public. Without knowing exactly what your budget is, you should do your best to configure your website to prioritize the pages you want indexed and de-prioritize the pages that should not consume any of your precious crawl budget. The pages you should block from consuming your crawl budget are the ones with no marketing value (see the Non-marketing pages section).

Factors that improve the crawl budget

  • Fast web pages even under pressure: If the GoogleBot notices that your pages load very quickly even with a lot of traffic, it may decide to increase the number of pages it schedules to crawl.
  • No errors: Server errors (e.g., 5xx status codes) signal an unhealthy website and make the GoogleBot slow down its crawling.
  • JS and CSS files: Every resource that Googlebot needs to fetch to render your page counts toward your crawl budget. To mitigate this, ensure these resources can be cached by Google. Avoid using cache-busting URLs (those that change frequently).
  • Avoid long redirect chains. Each redirect counts as an additional page to crawl in your budget.
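
One way to shorten redirect chains is to rewrite every redirect so that it points straight at its final destination. The sketch below illustrates the idea on an in-memory mapping of source path to target path. The mapping format, the function names and the max_hops limit are assumptions made for this example; a real website would load the rules from its server or CDN configuration.

```python
def resolve(url, redirects, max_hops=10):
    """Follows `redirects` (source -> target) starting from `url`.

    Returns (final_url, hops). Raises ValueError on a loop or on a chain
    longer than `max_hops`.
    """
    seen = {url}
    hops = 0
    while url in redirects:
        url = redirects[url]
        hops += 1
        if url in seen or hops > max_hops:
            raise ValueError("redirect loop or chain too long")
        seen.add(url)
    return url, hops


def flatten(redirects):
    """Rewrites every redirect so that it points straight at its final target."""
    return {source: resolve(source, redirects)[0] for source in redirects}
```

After flattening, every old URL reaches its destination in a single hop, so a crawler spends one extra request per old URL instead of one per link in the chain.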

Understanding how the crawl budget is spent

For a detailed explanation of the Google Search Console's Coverage report, please refer to

  • Use the Coverage section of the Google Search Console.
  • Review the URLs in the Valid category to confirm they are listed as expected. Unexpected pages are:
    • Duplicated content (often due to faceted URLs).
    • Soft 404s.
    • Non-marketing pages.

Non-marketing pages

  • Thank you page. Those pages could rank for long-funnel keywords.
  • User settings.

Dealing with duplicate content

  • Determine whether duplicate pages have already been indexed:
    • Login to the Google Search Console.
    • Select the correct property.
    • Click on the Coverage section in the menu.
    • Review all Valid URLs and look for duplicate URLs.
  • For all duplicate URLs:
    1. Do not block them in the robots.txt yet. Otherwise, Google won't get a chance to deindex them first.
    2. Make sure they have a canonical URL set in the head.
    3. Add the noindex.
    4. Wait until the effect of the previous steps shows the duplicate page in the Excluded URLs category (this could take a couple of days).
    5. Block that page in the robots.txt.
    6. Optionally, if that page was useless and could be deleted in favor of the canonical version, then do it. Then make sure to create a 301 redirect from that duplicate link to the canonical.

Canonical URL

<link rel="canonical" href="" />
  • A canonical URL impacts both indexing and crawlability:
    • Indexing: When all duplicate pages use the same canonical URL, only the canonical URL is indexed.
    • Crawlability: Once the page has been crawled and indexed once, Google will know which page is a duplicate. This means that subsequent crawls will only crawl the canonical URL and skip the duplicated content, which avoids wasting the crawl budget. Also, by making sure that only the canonical URLs are added to the sitemap.xml, we can implicitly improve the crawl budget (as opposed to listing the duplicated links in the sitemap).
  • Both rel="canonical" and content="noindex" will prevent the page from being indexed by Google.
  • Do not mix canonical URL with noindex. This confuses the GoogleBot. If it sees both, it will choose to follow the canonical URL signal (ref: Google: Don’t Mix Noindex & Rel=Canonical).
  • Canonical URL has the same effect as a 301 permanent redirect. In fact, canonical URL was originally made for situations where a 301 redirect was not possible.
  • Use only the URLs that are canonical for your sitemap.

robots.txt or how to stop pages with no marketing value from wasting your crawl budget

Shopify Robots.txt Guide: How To Create & Edit The Robots.txt.liquid

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /



  • The user agent named Googlebot is not allowed to crawl any URL whose path starts with /nogooglebot/.
  • All other user agents are allowed to crawl the entire site. This could have been omitted and the result would be the same; the default behavior is that user agents are allowed to crawl the entire site.
  • The site's sitemap file is located at
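
The rules above can be checked locally before publishing the file. The sketch below feeds the same robots.txt to urllib.robotparser from Python's standard library and asks whether a given user agent may fetch a URL. The function name is_allowed and the www.example.com host are placeholders, and the standard library parser does not necessarily match Googlebot's own parser on every edge case.

```python
from urllib.robotparser import RobotFileParser

# Same rules as the example above.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Returns True if `user_agent` may crawl `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For instance, is_allowed("Googlebot", "https://www.example.com/nogooglebot/page") is False, while the same URL is allowed for any other user agent.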

Making Google aware of the robots.txt

Manually submit it to the Google Search Console.

sitemap.xml overview

To create a sitemap.xml online, please refer to the table of online tools further down.

Pages that are not listed in the sitemap.xml as well as not listed in the robots.txt will still be eventually crawled, but they won't receive as much attention from Google.
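
For small websites, a sitemap.xml can also be generated with a few lines of code. The sketch below is a minimal illustration, assuming you already have a list of canonical URLs with their last modification dates. The function name build_sitemap and the input format are hypothetical; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Builds a sitemap.xml document.

    `pages` is an iterable of (absolute canonical URL, last modification
    date formatted as YYYY-MM-DD).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Only list canonical URLs, and keep the lastmod dates in sync with the actual pages.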

Making Google aware of the sitemap.xml

There are two ways to make Google aware of your sitemap.xml:

  1. Include it in the robots.txt. To see an example, please refer to the robots.txt section.
  2. Manually submit it to the Google Search Console.

#1 is considered a best practice.

X-Robots-Tag with noindex

<meta name="robots" content="noindex" />
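
The meta tag above only works for HTML pages. The same directive can be sent as an X-Robots-Tag HTTP response header, which also covers non-HTML resources such as PDFs. The sketch below is a minimal WSGI application that adds the header for some paths. The path prefixes and the response body are hypothetical; adapt the condition to your own routing.

```python
# Hypothetical path prefixes that should stay out of the index.
NOINDEX_PREFIXES = ("/account/", "/thank-you/")


def app(environ, start_response):
    """WSGI app that sends `X-Robots-Tag: noindex` for selected paths."""
    path = environ.get("PATH_INFO", "/")
    headers = [("Content-Type", "text/html; charset=utf-8")]
    if path.startswith(NOINDEX_PREFIXES):
        headers.append(("X-Robots-Tag", "noindex"))
    start_response("200 OK", headers)
    return [b"<!doctype html><title>Example</title>"]
```

To try it locally, the app can be served with wsgiref.simple_server.make_server from the standard library.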

robots.txt vs noindex or the difference between crawling and indexing

Website to-do list

To double-check that the list below is correctly implemented, refer to successful websites that tick all the SEO boxes:


Metadata in the head

At a minimum, the page's head tag must contain:

	<title>Example Title</title> <!-- Keep it between 50 and 60 characters. Use your targeted keywords as well as long-tail keywords. -->
	<link rel="canonical" href=""> <!-- Don't forget the trailing slash -->
	<link rel="alternate" href="" hreflang="en"> <!-- Don't forget the trailing slash -->

	<!-- SEO, Meta and Opengraph -->
	<meta name="title" content="Example Title">
	<meta name="description" content="This is meta description Sample."> <!-- Keep it between 50 and 160 characters. -->
	<meta name="robots" content="index,follow"> <!-- Optional: index,follow is the default. Never set it to noindex on a page you want indexed. -->
	<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes">
	<meta itemprop="description" content="Clear page description">
	<meta itemprop="name" content="Page title">
	<meta name="keywords" content="SEO keywords">
	<meta name="og:keywords" content="SEO keywords">
	<meta property="og:description" content="Clear page description.">
	<meta property="og:image" content="Path to image">
	<meta property="og:title" content="Page title">
	<meta property="og:type" content="website">
	<meta property="og:url" content=""> <!-- Don't forget the trailing slash -->
	<meta name="twitter:card" content="summary">
	<meta name="twitter:description" content="Clear page description">
	<meta name="twitter:image" content="Path to image">
	<meta name="twitter:title" content="Page title">
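
The length recommendations in the comments above are easy to check automatically. This is a small sketch that only encodes the two ranges given in this guide (50 to 60 characters for the title, 50 to 160 for the description). The function name and the message wording are hypothetical.

```python
def check_head_lengths(title, description):
    """Returns a list of issues for a title and a meta description."""
    issues = []
    if not 50 <= len(title) <= 60:
        issues.append(f"title is {len(title)} characters, expected 50 to 60")
    if not 50 <= len(description) <= 160:
        issues.append(
            f"description is {len(description)} characters, expected 50 to 160"
        )
    return issues
```

An empty list means both values are within the recommended ranges.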

Pathname and information architecture

  • USE ABSOLUTE PATHS IN ALL LINKS AND MAKE THE ROOT DOMAIN REDIRECT TO WWW. USE WWW. IN ALL YOUR ABSOLUTE PATHS. The reason behind this is to create unique content. Otherwise, if your content is accessible from 4 different URLs, then it is technically duplicated 4 times, which confuses the Google bot and dilutes your SEO efforts.

  • Use keywords in your URL paths. For example, if you're a shoe manufacturer, you may want to use a path similar to /shoe-manufacturing rather than /shoe.

  • Use hyphens in your pathname (no underscore). Google treats a hyphen as a word separator, but does not treat an underscore that way.

  • Use an absolute path in the canonical URL on all pages and make sure there is a trailing slash. Also, make sure that the search params are included if they matter.

  • Add a trailing / on all internal links and make sure that all web pages use /. Otherwise, Google may think the 2 versions are duplicate content.
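
The URL rules above (www, absolute path, trailing slash, search params only when they matter) can be applied consistently with a small helper. The sketch below is one possible implementation under several assumptions that are not part of this guide: the website is served over https, the input URL is on the root domain or its www variant with no port, and paths whose last segment contains a dot (e.g., /logo.png) are files that should not get a trailing slash. The function name canonical_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, keep_query=True):
    """Returns the https + www + trailing-slash variant of `url`."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and drops any port
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Assumption: a dot in the last segment means a file, not a page.
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    query = parts.query if keep_query else ""
    # The fragment is always dropped.
    return urlunsplit(("https", host, path, query, ""))
```

Using the same helper for internal links, canonical tags and sitemap entries keeps the three in agreement.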


  • Use JSONLD on all pages (please refer to the JSONLD examples section in the Annex).


  • Add the lang attribute on the html tag: <html lang="en">
  • If you know the language of a link, try to add hreflang on it.
  • Explicitly set up the hreflang, even if you only use a single language. Use an absolute path for that URL.


  • Use the alt attribute on all images. Favor a proper description over SEO keywords. Use 50-55 characters (up to 16 words) in the alt text.


  • Find a way to organize your text content so that important keywords are in H1 tags and less important keywords are in h2.


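Neither robots.txt nor the robots meta tag has authority over the other: robots.txt controls crawling, while the robots meta tag controls indexing. Note that Google no longer honors a noindex rule placed in robots.txt (support ended in September 2019), and that it cannot see a noindex meta tag on a page that robots.txt blocks from crawling.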


There are 3 types of sitemap files:

WARNING: The info in the sitemap.xml must be the same as in the actual page. Otherwise, it will confuse the crawler bot, which might lead to worse results than no sitemap at all (e.g., outdated hreflang or canonical URL).





  • If your page is duplicated in another language, add an alternate link in the head:
<link rel="alternate" href="" hreflang="en"> <!-- Don't forget the trailing slash -->

Progressive Web Apps aka PWA & SEO

As of 2019, PWAs are all the rage and Google has made a lot of progress in indexing them properly. To test how Google sees your PWA, please refer to the How to test how your page is seen by Google? section.

That being said, there are a series of caveats to avoid in order to not be penalized by Google:

  1. Avoid any URL with a #. Anything following the # is ignored by the Googlebot.
  2. Reduce the number of embedded resources in the page (especially the number of JavaScript files required to render the page), since these might not be fully loaded.
  3. Make sure required resources aren’t blocked by robots.txt.
  4. Use an accurate sitemap file to signal any changes to your website when using Accelerated Mobile Pages (AMP).

The first two points are the most important. The last two are good practices for any website.
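
To illustrate the first caveat, the sketch below uses urllib.parse.urldefrag from Python's standard library to show what remains of a URL once everything after the # is dropped. The function name and the example.com routes are hypothetical.

```python
from urllib.parse import urldefrag


def url_without_fragment(url):
    """Returns `url` without its fragment (the part after the #)."""
    return urldefrag(url).url
```

Two hash-based routes such as https://www.example.com/#/products and https://www.example.com/#/about both reduce to https://www.example.com/, so a crawler that ignores fragments sees them as a single page.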


Key moments

Google Search Console

This online tool from Google lets you gain insights into how your website is being crawled by the GoogleBot. You can also use it to submit pages for crawling.

  • Submit new sitemap.xml or explicit new URLs.
  • Get alerted on issues.
  • Understand how Google sees your pages.
  • Test:
    • Mobile usability
    • Rich results

Understanding how your pages are performing

This is mainly detailed under the Performance section.

  • Queries: Details which keywords drive the most traffic.
  • Pages: Shows which pages receive the most traffic.

How to use this section:

  • Improve conversion: Use the Pages section to identify the pages that receive a lot of traffic but do not convert in terms of clicks.
  • Optimize your website for the best keywords. Use the Queries section to understand which keywords drive the most traffic and create dedicated pages just for those keywords.
  • Compare your page performance from one period to another:
    • Select a filter at the top (e.g., Query with keywords)
    • Click on the Date filter at the top and select Compare rather than Filter
    • You may see an increase of traffic due to:
      • Seasonality
      • Better content optimization for specific keywords.
      • Improvements in Web Vitals and fixed issues.
    • You may see a decrease of traffic due to:
      • Seasonality
      • Page errors (jump to the Analysing a specific URL section to diagnose issues)
      • Content is less popular
      • You've cannibalized that page with a new optimized landing page

Analysing a specific URL

Simply paste the URL in the search bar at the top.

Google Search Console Tips

  • Link your property in Google Analytics with the one in Google Search Console:
    • Open your property in Google Analytics
    • Click Admin
    • Under the Property section, under PRODUCT LINKING, click on All Products
    • Link Google Search Console to feed new valuable data into your Google Analytics.

UX and SEO

Rendering on the Web


Topic Description Link
robots.txt Create a robots.txt online
robots.txt Test the validity of a robots.txt
robots.txt Test URLs against an inline robots.txt
sitemap.xml Create a sitemap.xml online
sitemap.xml Validate a sitemap.xml online

Tips and tricks

Red flags

Red flags - Google Search Console

  • Sudden spike in valid URLs in the Coverage section. This is usually due to misconfigured faceted pages.

How to

How to test how your page is seen by Google?

This renders all the HTML, but unfortunately, it won't render a full image of that HTML, just the beginning. To see the full render, you have no choice but to copy and paste the HTML into a local file and render it yourself 😫.

How to check keywords ranking history for a domain?

Please refer to the Finding the historical keywords ranking for a domain section.

How to request Google to recrawl your website?

  1. Log in to the Google Search Console.
  2. Choose one of the two options:
    1. Upload a new sitemap.xml with a new lastmod date for the URLs you wish to refresh. That's the fastest way to perform a batch re-crawl.
    2. Paste a URL in the Inspect search bar at the top, then click on the REQUEST INDEXING button.


JSONLD examples

General website description

    <script type="application/ld+json">
    {
        "@context" : "",
        "@type" : "Organization",
        "legalName" : "Australian Barnardos Recruitment Services",
        "alternateName" : "ABRS",
        "url" : "",
        "contactPoint" : [{
            "@type" : "ContactPoint",
            "telephone" : "(02) 9218 2334",
            "email" : "",
            "contactType" : "Sydney Office"
        }],
        "logo" : "",
        "sameAs" : ""
    }
    </script>

Describing a home page structure

<script type="application/ld+json">
[
  { "name": "Home", "description": "{{ page.homeSiteNavDescription }}" },
  { "name": "About Us", "description": "{{ page.aboutUsSiteNavDescription }}" },
  { "name": "Job Types", "description": "{{ page.aboutUsSiteNavDescription }}" },
  { "name": "Industry Sectors", "description": "{{ page.aboutUsSiteNavDescription }}" },
  { "name": "Clients", "description": "{{ page.clientsSiteNavDescription }}" },
  { "name": "Jobs", "description": "{{ page.candidatesSiteNavDescription }}" },
  { "name": "Blog", "description": "{{ page.aboutUsSiteNavDescription }}" },
  { "name": "Contact", "description": "{{ page.contactSiteNavDescription }}" }
]
</script>

Describing the position in the website

<script type="application/ld+json">
    {
      "@context": "",
      "@type": "BreadcrumbList",
      "itemListElement": [{
        "@type": "ListItem",
        "position": 1,
        "name": "Home",
        "item": ""
      }, {
        "@type": "ListItem",
        "position": 2,
        "name": "About us",
        "item": ""
      }, {
        "@type": "ListItem",
        "position": 3,
        "name": "Our values",
        "item": ""
      }]
    }
</script>

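Writing breadcrumb markup by hand for every page is error-prone, so it is usually generated. The sketch below builds a BreadcrumbList script tag from a list of (name, URL) pairs. The function name and the example URLs are hypothetical, and the https://schema.org context is an assumption, since that value is not shown in the example above.

```python
import json


def breadcrumb_jsonld(trail):
    """Builds a BreadcrumbList JSON-LD script tag.

    `trail` is a list of (name, absolute URL) pairs, ordered from the home
    page down to the current page.
    """
    data = {
        "@context": "https://schema.org",  # assumed value
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }
    # Escape "</" so that a value can never close the script tag early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"
```

The positions are numbered from 1 in the order of the trail, as in the example above.
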
ahrefs recipes to rank

General SEO

Keyword research

Link building


