(Contents copied with permission from a private GitHub issue.)
The current state of c7 search is:
- Versions 7.15 through 7.18, as well as develop/master and latest, all have canonical URLs pointing at the 7.18 version of the page
- The 7.18 sitemap contains canonical URLs that point at the 7.18 versions
- The 7.18 sitemap has been submitted to Google to crawl/index
- Versions 7.6 through 7.17 are marked to not be indexed by Google (with a `<meta name="robots" content="noindex" />` tag)
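To check which of these signals a given page actually sends, something like the following could be used. This is a minimal sketch using only the Python standard library; the names `HeadSignals` and `inspect_page`, and the sample page path, are assumptions made for illustration and are not part of the docs tooling.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collects the declared canonical URL and robots directives of a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None, so normalize to "".
        attr = {name: (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.robots += [d.strip().lower() for d in content.split(",")]


def inspect_page(html):
    """Return the declared canonical URL and whether the page is noindex."""
    parser = HeadSignals()
    parser.feed(html)
    return {"canonical": parser.canonical, "noindex": "noindex" in parser.robots}


# Hypothetical markup for an older-version page; the path is made up.
sample = """<html><head>
<link rel="canonical" href="https://docs.camunda.org/manual/7.18/example/" />
<meta name="robots" content="noindex" />
</head><body></body></html>"""

print(inspect_page(sample))
```

A page that declares 7.18 as canonical but also carries `noindex` would show up here with both signals set, which is the combination described in the list above.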
In this current state, we still see Google pointing at a lot of pre-7.18 canonicals. There are many reasons Google might choose a different canonical than what we've declared, but it appears that the `noindex` tags on older versions contribute. They seem to be preventing Google from re-reading the older version pages (and seeing that the older pages declare 7.18 to be canonical).
- The `noindex` tags on older versions are unnecessary. If we mark the old versions with a proper `canonical` tag, Google will not penalize duplicate content, and will be likely to choose the newer version of the page as canonical.
- The `noindex` tags are an impediment to Google correcting its canonicals, and cause Google to hang on to old canonicals despite our efforts to convince the Googlebot otherwise.
- Find fewer than 5 pages that are 7.17 canonicalized
- Capture the state of them in Google Search Console, and via a Google search for `site:docs.camunda.org <phrase visible on that specific page>`
- Update the 7.17 site to remove the `noindex` tag on only those pages (camunda/camunda-docs-manual#1395).
- Submit the 7.17 pages to be crawled via Google Search Console
- Capture any changes in Google search console, as well as a Google search for a phrase on each page. Note that it could take weeks for changes to appear.
- Analyze results.
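The capture and compare steps above could be recorded in a small structure so that before and after states are easy to diff. This is only a sketch: the `Observation` fields and the `changed_fields` helper are hypothetical, the values are copied in by hand from Search Console and a manual search, and the example data is invented.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """One manual capture of a test page's state (all fields are assumptions)."""
    page_url: str
    google_canonical: Optional[str]  # canonical reported by Search Console
    in_site_search: bool             # did the site: phrase search find it?


def changed_fields(before, after):
    """Map each field that differs to its (before, after) pair."""
    return {
        f.name: (getattr(before, f.name), getattr(after, f.name))
        for f in fields(before)
        if getattr(before, f.name) != getattr(after, f.name)
    }


# Invented example values, not real experiment data.
before = Observation("https://docs.camunda.org/manual/7.17/example/", None, False)
after = Observation(
    "https://docs.camunda.org/manual/7.17/example/",
    "https://docs.camunda.org/manual/7.18/example/",
    True,
)

print(changed_fields(before, after))
```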
Results: 2 of the 3 test pages became searchable, with version 7.18 chosen as canonical! Conclusion: we should extend the experiment.
- Update the 7.15 site to remove the `noindex` tag from all pages.
- Update 7.15 version to exclude `noindex` from all pages & deploy it (camunda/camunda-docs-manual#1409)
- Click the "Validate Fix" button on the "Duplicate, Google chose different canonical than user" dataset
- Wait for a few weeks
- Re-analyze "Duplicate, Google chose different canonical than user" dataset
- 7.15 canonicals have declined by at least 50% in the "Duplicate, Google chose different canonical than user" dataset
- No new 7.15 results show
  - i.e. the Google search query `site:docs.camunda.org inurl:"15" -inurl:"javadoc"` still yields 0 results.
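The two criteria above can be written down as a simple check. This is a sketch under stated assumptions: the function name and parameters are hypothetical, the counts would be read manually from the "Duplicate, Google chose different canonical than user" dataset and from the search query, and the numbers in the example are invented.

```python
def experiment_succeeded(before_count, after_count, new_7_15_results):
    """Apply both criteria: a decline of at least 50% and no new 7.15 results.

    before_count / after_count: number of 7.15 canonicals in the dataset
    before and after the waiting period (read manually).
    new_7_15_results: number of results returned by the site: query.
    """
    declined_enough = after_count <= before_count * 0.5
    return declined_enough and new_7_15_results == 0


# Invented numbers for illustration only.
print(experiment_succeeded(before_count=100, after_count=40, new_7_15_results=0))
print(experiment_succeeded(before_count=100, after_count=60, new_7_15_results=0))
```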
Results: 7.15 canonicals did not decline, and some 7.15 pages became newly canonical.
Conclusion: removing `noindex` from all pages would result in many non-7.18 canonicals.
- Roll back 7.17 experiment (camunda/camunda-docs-manual#1395)
- Roll back 7.15 experiment (camunda/camunda-docs-manual#1409)