@gitmathub
Last active December 25, 2019 18:48
Research. How to generate speaking URLs from a CMS?

Web URL Generation

Requirements

Must have

  1. Unique (canonical link available)
  2. URL has to conform to RFC 3986

Good to have

  1. Deep linking to app
  2. SEO optimized
  3. Readable, meaningful words

Example

ID Replacement

Shorten the Name

URLs that are too long can be impractical if you have to type them or want to read them in the browser's location bar. So shortening a URL might be a good idea.

Example:

  • "Set up Ongoing Shared Facilitation of Meetings" (46 characters)
  • Translates to: "set-up-ongoing-shared-facilitati" (32 characters)
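The shortening step in the example above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the function name `shorten_slug` and the constant `MAX_SLUG_LENGTH` are hypothetical, and the sketch assumes the title contains only ASCII letters, digits, and separators.

```python
import re

MAX_SLUG_LENGTH = 32  # limit taken from the example above


def shorten_slug(title: str, max_length: int = MAX_SLUG_LENGTH) -> str:
    """Lower-case the title, join the words with hyphens, then cut to max_length."""
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Cut to the limit and drop a hyphen that the cut may have left at the end.
    return slug[:max_length].rstrip("-")


print(shorten_slug("Set up Ongoing Shared Facilitation of Meetings"))
# set-up-ongoing-shared-facilitati
```

A plain cut can end in the middle of a word, as it does here. Cutting at the last hyphen before the limit would be an alternative if whole words matter more than using the full 32 characters.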

Maximum length of a URL: The URL must not be longer than 1855 characters. This maximum is far beyond what is readable. See the test URL at the end of this document.

Check Existing Name

Let’s assume that two tracks have the same name: “Agility”. We then need two different URLs.

RFC 3986 Conforming Characters

If a name contains characters that are not allowed in URLs (RFC 3986), we need to translate them.

  • Example: "Let’s get сложный!"
  • Translates to: "lets-get-slozhnyj"
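The translation in the example above could look like the following sketch. The mapping table `CYRILLIC_TO_LATIN` and the function name `to_url_safe` are assumptions made for illustration. The table is one of several possible romanizations, chosen here so that the example above comes out as shown. Characters without a mapping or an ASCII base letter are left out.

```python
import re
import unicodedata

# Hypothetical Cyrillic-to-Latin table (one possible romanization, an assumption).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def to_url_safe(text: str) -> str:
    """Translate a title into lower-case ASCII words joined by hyphens."""
    text = text.lower()
    # Translate Cyrillic letters first, before any Unicode decomposition.
    text = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)
    # Split accented letters into base letter + accent, then drop all non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII apostrophes so that "let's" becomes "lets", not "let-s".
    text = text.replace("'", "")
    # Every remaining run of other characters becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


print(to_url_safe("Let’s get сложный!"))
# lets-get-slozhnyj
```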

Options

Option 1 - name encoding + number

Example:

Rules

  1. Schema: https://domain/entity/document-title-consecutive-number
  2. ID replacement: Only the ID part gets translated. The rest of the URL stays the same.
  3. All lower case
  4. Name too long: Names get shortened to 32 characters.
  5. Existing name: If the generated link already exists, it gets a consecutive number, starting with 2 (number 1, the assumed default case, is the link without any number).
  6. Characters not allowed by RFC 3986 get translated or are left out.
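The rules above can be combined into a small generator. This is a minimal sketch under stated assumptions: `slugify`, `option1_url`, the `taken` set, and the domain `example.com` are hypothetical names for illustration, and the set of existing URLs stands in for whatever lookup the CMS provides. Characters outside a-z and 0-9 are simply left out here rather than translated.

```python
import re


def slugify(title: str, max_length: int = 32) -> str:
    """Rules 3, 4 and 6: lower case, at most 32 characters, unsafe characters dropped."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_length].rstrip("-")


def option1_url(domain: str, entity: str, title: str, taken: set[str]) -> str:
    """Rules 1 and 5: build the URL and append a number if it already exists."""
    base = f"https://{domain}/{entity}/{slugify(title)}"
    url = base
    number = 2  # the first document keeps the URL without any number
    while url in taken:
        url = f"{base}-{number}"
        number += 1
    taken.add(url)  # remember the URL so the next duplicate gets the next number
    return url


taken: set[str] = set()
print(option1_url("example.com", "track", "Agility", taken))
# https://example.com/track/agility
print(option1_url("example.com", "track", "Agility", taken))
# https://example.com/track/agility-2
```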

Benefits

  1. Good readability if URL schema is well chosen

Drawbacks

  1. This works only with an ID lookup table.
  2. The CMS allows changing the title of a document, and every change would break existing links. This option seems preferable if the domain is about people's names (like LinkedIn), which don’t change (this is only true if you don’t marry too often and change your name ;-)
  3. The URL generator needs to check whether a URL already exists and then add a number. If this process runs twice, it needs a rule for which item gets number 2 and which gets number 3. This must be consistent.
  4. The URL schema can’t be changed once it’s established without breaking old links. Example: when we decide that guide/scrum should become practice/scrum.
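One way to make the numbering from drawback 3 consistent is to order duplicates by a stable key before assigning numbers. The sketch below assumes that each document has an ID that never changes and that sorting by this ID is an acceptable rule. Both the function name `assign_numbers` and the sample IDs are hypothetical.

```python
from collections import defaultdict


def assign_numbers(documents: list[tuple[str, str]]) -> dict[str, str]:
    """Map each document ID to its final slug; documents are (doc_id, base_slug) pairs."""
    ids_by_slug: dict[str, list[str]] = defaultdict(list)
    for doc_id, base_slug in documents:
        ids_by_slug[base_slug].append(doc_id)

    result: dict[str, str] = {}
    for base_slug, doc_ids in ids_by_slug.items():
        # Sorting by ID makes the outcome independent of the input order.
        for position, doc_id in enumerate(sorted(doc_ids), start=1):
            result[doc_id] = base_slug if position == 1 else f"{base_slug}-{position}"
    return result


docs = [("a17", "agility"), ("a03", "agility"), ("b01", "scrum")]
print(assign_numbers(docs) == assign_numbers(list(reversed(docs))))
# True
```

This only keeps existing numbers stable if a new document always sorts after the older ones, for example when IDs increase over time. The sketch also does not handle a title whose own slug already ends in a number, such as "agility-2".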

Option 2 - id and name

Example

Rules

  1. Schema: https://domain.com/entity/id/generated-document-title
  2. Choosing a good entity that will probably stay valid is important. Example: if we move all /guides to /practices, this will break old links. Once mindsettlers has reached a certain size, this shouldn't happen anymore.
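A sketch of how a URL following this schema could be built. The function name `option2_url`, the domain `example.com`, and the sample IDs are assumptions for illustration. The title part is generated from the document title and carries no identity of its own.

```python
import re


def option2_url(domain: str, entity: str, doc_id: str, title: str) -> str:
    """Build https://domain/entity/id/generated-document-title."""
    # The slug is only decoration; uniqueness comes from doc_id alone.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"https://{domain}/{entity}/{doc_id}/{slug}"


# Two documents with the same title still get different URLs.
print(option2_url("example.com", "practice", "4611bb8f", "Agility"))
# https://example.com/practice/4611bb8f/agility
print(option2_url("example.com", "practice", "99359977", "Agility"))
# https://example.com/practice/99359977/agility
```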

Benefits

  1. URL is unique
  2. No duplicate handling
  3. The URL also works when it accidentally gets shortened, as long as the ID part stays the same. URLs are often cut off accidentally in emails.
  4. This works with our Android app, since the schema is resolved programmatically and does not use a lookup table. The feature we want and have implemented is deep linking.
  5. The title of the document can be renamed. It would only break SEO, but the link would still work as long as the
  6. This is no less effective for SEO (this statement has to be confirmed)
  7. The schema “/entity/id” can be used for dynamic links. No lookup table is necessary for the document title.
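Benefits 3 and 5 follow from resolving a link by its entity and ID only. The sketch below shows the idea. The function name `parse_option2` and the sample URLs are hypothetical, and it assumes the path always starts with the entity followed by the ID.

```python
from urllib.parse import urlparse


def parse_option2(url: str) -> tuple[str, str]:
    """Return (entity, doc_id); the trailing title part is ignored."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    if len(segments) < 2:
        raise ValueError(f"URL has no entity/id part: {url}")
    return segments[0], segments[1]


# A full link, a link cut off inside the title, and a link without a title
# all resolve to the same document.
print(parse_option2("https://example.com/practice/4611bb8f/set-up-ongoing-shared"))
print(parse_option2("https://example.com/practice/4611bb8f/set-up-ong"))
print(parse_option2("https://example.com/practice/4611bb8f"))
# ('practice', '4611bb8f') in all three cases
```

A server could use the same function and redirect to the URL with the current title whenever the title part is missing or outdated.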

Drawbacks

  1. The ID doesn’t look nice
  2. It is hard to type
  3. IDs can be very long, depending on your backend system

Questions

  1. The same implementation questions as with the other solution: how are the lookup table and the redirect organised?
  2. Links - Discussion about slugs in URL: https://stackoverflow.com/questions/47427/why-do-some-websites-add-slugs-to-the-end-of-urls

Option 3 - template - name - parameters

Benefits

  1. SEO. The crawlers usually go by the concept of importance in the route, and for us it would be domain/template/actual-content, i.e. mindsettlers/player/jurgen-appelo

Questions

  1. What would be "template"?
  2. There seems to be no id. What is the algorithm to make it unique?
  3. Is “actual content” the title of a practice? If so, are “player” and “jurgen-appelo” then additional parameters?

Industry Examples

Technologies

Requirement Details

SEO

  • The crawlers usually go by the concept of importance in the route, and for us it would be domain/template/actual-content

    • i.e. mindsettlers/player/jurgen-appelo
  • Favor readability over SEO:

“More recently the search engines have lowered the weighting given to keywords in the URL, likely because the technique is now more common on spam sites than legitimate. Keywords in the URL now have only a very minor impact on the search results, if at all.”

App Deep Links

An app deep link works as follows: the user has a web link on the phone and taps it. There are then 3 possible cases:

  1. The user has not installed the app, and the web page opens in the web browser
  2. The user has installed an app that claims to be responsible for that link, so the app opens on the corresponding page. The app defines which content should be displayed in the app rather than on the web.
  3. The user has the app installed but doesn’t want the link to open the app and would rather see the content in the web browser.

The requirement for this feature is that you can describe/configure your URL schema.

Canonical Link

The Google doc explains: https://webmasters.googleblog.com/2009/02/specify-your-canonical.html

  • Different content can have different URLs but can be made identifiable with markup
  • Same content can appear differently (page including rotating content)
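The markup in question is the `<link rel="canonical">` element in the page head. A minimal sketch of generating it is shown below. The function name `canonical_link_tag` and the sample URL are assumptions; how the CMS decides which URL is the canonical one is outside this sketch.

```python
from html import escape


def canonical_link_tag(canonical_url: str) -> str:
    """Return the <link> element to place inside <head> of every URL variant."""
    # Escape the URL so characters such as & and " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


print(canonical_link_tag("https://example.com/practice/4611bb8f/agility"))
# <link rel="canonical" href="https://example.com/practice/4611bb8f/agility">
```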

Example (Dummy) URL with 1855 characters:

https://url.length-of-1855/this-is-a-dummy-text-showing-how-much-1855-characters-are--Search-engines-like-URLs-2048-chars-Be-aware-that-the-sitemaps-protocol-which-allows-a-site-to-inform-search-engines-about-available-page-has-a-limit-of-2048-characters-in-a-URL-If-you-intend-to-use-sitemaps-a-limit-has-been-decided-for-you!-see-Calin-Andrei-Burloiu-s-answer-below-There-s-also-some-research-from-2010-into-the-maximum-URL-length-that-search-engines-will-crawl-and-index-They-found-the-limit-was-2047-chars-which-appears-allied-to-the-sitemap-protocol-spec-However-they-also-found-the-Google-SERP-tool-wouldn-t-cope-with-URLs-longer-than-1855-chars-this-is-a-dummy-text-showing-how-much-1855-characters-are--Search-engines-like-URLs-2048-chars-Be-aware-that-the-sitemaps-protocol-which-allows-a-site-to-inform-search-engines-about-available-pages-has-a-limit-of-2048-characters-in-a-URL-If-you-intend-to-use-sitemaps-a-limit-has-been-decided-for-you!-see-Calin-Andrei-Burloiu-answer-below-There-s-also-some-research-from-2010-into-the-maximum-URL-length-that-search-engines-will-crawl-and-index-They-found-the-limit-was-2047-chars-which-appears-allied-to-the-sitemap-protocol-spec-However-they-also-found-the-Google-SERP-tool-wouldn-t-cope-with-URLs-longer-than-1855-chars-this-is-a-dummy-text-showing-how-much-1855-characters-are--Search-engines-like-URLs-2048-chars-Be-aware-that-the-sitemaps-protocol-which-allows-a-site-to-inform-search-engines-about-available-pages-has-a-limit-of-2048-characters-in-a-URL-If-you-intend-to-use-sitemaps-a-limit-has-been-decided-for-you!-see-Calin-Andrei-Burloiu-s-answer-below-There-s-also-some-research-from-2010-into-the-maximum-URL-length-that-search-engines-will-crawl-and-index-They-found-the-limit-was-2047-chars-which-whichs-x-They-found-the-limit-was-2047-chars-which-whichs-THIS-IS-THE-END-OF-1855-CHARACTERS-TEST
