
@philfreo
Last active September 29, 2021 13:46
Segment workaround due to their incorrect Google Analytics server-side integration

Overview

Problem

If you use Segment's server-side Google Analytics integration, it is impossible to properly track your visitors/sessions in GA from anonymous through identified, even if you follow all of Segment's documentation and recommendations.

This means that if you use Segment this way, you cannot do very basic things in GA, like understanding the attribution of your product's sign-ups. GA creates a brand-new session for each identified user, which is incorrect.

This problem applies to using Segment on the web with analytics.js when the Google Analytics Destination is set to Cloud Mode, or when using a true server-side Source such as Python/Ruby/Node.

This is a problem even if you carefully pass anonymousId, userId, the web page URL/title, and the user's IP from your frontend to your backend and then to Segment, following all of their recommendations.
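As a rough illustration of what "passing everything through" can look like on the backend, the sketch below bundles the browser-side values into keyword arguments for a server-side track call. The helper name `build_track_kwargs` and all sample values are hypothetical. The `context` fields (`ip`, `page.url`, `page.title`) follow Segment's common fields, and the keyword names are assumed to match the Python library's `track()` signature.

```python
def build_track_kwargs(user_id, anonymous_id, ip, page_url, page_title):
    """Hypothetical helper: bundle browser-side identifiers for a server-side call."""
    return {
        "user_id": user_id,            # None until the visitor signs up
        "anonymous_id": anonymous_id,  # read from analytics.js in the browser
        "context": {
            "ip": ip,                  # the visitor's IP, not the server's
            "page": {"url": page_url, "title": page_title},
        },
    }


kwargs = build_track_kwargs(
    user_id="user-42",                 # sample values, for illustration only
    anonymous_id="anon-123",
    ip="203.0.113.7",
    page_url="https://example.com/signup",
    page_title="Sign up",
)

# Assumed usage with Segment's Python library:
# analytics.track(event="Signed Up", **kwargs)
```

Even with a call shaped like this, GA still sees a new client once `user_id` is set, for the reason described under Cause.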

Cause

In Segment's Google Analytics server-side integration, the value passed to GA's cid ("Client ID") field is computed like this:

let cid = hash(facade.userId() || facade.anonymousId())

The hash they use is the string-hash npm package.

Because this value changes as soon as a visitor is given a userId, it is impossible to track users/sessions in GA from anonymous (marketing site) to identified (signed up in product).

Segment incorrectly prefers to derive the clientId/cid it sends to Google Analytics from the Segment userId (if present). Preferring the anonymousId instead would fix that problem and would more appropriately use the fields outlined in the Google Analytics API documentation.
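To make the effect concrete, here is a minimal Python sketch of the `userId || anonymousId` ordering. It is an illustration, not Segment's actual code: `string_hash` is a port of the string-hash algorithm, while `current_cid` and the sample IDs are hypothetical.

```python
def string_hash(s: str) -> int:
    """Port of the string-hash npm package, returning an unsigned 32-bit int."""
    data = s.encode("utf-16-le")  # JS charCodeAt() reads UTF-16 code units
    h = 5381
    for i in range(len(data) - 2, -1, -2):  # walk code units from last to first
        code_unit = data[i] | (data[i + 1] << 8)
        h = ((h * 33) ^ code_unit) & 0xFFFFFFFF
    return h


def current_cid(user_id, anonymous_id):
    # Mirrors hash(facade.userId() || facade.anonymousId())
    return string_hash(user_id or anonymous_id)


anonymous_id = "anon-123"  # hypothetical ID assigned on the marketing site

cid_before_signup = current_cid(None, anonymous_id)
cid_after_signup = current_cid("user-42", anonymous_id)

# Before sign-up the cid comes from the anonymousId; after sign-up it comes
# from the userId, so GA sees what looks like a different client.
print(cid_before_signup, cid_after_signup)
```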

Solution

Segment needs to update their Google Analytics server-side integration to be like this:

let cid = hash(facade.anonymousId() || facade.userId())

This would allow for consistent tracking in GA from anonymous to identified. There is already a separate field/option to pass User ID directly to GA, so there's no reason to prefer userId here when both are passed.
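A minimal sketch of the proposed ordering, assuming a hit shaped like a Universal Analytics Measurement Protocol payload, where `cid` and `uid` are separate parameters. The function names and sample IDs are hypothetical, the `tid` property ID is left out for brevity, and this is not Segment's implementation.

```python
def string_hash(s: str) -> int:
    """Port of the string-hash npm package, returning an unsigned 32-bit int."""
    data = s.encode("utf-16-le")  # JS charCodeAt() reads UTF-16 code units
    h = 5381
    for i in range(len(data) - 2, -1, -2):  # walk code units from last to first
        code_unit = data[i] | (data[i + 1] << 8)
        h = ((h * 33) ^ code_unit) & 0xFFFFFFFF
    return h


def proposed_cid(user_id, anonymous_id):
    # Mirrors hash(facade.anonymousId() || facade.userId())
    return str(string_hash(anonymous_id or user_id))


def ga_hit(user_id, anonymous_id):
    """Hypothetical hit builder: cid follows the browser, uid follows the account."""
    hit = {"v": "1", "t": "pageview", "cid": proposed_cid(user_id, anonymous_id)}
    if user_id:
        hit["uid"] = user_id  # GA's separate User ID field
    return hit


anonymous_hit = ga_hit(None, "anon-123")
identified_hit = ga_hit("user-42", "anon-123")

# The cid is the same before and after sign-up; only uid is added.
print(anonymous_hit["cid"] == identified_hit["cid"])  # True
```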

Segment has indicated that the more customers complain about this, the more likely they are to make this change. Currently they haven't committed to making any change :(

Workaround

See code snippets below.

// In browser code with analytics.js when using a Google Analytics Destination in Cloud Mode

// This is the same hash function that Segment uses on their server-side GA integration
// from https://github.com/darkskyapp/string-hash/blob/master/index.js
function hash(str) {
  var hash = 5381,
      i    = str.length;
  while(i) {
    hash = (hash * 33) ^ str.charCodeAt(--i);
  }
  return hash >>> 0;
}

// Segment Source Middleware helps us here so that we don't have to
// customize each and every track(), page() etc.
// https://segment.com/docs/connections/sources/catalog/libraries/website/javascript/middleware/
const segmentSourceMiddleware = function({ payload, next, integrations }) {
  payload.obj.integrations['Google Analytics'] = {
    // Segment ignores our custom clientId if it's not a string
    clientId: String(hash(analytics.user().anonymousId()))
  };
  next(payload);
};
analytics.addSourceMiddleware(segmentSourceMiddleware);

// Then you can use analytics.js like normal:
analytics.track('Test');

# This matches the same hash function that Segment uses on their server-side GA integration
# Ported to Python from https://github.com/darkskyapp/string-hash/blob/master/index.js
import struct
from ctypes import c_uint32


def hash(s):
    hash = 5381
    # Walk the UTF-16 code units in reverse, like charCodeAt(--i) in the JS original.
    # '<H' pins little-endian so it matches the 'utf-16le' encoding on any platform.
    for c in reversed(list(struct.iter_unpack('<H', s.encode('utf-16le')))):
        hash = (hash * 33) ^ c[0]
    # Truncate to an unsigned 32-bit integer, like `hash >>> 0` in JS
    return c_uint32(hash).value


anonymous_id = '(This needs to be passed to your backend from the browser/JS)'

# Then on any call to Segment where integrations['Google Analytics'] is not False...
analytics.track(user_id, 'Test', {}, anonymous_id=anonymous_id, integrations={
    'Google Analytics': {
        # Segment ignores our custom clientId if it's not a string
        'clientId': str(hash(anonymous_id or user_id))
    }
})

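One way to spot-check a Python port like the one above against the JavaScript original is to compare a few values that are small enough to work out by hand. The sketch below uses a ctypes-free variant, named `string_hash` here to avoid shadowing Python's built-in `hash`. The expected values were derived by stepping through the algorithm manually, not taken from Segment.

```python
def string_hash(s: str) -> int:
    """Mask-based port of string-hash; no struct or ctypes needed."""
    data = s.encode("utf-16-le")  # two bytes per UTF-16 code unit
    h = 5381
    for i in range(len(data) - 2, -1, -2):  # last code unit first
        code_unit = data[i] | (data[i + 1] << 8)
        h = ((h * 33) ^ code_unit) & 0xFFFFFFFF  # stay within 32 bits
    return h


# Empty string: the loop never runs, so the seed comes back unchanged.
assert string_hash("") == 5381

# One character: (5381 * 33) ^ ord("a") = 177573 ^ 97
assert string_hash("a") == 177604

# Two characters: "b" is consumed before "a" because the loop runs in reverse.
assert string_hash("ab") == 5861062

# The value sent as clientId must be a string, so convert before passing it on.
client_id = str(string_hash("ab"))
print(client_id)  # 5861062
```

The two-character case is the useful one: a port that iterates in forward order would produce a different number for "ab", so it catches the most likely porting mistake.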
@philfreo
Author

This is currently just a proof-of-concept workaround; I haven't yet verified end to end that it solves the GA problem.

@philfreo
Author

In practice it seems like we need to do something more like this:

<script>
  !function(){var analytics=window.analytics=window.analytics||[];if(!analytics.initialize)if(analytics.invoked)window.console&&console.error&&console.error("Segment snippet included twice.");else{analytics.invoked=!0;analytics.methods=["trackSubmit","trackClick","trackLink","trackForm","pageview","identify","reset","group","track","ready","alias","debug","page","once","off","on","addSourceMiddleware","addIntegrationMiddleware","setAnonymousId","addDestinationMiddleware"];analytics.factory=function(e){return function(){var t=Array.prototype.slice.call(arguments);t.unshift(e);analytics.push(t);return analytics}};for(var e=0;e<analytics.methods.length;e++){var key=analytics.methods[e];analytics[key]=analytics.factory(key)}analytics.load=function(key,e){var t=document.createElement("script");t.type="text/javascript";t.async=!0;t.src="https://cdn.segment.com/analytics.js/v1/" + key + "/analytics.min.js";var n=document.getElementsByTagName("script")[0];n.parentNode.insertBefore(t,n);analytics._loadOptions=e};analytics.SNIPPET_VERSION="4.13.1";
    analytics.load("FILL IN");
    
    // START of workaround from https://gist.github.com/philfreo/181b92528c352cdccf43f57a5908c47f
    // This is the same hash function that Segment uses on their server-side GA integration
    // from https://github.com/darkskyapp/string-hash/blob/master/index.js
    function hash(str) {
      var hash = 5381,
          i    = str.length;
      while(i) {
        hash = (hash * 33) ^ str.charCodeAt(--i);
      }
      return hash >>> 0;
    }
    
    function fireWhenAnonIdIsAvailable(cb) {
        if (window.analytics.user) return cb();
        setTimeout(fireWhenAnonIdIsAvailable.bind(this, cb), 100);
    }

    const segmentSourceMiddleware = function({ payload, next, integrations }) {
      // Unfortunately the middleware can fire before window.analytics.user() is available,
      // and analytics.ready() doesn't seem to work for us either, so we block/poll until
      // window.analytics.user() is available.
      // (Note: analytics isn't same as window.analytics here either)
      fireWhenAnonIdIsAvailable(function() {
        payload.obj.integrations['Google Analytics'] = {
          // Segment ignores our custom clientId if it's not a string...
          clientId: String(hash(window.analytics.user().anonymousId()))
        };
        next(payload);
      });
    };
    analytics.addSourceMiddleware(segmentSourceMiddleware);
    // END of workaround
        
    analytics.page();
  }}();
</script>

@philfreo
Author

Segment recently shared:

we released a setting to our GA destination that lets customers choose to prefer anonymousId over userId for clientId in cloud-mode: https://segment.com/docs/connections/destinations/catalog/google-analytics/#prefer-anonymous-id-for-client-id-server-side-only.

This should hopefully make this gist unnecessary.
