@ramkumarvenkat
Created January 25, 2024 03:34
In a highly regulated industry in the US, a regulation requires that all communications be archived and available for querying at any time.
For example, let's assume that the company uses the following internal tools:
1. Google for emails
2. Slack for internal communication
3. WhatsApp for customer communication
4. Zoho for customer support
As an engineer working with the compliance team, your task is to design a system that stores all the data from the above systems in the cloud and makes it available for querying.
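One possible starting point, sketched here as an assumption rather than as part of the task statement, is a single normalized record that every source is mapped into before storage. The names `Source`, `ArchivedMessage`, and `archive_key` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Source(Enum):
    """The four systems named in the task."""
    EMAIL = "email"
    SLACK = "slack"
    WHATSAPP = "whatsapp"
    ZOHO = "zoho"


@dataclass(frozen=True)
class ArchivedMessage:
    """Hypothetical common shape for a message from any source."""
    source: Source
    source_id: str                      # ID assigned by the originating system
    sent_at: datetime                   # stored in UTC for consistent time filtering
    sender: str
    recipients: tuple[str, ...]
    body: str
    attachment_uris: tuple[str, ...] = ()  # pointers to separately stored media

    @property
    def archive_key(self) -> str:
        # Assumption: (source, source_id) uniquely identifies a message.
        return f"{self.source.value}:{self.source_id}"


example = ArchivedMessage(
    source=Source.SLACK,
    source_id="1706153640.000100",
    sent_at=datetime(2024, 1, 25, 3, 34, tzinfo=timezone.utc),
    sender="alice@example.com",
    recipients=("bob@example.com",),
    body="Quarterly numbers attached.",
)
```

Keeping the originating system's ID in the key gives later stages a stable handle for deduplication and for tracing a stored record back to its source.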
The design should handle the following:
1. Ability to pull/push data from the sources in real time
2. No data loss whatsoever, since missing any data is a regulatory violation
3. Fast queries with filtering along different dimensions like time, name, email address, etc.
4. All attached media needs to be stored
5. Ability to run a post-storage validation pipeline to detect non-compliant data. For example, no customer support agent should say non-compliant things to a customer, no email should contain deceptive or abusive words, etc.
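As a rough illustration of how requirements 2, 3, and 5 might fit together, the sketch below uses an in-memory stand-in for durable storage. It assumes sources may redeliver messages (at-least-once delivery), so ingestion is idempotent on a unique key. `Record`, `Archive`, `find_violations`, and the `BANNED_TERMS` list are all hypothetical placeholders, not a prescribed design.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Record:
    key: str          # hypothetical unique key, e.g. "<source>:<source message id>"
    sent_at: datetime
    sender: str
    body: str


class Archive:
    """In-memory stand-in for durable, append-only cloud storage."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def ingest(self, record: Record) -> bool:
        """Store a record once; redelivery of the same key is a no-op."""
        if record.key in self._records:
            return False
        self._records[record.key] = record
        return True

    def query(self, sender=None, start=None, end=None) -> list[Record]:
        """Filter by sender and by the half-open time range [start, end)."""
        results = []
        for r in self._records.values():
            if sender is not None and r.sender != sender:
                continue
            if start is not None and r.sent_at < start:
                continue
            if end is not None and r.sent_at >= end:
                continue
            results.append(r)
        return sorted(results, key=lambda r: r.sent_at)


# Placeholder terms; a real list would come from the compliance team.
BANNED_TERMS = ("guaranteed returns", "idiot")


def find_violations(record: Record, banned=BANNED_TERMS) -> list[str]:
    """Return the banned terms that appear in the record body (case-insensitive)."""
    return [
        term for term in banned
        if re.search(rf"\b{re.escape(term)}\b", record.body, re.IGNORECASE)
    ]


archive = Archive()
msg = Record(
    key="zoho:42",
    sent_at=datetime(2024, 1, 25, 3, 34, tzinfo=timezone.utc),
    sender="agent@example.com",
    body="This product has guaranteed returns.",
)
first = archive.ingest(msg)
second = archive.ingest(msg)  # simulated redelivery from the source
flagged = {r.key: find_violations(r) for r in archive.query(sender="agent@example.com")}
```

A production system would replace the dictionary with durable storage and indexed search, and the keyword match with whatever checks the compliance team specifies. The point of the sketch is only the shape: idempotent writes, filterable reads, and validation that runs over already-stored records.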