Created
April 18, 2019 12:47
-
-
Save ouyi/6043e13d3e461512033eb3ee590db2e2 to your computer and use it in GitHub Desktop.
Lessons learned from the recent years
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Lessons learned | |
I. Software engineering | |
1. Automate everything, have CI/CD and test automation from the beginning | |
2. Security shall not be an afterthought (have TLS and avoid plain text passwords from the start) | |
3. Component shall be stateless, state shall be extracted and kept in a external store (key-value store or databases) | |
4. Favor open source and off-the-shelf solutions instead of building proprietary solutions | |
5. Avoid hard-coding dependencies, try to inject them via command line parameters, configuration files, env variables, etc | |
II. Data engineering | |
1. Have a strategy to handle delayed records | |
2. Use a decent scheduling software to manage job dependencies and handle retries | |
3. Make data processing jobs idempotent | |
4. Code the data pipeline in UTC | |
5. Have a data cleansing job as the first job in the data pipelines | |
6. Divide and conquer instead of the shared main script monster | |
7. Reading large amount of data and filtering/projecting out unneeded stuff is often cheaper than customized data logging, collection, and transportation | |
8. Hadoop is better at handling fewer larger files than more small files | |
9. Related information shall be logged together to avoid expensive joining in the later processing |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment