Software Engineering

TOC


Abstract
Do I need this?
One thing at a time
Wireframes
Split changes in parts
Data
Functions and Methods
Naming things
Abstraction
Data input, validation and handling
SOLID
One source of truth principle
Design Pattern
Deployment strategies and application requirements
Contribution Guidelines
Standards
More documentation
License

Abstract

This is a loose collection of thoughts, ideas and best practices for software engineers. What is the difference between a developer and an actual engineer? A developer knows a programming language (very) well and is very good at implementing a given task in that language. An engineer typically does the work before the code is actually written: the engineer decides why and how something is implemented. Often, a single person fulfills both roles.
Thinking about ways to resolve an issue with code is different from actually hitting keys on the keyboard to write code. This guide isn't about the details of a specific programming language (even though it contains some code examples). Rather, it aims to teach you the basics of software engineering.
Do I need this?

For every new change ask yourself:

Do we need this?
Do we already have a suitable lib or existing code we can reuse?
How do we implement it?
80/20 rule. Only implement a feature request if at least 80% of the users benefit from it

In a perfect world, all of this is discussed in a short standup meeting and documented in a Jira ticket. Also, ask yourself these questions before you start implementing, not at the end (which sadly happened a lot in the past).
One thing at a time


What is your goal?
How do you achieve it?
Implement it

Split every idea into these three parts and think about each question.
Todo: Add a new section here? There are always multiple ways to implement something. Code should not only work, it also needs to be efficient. Asymptotics are important: how does the code scale? How much time should I spend optimizing it if I could solve the problem by throwing more hardware at it? Does it actually matter how fast my code is? If I develop a service that talks to a rate-limiting API, the speed of my code doesn't matter (until someone replaces the API, then you're screwed).
Example for asymptotics:
Accept a positive integer as input and sum up all integers from 1 to the input. Doing this with a loop takes time proportional to the input, so it doesn't scale. The closed form $input * ($input + 1) / 2 takes the same time for every input.
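A minimal Ruby sketch of this example. The loop does work proportional to the input, while the closed form n * (n + 1) / 2 does a fixed amount of work. The method names are made up for illustration.

```ruby
# Sum of all integers from 1 to n, computed two ways.

# O(n): touches every number once.
def sum_with_loop(n)
  total = 0
  (1..n).each { |number| total += number }
  total
end

# O(1): closed form n * (n + 1) / 2, same cost for every n.
def sum_with_formula(n)
  n * (n + 1) / 2
end

puts sum_with_loop(100)     # 5050
puts sum_with_formula(100)  # 5050
```

Both methods return the same result. Only the amount of work differs as the input grows.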
Wireframes

Note: This is useful for planning new features and for debugging issues

Draw a diagram/mockup/table of whatever you want to change
Define which data is needed and modified
Where does the data come from?
Who has access to this data?
In how many small tasks can we split this?
More/smaller issues are better than a few big ones
This allows many people to work in parallel on an issue, in short timeframes

An actual mockup works pretty well for graphical UI changes, but also for database migrations. For API/CLI changes, a diagram (flow diagram, sequence diagram) is nice.
Split changes in parts

different commits for changes/enhancements/new features

Each change should be categorized into those types
Their workflow varies, so don't mix them up
Makes review and reverting easier
Bad example: https://github.com/voxpupuli/puppet-zabbix/pull/430/commits (+1 for splitting into multiple commits, but check the first one)
Good example: https://github.com/voxpupuli/puppet-zabbix/pull/409/commits (many small changes, one in each commit)

Data


Do you generate any kind of output? Provide it in human-readable form if you want, but you have to provide it in machine-readable form!
Separate data from code. This makes managing the data way easier. Nobody wants to change an API password that's used in 70 bash scripts, but a single JSON/YAML file is easy to parse and update. This also makes the data more reusable.
This roughly maps to business logic vs. program logic (compare the roles and profiles pattern)
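A small Ruby sketch of both points, under a few assumptions: the settings would normally live in a separate file (a heredoc stands in for it here), and the keys `api`, `url` and `user` are invented for the example.

```ruby
require 'json'
require 'yaml'

# Stand-in for a settings file such as settings.yaml (hypothetical name and keys).
SETTINGS_YAML = <<~YAML
  api:
    url: https://api.example.com
    user: monitoring
YAML

# The code only knows the structure of the data, not the values.
def load_settings(yaml_text)
  YAML.safe_load(yaml_text)
end

# Machine-readable output: one JSON document.
def report_as_json(settings)
  JSON.generate('api_url' => settings.dig('api', 'url'),
                'api_user' => settings.dig('api', 'user'))
end

# Human-readable output, built from the same data.
def report_as_text(settings)
  "API #{settings.dig('api', 'url')} as user #{settings.dig('api', 'user')}"
end

settings = load_settings(SETTINGS_YAML)
puts report_as_json(settings)
puts report_as_text(settings)
```

Changing the API user now means editing one data file, and other tools can consume the JSON output without scraping text.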

Functions and Methods


Every function needs a meaningful name
Every function should only do one thing
It should be short, around 10-15 lines in Ruby
Always, always test (function|user) input

rm -rf $var/ with an empty $var removes everything below /. Steam on Linux did this once, and so did the squid RPM on CentOS 6 or 7


You are probably doing something wrong if your function takes a boolean as a parameter. Split it into two functions. Source
Every function and variable name should be in English, not German
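The rm -rf case can be guarded with a few checks before anything is deleted. This is a sketch in Ruby, not a complete safety net: the method name and the `base_dir` parameter are assumptions made for the example.

```ruby
require 'fileutils'
require 'tmpdir'

# Refuse to delete anything unless the path is non-empty, points to a
# directory and lives below an allowed base directory.
def remove_directory(path, base_dir:)
  raise ArgumentError, 'path must not be empty' if path.to_s.strip.empty?

  target = File.expand_path(path)
  base = File.expand_path(base_dir)
  raise ArgumentError, "#{target} is outside of #{base}" unless target.start_with?("#{base}/")
  raise ArgumentError, "#{target} is not a directory" unless File.directory?(target)

  FileUtils.rm_rf(target)
end

# Usage inside a throwaway temporary directory.
Dir.mktmpdir do |base|
  cache = File.join(base, 'cache')
  Dir.mkdir(cache)
  remove_directory(cache, base_dir: base)
  puts File.exist?(cache) # false

  begin
    remove_directory('', base_dir: base) # the "empty $var" case
  rescue ArgumentError => e
    puts "rejected: #{e.message}"
  end
end
```

The empty-path check comes first on purpose: it is the exact case that turns `rm -rf $var/` into `rm -rf /`.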

Naming things

Thesis: there are only two real problems in computer science

cache invalidation
naming things
off by one errors

https://martinfowler.com/bliki/TwoHardThings.html
solution:

You need a proper naming convention for variables and functions
No matter what you choose, it should be consistent in your code and in your team
Variable/class/method names should never be generic (like i as a counter), but specific

Proper naming while iterating

for i in $(cat bla); do
  echo $i
done

Which file do we read?
What is i? What are we iterating over?

for ((i=1; i<=COUNT_DRIVES; i++)); do
  # some code here
done

What is i again?
What is COUNT_DRIVES?

Better:
for ((drive_number=1; drive_number<=AMOUNT_OF_HARDDISKS; drive_number++)); do
  # some code here
done
Abstraction

woah woah, fucking complicated topic

Abstraction is important
It splits things into multiple pieces
It allows multiple people to work on it (everybody gets one piece)
It makes testing way easier
It adds more boilerplate (looks bad, but that is a good thing)
It is harder to understand :(

Break a method into logical pieces

def zbx_check_if_template_is_linked(template_id)
  list = @zbx_api.templates.get_ids_by_host(hostids: @zbx_host_id)
  list.include? template_id
end

What the hell is list again?
Maybe we need the list of templates at another place as well?

def zbx_get_linked_templates
  @zbx_api.templates.get_ids_by_host(hostids: @zbx_host_id)
end

def zbx_check_if_template_is_linked(template_id)
  zbx_get_linked_templates.include? template_id
end
Ruby guidelines


Method name should always indicate what it does
It should be easy to understand, without inline comments
A function should not exceed 15 lines
A class should not exceed 100-150 lines

Data input, validation and handling


Use correct datatypes, not everything is a string

https://github.com/DCC-org/documents/pull/197/commits/76dcdd190da2eae6ef5041aa749c56e7f17153e0#diff-6148ddb9a6109404e7293be7fcc5a71fL110

This brings better performance and more security

Even strings can be "special strings" if you add new datatypes for them, even if it is just a simple regex

Use correct array functions

https://dev.to/andrew565/which-array-function-when
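A sketch of the "special strings" idea in Ruby: a string gets its own small type that validates itself with a regex, and numeric input is converted once at the boundary. The `Hostname` class, the `parse_port` method and the deliberately simple regex are assumptions for this example, not a complete hostname validator.

```ruby
# A "special string": a hostname wrapped in its own type.
class Hostname
  LABEL = /[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?/
  PATTERN = /\A#{LABEL}(\.#{LABEL})*\z/

  attr_reader :value

  def initialize(value)
    raise ArgumentError, "invalid hostname: #{value.inspect}" unless PATTERN.match?(value.to_s)

    @value = value.to_s
  end

  def to_s
    value
  end
end

# Numbers stay numbers: convert the raw string once, then calculate with it.
def parse_port(raw)
  port = Integer(raw, 10) # raises ArgumentError for "abc"
  raise ArgumentError, "port out of range: #{port}" unless (1..65_535).cover?(port)

  port
end

puts Hostname.new('web01.example.com')
ports = %w[80 443].map { |raw| parse_port(raw) } # map: one output per input
puts ports.sum
```

Code that receives a `Hostname` no longer needs to re-check the value, and `map` states the intent (transform every element) more clearly than a manual loop with an accumulator.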
SOLID

https://en.wikipedia.org/wiki/SOLID_(object-oriented_design)
Single responsibility principle


A class is responsible for a single piece of functionality, not several


https://en.wikipedia.org/wiki/Single_responsibility_principle
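One possible illustration in Ruby. Instead of a single class that collects, formats and writes a report (three reasons to change), each class below has exactly one job. All class and method names are hypothetical.

```ruby
# Knows how to select data, nothing else.
class DiskUsageCollector
  def initialize(raw_usage)
    @raw_usage = raw_usage
  end

  # Returns only the mounts at or above the threshold.
  def mounts_above(threshold_percent)
    @raw_usage.select { |_mount, percent| percent >= threshold_percent }
  end
end

# Knows how to turn data into text, nothing else.
class DiskUsageFormatter
  def render(mounts)
    mounts.map { |mount, percent| "#{mount}: #{percent}%" }.join("\n")
  end
end

# Knows where the text goes, nothing else.
class ReportWriter
  def initialize(io)
    @io = io
  end

  def write(text)
    @io.puts(text)
  end
end

usage = { '/' => 42, '/var' => 91, '/home' => 97 }
full_mounts = DiskUsageCollector.new(usage).mounts_above(90)
ReportWriter.new($stdout).write(DiskUsageFormatter.new.render(full_mounts))
```

Switching the output from text to JSON, or from stdout to a file, now touches one class each.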


Open/closed principle


software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification


Provide a way to extend the code, for example through inheritance


Allow others to embed your code as a library (composition)


https://en.wikipedia.org/wiki/Open/closed_principle


https://dev.to/ruidfigueiredo/why-composition-is-superior-to-inheritance-as-a-way-of-sharing-code
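A sketch of the open/closed principle using composition, in Ruby. The assumption is an alerting class that should support new channels later. All names are invented for the example.

```ruby
# Closed for modification: Alerter never changes when a channel is added.
class Alerter
  def initialize(channels)
    @channels = channels
  end

  # Every channel only has to respond to #deliver(message).
  def alert(message)
    @channels.map { |channel| channel.deliver(message) }
  end
end

class LogChannel
  def deliver(message)
    "log: #{message}"
  end
end

# Open for extension: added later, without touching Alerter or LogChannel.
class ChatChannel
  def initialize(room)
    @room = room
  end

  def deliver(message)
    "chat[#{@room}]: #{message}"
  end
end

alerter = Alerter.new([LogChannel.new, ChatChannel.new('ops')])
puts alerter.alert('disk almost full')
```

The channels are plugged in from the outside, so a third party can ship its own channel class without patching this code.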


One source of truth principle

This is important for two topics: data and documentation. You should not replicate data. For example: you've got two Confluence pages to track hardware. Which of them is correct? Do you always update both? Probably not. Instead of keeping multiple sources, implement a data flow: save the data to a database and generate the Confluence pages on the fly. Or simply link from one page to the other. Do not replicate data!
The same principle applies to documentation. You write huge documentation by hand, in addition to the software. Is the documentation always 100% correct? Do you update it with each software change? Probably not. This is bad. Generate the documentation from the code.
Examples: swagger-ui for Java, puppet-strings for Puppet, YARD for Ruby, Sphinx for Python.
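For Ruby, this can look like the sketch below: the documentation lives in YARD comment tags directly above the method, and running `yard doc` over the file renders it to HTML. The method itself is a made-up example.

```ruby
# Checks whether a template is in the list of linked templates.
#
# @param linked_template_ids [Array<Integer>] IDs of all linked templates
# @param template_id [Integer] ID of the template to look for
# @return [Boolean] true if the template is linked
# @example
#   template_linked?([10001, 10050], 10050) #=> true
def template_linked?(linked_template_ids, template_id)
  linked_template_ids.include?(template_id)
end

puts template_linked?([10_001, 10_050], 10_050)
```

When the signature changes, the comment sits right next to the change, which makes it much harder to forget than a separate wiki page.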
Design Pattern


Observer pattern - notify objects on changes
Operator pattern - Extend behavior for each inheritance
Singleton pattern - There can only be one instance of a class
State pattern - method in an object does something different, depending on the state
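As an illustration of the first entry, here is a hand-rolled observer in Ruby. The `Thermostat` and `TemperatureLog` classes are hypothetical, and the only contract is that an observer responds to `update`.

```ruby
# Subject: keeps a list of observers and notifies them on every change.
class Thermostat
  attr_reader :temperature

  def initialize
    @observers = []
    @temperature = nil
  end

  def subscribe(observer)
    @observers << observer
  end

  def temperature=(value)
    @temperature = value
    @observers.each { |observer| observer.update(value) }
  end
end

# Observer: anything that responds to #update.
class TemperatureLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  def update(value)
    @entries << "temperature changed to #{value}"
  end
end

thermostat = Thermostat.new
log = TemperatureLog.new
thermostat.subscribe(log)
thermostat.temperature = 21
thermostat.temperature = 23
puts log.entries
```

The thermostat does not know what its observers do with the value, so new reactions (alerts, metrics) can be added without changing it.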

Deployment strategies and application requirements


Codebase - One codebase tracked in revision control, many deploys
Dependencies - Explicitly declare and isolate dependencies
Config - Store config in the environment
Backing services - Treat backing services as attached resources
Build, release, run - Strictly separate build and run stages
Processes - Execute the app as one or more stateless processes
Port binding - Export services via port binding
Concurrency - Scale out via the process model
Disposability - Maximize robustness with fast startup and graceful shutdown
Dev/prod parity - Keep development, staging, and production as similar as possible
Logs - Treat logs as event streams
Admin processes - Run admin/management tasks as one-off processes
Use standardized formats for admin / internal / metadata APIs

CloudEvents for event data formats
Health Check Response Format for HTTP APIs
OpenMetrics for metrics
OpenTracing for distributed tracing in microservices


Deployment rules - From the one and only Igor Galic

This list is based on The Twelve Factors, a community-driven guideline for building proper cloud applications.
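The "Config" factor can be sketched in Ruby as follows. The variable names DATABASE_URL, PORT and LOG_LEVEL are assumptions for this example. The method takes an environment-like hash so it can be tested without touching the real environment.

```ruby
# Reads the configuration from an environment-like object.
# Passing ENV is the normal case, a plain Hash works for tests.
def load_config(env = ENV)
  {
    database_url: env.fetch('DATABASE_URL'),       # required, KeyError if missing
    port: Integer(env.fetch('PORT', '8080'), 10),  # optional, with a default
    log_level: env.fetch('LOG_LEVEL', 'info')      # optional, with a default
  }
end

config = load_config({ 'DATABASE_URL' => 'postgres://db.example.com/app', 'PORT' => '9000' })
puts config[:port]       # 9000
puts config[:log_level]  # info
```

Using `fetch` without a default for required values makes a missing setting fail at startup instead of somewhere deep inside a request.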
Contribution guidelines

Take a look at the VirtAPI project contribution guidelines or the Vox Pupuli guidelines. Define guidelines as early as possible for your project. It doesn't matter if it's a FOSS project or something within your company that will never be released to the public. At some point in the future, somebody will want to interact with your codebase. This will be a lot easier if you have proper documentation already in place.
Standards

Many people worked for years to define standards, RFCs and protocols that unify software, communication across different programs, and software development. Instead of reinventing the wheel or giving in to the NIH (not invented here) syndrome, use existing standards:

Specification for building APIs in JSON
RESTful API Guidelines from Zalando
OpenID for authentication
Open Policy Agent for authorization

OpenTracing / OpenMetrics
More Documentation

A loose collection of useful links:

My git cheatsheet
My SysAdmin Manifest
Google's SRE book
Short quiz about software engineering
An introduction to distributed systems

License

This document is licensed under CC BY-SA 4.0. The document was created by Tim bastelfreak Meusel.