Edit: Shoutout to kdeldycke/awesome-falsehood.
- Never assume what's in the database is sane
- Addresses should probably get their own table, unless it's recording historical behavior (eg, address mailed to)
- Mailing addresses and property addresses are not the same thing
- A street number may be alphanumeric
- A street number may come after the street name
- Not all addresses have street numbers
- Not all addresses have streets
- Not all addresses have cities
- A postal code may come before the city
- Not all countries have states/provinces
- There are more than just 50 US states
- 5 digit Zip codes can belong to more than one county
- 5 digit Zip codes can belong to more than one state
- Geocoding based on zipcode (even 9 digit zip) is unreliable, as zip codes have funky shapes
- MLS data is not a reliable way to identify property owners
- MLS data is not a reliable way to validate addresses
- You cannot uniquely identify a property with five fields
- You cannot uniquely identify a property with twelve fields
- You cannot uniquely identify a property.
- Address validation is extremely hard and you will save yourself a lot of hurt by paying someone else to do it, preferably a company who does it as a commercial product.
- Google is not one of those companies
- If you need to know the county for an address, get it from an API and denormalize on insert, don’t try to infer it from postal code.
- Don't make a table to hold every 9 digit zip code in the US, it will be huge and quickly out of date. Get what you need from an API and insert denormalized.
- The correct minimum length for an individual name is 0 characters
- The correct maximum length for a name is "bigger than a varchar".
- Names can contain any character, including spaces.
- Names can use any character set (cyrillic, chinese, japanise
- Names should be stored in utf8 at a minimum, utf16 is better
- First name and last name have different purposes in different cultures
- A full name is not just a combination of given name and family name
- Not all people have middle names
- Some people have multiple given names
- Some people have multiple family names
- Some people have no family name
- Full name, legal name, and latinized name are three completely different things.
- Gender and Sex are not the same thing, make sure you know which is which
- Gender and Sex cannot be used to infer ANYTHING, especially medically
- There are more than three genders
- There are more than two sexes
- Some people have multiple genders
- Some people have no gender
- Some people defy sexual categorization
- Title/honorifics do not infer gender
- Name does not infer gender
- Unless it's required by a downstream dependency, you probably don't actually need to know a user's gender or sex.
- Identifying what functionality users have access to should always be explicit, not implicit or inferred.
- If an enum has more than five values, make it a reference table
- If an enum has different values for presentation, make it a reference table
- If an enum’s values are longer than ten characters, make it a reference table
- If the reference table will continue to get larger over time (and doesn’t have presentation values), just make the column a string. You’ll thank yourself later
- Do not use enums for booleans
- A boolean should only be used to represent true and false