#Database #basics

Shadow information

  • Shadow information is any data that objects need to maintain, above and beyond their normal domain data, to persist themselves.  This typically includes primary key information, particularly when the primary key is a surrogate key that has no business meaning, concurrency control markings such as timestamps or incremental counters, and versioning numbers.
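
A minimal sketch of what shadow information can look like on a domain object; the class and attribute names here are invented for illustration, not taken from the text:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Customer:
    # Normal domain data
    name: str
    email: str

    # Shadow information: kept only so the object can persist itself
    person_poid: Optional[int] = None        # surrogate primary key with no business meaning
    version: int = 0                         # versioning number
    last_updated: Optional[datetime] = None  # concurrency-control marking (timestamp)

def save(customer: Customer, expected_version: int) -> None:
    """Optimistic-concurrency sketch: reject the write if the shadow version has moved on."""
    if customer.version != expected_version:
        raise RuntimeError("Row changed since it was read; reload and retry")
    customer.version += 1
    customer.last_updated = datetime.now()
    # ...issue the UPDATE here, e.g. WHERE person_poid = ? AND version = ?...
```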

Scaffolding

  • Scaffolding is a technique supported by some model-view-controller frameworks, in which the programmer may write a specification that describes how the application database may be used. The compiler uses this specification to generate code that the application can use to create, read, update and delete database entries, effectively treating the template as a "scaffold" on which to build a more powerful application.
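
A toy illustration of the idea, assuming a hypothetical table/column spec; real scaffolding frameworks generate much more (models, views, controllers), but the principle of deriving CRUD operations from a specification is the same:

```python
# Hypothetical spec the "scaffold" is generated from
spec = {"table": "person", "columns": ["name", "email"]}

def generate_crud(spec: dict) -> dict:
    """Generate basic create/read/update/delete statements from the spec."""
    table, cols = spec["table"], spec["columns"]
    col_list = ", ".join(cols)
    placeholders = ", ".join("?" for _ in cols)
    assignments = ", ".join(f"{c} = ?" for c in cols)
    return {
        "create": f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        "read":   f"SELECT id, {col_list} FROM {table} WHERE id = ?",
        "update": f"UPDATE {table} SET {assignments} WHERE id = ?",
        "delete": f"DELETE FROM {table} WHERE id = ?",
    }

print(generate_crud(spec)["create"])   # INSERT INTO person (name, email) VALUES (?, ?)
```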

Mapping Inheritance structure to database

  • Scenario: Person is the parent class of Customer and Employee. Further, Executive is a child class of Employee. (A DDL sketch of all three mapping strategies appears after this list.)
  • Map entire class hierarchy to a single table
    • The attributes of all of the classes are stored in a single table, Person. Two columns have been added to the table – PersonPOID and PersonType. The first column is the primary key for the table (you know this because of the <<PK>> stereotype), and the second is a code indicating whether the person is a customer, an employee, or perhaps both.
    • The PersonType column is required to identify the type of object that can be instantiated from a given row. For example, the value E would indicate the person is an employee, C would indicate customer, and B would indicate both. Although this approach is straightforward, it tends to break down as the number of types and combinations begins to grow. For example, when you add the concept of executives you need to add a code value, perhaps X, to represent this. Now the value B, representing both, becomes ambiguous. Furthermore, you might have combinations involving executives: it seems reasonable that someone can be both an executive and a customer, so you would need a code for that as well.
    • Advantage:
      • Simple approach.
      • Easy to add new classes: you just need to add new columns for the additional data.
      • Supports polymorphism by simply changing the type of the row.
      • Data access is fast because the data is in one table.
      • Ad-hoc reporting is very easy because all of the data is found in one table.
    • Disadvantage
      • Coupling within the class hierarchy is increased because all classes are directly coupled to the same table. A change in one class can affect the table which can then affect the other classes in the hierarchy.
      • Space potentially wasted in the database.
      • Indicating the type becomes complex when significant overlap between types exists.
      • Table can grow quickly for large hierarchies.
    • When to use
      • This is a good strategy for simple and/or shallow class hierarchies where there is little or no overlap between the types within the hierarchy.
  • Map each concrete class to its own table
    • With this approach a table is created for each concrete class, each table including both the attributes implemented by the class and its inherited attributes. Each table is assigned its own primary key, customerPOID and employeePOID respectively. To support the addition of Executive, all you need to do is add a corresponding table with all of the attributes required by executive objects.
    • Advantage:
      • Easy to do ad-hoc reporting as all the data you need about a single class is stored in only one table. 
      • Good performance to access a single object’s data.
    • Disadvantage
      • When you modify a class you need to modify its table and the table of any of its subclasses. For example, if you were to add height and weight to the Person class, you would need to add columns to the Customer, Employee, and Executive tables.
      • Whenever an object changes its role, perhaps you hire one of your customers, you need to copy the data into the appropriate table and assign it a new POID value (or perhaps you could reuse the existing POID value). 
      • It is difficult to support multiple roles and still maintain data integrity.  For example, where would you store the name of someone who is both a customer and an employee?
    • When to use
      • When changing types and/or overlap between types is rare.
  • Map each class to its own table
    • The data for the Customer class is stored in two tables, Customer and Person; therefore, to retrieve this data you would need to join the two tables (or do two separate reads, one to each table). The application of keys is interesting. Notice how personPOID is used as the primary key for all of the tables. For the Customer, Employee, and Executive tables, personPOID is both a primary key and a foreign key. In the case of Customer, personPOID is its primary key and a foreign key used to maintain the relationship to the Person table. This is indicated by the application of two stereotypes, <<PK>> and <<FK>>.
    • Advantage:
      • Easy to understand because of the one-to-one mapping. 
      • Supports polymorphism very well as you merely have records in the appropriate tables for each type. 
      • Very easy to modify superclasses and add new subclasses as you merely need to modify/add one table.
      • Data size grows in direct proportion to growth in the number of objects.
    • Disadvantage
      • There are many tables in the database, one for every class (plus tables to maintain relationships). 
      • Potentially takes longer to read and write data using this technique because you need to access multiple tables.  This problem can be alleviated if you organize your database intelligently by putting each table within a class hierarchy on different physical disk-drive platters (this assumes that the disk-drive heads all operate independently). 
      • Ad-hoc reporting on your database is difficult, unless you add views to simulate the desired tables.
    • When to use
      • When there is significant overlap between types or when changing types is common.
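
A rough sqlite3 sketch of the three strategies described above. The Person/Customer/Employee/Executive names and the *POID key columns follow the text; the other attributes (Name, Salary, CustomerNumber, BonusPlan) are invented for illustration:

```python
import sqlite3

# Strategy 1: map the entire class hierarchy to a single table
single = sqlite3.connect(":memory:")
single.executescript("""
CREATE TABLE Person (
    PersonPOID     INTEGER PRIMARY KEY,  -- surrogate key
    PersonType     TEXT NOT NULL,        -- e.g. 'C', 'E', 'X', or a combination code
    Name           TEXT,
    CustomerNumber TEXT,                 -- Customer attributes (NULL for non-customers)
    Salary         REAL,                 -- Employee attributes (NULL for non-employees)
    BonusPlan      TEXT                  -- Executive attributes (NULL otherwise)
);
""")

# Strategy 2: map each concrete class to its own table (inherited attributes repeated)
concrete = sqlite3.connect(":memory:")
concrete.executescript("""
CREATE TABLE Customer (
    CustomerPOID   INTEGER PRIMARY KEY,
    Name           TEXT,                 -- inherited from Person
    CustomerNumber TEXT
);
CREATE TABLE Employee (
    EmployeePOID INTEGER PRIMARY KEY,
    Name         TEXT,                   -- inherited from Person
    Salary       REAL
);
CREATE TABLE Executive (
    ExecutivePOID INTEGER PRIMARY KEY,
    Name          TEXT,                  -- inherited via Employee
    Salary        REAL,
    BonusPlan     TEXT
);
""")

# Strategy 3: map each class to its own table; every table keys off personPOID,
# which is both the primary key and a foreign key in the subclass tables
per_class = sqlite3.connect(":memory:")
per_class.executescript("""
CREATE TABLE Person (
    PersonPOID INTEGER PRIMARY KEY,
    Name       TEXT
);
CREATE TABLE Customer (
    PersonPOID     INTEGER PRIMARY KEY REFERENCES Person(PersonPOID),
    CustomerNumber TEXT
);
CREATE TABLE Employee (
    PersonPOID INTEGER PRIMARY KEY REFERENCES Person(PersonPOID),
    Salary     REAL
);
CREATE TABLE Executive (
    PersonPOID INTEGER PRIMARY KEY REFERENCES Employee(PersonPOID),
    BonusPlan  TEXT
);
""")
```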

Relations in RDBMS

  • Relationships in relational databases are maintained through the use of foreign keys. A foreign key is one or more data attributes that appear in one table and are part of, or coincide with, the key of another table.
    • One-to-one relationship: With a one-to-one relationship the foreign key needs to be implemented in one of the two tables. The Position table includes EmployeePOID, a foreign key to the Employee table, to implement the association; a PositionPOID column in Employee would have worked just as well.
    • One-to-many relationship: To implement a one-to-many relationship you add a foreign key referencing the "one" table (Division) to the "many" table (Employee). For example, Employee includes a DivisionPOID column to implement the "works in" relationship to Division.
    • Many-to-many relationship: The basic "trick" is that the many-to-many relationship is converted into two one-to-many relationships, both of which involve an associative table. EmployeeTask includes the combination of the primary keys of the tables that it associates. With this approach you could have fifty people assigned to the same task, or twenty tasks assigned to the same person, and it wouldn't matter.
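
A sqlite3 sketch of the three relationship kinds; the table and key names follow the text, the remaining columns are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Division (
    DivisionPOID INTEGER PRIMARY KEY,
    Name         TEXT
);

CREATE TABLE Employee (
    EmployeePOID INTEGER PRIMARY KEY,
    Name         TEXT,
    -- one-to-many: the "many" table holds a foreign key to the "one" table
    DivisionPOID INTEGER REFERENCES Division(DivisionPOID)
);

-- one-to-one: the foreign key lives in one of the two tables (here, Position)
CREATE TABLE Position (
    PositionPOID INTEGER PRIMARY KEY,
    Title        TEXT,
    EmployeePOID INTEGER UNIQUE REFERENCES Employee(EmployeePOID)
);

CREATE TABLE Task (
    TaskPOID    INTEGER PRIMARY KEY,
    Description TEXT
);

-- many-to-many: an associative table turns it into two one-to-many relationships
CREATE TABLE EmployeeTask (
    EmployeePOID INTEGER REFERENCES Employee(EmployeePOID),
    TaskPOID     INTEGER REFERENCES Task(TaskPOID),
    PRIMARY KEY (EmployeePOID, TaskPOID)
);
""")
```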

Normalization

  • The goal of data normalization is to reduce and even eliminate data redundancy, an important consideration for application developers because it is incredibly difficult to store objects in a relational database that maintains the same information in several places. The advantage of having a highly normalized data schema is that information is stored in one place and one place only, reducing the possibility of inconsistent data. Furthermore, highly normalized data schemas are generally closer conceptually to object-oriented schemas because the object-oriented goals of promoting high cohesion and loose coupling between classes result in similar solutions (at least from a data point of view). This generally makes it easier to map your objects to your data schema.
  • First Normal Form (1NF)
    • An entity type is in first normal form (1NF) when it contains no repeating groups of data.
    • For example, in Figure 1 you see that there are several repeating attributes in the Order0NF table – the ordered item information repeats nine times and the contact information is repeated twice, once for shipping information and once for billing information. Although this initial version of orders could work, what happens when an order has more than nine order items? Do you create additional order records for them? What about the vast majority of orders that only have one or two items? Do we really want to waste all that storage space in the database for the empty fields? Likely not. Furthermore, do you want to write the code required to process the nine copies of item information, even if it is only to marshal it back and forth between the appropriate number of objects? Once again, likely not.
  • Second Normal Form (2NF)
    • An entity type is in second normal form (2NF) when it is in 1NF and when every non-key attribute, any attribute that is not part of the primary key, is fully dependent on the primary key.
    • The problem with OrderItem1NF is that item information, such as the name and price of an item, does not depend upon an order for that item. For example, if Hal Jordan orders three widgets and Oliver Queen orders five widgets, the facts that the item is called a "widget" and that the unit price is $19.95 remain constant. This information depends on the concept of an item, not on the concept of an order for an item, and therefore should not be stored in the order items table – hence the Item2NF table was introduced. OrderItem2NF retains the TotalPriceExtended column, a calculated value that is the number of items ordered multiplied by the price of the item. The value of the SubtotalBeforeTax column within the Order2NF table is the total of the values of the total price extended for each of its order items.
  • Third Normal Form (3NF)
    • An entity type is in third normal form (3NF) when it is in 2NF and when all of its attributes are directly dependent on the primary key – that is, no non-key attribute depends on another non-key attribute (no transitive dependencies).
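
A sketch of the normalized result using sqlite3. The Order/OrderItem/Item tables and the TotalPriceExtended/SubtotalBeforeTax columns come from the text; the other columns are assumed for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- 1NF: no repeating groups -- order items get their own rows instead of
-- nine Item columns on the order record.
-- 2NF: item facts (name, unit price) depend on the item, not on an order line,
-- so they move to their own Item table.
CREATE TABLE Item (
    ItemPOID  INTEGER PRIMARY KEY,
    Name      TEXT,
    UnitPrice REAL
);

CREATE TABLE "Order" (
    OrderPOID         INTEGER PRIMARY KEY,
    SubtotalBeforeTax REAL           -- sum of TotalPriceExtended over the order's items
);

CREATE TABLE OrderItem (
    OrderPOID          INTEGER REFERENCES "Order"(OrderPOID),
    ItemPOID           INTEGER REFERENCES Item(ItemPOID),
    NumberOrdered      INTEGER,
    TotalPriceExtended REAL,         -- NumberOrdered * Item.UnitPrice (calculated)
    PRIMARY KEY (OrderPOID, ItemPOID)
);
""")
```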

Cardinality

  • Thinking mathematically, cardinality is the number of elements in a set. In the database world, cardinality describes the multiplicity of a relationship: one-to-one, one-to-many, or many-to-many.

Referential integrity

  • Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity states that you may not add a record to the table containing the foreign key unless there is a corresponding record in the referenced table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the referenced table are propagated to the tables that reference it.
  • Consider the situation where we have two tables: Employees and Managers. The Employees table has a foreign key attribute entitled ManagedBy which points to the record for that employee’s manager in the Managers table. Referential integrity enforces the following three rules:
    • We may not add a record to the Employees table unless the ManagedBy attribute points to a valid record in the Managers table.
    • If the primary key for a record in the Managers table changes, all corresponding records in the Employees table must be modified using a cascading update.
    • If a record in the Managers table is deleted, all corresponding records in the Employees table must be deleted using a cascading delete.
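
A sqlite3 sketch of the Employees/Managers example with cascading rules; ManagedBy follows the text, while the ManagerID/EmployeeID key names and sample data are assumed:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite only enforces FKs when this is on
con.executescript("""
CREATE TABLE Managers (
    ManagerID INTEGER PRIMARY KEY,
    Name      TEXT
);

CREATE TABLE Employees (
    EmployeeID INTEGER PRIMARY KEY,
    Name       TEXT,
    ManagedBy  INTEGER NOT NULL
               REFERENCES Managers(ManagerID)
               ON UPDATE CASCADE          -- rule 2: key changes propagate
               ON DELETE CASCADE          -- rule 3: deleting a manager deletes their employees
);
""")

con.execute("INSERT INTO Managers VALUES (1, 'Pat')")
con.execute("INSERT INTO Employees VALUES (10, 'Sam', 1)")

# Rule 1: this insert fails because manager 99 does not exist
try:
    con.execute("INSERT INTO Employees VALUES (11, 'Lee', 99)")
except sqlite3.IntegrityError as e:
    print("Rejected:", e)

con.execute("DELETE FROM Managers WHERE ManagerID = 1")
print(con.execute("SELECT COUNT(*) FROM Employees").fetchone())  # (0,) -- cascaded delete
```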

Database strategy for multi-tenant applications

  • Separate Database:
    • Computing resources and application code are generally shared between all the tenants on a server, but each tenant has its own set of data that remains logically isolated from data that belongs to all other tenants. Metadata associates each database with the correct tenant, and database security prevents any tenant from accidentally or maliciously accessing other tenants' data.
    • Giving each tenant its own database makes it easy to extend the application's data model to meet tenants' individual needs, and restoring a tenant's data from backups in the event of a failure is a relatively simple procedure. Unfortunately, this approach tends to lead to higher costs for maintaining equipment and backing up tenant data. Hardware costs are also higher than they are under alternative approaches, as the number of tenants that can be housed on a given database server is limited by the number of databases that the server can support. (Using autoclose to unload databases from memory when there are no active connections can make an application more scalable by increasing the number of databases each server can support.)
  • Shared Database, separate schema
    • Another approach involves housing multiple tenants in the same database, with each tenant having its own set of tables that are grouped into a schema created specifically for the tenant. When a customer first subscribes to the service, the provisioning subsystem creates a discrete set of tables for the tenant and associates them with the tenant's own schema. You can use the SQL CREATE SCHEMA command to create a schema and authorize a user account to access it.
    • Like the isolated approach, the separate-schema approach is relatively easy to implement, and tenants can extend the data model as easily as with the separate-database approach. A significant drawback of the separate-schema approach is that tenant data is harder to restore in the event of a failure. If each tenant has its own database, restoring a single tenant's data means simply restoring the database from the most recent backup. With a separate-schema application, restoring the entire database would mean overwriting the data of every tenant on the same database with backup data, regardless of whether each one has experienced any loss or not. Therefore, to restore a single customer's data, the database administrator may have to restore the database to a temporary server, and then import the customer's tables into the production server—a complicated and potentially time-consuming task.
  • Shared Database, shared schema
    • A third approach involves using the same database and the same set of tables to host multiple tenants' data. A given table can include records from multiple tenants stored in any order; a Tenant ID column associates every record with the appropriate tenant.
    • Of the three approaches explained here, the shared schema approach has the lowest hardware and backup costs, because it allows you to serve the largest number of tenants per database server. However, because multiple tenants share the same database tables, this approach may incur additional development effort in the area of security, to ensure that tenants can never access other tenants' data, even in the event of unexpected bugs or attacks.
    • The procedure for restoring data for a tenant is similar to that for the separate-schema approach, with the additional complication that individual rows in the production database must be deleted and then reinserted from the temporary database. If there are a very large number of rows in the affected tables, this can cause performance to suffer noticeably for all the tenants that the database serves.
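
A minimal shared-schema sketch showing the Tenant ID column and tenant-scoped queries; the Invoices table and its columns are hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Shared database, shared schema: one table holds every tenant's rows
CREATE TABLE Invoices (
    TenantID  INTEGER NOT NULL,   -- associates each record with the right tenant
    InvoiceID INTEGER NOT NULL,
    Amount    REAL,
    PRIMARY KEY (TenantID, InvoiceID)
);
""")
con.executemany("INSERT INTO Invoices VALUES (?, ?, ?)",
                [(1, 100, 25.0), (1, 101, 40.0), (2, 100, 99.0)])

def invoices_for_tenant(tenant_id: int):
    # Every query must be scoped by TenantID so tenants never see each other's data
    return con.execute(
        "SELECT InvoiceID, Amount FROM Invoices WHERE TenantID = ?",
        (tenant_id,)).fetchall()

print(invoices_for_tenant(1))   # [(100, 25.0), (101, 40.0)]
```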

Round Robin Database (RRD)

  • RRDTool stands for Round Robin Database tool. Round robin is a technique that works with a fixed amount of data and a pointer to the current element. Think of a circle with some dots plotted on the edge; these dots are the places where data can be stored. Draw an arrow from the center of the circle to one of the dots; this is the pointer. When the current data is read or written, the pointer moves to the next element. As we are on a circle there is neither a beginning nor an end, so you can go on and on. After a while, all the available places will be used and the process automatically reuses old locations. This way, the database will not grow in size and therefore requires no maintenance. RRDTool works with Round Robin Databases (RRDs); it stores data in and retrieves data from them.
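
A simplified sketch of the round-robin idea (not RRDTool's actual on-disk format): a fixed number of slots plus a moving pointer, with old values overwritten once the circle wraps around:

```python
class RoundRobinStore:
    """Fixed-size circular store: once full, new samples overwrite the oldest ones,
    so the store never grows in size."""

    def __init__(self, slots: int):
        self.data = [None] * slots   # the fixed amount of storage (dots on the circle)
        self.pointer = 0             # arrow pointing at the current element

    def write(self, value):
        self.data[self.pointer] = value
        self.pointer = (self.pointer + 1) % len(self.data)  # wrap around: no beginning, no end

    def read_all(self):
        # Return values oldest-to-newest, starting just past the pointer
        return self.data[self.pointer:] + self.data[:self.pointer]

store = RoundRobinStore(slots=4)
for sample in [10, 20, 30, 40, 50]:
    store.write(sample)
print(store.read_all())   # [20, 30, 40, 50] -- the first sample was overwritten
```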