Skip to content

Instantly share code, notes, and snippets.

@prashant2796
Last active October 28, 2019 21:51
Show Gist options
  • Save prashant2796/ec1982d236e04b4500e135a8368d1a40 to your computer and use it in GitHub Desktop.
Save prashant2796/ec1982d236e04b4500e135a8368d1a40 to your computer and use it in GitHub Desktop.
Database Schema Description
Database is consider as backbone of any computer application or software. A relational database organizes data in tables (or relations) The Structural view of relations and attributes(columns) and their relationships with other relations helps to understand dataflow between relations.
A well-designed database shall eliminate Data Redundancy: the same piece of data shall not be stored in more than one place. This is because duplicate data not only waste storage spaces but also easily lead to inconsistencies.Also well-designed database ensures Data Integrity and Accuracy.
In order to attain data integrity and consistency, there are some requirements and design process that are needed to be followed while designing a database schema. The fact is most people are concerned with "defining attributes for their relation" rather than finding how this attributes relate with other atrributes in the relation.Avoiding such requirements and process can deeply affect the database design.
The requirements listed below influence the detailed structure of schemas that are produced.
The most crucial aspect in designing a relational database is to identify the relationships among tables.The power of relational database lies in the relationship that can be defined between tables.The types of relationship include:
a)One-to-One Relationship
b)One-to-many or Mant-to-One relationship
c)Many-to-Many Relationship
a)One-to-One Relationship: In a one-to-one relationship, one record in a table is associated with one and only one record in another table.For example, in a school database having student and student-info table, each student is associated with one student ID in student-info table, and each student ID is assigned to only one student.
b)One-to-Many Relationship: In a one-to-many relationship, one record in a table can be associated with one or more records in another table. For Example, considering customer and order table, a customer can have many orders while a particular order will belong to one particular customer.
c)Many-to-Many Relationship: Many-to-Many relationship is a bit complex relationship in which many records in a table can link to many records in another table.For example, consider a customer and product table, a customer can buy many products and a product can be buyed by many customers.In order to define Many-to-Many relationship we would require a third table called as junction table.This third table will have primary key from customer table and product table to show association between the two table.This junction table will uniquely identify which products belongs to which customer since it will have product id from product table and customer id from customer table.
Establishing consistent relationships between database tables helps ensure data integrity, contributing to database normalization.
There are some anomalies that every database designer should be aware of which is called Database Anomalies.
In relational database design, we not only want to create a structure that stores all of the data, but we also want to do it in a way that minimize potential errors when we work with the data. In particular, SQL can be used to manipulate data in the following ways: insert new data, delete unwanted data, and update existing data. Similarly, in an un-normalized design, there are 3 problems that can occur when we work with the data:
a)Insert Anomaly
b)Delete Anomaly
c)Update Anomaly
Insert Anomaly:An Insert Anomaly occurs when certain attributes cannot be inserted into the database without the presence of other attributes.
For example consider StudentCourse table having studentId, courseId, studentName, address, course i.e StudentCourse(studentId,courseId,studentName,address,course). Now here, we can't add a new course unless we have at least one student enrolled on the course. This will be problamatic while inserting data into the table.
Delete Anomaly: A Delete Anomaly exists when certain attributes are lost because of the deletion of other attributes.
For example, considering the previous table, what happens if Student with studentid lets say S05 is the last student to leave the course - All information about the course is lost, i.e if we remove a record of student from the table, then the course associated with that student will also be deleted and that means we no longer have information about that course.
Update Anomaly: An Update Anomaly exists when one or more instances of duplicated data is updated, but not all.
For example, consider changing address of a particular student in the previous example - you will need to update all instances of that particular student's address or else it will lead to data inconsistency.If there are many instances of that particular student then updating every instance can be a tedious job.
To address the 3 problems above, we go through the process of normalization. When we go through the normalization process, we increase the number of tables in the database, while decreasing the amount of data stored in each table.
The next important topic to study before designing any database schema is NORMAlIZATION.
If you've been working with databases for a while, chances are you've heard the term normalization. Perhaps someone's asked you "Is that database normalized?" or "Is that in 3NF?" Normalization is often brushed aside as a luxury that only academics have time for. However, knowing the principles of normalization and applying them to your daily database design tasks really isn't all that difficult and it could drastically improve the performance of your database.
Now, What is Normalization? Normalization is the process of efficiently organizing data in a database.This process takes a relation schema through a series of test to certify whether it satisfies a certain normal forms. There are two main goals of the normalization process: eliminating redundant data(storing the same data in more than one table) and ensuring data dependencies(only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
Normal Forms:There is series of guidelines for ensuring that databases are normalized. These are referred to as normal forms.
FIRST NORMAL FORM (1NF): A database is in first normal form if it satisfies the following conditions:
a)Contains only atomic values.
b)There are no repeating groups.
An atomic value is a value that cannot be divided.
How do we bring an unnormalized table into first normal form? Consider the following example:
Consider a product table with attributes productID, Color, and Price.Product(ProductId,color,price).
Now consider its values productID=1, color=blue,Green , price=20 .Product((1,'blue''green',20),(2,'red',15))
As you can see [color] column has two values blue and green.This table is not in first normal form because the [Color] column can contain multiple values. The values in the [Color] column in the first row can be divided into "blue" and "green".
To bring this table to first normal form, we split the table into two tables and now we have the resulting tables:
product_price(productId,Price) and product_color(productId,color).Now this tables can have attributes like product_price((1,20),(2,15)) and product_color((1,'blue'),(2,'red'),(1,'green')).
Now first normal form is satisfied, as the columns on each table all hold just one value and we can still uniquely identify product and its color.
The resion we cant have multiple values for a particular column is because it can affect database performance while querying the database.From the above example of unnormalized table, as you can see [colour] column contains multiple value. Now, If I need data of all the productId whose color is green, I will have to make multiple comparison for a particular record(row). Since a particular row has 2 or more values i will have to right a logic to compare it with each value and give me back the desired result which is a very tedious process.
Another important condition that should be met if a table is to be in first normal form is that it should not have any repeating columns representing the same information.
For example, Consider a Unnormalized FlimAgent table with attributes AgentID,AgentName,AgencyName,OfficeLoction,customer1,customer2,customer3.
As you can see we have group of columns i.e 3 columns customer1, customer2, customer3 representing the same information
Now, Problem here is that everytime i want to add a new customer i have to create a new column for it and this also leads to data redundancy.
SECOND NORMAL FORM(2NF): A database is in second normal form if it satisfies the following conditions:
a)It is in first normal form.
b)All non-key columns are fully functional dependent on the primary key.
Functional dependency is a relationship that exists when one column uniquely determines another column.
If the primary key is a composite key(combination of 2 keys), all non-key column cannot be dependent on a subset of the primary key i.e all non-key column cannot dependent on 1 part of primary key.
Consider the following example: purchase_detail(CustomerID,StoreID,PurchaseLocation).
This table has a composite primary key [CustomerID, StoreID]. The non-key attribute is [PurchaseLocation]. In this case, [Purchase Location] only depends on [StoreID], which is only 1 part of the primary key and does'nt have to do anything with 2nd part of the primary which is [CustomerID]. Therefore, this table does not satisfy second normal form.
To bring this table to second normal form, we break the table into two tables, and now we have the following:
Purchase(CustomerID,StoreID) and Store(StoreID,PurchaseLocation).
What we have done is to remove the partial functional dependency that we initially had. Now, in the table [STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [StoreID].
Note that if the primary key is not a composite key, all non-key attributes are always fully dependent on the primary key. A table that is in 1st normal form and contains only a single key as the primary key is automatically in 2nd normal form.
THIRD NORMAL FORM(3NF): A database is in third normal form if it satisfies the following conditions:
a)It is in second normal form
b)There is no transitive functional dependency
By transitive functional dependency, I mean we have the following relationships in the table: A is functionally dependent on B, and B is functionally dependent on C. In this case, C is transitively dependent on A via B i.e if A->B and B->C then A->C.
Consider the following example: BookDetail(BookID,GenreID,GenreType,Price)
In the table, [BookID] determines [GenreID], and [GenreID] determines [GenreType]. Therefore, [BookID] determines [GenreType] via [GenreID] and we have transitive functional dependency, and this structure does not satisfy third normal form.
In an another example, we have faculty table with faculty(FacultyID,Name,Dept,DeptHead) we can see values of column DeptHead is dependent on Dept because department head will be specific for a department.This way a non prime column is dependent on non prime column.
To bring BookDetail table to third normal form, we split the table into two as follows:Book(BookID,GenreID,Price) and Genre(GenreID,GenreType)
Now all non-key column are fully functional dependent only on the primary key. In [BOOK] table, both [GenreID] and [Price] are only dependent on [BookID]. In [GENRE] table, [GenreType] is only dependent on [GenreID].
Not using normalization was my biggest mistake when I first attemped to build database schema for my application.Later On, I realized that center of all problems which I faced were because my tables where not normalized.Normalizing tables may be tough task at first, but later on as you progress, you tend to understand it well.
Generalization, Specialization and Inheritance.
Now, In database schema design, there is an important concept which every database designer should know called INHERITENCE.
If you are familiar with inheritance you know that inheritance enables child class(sub class) to inherit properties from parent class(super day). Same principle is used in database design.Inheritance enables you to share attributes between tables such that a sub table inherits attributes from its parent table.Basically you should know concepts like SPECIALIZATION, GENERALIZATION and attribute inheritance.
The one main drawback of Entity-Relationship(ER) diagram is it does'nt support inheritance.To use Inheritance property, We have to use Enhanced Entity Relationship(EER) diagram, and the two main concept involved in it are generalization and specialization.
GENERALIZATION: Generalization is a process of extracting common characteristics from two or more tables and combining them into a generalized table.So, it is a bottom up approach as two or more lower lever entities are combined to form a higher level entity.
Consider an example database having 3 tables: CAR, TRUCK, MOTORCYCLE. The 3 tables might have some common attributes between them like vehicleID, vehicleName, price etc. Now since this tables have some common attributes between them, we put the shared attributes in a supertype in another table called generalized table.
SPECIALIZATION: Specialization is opposite to Generalization. It is a top-down approach in which one higher level table can be broken down into one or more lower level tables.Its like if you a have a table and that table has some sub-type then that table can be broken down into sub-tables.This sub types(child) can inherit attributes from the main class(parent) and along with that it will have some attribute that belongs to that particular table only.
Take ‘Person’ for example. A person has name, date of birth, gender, etc. These properties are common in all persons, human beings. But in a company, persons can be identified as employee, employer, customer, or vendor, based on what role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on what role they play in school as entities.
So for developing a database schema for any application all the above topics and concepts must be considered and understood well to make the database efficient, error free ,consistant and complete.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment