Data Independence in SQL and DBMSs

An important advantage of database management systems is Data Independence.

Data Independence refers to the insulation from applications programs from changes in the way the data is structured and stored.

Levels of Abstraction in a DBMS

graph TD F(fa:fa-table View 1)---E G(fa:fa-table View 2)---E H(fa:fa-table View 3)---E["fa:fa-database Conceptual Schema"] E---A["fa:fa-server Physical Schema"] A---B["fa:fa-HDD-o Disk"]

Logical Independence

If the structure of the data changes, applications and users can be insulated from this changes thanks to external schemas (in SQL, views). If the conceptual schema changes, views (external schemas) can be changed to insulate users and apps from this change.

Example

Let’s suppose we have an application that we have an SQL table called persons:

datacomy@localhost:tempdb> select * from persons;                                             
+------+--------------+-------------+--------------------+----------+
| id   | first_name   | last_name   | email              | phone    |
|------+--------------+-------------+--------------------+----------|
| 1    | Mike         | Smith       | [email protected]   | 11123456 |
| 2    | Johana       | Lopez       | [email protected] | 22123456 |
+------+--------------+-------------+--------------------+----------+

and a view named persons_view:

datacomy@localhost:tempdb> select * from persons_view;                                        
+--------------+-------------+--------------------+
| first_name   | last_name   | email              |
|--------------+-------------+--------------------|
| Mike         | Smith       | [email protected]   |
| Johana       | Lopez       | [email protected] |
+--------------+-------------+--------------------+

And our application queries a persons_view returning a list of tuples, each one containing first names, last_names, and email addresses:

Python 3.7.6 (default, Jan  8 2020, 19:59:22) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.

   ...:  
   ...: import psycopg2 
   ...:  
   ...:  
   ...: class Database: 
   ...:     """A DB class """ 
   ...:  
   ...:     def __init__(self): 
   ...:         self.host = 'localhost' 
   ...:         self.port = 5432 
   ...:         self.database = "tempdb" 
   ...:         self.user = "datacomy" 
   ...:         self.password = "PASSWORD" 
   ...:         self.conn = None 
   ...:  
   ...:     def connect(self): 
   ...:         if self.conn is None: 
   ...:             try: 
   ...:                 self.conn = psycopg2.connect( 
   ...:                     dbname=self.database, 
   ...:                     user=self.user, 
   ...:                     password=self.password, 
   ...:                     host=self.host, 
   ...:                     port=self.port 
   ...:                     ) 
   ...:             except psycopg2.DatabaseError as e: 
   ...:                 print(e) 
   ...:             finally: 
   ...:                 print("Connection opened") 
   ...:  
   ...:     def select_records(self, query): 
   ...:         self.connect() 
   ...:         with self.conn.cursor() as cur: 
   ...:             cur.execute(query) 
   ...:             records = [row for row in cur.fetchall()] 
   ...:             cur.close() 
   ...:             return records 
   ...:  
   ...:  
   ...: my_db = Database() 
   ...: people = Database.select_records(my_db, 'select fist_name, last_name, email from persons_view') 
   ...: print(people) 
   ...:                                                                                       
Connection opened
[('Mike', 'Smith', '[email protected]'), ('Johana', 'Lopez', '[email protected]')]

But later, the table persons is modified. It is spitted into two tables, one containing only ids and names, and the other containing email addresses and phone numbers.

select id,email,phone into contact_info from persons;
alter table contact_info add constraint my_foreign_k foreign key (id) references persons(id);
alter table persons drop column email, drop column phone;

Now, if our application tries to run the previous Python script, it will recieve an error:

UndefinedColumn: column "email" does not exist
LINE 1: select first_name, last_name, email from persons_view

Because persons_view has not been modified to take into account that the email_address field has been moved into another table.

To isolate our application from this change in the conceptual schema, we can modify our SQL view:

datacomy@localhost:temp> create or replace view persons_view as  
 select p.first_name, p.last_name, c.email  
 from persons p  
 inner join contact_info c ON c.id = p.id;                                                    

Now, our Python script runs without issues and returns the same values. As we can see, an SQL view (external schema) can isolate the application from changes in the conceptual database schema.

Physical Data Independence

The conceptual schema insulates users from changes in the physical storage details. Storage details can change, but the application will keep working without any change, aside from performance issues.

Due to Physical Data Independence, any of the below listed changes will not affect the conceptual schema:

  • If the database storage is moved from a HDD to a SDD.
  • If the database location is moved to a new disk.
  • If the file system is changed from ext4 to xfs.

Summary

Different levels of abstraction in database management systems help to achieve two layers of data independence:

  • Logical Data Independence
  • Physical Data Independence

Data Independence help database administrators to:

  • Improve the quality of the data (logical schema) without breaking applications.
  • Make changes in the physical schema to improve performance, without breaking the conceptual schema.