An important advantage of database management systems is Data Independence.
Data Independence refers to the insulation from applications programs from changes in the way the data is structured and stored.
Levels of Abstraction in a DBMS
graph TD F(fa:fa-table View 1)---E G(fa:fa-table View 2)---E H(fa:fa-table View 3)---E["fa:fa-database Conceptual Schema"] E---A["fa:fa-server Physical Schema"] A---B["fa:fa-HDD-o Disk"] Logical Independence If the structure of the data changes, applications and users can be insulated from this changes thanks to external schemas (in SQL, views).
PostgreSQL provides the utility program pg_dump to generate a text file with SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump.
pg_dump dbname > outfile
pg_dump can be used from the Linux console.
For example, if you want to backup a database called ‘my_database’ you run, from the console:
pg_dump my_database > my_database.sql
A data model is a collection of descriptions of the data and it’s relationships that hides details of the storage implementation. A database management system such as PostgreSQL or MS SQL allows users to define data models. The relational data model is a particular type of data model.
A more abstract data model is the semantic data model. It is usually used to develop an initial description of the data in a real world scenario, like a business or any other real world organization.
In the relational data model, a relation (table) can be thought as a set of records, like a table that contains information about employees in a company.
A record (row or tuple) represents a simgle item in a relation. Every row in the same relation has the same structure. For example, if, in the employees table, we can have a row that describes the employe named ‘Mike Smith’.
A field (column) represents a characteristic or variable of an entity described.
In Jupyter Lab, the %sql magic command persist creates a new table in the database to which we are connected. The table name will be the name as the name of the Python variable.
import pandas as pd df = pd.read_csv('/tmp/data.csv') df column1 column2 0 1 a 1 2 b 2 3 b # set environment variable $DATABASE_URL %sql postgresql://datacomy:PASSWORD@localhost/datacomy 'Connected: datacomy@datacomy' %sql persist df * postgresql://datacomy:***@localhost/datacomy 'Persisted df' %%sql SELECT tablename,tableowner FROM pg_catalog.