# Correation in SQL

The correlation is a statistical measure that shows how closely related are two sets of values.

The most used correlation measure used is the Pearson’s Correlation Coefficient. The Pearson’s Correlation Coefficient gives:

• a value of 1 for a perfect correation.
• a value of 0 for no correlation.
• a value of -1 for a perfect negative correlation

Examples of Positive and Negative Correlation:

• Positive Correlation: marketing expenditure and sales, price of a competitor product and sales.
• Negative Correlation: price of a product and sales of that product, price of a complementary product and sales.

## Correlation vs. Causation

A positive correlation between two variables doesn’t nececssarily means that there is a causal relationship between those variables. For instance, a strong positive or negative correlation between two variables could be caused by randomness.

Additionally, while there might be a causal relationship between two variables and a strong correlation between them, the correlation coefficient doesn’t tell us anything in relation to the directionality of the causal relationship.

## Correlation in SQL

Calculating the correlation coefficient in SQL is pretty straightforward. The corr(Y, X) function returns the correlation coefficient between a set of variables.

### Support

The corr function is supported by:

• PostgreSQL
• Oracle

It is not directly supported by:

• MySQL
• MS-SQL
• Sqlite
import getpass  # for password input

# magic command to load the ipython-sql extension. We can connect to any
# database which is supported by SQLAlchemy.

# create a connection
postgresql_pass = getpass.getpass()
%sql postgresql://fede:{postgresql_pass}@localhost/datacomy
print(r'DON\'T FORGET TO CLOSE THE CONNECTION')

 ········

DON\'T FORGET TO CLOSE THE CONNECTION

%%sql
SELECT * FROM  "auto"
limit 5

 * postgresql://fede:***@localhost/datacomy
5 rows affected.

id Unnamed: 0 mpg cylinders displacement horsepower weight acceleration year origin name
0 0 18.0 8 307.0 130.0 3504.0 12.0 70 1 chevrolet chevelle malibu
1 1 15.0 8 350.0 165.0 3693.0 11.5 70 1 buick skylark 320
2 2 18.0 8 318.0 150.0 3436.0 11.0 70 1 plymouth satellite
3 3 16.0 8 304.0 150.0 3433.0 12.0 70 1 amc rebel sst
4 4 17.0 8 302.0 140.0 3449.0 10.5 70 1 ford torino

## Examples on how to use the corr function

%%sql
SELECT CORR (DISPLACEMENT , MPG )
from AUTO A

 * postgresql://fede:***@localhost/datacomy
1 rows affected.

corr
-0.8042028248059
%%sql
SELECT CORR (DISPLACEMENT , WEIGHT )
from AUTO A

 * postgresql://fede:***@localhost/datacomy
1 rows affected.

corr
0.932824146841634
%%sql
SELECT CORR (MPG , WEIGHT )
from AUTO A

 * postgresql://fede:***@localhost/datacomy
1 rows affected.

corr
-0.831740933244338
%%sql
SELECT CORR (MPG , ACCELERATION )
from AUTO A

 * postgresql://fede:***@localhost/datacomy
1 rows affected.

corr
0.420288912101655

## More about the correlation SQL function

The correlation function is an aggregate function, it computes a single result from a set of input values.