# Correation in SQL

The correlation is a statistical measure that shows how closely related are two sets of values.

The most used correlation measure used is the Pearson’s Correlation Coefficient. The Pearson’s Correlation Coefficient gives:

- a value of 1 for a perfect correation.
- a value of 0 for no correlation.
- a value of -1 for a perfect negative correlation

Examples of Positive and Negative Correlation:

**Positive Correlation**: marketing expenditure and sales, price of a competitor product and sales.**Negative Correlation**: price of a product and sales of that product, price of a complementary product and sales.

## Correlation vs. Causation

A positive correlation between two variables doesn’t nececssarily means that there is a **causal** relationship between those variables. For instance, a strong positive or negative correlation between two variables could be caused by randomness.

Additionally, while there might be a causal relationship between two variables and a strong correlation between them, the correlation coefficient doesn’t tell us anything in relation to the directionality of the causal relationship.

## Correlation in SQL

Calculating the correlation coefficient in SQL is pretty straightforward. The `corr(Y, X)`

function returns the correlation coefficient between a set of variables.

### Support

The `corr`

function is supported by:

- PostgreSQL
- Oracle

It is not directly supported by:

- MySQL
- MS-SQL
- Sqlite

```
import getpass # for password input
```

```
# magic command to load the ipython-sql extension. We can connect to any
# database which is supported by SQLAlchemy.
%load_ext sql
```

```
# create a connection
postgresql_pass = getpass.getpass()
%sql postgresql://fede:{postgresql_pass}@localhost/datacomy
print(r'DON\'T FORGET TO CLOSE THE CONNECTION')
```

```
········
DON\'T FORGET TO CLOSE THE CONNECTION
```

```
%%sql
SELECT * FROM "auto"
limit 5
```

```
* postgresql://fede:***@localhost/datacomy
5 rows affected.
```

id | Unnamed: 0 | mpg | cylinders | displacement | horsepower | weight | acceleration | year | origin | name |
---|---|---|---|---|---|---|---|---|---|---|

0 | 0 | 18.0 | 8 | 307.0 | 130.0 | 3504.0 | 12.0 | 70 | 1 | chevrolet chevelle malibu |

1 | 1 | 15.0 | 8 | 350.0 | 165.0 | 3693.0 | 11.5 | 70 | 1 | buick skylark 320 |

2 | 2 | 18.0 | 8 | 318.0 | 150.0 | 3436.0 | 11.0 | 70 | 1 | plymouth satellite |

3 | 3 | 16.0 | 8 | 304.0 | 150.0 | 3433.0 | 12.0 | 70 | 1 | amc rebel sst |

4 | 4 | 17.0 | 8 | 302.0 | 140.0 | 3449.0 | 10.5 | 70 | 1 | ford torino |

## Examples on how to use the corr function

```
%%sql
SELECT CORR (DISPLACEMENT , MPG )
from AUTO A
```

```
* postgresql://fede:***@localhost/datacomy
1 rows affected.
```

corr |
---|

-0.8042028248059 |

```
%%sql
SELECT CORR (DISPLACEMENT , WEIGHT )
from AUTO A
```

```
* postgresql://fede:***@localhost/datacomy
1 rows affected.
```

corr |
---|

0.932824146841634 |

```
%%sql
SELECT CORR (MPG , WEIGHT )
from AUTO A
```

```
* postgresql://fede:***@localhost/datacomy
1 rows affected.
```

corr |
---|

-0.831740933244338 |

```
%%sql
SELECT CORR (MPG , ACCELERATION )
from AUTO A
```

```
* postgresql://fede:***@localhost/datacomy
1 rows affected.
```

corr |
---|

0.420288912101655 |

## More about the correlation SQL function

The correlation function is an *aggregate function*, it computes a single result from a set of input values.