Correlation Coefficient

The (Pearson) correlation coefficient measures the strength and direction of the linear relationship between two variables. It is standardized, so it has no units, and it ranges from -1 to 1.

Formula:

r = (n Σxy - Σx Σy) / sqrt((n Σx² - (Σx)²)(n Σy² - (Σy)²))

where:

  • r is the correlation coefficient
  • n is the number of paired observations (x, y)
  • Σx and Σy are the sums of the x values and of the y values
  • Σxy is the sum of the products x·y over all pairs
  • Σx² and Σy² are the sums of the squared x values and of the squared y values
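As a rough illustration of how the coefficient is computed by hand, the sketch below implements the standard raw-sums form of Pearson's r, r = (n Σxy - Σx Σy) / sqrt((n Σx² - (Σx)²)(n Σy² - (Σy)²)), using only the Python standard library. The function name `pearson_r_from_sums` and the sample data are assumptions made for this example.

```python
import math

def pearson_r_from_sums(x, y):
    """Pearson's r from the raw-sums formula (illustrative helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of observations")
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))   # sum of pairwise products
    sum_x2 = sum(a * a for a in x)              # sum of squared x values
    sum_y2 = sum(b * b for b in y)              # sum of squared y values

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator

# Made-up sample data, for illustration only
r_example = pearson_r_from_sums([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print("r =", round(r_example, 4))
```

For these five pairs the numerator is 30 and the denominator is sqrt(1500), so r is roughly 0.77.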

Interpretation:

  • r = 1: Perfect positive linear relationship
  • r = 0: No linear relationship
  • r = -1: Perfect negative linear relationship
  • 0 < |r| < 1: Imperfect linear relationship; the closer |r| is to 1, the stronger the relationship, and the sign of r gives its direction

Factors Affecting Correlation Coefficient:

  • Sample size: Coefficients estimated from small samples vary widely from sample to sample; larger samples give more reliable estimates.
  • Data distribution: Strongly skewed variables or a restricted range of values can weaken the observed correlation.
  • Outliers: A single extreme point can substantially inflate or deflate the correlation coefficient.
  • Non-linear relationships: If the relationship between the variables is non-linear, the correlation coefficient can be close to 0 even when the variables are strongly related, so it may not be a suitable measure.
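Two of these effects are easy to see on small made-up data sets. The sketch below is only illustrative: the helper `pearson_r` and all data values are assumptions chosen for the demonstration, not data from a real study.

```python
import math

def pearson_r(x, y):
    """Pearson's r in deviation form (illustrative helper)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    return cross / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

# Non-linear relationship: y is fully determined by x (y = x^2),
# yet the linear correlation over a symmetric range is zero.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_sq = [v * v for v in x_sym]
r_nonlinear = pearson_r(x_sym, y_sq)

# Outlier: a weakly correlated sample, then the same sample plus one extreme point.
x_base = [1, 2, 3, 4, 5]
y_base = [3, 1, 4, 2, 3]
r_without_outlier = pearson_r(x_base, y_base)
r_with_outlier = pearson_r(x_base + [20], y_base + [20])

print("non-linear:", r_nonlinear)
print("without outlier:", round(r_without_outlier, 2))
print("with outlier:", round(r_with_outlier, 2))
```

In this sketch the single added point (20, 20) moves r from roughly 0.14 to roughly 0.97, which is why it helps to plot the data before trusting the coefficient.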

Uses:

  • Assessing the strength and direction of linear relationships.
  • Testing for the significance of the relationship.
  • Supporting prediction, for example through simple linear regression, where r² is the proportion of variance explained.
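The last two uses can be sketched with two standard textbook relations: the test statistic t = r · sqrt((n - 2) / (1 - r²)), which has n - 2 degrees of freedom under the null hypothesis of zero correlation, and the least-squares line with slope r · s_y / s_x. The function names and the input values below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (df = n - 2)."""
    if n < 3:
        raise ValueError("at least 3 paired observations are required")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))

def regression_line(r, mean_x, mean_y, sd_x, sd_y):
    """Slope and intercept of the least-squares line for predicting y from x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical summary statistics, for illustration only
t_stat = correlation_t_statistic(r=0.6, n=27)
slope, intercept = regression_line(r=0.6, mean_x=10, mean_y=50, sd_x=2, sd_y=5)
predicted_y = intercept + slope * 12  # predicted y at x = 12

print("t =", round(t_stat, 2), "with", 27 - 2, "degrees of freedom")
print("predicted y at x = 12:", round(predicted_y, 2))
```

With these inputs t works out to 3.75, which exceeds the two-sided 5% critical value of about 2.06 for 25 degrees of freedom, so the correlation would be judged significant at that level. The prediction is only as good as the assumption that the relationship is linear.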

Example:

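```python
import numpy as np

# Five paired observations of two variables
x = np.array([10, 12, 14, 16, 18])
y = np.array([8, 10, 12, 14, 16])

# np.corrcoef returns the 2x2 correlation matrix; entry [0, 1] is r between x and y
r = np.corrcoef(x, y)[0, 1]

print("Correlation coefficient:", r)
```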

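Output (the last digit may differ from 1 because of floating-point rounding, for example 0.9999999999999999):

Correlation coefficient: 1.0

Here y = x - 2 for every pair, so the points lie exactly on a straight line with positive slope. The coefficient is therefore 1, a perfect positive linear relationship between x and y.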
