LINEAR REGRESSION:
- Linear regression is used to predict one variable from one or more other variables.
- The variable we want to predict is called the dependent variable, and the variables used to predict it are called independent variables.
- Linear regression assumes a linear relationship between the dependent and independent variables and finds the best fit line that describes this relationship.
- If linear regression predicts one dependent variable from one independent variable, it is called SIMPLE LINEAR REGRESSION.
- If linear regression predicts one dependent variable from two or more independent variables, it is called MULTIPLE LINEAR REGRESSION.
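The only difference between the two cases is how many independent-variable columns go into the fit. The sketch below is a minimal illustration, assuming NumPy is available and using made-up toy numbers (not from any data set): it fits y = b0 + b1*x1 + b2*x2 by least squares with numpy.linalg.lstsq. The helper name fit_linear is hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + b1*X[:, 0] + ...; returns [b0, b1, ...].

    One column in X gives simple linear regression; two or more columns
    give multiple linear regression.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                       # a single independent variable
        X = X.reshape(-1, 1)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (made-up numbers).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 1 + 2 * x1 + 3 * x2

multiple = fit_linear(np.column_stack([x1, x2]), y)  # ≈ [1, 2, 3]
simple = fit_linear(x1, y)                            # [intercept, slope]
print(multiple, simple)
```

Because the toy y was built exactly from x1 and x2, the multiple fit recovers the generating coefficients, while the simple fit (x1 only) can only approximate y with a single slope.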
SIMPLE LINEAR REGRESSION:
- Since simple linear regression uses only one dependent and one independent variable, the best fit line can be written as y = m*x + c, where y is the dependent variable, x is the independent variable, m is the slope and c is the intercept (the value of y at x = 0).
- To find the best values of m and c, we choose the line that minimizes the error between the predicted and actual values. For this we use the Residual Sum of Squares (RSS), where each residual is the difference between the observed value of the dependent variable and its predicted value m*x + c: RSS = Σ (y_i − (m*x_i + c))², summed over all observations.
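Minimizing the RSS over m and c has a well-known closed-form solution: m = Σ(x_i − x̄)(y_i − ȳ) / Σ(x_i − x̄)² and c = ȳ − m*x̄, where x̄ and ȳ are the means of x and y. The sketch below, in plain Python with made-up toy numbers, computes m, c and the RSS; the function names fit_simple and rss are illustrative only.

```python
def fit_simple(xs, ys):
    """Return (m, c) minimizing the residual sum of squares for y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x
    # (the 1/n factors of covariance and variance cancel in the ratio).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    m = sxy / sxx
    # The least-squares line always passes through (x_mean, y_mean).
    c = y_mean - m * x_mean
    return m, c

def rss(xs, ys, m, c):
    """Residual sum of squares: sum of (observed - predicted)^2."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4]
ys = [2, 4, 5, 4]
m, c = fit_simple(xs, ys)        # m = 0.7, c = 2.0 (up to rounding)
best = rss(xs, ys, m, c)         # ≈ 2.3
worse = rss(xs, ys, m + 0.1, c)  # ≈ 2.6: changing the slope raises the RSS
print(m, c, best, worse)
```

Trying any other slope or intercept, as in the last line, gives a larger RSS, which is exactly what "best fit line" means here.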
THINGS I OBSERVED FROM THE DIABETES DATA SET:
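- The data gives the percentage of obesity, inactivity and diabetes in the states of the USA in 2018.
- There are 3142 entries in the diabetes data, 363 in the obesity data and 1370 in the inactivity data. Because the counts differ, only records that appear in every table used can be paired up when building a model.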
- If we use only obesity to predict diabetes, we should use simple linear regression, since there is one independent variable.
- If we use both obesity and inactivity to predict diabetes, we should use multiple linear regression, since there are two independent variables.
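Since the three tables have different numbers of entries, each record must be matched on an identifier shared by all of them before either model can be fitted. The sketch below assumes such a shared key exists (the integer ids here are hypothetical; check the real identifier column in the data), keeps only ids present in every table (an inner join), and then fits the simple model (obesity only) and the multiple model (obesity and inactivity) with NumPy least squares. All values are made-up placeholders, not figures from the diabetes data set.

```python
import numpy as np

# Made-up placeholder tables: {hypothetical_id: percentage}.
diabetes = {1001: 9.0, 1003: 8.0, 1005: 12.0, 1007: 10.0, 1009: 11.0}
obesity = {1001: 30.0, 1003: 26.0, 1005: 36.0, 1009: 33.0}
inactivity = {1001: 25.0, 1003: 20.0, 1005: 30.0, 1007: 28.0, 1009: 26.0}

# Inner join: keep only the ids present in all three tables (1007 is dropped).
common = sorted(diabetes.keys() & obesity.keys() & inactivity.keys())
y = np.array([diabetes[k] for k in common])
obese = np.array([obesity[k] for k in common])
inactive = np.array([inactivity[k] for k in common])

def fit(columns, y):
    """Least squares for y ~ b0 + b1*col1 + ...; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

simple_coef = fit([obese], y)              # diabetes ~ obesity
multiple_coef = fit([obese, inactive], y)  # diabetes ~ obesity + inactivity
print(common, simple_coef, multiple_coef)
```

With real data, the same join step decides how many rows each model can actually use, so the multiple model may be fitted on fewer records than the simple one.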