In our morning session, we studied how to evaluate a model of the relationships among obesity, inactivity, and diabetes, using 5-fold cross-validation on the combined dataset.
The dataset contains 354 observations on these factors. To test the model, we divided the data into five nearly equal folds: four folds of 71 points and one fold of 70. In each round, four folds were used to train the model and the remaining fold to evaluate it; we repeated this five times so that each fold served once as the test set.
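The fold construction above can be sketched as follows. This is a minimal illustration assuming scikit-learn's `KFold`; the 354 placeholder rows stand in for our actual dataset.

```python
# Sketch of the 5-fold split described above (assumes scikit-learn).
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(354).reshape(-1, 1)  # 354 rows, standing in for our dataset
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Collect the size of each held-out test fold
fold_sizes = [len(test_idx) for _, test_idx in kf.split(X)]
print(sorted(fold_sizes))  # four folds of 71 points and one of 70
```

Because 354 is not divisible by 5, `KFold` automatically makes the first four folds one point larger, which matches the 71/71/71/71/70 split we used.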
We also measured how well the model fits the entire dataset: we trained it on all 354 observations and computed a goodness-of-fit measure comparing its predictions to the observed values.
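A sketch of that whole-dataset fit, assuming a linear regression in scikit-learn; the two predictors and response below are synthetic stand-ins for our inactivity, obesity, and diabetes columns.

```python
# Fit on the full dataset and score the fit (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 40, size=(354, 2))  # stand-ins for % inactivity, % obesity
y = 2 + 0.1 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.5, 354)

model = LinearRegression().fit(X, y)            # train on all 354 points
mse = mean_squared_error(y, model.predict(X))   # fit measure on the same data
```

Note that this score is computed on the training data itself, so it describes how closely the model fits, not how well it generalizes; the cross-validation folds handle the latter.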
From our results, we observed that as we increased the model's complexity, it fit the training data more closely.
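This complexity effect can be demonstrated with polynomial models of increasing degree; the sketch below assumes scikit-learn and uses synthetic data rather than our actual dataset.

```python
# Training fit improves with model complexity (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(0, 2, size=(354, 1))            # 354 points, one predictor
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.3, 354)

train_errors = []
for degree in (1, 3, 5, 7):                     # increasing complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_errors.append(mean_squared_error(y, model.predict(X)))
# training error shrinks (never grows) as the polynomial degree rises
```

Because each higher-degree feature space contains the lower-degree one, the training error can only stay the same or decrease, which is exactly the pattern we saw. The cross-validated test error, by contrast, need not keep improving.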
When testing the model, it is important to divide the data into the five folds carefully, especially when the dataset may contain duplicate records. We took this care so that the evaluation would be fair. Even so, the model sometimes performed worse on the test folds, particularly in the later rounds of the cross-validation.
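One common way to keep duplicates from leaking between training and test folds is grouped splitting. The sketch below assumes scikit-learn's `GroupKFold`, with hypothetical group ids marking which rows are duplicates of one another.

```python
# Duplicate-aware splitting via GroupKFold (assumes scikit-learn).
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(20).reshape(-1, 1)        # toy records
groups = np.repeat(np.arange(10), 2)    # each record duplicated once
gkf = GroupKFold(n_splits=5)

leaks = 0
for train_idx, test_idx in gkf.split(X, groups=groups):
    # count any group appearing in both the train and test folds
    leaks += len(set(groups[train_idx]) & set(groups[test_idx]))
print(leaks)  # 0: no duplicate ends up on both sides of a split
```

Keeping all copies of a record in the same fold prevents the model from being tested on a point it has effectively already seen, which would inflate the test scores.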
In short, we studied a model of these health-related factors, evaluated it with 5-fold cross-validation, and found that more complex models fit the training data more closely. We also kept the evaluation fair by accounting for possible duplicate records when splitting the data.