Variables and relationships

Dependent and independent variables

Let's say we want to build an AI algorithm to predict the price of a house based on its characteristics. In this case, the dependent variable (what we want to predict) is the price of the house. The independent variables (what we use to predict the price) are the characteristics of the house. Some examples of independent variables in this case might include:

  • Size of the house (in square feet)
  • Number of bedrooms
  • Number of bathrooms
  • Age of the house
  • Location (e.g. city, neighborhood)

We would gather data on each of these independent variables for a sample of houses, along with their corresponding prices. We would then use this data to train the AI algorithm to make predictions on new, unseen data.
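
The workflow above can be sketched in a few lines of Python. This is only an illustration: the house records are made up, the field names (size_sqft, bedrooms, price) and the helpers fit_line and predict_price are hypothetical, and the "training" step is a simple least-squares line fitted on a single independent variable (size) rather than on all of the characteristics listed above.

```python
# Made-up sample data: each record describes one house.
houses = [
    {"size_sqft": 1000, "bedrooms": 2, "price": 150_000},
    {"size_sqft": 1500, "bedrooms": 3, "price": 200_000},
    {"size_sqft": 2000, "bedrooms": 3, "price": 250_000},
    {"size_sqft": 2500, "bedrooms": 4, "price": 300_000},
]

# Independent variable (what we predict from) and
# dependent variable (what we want to predict).
sizes = [house["size_sqft"] for house in houses]
prices = [house["price"] for house in houses]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": estimate the line from the sample of houses.
slope, intercept = fit_line(sizes, prices)


def predict_price(size_sqft):
    """Predict the price of a new, unseen house from its size."""
    return slope * size_sqft + intercept


print(predict_price(1800))
```

A real model would normally use several independent variables at once and far more houses; the single-variable fit is used here only to keep the arithmetic easy to follow.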

It's important to select independent variables that are likely to be good predictors of the dependent variable. For example, in the case of predicting the price of a house, it's reasonable to assume that the size of the house and the number of bedrooms and bathrooms would be strong predictors of the price. On the other hand, the color of the walls or the type of flooring might not be as strong predictors of price.

In general, selecting good independent variables for an AI algorithm requires a combination of domain expertise (knowing which variables are likely to be good predictors based on the subject matter), data analysis (examining the relationship between each independent variable and the dependent variable), and trial and error (testing different combinations of independent variables to see which ones produce the best results).
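
As a rough illustration of the data analysis step, one common way to examine the relationship between each independent variable and the dependent variable is to compute a correlation coefficient. The sketch below uses the Pearson correlation, which is one possible choice and only measures linear association. The data is made up, and the column names and the pearson helper are hypothetical.

```python
import math

# Made-up sample: one list per variable, one position per house.
data = {
    "size_sqft": [1000, 1500, 2000, 2500, 3000],
    "bedrooms": [2, 3, 3, 4, 5],
    "age_years": [30, 5, 20, 40, 10],
}
price = [150_000, 210_000, 240_000, 310_000, 360_000]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Correlation of each candidate independent variable with the price.
correlations = {name: pearson(values, price) for name, values in data.items()}

# List the candidates from strongest to weakest linear association.
for name, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{name}: {r:.2f}")
```

A value close to 1 or -1 suggests a strong linear association, while a value near 0 suggests a weak one. A low correlation does not rule a variable out, because the relationship may be non-linear, which is why domain expertise and trial and error still matter.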

Linear relationship: example

Let's consider an example of how the price of a car might be related to its age. If we plot the data on a graph with age on the x-axis and price on the y-axis, a linear relationship between age and price would mean that the price decreases by the same amount for each additional year of age. In other words, the data points on the graph would fall along a straight line with a downward slope.
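
A small sketch can make the "constant rate" idea concrete. The starting price and the yearly drop below are invented numbers, and car_price is a hypothetical helper; real car prices rarely follow a perfectly straight line.

```python
# Hypothetical values chosen only for illustration.
initial_price = 20_000  # price of the car when new
yearly_drop = 1_500     # amount the price falls each year


def car_price(age_years):
    """Linear model: price = initial_price - yearly_drop * age."""
    return initial_price - yearly_drop * age_years


# Prices for ages 0 through 5 years.
car_prices = [car_price(age) for age in range(6)]

# Year-to-year decrease: the same amount every year in a linear relationship.
drops = [car_prices[i] - car_prices[i + 1] for i in range(len(car_prices) - 1)]

print(car_prices)
print(drops)
```

Because every entry in drops is identical, plotting car_prices against age would give points on a straight line.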

Non-linear relationship: example

Suppose we have a dataset of students and their test scores. We want to analyze how the time spent studying is related to the test score. If we plot the time spent studying on the x-axis and the test score on the y-axis, we might expect to see a positive relationship between the two variables, meaning that students who spend more time studying tend to score higher on the test. However, the relationship between time spent studying and test score might not be linear. For example, if a student's test score tends to increase at a decreasing rate as the time spent studying increases, then we would have a non-linear relationship. In other words, the test score might increase rapidly at first as the student increases their study time, but then start to level off as the student approaches the maximum benefit from studying. In this case, the data points on the plot would not form a straight line, but instead would have a curved shape.
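
The levelling-off pattern described above can be sketched with a curve that rises quickly and then flattens. The saturating exponential used here is only one of many shapes that behave this way, and it is an assumption for illustration, as are the constants MAX_SCORE and RATE and the helper expected_score.

```python
import math

# Hypothetical constants chosen only for illustration.
MAX_SCORE = 100  # score the student approaches but never exceeds
RATE = 0.5       # how quickly the benefit of studying saturates


def expected_score(hours):
    """Non-linear model: fast gains at first, then diminishing returns."""
    return MAX_SCORE * (1 - math.exp(-RATE * hours))


# Scores for 0 through 6 hours of study.
scores = [expected_score(hours) for hours in range(7)]

# Gain from each additional hour: positive, but smaller every time.
gains = [scores[i + 1] - scores[i] for i in range(len(scores) - 1)]

for hours, score in enumerate(scores):
    print(f"{hours} h -> {score:.1f}")
```

Unlike the linear case, the gain from each extra hour shrinks, so the plotted points bend toward a plateau instead of forming a straight line.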
