AI algorithms: supervised

View the image above on a separate web page: https://www.leerschool.be/othercontent/aialgosen.php

Choose the appropriate algorithm depending on the desired goal or output

Depending on the "learning goal," as a developer, you choose a supervised or unsupervised learning method. For supervised, you then choose the desired goal: 

  1. classification: you want a category or class as output
  2. regression: you want a number or binary value as output. 

In the family tree above, we place the "logistic regression" algorithm under "regression," but it is very often considered a "classification" algorithm as well, because, like an SVM, it ultimately returns one of two values: 0 or 1.

Make the exercise: Select the algorithm
Make the exercise: What are the variables?

Supervised learning

1. Classification algorithms  

Some of the best-known classification algorithms include:

  • Naive Bayes
  • Decision Tree
  • Random Forest
  • Support Vector Machines
  • K-Nearest Neighbors

Naive Bayes 

Naive Bayes classifies data by estimating the statistical probability that a data point belongs to a particular class or group. It does not look at relationships between attributes, but counts similarities and calculates probabilities based on those counts.
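As a sketch in plain Python: the toy "messages" below are invented for illustration, and each class probability is the class frequency multiplied by the (smoothed) frequency of each word within that class.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: each message is a set of words, labeled "spam" or "ham".
train = [
    ({"win", "money"}, "spam"),
    ({"win", "prize"}, "spam"),
    ({"meeting", "today"}, "ham"),
    ({"lunch", "today"}, "ham"),
]

def naive_bayes_predict(message, train):
    labels = Counter(label for _, label in train)
    vocab = set().union(*(words for words, _ in train))
    word_counts = defaultdict(Counter)
    for words, label in train:
        word_counts[label].update(words)
    best_label, best_score = None, float("-inf")
    for label, n in labels.items():
        # log P(class) + sum of log P(word | class), with add-one smoothing
        score = math.log(n / len(train))
        for word in message:
            score += math.log((word_counts[label][word] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(naive_bayes_predict({"win", "money"}, train))  # spam
```

Note the "naive" part: each word's probability is multiplied in independently, without looking at relationships between words.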

Decision tree

A Decision Tree has a family-tree-like structure. At each 'branch' you test a certain condition or attribute, which determines which branch the data follows next. Beware: decision trees are prone to 'overfitting', and the slightest change to the input (in this case the training data) can lead to drastic differences in the output.

A decision tree
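A hand-built tree makes the branching idea concrete. The "play tennis?" rules below are invented for illustration; a real tree would be learned from training data.

```python
def play_tennis(sample):
    """Hand-built decision tree: each node tests one attribute."""
    if sample["outlook"] == "sunny":
        # Sunny branch: the decision depends on humidity.
        return "no" if sample["humidity"] == "high" else "yes"
    elif sample["outlook"] == "overcast":
        # Overcast branch: a leaf, no further test needed.
        return "yes"
    else:  # rain
        # Rainy branch: the decision depends on wind.
        return "no" if sample["wind"] == "strong" else "yes"

print(play_tennis({"outlook": "sunny", "humidity": "normal"}))  # yes
```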

Random forest 

A Random Forest is a collection of decision trees, where each decision tree is given a subset of the data's attributes and predicts based on that subset. The majority vote of all decision trees leads to the answer. This algorithm reduces the problem of "overfitting" and leads to a much more accurate classification.

Random forest
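The voting step can be sketched as follows. The "trees" here are stub classifiers with invented rules, each looking at a different attribute, standing in for real decision trees trained on different attribute subsets.

```python
from collections import Counter

# Each stub "tree" votes based on one attribute of the sample (invented rules).
trees = [
    lambda s: "spam" if "win" in s["words"] else "ham",
    lambda s: "spam" if s["links"] > 3 else "ham",
    lambda s: "spam" if s["caps_ratio"] > 0.5 else "ham",
]

def forest_predict(sample):
    votes = [tree(sample) for tree in trees]
    # The majority vote of all trees is the forest's answer.
    return Counter(votes).most_common(1)[0][0]

sample = {"words": {"win", "prize"}, "links": 5, "caps_ratio": 0.1}
print(forest_predict(sample))  # spam: two of the three trees vote "spam"
```

Because no single tree decides alone, one tree overfitting to a quirk of its attribute subset is outvoted by the others.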

Support vector machine 

A Support Vector Machine (SVM) classifies binarily: it divides the input data into two classes according to attributes you have selected in advance. Before the SVM is able to do this, you have to train the model with historical data. It is called a "vector" machine because each data point is given an x and y position in a graph (or vector space). SVMs work well for small data sets that contain little noise and virtually no overlapping categories; they are not suitable for larger data sets.

Data points on a graph. 
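Once trained, a linear SVM boils down to a separating line in that graph, and classifying a point is just checking which side of the line it falls on. The weights below are invented for illustration, not the result of a real training run.

```python
# Hypothetical learned parameters of the separating line w·x + b = 0.
w = (1.0, -1.0)
b = 0.0

def svm_classify(point):
    # Which side of the line is this point on?
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score >= 0 else 0

print(svm_classify((3.0, 1.0)))  # 1: the point lies on the positive side
```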

K-Nearest Neighbors  

K-Nearest Neighbors (KNN) is based on the idea that the class an element belongs to corresponds to the classes of its closest points. Suppose everyone in your neighborhood chooses "green": chances are you do, too. At K = 5, you look at the five closest 'neighbors' (albeit in a vector space); at K = 3, at the three closest, and so on.

K-Nearest Neighbors
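KNN is simple enough to write out in full. The labeled points below are invented; the prediction is the majority class among the K nearest points.

```python
import math
from collections import Counter

# Invented toy data: position in the vector space -> class label.
labeled_points = [
    ((1, 1), "green"), ((1, 2), "green"), ((2, 1), "green"),
    ((8, 8), "blue"), ((8, 9), "blue"), ((9, 8), "blue"),
]

def knn_predict(point, labeled_points, k=3):
    # Sort by Euclidean distance and let the k closest neighbors vote.
    nearest = sorted(labeled_points, key=lambda item: math.dist(point, item[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((2, 2), labeled_points))  # green
```

Note that there is no training step at all: the "model" is simply the stored data points.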
Make the exercise:  Choose an appropriate classification algorithm

2. Regression Algorithms

In brief: logistic regression predicts the category data falls into based on other information. For example, based on a student's previous results, you can predict what their likely final outcome will be.

Logistic regression
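Under the hood, logistic regression squeezes a weighted sum of the inputs through the sigmoid function to get a probability between 0 and 1, which is then thresholded into one of the two categories. The coefficients below are invented for illustration; real ones would come from training on historical student results.

```python
import math

def predict_pass(avg_grade, w=1.2, b=-6.0):
    """Hypothetical model: probability of passing given an average grade."""
    # Sigmoid turns the weighted sum into a probability between 0 and 1.
    p = 1 / (1 + math.exp(-(w * avg_grade + b)))
    return p, ("pass" if p >= 0.5 else "fail")

p, outcome = predict_pass(7.0)
print(outcome)  # pass (p is roughly 0.92)
```

This is why it sits between the two families: the raw output is a number (a probability), but the thresholded answer is one of two classes.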

Linear regression is the simplest and one of the most widely used regression algorithms. It is used to relate two or more variables (for example, the asking price of a house and its location) to each other. You can plot these on a graph and derive a regression line; based on this line, you can predict the price of a property from its location.

Linear regression (Source: https://medium.com/@yennhi95zz/2-introduction-to-linear-regression-predicting-house-prices-a36f6172030)
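Fitting that regression line can be done with the ordinary least-squares formulas. The (size, price) data below is invented and deliberately lies exactly on a line, so the prediction is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single variable: y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

sizes = [50, 70, 90, 110]       # m², invented toy data
prices = [150, 190, 230, 270]   # k€, lies exactly on price = 2*size + 50
a, b = fit_line(sizes, prices)
print(a * 100 + b)  # predicted price for a 100 m² house: 250.0
```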