Supervised or unsupervised?

Supervised learning - when you have past examples where the dependent variables are also known

When determining the price of a house, you are typically dealing with relatively low-dimensional data. Variables that can affect house prices include the number of bedrooms, the size of the house, its location, its state of repair and so on. Although there are multiple factors, the number of variables remains manageable and limited.

Here we have examples where the end result is also known: after all, we have data from houses that have already been sold. The dataset from which the AI learns therefore contains examples where both the independent variables and the dependent variable (the sale price) are known. When you get new examples (new data) where only the independent variables are known, you can predict the dependent variable.
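As a minimal sketch of this idea, the snippet below fits a straight line through a handful of sold houses and uses it to estimate the price of a new one. The figures are invented for illustration, and using only the size of the house is a simplification: a real model would include more of the variables listed above.

```python
# Hypothetical sold houses: (size in square metres, sale price in euros).
# These numbers are invented purely for illustration.
sold_houses = [(50, 150_000), (80, 210_000), (100, 250_000),
               (120, 290_000), (150, 350_000)]

def fit_line(examples):
    """Least-squares fit of: price = slope * size + intercept."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    variance = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict_price(size, slope, intercept):
    """Predict the dependent variable (price) from the independent one (size)."""
    return slope * size + intercept

# Learn from examples where both size and price are known ...
slope, intercept = fit_line(sold_houses)
# ... then predict the price of a new house where only the size is known.
predicted = predict_price(110, slope, intercept)
print(f"Predicted price for a 110 square metre house: {predicted:.0f}")
```

The known prices act as the labels: without them there would be nothing to fit the line to.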

Unsupervised learning - models are trained on data without labelled outcomes. The aim is to discover structures or patterns in the data.

With some data, many variables might affect the result, but we don't always know which ones. Take recommendation engines: you have a lot of data, yet you don't immediately know which recommendation to give. That depends on many variables, such as:

click behaviour of the new visitor
products in the basket
purchase history of the same customer
purchase history of all your other customers
similar products
interests of the buyer or potential buyer
search behaviour on the website
....

Each possible variable adds an extra dimension to the data, an extra element you have to take into account during the analysis. With so many variables in play, you cannot simply point to past examples where the correct outcome is known.
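One simple way to let a pattern emerge from purchase history, without anyone labelling a "correct" recommendation, is to count which products end up in the same order. The sketch below does only that; the orders and product names are hypothetical, and real recommendation engines combine many more of the signals listed above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase history: one set of product names per order.
orders = [
    {"tent", "sleeping bag", "torch"},
    {"tent", "sleeping bag"},
    {"torch", "batteries"},
    {"tent", "torch"},
]

# Count how often each pair of products appears in the same order.
pair_counts = Counter()
for order in orders:
    for a, b in combinations(sorted(order), 2):
        pair_counts[(a, b)] += 1

def recommend(product, top_n=2):
    """Return the products most often bought together with `product`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    return [name for name, _ in scores.most_common(top_n)]

print(recommend("tent"))
print(recommend("batteries"))
```

No order in this data is marked as a good or bad recommendation; the suggestions come purely from structure found in the data.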

Unsupervised learning is often used with high-dimensional data, because it remains useful when predefined labels or results are missing. Here are some ways in which unsupervised learning can be valuable in this context:

Feature learning: In face recognition, for example, instead of teaching the algorithm to look for features such as a nose, two eyes and a mouth, you let the algorithm detect those features itself from a large collection of pictures of faces.
Dimensionality reduction: It can be beneficial to reduce the number of dimensions of the data. These techniques aim to preserve the essential characteristics of the data while making analysis simpler and more efficient.
Discovering hidden patterns: In high-dimensional datasets, it can be difficult to intuitively understand the structure of the data. Techniques such as clustering can reveal groups that are not obvious at first sight.
Anomaly detection: You can try to detect anomalous patterns in large amounts of data.
...
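To make "discovering hidden patterns" concrete, here is a minimal sketch of k-means clustering, one common unsupervised technique. The customer figures are invented, only two variables are used so the result is easy to check by eye, and the starting centroids are fixed by hand; these are all assumptions made for the example.

```python
import math

# Hypothetical customers: (visits per month, average basket value in euros).
customers = [(1, 20), (2, 25), (1, 30), (9, 200), (10, 220), (8, 210)]

def closest(point, centroids):
    """Index of the centroid nearest to `point`."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def k_means(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        labels = [closest(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                # Move the centroid to the mean of its members, per coordinate.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        centroids = new_centroids
    labels = [closest(p, centroids) for p in points]
    return labels, centroids

# Starting centroids are fixed so the run is repeatable; real
# implementations usually choose them at random.
labels, centroids = k_means(customers, centroids=[customers[0], customers[3]])
print(labels)
print(centroids)
```

The algorithm is never told which customers belong together; the two groups (occasional small buyers and frequent large buyers) come out of the data alone. With variables on very different scales, as here, it is usually wise to rescale them first so that one variable does not dominate the distance.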

Unsupervised learning is thus particularly useful in scenarios where the goal is to gain insight into the data without prior knowledge of the outcomes, or where the number of variables is large and direct observation and analysis are impractical. This makes it a powerful tool in the toolbox of data scientists working with high-dimensional data.

A good example of high-dimensional data in a marketing context could be customer behaviour data on an e-commerce platform. Imagine a company collecting data on every interaction a customer has with their website. This could include details such as:

Which pages the customer visited.
Which products were viewed.
How much time was spent on each page.
Which searches were performed.
Feedback scores given to products.
Purchase history.
Responses to marketing campaigns (emails, ads, etc.).

Each customer interaction generates a series of data points, each of which is a 'dimension' in the dataset. When you have thousands of customers interacting with thousands of products, with additional information such as timestamps and responses to different types of content, the number of variables grows rapidly. This results in a complex dataset with potentially thousands to millions of dimensions, depending on the level of detail at which the data is collected.
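The sketch below shows how such an interaction log can turn into dimensions. It assumes a deliberately tiny, hypothetical log in which every distinct combination of interaction type and item becomes one column; the customer names, interaction types and items are all invented.

```python
# Hypothetical interaction log: (customer, kind of interaction, item).
events = [
    ("anna", "viewed", "tent"),
    ("anna", "searched", "camping"),
    ("ben", "viewed", "tent"),
    ("ben", "viewed", "torch"),
    ("ben", "purchased", "torch"),
]

# Every distinct (interaction, item) pair becomes one column, i.e. one dimension.
dimensions = sorted({(kind, item) for _, kind, item in events})

def to_vector(customer):
    """Count how often `customer` produced each (interaction, item) pair."""
    return [
        sum(1 for c, kind, item in events
            if c == customer and (kind, item) == dim)
        for dim in dimensions
    ]

print(len(dimensions), "dimensions:", dimensions)
print("anna:", to_vector("anna"))
print("ben: ", to_vector("ben"))
```

Five events from two customers already produce four dimensions. Every new product or interaction type adds more columns, which is how real datasets reach thousands of dimensions or more.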

This type of high-dimensional data can provide powerful insights for marketing purposes, such as personalised recommendations and targeted ad campaigns, but it also requires advanced data analytics techniques to manage and interpret it effectively. This is why big data analytics and machine learning techniques are often essential in such scenarios.

Do the exercise: https://www.leerschool.be/quiz/unsuperviseden/
