AI and image recognition

Image recognition

Your smartphone usually has at least one camera, and many laptops are equipped with a webcam. But this does not mean that those devices can "see" the way people or animals do. With our eyes we can not only see ('vision'), but also recognize objects ('recognition'). When we teach a computer to extract information from images, we speak of 'computer vision' and 'image recognition'.

But beware: "vision" is not yet "recognition." Recognition assumes not only seeing an image, but also matching it against prior knowledge.

Computer vision

Computer vision is a field within artificial intelligence that focuses on interpreting and understanding the visual world. Here are some of the most common techniques used in computer vision, each with a brief explanation:
  1. Colour Detection: This involves detecting specific colours in images or video frames. This technique is often used in applications such as sorting objects by colour or tracking coloured markings in a scene.

    Test color detection: https://www.leerschool.be/experiment/colordetection

  2. Edge Detection: Identifies edges within an image, often used to find object boundaries and other important transitions in intensity.

  3. Face Recognition: This technique identifies human faces within images or videos. It is used in many applications, ranging from security to user-interface interaction.

  4. Blob Detection: A technique that identifies groups of contiguous pixels that stand out from the rest of the image. This is useful for tracking objects and determining properties such as size and shape.

  5. Motion Detection: Detects changes in the position of objects between consecutive video frames. This is widely used in security systems to trigger recordings or alarms.

    Test motion detection: https://www.leerschool.be/experiment/motiondetection

  6. Pattern Recognition: Automatically identifies and classifies patterns in images. This can be used for text recognition, symbol identification or recognition of natural scenes.
    In stores, scanners "recognize" bar codes. Most smartphones recognize the information in QR codes, and recognize the user's face or fingerprint. Software is required to recognize the raw image information: just as in humans, it is not the eyes but the brain that recognizes objects and people.

    Test barcode recognition: https://www.publihub.media/publiceer/scanner

  7. Depth Perception: These techniques use various methods, such as stereo vision (using two cameras to estimate depth) or depth cameras, to measure the distance of objects from the camera.

  8. Optical Character Recognition (OCR): Converts images of typed or written text to machine-encoded text. Commonly used for digitising printed documents and reading number plates.

  9. Object Recognition: Identifies objects within an image or video. This includes classifying an object as a particular type (such as a car, human or animal) and locating its position in the image.

  10. Segmentation: Divides an image into multiple segments, each with corresponding properties such as colour or texture. This is useful for detailed analysis and editing of specific parts of an image.

  11. ...

These techniques are the building blocks for much more complex computer vision systems widely used in industries such as automotive, healthcare, security and entertainment.
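To make one of these building blocks concrete, here is a minimal sketch of edge detection in plain Python: a simple horizontal-gradient filter applied to a tiny grayscale grid. The pixel values are invented for illustration; real systems use libraries such as OpenCV and kernels such as Sobel.

```python
# A tiny grayscale "image" (values 0-255): dark on the left, bright on the right.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]

def horizontal_edges(img):
    """Return the absolute difference between horizontally neighbouring pixels."""
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in img]

edges = horizontal_edges(image)
for row in edges:
    print(row)  # large values mark the boundary between dark and bright areas
```

The large value in the middle of each output row marks exactly where the dark region meets the bright one, which is the "important transition in intensity" that edge detection looks for.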

Traditional OCR without AI 

Bar codes, QR codes and text recognition are relatively old techniques. The software first reduces the image to black and white. In OCR ("optical character recognition"), the software compares grids of black pixels to stored letter templates. In this way, OCR software can recognize text. However, this does not mean that the software understands what the text is about: OCR technology merely converts the pixel data back into editable text. Just as a small child must learn the names of all objects and living things in the physical world, AI software must learn to recognize elements incrementally. If you show a child certain letters often enough, it will learn those letters. The same is true for AI. Classical OCR cannot recognize letters printed upside down, but a child or an AI model can.
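The grid-comparison idea behind classical OCR can be sketched in a few lines of Python. The 3×3 glyphs below are invented for illustration; real OCR engines use much finer grids and many more templates.

```python
# Classical OCR sketch: match an unknown glyph against stored templates
# by counting how many pixels agree (1 = black, 0 = white).
TEMPLATES = {
    "I": [(0, 1, 0),
          (0, 1, 0),
          (0, 1, 0)],
    "L": [(1, 0, 0),
          (1, 0, 0),
          (1, 1, 1)],
}

def recognize(glyph):
    """Return the letter whose template overlaps the glyph on the most pixels."""
    def score(template):
        return sum(p == q
                   for trow, grow in zip(template, glyph)
                   for p, q in zip(trow, grow))
    return max(TEMPLATES, key=lambda letter: score(TEMPLATES[letter]))

scanned = [(0, 1, 0),
           (0, 1, 0),
           (0, 1, 0)]
print(recognize(scanned))  # -> I
```

Note how literal this matching is: rotate the glyph and the pixel grids no longer line up, which is exactly why classical OCR fails on upside-down letters.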

OCR is not "new". Emanuel Goldberg (1881-1970) acquired a patent in 1931 for a device that could use photoelectric cells and pattern recognition to search documents on microfilm.

OCR with ANNs (AI)

Recognising objects in real-world images  

If you want to teach software to recognize a dog, you first have to feed it hundreds or thousands of images of dogs. The masses of data collected through big-data techniques come to the rescue here. ImageNet is a giant database of millions of images. The project has been instrumental in advancing computer vision and deep learning research, and the data are freely available to researchers for non-commercial use. Through a selection of data from such a database, image recognition software learns incrementally that both a Pekingese and a pit bull are dogs. Moreover, the software must learn that both Disney's Pluto and Lassie are dogs.

Only if a robot can quickly and accurately recognize objects can it "walk around" the way people do. Self-driving cars, for example, use AI-based computer vision and image recognition to recognize pedestrians, road signs, vehicles and so on. They must be able to recognize not just one particular object, but all parts of the visual field, and preferably at lightning speed. A self-driving car traveling at a decent speed must recognize, in the right context, a light that turns red or a child running after a ball. A child playing on the sidewalk, for example, is not an immediate problem; it only becomes one if the child suddenly runs into the street. So the AI software must constantly recognize not only the parts ("object segmentation"), but also the overall context of the image, as a stream of incoming data, and estimate how to respond appropriately. But whether understanding the context, the "umwelt," will also lead to a form of "artificial awareness" is another story.

Image recognition, also known as image classification, is a process of identifying and labeling objects or patterns within digital images. This technology is widely used in various industries, including healthcare, security, and marketing.

Learning to classify image content

Image recognition can identify such a wide range of content because it uses machine learning algorithms that are trained on a large dataset of images. These algorithms learn to identify patterns and features within images that correspond to different objects or categories.

The key to this process is the use of deep neural networks, particularly convolutional neural networks (CNNs). CNNs are designed to recognize complex patterns within images by using multiple layers of interconnected neurons that apply different filters to the image data.

During training, the CNN is presented with a large dataset of images labeled with the corresponding object or category. The CNN learns to recognize patterns and features within the images that are associated with specific labels. For example, a CNN might learn that certain combinations of edges, colors, and textures are characteristic of cats.
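The "filter" idea at the heart of a CNN can be illustrated without any framework: a small kernel slides over the image and produces a weighted sum at each position. The 4×4 image and the vertical-edge kernel below are toy values chosen for illustration.

```python
# Toy 2D convolution (valid padding, stride 1): the core operation of a CNN.
def convolve2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            row.append(sum(kernel[i][j] * image[y + i][x + j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# A vertical-edge kernel: responds strongly where dark meets bright.
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # strong response only in the middle column
```

In a real CNN the kernel weights are not hand-chosen like this; they are learned during training, and each layer applies many such filters in parallel.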

Once the CNN has been trained, it can be used to classify new images. When an image is presented to the CNN, it is processed through the multiple layers of the network, with each layer extracting different features and patterns from the image. The final layer of the network produces a vector of probabilities that represent the likelihood of the image belonging to each of the possible categories.
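The "vector of probabilities" from the final layer is typically produced by applying a softmax to the network's raw class scores ("logits"). A minimal sketch, with invented scores for three hypothetical classes:

```python
import math

def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical final-layer scores for the classes cat, dog, bird.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
print([round(p, 3) for p in probs])  # highest score -> highest probability
```

The class with the highest probability is usually taken as the model's prediction ("cat" here), while the full vector shows how confident the model is relative to the other classes.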

The key advantage of this approach is that it allows the machine learning algorithm to recognize a wide variety of objects and patterns, without the need for explicit programming or feature extraction. Instead, the algorithm learns to recognize patterns and features automatically, based on the patterns and features present in the training data.

In summary, image recognition can identify a wide variety of content because it uses machine learning algorithms that are trained on large datasets of images, allowing the algorithm to learn the patterns and features associated with different objects and categories.

Image datasets

There are many free or open datasets for images available that can be used for training image recognition models. Here are some popular datasets and where you can find them:

  1. ImageNet: This is a large-scale image database with over 14 million labeled images covering more than 20,000 categories. The dataset is commonly used for training deep neural networks for image recognition. You can download the dataset from the ImageNet website (http://www.image-net.org/).

  2. CIFAR-10 and CIFAR-100: These are datasets of 60,000 32x32 color images in 10 or 100 classes, respectively. They are commonly used for benchmarking image recognition algorithms. You can download the datasets from the CIFAR website (https://www.cs.toronto.edu/~kriz/cifar.html).

  3. MNIST: This is a dataset of 70,000 handwritten digits, commonly used for benchmarking image recognition algorithms. You can download the dataset from the MNIST website (http://yann.lecun.com/exdb/mnist/).

  4. OpenImages: This is a large-scale image dataset with millions of annotated images covering a wide range of categories, including objects, scenes, and activities. You can download the dataset from the OpenImages website (https://storage.googleapis.com/openimages/web/index.html).

  5. COCO: The Common Objects in Context (COCO) dataset is a large-scale dataset for object detection, segmentation, and captioning. It contains over 330,000 images with more than 2.5 million object instances labeled in 80 different categories. You can download the dataset from the COCO website (https://cocodataset.org/).

These are just a few examples of the many free and open datasets available for image recognition. Other sources of image datasets include academic research projects, government agencies, and industry initiatives.

You can train your own image recognition model on this website:  https://teachablemachine.withgoogle.com/train/image

What does a dataset look like? 

For image recognition, AI models receive a series of photos as training data. These photos are usually provided in digital form, such as JPEG or PNG files. The training data can come from various sources, such as public datasets, purpose-built datasets or even synthetically generated datasets.
In the context of deep learning and convolutional neural networks (CNNs), which are often used for image recognition, the training images usually come with associated labels. These labels indicate the appropriate classification or category of each image. For example, when it comes to recognizing cats and dogs, the labels may indicate whether a particular image contains a cat or a dog.

The training data can be stored in different formats depending on the specific needs of the AI model and framework used. However, it is common to link the images and labels in a structured format, such as a JSON file. Each item in the JSON can contain the image location and associated label. In this way, the AI model can easily read and process the training data during the training process. 

An example of a corresponding JSON file:

{
  "dataset": "ImageNet",
  "description": "Dataset of labeled images for object recognition",
  "images": [
    {
      "id": 1,
      "file_path": "/path/to/image1.jpg",
      "label": "cat"
    },
    {
      "id": 2,
      "file_path": "/path/to/image2.jpg",
      "label": "dog"
    },
    {
      "id": 3,
      "file_path": "/path/to/image3.jpg",
      "label": "cat"
    },
    ...
  ]
}
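Reading such a manifest with Python's standard json module might look like the sketch below. The dataset content mirrors the example above, and the file paths are placeholders, not real files.

```python
import json
from collections import Counter

# A small, valid version of the manifest above (paths are placeholders).
manifest = json.loads("""
{
  "dataset": "ImageNet",
  "description": "Dataset of labeled images for object recognition",
  "images": [
    {"id": 1, "file_path": "/path/to/image1.jpg", "label": "cat"},
    {"id": 2, "file_path": "/path/to/image2.jpg", "label": "dog"},
    {"id": 3, "file_path": "/path/to/image3.jpg", "label": "cat"}
  ]
}
""")

# Pair every file path with its label, as a training loop would.
samples = [(img["file_path"], img["label"]) for img in manifest["images"]]
print(samples[0])   # ('/path/to/image1.jpg', 'cat')

# Count examples per class - useful for spotting imbalanced datasets.
counts = Counter(label for _, label in samples)
print(counts)       # Counter({'cat': 2, 'dog': 1})
```

In a real training pipeline, each file path would then be opened with an image library and the decoded pixels fed to the model together with the label.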

What if you want to recognize multiple objects in one photo? In that case, you must tell the AI during training where those objects are located in the photo. You can do that by using a "bounding box" to indicate the pixel coordinates where each object can be found.

An example of a corresponding JSON file: 

{
  "dataset": "Multi-Object Recognition",
  "description": "Dataset of labeled images for multi-object recognition",
  "images": [
    {
      "id": 1,
      "file_path": "/path/to/image1.jpg",
      "objects": [
        {
          "label": "cat",
          "bounding_box": [100, 150, 250, 300]
        },
        {
          "label": "chair",
          "bounding_box": [350, 200, 500, 400]
        }
      ]
    },
    {
      "id": 2,
      "file_path": "/path/to/image2.jpg",
      "objects": [
        {
          "label": "dog",
          "bounding_box": [200, 100, 350, 250]
        },
        {
          "label": "ball",
          "bounding_box": [400, 300, 450, 350]
        }
      ]
    },
    ...
  ]
}
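Bounding boxes like those in the manifest can be processed directly. The sketch below assumes the [x_min, y_min, x_max, y_max] convention (an assumption; some datasets instead use [x, y, width, height]) and computes a box's area and the overlap (IoU, intersection over union) between two boxes, a standard measure in object detection.

```python
def area(box):
    """Area of a box given as [x_min, y_min, x_max, y_max]."""
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)

def iou(a, b):
    """Intersection over union: how much two boxes overlap (0..1)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = area((ix_min, iy_min, ix_max, iy_max))
    return inter / (area(a) + area(b) - inter)

cat = [100, 150, 250, 300]     # boxes from the example manifest above
chair = [350, 200, 500, 400]
print(area(cat))               # 22500
print(iou(cat, chair))         # 0.0 - the boxes do not overlap
```

During training, a detector's predicted boxes are scored against these labeled boxes with exactly this kind of overlap measure.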

It is important to note that modern AI models often require huge amounts of training data to achieve accurate results. Datasets for image recognition can contain thousands or even millions of images, with associated labels, to help the AI models learn and recognize patterns.

The image recognition process

The process of image recognition can be broken down into several stages:

  1. Acquisition: The first step is to acquire an image, typically through a camera or other sensors. The image is then digitized and stored as a matrix of pixels, where each pixel represents a color and intensity value.

  2. Preprocessing: The raw image data is usually noisy and may contain irrelevant or redundant information. Preprocessing techniques such as resizing, cropping, and normalization are used to remove noise and standardize the image's size and color space.

  3. Feature extraction: Once the image has been preprocessed, the next step is to extract relevant features that distinguish different objects or patterns within the image. Common feature extraction techniques include edge detection, texture analysis, and color histograms.

  4. Classification: The extracted features are then used to train a machine learning model to recognize different objects or patterns. The most common classification algorithms used for image recognition are deep neural networks such as convolutional neural networks (CNNs), which are particularly effective at detecting patterns in images.

  5. Post-processing: Finally, the output of the classification model is typically further processed to improve accuracy and eliminate false positives. This may involve techniques such as thresholding, morphological operations, and clustering.

In summary, image recognition involves acquiring and preprocessing images, extracting relevant features, training a classification model, and post-processing the results to improve accuracy.
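Steps 2 and 3 of the pipeline above can be sketched in plain Python: normalize raw pixel values to the 0-1 range, then extract a simple feature (a coarse brightness histogram). The pixel values are toy data for illustration.

```python
# Step 2 (preprocessing): scale raw 0-255 pixel values to the 0-1 range.
def normalize(img):
    return [[pixel / 255 for pixel in row] for row in img]

# Step 3 (feature extraction): a coarse brightness histogram with 4 bins.
def histogram(img, bins=4):
    counts = [0] * bins
    for row in img:
        for value in row:
            counts[min(int(value * bins), bins - 1)] += 1
    return counts

raw = [
    [0, 64, 128, 255],
    [0, 64, 128, 255],
]
norm = normalize(raw)
print(histogram(norm))  # pixels spread evenly over the 4 brightness bins
```

A classifier (step 4) would then consume feature vectors like this histogram, or, in the CNN case, learn its own features directly from the normalized pixels.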
