Convolutional neural networks (CNNs) have made a name for themselves in both academia and industry through their success in object classification. As work in this area progressed, CNN architectures began to be tested on other problems. The most important of these is object detection, which is the next stage after object classification. The goal of the object detection task is to find the classes and **locations** of the object(s) in an image.

We use metrics to quantify how well a solution performs in **machine learning** problems. In classification tasks, model performance is measured with metrics such as accuracy, precision, and Cohen’s Kappa coefficient. The difficulty with object detection is that these classical metrics cannot be applied directly, because the aim is to find the position of objects in addition to classifying them. In this article, we will look at Mean Average Precision (mAP), a performance metric used for object detection problems. The logic behind this metric will be explained in the following order:

- What is the object recognition task?
- What is the Jaccard index?
- What is ground truth?
- What is Intersection over Union?
- How are Precision, Average Precision, and Mean Average Precision calculated?


**Object Recognition**

**Difference Between Object Classification and Object Recognition**

Let’s explain the difference between object classification and recognition with the following visual.

**How Object Recognition Works**

Network architectures used for object recognition have a structure called a multi-output network. In this structure, the **deep learning** model has multiple outputs (for object recognition, the object class and its location). Each output has its own loss function, and these loss functions are optimized simultaneously. Below you will find an image of the architecture of a three-dimensional object recognition network.

As you can see in the image, the model has two outputs: a Classifier Head and a Regressor Head. The Classifier Head tries to find the type of the object and uses cross-entropy as its loss function. The Regressor Head tries to find the corner coordinates of the rectangle that surrounds the object and uses mean squared error as its loss function.
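The idea of optimizing both heads at once can be sketched as a single combined loss. The following is a minimal illustration in plain Python, not the training code of any particular architecture: the function names, the `box_weight` parameter, and the sample values are assumptions made for this example.

```python
import math


def cross_entropy(class_probs, true_class):
    """Cross-entropy loss for one sample, given predicted class probabilities."""
    return -math.log(class_probs[true_class])


def mean_squared_error(pred_box, true_box):
    """Mean squared error between predicted and true box coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred_box, true_box)) / len(true_box)


def detection_loss(class_probs, true_class, pred_box, true_box, box_weight=1.0):
    """Sum of the classifier loss and the (weighted) regressor loss.

    box_weight is a hypothetical knob for balancing the two terms.
    """
    classifier_loss = cross_entropy(class_probs, true_class)
    regressor_loss = mean_squared_error(pred_box, true_box)
    return classifier_loss + box_weight * regressor_loss


# Hypothetical sample: 3 classes, boxes given as (x1, y1, x2, y2).
loss = detection_loss(
    class_probs=[0.7, 0.2, 0.1],
    true_class=0,
    pred_box=[10, 12, 50, 48],
    true_box=[12, 12, 50, 50],
)
print(loss)
```

In a real network both terms would be computed over batches by the deep learning framework, and minimizing their sum is what updates both heads simultaneously.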

**What is Used for Object Recognition?**

The most prominent of the architectures used for the object recognition task is YOLO.

**What is the Jaccard Index?**

**Jaccard Index for Sets**

The Jaccard index of two sets is the size of their intersection divided by the size of their union.
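As a minimal sketch, the Jaccard index can be computed with Python's built-in set operations. The function name and the example sets are illustrative, and returning 1.0 for two empty sets is an assumed convention.

```python
def jaccard_index(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Two elements in common (3 and 4) out of six distinct elements overall.
similarity = jaccard_index({1, 2, 3, 4}, {3, 4, 5, 6})
print(similarity)
```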

**What is Ground Truth?**

In **machine learning** tasks, a model is used to make predictions. The predictive ability of the model is measured on a test dataset. The target values that the model should produce for the test dataset are called the ground truth. Let’s look at the difference between the prediction of an object recognition model and the ground truth in the example image.

**What is Intersection over Union (IoU)?**

The similarity between the ground truth and the model prediction is measured with the Jaccard index. Accordingly, IoU is calculated as the area of the intersection of the two rectangles divided by the area of their union.
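For axis-aligned rectangles, this ratio can be computed directly from the corner coordinates. The sketch below assumes boxes are given as `(x1, y1, x2, y2)` with `x1 < x2` and `y1 < y2`. That format, the function name, and the sample boxes are assumptions for illustration.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero if the boxes do not overlap.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


ground_truth_box = (0, 0, 10, 10)
predicted_box = (5, 5, 15, 15)
iou = box_iou(ground_truth_box, predicted_box)  # intersection 25, union 175
print(iou)
```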

**NumPy**

Let’s compute the IoU similarity for a two-dimensional scenario. We will use NumPy for the computation and Matplotlib for visualization. First, we will create blank image matrices with NumPy and place two rectangles in them. Then we will use logical operations to find the intersection and union of these rectangles. Finally, we will visualize the results with the imshow function of the Matplotlib library.
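The steps above can be sketched roughly as follows. The canvas size and rectangle positions are arbitrary values chosen for this illustration, and the plotting step is left out so that the snippet depends only on NumPy.

```python
import numpy as np

# Blank boolean "images"; the 100x100 size is an arbitrary choice.
canvas_shape = (100, 100)
ground_truth = np.zeros(canvas_shape, dtype=bool)
prediction = np.zeros(canvas_shape, dtype=bool)

# Fill one rectangle into each image (hypothetical positions).
ground_truth[20:60, 20:60] = True  # 40 x 40 pixels
prediction[30:70, 30:70] = True    # 40 x 40 pixels, shifted by 10

# Pixel-wise logical operations give the intersection and the union.
intersection = np.logical_and(ground_truth, prediction)
union = np.logical_or(ground_truth, prediction)

# IoU is the ratio of the two pixel counts.
iou = intersection.sum() / union.sum()
print(iou)
```

Each of the four arrays (`ground_truth`, `prediction`, `intersection`, `union`) can then be passed to Matplotlib's `imshow` to display it as an image.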

**Calculating Mean Average Precision**

**Precision and Recall**

Two of the performance criteria for **machine learning** models are precision and recall, each defined as a ratio of counts of correct and incorrect predictions.
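In their standard form, with $TP$, $FP$, and $FN$ denoting the numbers of true positives, false positives, and false negatives for a class, the two quantities can be written as:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
$$

Precision is the fraction of predicted positives that are correct, and recall is the fraction of actual positives that the model finds.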

These two quantities generally trade off against each other: as recall increases, precision tends to decrease.

Let’s calculate precision and recall for an example multi-class classification problem. For this example, let’s use the Iris dataset.
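As a minimal sketch of the per-class calculation, the snippet below counts true positives, false positives, and false negatives for each class in a one-vs-rest fashion. The label lists are small hypothetical values with three classes (like Iris), not actual Iris predictions, and the function name is illustrative.

```python
def precision_recall_per_class(y_true, y_pred, classes):
    """Return {class: (precision, recall)}, treating each class one-vs-rest."""
    results = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        # Assumed convention: a metric with an empty denominator is 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        results[c] = (precision, recall)
    return results


# Hypothetical labels for three classes (0, 1, 2).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2, 2, 1]

metrics = precision_recall_per_class(y_true, y_pred, classes=[0, 1, 2])
for cls, (precision, recall) in metrics.items():
    print(cls, precision, recall)
```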

**Precision-Recall Curve (P / R Curve)**

Let’s calculate the precision-recall curve for each of the three classes in this dataset and draw the curves.
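One way to obtain the points of such a curve for a single class is to sort the samples by the model's confidence score and recompute precision and recall after each sample. The following is a simplified sketch with hypothetical labels and scores. It assumes at least one positive sample and does not treat tied scores specially.

```python
def precision_recall_points(labels, scores):
    """Precision and recall after each sample, in order of descending score.

    labels: 1 if the sample belongs to the class of interest, else 0.
    scores: the model's confidence that the sample belongs to that class.
    Assumes at least one positive label.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_positives)
    return precisions, recalls


# Hypothetical one-vs-rest labels and confidence scores for one class.
labels = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.2]

precisions, recalls = precision_recall_points(labels, scores)
for p, r in zip(precisions, recalls):
    print(r, p)
```

Plotting `recalls` on the x-axis against `precisions` on the y-axis, once per class, gives the P/R curves.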

The precision-recall graph for the three classes is shown above. Classification is ideal for the blue class (AP: 1.0). The green class has higher precision at all recall levels except the last ones.

**Average Precision**

Let’s calculate AP. For each class, we will average the precision values across the recall levels.
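One common way to define this average is a step-wise sum: each time recall increases, the precision at that point is weighted by the size of the recall step. The sketch below implements that definition for a single class with hypothetical labels and scores. It is an illustration under those assumptions, not necessarily the exact procedure that produced the values shown below, and it ignores tied scores.

```python
def average_precision(labels, scores):
    """Step-wise AP: sum of (recall step) * (precision at that step).

    labels: 1 if the sample belongs to the class of interest, else 0.
    scores: the model's confidence for that class.
    Assumes at least one positive label.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_positives = sum(labels)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            precision_at_rank = tp / rank
            # Recall grows by 1 / total_positives at each positive sample.
            ap += precision_at_rank / total_positives
    return ap


# Hypothetical labels and scores: positives are ranked 1st, 2nd, and 4th.
ap = average_precision([1, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6, 0.2])
print(ap)
```

A class whose positives are all ranked above its negatives gets an AP of 1.0, which corresponds to an ideal P/R curve.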

**{0: '1.0', 1: '0.963835205261813', 2: '0.9721435876607143'}**

**Mean Average Precision (mAP)**

We will calculate the mean of the AP values of all classes.
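As a minimal sketch, the mean can be taken directly over the per-class AP values reported above. The variable names are illustrative.

```python
# Per-class AP values taken from the Average Precision section above.
ap_per_class = {0: 1.0, 1: 0.963835205261813, 2: 0.9721435876607143}

# mAP is the arithmetic mean of the per-class AP values.
mean_ap = sum(ap_per_class.values()) / len(ap_per_class)
print(mean_ap)
```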

0.9786595976408424

We are at the end of this article.