Master Thesis By: **Zappia Enrico**

Thesis Title:

**MONOCULAR OBJECT LOCALIZATION BY SUPERQUADRICS CURVATURE REPROJECTION AND MATCHING**

This work presents a new method for 3D object localization from a single image. Single-camera images provide only 2D data, discarding valuable 3D information about the object and its location in space.

To retrieve the localization of an object, I used an a-priori known model for its representation. By moving this model in space, it is possible to find the pose that best fits the image of the object.

As the a-priori model I used a SuperQuadrics representation of the objects. The main novel idea of this work is to match the object's image gradient with the reprojection of the 3D curvature onto the image plane, in order to retrieve the object's position relative to the camera. SuperQuadrics (SQ) allow an analytical formulation of curvature. The image processing stage includes the separation of the object of interest from the background by the histogram of oriented gradients (HOG) algorithm.

The proposed method relates the SQ curvature to the image gradient, taking into account illumination conditions and contours, all embedded into a cost function minimized by particle swarm optimization (PSO), whose output is the estimated object location.

Figure 1 The flowchart of the object localization algorithm from static images.

The proposed localization algorithm is a multi-step process.

It starts with object detection in the static image by HOG. The Histogram of Oriented Gradients (HOG) is one of the most recent and effective object detection algorithms. It uses the distribution of intensity gradients to recognize a known object. The algorithm focuses on finding robust feature descriptors, maintaining invariance to a wide variety of articulated poses, background complexity, illumination conditions and body scaling. HOG processes a sample image by comparing it with an inner object model. The model is obtained through a training session with several different images of the object from different points of view. All data are collected in a support vector machine (SVM); HOG then compares the sample image with its model and, in case of a positive result, returns the localization of the target in the image plane.
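As an illustration of the HOG building block, the following minimal numpy sketch computes the orientation histogram of a single cell. The full detector additionally tiles the image into cells, normalizes over blocks, and feeds the concatenated descriptor to the trained SVM; the parameter values here (9 unsigned orientation bins, 8x8 cell) are common defaults, not necessarily those used in this work.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Orientation histogram of one cell: the elementary HOG descriptor unit."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation in [0, 180)
    bins = (ang / (180.0 / n_bins)).astype(int) % n_bins
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m                                # magnitude-weighted vote
    return hist / (np.linalg.norm(hist) + 1e-9)     # block-style L2 normalization

# A vertical step edge concentrates all energy in the 0-degree bin,
# because the gradient of a vertical edge points horizontally.
patch = np.zeros((8, 8))
patch[:, 4:] = 1.0
h = hog_cell_histogram(patch)
```

A real pipeline would slide a detection window over the image, build this histogram for every cell, and classify each window descriptor with the SVM trained on the object views.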

Figure 2 A set of training images (for a cylinder and a box) and the final results of HOG detection (the red rectangle surrounding the objects in the test images).

From the localization of the object in the image space, a first pose of the object is estimated. To evaluate the guessed position, the SQ object model is used. The advantages of this formulation are its compactness and the fact that the shapes are known in closed form, so each surface point and its properties, such as curvature, can be computed analytically.
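For reference, the standard superquadric parametrization and its implicit inside-outside function can be sketched in a few lines of numpy; the axis sizes a1..a3 and shape exponents e1, e2 follow the usual SQ convention, and the symbol names are illustrative rather than taken from the thesis.

```python
import numpy as np

def sq_surface(a, e, eta, omega):
    """Parametric superquadric surface point.
    a = (a1, a2, a3) axis sizes, e = (e1, e2) shape exponents,
    eta/omega = latitude/longitude angles."""
    f = lambda w, m: np.sign(np.cos(w)) * np.abs(np.cos(w)) ** m  # signed cos power
    g = lambda w, m: np.sign(np.sin(w)) * np.abs(np.sin(w)) ** m  # signed sin power
    x = a[0] * f(eta, e[0]) * f(omega, e[1])
    y = a[1] * f(eta, e[0]) * g(omega, e[1])
    z = a[2] * g(eta, e[0])
    return np.array([x, y, z])

def sq_inside_outside(a, e, p):
    """Implicit inside-outside function: F = 1 exactly on the surface,
    F < 1 inside, F > 1 outside."""
    x, y, z = np.abs(p)
    return ((x / a[0]) ** (2 / e[1]) + (y / a[1]) ** (2 / e[1])) ** (e[1] / e[0]) \
           + (z / a[2]) ** (2 / e[0])

# A point generated on the surface satisfies F = 1 for any exponents.
p = sq_surface((1.0, 0.8, 1.2), (0.5, 1.3), eta=0.3, omega=0.7)
```

The closed form is what makes the thesis approach possible: curvature and surface normals can be derived analytically from these expressions at every point.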

Figure 3 On the left, the graphic representation of the SuperQuadric models; on the right, the mapping of curvature, where brighter values represent higher curvature.

It is well known that strong variations in the image gradient are located along edges and corners. Edge-based approaches for object recognition and localization focus only on these strong responses in the image gradient; small variations along the surface are ignored. For this reason, such methods become unstable when borders are partially occluded.

As already noted, the SQ model formulation yields continuous surfaces, so edges and corners are modelled as local regions of high surface curvature. In line with edge-based methods, this high curvature can be related to high gradient intensity; moreover, it can be observed that low curvature can likewise be connected with low gradient magnitude. In this way we obtain feedback for both high and low features.
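One simple way to turn this observation into a score is a normalized cross-correlation between the reprojected curvature map and the image gradient magnitude, which rewards agreement in both the high and the low ranges. This is an illustrative stand-in, not the exact cost term used in the thesis.

```python
import numpy as np

def curvature_gradient_score(curv_map, grad_mag):
    """Normalized cross-correlation between a reprojected curvature map and the
    image gradient magnitude. Standardizing both maps means high-curvature /
    high-gradient regions AND low-curvature / low-gradient regions both raise
    the score; score is ~1 for a perfect match, ~-1 for anti-correlation."""
    c = (curv_map - curv_map.mean()) / (curv_map.std() + 1e-9)
    g = (grad_mag - grad_mag.mean()) / (grad_mag.std() + 1e-9)
    return float(np.mean(c * g))

# Sanity check on a synthetic map: perfect and inverted matches.
rng = np.random.default_rng(0)
curv = rng.random((32, 32))
s_match = curvature_gradient_score(curv, curv)
s_anti = curvature_gradient_score(curv, -curv)
```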

Figure 4 Comparison between normalized image gradient and curvature.

If the position of the light source with respect to the model is known, appearance information such as bright and dark sides can be gathered. These data can be used to improve the curvature matching and provide a more robust approach to object localization.
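The bright/dark-side prediction can be sketched with the classic Phong reflection model, evaluated per surface point from the SQ normals; the coefficients below (ambient, diffuse, specular, shininess) are hypothetical placeholders rather than values tuned in this work.

```python
import numpy as np

def phong_intensity(n, l, v, ka=0.1, kd=0.7, ks=0.2, shininess=16):
    """Phong reflection at one surface point.
    n: surface normal, l: direction towards the light, v: direction towards
    the viewer (all normalized internally)."""
    n, l, v = (np.asarray(u, float) / np.linalg.norm(u) for u in (n, l, v))
    diff = max(np.dot(n, l), 0.0)            # Lambertian diffuse term
    r = 2.0 * np.dot(n, l) * n - l           # mirror reflection of the light dir
    spec = max(np.dot(r, v), 0.0) ** shininess if diff > 0 else 0.0
    return ka + kd * diff + ks * spec

# Frontal lighting gives full intensity; a back-lit point falls to ambient only.
front = phong_intensity([0, 0, 1], [0, 0, 1], [0, 0, 1])
back = phong_intensity([0, 0, 1], [0, 0, -1], [0, 0, 1])
```

Rendering this intensity over the reprojected model produces a synthetic shading image that can be compared against the grey levels of the camera image.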

Figure 5 The comparison between greyscale images and the illumination image generated with Phong's model.

Another way to improve the matching between the static image and the reprojected model is to check the alignment of the normals projected onto the image plane with the orientation of the gradient at each pixel.

It is well known that along edges the gradient magnitude is high and its orientation is orthogonal to the border. Normal directions are 3D data; however, projecting them onto the image plane yields 2D directions. What has been observed in this work is that, along the silhouette of the object, these projected directions have the same orientation as the image gradient. By counting the aligned vectors, it is possible to compute a performance index of the matching and thus better estimate the overlap.
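The counting of aligned vectors can be sketched as the fraction of silhouette points whose projected normal is, up to sign, parallel to the local image gradient; the cosine tolerance threshold below is an illustrative choice, not a value from the thesis.

```python
import numpy as np

def alignment_score(normals_2d, gradients_2d, cos_tol=0.95):
    """Fraction of silhouette points whose projected 3D normal is aligned with
    the local image gradient. Inputs: (N, 2) arrays of 2D vectors sampled at
    the same silhouette pixels."""
    n = normals_2d / np.linalg.norm(normals_2d, axis=1, keepdims=True)
    g = gradients_2d / np.linalg.norm(gradients_2d, axis=1, keepdims=True)
    cos = np.abs(np.sum(n * g, axis=1))  # |cos| ignores inward/outward sign
    return float(np.mean(cos > cos_tol))

# Two silhouette points: gradients parallel to the normals score 1.0,
# gradients orthogonal to them score 0.0.
normals = np.array([[1.0, 0.0], [0.0, 1.0]])
s_good = alignment_score(normals, np.array([[2.0, 0.0], [0.0, -3.0]]))
s_bad = alignment_score(normals, np.array([[0.0, 1.0], [1.0, 0.0]]))
```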

Figure 6 The silhouette extraction of the model (left); the normal attached to each silhouette point (center); and the final reprojection onto the image plane (right). Blue arrows are the normals reprojected on the image plane; green arrows are the image gradient.

All the contributions described above are combined into a cost function. Due to the complexity of the problem, the minimization is carried out with Particle Swarm Optimization (PSO). The goal of the optimization is to obtain the transformation matrix from the camera to the 3D object position.
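A minimal global-best PSO can be sketched as follows; the swarm size, inertia and acceleration coefficients are common textbook defaults, and the quadratic toy cost merely stands in for the real image-based cost evaluated on the 6-DoF pose.

```python
import numpy as np

def pso_minimize(cost, dim, n_particles=30, iters=150, bounds=(-1.0, 1.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimizer."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))     # particle positions
    v = np.zeros_like(x)                            # particle velocities
    pbest = x.copy()                                # personal bests
    pbest_cost = np.array([cost(p) for p in x])
    g = pbest[pbest_cost.argmin()].copy()           # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        c = np.array([cost(p) for p in x])
        better = c < pbest_cost
        pbest[better], pbest_cost[better] = x[better], c[better]
        g = pbest[pbest_cost.argmin()].copy()
    return g, float(pbest_cost.min())

# Toy cost: recover a hidden 6-vector "pose" by minimizing squared distance.
target = np.array([0.3, -0.2, 0.5, 0.1, 0.0, -0.4])
pose, best = pso_minimize(lambda p: float(np.sum((p - target) ** 2)), dim=6)
```

PSO is attractive here because the image-based cost is non-differentiable and multi-modal, so gradient-based optimizers are not directly applicable.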

Experiments were performed to prove detection correctness and to provide a quantitative estimation of the localization accuracy under different conditions. We also propose a comparison between our method and state-of-the-art algorithms that use an RGB-D camera; for this we chose the Microsoft Kinect sensor for its high quality-to-cost ratio.

The setup for our tests is composed of an RGB camera and a Kinect sensor (see Figure 7). The camera is also equipped with a spotlight in a known position. Both sensors are calibrated with respect to the chessboard (origin at the top-left corner). This calibration provides the intrinsic and extrinsic parameters and gives us information about the plane on which the chessboard lies.
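The calibration yields an intrinsic matrix K and the extrinsics (R, t) that place the chessboard frame in the camera frame; the projection used throughout the pipeline is the standard pinhole model, sketched below with hypothetical intrinsics and a camera placed 1 m in front of the chessboard plane.

```python
import numpy as np

def project_points(K, R, t, pts3d):
    """Pinhole projection of world points into the image.
    K: 3x3 intrinsics, R/t: world-to-camera extrinsics, pts3d: (N, 3)."""
    pc = (R @ pts3d.T).T + t           # world -> camera frame (extrinsics)
    uvw = (K @ pc.T).T                 # camera frame -> homogeneous image coords
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide -> pixel coords

# Hypothetical intrinsics (f = 800 px, principal point at image centre)
# and a camera looking straight at the chessboard plane z = 0 from 1 m away.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 1.0])
corners = np.array([[0.0, 0.0, 0.0],
                    [0.1, 0.0, 0.0]])  # two grid corners, 10 cm apart
uv = project_points(K, R, t, corners)
```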

Figure 7 Experiment setup layout, data are acquired both from the camera and Kinect. Chessboard is used to estimate camera and Kinect positions in order to provide the reference frame for the localization.

From the results obtained, it is apparent that Kinect localization and the proposed method provide comparable results. To examine the magnitude of the errors for each localization in more detail, it is also useful to look at the error distributions with their uncertainty ellipses.

Figure 8 This plot shows the distribution of the error along x and y. The graph highlights the directions of maximum dispersion of the data. The size of each ellipse is directly related to the uncertainty of the estimates: blue is the camera localization, green is the Kinect's.

The main uncertainty results (eigenvalues of the ellipses) are shown in the table:
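These eigenvalues come from the eigen-decomposition of the xy-error covariance matrix; the following sketch computes them from synthetic error samples whose standard deviations are chosen arbitrarily for illustration.

```python
import numpy as np

def uncertainty_ellipse(errors_xy):
    """Axes of the uncertainty ellipse of 2D error samples.
    Returns the covariance eigenvalues (major axis first) and the
    corresponding eigenvectors (principal dispersion directions) as columns."""
    cov = np.cov(errors_xy, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: ascending eigenvalues
    return eigvals[::-1], eigvecs[:, ::-1]   # reorder: major axis first

# Synthetic localization errors: std 2.0 along x, std 0.5 along y.
rng = np.random.default_rng(1)
errs = rng.normal(0.0, [2.0, 0.5], size=(5000, 2))
vals, vecs = uncertainty_ellipse(errs)
```

The square roots of the eigenvalues are the standard deviations along the principal directions, i.e. the semi-axes of the 1-sigma ellipse plotted in Figure 8.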

This work presents an innovative approach to monocular object localization. Object detection in the image is performed with the HOG algorithm in a pre-processing stage. The 3D localization is then carried out by minimizing a cost function that considers: a) the matching of the object's reprojected curvature with the image gradient; b) the convolution of the object-light appearance with the image grey levels; c) the count of reprojected contour normals aligned with the image gradients.

The 3D pose estimation accuracy has been quantified with a calibration grid, giving the eigenvalues of the uncertainty ellipses that identify the dispersion of the data along the principal directions. These data are also compared with the Microsoft Kinect. Our method achieves a comparable, and in some cases even better, object localization accuracy using only a single camera. The algorithm is competitive in cluttered scenes, as it relies on the whole object rather than only on edges, as edge/gradient methods do. Figure 8 represents the limit of detection: for the box the percentage error is around 2-2.5%, and for the cylinder around 1.5%.

Figure 9 In (a), some localization results in general locations are presented; in (b), the same is shown for cylinders. In the second picture a partially occluded scene is proposed, while the localization remains correct.