The Science Behind Camera Calibration in Machine Vision

Machine vision refers to the processes computer systems perform to capture and analyze images in order to generate information about their surroundings. This allows them to detect and measure objects in the real world, which is necessary for tasks such as 3D scene reconstruction, robot localization and navigation, inspection of industrial processes, and more.

However, images captured by vision systems have a 2D coordinate frame measured in pixels, while the 3D location of objects in the world is described in units such as inches or millimeters. The pinhole camera model links these two coordinate frames by allowing us to mathematically represent how the position of real-world points maps to their projection in image space.

Pinhole Camera Model

The pinhole camera model is a simplification of the way images are acquired in a digital camera system. It describes a camera without a lens and with a small aperture, or pinhole, through which light rays pass, projecting an inverted image on the opposite side.
[Figure: image formation through a pinhole]
Real-world cameras don't have an infinitesimally small aperture but use lenses that refract light, focusing rays parallel to the optical axis onto a single point, the focal point. The distance between the focal point and the center of the lens is known as the focal length.
[Figure: lens geometry and focal length]
This creates a construction similar to the pinhole model, which establishes a relationship between a point in 3D space and its corresponding point in the 2D image plane.

Intrinsic and Extrinsic Parameters

This relation is represented by two geometric transformations: one associated with the intrinsic parameters of the camera system, which describe the image sensor and lens, and one associated with the extrinsic parameters, which describe the camera's location in the 3D scene.
[Figure: the chain of transformations from world coordinates to pixel coordinates]
To describe the change of coordinates from world points to camera coordinates, we use the first transformation, which consists of a rotation R and a translation t.
[Figure: world-to-camera coordinate transformation]
In standard Cartesian coordinates, rotations are represented as matrix multiplications while translations are represented as vector additions. Using homogeneous coordinates, we can represent both as a single matrix multiplication, simplifying the equations:
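$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$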
This homogeneous transformation is formed by a 3x3 rotation matrix R and a 3x1 translation vector t, known as the extrinsic parameters. They are called extrinsic because they do not depend on the camera itself but on its pose in the scene.
The second transformation corresponds to the projection of the 3D camera coordinates onto the image plane, i.e., into 2D pixel coordinates. For this, we consider the camera's internal properties, known as the intrinsic parameters: the focal lengths (fx, fy), the optical center or principal point (cx, cy), and the skew coefficient (sk), which together form the camera matrix A (also referred to as K):
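$$
A =
\begin{bmatrix}
f_x & sk & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$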
Note: The skew coefficient is considered zero if the image axes are perpendicular.
Combining these expressions, we obtain the projective transformation that maps 3D world coordinates into 2D image coordinates:
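$$
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= A \, [\, R \mid t \,]
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$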

Where s is an arbitrary scale factor of the projective transformation.

Finally, the complete projection matrix P is a combination of both the intrinsic parameter matrix A and the extrinsic parameter matrix [R t]:
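$$
P = A \, [\, R \mid t \,]
$$

As a quick illustration, the following sketch builds P with NumPy from hypothetical intrinsic and extrinsic values and projects a single world point into pixel coordinates; all numeric values are invented for the example.

```python
import numpy as np

# Hypothetical intrinsics: focal lengths and principal point in pixels, zero skew.
fx, fy = 800.0, 800.0
cx, cy = 320.0, 240.0
A = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Hypothetical extrinsics: identity rotation, camera shifted 0.5 m along Z.
R = np.eye(3)
t = np.array([[0.0], [0.0], [0.5]])

# Complete projection matrix P = A [R | t]  (3x4)
P = A @ np.hstack((R, t))

# Project a world point given in homogeneous coordinates (meters).
X_w = np.array([0.1, -0.05, 2.0, 1.0])
s_uv = P @ X_w               # s * [u, v, 1]
u, v = s_uv[:2] / s_uv[2]    # divide out the scale factor s
print(u, v)                  # pixel coordinates of the projection
```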

Distortion

We have now studied the ideal mathematical model; however, in the real world lenses tend to introduce some distortion to images, caused by defects in their design that generate deviations from the "perfect" pinhole model. Nevertheless, distortion is an optical aberration that does not lose the information in the image but simply misplaces it geometrically. This means that we can remove distortion from an image if we account for its influence in our camera model.

Note: Lenses with larger FOVs, such as fisheye lenses, introduce extreme distortions; because of this, the pinhole camera model cannot be used with a fisheye camera.

To accurately represent a real camera system we can complete the model by including radial and tangential distortions:

  • Radial distortion: occurs when light rays bend more near the edges of a lens than at its optical center, causing straight lines to appear curved at the edges of an image. There are two types of radial distortion: pincushion (positive displacement) and barrel (negative displacement).
[Figure: pincushion and barrel distortion compared to an undistorted grid]
Radial distortion can be represented with the following expression:
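$$
\begin{aligned}
x_{\text{distorted}} &= x \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) \\
y_{\text{distorted}} &= y \,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
\end{aligned}
\qquad \text{with } r^2 = x^2 + y^2
$$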

Where x and y are the undistorted locations in normalized image coordinates (pixel coordinates translated to the optical center and divided by the focal length), and k1, k2, and k3 are the radial distortion coefficients.

Note: For real lenses, radial distortion is always monotonic, which means that the displacement consistently increases or decreases with the distance from the optical center; if the calibration algorithm produces a non-monotonic result, it should be considered a calibration error.

  • Tangential distortion: occurs when the lens and the image plane are not perfectly parallel, causing some areas of the image to look closer than others.

Tangential distortion can be represented as:
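$$
\begin{aligned}
x_{\text{distorted}} &= x + \left[\, 2 p_1 x y + p_2 (r^2 + 2 x^2) \,\right] \\
y_{\text{distorted}} &= y + \left[\, p_1 (r^2 + 2 y^2) + 2 p_2 x y \,\right]
\end{aligned}
$$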

Where p1 and p2 are the tangential distortion coefficients.

In summary, we need to calculate five distortion coefficients to remove the distortion from our image: k1, k2, k3, p1, and p2. These coefficients do not depend on the scene viewed, which is why they can also be considered intrinsic camera parameters.
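To make the model concrete, here is a minimal sketch that applies the full five-coefficient model to a point in normalized image coordinates; it is a direct transcription of the two expressions above, not a library routine.

```python
def distort_point(x, y, k1, k2, k3, p1, p2):
    """Apply radial and tangential distortion to a point (x, y)
    given in normalized image coordinates."""
    r2 = x * x + y * y                               # squared distance from the optical center
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3   # radial scaling factor
    dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)      # tangential shift in x
    dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y      # tangential shift in y
    return x * radial + dx, y * radial + dy

# Example: barrel distortion (negative k1) pulls the point toward the center.
print(distort_point(0.5, 0.5, k1=-0.2, k2=0.0, k3=0.0, p1=0.0, p2=0.0))
```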

Note: In practice, it is enough to consider only two radial distortion coefficients for calibration. However, for extreme distortion, such as when using wide-angle lenses, it is recommended to use three coefficients.
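Once the coefficients are known, removing the distortion is straightforward in OpenCV. Below is a minimal sketch, assuming a previously calibrated camera matrix and OpenCV's coefficient order (k1, k2, p1, p2, k3); the file names and numeric values are hypothetical.

```python
import cv2
import numpy as np

# Hypothetical calibration results; replace with values from your calibration.
camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.25, 0.08, 0.001, 0.0005, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread("distorted.png")        # hypothetical input image
undistorted = cv2.undistort(img, camera_matrix, dist_coeffs)
cv2.imwrite("undistorted.png", undistorted)
```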

Camera Calibration

We now have a model that can calculate the location of a real-world object from the images acquired by a camera, but to use it we need to know the camera's parameters, which might not be available. However, we do have access to the images the camera takes, which means that if we took pictures of a known object, we could estimate the camera's parameters based on the object's projection onto the image plane. This process of finding the intrinsic and extrinsic camera parameters is known as camera calibration.

To estimate the camera parameters, we need the 3D world points of an object and their corresponding 2D image points. This is why the traditional way to calibrate a camera is to use a reference object of known geometry as ground truth. This is where calibration targets come in handy, as they carry specific patterns that can be processed with image analysis techniques to easily obtain the location of pattern features. These features may correspond to the intersections of vertical and horizontal lines in a checkerboard or the centers of circles when the calibration board is a dot pattern.
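As a sketch of this workflow with OpenCV, assuming a set of checkerboard images with a 9x6 grid of inner corners and a known square size (the pattern dimensions and file names are hypothetical):

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners per row and column (hypothetical)
square_size = 25.0      # checkerboard square side in millimeters (hypothetical)

# 3D world points of the corners; the board is planar, so z = 0.
obj_template = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
obj_template[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
obj_template *= square_size

obj_points, img_points = [], []
for fname in glob.glob("calib_*.png"):              # hypothetical image set
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if not found:
        continue
    # Refine the detected corners to sub-pixel accuracy.
    corners = cv2.cornerSubPix(
        gray, corners, (11, 11), (-1, -1),
        (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001))
    obj_points.append(obj_template)
    img_points.append(corners)

# Estimate the intrinsics, distortion coefficients, and per-view extrinsics.
rms, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```

The returned rvecs and tvecs encode the extrinsic parameters [R t] for each calibration view, while camera_matrix and dist_coeffs are the intrinsic parameters discussed above.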

Why Checkerboards, Circles, and Dot Patterns Are Used for Calibration

Checkerboards and dot grids provide high contrast by alternating black and white markers (squares or circles), making feature detection robust: even under varying lighting conditions, the sharp transitions can be accurately detected, providing a high level of precision in locating feature points.

Additionally, the geometric regularity imposes constraints that algorithms can use to refine the detected points by ensuring alignment with the expected pattern. This predictability helps to identify and discard outliers (false detections): instead of treating each feature as independent, the regularity of the pattern can be exploited by global optimization techniques. Moreover, the uniform spacing of the pattern helps to identify lens distortion by analyzing how the lines or points deviate from their expected positions.

These patterns are also very easy to customize by simply changing their size or pattern density. This adaptability has made them widely used in the community, and the simplicity of their analysis has made them a standard for calibration tools, with universal support in popular computer vision libraries such as OpenCV and MATLAB. However, to achieve an accurate camera calibration, it is recommended to tailor these patterns to your specific use case using a custom pattern generator. This, combined with a high-quality printing service, ensures optimal results by balancing the flexibility of these patterns with rigorous production standards.
