How Do Computers Use And Understand Images

Blog by Oluwasegun Oke

How a computer perceives and interprets images is the subject of a branch of computer science whose roots reach deep into fully recognizing each item and identifying its orientation, by taking into consideration its internal lines, exterior edges, geometry, and texture, alongside the other areas that make up its profile. This involves reading images as numbers, usually arranged in a two-dimensional (grayscale) or three-dimensional (color) matrix.

In addition to this, each number in these matrices ranges from 0 to 255.
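
To make this concrete, here is a minimal sketch using NumPy (the pixel values are made up for illustration) of a grayscale image as a two-dimensional matrix and a color image as a three-dimensional one:

    import numpy as np

    # A 2x3 grayscale image: one 2-D matrix, each entry an intensity from 0 to 255.
    grayscale = np.array([[  0, 128, 255],
                          [ 64, 192,  32]], dtype=np.uint8)

    # A 2x3 color image: a 3-D matrix with a Red, Green, and Blue value per pixel.
    color = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
                      [[255, 255, 0], [0, 255, 255], [255, 0, 255]]], dtype=np.uint8)

    print(grayscale.shape)  # (2, 3)    -> height x width
    print(color.shape)      # (2, 3, 3) -> height x width x RGB channels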

As computers presently lack the ability to view objects stored on them directly, as humans do, they instead scan images into layers of binary numbers. Over the years, algorithmic models have been trained on these numbers, and are regularly updated with profiles of new images, so that through deep learning the computer can conclusively decipher the identity of such objects. The graphics adapter then uses a variety of features, beginning from the left-hand corner of the image's rectangular grid of pixels, to process, integrate, and display the learned image on your screen.

Going further into how they are stored on the computer, grayscale and color images truly possess different characteristics: a grayscale pixel may simply be 'white' or 'not white', while color images are commonly depicted with Red, Green, and Blue (RGB) values.
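
A short sketch of the difference, again with NumPy: a binary 'white or not white' view of a grayscale image next to the three separate Red, Green, and Blue channels of a color image. The threshold of 128 is an arbitrary choice for illustration:

    import numpy as np

    gray = np.array([[10, 200], [255, 90]], dtype=np.uint8)

    # Binary view: each pixel is either 'white' (True) or 'not white' (False).
    is_white = gray >= 128   # the 128 cutoff is arbitrary, for illustration only

    rgb = np.zeros((2, 2, 3), dtype=np.uint8)
    rgb[..., 0] = 255        # fill the Red channel

    red, green, blue = rgb[..., 0], rgb[..., 1], rgb[..., 2]  # one 2-D matrix per channel
    print(is_white)
    print(red.shape, green.shape, blue.shape)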

For instance, a toddler can view the scenery of his immediate environment, differentiate all the items within it, and identify each in its entirety. And if there are aspects he had previously not taken note of, due to oversight, he is able to build interest and direct his focus on them, with the help of the light-sensitive cells on his retina, which capture images at varying levels of resolution. Let's take a dog as a definite example. Dogs come in different sizes, shapes, names, and other features. Suppose that, in this case, the toddler had only ever been exposed to bulldogs. With only that perspective of what a dog is, he is then taken to a dog show and crosses paths with many other dog breeds.

Consider now this sudden inflow of a large amount of image data, which the above toddler must process all at once.

Even at such a tender age, the relevant regions of his brain have developed well enough to process such a huge amount of data and fully comprehend each distinguishing boundary. And if curious enough, he may well ask what classification these new breeds of dogs fall within. In the same way, a computer is able to handle a large number of images and videos, which pass through much the same process. It is also of great importance to note that deep learning, computer vision, and artificial intelligence all help in combining every variable: the object is first viewed through a sensor connected to the computer vision system, and the resulting image is then passed through a process that breaks it into its smallest indivisible units, known as pixels. What's more, in the study of the digital storage of images, pixels are primarily assigned three colors: Red, Green, and Blue (RGB).
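
As a sketch of that pixel-level view, the Pillow library can open an image file and expose the RGB values of individual pixels; the file name photo.jpg below is a placeholder:

    from PIL import Image

    img = Image.open("photo.jpg").convert("RGB")  # "photo.jpg" is a placeholder path

    # Every pixel is addressable by its (x, y) coordinate and yields an (R, G, B) triple.
    r, g, b = img.getpixel((0, 0))
    print(f"Top-left pixel: R={r}, G={g}, B={b}")

    # The image is just width * height of these pixel triples.
    print(img.width * img.height, "pixels in total")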

These pixels represent the units of the image in focus, and their properties and orientation are consequently used to extract, analyze, learn, train, and detect the presence of a match, usually from a wide variety of possible identities. In other words, the millions of images a particular computer has stored and learned from determine its level of effectiveness, usefulness, reliability, and indeed its capacity to respond to diverse categories of queries about broadly defined concepts.

How Images are Brought up on a Computer Screen (The General Concept)

The first step in this process is the retrieval of the image file from the disk, which the application makes possible by locating, opening, and reading the file. At this stage, a BMP image file is uncompressed, while PNG, JPG, and GIF files are compressed, in order to concentrate the critical attributes of the image data. After this, decompression follows, with the help of a codec (coder-decoder), but only if the file had already been compressed. Once invoked, the codec applies its built-in decoding scheme and returns the corresponding number of bytes.

Consequently, four variables must be integrated by the application in order to display the image: the decompressed data, the image width, the image height, and the pixel format. The width and height of the image determine the size of the output image. Next, each pixel defined in the image data is read and converted into the native bit depth, applying the pixel format from the image header; the result is then copied into the right compartment of video memory, from which the output data is displayed.
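
Those four variables can be inspected with a short Pillow-based sketch: opening a compressed file invokes the codec, after which the decompressed pixel data, width, height, and pixel format are all available. The file name picture.png is a placeholder:

    from PIL import Image

    img = Image.open("picture.png")   # placeholder path; PNG is stored compressed

    decoded = img.tobytes()           # decompressed pixel data, ready for display
    width, height = img.size          # image width and image height
    pixel_format = img.mode           # e.g. "RGB" or "RGBA" - the pixel format

    print(width, height, pixel_format, len(decoded), "bytes of raw pixel data")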

How Images are Stored on a Computer

Knowing that computer systems store images by reading them as '0s' and '1s' provides insightful details about the processes involved, and helps us derive fully optimized conversion scenarios. As mentioned in the previous sections, pixels are the smallest elements of any image, and their number can be ascertained by multiplying the height and width of the picture (pixels = width × height).

Therefore, a picture with a resolution of 1500 (height) × 850 (width) has 1500 × 850 = 1,275,000 pixels. And on balance, a computer's depiction of the value of a given image from its metadata, namely resolution and color depth, aids in accurate interpretation. This entails finding where each pixel's color falls and representing it with a corresponding value. Additionally, a monochrome (black-and-white) image requires 1 bit to represent each pixel: 0 depicts white, while 1 represents black. Similarly, multiple bits are applied in representing colored images. This brings us to color depth, which is the number of bits assigned to each pixel's color.
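
The arithmetic is easy to verify in a few lines of Python; the storage figure below assumes 1 bit per pixel, as in the monochrome case described above:

    width, height = 850, 1500            # resolution from the example above
    pixels = width * height
    print(pixels)                        # 1275000

    # Monochrome storage at 1 bit per pixel, converted to bytes.
    monochrome_bits = pixels * 1
    print(monochrome_bits / 8, "bytes")  # 159375.0 bytes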

Therefore, if an image contains 32 colors, then its color depth is 5 bits. The greater the color depth, the broader the list of colors and associated shades. Some of the most widely applied color depths include 8-bit (2^8 = 256 colors), 16-bit (2^16 = 65,536 colors), and 24-bit (2^24 ≈ 16.7 million colors).

As an illustration, the properties of an image using 32 colors are as follows:

(2^5 = 32 colors)

Color depth = 5 bits

00000 = White;   00001 = Black;   00010 = Yellow;   00100 = Blue

Similarly, using a color depth of seven bits gives:

(2^7 = 128 colors)

Color depth = 7 bits

0000000 = White;   0000001 = Black;   0000010 = Yellow;   0000100 = Blue
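
A minimal sketch of how such a color table could be modeled in code; the bit patterns and color names mirror the examples above, and the mapping is purely illustrative rather than any real file format's palette:

    # Number of representable colors for a given color depth in bits.
    def colors_for_depth(bits: int) -> int:
        return 2 ** bits

    print(colors_for_depth(5))   # 32
    print(colors_for_depth(7))   # 128

    # Illustrative 5-bit palette, mirroring the example above.
    palette_5bit = {
        "00000": "White",
        "00001": "Black",
        "00010": "Yellow",
        "00100": "Blue",
    }
    print(palette_5bit["00010"])  # Yellow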

Convolutional Neural Networks

Convolutional neural networks (CNNs) are deep learning algorithms used in processing images: a file is imported from disk, and the resulting output is placed into memory. A CNN is composed of input, output, and hidden layers, and its design draws on the spatial organization of biological models described in the visual cortex. The hidden layers consist of tens or hundreds of ReLU, convolutional, pooling, and other deeply integrated layers, which can detect patterns and extract features from a variety of images. This outlook proves fully effective in understanding the three-dimensional structure of color images, allowing encoder and decoder methods to be introduced and thereby removing the need to perform manual methods of extracting image features.
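
As a minimal sketch of that layered structure, here is a small convolutional network written with PyTorch; the layer sizes and the 32 × 32 input are arbitrary choices for illustration, not a prescribed architecture:

    import torch
    import torch.nn as nn

    # A tiny CNN: convolutional, ReLU, and pooling layers feed a classifier head.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable filters over RGB input
        nn.ReLU(),                                   # removes linearity
        nn.MaxPool2d(2),                             # pooling halves height and width
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),                                # spread feature maps into a vector
        nn.Linear(32 * 8 * 8, 10),                   # assumes 32x32 input, 10 classes
    )

    x = torch.randn(1, 3, 32, 32)  # one dummy 32x32 RGB image
    print(model(x).shape)          # torch.Size([1, 10])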

The encoder increases an image's features (its depth) by decreasing both its height and width. The decoder, on the other hand, increases the size back to a regulated proportion, while the depth remains intact (compressed), by utilizing transposed convolution techniques. Convolutional algorithms are thereby applied to learn the features of different images without help from hand-made filters, which happens to be the case for other methods. The following are the steps in learning the features of a non-linear image and training new algorithm models, utilizing large-scale image-processing layers:
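
Here is a minimal PyTorch sketch of that encoder-decoder shape: strided convolutions shrink height and width while the channel depth grows, and transposed convolutions grow the spatial size back. All layer sizes are illustrative:

    import torch
    import torch.nn as nn

    # Encoder: depth (channels) grows while height and width shrink.
    encoder = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # 64x64 -> 32x32
        nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
        nn.ReLU(),
    )

    # Decoder: transposed convolutions grow height and width back.
    decoder = nn.Sequential(
        nn.ConvTranspose2d(64, 32, kernel_size=2, stride=2),    # 16x16 -> 32x32
        nn.ReLU(),
        nn.ConvTranspose2d(32, 3, kernel_size=2, stride=2),     # 32x32 -> 64x64
    )

    x = torch.randn(1, 3, 64, 64)
    print(decoder(encoder(x)).shape)  # torch.Size([1, 3, 64, 64])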

  • Begin by inputting an image into the interface
  • Apply filters to create the appropriate layering (feature maps)
  • Apply a ReLU function to remove embedded linearity
  • Introduce a pooling layer for every feature map
  • Spread out the pooled feature maps into a thin, long vector
  • Feed the vector into an optimally integrated artificial neural network
  • Set processing in motion through manipulation of the learned features within the network
  • Train models utilizing appropriate forward propagation and backpropagation, as sketched below. This process is fully automated and repeated to bring to life a fully detailed and scalable neural network, including feature detectors and trained models.
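
The forward propagation and backpropagation in the final step can be sketched as a short training loop; the model, optimizer settings, and random stand-in data below are illustrative placeholders:

    import torch
    import torch.nn as nn

    model = nn.Sequential(  # stand-in for the CNN sketched earlier
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(16 * 16 * 16, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    images = torch.randn(8, 3, 32, 32)     # random stand-in batch of images
    labels = torch.randint(0, 10, (8,))    # random stand-in class labels

    for epoch in range(5):
        optimizer.zero_grad()
        outputs = model(images)            # forward pass
        loss = loss_fn(outputs, labels)
        loss.backward()                    # backpropagation
        optimizer.step()                   # update the learned features
        print(epoch, loss.item())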