The latest devices using computer vision technology and the AI training data that powers them.
Whether it’s an advanced virtual reality headset, a household robot, or an autopilot for our flying cars, a huge proportion of the dreams of science fiction requires computers to see.
A machine can’t interact with the physical world if it doesn’t know what’s in that physical world. But for decades, engineers struggled to teach computers how to extract a useful spatial understanding from two-dimensional, and eventually even three-dimensional, images.
Now, machine learning is well on the way to solving that problem by giving computers the power of sight.
The breakthrough came with the switch away from the direct insights of software engineers and toward the slow evolutionary process made possible only with neural networks. Suddenly, the job of a computer vision developer changed from designing the underlying rules of sight to building AI data sets that allow the development of those same rules through machine learning.
By leaving the development of sight to the iterative learning process and focusing instead on providing the resources needed by this process, developers suddenly found that computers could not just see, but even begin to assign some measure of understanding to what they see.
The Evolution of Computer Vision Technology
Computer vision can be used to identify not just general types of objects, but the more nuanced details and nested informational content. This extends from optical character recognition (OCR), which reinterprets the outlines of visible letters as readable text, to algorithmic lip reading, which does much the same for spoken language.
In recent years, computer vision technology has graduated from simple face detection (finding faces within an image) to the rapid facial identification of specific individuals, and much more.
Here is a selection of the latest applications of computer vision technology.
Computer Vision Technology in Sport
The sight of athletes wearing motion-capture suits to analyze precise details of their movement is becoming increasingly commonplace. This method, however, is not practical for capturing an athlete’s movement in a ‘live’ scenario, where it is not possible to wear the motion-capture suit.
This is where human pose estimation can help. Video recordings of an athlete can be used to assess performance without attaching anything to the athlete’s body.
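As a rough illustration of how pose estimation output might feed a performance assessment, the sketch below computes a joint angle from three 2D keypoints. It assumes that a pose estimation model has already produced pixel coordinates for the hip, knee, and ankle in a frame. The function name `joint_angle` and the coordinates are hypothetical and are not taken from any particular system.

```python
import math

def joint_angle(a, b, c):
    """Return the angle at point b, in degrees, between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("keypoints must not coincide with the joint")
    # Clamp to guard against floating-point drift outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Hypothetical pixel coordinates for one video frame (illustrative, not real data).
hip, knee, ankle = (100, 100), (100, 200), (200, 200)
knee_angle = joint_angle(hip, knee, ankle)
```

Tracking an angle like this across the frames of a video is one plausible way to compare an athlete’s movement between attempts, although real systems would likely work with many more keypoints and often in three dimensions.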
Computer vision is also valuable in analyzing team behavior. In the NFL, for example, computer vision has been used to track the coordinates of every player in a team from video footage of a game. This generates a wealth of data that helps coaches to assess their teams’ performance and evaluate rival teams’ patterns of play.
As well as monitoring players, computer vision technology can be used to detect specific events during a sporting occasion, such as a goal in a soccer match. This technique opens the possibility of automating the otherwise time-consuming process of producing sports highlights. By identifying and excerpting specific events, whether on the field of play or among the watching crowd, it is possible to transform a lengthy cricket match, for example, into a concise package of highlights.
Computer Vision Technology in Healthcare
The same method used to observe physical movement in sports players can be used to detect neurological and musculoskeletal issues in the wider population.
It can also be used as an automated diagnostic tool, for example by detecting the growth of cancer cells or identifying brain tumors.
More recently, computer vision technology has been used in the diagnosis of COVID-19. COVID-Net, for example, is a neural network designed to detect COVID-19 cases from chest X-ray (CXR) images. Computer vision also offers a non-intrusive method of screening for infection by identifying indicative changes in people’s breathing patterns.
Computer Vision Technology in Retail
Crowd counting, or people counting, helps the retail sector in a variety of ways, including the analysis of customer behavior, queue management, and staffing deployment. It can also help retailers to detect dubious behavior, such as suspicious lingering, shoplifting, and people accessing unauthorized areas.
The concept of self-checkout, where shoppers scan their own barcodes, is well established. Computer vision technology, however, can expedite the checkout process even further by identifying products without the need for barcodes. The Tiliter system, for example, detects and prices fresh produce that carries no barcode.
Using computer vision to identify products is also useful in the area of inventory management. Shelfie, for example, minimizes inventory stock-outs by identifying empty shelves and alerting staff to the products that need replacing.
Another approach involves all-in-one, mobile inspection systems such as Tally, which scans and tracks products within a diverse range of retail environments. Tally captures visual data using over a dozen high-resolution cameras. An audit using Tally can take place three times a day and provide results with around 99% accuracy. This compares to 65% accuracy for a typical once-a-week manual audit.
Computer Vision Technology in Agriculture
It’s not just people and products that computer vision can look at; there’s lots to see down on the farm, too. Using the technology to monitor crops continuously and non-destructively removes the problem of time-restricted, subjective human judgment.
Looking more closely at the crops, computer vision technology can be used to undertake more complex visual analysis, such as plant health monitoring, weed detection, growth estimation, and flower and fruit classification.
Combine this level of crop monitoring with robotic harvesting devices, and the whole process, from planting to picking, can be automated.
Computer vision systems using the You Only Look Once (YOLO) method are able to recognize and count flying insects. In many parts of the world, this rapid detection of swarming insects could help farmers avert potential catastrophe.
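To make the counting step more concrete, here is a minimal sketch of how detections from an object detector might be turned into a count. It assumes the detector returns bounding boxes with confidence scores, then discards low-confidence boxes and suppresses near-duplicate boxes (a simple form of non-maximum suppression). The functions, thresholds, and detections are hypothetical and do not represent the API of any specific YOLO implementation.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def count_objects(detections, min_conf=0.5, iou_thresh=0.5):
    """Count distinct objects among (box, confidence) detections."""
    # Keep confident detections, highest confidence first.
    candidates = sorted(
        (d for d in detections if d[1] >= min_conf),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, _conf in candidates:
        # Skip boxes that overlap heavily with one already kept.
        if all(iou(box, other) < iou_thresh for other in kept):
            kept.append(box)
    return len(kept)

# Hypothetical detections for a single image (illustrative, not real data).
detections = [
    ((10, 10, 50, 50), 0.92),
    ((12, 11, 52, 49), 0.85),      # near-duplicate of the first box
    ((200, 80, 240, 120), 0.77),
    ((300, 300, 320, 320), 0.30),  # below the confidence threshold
]
insect_count = count_objects(detections)
```

With these made-up values, the near-duplicate and the low-confidence box are dropped, so two insects are counted. The thresholds here are arbitrary and would need tuning on real data.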
AI Training Data: How Machines Can See
Computer vision technology operates by recognizing patterns and connections based on previously acquired AI training data. For basic systems, publicly available image datasets will be sufficient. More complex applications, however, require more sophisticated solutions. Better computer vision solutions will only come about through the collection and curation of better training datasets.
In machine learning, a dataset is a curated collection of information that’s organized to allow useful learning on a specific topic. So, when Google famously wanted to teach a program to identify videos of cats, it first had to create a number of datasets to be used by its nascent cat-finding neural network. The dataset has to contain not only videos of cats and non-cats but also metadata that specifies the true answer: cat-containing or non-cat-containing.
Without this curation of the AI dataset, the neural network has no way to know if a cat-guessing run was successful. And it’s the feedback from correct and incorrect guesses that provides the context for machine learning algorithms to restructure a neural network to be better at solving a given problem.
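The role of labels in that feedback loop can be sketched with a toy example. The code below trains a single-neuron classifier (a perceptron) on a handful of labeled examples, adjusting its weights only when a guess disagrees with the label. It is a deliberately simplified illustration: the two numeric features stand in for whatever a real vision model would extract from video, and all names and values are hypothetical.

```python
def predict(weights, bias, features):
    """Guess 1 (cat) or 0 (not cat) from a weighted sum of the features."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(examples, epochs=20, lr=0.1):
    """Learn weights from (features, label) pairs using feedback on wrong guesses."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = label - predict(weights, bias, features)
            if error != 0:
                # The label supplies the "true answer" used to correct the model.
                weights = [w + lr * error * x for w, x in zip(weights, features)]
                bias += lr * error
    return weights, bias

# Hypothetical labeled dataset: (features, label), where 1 = cat and 0 = not cat.
labeled_examples = [
    ((0.9, 0.8), 1),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
]
weights, bias = train_perceptron(labeled_examples)
```

Without the label in each pair, the `error` term could not be computed and the weights would never change, which is the point the paragraphs above make about curated datasets.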
So, the creation of high-quality, highly accurate datasets is a huge concern in the development of neural network models in general, and computer vision models in particular. With a well-formed dataset in hand, along with a well-chosen machine learning algorithm, a developer can largely sit back and wait for their program to improve.
Computer Vision Technology Datasets
At Summa Linguae Technologies, one of our specialties is the collection, tagging, and overall curation of datasets—whether they’re for market research, scientific insight, law enforcement, product development, or any machine learning device requiring a specifically tailored AI dataset.
Are you currently using or thinking about using computer vision technology to create business opportunities and deliver better customer experiences? For datasets that will set your computer vision solution apart from the competition, contact us today.