The next generation of computer vision applications will leverage more than just high-level image classification; these applications will want to know what is happening inside the image. Our algorithms can do more if they know not just that "cars" appear in a photo, but can count how many cars passed through an intersection per day. This capability might be used, for example, in applications where we want to forecast sales at a shopping center based on how many cars drove into a parking lot.
Deep learning has revolutionized computer vision with advances in convolutional neural networks and specific architectures such as "You Only Look Once" (YOLO). YOLO is compelling because it trades only minimal accuracy for extremely fast inference when making object detection predictions. On a Pascal Titan X, a YOLO network processes images at 30 FPS with a mAP of 57.9% on COCO test-dev.
The Patterson Consulting team can integrate next-generation object detection methods into your applications, allowing you not only to classify objects in photos but also to get accurate bounding-box coordinates within them. This lets a non-PhD application engineer enjoy the benefits of next-generation computer vision methods while focusing on building the line-of-business application.
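Bounding-box predictions like the ones above are typically scored by intersection-over-union (IoU), the standard overlap measure behind detection metrics such as mAP. A minimal sketch, assuming boxes are given in corner format (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlap rectangle (may be empty).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

An IoU of 1.0 means a perfect match, 0.0 means no overlap; detection benchmarks commonly count a prediction as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5.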
Many applications need to parse typed or handwritten language on a scanned document. Integrating with OCR models allows applications to work with the raw text generated by the model and make further determinations about what to do with the document (and more).
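One such determination is routing a scanned document once its text has been extracted (for example, by an OCR backend such as Tesseract). A minimal sketch of that downstream step, with the extractor injected so the routing logic stands alone; the keyword rules and destination names here are illustrative placeholders, not a fixed scheme:

```python
def route_document(ocr_text):
    """Pick a destination queue for a scanned document based on its OCR text.

    The keywords and queue names below are hypothetical examples of the
    business rules an application might apply to extracted text.
    """
    lowered = ocr_text.lower()
    if "invoice" in lowered:
        return "accounts_payable"
    if "resume" in lowered or "curriculum vitae" in lowered:
        return "recruiting"
    # Anything unrecognized goes to a human for review.
    return "manual_review"
```

In production, `ocr_text` would come from the OCR model's output for each scanned page, and the rules would be tailored to the customer's document types.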
Another applied computer vision method is detecting and extracting text phrases that occur at different angles in natural (typically urban) settings. This is useful in applications that need to map the world, such as Google's Street View system, which must be able to read signs. Text rarely appears in a perfect position, so reading text at different angles and scales is a strength of this method. Patterson Consulting can work with your team to build a text extraction pipeline from your incoming image sources to better inform your applications.
These methods generate a text description for a given input image. Objects are described in a plain-text sentence that can be further analyzed by other natural language processing methods.
These computer vision applications take an image and a natural language question as input and output a natural language answer (typically a single word). For example, given a picture of broccoli and the question "What color is the broccoli?", the answer would be "green".
Operationalize your investment by leveraging the Patterson Consulting team to deliver end-to-end computer vision application solutions to your organization.
Customers often wonder what is possible for a given custom model. We can spend time with a customer's data to better understand the potential for a successful model project, in the form of a feasibility study.