Einstein Vision: Insights into Image Recognition and Prediction

Kirk Steffke
Dec 13, 2016
5 min read

Updated: Jan 20, 2022

Artificial Intelligence ("AI") is here and is producing exciting results throughout the tech industry. Companies like IBM, Google, and Facebook, to name just a few, are leading the way by introducing AI to their products and services. Have you noticed automated recommendations significantly improving over the last 6-12 months? They're likely the result of AI and deep learning! A great example of this is Spotify's "Discover Weekly" curated playlist. Just in the last 3 months or so, I've noticed that its picks are more often songs I'll listen all the way through, rather than skip past to find what catches my fancy.

Social networks and entertainment are just the tip of the iceberg for AI use cases. It's no secret that in the CRM realm, Salesforce's Einstein built excitement at this fall's Dreamforce, is coming to life in the latest Salesforce release, Winter '17, and has a long and adventurous road ahead of it. Einstein isn't just a single Salesforce technology, though. It's a hybrid of products from several Salesforce acquisitions, like PredictionIO, RelateIQ, and MetaMind.

In this blog post, I'll be reviewing the Einstein.ai Vision APIs. Boiled down to its most basic parts, this API allows us geeks to ask an API endpoint, "Hey vision, what's in this picture?" More elegantly, it's a structured system that lets us upload large, categorized "datasets," "train" "models" on them to teach the AI how to analyze the images, and get back "predictions," aka the API's best guess at what we've provided.

Sound familiar? This is the same basic pattern that Ami has demonstrated in the past, using text (not images), with IBM Watson's Classifier API.

Link: Integration with the App Cloud: Callouts to IBM Watson

Let's take a deeper look at a dummy scenario where vision would prove mighty helpful...

Sample Scenario - Venomous vs Non-venomous Snake Identification

You're out in the wilderness with your colleagues, hitting some real-life trails to make the most of a warm and sunny day out of the office. Birds are chirping and in the distance you can hear some running water. You decide to step off-trail to look for some better photo opportunities. In your excitement, you've become a little less aware of your surroundings and suddenly you feel an intense pain in your leg. Something bit you and now it has slithered away to a safe distance. Neither you nor your friends are snake experts and you're rightly worried about whether or not the snake was venomous. Your quick-thinking friends snap a picture of the snake before whisking you away for help.

Now, you've made it to a local hospital's emergency room and you're hoping those photos are going to be a great help. However, like your friends, no one in the ER has studied herpetology and the staff is frantically searching the internet. Time is of the essence: how quickly they can find a result might make all the difference if the snake was venomous.

That's where vision comes into play. The ER staff takes your picture and uploads it to their Snake Identification tool and out pop the results that say...

Yeah, yeah, yeah - I know there are some snakesperts out there saying, "What's a ball python doing in these hypothetical woods, probably in the US?" Just go with me here. For everyone else: a ball python in the US, outside of captivity, is probably someone's kid-friendly pet that was released or got loose.

However, the tool could have just as well returned some more threatening results...

Within both screenshots, I've uploaded an image. In our working example, this is the picture your friends took of the snake as it retreated to bask on the warm rocks in the sun. The image was sent to the vision API, which evaluated it against the model trained on my dataset and returned, for each label, the probability that the photo shows that kind of snake.

For this exercise, I prepared a dataset with three different labels, or classifications, of snakes: Ball Pythons, Black Mambas, and King Cobras. All in all, I probably uploaded between 20 and 50 sample images of each (thanks Google Images!). In the real world, you'd want a label for as many types of snakes as possible, and as many samples within each of those categories as possible, for more accurate predictions.

More About Einstein Vision

Check out the "Build Smarter Apps w/ New Einstein Vision Service" webinar at the link below. Emily Rose, Lead Developer Evangelist and Michael Machado, Senior Product Manager, both of Salesforce, do a great job in running through the technology to give an overview of what vision is, some sample use cases, and a deep dive into how it can be leveraged from within Salesforce.

Link: Build Smarter Apps with New Einstein Vision Service

In their examples, they demonstrate how vision is intelligent enough to tell you the manufacturer of the car in an image they upload. They also provide a GitHub repository with some starter code that leverages pre-built models to determine what type of animal is in a picture you upload.

GitHub Repo: Code shown in the Einstein Vision Service webinar

The Einstein documentation has been updated to be Salesforce-first. There are great instructions to quickly get up and running with sample Apex Classes and Visualforce Pages. The code is ripe with opportunities to make it more dynamic and Salesforce-oriented.

However, the downside is that the API endpoints used for POSTs expect multipart form data, which traditionally isn't the most straightforward thing to build in Apex. Prepare for some string manipulation, padding, and encoding. Another thing I found odd: while the documentation was clearly updated (or maybe created... I'm not sure what existed before the Salesforce acquisition), some of the API basics that aren't specific to Salesforce were lacking. And while predictions can be made using Base64 strings, images, or image URLs, uploading an example to your dataset is file based only. I didn't see any way to provide a Base64-encoded string or an image URL, which would be handy.

For time's sake, I created the dataset, labels, and examples via cURL, then called the prediction endpoint and parsed the response in Apex, using modified versions of the examples provided in their GitHub repo.
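
If you'd rather poke at the prediction endpoint from the command line before writing any Apex, a request could look roughly like the sketch below. This is not the code from the repo. The /v1/vision/predict path and the modelId and sampleLocation form fields are my assumptions about the API, so check them against the current documentation. TOKEN, DRY_RUN, and the predict_url function are names made up for this sketch. With DRY_RUN left at 1, nothing is sent and the function only prints what it would post.

```shell
#!/bin/sh
# Sketch only: the endpoint path and form field names are assumptions, not taken from this post.
TOKEN="${TOKEN:-<your token>}"   # placeholder access token
DRY_RUN="${DRY_RUN:-1}"          # 1 = print the request instead of sending it

# predict_url MODEL_ID IMAGE_URL
# Asks the (assumed) prediction endpoint to classify the image at IMAGE_URL.
predict_url() {
  model_id=$1
  image_url=$2
  endpoint="https://api.metamind.io/v1/vision/predict"
  if [ "$DRY_RUN" = "1" ]; then
    echo "POST $endpoint modelId=$model_id sampleLocation=$image_url"
  else
    # -F makes curl send the fields as multipart/form-data
    curl -X POST \
      -H "Authorization: Bearer $TOKEN" \
      -F "modelId=$model_id" \
      -F "sampleLocation=$image_url" \
      "$endpoint"
  fi
}
```

Calling predict_url with a model id and an image URL prints the request in dry-run mode. Set DRY_RUN=0 and a real token once the fields are confirmed.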

If experimenting with vision, here's a quick shell script to make uploading your sample images a snap. Just create an "images" folder with subfolders for each of your labels. Copy your images from the internet into these folders and modify the script below.

for f in "/Users/kirk/Desktop/Metamind/Images/KingCobra/"*.jpg; do
  # Upload each image as an example under the King Cobra label
  curl -X POST \
    -H "Authorization: Bearer <your token>" \
    -H "Cache-Control: no-cache" \
    -H "Content-Type: multipart/form-data" \
    -F "name=$f" \
    -F "labelId=<your King Cobra label id>" \
    -F "data=@$f" \
    "https://api.metamind.io/v1/vision/datasets/<your snake dataset id>/examples"
done

Don't forget to modify your directory path, the token, label id, and dataset id.
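
Since each label gets its own subfolder, the loop above can be wrapped in a small function and called once per label. Here's a minimal sketch of that idea. The endpoint and form fields are the same ones used in the loop above, while upload_label_dir, TOKEN, DATASET_ID, and DRY_RUN are names I made up for illustration. It sends the bare file name rather than the full path as the example name, which is my own tweak. By default it runs as a dry run and only prints what it would upload.

```shell
#!/bin/sh
# Sketch only: TOKEN, DATASET_ID, DRY_RUN, and upload_label_dir are hypothetical names.
TOKEN="${TOKEN:-<your token>}"
DATASET_ID="${DATASET_ID:-<your snake dataset id>}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print what would be uploaded

# upload_label_dir LABEL_DIR LABEL_ID
# Uploads every .jpg in LABEL_DIR as an example for the given label id.
upload_label_dir() {
  label_dir=$1
  label_id=$2
  for f in "$label_dir"/*.jpg; do
    [ -e "$f" ] || continue        # the glob matched nothing
    name=$(basename "$f")
    if [ "$DRY_RUN" = "1" ]; then
      echo "would upload $name with labelId=$label_id"
    else
      # -F makes curl send the fields as multipart/form-data
      curl -X POST \
        -H "Authorization: Bearer $TOKEN" \
        -F "name=$name" \
        -F "labelId=$label_id" \
        -F "data=@$f" \
        "https://api.metamind.io/v1/vision/datasets/$DATASET_ID/examples"
    fi
  done
}

# Example call, one per label subfolder (still a dry run unless DRY_RUN=0):
# upload_label_dir "images/KingCobra" "<your King Cobra label id>"
```

Globbing the folder directly, instead of looping over the output of ls, keeps file names with spaces intact.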

What are some better real-world, Salesforce-oriented, business use cases?

You might be thinking, where does the analysis of an image come into play with Salesforce? Text-based recommendations are obvious, but where do pictures come into play?

Field technicians needing assistance identifying machine parts
Visual inventory system automation
Automated image tagging

What use cases can you think of? Let us know @crmscience!
