Consortium Releases New Measurement Benchmarks for AI

MLPerf, a consortium of 40 technology companies including Google and Facebook, just released benchmarks for evaluating artificial intelligence-enabled tools, including image recognition, object detection and voice translation. MLPerf general chair Peter Mattson, a Google engineer, said, “For CIOs, metrics make for better products and services they can then incorporate into their organization.” Thus far, organizations have been slow to adopt AI technologies, in part due to the plethora of tools and services available.

The Wall Street Journal reports that International Data Corp. surveyed 2,473 organizations “of various sizes across industries worldwide” in 2018. The survey found that a mere 18 percent had AI models in production; “16 percent were in the proof-of-concept stage and 15 percent were experimenting with AI.” One of the obstacles to adoption, said the survey, is the overwhelming number of decisions and tools to choose from.

David Schubmehl, IDC research director for AI systems, noted that the new benchmarks “can help companies better address the complexities around AI adoption.” He added, “It’s coming at a useful time as we’re seeing more organizations move from experimentation to production.”

In addition to choosing tools, organizations have to decide “whether or not to run AI in the cloud … [and] whether to use graphics processors, which specialize in video and graphics but are now also handling AI, or central processing units, which mainly run central computer operations, for experimentation.”

California-based startup ServiceChannel offers facilities-management services via the cloud and is using AI to “verify the identity and performance of its contractors.” Chief executive Tom Buiocchi noted that the MLPerf benchmarks will “give the company confidence that it is deploying the right solution.”

MLPerf’s first measurement tool, released in May 2018, “focused on training models,” and the latest tools “focus on results from trained models.” Vijay Janapa Reddi, a Harvard University associate professor of electrical engineering and co-chair of MLPerf’s inference working group, noted, “There are many ways to implement AI, but the benchmarks are meant to identify optimal solutions.”

“You can literally look at the results and understand the trade-offs at a higher level,” he said.

VentureBeat reports that one of the latest tools to be released is MLPerf Inference v0.5, which measures AI system power efficiency and performance using five “benchmarks that include English-German machine translations with the WMT English-German data set, two object detection benchmarks with the COCO data set, and two image classification benchmarks with the ImageNet data set.”

Among the organizations involved in defining inference standards were ARM, Facebook, Google, General Motors, Nvidia and the University of Toronto.

According to MLPerf Inference Working Group co-chair David Kanter, “benchmarks are important to using inference systems to definitively decide which solutions are worth the investment … [and] for engineers to understand what to optimize in the systems they’re making.”
