My name is Ajay K Nair, and I work at Google as a Product Lead for Edge machine learning. I was part of the team that helped build the Cloud GPUs, and I led the team that worked on the Edge TPU. We're working on getting Edge TPU-based products out into the market right now, and I'm leading that effort through a program called Coral, whose goal is to get AI hardware and solutions into customers' hands as widely as possible.
In this article, I'm going to give you an overview of where I think the industry is heading, where it's come from, and a few opportunities for all of us, especially developers: what to focus on when we build the next systems and, from the user perspective, what could be coming and how to adapt our solutions to meet those needs.
Here’s a breakdown of our main talking points:
- An industry overview
- How and why machine learning applications are moving toward local/edge inference needs
- How the industry is evolving to address the needs of key vertical markets
Let’s dive straight in. 👇
An industry overview
AI adoption is continuing at a rapid pace
The need for AI comes from applications all around us: server architectures, self-driving cars, and sensor-level devices that are sending data all the time. That's where AI is starting to see a lot of development.
The graph above shows that between 2009 and 2017 there was a huge uptick in the amount of AI research, driven by the need for AI in all of these applications, initially for training and increasingly for inference.
Why? A sample of AI use cases
Where's that coming from? It's coming from applications that can make use of AI. The image above shows a sample of different industry verticals, and what use cases in those industry verticals would benefit from machine learning.
You will see in these use cases that there are needs for machine learning in the Cloud, where you send your data, but there are also many cases where you would like to run machine learning algorithms at the Edge, outside of the Cloud, where the inference needs are.
How and why machine learning applications are moving toward local/edge inference needs
Need for more compute (12x in 18 months)
What does this increase in all the requirements for AI mean? Primarily, that the compute requirements driven by machine learning have gone through the roof. The chart above, which represents the size of models from 2013 to now, shows that the time required to train these models has grown dramatically. What does that mean?
In terms of requirements from applications, it means you need a lot more compute. From a Google-scale perspective, we needed to add data centers at a rate that was not sustainable, so we stepped back to figure out how we could, as a company, slow that down by making existing data centers more efficient.
One answer was to build TPUs, which we announced three years ago. Cloud TPU v3 was released towards the middle of last year, delivering about 420 teraflops per device. We connect a few of those in a tray that goes into a pod, and each of those pods is at supercomputer scale, which is a tremendous amount of compute that we've been able to put into hardware specifically for machine learning. And that's really remarkable.
What this also shows, in terms of scale, is a 12x increase in data center compute requirements in just 18 months. The 18 months matters because, by Moore's law, that's how long we expected it to take for compute performance to double through technology scaling alone. It clearly shows we can no longer rely on process-node advancement to provide the compute needed for ML applications.
New models with new architectures are being deployed roughly every three and a half months, because of the amount of research going into the field.
Every problem is an opportunity
The problem of needing a lot more compute has been embraced by the industry rapidly, specifically by the folks who put their hands up to be implementers of AI accelerators. That's a golden opportunity in the semiconductor field, one that has come along after a very long time.
Since the 2000s, semiconductors have been commoditized, and these ML accelerators bring an opportunity for people to build something new, which shows in how much money is going into that market.
In the past few years, about $500 billion has been pumped into the market in terms of funding for AI accelerators. This provides a platform for people to launch new hardware capabilities, whether by increasing the number of CPUs used in data centers or by making GPUs faster, bigger, and more powerful.
A lot of innovation has been enabled by all the funding coming into the industry; with TPUs, we went from v2 to v3. At last count, there were at least 15 startups with at least $15 million in funding each, which is an important number because that's roughly what it takes to tape out a chip, put a small team in place, and productize it on a newer technology node. It's expensive, but the industry has been tackling that. Between 2019 and 2020, most of those startups should have some form of a chip coming out and available to folks planning to use AI accelerators in their applications.
These steps have gone towards addressing one of the primary needs: running your machine learning workloads, either training or inference, in the Cloud. You can add more compute there, and you will probably get it cheaper in the future.
Why AI at the Edge is becoming important
I define the Edge as anything not in the Cloud: starting from sensor-level devices, to hubs that aggregate data collected from multiple sensors, to on-prem data centers.
In terms of machine learning workloads, there are a lot of needs. When we released the TPU, many customers came to us and said “This is great for the cloud, but I have needs to run machine learning workloads at the Edge. What do I do for those workloads?”
Privacy
The primary reason is privacy. If you're looking at a camera application, there may be privacy concerns: we don't want that information going out to the cloud. Running inference at the Edge, abstracting the data, and sending only what we call the semantic data is the way to extract just the useful information.
Latency
Let’s take the example of a self-driving car going 65 miles an hour. It can't afford the round-trip latency of seeing something, going back to the Cloud, and coming back before applying the brakes; it would be too late. Beyond self-driving, there are multiple other applications where you need to run things at the Edge.
Bandwidth
Bandwidth is key. Every camera coming out now is high resolution, HD or 4K. A single uncompressed 4K frame is about 24 megabytes of data, and at 30 frames per second that's almost 700 megabytes of data every second. Imagine cameras that are always on, capturing your data and sending it off to the cloud: that's a huge amount of bandwidth being used up, and most of that data is probably never used.
You want to run machine learning when something interesting is happening. What if we ran inference locally and transferred only the useful parts of the data to the Cloud, not everything? That would save bandwidth significantly and free it up for other applications that might need it more.
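To make that pattern concrete, here is a minimal sketch assuming a generic uint8-quantized TensorFlow Lite image classifier; the model path, label set, and the read_frame/send_to_cloud helpers are placeholders invented for illustration, not anything from the Coral product line. The raw frames never leave the device; only a tiny semantic payload does.

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # or tf.lite.Interpreter with full TensorFlow

# Placeholder stand-ins for the real camera capture and upload code.
def read_frame():
    return np.random.randint(0, 255, (224, 224, 3), dtype=np.uint8)

def send_to_cloud(payload):
    print("uploading:", payload)

# Load a quantized image classifier (model path and labels are placeholders).
interpreter = tflite.Interpreter(model_path="classifier_quant.tflite")
interpreter.allocate_tensors()
input_idx = interpreter.get_input_details()[0]["index"]
output_idx = interpreter.get_output_details()[0]["index"]
labels = ["background", "person", "vehicle"]

frame = read_frame()                                  # ~150 KB at 224x224x3,
interpreter.set_tensor(input_idx, frame[np.newaxis])  # vs ~24 MB for a raw 4K frame
interpreter.invoke()
scores = interpreter.get_tensor(output_idx)[0]

top = int(np.argmax(scores))
if labels[top] != "background":                       # upload only when something interesting happens
    send_to_cloud({"label": labels[top], "score": float(scores[top])})
```

The same structure applies whether the interpreter runs on a CPU, a phone, or a dedicated accelerator; only the interpreter setup changes.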
Offline
What happens if you have no internet connectivity, or the connection is spotty and you can't rely on it? That's another reason to run inference at the Edge. I keep saying inference because that's where the Edge needs are today; training at the Edge is coming as well, but that's probably a little further out.
Cost
The cost of deploying these solutions at the Edge isn’t trivial, not just in dollars but also in the power footprint that inference at the Edge takes. For these needs, we think having ML solutions built for the Edge is important as well. At Google, we created the Edge TPU for that, and now solutions around it which customers can take to market.
High-quality AI is compute-intensive at the edge too!
Why is the Edge compute-intensive? Why do we need a lot more compute at the Edge? Let’s take the example of an image you downsample to 224 by 224 and want to run through a MobileNet v2 style architecture.
To get about 65% accuracy on that 224 by 224 image, your hardware platform needs to deliver about 120 million operations per second, which is already a significant amount of hardware. But 65% accuracy is not sufficient; we're all shooting for close to 100. If I need better accuracy and go to 75%, I need 1.2 billion operations per second, a 10x increase in the amount of compute needed at the Edge.
In most use cases these days, even 75% is likely not sufficient; we want close to 90%. To achieve that, you can expect your hardware to need about 12 billion ops.
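To make that scaling concrete, here is a tiny back-of-the-envelope sketch in Python; the ops figures are the ones quoted above, and treating 65% as the baseline is just for illustration.

```python
# A quick back-of-the-envelope look at the scaling described above.
# The ops figures are the ones quoted in the text.
quoted_ops = {
    "~65% accuracy": 120e6,   # 120 million ops
    "~75% accuracy": 1.2e9,   # 1.2 billion ops
    "~90% accuracy": 12e9,    # 12 billion ops
}

baseline = quoted_ops["~65% accuracy"]
for tier, ops in quoted_ops.items():
    print(f"{tier}: {ops:,.0f} ops ({ops / baseline:.0f}x the 65% baseline)")
# Each accuracy tier costs roughly 10x more compute than the last.
```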
That kind of compute is traditionally not available in the processing power that exists in Edge nodes today, whether that's your phone, sensor devices, or even the smaller portable devices people are interested in.
How the industry is evolving to address the needs of key vertical markets
Every problem is an opportunity (Evolution v2.0)
A lot of companies are trying to fix the left side of the balance here by:
- Providing more compute in terms of accelerators, specifically IP blocks that go into the SoCs that exist today
- Providing end-to-end solutions that abstract away the difficulty of that hardware
- Designing hardware, software, and algorithms together
One of the key differences in where the industry is heading: so far, model development has happened in the cloud, then you deploy the model, and then you try to provide as much compute as possible.
There was no end-to-end view of designing this model for this hardware architecture, or designing this hardware to handle these kinds of models; that co-design didn't exist. A few companies have started to address that, from within Google as well.
On the other side, there's optimization happening from the model perspective. For model developers, there was plenty of compute available to train on in the cloud and push accuracy higher.
Getting that accuracy higher was the only goal, but now folks are realizing that the usefulness of these models depends on how efficiently you can run inference on them with whatever compute is available. These huge models present a big problem, and it is now being addressed with optimizations to the model architecture.
Some examples here are:
Quantization: the precision that floating-point provides is often not required. If you can represent values as int8 or int16 and discard the extra precision, your hardware requirements become much lower.
Pruning: not all the weights in your model are actually needed. There's a lot of effort going into optimizing models, making them lighter and more compact so they can run on Edge devices.
TensorFlow is the framework for creating machine learning models, and Google is putting a lot of effort into TensorFlow Lite, which is also open source and available today. It tries to automate this process as much as possible by taking a subset of the ops that TensorFlow supports, then mapping and optimizing them through quantization, pruning, distillation, and some other techniques, which broadly fall under the umbrella of learning to compress. There are also free tools to convert TensorFlow models to TensorFlow Lite.
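As a concrete illustration of what that tooling looks like, here is a minimal post-training quantization sketch using the standard TensorFlow Lite converter; the SavedModel path and the calibration data below are placeholders, not anything specific to this article.

```python
import numpy as np
import tensorflow as tf

# Placeholder calibration data; in practice this would be a small sample of
# real, preprocessed inputs so the converter can calibrate int8 ranges.
calibration_samples = [np.random.rand(1, 224, 224, 3).astype(np.float32)
                       for _ in range(10)]

def representative_data_gen():
    for sample in calibration_samples:
        yield [sample]

# "saved_model_dir" is a placeholder path to a trained TensorFlow SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
converter.optimizations = [tf.lite.Optimize.DEFAULT]       # enable post-training quantization
converter.representative_dataset = representative_data_gen

tflite_model = converter.convert()
with open("model_quant.tflite", "wb") as f:
    f.write(tflite_model)                                   # a much smaller, mostly int8 model
```

The resulting .tflite file is what an Edge runtime, or a compiler for a dedicated accelerator such as the Edge TPU, then consumes.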
Inflection point possible
The potential problem with this, for the people developing inference accelerators, is that there may be an inflection point as this balance shifts. We have more compute available, and model development is pushing down the requirements on the hardware itself. Will we reach a point where models are small enough, with sufficiently low latency and sufficient accuracy, that you don't need dedicated hardware anymore?
It’s happened in the past: you stop needing specific accelerators, and the accelerator becomes part of whatever compute is available at the Edge. If that happens, what would be the next opportunity, the next evolution in the industry? It would be bad for the semiconductor industry. I live there; I went through the cycle where it became commoditized, and I didn't like that at all. Now there's a sort of revival for semiconductor professionals with the AI revolution, and a lot of hype around it for good reason, so we should be asking what the next opportunity could be.
AI is still a nascent industry; there are a lot of different AI models, architectures, and frameworks, and multiple companies working on them. Numerous hardware startups are trying to address these problems.
Each network architecture chooses what hardware it runs on, or each hardware architecture chooses a few frameworks that are close to it, primarily driven by scale. When you start, you have to start small: you build hardware, you choose a few network architectures you do really well with, and then you try to scale up from there.
But today, the situation is very complex: you have to choose and design specifically for each hardware architecture, and it's a bit of a mess. What if we had a common intermediate representation, where you could write any machine learning model, in any architecture, on any framework, and the intermediate representation would take care of mapping it as well as possible onto any given hardware and its underlying architecture?
That's a field that is developing rapidly, and it's an exciting space. Look back at how compute evolved with PCs: when somebody writes an application to run on a PC today, they don't need to worry about whether it's going to run on an AMD processor or an Intel processor.
Things were not like that in the past. The same happened with GPUs: when you write a game today, you don't need to worry as much about whether it's going to run on an Nvidia GPU or an AMD GPU. There are some dependencies, but most of them have been abstracted away. That's where machine learning is going as well, and it's the next evolution, the next opportunity, for people developing in that field.
Other areas where AI is evolving
There are a few more opportunities or evolutions in the industry right now.
In terms of model development, I talk to a lot of customers who are deploying Edge machine learning, and companies have a lot of interest in using machine learning but have no idea how to do it. From their point of view: it requires me to create a model, train that model, and then deploy it, and I have to start with data scientists who understand machine learning and how to build models. It's not easy, but that's being challenged right now.
People want an end-to-end solution, something like: “Why don't you come to me with a product I can just plug in, and I get my output?”
For example, I put a camera in my retail store and I get a dashboard at my headquarters, which tells me how many people walked into the store, what kind of products they were interested in, and how I should rearrange my products to use the retail space more efficiently.
That end-to-end story doesn't exist yet. It has to start with creating models, which is difficult, but there are efforts to make that as automated as possible. At Google, we have AutoML, where you bring in your labeled data and it creates models on its own for your hardware architecture. That's one significant area where the industry is evolving, and it's very useful.
The other is creating efficient ML models using AI. Today, a lot of this model development happens either manually or with the automation we just talked about, which is only getting started. In the future, I see a lot of opportunity to develop machine learning models using machine learning, because you can define a cost function, tell your model-developing system what you're shooting for, and it should be able to get there.
The other aspect is: how low at the Edge can you go? There are microcontroller-class devices now that will need some small kind of ML. It does not need to be complex, and it's primarily for sensor data, so we may not even be talking about vision there.
But that will be useful for applications such as automotive. For example, in a car you could have sensors listening to the vibration patterns of the engine all the time and raising an alert when they go outside the bounds you've set. There is a sound pattern for a well-functioning engine, and before it breaks down that pattern changes, so you can take corrective action.
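As a sketch of what that kind of always-on vibration monitoring might look like, here is a simple frequency-band check in Python; the sampling rate, frequency band, threshold, and placeholder data are all invented for illustration, and a real microcontroller deployment would use a tiny quantized model or fixed-point DSP code rather than NumPy.

```python
import numpy as np

SAMPLE_RATE_HZ = 8_000          # assumed accelerometer/microphone sampling rate
BAND_HZ = (50, 400)             # assumed band where healthy engine vibration lives

def band_energy(window: np.ndarray) -> float:
    """Energy of the signal inside the monitored frequency band."""
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / SAMPLE_RATE_HZ)
    mask = (freqs >= BAND_HZ[0]) & (freqs <= BAND_HZ[1])
    return float(spectrum[mask].sum())

# "Learn" the bounds from windows recorded while the engine sounds healthy
# (placeholder random data here).
healthy_windows = [np.random.randn(1024) for _ in range(100)]
energies = np.array([band_energy(w) for w in healthy_windows])
upper_bound = energies.mean() + 3 * energies.std()   # simple learned bound

def check(window: np.ndarray) -> None:
    if band_energy(window) > upper_bound:
        print("ALERT: vibration pattern outside the learned bounds")

check(np.random.randn(1024))    # placeholder "live" window
```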
To wrap up
I think the industry is heading towards a lot of changes; it's a nascent industry, so there will be changes. We are at a point where these changes will start becoming pretty rapid, for multiple reasons:
- The amount of funding that has been pumped into the market, and the number of customers that are using these solutions
- Starting with a cloud focus and now adding an edge focus to that
- Needing to change our focus to hardware, software, algorithms, and toolchains, all of them working together with the end goal in mind, rather than each being a separate system that doesn't worry about the other.
- Moving from specialized architectures for specific applications to general solutions that can work across multiple different needs.
- Moving from manual to automated, in both model development and hardware development.
Of course, from a customer perspective, all of this is going to lower costs; it's going to get commoditized to some extent, which will translate into ML being deployed much more widely across the industry. There are a lot of potential areas where we can innovate and develop together, and I look forward to what the industry has in store.