Depending on where you’ve lived, you’re probably using at least one of WhatsApp, WeChat, LINE, or KakaoTalk. At first glance, these apps look like communication tools. They allow you to talk to your friends, family, coworkers, and business partners. Sometimes all at once. But they are actually signaling the beginning of a much larger trend. Pretty soon, you’ll use these apps not to talk to your friends, but to talk to machines. They will become part of a new interface that we use to consume information and services. In fact, many companies have already started to move in this direction. We’ll get to that, but bear with me for a little while.
By design, all interfaces impose constraints on their users. Some constraints are good. They force us to focus on the essential, making us more efficient. Some are limiting. They prevent us from doing what feels natural to us. Take search as an example. Humans are quite good at asking questions. We’ve been doing that for thousands of years. But can you have a conversation with Google to get the information you need? No, you can’t. (Okay, except for a tiny fraction of very simple and commonly asked questions.) Instead, we’ve adjusted to the interface that Google provides us. We enter keywords that we think Google will be able to use to give us what we want. Most of us do this unconsciously already. We’ve been conditioned to do so. But if you watch your mom, who didn’t grow up with Google, you’ll see that she has a hard time wrapping her head around what “good” keywords are, and why she can’t find what she needs. Keyword search doesn’t come naturally to her.
Now, there’s an alternative interface that we use to communicate, get information, and consume services almost every day: Natural Language. Natural Language is nothing more than the transmission of information, or meaning, through a medium such as sound or script. That’s also why I won’t make a distinction between speech- and text-based solutions in this post. They both have their place, but they embody the same thing. Language is an interface that currently facilitates communication primarily between humans. But you can imagine using this same interface to facilitate communication between humans and machines. Instead of putting keywords into Google, you’d just ask a question and have a quick conversation. Instead of clicking through a dozen screens, you’d just say what food you want to order, or tell your car where you want to go.
Just like other interfaces, the natural language interface has constraints. It isn’t suited for everything. Obviously it’s bad for visual tasks – shopping for clothes, for example. It also doesn’t seem so great for discovery, like browsing items on Amazon without a clear goal. But you can certainly imagine use cases where natural language excels. Asking questions to get information, or conveying tasks with a clear intention – ordering a specific food item, hailing a cab, or giving directions, for example. Natural Language places a low cognitive load on our brains, because we’re so used to it. Hence, it feels more natural and effortless than using an app or browser.
Of course, Facebook (M), Google (Now), Apple (Siri), Microsoft (Cortana) and Amazon (Echo) have been working on this interface for a while now. They call these personal assistants (PAs). However, there’s another group of companies who are attacking the same problem, but from a different angle: messenger apps. These platforms are uniquely positioned to enable communication with machines. The leader of the pack is probably WeChat, which allows you to order everything from food to taxis from within the app, using messaging as the interface. Slack is moving in the same direction by encouraging the creation of various bots. LINE also has a range of robotic accounts, ranging from translation services to erotic entertainment.
Why do I think these two groups of companies are working on the same problem? PAs follow what I would call a top-down approach. They’re trying to solve a challenging problem with a long history in AI: understanding your intentions and acting on them. They are trying to build general-purpose software that is smart enough to also fulfill specific tasks. It’s a research area with a lot left to figure out. Messengers, on the other hand, follow a bottom-up approach. They are starting with simple bots (“Hi, translate this sentence for me”) that solve very specific problems, and are gradually moving towards more sophisticated AI. Over time, these two approaches will converge.
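To make the bottom-up idea concrete, here is a minimal sketch of what such a simple bot might look like under the hood: no AI at all, just a few hand-written patterns mapped to very specific tasks, with a graceful fallback for everything else. The commands and responses are entirely hypothetical and not taken from any real platform.

```python
import re

# A hypothetical "bottom-up" bot: a handful of hand-written rules,
# each mapping one narrow command pattern to one specific action.
RULES = [
    (re.compile(r"translate (.+) to (\w+)", re.I),
     lambda m: f"[would translate {m.group(1)!r} into {m.group(2)}]"),
    (re.compile(r"order (?:a |an )?(.+)", re.I),
     lambda m: f"[would place an order for {m.group(1)}]"),
]

def handle(message: str) -> str:
    """Try each rule against the message; fall back if nothing matches."""
    for pattern, action in RULES:
        m = pattern.match(message.strip())
        if m:
            return action(m)
    return "Sorry, I only understand a few specific commands."

print(handle("translate hello world to German"))
print(handle("order a pizza"))
print(handle("what's the weather?"))  # falls through to the apology
```

The point isn’t the code itself but its shape: each rule yields an immediately useful feature, and the rule set can grow (or individual rules can be swapped for learned models) without redesigning the whole system — which is exactly how the bottom-up approach edges toward more sophisticated AI.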
When it comes to the adoption of natural language interfaces, messaging apps may actually have a leg up. Here’s why. Talking to Siri on a crowded train still feels awkward, even in SF. It’s also troublesome, because you have to open yet another app. However, many of us spend our time in messenger apps anyway, so it feels completely natural to just have an additional conversation with a bot. Ditto for Slack. The transition for consumers seems so much smoother. As conversational interfaces penetrate our world, voice interfaces will start feeling just as natural as selfie sticks. Hopefully more natural than selfie sticks. But another reason is data. Messenger apps have collected huge amounts of conversational data that can be used to build better Natural Language models. Facebook has that too, with both Messenger and WhatsApp. Microsoft doesn’t. Amazon doesn’t. Apple may or may not, depending on how secure iMessage really is.
If you’ve actually used any of the PAs, you may be skeptical. Siri still barely understands what you want, and Facebook has put hordes of human workers behind M to get it to do anything useful. How will these things ever replace all the complex tasks we’re doing in apps and browsers? That’s another reason why the bottom-up approach seems promising. It yields immediate benefits and practical applications.
But I also believe that over the coming years we will see rapid improvements in conversational interfaces. There are clear enabling technologies for this trend: Natural Language Processing (NLP) and Deep Learning. Deep Learning techniques have led to breakthroughs in Computer Vision, and they are now penetrating natural language research. Whether or not Deep Learning itself will lead to breakthroughs in NLP is hard to say, but one thing is clearly happening: Many smart people who didn’t previously focus on NLP are now seeing the potential and are starting working on NLP problems, and we can surely expect something to come out of this.