Voice Input Challenge: Closing thoughts

Taylor Martin from Concord, NC | August 5, 2012

Ever since last Monday, I have been using only voice input – or dictation – to enter text on my phones. That means no keyboards, no typing, no word prediction. Just speaking and going all-in on dictation, putting all of my text entry in the hands of mobile dictation software.

I had been wanting to do a challenge similar to what Aaron and Sydney had done over the past several months, but I wanted it to be based more on software than hardware. I wanted a challenge that forced me out of my comfort zone with any and all handsets, regardless of specifications, hardware and other features. Eventually, I came up with the idea of abandoning all soft keyboard text entry for an entire week to give dictation and other voice input a serious go.

Let me begin by saying it has been an interesting week, both inside and outside of the challenge. I have been carrying and actively using three phones for the challenge, and I've been rather surprised by how much my usage has changed because of dictation alone, and by how I would subconsciously choose one phone over another because its voice input seemed more accurate. And I could not be any more excited for that dreadfully long week to be over.

Without further ado, here are my final takeaways from the Voice Input Challenge (unless a few more thoughts hit me over the next few days, like continued use, other findings, etc.):


iOS dictation delays

Of all the things that go wrong when using dictation, such as misinterpreted and uncommon words, there was one little thing I found more frustrating than anything else: a delay in iOS dictation. Most of the time, especially if you only have a snippet to input, iOS dictation speeds are great. The transcribed words will usually appear within one or two seconds. However, if you have to delete a few misinterpreted words in the middle of a sentence and retry saying them or speak a second part of a message, the response times go through the roof. I've had to wait upwards of two minutes for dictation to input the words, that is, if it doesn't simply time out.

It doesn't happen like this every time. But I would say nine times out of ten, if you don't get it right the first time, you will end up waiting at least a minute for dictation to respond. I found this very frustrating (partly because of how iOS handles dictation, which I will touch on in the next section) and often reached for an Android phone for more arduous dictation, despite its less extensive voice commands.


Android and iOS dictation are different beasts

As I explained on Thursday, iOS dictation does not display words as they are spoken, but rather after you finish speaking. Android, on the other hand, will input words as they are spoken and make adjustments on the fly. As you would imagine, each of these methods has its own set of benefits.

As far as iOS goes, its dictation high points are promptness and short messages. It's extremely accurate and quick, and it seems to handle some of the more ambiguous entries. For example, just tonight I wanted to type, "I'll be sad to see the C3 go. It has served me well over the last few months." It input it perfectly on the first try. That said, with iOS, you have to plan everything you are going to say beforehand, even rehearse it in your head before hitting the dictation button. If not, you may end up forgetting what you've already said, repeating yourself or missing words.

While Android struggles with capitalization and proper nouns, it's noticeably better at longer sentences and messages since words are displayed as you speak them. You have a chance to pause, read what you've spoken and articulate your sentences with more confidence and assurance. This made dictation a little more doable for me, as I could stop as soon as I made a mistake, whereas on iOS, you don't know if you've made a minor mistake until you've finished speaking.

Either way, these two operating systems handle dictation in two very different ways. In the end, it boils down to preference, which method you like better. For me, while I really wish Android could adopt some of the commands in iOS, like virtually all punctuation, I prefer Android for the instant gratification.


Speaking punctuation will never feel natural

No matter how much I have used dictation this week (hint: it's a lot) or how much I've grown used to talking to my phones, there are some things that will never feel, or sound, right. As far as dictation goes, the one thing that seemed to draw more attention from bystanders than anything else was speaking punctuation.

In speech, punctuation is omitted, implied, understood. But in written language, punctuation is everything. It can make a huge contextual difference in a sentence. Going from speech to text, this can be a problem. Of course, as I've mentioned many times, you can simply say "comma" to insert a comma in either iOS or Android. But I quickly found that speaking punctuation tripped me up more than anything else while dictating. It was easy to come to the end of a sentence, pause and begin dictating the next sentence without ever entering a period.

Towards the end of the challenge, this posed less of a problem than at the very beginning. But that doesn't mean I grew to like it any more, which gets me to my next point …


The breakthrough in dictation will be contextual awareness

We've established by now that artificial intelligence and a device's ability to understand context and its owner are the future of mobile technology – possibly even all technology. And that extends to dictation software more than you might initially want to believe.

Dictating is all about natural language, making speech to text as natural as possible. Since dictation currently requires speaking punctuation that is generally omitted in speech, there is a great hurdle that dictation software is going to have to overcome before it will be recognized as a serious form of input (versus simply being a convenience or a supplementary form of input). If dictation software is able to apply context to your sentences and pick up on various audible cues in spoken language (like a raised pitch at the end of a question, or a pause for a comma or the end of a sentence) to automatically insert punctuation, dictation could become a primary input method for a large number of users.


I think I'll stick to typing … primarily

Since this isn't the future, however, dictation is not going to replace my soft keyboards. That said, I wasn't entering this challenge thinking it would either. I was simply attempting to put a finger on some of the shortcomings and potentially even some of the overlooked highlights of the various software.

The good news is I learned quite a bit, and I can only hope I conveyed much of that to you. And the challenge wasn't so bad either; I survived, faced no major texting mishaps and came away with a newfound comfort in dictating much of my text entry. I have every intention of using dictation more and more.

But trust me when I say I could not be happier to wake up to being able to use my phones' keyboards again. Oh, how I have missed SwiftKey.