Voice search lets users search by speaking a query rather than typing it. To do so, it uses speech recognition technology to understand the search query.

The popularity of voice search is rising for two reasons: the proliferation of mobile phones and other web-connected devices such as Amazon Echo and Google Home, and its ease of use compared with typing search queries.

We use voice search mostly through interaction with assistants such as Amazon Alexa, Siri, Microsoft Cortana and Google Assistant.

Examples of voice search usage

Voice search can be used for various purposes:

  • querying search engines for information
  • dialing contacts
  • searching photos or audio
  • starting programs
  • selecting options

Voice search is also often used for location-based searches. This has, for example, led to a large increase in the number of searches containing the phrase “near me”, as shown by Google Trends statistics.

Why is usage of voice search rising?

Many of us probably remember our first attempts at using digital assistants around 2011. Asking more complex questions usually ended with us quickly giving up on extracting information from the assistants, because they could not properly understand and interpret our questions.

Speech recognition technologies have improved considerably since then, thanks to advances in deep learning and the availability of larger amounts of data on which AI models are trained.

The increase in accuracy of speech recognition programs has indeed been steep since 2013 (source: Recode).

Google’s machine learning-based voice recognition had already achieved a 95% word accuracy rate for the English language by May 2017, which is on par with human accuracy. Similar accuracy has been achieved by Qualcomm with on-device models using recurrent and convolutional neural networks.
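To make a figure such as "95% word accuracy" concrete, here is a minimal Python sketch of one common way to compute it. It assumes that word accuracy means one minus the word error rate, where errors are the substitutions, insertions and deletions needed to turn the recognized text into the reference transcript. The article does not say how the quoted figures were measured, so the function names and the sample sentences below are purely illustrative.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_accuracy(reference_text, hypothesis_text):
    """Assumed definition: 1 - (word errors / number of reference words)."""
    reference = reference_text.lower().split()
    hypothesis = hypothesis_text.lower().split()
    if not reference:
        raise ValueError("reference transcript must not be empty")
    errors = word_edit_distance(reference, hypothesis)
    return 1.0 - errors / len(reference)


# Hypothetical example: what the user said vs. what the recognizer produced.
spoken = "find a pizza place near me that is open now"
recognized = "find a pizza place near me that is opened now"
print(f"word accuracy: {word_accuracy(spoken, recognized):.0%}")
```

With this definition, a system at 95% accuracy gets roughly one word in twenty wrong, which in a short voice query can still be the one word that changes the meaning. Note that the value can drop below zero when the recognizer inserts many extra words.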

Speech recognition accuracy is expected to continue to play a crucial role in the adoption of voice search, with increasing accuracy enticing more users to use voice search regularly. Once systems reach 99% accuracy, further increases in accuracy will probably gradually matter less.

Ask Jeeves had the right idea – just too early

The second major factor affecting the adoption of voice search for search engine queries is the ability of search engines to understand what we are searching for, so-called natural language understanding (NLU). Note that this is a separate issue from speech recognition, which is concerned only with correctly converting our speech to text.

Natural language understanding has a long history, with the first attempts made at MIT in the 1960s.

In the context of modern search engines, an interesting attempt was that of the now mostly forgotten search engine Ask Jeeves, launched in 1997. The original idea of Ask Jeeves was to allow users to get answers to questions posed in everyday, natural language, as well as through traditional keyword searching. The idea was right and has been validated in recent years, with search engines using the same approach for an increasing number of queries.

However, the lack of strong NLU models, together with the computational cost and infrastructure that such models require, was probably too great an obstacle in the 1990s and could be one of the reasons why Ask Jeeves did not succeed in the search engine space.

Until 2011, most search engine algorithms revolved around keywords, with NLU not playing a prominent part.

This changed when Microsoft introduced new natural language capabilities to its Bing Shopping product in March 2011. With the updated Bing, you could enter conversational phrases like “Air Jordans under $100” or “Hudson jeans under $200” and get improved search results.
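As a rough illustration of what handling such a phrase involves, the Python sketch below turns a query of the form "product under $price" into a product term and a price limit. It is a toy, rule-based example with hypothetical names (`PRICE_LIMIT`, `parse_shopping_query`). It is not how Bing implemented the feature, and real NLU systems rely on far more general models than a single pattern.

```python
import re

# Hypothetical pattern: "<product> under $<amount>", e.g. "Air Jordans under $100".
PRICE_LIMIT = re.compile(
    r"^(?P<product>.+?)\s+under\s+\$(?P<limit>\d+(?:\.\d+)?)$",
    re.IGNORECASE,
)


def parse_shopping_query(query):
    """Split a conversational shopping query into a product and a price cap."""
    text = query.strip()
    match = PRICE_LIMIT.match(text)
    if match is None:
        # No recognizable price constraint: treat the whole query as keywords.
        return {"product": text, "max_price": None}
    return {
        "product": match.group("product"),
        "max_price": float(match.group("limit")),
    }


print(parse_shopping_query("Air Jordans under $100"))
print(parse_shopping_query("Hudson jeans under $200"))
print(parse_shopping_query("running shoes"))
```

A keyword-only engine would treat "under" and "$100" as more words to match, whereas the structured form lets the price act as a filter on the results.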

Hummingbird Update

The ability of search engines to understand our questions has improved significantly in recent years, with natural language understanding becoming a central part of search engine algorithms. One of the most important changes to Google’s search engine algorithm was the so-called Hummingbird Update. One of its main changes was an increased focus on semantic search, i.e. on natural language queries, considering context and meaning over individual keywords.

However, search engines and assistants are still far from perfect, and many users give up on voice search after repeated failed attempts to get an assistant to understand their questions and intent.

Advances in this area will be key if technology companies want to achieve more widespread adoption of voice search.

Limited nature of voice search format – challenge for search engines

Even with perfect conversion of our speech to text and correct understanding of our questions, voice search would still suffer from a limitation that is inherent in its format.

By definition, voice search responds with spoken text, which limits the response. When searching for specific facts, e.g. the birth date of a particular person, this is not a limitation. However, when our task is exploratory, e.g. looking for the best translation agency to hire, voice search can make it cumbersome to achieve our goal.

Improving the delivery of voice search results for certain categories of search queries will be one of the main challenges for search engines in the years ahead.