Speech technology for efficient, easier communication

Dessi Puji Lestari

Cofounder and chief scientist of speech of

Jakarta  /  Thu, April 16, 2020  /  03:02 pm
Illustration of voice recognition technology (Shutterstock/metamorworks)

Talking to each other is the most natural form of communication for humans. It is an efficient way to express desires, opinions and ideas. Along with gestures, talking can also express feelings and emotions. It is no surprise, then, that humans continue to strive to communicate with machines and computers through speech or voice commands rather than a keyboard or touchpad.

The first machine that used voice commands was invented in 1920: a toy dog named "Radio Rex", which would come out of its cage when its name was called. The technology was still very simple, in the form of a spring that would be released when it received acoustic energy at around 500Hz. That frequency roughly matches the average resonant frequency of the sound "e" in the word "Rex" when spoken by a man. Speech technology has continued to grow ever since, and today it uses deep learning-based technology that is able to recognize large-vocabulary continuous speech very accurately.

One of the most popular applications of speech technology is speech recognition, usually called speech-to-text. It is used by voice-based virtual assistants and robots, which are becoming more popular. Virtual assistants can be implemented on dedicated devices, such as smart speakers running Amazon’s Alexa and others, including locally developed devices. They are also implemented as software applications on phones and other devices, such as Google Assistant, Samsung Bixby and Apple Siri, all of which are getting better every day.

Voice-enabled devices can also be built into your watch, your car’s dashboard and any internet of things (IoT) device. Even now we are able to talk to voice-capable devices in our homes and make them do things, like turn devices on or off, search for movies or send signals to other devices. We can also use speech recognition technology for dictation or to automatically transcribe voice memos for easier searching and analytics. The technology is already here, and we will see its adoption widen over time.

Businesses can also take advantage of speech recognition technology. One of the most important things in a business is gaining insight from customer feedback. Speech recognition technology can be used to automatically transcribe large volumes of customer service calls, which can then be processed further by natural language processing to identify keywords, topics and trends. Along with listening to and understanding customers, businesses can use the technology to gain insight into how to streamline the support process and monitor the performance of support agents and representatives. Since agents are at the frontline of a company's customer interface, it is important that they represent the brand well and deliver accurate information in an appropriate and approachable manner. Using the transcribed call recordings, businesses can understand customers better by applying in-depth data mining to gender, age estimation, language, accent, emotion and sentiment, topic, speech patterns and more. This will enable businesses to run highly targeted marketing campaigns as well as improve services, support and sales performance.
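As a rough illustration of the keyword step, the sketch below counts the most frequent content words across a few call transcripts. It is a minimal example, not a description of any particular product: the transcripts are invented, the stopword list is a tiny hand-picked assumption, and a real system would rely on a proper natural language processing pipeline rather than simple word counts.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list; real systems use far larger ones.
STOPWORDS = {"and", "the", "has", "not", "for", "but", "are", "was"}

def top_keywords(transcripts, n=3):
    """Return the n most frequent content words across all transcripts."""
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        # Skip stopwords and very short words, which carry little topical meaning.
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(n)

# Invented examples standing in for speech-to-text output of support calls.
transcripts = [
    "My internet bill is wrong and the refund has not arrived",
    "I want a refund because the internet keeps dropping",
    "The internet is slow again, please send a technician",
]

print(top_keywords(transcripts, 2))
```

Even this crude count surfaces "internet" and "refund" as the recurring themes in the sample calls, which is the kind of signal a business would then track over time.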

Besides understanding the content of our speech, devices also need to generate spoken responses to be more human-friendly. This is where speech synthesis, or text-to-speech technology, plays its part. The technology can be used in many scenarios, such as generating dynamic voice announcements in public facilities; reading emails, e-books or news sites aloud; and certainly adding personality to virtual assistants. It helps elderly users interact easily with new technologies, such as phones, computers and other digital devices, because information is available by voice. Speech synthesis also helps people who need content in spoken form, such as those with visual disabilities, low vision, dyslexia or other learning disabilities, and even those with limited literacy, such as young children. This will help students, workers and other individuals explore more of the written world through voices generated by computers.

Another equally important speech technology is voice biometrics. Similar to other biometric technologies such as fingerprint, face or iris recognition, it identifies speakers from their voices. This enables enhanced security and fraud protection through voice authentication. Financial institutions and government agencies are among those that could adopt this technology soonest. Voice biometric authentication can be applied in call centers to verify customers' identities. With passive enrollment, customers don't need to make special phone calls to register their voice prints with the system. Another scenario is to use voice authentication before carrying out a transaction through a voice command application, to reduce fraud.
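One common way to picture voice authentication is as a comparison between two voice prints, each represented as a list of numbers. The sketch below is a simplified illustration under stated assumptions: the three-number voice prints and the 0.8 threshold are invented for the example, and a real system would derive much longer voice prints from audio with a trained model and tune the threshold on real data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_speaker(enrolled, sample, threshold=0.8):
    """Accept the caller if the new sample is close enough to the enrolled voice print.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical voice prints; real ones have hundreds of dimensions.
enrolled_print = [0.9, 0.1, 0.4]      # stored when the customer enrolled
genuine_call = [0.85, 0.15, 0.35]     # same customer, new call
impostor_call = [0.1, 0.9, 0.2]       # different speaker

print(is_same_speaker(enrolled_print, genuine_call))
print(is_same_speaker(enrolled_print, impostor_call))
```

Raising the threshold makes the check stricter, which reduces fraud at the cost of rejecting more genuine customers; choosing that balance is a business decision as much as a technical one.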

With the advancement of speech technology, especially for the Indonesian language, we will see in the near future that we can interact with digital systems more naturally, more easily and with more enjoyment. As for businesses, it helps create a better understanding of customers' voices and agents' performance through automated insight extraction, and helps prevent fraud for more secure transactions. In turn, it should create a better customer experience. (wng)


Dessi Puji Lestari is the cofounder and chief scientist of speech at PROSA.AI, an Indonesian artificial intelligence company specializing in natural language and speech processing, computer vision and data mining.


Disclaimer: The opinions expressed in this article are those of the author and do not reflect the official stance of The Jakarta Post.