How might we design a new, usable product that enables users to look at and speak about their surroundings?

 

Overview

The Cerence Multi-Modality project combines speech recognition and eye-tracking to answer natural questions like “What is that over there?” Users can reference buildings and landmarks by looking at them and speaking about them. Such a system is a powerful example of how non-verbal information plays an important role in communication, and it represents a novel design space.

As lead designer of this product, I worked closely with experts in speech recognition and gesture recognition to define the user experience of an in-vehicle assistant equipped with this feature. Through research and prototyping, I created a visual and verbal interface that communicated to users what the system was capable of and inspired a new kind of interaction with it.

The project was first demonstrated at the Consumer Electronics Show (CES) in 2019 and has since been implemented in Mercedes-Benz MBUX.

Company: Cerence

Role: Voice design lead

Collaborators: German Research Centre for AI (DFKI), UX researchers, AI developers, NLU developers, dialog developers, product managers, sales managers

Time: Oct 2019 - Dec 2021

Customer: Mercedes Benz

The challenge

The engineering team at the German Research Centre for AI (DFKI) had devised a way to correlate eye-tracker data with a 3D map of the area around a car. This would allow drivers to look out the window and ask their in-car assistant, “What’s that building over there?” But such an interface had never been used before, and they needed a designer to define what this kind of experience should look like.
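To make the interaction concrete, the core mechanism is resolving where the driver is looking against nearby landmarks from a map. The sketch below is a simplified illustration of that idea only, not DFKI’s or Cerence’s actual implementation; the landmark data, coordinate conventions, and angular tolerance are assumptions made for the example.

import math
from dataclasses import dataclass

@dataclass
class Landmark:
    name: str
    x: float  # east offset from the vehicle, in metres (illustrative)
    y: float  # north offset from the vehicle, in metres (illustrative)

def resolve_gaze(vehicle_heading_deg: float,
                 gaze_yaw_deg: float,
                 landmarks: list,
                 tolerance_deg: float = 10.0):
    """Return the landmark whose bearing best matches the driver's gaze, if any."""
    # Combine vehicle heading with the gaze angle measured by the eye-tracker.
    gaze_bearing = (vehicle_heading_deg + gaze_yaw_deg) % 360.0
    best, best_error = None, tolerance_deg
    for lm in landmarks:
        # Bearing from the vehicle to the landmark, clockwise from north.
        bearing = math.degrees(math.atan2(lm.x, lm.y)) % 360.0
        # Smallest angular difference between gaze and landmark bearing.
        error = abs((bearing - gaze_bearing + 180.0) % 360.0 - 180.0)
        if error < best_error:
            best, best_error = lm, error
    return best

# Example: driver heading north, looking about 40 degrees to the right.
landmarks = [Landmark("Bruno's Bakery", 50.0, 60.0),
             Landmark("City Library", -30.0, 80.0)]
print(resolve_gaze(vehicle_heading_deg=0.0, gaze_yaw_deg=40.0,
                   landmarks=landmarks))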

As lead designer of the product, I began researching our primary objectives.

Core design questions

 

How will users naturally try to use an eye-tracker while driving?

What concerns do drivers have about using this technology?

How might we make this new way of interacting with assistants discoverable?

 

Research process

This was an exciting problem to take on, as such a system would allow users to incorporate non-verbal information into a conversation with a virtual assistant. However, no one had ever used an in-car voice assistant with eye-tracking capabilities, so foundational work was needed to understand how and why users might interact with such a system.

Fortunately, our technical team in Germany had created a life-sized prototype: a highly immersive video projection that wrapped around a parked vehicle. My team ran studies there to gather initial feedback and test how the system might operate.

I was based in Montreal, however, and didn’t have access to the working prototype, so I needed to be creative about how I collected feedback. My solution was to host an online study in which participants watched a video shot from behind a car dashboard and spoke freely about what they saw. I was careful to introduce the product idea only vaguely, describing it through metaphors to real-world situations that didn’t involve technology, so as not to bias how participants might use it.

By transcribing their audio, I discovered what felt natural for users to try first. For example, many of them described buildings using visual references, asking questions like “What’s that yellow building on the corner?” By observing patterns in users’ initial interactions with the system, I found discrepancies between our expectations and our technical capabilities.

Through these initial experiments with prototypes, I gathered research questions on which to base my first design explorations.

Photo of me sitting in the immersive “igloo” prototype at the DFKI lab. A full-sized vehicle was placed inside a wrap-around video simulator; during testing, the driver could use their full range of vision to look at life-sized surroundings and ask about them (2019).

Screenshot from the video user research study. Participants were shown images and videos of a vehicle driving through New York and were asked to speak candidly about things that interested them (2019).

Exploring the design space

After grounding my understanding of the problem in initial feedback, I was ready to enter a fun period of exploration. At this stage it was especially important for me to work closely with the developers, as eye-tracking wasn’t something I had experience with before. I felt my job was to create an accurate mental model for users, and I didn’t want to design for an idealized system without considering how it would actually feel.

In general, I use a variety of methods and tools for ideation, because I find that improvising ways to explore an idea is the best way to think outside the box. I’ll make whatever quick mockup or scrappy prototype best lets me explore a wide range of ideas before digging too deeply in any one direction.

For this project, I primarily played around with the following:

  • Storyboarding

  • Technical specifications

  • Graphic mockups

  • User journey

Core design problems

During my exploration, I converged on core design problems that proved critical to the experience we wanted to create. One such problem was how to communicate a novel mental model to users, one that included the notion of “sight.” After all, how would a user know and remember that their assistant was equipped with an eye-tracker, and that they could use non-verbal cues in their speech?

My proposal was to craft the system speech using language that implied sight. For example: “On your left. I see Bruno’s Bakery.”
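As an illustration of that verbal pattern, here is a minimal sketch, assuming a hypothetical gaze angle and landmark name, of how responses could be templated so that every gaze-based answer leads with a spatial cue and uses “I see …” phrasing; the thresholds and wording are my own assumptions, not the shipped dialog design.

def spatial_cue(gaze_yaw_deg: float) -> str:
    """Map the driver's gaze angle (relative to the vehicle) to a spoken cue."""
    if gaze_yaw_deg < -15.0:
        return "On your left."
    if gaze_yaw_deg > 15.0:
        return "On your right."
    return "Straight ahead."

def sight_response(gaze_yaw_deg: float, landmark_name: str) -> str:
    # Lead with the spatial cue, then answer with sight-implying language.
    return f"{spatial_cue(gaze_yaw_deg)} I see {landmark_name}."

print(sight_response(-40.0, "Bruno's Bakery"))
# -> "On your left. I see Bruno's Bakery."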

Deliverables

Throughout this project I created 5 UX specifications, ranging from technical to graphic, and launched 4 research studies. When the time came to productize and sell the innovation, I presented to customers and worked with them to understand the UX and bring their own ideas into it. During this time, I helped product managers and professional services strip away the features the customer was not interested in and add the ones they were. My role in that transition was mostly to stand up for the research and the core UX we had learned through our exposure to the technology, while also taking in what customers were looking for, so that for our next projects we would have a better understanding of the goals needed to meet those needs.

 

Result

After building the initial design, I presented our finished product at CES in Las Vegas in 2019 and 2020. Over the course of the next year, I worked closely with customers and the professional services team to implement the product.

Our first customer, Mercedes Benz, released the following video showcasing the new feature: https://www.linkedin.com/posts/cerence_multimodal-mbux-sclass-activity-6754753288232673280-wYEU/
