Meet SAM, the Robot Waiter

SAM was a group project: four people, ten weeks, three progressively harder challenges. I was responsible for all interaction design: the emotional state system, dialogue flow, voice decisions, and the user evaluation.

2023 · Human-Robot Communication course · Team of 4 · University of Twente

Role Interaction design lead: emotional states, dialogue flow, voice decisions, user evaluation

Problem How do you design a robot that feels socially competent without being misleading about its capabilities? SAM had to communicate intent, handle errors gracefully, and feel trustworthy to strangers, using LEDs, movement, and voice.

What we did Three progressively harder challenges, each adding a communication constraint. Co-creation and peerplay before building anything, a structured voice test comparing robot-like and human-like delivery, and a full user evaluation in Challenge 3.

Finding A robot-like voice outperformed a human-like voice. Sincerity matters more than warmth. And among all the expressive channels, the eyes carried the most weight: participants who could not see them clearly missed SAM's emotional state entirely.

Context

The course set three open-ended challenges, each adding a new communication constraint: verbal only, then non-verbal only, then both combined. We chose a robot waiter, a socially loaded role that made every constraint harder and more interesting.

We stayed in a restaurant context throughout. A waiter is one of the most socially loaded roles there is, and getting a robot to feel like it is genuinely attending to you, not just completing a delivery, is a real design challenge.

My focus was interaction design: the emotional state system, dialogue flow, voice decisions, documentation, and presentation. The hardware and coding were handled by my teammates.

Verbal — robot receptionist

Speech only, handling unpredictable natural language from a stranger

Non-verbal — silent waiter

LEDs and movement only. No speech, no fallback.

Multi-modal — voice and expression combined

Face tracking, dialogue management, and voice informed by what failed in Challenge 2

A co-creation session early in the process confirmed that eyes and a warm expression were non-negotiable. Every participant said so independently.

SAM is a laser-cut wooden robot with hexagonal LED eyes, an LED matrix mouth, a serving tray, and a Huskylens face-tracking camera mounted in the head. The wooden body was built by teammates. The interaction layer was mine to design: what SAM sees, says, and expresses.

Process

We prototyped through peerplay before building anything, tested two voice types with real participants, and let the data decide.

Challenge 2 started with peerplay: we acted out the scenarios before building anything. One person played the robot, one the customer. That exercise surfaced the three core branches early and grounded every subsequent decision in something we had actually experienced, not assumed.

For Challenge 3, a co-creation session defined SAM's full modality set. Before committing to a voice, we ran a dedicated voice test comparing robot-like and human-like delivery. The robot-like voice won, because a human-sounding voice raised expectations SAM could not meet. In a service context, sincerity matters more than warmth.

Challenge 2: apologetic state. Blue LEDs, no speech, no fallback.

Challenge 2: happy state. White LEDs. The eyes were readable from across the room.

The final version of SAM. Face tracking, voice, and full emotional expression combined.

SAM used three states. The default state (blue eyes, slight smile) was SAM's baseline as it approached and waited. Happy and error handling are shown below.

State II

Happy

Triggered when the customer picks up the correct drink. In Challenge 3, this state combines with a verbal response. SAM confirms the order and says goodbye before moving on.

Correct pick-up

State III

Error handling

Triggered by a wrong pick-up or no response. SAM acknowledges the issue verbally and offers a way forward: "Oh sorry, what else did you order?" Proactive rather than defeated.

Wrong order or no response
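The state logic described above can be sketched as a small lookup table. This is an illustrative reconstruction in Python, not the code that ran on SAM: the names State, Event, and react are hypothetical, and the happy-state line is a placeholder because only the error line is quoted in this write-up.

```python
from enum import Enum
from typing import Optional, Tuple


class State(Enum):
    DEFAULT = "default"  # blue eyes, slight smile: approaching and waiting
    HAPPY = "happy"      # customer picked up the correct drink
    ERROR = "error"      # wrong pick-up or no response


class Event(Enum):
    CORRECT_PICKUP = "correct_pickup"
    WRONG_PICKUP = "wrong_pickup"
    NO_RESPONSE = "no_response"


# Which state each event leads to, following the state descriptions.
TRANSITIONS = {
    Event.CORRECT_PICKUP: State.HAPPY,
    Event.WRONG_PICKUP: State.ERROR,
    Event.NO_RESPONSE: State.ERROR,
}

# Spoken line per state. Only the error line is quoted from the project;
# the happy line is a placeholder for "confirm the order and say goodbye".
SPOKEN_LINES = {
    State.DEFAULT: None,
    State.HAPPY: "<confirm order and say goodbye>",
    State.ERROR: "Oh sorry, what else did you order?",
}


def react(event: Event) -> Tuple[State, Optional[str]]:
    """Return the expressive state and the spoken line for an event."""
    state = TRANSITIONS[event]
    return state, SPOKEN_LINES[state]


if __name__ == "__main__":
    for event in Event:
        state, line = react(event)
        print(f"{event.value:15s} -> {state.value:7s} | {line}")
```

Keeping the mapping in one table mirrors the design goal of settling the emotional logic before any hardware behaviour is written: every trigger has exactly one expressive outcome.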

SAM approaches, makes eye contact via Huskylens face tracking, greets the user verbally, and moves forward to offer the drink. An LDR (light-dependent resistor) sensor on the tray detects when the drink is taken. Google Dialogflow handles every dialogue branch: silence, wrong order, ambiguous response. Python handles speech processing via Google APIs, and Arduino handles sensing and actuation. The two are connected over serial.

SAM technical architecture: Python handling dialog management and speech via Dialogflow, Arduino handling physical sensing (LDR, Huskylens) and expression (eyes, mouth, wheels), connected via serial communication
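To make the Python and Arduino split concrete, here is a minimal sketch of how the two sides might exchange messages over serial. Everything specific in it is an assumption for illustration: the line format ("LDR:712" from the Arduino, "EXPR:happy" from Python), the threshold value, and the idea that an uncovered sensor reads higher than a covered one. The actual protocol my teammates implemented is not documented here.

```python
from typing import Optional

# Hypothetical 10-bit ADC threshold; it would need tuning on real hardware.
LDR_THRESHOLD = 600


def parse_ldr_line(raw: bytes) -> Optional[int]:
    """Parse a sensor line such as b'LDR:712\\n'; return None otherwise."""
    text = raw.decode("ascii", errors="ignore").strip()
    key, sep, value = text.partition(":")
    if key != "LDR" or not sep or not value.isdigit():
        return None
    return int(value)


def drink_taken(reading: int, threshold: int = LDR_THRESHOLD) -> bool:
    """Assumes the glass covers the sensor, so removing it raises the reading."""
    return reading > threshold


def encode_expression(state: str) -> bytes:
    """Build the command Python would send to set eyes and mouth (assumed format)."""
    return f"EXPR:{state}\n".encode("ascii")


if __name__ == "__main__":
    reading = parse_ldr_line(b"LDR:712\n")
    if reading is not None and drink_taken(reading):
        print(encode_expression("happy"))
```

With a library such as pySerial, the raw bytes would typically come from reading a line off the open port and the encoded command would be written back to it. The functions above are kept free of I/O so the parsing and threshold logic can be tested without a robot attached.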

Outcome

The concept landed well. Participants engaged with SAM unprompted, and the eyes were the most legible expressive channel by far.

Participants responded warmly. Several described SAM as beautiful or special, and many engaged with it unprompted, waving, smiling, and saying thank you without being asked. One participant felt more comfortable with SAM than with a human waiter, citing less social pressure in the exchange.

With the full modality set in place, we ran a structured user evaluation across three scenarios: correct pick-up, wrong pick-up, and no response. Two numbers stand out:

6/9

Task completion

Participants completed their scenario correctly, even when uncertain during the interaction.

2.57

Concept liking

On a 7-point scale where 1 is most positive, so lower is better. This was the best-rated dimension in the evaluation.

9 participants across 3 scenarios (correct pick-up, wrong pick-up, no response). Evaluation combined Likert-scale surveys, observation notes, and open-ended questions analysed using thematic analysis. On a 7-point scale where 1 is most positive: initial reaction 2.75, engagement 3.13, interaction flow 3.5, concept liking 2.57. Cue clarity scored 4.13, the gap Challenge 3 addressed. Qualitative themes across scenarios: design, clear communication, anthropomorphism, entertainment, uncertainty.
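Because 1 is the most positive end of the scale, the numbers are easy to misread. The short Python snippet below ranks the reported means from best to worst and checks each against the scale midpoint. Treating 4.0 as the neutral point is my assumption for illustration; the scores themselves are the ones reported above.

```python
SCALE_MIDPOINT = 4.0  # assumed neutral point of a 1-7 scale

# Mean scores as reported; 1 is most positive, so lower is better.
scores = {
    "initial reaction": 2.75,
    "engagement": 3.13,
    "interaction flow": 3.5,
    "concept liking": 2.57,
    "cue clarity": 4.13,
}

# Sort ascending so the best-rated dimension comes first.
ranked = sorted(scores.items(), key=lambda item: item[1])

# Dimensions whose mean falls on the negative side of the midpoint.
negative_side = [name for name, mean in scores.items() if mean > SCALE_MIDPOINT]

# 6 of 9 participants completed their scenario correctly.
completion_rate = 6 / 9

if __name__ == "__main__":
    for name, mean in ranked:
        side = "positive" if mean < SCALE_MIDPOINT else "negative"
        print(f"{name:17s} {mean:.2f}  ({side} side of midpoint)")
    print(f"task completion: {completion_rate:.0%}")
```

Read this way, concept liking is the strongest result and cue clarity is the only dimension that lands on the negative side of the midpoint.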

This is what Challenge 3 looked like in practice:

Challenge 3 demo, the final version of SAM. It combines face tracking, voice, and emotional expression.

Takeaways

Three things this project reflects about how I work:

Collaboration

Defining and holding the interaction design perspective in a technical team

When roles are split between design and engineering, the interaction designer has to be deliberate about what they own. Across all three challenges, that meant translating user needs into concrete system behaviours, making sure the emotional logic was clear before a single line of code was written, and staying involved as the robot evolved.

Research-informed design

Small, targeted tests can change the direction of a design

Rather than defaulting to a human-like voice because it felt more natural, we ran a structured comparison with real participants. The result contradicted our assumption and directly shaped a core design decision. It reinforced that user research does not need to be large-scale to be meaningful.

Interaction design

Designing social competence is the same challenge across contexts

Making SAM feel appropriate required understanding which social signals matter most, how to express them within hardware constraints, and how to handle failure states gracefully. The same challenge appears in any system that needs to feel responsive and trustworthy. The constraints just look different.