PVC Robot
Design Brief
This story starts with a drawing that my daughter Claire created of a little robot that she received as a gift for Christmas. That drawing turned into a conversation on a long car ride. We talked about how she imagined interacting with her little robot and the things that it could do. Along the way I suggested that maybe if we were clever we could design and create our own house robot. We discussed various characteristics that it could have, what it might look like and what it could do.
I wanted to build a self-contained, mobile robot with voice interaction and computer vision. Claire wanted a robot that has a vacuum and an arm.
Our Robot Goals
- Has an arm
- Has a vacuum
- Can Listen
- Can Speak
- Can See
- Can Move
Design Documentation
Version 1.0 design: The robot can go fast and has a remote control. Drawing by Claire.
Detailed robot arm design and schematics by Claire.
Construction
The robot is simple and made of materials easily found at a hardware store. It is constructed from PVC pipe, Lexan and aluminum stock.
The chassis is a simple tank-drive and caster setup using two optically encoded motors from a first-generation Roomba.
The robot will have a head unit that can rotate roughly 90 degrees independently of the robot's body.
The robot will use a simple ultrasonic distance sensor to detect obstacles. You can see the PVC cap that will become its head-unit sitting on the right.
Here is the completed chassis sitting on the workspace.
An Arduino is used to control the motors via a simple motor controller board, read the motor encoders and measure distance with an ultrasonic sensor.
Everything fits nicely into a six-inch PVC pipe junction.
A Raspberry Pi takes care of sensing, networking and all other interactions.
Inside the head unit you can see:
- Color OLED screen
- Raspberry Pi
- Raspberry Pi Camera
- Head rotation servo
- USB Speaker
- USB Microphone
Development
Motor controller
I created a simple loop for the Arduino that reads commands from the serial port and counts changes in the optical encoders via interrupts.
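On the Raspberry Pi side, driving that loop amounts to writing short commands over the serial link and reading the replies. Here is a minimal sketch with pyserial; the single-letter commands and reply formats are hypothetical stand-ins for whatever protocol your Arduino loop actually parses.

```python
import serial

# The Arduino usually enumerates as /dev/ttyACM0 on a Raspberry Pi.
arduino = serial.Serial("/dev/ttyACM0", 115200, timeout=1)

def send_command(cmd: str) -> str:
    """Send one newline-terminated command and return the reply line."""
    arduino.write((cmd + "\n").encode("ascii"))
    return arduino.readline().decode("ascii").strip()

send_command("F")           # hypothetical: drive forward
ticks = send_command("E")   # hypothetical: returns left/right encoder counts
dist = send_command("D")    # hypothetical: returns ultrasonic range in cm
print(ticks, dist)
```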
Eyes
The robot detects a face using OpenCV and reacts to it.
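A minimal version of that loop, using the stock Haar cascade that ships with OpenCV; the camera index and the reaction are placeholders, and on older OpenCV builds the cascade path may need to be spelled out.

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
camera = cv2.VideoCapture(0)  # Pi camera via V4L2, or any USB webcam

while True:
    ok, frame = camera.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        # React here, e.g. turn the head servo toward the face center.
        print("face at", x + w // 2, y + h // 2)
```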
Ears
Snowboy, a hotword detection engine, is used to listen for the robot's name and then forward the audio to Google's voice-to-text service. The robot responds if the transcribed text matches any of its commands.
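The listening loop follows the shape of Snowboy's Python demo: block on the hotword, beep to acknowledge the user, then hand the next utterance to the transcription step. In this sketch "robot.pmdl" stands in for a personal model trained on the robot's name, and record_and_transcribe() is a placeholder for the Google voice-to-text round trip.

```python
import snowboydecoder

def record_and_transcribe() -> str:
    """Placeholder: record the next utterance and send it to the VTT service."""
    return ""

def on_hotword():
    snowboydecoder.play_audio_file()  # quick beep so the user knows it heard its name
    text = record_and_transcribe()
    print("heard:", text)             # match against the command table here

detector = snowboydecoder.HotwordDetector("robot.pmdl", sensitivity=0.5)
detector.start(detected_callback=on_hotword, sleep_time=0.03)
```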
Mouth
eSpeak TTS is used for the robot's voice. eSpeak is a compact, open source software speech synthesizer for English and other languages.
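Because eSpeak is a command-line program, the simplest integration is a subprocess call. A small sketch; the voice and rate flags are real eSpeak options, and the values are just starting points.

```python
import subprocess

def say(text: str) -> None:
    # -v en selects the English voice; -s 140 slows the rate from the default 175 wpm.
    subprocess.run(["espeak", "-v", "en", "-s", "140", text], check=True)

say("Hello, I am a robot made of plumbing supplies.")
```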
Face
luma.oled is used to control the OLED and create a "happy" face for the robot.
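With luma.oled, the face is ordinary PIL drawing calls on a canvas. A minimal sketch of a happy face, assuming an SSD1331 96x64 color module on SPI; swap in the device class and geometry that match your screen.

```python
import time

from luma.core.interface.serial import spi
from luma.core.render import canvas
from luma.oled.device import ssd1331

device = ssd1331(spi(port=0, device=0))

with canvas(device) as draw:
    draw.ellipse((20, 12, 34, 26), fill="white")                  # left eye
    draw.ellipse((62, 12, 76, 26), fill="white")                  # right eye
    draw.arc((24, 28, 72, 56), start=20, end=160, fill="yellow")  # smile

time.sleep(10)  # luma clears the display on exit, so hold the face briefly
```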
Testing
| Feature | Pass/Fail | Quality (A-F) | Notes |
|---|---|---|---|
| Move Body | Pass | D | Motion commands are sometimes disregarded |
| Move Head | Pass | C | Head motor is very noisy |
| Detect Name | Pass | C | Speaker must be very loud |
| Detect People | Pass | D | Is very slow to detect |
| Following | Pass | D | Can follow a very patient person |
| Understand Commands | Pass | C | Response from VTT is slow |
| Robot Speech | Pass | D | Sounds like a robot |
| Display interactive face | Pass | B | Face is responsive and animates nicely |
| Intelligence | Fail | F | Does not appear to be intelligent |
| Engagement | Fail | F | Does not appear to be engaging for children or adults |
Important things that we learned
Give the robot a nice name
I started out using the name PVC for this robot, thinking it was a clever play on the acronym for polyvinyl chloride. Once I had generated a model for the keyword recognizer and begun connecting it to the robot's code, I quickly discovered that calling "PVC" to get its attention gets tiresome fast. At one point my daughter Claire yelled across the house at me to stop repeating the robot's name.
Acknowledge the user
When the robot does not acknowledge its keyword immediately, the user has no insight into the robot's state. Any sort of immediate response from the robot mitigates this issue. I started out by using a quick beep to acknowledge the user. In the future I would like to test how well non-auditory cues work, for example changing the robot's facial expression to appropriate states while it is listening, processing and responding to the user. If the robot had directional microphones, it could turn toward the user, similar to how the Amazon Echo uses its light ring to show the relative angle of the detected voice.
Respond quickly
If possible, stream the audio to a voice-to-text system that can process it as it arrives. My robot records a complete utterance, uploads it to a VTT service and then processes the returned string. A latency as short as one second can frustrate the user and dissuade them from continued interaction. A delay longer than two seconds will create doubt in the user's perception of the robot's capacity to listen to commands.
If the system opens a stream to the VTT server when it expects voice interaction, it can convert voice to text in real time once a keyword has been detected. Ideally the server can also send converted text and commands back to the robot in real time.
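As a concrete sketch, the streaming path with Google's Cloud Speech Python client looks roughly like the following; mic_chunks() is a placeholder for a generator that yields raw audio from the microphone, and the exact client API may differ between library versions.

```python
from google.cloud import speech

def mic_chunks():
    """Placeholder: yield raw 16 kHz LINEAR16 audio chunks from the mic."""
    yield b""

client = speech.SpeechClient()
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config, interim_results=True)

requests = (speech.StreamingRecognizeRequest(audio_content=chunk)
            for chunk in mic_chunks())

# Interim results arrive while the user is still speaking, so the robot
# can start matching commands before the utterance is finished.
for response in client.streaming_recognize(streaming_config, requests):
    for result in response.results:
        print(result.alternatives[0].transcript,
              "(final)" if result.is_final else "")
```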
Place the microphone carefully
Do not put the microphone inside the body of the robot, where non-voice noises will interfere with recording quality. This robot has its microphone installed inside its head, where it performs poorly. I usually have to raise my voice to get a response, while my Echo Dot is much more responsive.
Core Commands
Make sure that the robot reliably handles the basic commands that every user will try, and set the user's expectations early.
Keep the conversation going
If the robot doesn't have the capacity to respond to a command, keep the response terse. A verbose response will frustrate a user who is trying to figure out how to interact with the robot. I personally find it frustrating when Alexa and Siri try to respond with humor, as it is typically rote, predictable and dead-ends the interaction (as intended?).
Use the robot's face
Show that the robot is paying attention. When the robot is listening, squelch its other behaviors. If you know where the user is, turn toward them.
Don't try to trick the user
Don’t try to trick the user into believing that the robot is intelligent. This will never work until robots are truly more intelligent than humans.
Keep the dialog simple
Don't overdo it with flowery responses and complicated commands; this falls into the fake-intelligence trap and users will quickly tire of it.
Allow commands to be as simple as 'robot_name' 'word'. For example, allow the user to ask for the time by saying 'time' and the weather by saying 'weather'. This is a good place to start. For commands requiring context, allow the user to communicate them with the minimum information. For example, 'robot_name' 'play message' should make the robot play the latest message from the primary messaging system that the robot has access to.
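A plain dispatch table is enough to support this style of command. In the sketch below, say() is the eSpeak helper from the Mouth section, and get_weather() and play_latest_message() are placeholder helpers.

```python
import subprocess
import time

def say(text: str) -> None:
    subprocess.run(["espeak", text], check=True)

def get_weather() -> str:
    """Placeholder: pull a one-line forecast from whatever source you use."""
    return "I have no idea, I live indoors."

def play_latest_message() -> None:
    """Placeholder: play the newest message from the primary messaging system."""
    say("No new messages.")

COMMANDS = {
    "time": lambda: say(time.strftime("It is %I:%M %p")),
    "weather": lambda: say(get_weather()),
    "play message": play_latest_message,
}

def dispatch(text: str) -> None:
    handler = COMMANDS.get(text.strip().lower())
    if handler:
        handler()
    else:
        say("I don't know that one.")  # keep the failure response terse
```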
If you want to add personality to your robot, use motion. Using timbre and other vocal expression will push the interaction towards the uncanny valley. If your robot has a face, use the face but keep it simple. Simplify the facial expression to the smallest set of features that your interactions require. Users are more likely to trust simple robots.
Next Steps
- Make an arm and attach it
- Attach a vacuum
- Add Intelligence
- Install a battery