AR / VR / XR 2017 Completed

Articulab SARA — Socially-Aware Robot Assistant

A 3D virtual humanoid AI agent showcased at the World Economic Forum in Davos (2017) and Tianjin (2016). Personalizes interaction and improves task performance by building rapport through social intelligence. John Choi built the new Unity 2017 framework with JSON-driven body animation, blendshape facial expressions, lip-sync, and TTS.

Unity, C#, Virtual Agent, Avatar, Facial Animation, Lip Sync, JSON API, Mecanim, 3D Art, CMU, Articulab, World Economic Forum, AI, Rapport, NLP

Overview

SARA (Socially-Aware Robot Assistant) is a virtual human AI agent that personalizes interaction and improves task performance by drawing on information about the relationship between the human user and the assistant. Rather than replacing people, SARA is designed to collaborate with her human users, relying on socio-emotional bonds to improve performance.

SARA always attends to two simultaneous goals:

  • Task goal — finding information, navigating a conference, teaching a subject
  • Social goal — ensuring interaction is comfortable, engaging, and builds closeness over time

SARA was showcased at:

  • World Economic Forum Annual Meeting: Davos, January 2017
  • World Economic Forum Annual Meeting of the New Champions: Tianjin, China, June 2016

My Work

My work on the SARA project focused primarily on building a new SARA Unity Framework for Unity 2017 as a lightweight replacement for the older VHToolkit and Smartbody base frameworks.

To control the virtual SARA humanoid character, a formatted JSON message is sent to the Unity Player, which then parses the incoming data and drives:

  • Synchronized body animation (Mecanim)
  • Blendshape facial expressions (FACS-based)
  • Lip-sync
  • Text-to-speech
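
As a rough illustration of such a control message, the sketch below builds and validates one in Python, standing in for whatever dialog-side process talks to the Unity Player. The field names (`speech`, `animation`, `blendshapes`, `lipsync`) and both helper functions are hypothetical; the actual SARA message schema is not documented on this page.

```python
import json

# Hypothetical top-level keys; the real SARA schema is not shown in this write-up.
REQUIRED_KEYS = {"speech", "animation", "blendshapes"}


def build_agent_message(text, animation, blendshapes, lipsync=True):
    """Assemble one control message for the (assumed) Unity-side parser."""
    for name, weight in blendshapes.items():
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"blendshape {name!r} weight {weight} outside [0, 1]")
    message = {
        "speech": {"text": text, "lipsync": lipsync},  # text-to-speech and lip-sync
        "animation": animation,                        # body animation state name
        "blendshapes": blendshapes,                    # facial expression weights
    }
    return json.dumps(message)


def parse_agent_message(raw):
    """Decode a message and check that every required key is present."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise KeyError(f"missing keys: {sorted(missing)}")
    return data


raw = build_agent_message("Nice to meet you!", "wave", {"smile": 0.8, "brow_raise": 0.3})
decoded = parse_agent_message(raw)
print(decoded["animation"], decoded["blendshapes"]["smile"])
```

Keeping all four channels in one message means the receiver can start speech, body animation, and facial expression from the same parse, which is one plausible way to keep them in step.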

The new SARA Unity Framework was designed to serve as the foundation for all future Articulab projects, with the goals of better compatibility, faster development, and easier maintenance. Building it required C# programming, Unity’s Mecanim animation system, 3D art asset development (modeling, texturing, rigging, animating), and a solid understanding of human expression.

The Mery character model was rigged using Mixamo as part of the pipeline.

Technical Architecture

SARA’s AI pipeline operates through three phases:

Detection: Recognizes visual (body language via OpenFace), vocal (acoustic features via openSMILE), and verbal (conversational strategies) cues from the user. Uses recurrent neural networks and L2-regularized logistic regression to estimate rapport in real time.
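
To make the L2-regularized logistic regression step concrete, here is a minimal sketch trained by batch gradient descent. The two features, the toy data, and the binary high/low rapport label are invented for illustration; the published SARA models use much richer temporal features and are not reproduced here.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_l2_logreg(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Minimize mean log-loss + (lam / 2) * ||w||^2; the bias is not penalized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The lam * wj term is the gradient of the L2 penalty.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy features per time slice: [smile_rate, mutual_gaze_rate]; label 1 = high rapport.
X = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_l2_logreg(X, y)
print(predict_proba(w, b, [0.85, 0.9]), predict_proba(w, b, [0.1, 0.1]))
```

The penalty keeps the weights bounded even though this toy data is linearly separable, so the predicted probabilities stay away from 0 and 1.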

Reasoning: Carries out classic task reasoning alongside novel social reasoning, in which a spreading activation network selects the best conversational strategy given the current rapport level and context.
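
The idea of a spreading activation network can be sketched as follows. The node names, edge weights, decay factor, and step count are all hypothetical and chosen only to show the mechanism; they are not taken from the SARA social reasoner.

```python
def spread_activation(edges, sources, steps=3, decay=0.5):
    """Propagate activation from source nodes along weighted directed edges.

    edges:   {from_node: {to_node: weight}}
    sources: {node: initial_activation}
    """
    activation = dict(sources)
    for _ in range(steps):
        incoming = {}
        for node, level in activation.items():
            for target, weight in edges.get(node, {}).items():
                incoming[target] = incoming.get(target, 0.0) + level * weight
        # Existing activation decays; newly received activation is added on top.
        nodes = set(activation) | set(incoming)
        activation = {
            n: decay * activation.get(n, 0.0) + incoming.get(n, 0.0) for n in nodes
        }
    return activation


# Hypothetical network: context nodes feed conversational strategy nodes.
EDGES = {
    "rapport_low": {"self_disclosure": 0.8, "praise": 0.4},
    "rapport_high": {"teasing": 0.7, "shared_experience": 0.6},
    "user_disclosed": {"self_disclosure": 0.5, "shared_experience": 0.3},
}
STRATEGIES = {"self_disclosure", "praise", "teasing", "shared_experience"}


def select_strategy(context):
    """Return the strategy node with the highest activation for this context."""
    activation = spread_activation(EDGES, context)
    return max(STRATEGIES, key=lambda s: activation.get(s, 0.0))


print(select_strategy({"rapport_low": 1.0, "user_disclosed": 1.0}))
```

Here the context (low rapport, user just disclosed something) activates several strategies at once, and the strategy that collects the most activation wins.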

Generation: The natural language generation (NLG) module produces language and body language (gestures, eye gaze, head nods, smiles) through BEAT (Behavior Expression Animation Toolkit) and BML (Behavior Markup Language), rendered by the virtual human.
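
For a sense of what a BML-style behavior block looks like, the sketch below assembles one with Python's standard `xml.etree.ElementTree`. The element and attribute names (`speech`, `gaze`, `head`, `lexeme`, the `speech1:start` sync reference) are loosely modelled on BML 1.0 and should be treated as assumptions to check against the specification; the function `build_bml` is hypothetical and is not part of BEAT or the SARA codebase.

```python
import xml.etree.ElementTree as ET


def build_bml(utterance, nod_on_start=True, gaze_target="user"):
    """Build a small BML-style block: speech plus synchronized nonverbal behavior."""
    bml = ET.Element("bml", {"id": "bml1"})
    speech = ET.SubElement(bml, "speech", {"id": "speech1"})
    ET.SubElement(speech, "text").text = utterance
    # Nonverbal behaviors reference the speech block's start sync point.
    ET.SubElement(
        bml, "gaze", {"id": "gaze1", "target": gaze_target, "start": "speech1:start"}
    )
    if nod_on_start:
        ET.SubElement(
            bml, "head", {"id": "head1", "lexeme": "NOD", "start": "speech1:start"}
        )
    return ET.tostring(bml, encoding="unicode")


print(build_bml("I can help you find that session."))
```

Expressing gaze and head movement relative to the speech element, rather than as absolute times, lets the renderer decide the exact timing once the synthesized audio length is known.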

Team

  • Justine Cassell — Principal Investigator
  • Yoichi Matsuyama, Ran Zhao, Arjun Bhardwaj, Fadi Botros, David Slebodnick, Jiajia Li, Luo Yi Tan — ArticuLab
  • Oscar Romero, Sushma Ananda — CMU-Yahoo InMind project
  • Summer 2016 Interns: Divya Sai Jitta, Orson Xu, Ting Yan, Ying Shen, Zhao Meng
  • Fall 2016 Interns: Akanksha Kartik, Alexander Bainbridge, Anna Tan, Chileshe Otieno, Ethel Chou, Jacqueline Yeung, Sara Stalla, Sasha Volodin

Publications

  • Zhao, Sinha, Black & Cassell (2016). Socially-Aware Virtual Agents: Automatically Assessing Dyadic Rapport from Temporal Patterns of Behavior. IVA 2016 (Best Student Paper)
  • Zhao, Sinha, Black & Cassell (2016). Automatic Recognition of Conversational Strategies in the Service of a Socially-Aware Dialog System. SIGDIAL 2016
  • Matsuyama, Bhardwaj, Zhao, Romero, Akoju & Cassell (2016). Socially-Aware Animated Intelligent Personal Assistant Agent. SIGDIAL 2016

Media Coverage

Featured in BBC Business Daily, Foreign Policy, CNBC Africa, USA TODAY, MIT Technology Review, CNET, Popular Science, Science Friday, Washington Post, Financial Times, Bloomberg Quint, and many more.

Funding

Microsoft Faculty Research Award, Microsoft equipment donation, gift from LivePerson, IT R&D program of MSIP/IITP [2017-0-00255], Google Faculty Award and Google Cloud Platform donation. Computing power by DroneData.