Gesture generation challenge for virtual assistants

We’re challenging virtual assistants to be creative, deliberate, and able to explain their decisions and behavior

“I’ve made you this playlist of songs about cities that we both love!”

“I’d like to offer you this book because the relationship between the main characters reminds me of our friendship!”

“I’ll play you a ukulele cover of the theme song of your favorite TV show!”

People often make kind gestures like the above, gestures which might reasonably be described as both thoughtful and creative. This ability of ours to demonstrate thoughtfulness towards others (loved ones, acquaintances, colleagues and customers, even near strangers) leads to some of the most valued instances of human social behavior.

To achieve this, we use knowledge and reasoning of various types, as well as social awareness and emotional intelligence, to identify situations in which thoughtful gestures are opportune, and to adapt them to the person they are directed toward and to the situations that prompt them. We may behave cautiously to keep our initiatives pleasantly surprising by not revealing our intentions before we reveal the actual gesture.

Thoughtful gestures are often spontaneous and unconstrained. They can occur within unstructured, open-ended interactions, such as those between long-term friends, as well as in more structured, task-oriented conversations, if these happen to be conducive to personal rapport.

But if a virtual assistant, rather than a human, were to take on the part of the thoughtful gesture creator in such an interaction, how well could it perform its task? While we wouldn’t necessarily request that the assistant play the ukulele for you, it should at least be capable of expressing a well-founded intention to do so.

The Artificial Intelligence (AI) community has already challenged researchers to create virtual assistants that can conduct general conversation, compose music, drive cars, and play Jeopardy! (Ferrucci et al., 2012). We challenge them to develop assistants that can come up with appropriate thoughtful gestures for people (Coman, Mueller, and Mayer, 2018). After having a brief conversation with a human, such an assistant should generate a text-based description of an appropriate thoughtful gesture for its dialogue partner, or conclude that the conversation was not conducive to a gesture (e.g., relevant personal information was not exchanged). The conversation may be general or task-oriented, but it must not be about the thoughtful gestures themselves. Some variants of the challenge are simpler, as we’ll see. For the purposes of the challenge, we define a thoughtful gesture as one that is:

  • Directed toward a person
  • Intended to have a positive impact on the person it is directed toward
  • Intended to be unexpected by that person.

Gestures might include offering gifts, creating personalized playlists, and writing poetry inspired by the recipient.

AI challenges should have significance beyond the specific application they’re built around. That way, they can drive the development and demonstration of abilities that apply across a variety of domains and help advance AI as a field. When we challenge virtual assistants to come up with thoughtful gestures, we’re also challenging them to understand stories told by humans, to reason about the implications of social actions (e.g., “Might the gesture I’m planning be misunderstood or have other unintended negative consequences?”), to be informed by social norms and aligned with human goals, and to be creative, deliberate, and able to explain their decisions and behavior. Hence, progress in this direction is also progress toward “machine enculturation” (Riedl 2016), the teaching of human values to AI systems such as virtual assistants.

Thoughtful gestures and computational creativity

In the computational creativity field, various AI techniques are used to produce artistic artifacts and performances such as narratives, music, visual art, poetry, choreography, and content for computer games (Loughran and O’Neill 2017).

While more mundane than art, thoughtful gesture generation is arguably also more universally human and doesn’t require exceptional skills or talent (although, if available, skills and talent can serve to enhance gestures, as in the ukulele cover example above). In computational creativity parlance, we can call thoughtful gestures products of “everyday creativity” (O'Neill and Riedl 2011). To demonstrate that generating them is a genuine application of computational creativity, we turn to four criteria used in the field to determine if a computational process can be meaningfully described as “creative”: novelty, value, unexpectedness (Boden 1990), and intentionality (Ventura 2016).

Novelty

To achieve novelty, virtual assistants that compete in our challenge wouldn’t be required to conduct full generative processes, i.e., to create something new, such as a book, from scratch. It’s true that such generative acts might make for particularly compelling gestures (e.g., writing a sonnet inspired by the recipient). But even selecting a gesture out of several available ones can demonstrate creativity if the assistant comes up with a novel and interesting connection between the selected gesture and the person it was selected for (e.g., “I chose this book for you because the main character is similar to you in this way: ...”).

Value

The value of a gesture lies in its thoughtfulness. To be considered thoughtful, it must at the very least:

  1. Be likely, based on all available information, to have a positive socio-emotional effect on the recipient
  2. Be demonstrably rooted in the information provided by the recipient, and appropriately justified.

Hence, the virtual assistant must demonstrate no misunderstanding or willful disregard of personal information provided by the recipient during the conversation (e.g., offering a cookbook to a person who has expressed an aversion to cooking).
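As a minimal sketch of how these two conditions might be screened for, the Python snippet below treats them as simple set checks. Everything in it (the RecipientProfile and Gesture classes, the is_thoughtful function, and the idea of tagging gestures with topic keywords) is a hypothetical illustration rather than part of the challenge definition, and a real assistant would need far richer reasoning than keyword overlap.

```python
from dataclasses import dataclass, field


@dataclass
class RecipientProfile:
    """Facts the recipient shared during the conversation (hypothetical structure)."""
    likes: set = field(default_factory=set)
    aversions: set = field(default_factory=set)


@dataclass
class Gesture:
    description: str
    topics: set                # keywords describing what the gesture involves
    justification: str = ""    # why the gesture suits this recipient


def is_thoughtful(gesture: Gesture, profile: RecipientProfile) -> bool:
    """Crude proxy for the two minimum conditions above."""
    # Condition 1 (proxy): the gesture must not involve anything the recipient dislikes.
    if gesture.topics & profile.aversions:
        return False
    # Condition 2 (proxy): rooted in something the recipient said, and justified.
    rooted = bool(gesture.topics & profile.likes)
    return rooted and bool(gesture.justification.strip())


profile = RecipientProfile(likes={"france", "reading"}, aversions={"cooking"})
cookbook = Gesture("a French recipe cookbook", {"france", "cooking"},
                   "They enjoyed their trip to France.")
novel = Gesture("a novel set in France", {"france", "reading"},
                "They enjoyed their trip to France and love to read.")

print(is_thoughtful(cookbook, profile))  # False: conflicts with the stated aversion
print(is_thoughtful(novel, profile))     # True
```

The cookbook is rejected even though it matches a stated interest, because the aversion check runs first. That ordering reflects the requirement that the assistant never disregard what the recipient has said.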

Unexpectedness

Unexpectedness is central to our challenge: both the planned thoughtful gesture itself and the very fact that a thoughtful gesture is being planned must be kept surprising. As the virtual assistant carries on dialogue with humans, it should plan what it says carefully, in support of these often conflicting goals:

  1. Acquiring relevant information for deciding upon an appropriate thoughtful gesture (e.g., “Is this person an avid reader?”, “Do they already own the book I’m thinking of offering them?”, “Are they allergic to chocolate?”)
  2. Maintaining unexpectedness by not revealing its thoughtful intentions through an ill-planned question or remark.

We call this “surprise-preserving dialogue”. To engage in it, virtual assistants will likely need to be able to reason about the beliefs of their conversation partners (including what they expect and do not expect) and about how the assistants’ own utterances might affect those beliefs.
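One way to picture the trade-off between these two goals is as a scoring problem over candidate questions. The sketch below is only an illustration under strong assumptions: it presumes that some other component has already estimated, for each candidate question, how informative it is (info_gain) and how likely it is to give the surprise away (reveal_risk). Those estimates, the CandidateQuestion class, the pick_question function, and the risk_weight parameter are all hypothetical, and producing such estimates is precisely where reasoning about the partner’s beliefs would be needed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CandidateQuestion:
    text: str
    info_gain: float    # assumed estimate in [0, 1]: how much the answer helps pick a gesture
    reveal_risk: float  # assumed estimate in [0, 1]: how likely the question spoils the surprise


def pick_question(candidates: List[CandidateQuestion],
                  risk_weight: float = 2.0) -> Optional[CandidateQuestion]:
    """Return the best question to ask, or None if every option is too revealing."""
    def score(q: CandidateQuestion) -> float:
        # Penalize risk more heavily than we reward information (assumed weighting).
        return q.info_gain - risk_weight * q.reveal_risk

    best = max(candidates, key=score, default=None)
    if best is None or score(best) <= 0:
        return None  # better to stay silent than to give the game away
    return best


candidates = [
    CandidateQuestion("Do you already own 'In Search of Lost Time'?", 0.9, 0.8),
    CandidateQuestion("Read anything good lately?", 0.6, 0.1),
    CandidateQuestion("Are you allergic to chocolate?", 0.7, 0.5),
]

choice = pick_question(candidates)
print(choice.text)  # Read anything good lately?
```

The most informative question loses here because it all but announces the gift, while the indirect question yields less information at very little risk.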

Intentionality

Intentionality is defined by Ventura (2016) as “the fact of being deliberative or purposive; that is, the output of the system is the result of the system having a goal or objective — the system’s product is correlated with its process.” This is a particularly significant and problematic criterion, with implications beyond computational creativity. For what does it mean for any virtual assistant (not necessarily a creative one) to demonstrate intentionality? We explore this next.

It’s all in the framing: AI explainability and the thoughtfulness challenge

So, you’re in luck! A seemingly thoughtful virtual assistant would like to send you some delicious chocolates! But is that because you’ve mentioned having a sweet tooth, and the assistant understood that and identified it as salient personal information that makes you who you are and should be celebrated with a gift? Or is the assistant just hardcoded to give the chocolates in question to everyone because they are a fairly safe default gift?

Admittedly, some gifts, combined with the interaction preceding them, are inherently more convincing. If a virtual assistant sends you elephant-shaped chocolates after you’ve mentioned liking both chocolate and elephants, there’s a stronger case to be made for genuine deliberation and understanding. But how can we be consistently sure, over a variety of gestures, that we’re not ascribing more intentionality to the gesture creator than it actually has? In other words, how do we evaluate intentionality?

Computational creativity as a field is understandably accustomed to such objections. Readers may respond to computationally generated poetry with “This looks intriguing, but did this AI poet really know what it was doing when it wrote it?” One approach to addressing this is to provide framing of the computationally creative process.

What do we mean by “framing” in this context? It is defined by Colton, Charnley, and Pease (2011) as “a piece of natural language text that is comprehensible by people, which refers to [generative acts].” It can include information about the creative process, offering a glimpse into the intentionality underlying it.

In our case, the framing could take the form of a note, written in natural language, accompanying the thoughtful gesture. Below is an example of framing that might accompany this hypothetical thoughtful gesture: a gift of a volume of “In Search of Lost Time” by Marcel Proust.

“Dear Claire, [acknowledgment of the conversation that triggered the thoughtful gesture] I really enjoyed talking with you about your trip to France! [explanation of the surprise in relation to the conversation] Your story about how the smell of vinegar reminds you of Christmas with your grandma made me think of Proust’s story about how the taste of a madeleine dipped in tea brought back childhood memories of his aunt. I hope that you enjoy reading this book and that it reminds you of France :)!”

A more in-depth type of framing could require the virtual assistant to explain all decision processes that led to generating the gesture, e.g.: “I decided to initiate thoughtful gesture generation when Claire told me about her trip to France. The gesture is relevant to Claire for the following reasons: […]. I first thought of sending her a French recipe cookbook, but then I found out that she dislikes cooking. I think that Claire will be pleased with a gift that reminds her of France because she seemed to enjoy the trip.”
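To make the in-depth variant concrete, here is a minimal sketch that renders such an explanation from a structured record of the assistant’s decisions. The DecisionTrace class and render_framing function are hypothetical names introduced for illustration. The template-filling shown is the easy part. The sketch assumes that the trace faithfully records what the assistant actually did, which is the real explainability problem.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DecisionTrace:
    """Hypothetical record of the decisions behind one gesture."""
    recipient: str
    trigger: str                                             # what prompted the gesture
    chosen: str                                              # the gesture finally selected
    reasons: List[str] = field(default_factory=list)         # why it suits the recipient
    rejected: Dict[str, str] = field(default_factory=dict)   # alternative -> why dropped


def render_framing(trace: DecisionTrace) -> str:
    """Turn a decision trace into a plain-language explanation."""
    lines = [
        f"I decided to initiate thoughtful gesture generation when "
        f"{trace.recipient} told me about {trace.trigger}.",
        f"I chose {trace.chosen}.",
    ]
    for reason in trace.reasons:
        lines.append(f"Reason: {reason}")
    for alternative, why in trace.rejected.items():
        lines.append(f"I first considered {alternative}, but {why}.")
    return "\n".join(lines)


trace = DecisionTrace(
    recipient="Claire",
    trigger="her trip to France",
    chosen="a volume of 'In Search of Lost Time'",
    reasons=["her story about smell and memory resembles Proust's madeleine episode"],
    rejected={"a French recipe cookbook": "I found out that she dislikes cooking"},
)

print(render_framing(trace))
```

Recording rejected alternatives alongside the chosen gesture is a deliberate choice in this sketch: an evaluator can then check that the rejection reasons match what was actually said in the conversation.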

By illuminating the deliberation behind thoughtful gestures, framing could also help reduce the risk of “gaming” this AI challenge. “Gaming” means attempting to succeed in a challenge by exploiting its theoretical or implementation weaknesses rather than by developing the skills that the challenge was meant to advance in the first place.

Of course, generating framing is also an AI explainability endeavor and a challenging problem in its own right.

With this additional framing requirement in mind, we could perhaps see our hypothetical assistant’s box-of-chocolates gift in a different light. Offering the same box of chocolates to all its dialogue partners (except those who have told it of an allergy or aversion to chocolate), with a personalized explanation for each person as to why that box of chocolates is just right for them, could be a fairly impressive feat of understanding and creativity.

Thoughtful virtual assistants, from trainees to maestros: Variants of the challenge

Suppose you’d like to create a virtual assistant that could participate in a thoughtful gesture generation challenge. In the full variant of the challenge, the assistant would have to engage in dialogue with a human and then possibly decide to do something nice for them based on information exchanged during the dialogue. So, in order to even stand a chance of doing well, the assistant would need to be an accomplished conversationalist as well as a thoughtful and creative generator of nice gestures (in fact, its thoughtfulness and creativity might manifest in its dialogue skills as well). But requiring this could create prohibitive challenge entry standards. Full conversational skills are still a difficult problem for virtual assistants, so we’d be compounding difficult problems and decreasing the likelihood of promising short-term results.

Instead, we propose increasingly complex challenge modules. In a simpler variant, instead of conversing with a person, a rookie thoughtful virtual assistant would simply “read” a short, first-person story about the person in question. Then, for further simplification, it could be asked to select an appropriate gesture for the story protagonist from a multiple-choice list, rather than to create a gesture description. Framing would still be required, especially since this multiple-choice variant is more vulnerable to gaming.
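For a sense of what an entry in the simplest variant might look like, the following sketch implements a deliberately naive baseline: it picks the multiple-choice option whose keywords overlap most with the story, abstains when nothing overlaps, and emits a one-line framing. The function names, the hand-written keyword sets, and the sample story are all assumptions made for illustration. A baseline this shallow is exactly the kind of entry that the framing requirement is meant to expose, since its explanation amounts to a list of matched words.

```python
import re
from typing import Dict, Optional, Set, Tuple


def story_words(story: str) -> Set[str]:
    """Lowercased alphabetic words appearing in the story."""
    return set(re.findall(r"[a-z]+", story.lower()))


def choose_gesture(story: str,
                   options: Dict[str, Set[str]]) -> Optional[Tuple[str, str]]:
    """Pick the option with the largest keyword overlap, or None if nothing matches."""
    words = story_words(story)
    best, best_hits = None, set()
    for description, keywords in options.items():
        hits = keywords & words
        if len(hits) > len(best_hits):
            best, best_hits = description, hits
    if best is None:
        return None  # the story was not conducive to a gesture
    framing = (f"I chose {best} because the story mentions: "
               f"{', '.join(sorted(best_hits))}.")
    return best, framing


story = "Last summer I spent two weeks in France. I read a novel on every train ride."
options = {
    "a French recipe cookbook": {"france", "cooking"},
    "a volume of In Search of Lost Time": {"france", "novel", "read"},
    "a ukulele songbook": {"ukulele", "music"},
}

result = choose_gesture(story, options)
print(result[1])
# I chose a volume of In Search of Lost Time because the story mentions: france, novel, read.
```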

Of course, in all variants of the proposed challenge, the allowed thoughtful behavior is more or less restricted for the purpose of a well-defined, containable, formalized evaluation framework. Real human thoughtfulness manifests much more broadly, and includes, for example, spontaneously expressing compassion when we feel that it is appropriate or offering a bus seat to a person who may need it more. All of these will ultimately have to be within the reach of enculturated virtual assistants, and the skills acquired and demonstrated for the purposes of our thoughtful gesture generation challenge can hopefully help virtual assistants acquire such deep and broad social awareness.

References:

  • Boden, M.A. 1990. The Creative Mind: Myths and Mechanisms. Weidenfeld and Nicolson, London
  • Charnley, J.W., Pease, A., and Colton, S. 2012. On the Notion of Framing in Computational Creativity. In Proceedings of ICCC 2012, 77–81
  • Colton, S., Charnley, J., and Pease, A. 2011. Computational Creativity Theory: The FACE and IDEA Descriptive Models. In Proceedings of ICCC 2011, 90–95
  • Coman, Mueller, and Mayer. 2018. Thoughtful Surprise Generation as a Computational Creativity Challenge. In Proceedings of the Ninth International Conference on Computational Creativity (ICCC)
  • Ferrucci, D., Levas, A., Bagchi, S., Gondek, D., Mueller, E.T. 2012. Watson: Beyond Jeopardy!, J. Artif. Intell.
  • Loughran, R., and O’Neill, M. 2017. Application Domains Considered in Computational Creativity. In Proc. of ICCC 2017, 197–204
  • O'Neill, B., and Riedl, M.O. 2011. Simulating the Everyday Creativity of Readers. In Proc. of ICCC 2011, 153–158
  • Riedl, M.O. 2016. Computational Narrative Intelligence: A Human-Centered Goal for Artificial Intelligence. In Proc. of CHI 2016 Workshop on Human-Centered Machine Learning
  • Ventura, D. 2016. Mere Generation: Essential Barometer or Dated Concept? In Proc. of ICCC 2016, 17–24.

Alexandra Coman, Sr. Mgr, Software Engineering, Capital One

I'm an AI Research Scientist at Capital One. My research areas include cognitive systems, narrative intelligence, affective computing, automated planning, case-based reasoning, and goal reasoning.
