A Chatbot in the Crowd
How to build a time-saving bot for a multi-topic community forum
By Cosette Goldstein, Senior Software Engineer, and Alison Chi
If you’ve ordered pizza or set up an appointment online recently, you've likely engaged with a chatbot to help achieve your goal. Often, this experience takes the form of a direct message where a bot leads you through tasks under one specific topic. This automation of tasks is a clear time saver for both the user interacting with the bot and the humans who would otherwise be performing these tasks manually.
This article shares suggestions and lessons learned from taking this time-saving experience to a community forum (i.e., an online support forum with many question-askers and many question-answerers who play interchangeable roles by helping each other), where the bot helps with tasks across multiple, expanding topics.
Why chatbots? Why community forums?
We typically interact with chatbots in a direct message setting where the bot helps us accomplish a limited set of highly specific tasks about one particular subject matter (or topic). For example, a COVID-19 screening bot helps identify COVID symptoms, but it does not know how to answer questions that fall outside the topic of COVID-19, such as “where to buy a bicycle.”
Single-topic direct-message chatbots bring incredible value through time saved for users and companies, but this value can also be brought to community forums to address queries that span a wide array of constantly expanding topics.
In 2018, our team at Capital One built a chatbot to answer associates’ questions about internal tools and processes. The bot originally existed solely as a direct message experience and covered just two topics. To assist more users, we expanded the bot’s reach to our most-used support forum: Slack. Capital One’s internal Slack has hundreds of channels dedicated to topics ranging from deploying software to corporate travel, and users in each channel post about its designated topic. Many of the same questions get asked repeatedly within these channels, and answering them takes valuable time. Today, our bot responds to FAQs in over 150 internal Capital One Slack channels, giving askers immediate answers and freeing the community members who would otherwise respond to focus on other aspects of their jobs.
In this article, we will discuss the strategies we’ve found for implementing bots in community settings where both the number of topics and the size of individual topics are constantly expanding.
Integrating a chatbot into a community forum
In a growing number of industries, one-on-one human-bot interactions have replaced human-human interactions, cutting down on wait times on the customer side and resources used on the business side. These same benefits are seen in many-to-one human-bot interactions. Below, we’ll discuss two main challenges we’ve faced when building our community Slackbot.
Challenge 1: To respond or not to respond: determining relevance
In a one-on-one setting, anything the user says is directed at the chatbot and therefore must be responded to. But in a community setting, most users intend to interact with other users, asking and answering questions while the bot listens in. To prevent unwanted responses, it’s important for the bot to respond only when a posted message is relevant to the topics covered by its training data.
For chatbots, training data consists of example user queries (known as utterances), each labeled with the user’s goal (known as the intent), along with the response associated with each intent. The topics that a bot is able to cover are determined by the training data, and anything outside of that scope is irrelevant for the bot. Below are a few suggestions for how to determine relevance.
- Develop a binary classifier: If topic scope stays constant, a simple solution is to build a model that distinguishes between two classes of messages: relevant (within topic scope) and irrelevant (outside topic scope). But if scope changes frequently and you lack the resources to constantly update this model’s training data to cover new topics and intents, you can instead build a more general question vs. statement classifier, as our team did. Our binary convolutional neural network (CNN) text classifier can successfully filter out “statements” such as replies and announcements. Once this model classifies a message as a question, we send the message to our Q&A models, which output a probability score indicating how likely the message is to belong to a given intent in the bot’s topic scope. We then use this score to decide whether to respond. Note that while we used a machine learning model, other methods, such as a rules-based model that looks for certain keywords or punctuation, may also work for determining relevance.
- Incorporate thread structure into decisioning: In communities like Slack, Reddit, or Stack Overflow there is a built-in structure where questions typically appear as the parent message of a thread. Exclusively considering parent posts can be a simple and effective way to reduce irrelevant responses.
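The two relevance strategies above can be combined into a single gate. The sketch below is a minimal illustration under stated assumptions, not the system described in this article: it swaps the CNN for a rules-based question detector (keywords and punctuation, the alternative mentioned above), treats the Q&A model’s per-intent probabilities as a plain dictionary, and uses hypothetical names (`looks_like_question`, `should_respond`, `QUESTION_WORDS`) with an illustrative threshold of 0.8. The `is_thread_parent` flag is assumed to come from the forum’s own thread metadata.

```python
import re
from typing import Dict, Optional

# Hypothetical keyword list for a rules-based question detector. A trained
# question vs. statement classifier could replace looks_like_question().
QUESTION_WORDS = {
    "how", "what", "why", "where", "when", "who", "which",
    "can", "could", "does", "do", "is", "are", "should", "anyone",
}


def looks_like_question(text: str) -> bool:
    """Rules-based question check: trailing '?' or a leading question word."""
    stripped = text.strip()
    if stripped.endswith("?"):
        return True
    words = re.findall(r"[a-z']+", stripped.lower())
    return bool(words) and words[0] in QUESTION_WORDS


def should_respond(
    text: str,
    is_thread_parent: bool,
    intent_scores: Dict[str, float],
    threshold: float = 0.8,  # illustrative value, not a recommendation
) -> Optional[str]:
    """Return the intent to answer with, or None if the bot should stay silent.

    intent_scores maps intent names to probabilities produced by a
    (hypothetical) Q&A model for this message.
    """
    # Thread structure: only consider top-level (parent) posts.
    if not is_thread_parent:
        return None
    # Filter out statements such as replies and announcements.
    if not looks_like_question(text):
        return None
    if not intent_scores:
        return None
    # Respond only if the most likely intent is confident enough.
    best_intent, best_score = max(intent_scores.items(), key=lambda kv: kv[1])
    return best_intent if best_score >= threshold else None


scores = {"reset_account": 0.91, "deploy_app": 0.04}
print(should_respond("How do I reset my account?", True, scores))    # reset_account
print(should_respond("How do I reset my account?", False, scores))   # None (thread reply)
print(should_respond("Deploy freeze starts Friday.", True, scores))  # None (statement)
```

Checking the cheap signals (thread position, question shape) before the confidence threshold means the Q&A model’s score only decides among messages that already look like askable questions.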
Challenge 2: Extracting the most important parts of messages given group communication styles
To streamline both in-person and online communication, people tend to change their speech patterns and word usage. This phenomenon is known as linguistic alignment. While it is typically seen in exchanges between two humans, it is also seen between bots and humans. We’ve observed that when users directly address our chatbot, they structure their questions like search queries. If they want to know how to reset their account, they might directly say “how to reset account.” But when posting in a large Slack support channel, they might say something more like “Hi team, I have been trying to log in for almost an hour and it seems like I’m locked out of my account. Does anyone have tips on how to reset it?”
In this case, their efforts to be polite to a large group of busy people make the question noisier and harder for a model to understand. Below are tips for creating a system that can respond to both direct and noisy inquiries.
- Combine direct message and forum interactions into one training set: It is well documented that utilizing call transcripts and chat logs is a good starting place for training data. However, combining this data with questions from existing support forums will create a training set that covers how people address both groups and individuals, allowing the model to learn a user’s intent no matter their communication style.
- Break up messages: Messages posted in community forums are often extremely long; they may tell a whole story before getting to the main point, or they may contain multiple questions. It can therefore be useful to apply preprocessing techniques that isolate individual questions, whether by tokenizing the message into sentences and considering each one individually or by using more advanced models to extract the question(s).
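The sentence-level approach from the last tip can be sketched in a few lines. This is a minimal illustration, assuming a naive regex sentence splitter rather than a proper tokenizer and treating a sentence as a question only if it ends in a question mark; `split_sentences` and `extract_questions` are hypothetical names.

```python
import re
from typing import List

# Split after '.', '!' or '?' followed by whitespace. This naive rule is an
# assumption for illustration; it will mis-split abbreviations like "e.g.".
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(message: str) -> List[str]:
    """Naive sentence tokenizer: split a message into trimmed, non-empty sentences."""
    return [s.strip() for s in SENTENCE_BOUNDARY.split(message.strip()) if s.strip()]


def extract_questions(message: str) -> List[str]:
    """Keep only sentences that end in a question mark.

    Falls back to the whole message when no sentence ends in '?', so that
    search-style queries such as "how to reset account" still get through.
    """
    questions = [s for s in split_sentences(message) if s.endswith("?")]
    return questions or [message.strip()]


noisy = (
    "Hi team, I have been trying to log in for almost an hour and it seems "
    "like I'm locked out of my account. Does anyone have tips on how to reset it?"
)
print(extract_questions(noisy))  # ['Does anyone have tips on how to reset it?']
print(extract_questions("how to reset account"))  # ['how to reset account']
```

Each extracted question can then be scored separately by the intent model, so the greeting and backstory no longer dilute the signal.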
Covering a broad scope of company-specific topics
No matter the number of topics a model-based chatbot will cover, training data is essential. While the exact amount of necessary data depends on factors like model complexity, it is important to start with as much of it as possible while minimizing noisy cases. It is also important to keep maintaining the data in order to include new features, correct stale information, and address any gaps in existing bot performance. Below are two challenges associated with scaling to more topics.
Challenge 1: Developing training data across multiple topics
For most industry chatbots, dedicated teams constantly gather, refine, and label training data as they evaluate how well the current system is performing. This process is incredibly time consuming, even when labeling data on just one topic. Here are a couple of approaches to collecting and maintaining training data that we have found useful.
- Let subject matter experts contribute: Many forums have an existing community of subject matter experts (SMEs) who answer questions. These SMEs are a great resource for crowdsourcing training data to handle topics much broader in scope than is achievable by a single, dedicated team at a company.
- Establish clear data guidelines and expectations: Whether you have a dedicated team or a community of SMEs contributing to training data, it’s important for them to understand the relationship between data quality and bot performance. For instance, without guidance, people may label just one example question per intent. But many more examples are required for a model to recognize patterns and variations in phrasing. So presenting new contributors with a document containing guidelines like these is a great way to ensure success. To take this guidance a step further, our team built a user interface that has built-in checks to validate any submitted training data. This interface greatly increased SME participation, resulting in more training data and topics and ultimately allowing our chatbot to provide more value.
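The checks built into the interface described above are not specified here, so the following is only a hypothetical sketch of the kind of validation such an interface might run on a submission: a non-empty response, a minimum number of distinct example questions per intent (5 below, an arbitrary illustrative value), and no example question that already belongs to a different intent. `IntentSubmission` and `validate` are invented names.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative threshold only; the right minimum depends on the model.
MIN_UTTERANCES_PER_INTENT = 5


@dataclass
class IntentSubmission:
    """A hypothetical SME submission: one intent, its example questions, and its answer."""
    name: str
    utterances: List[str]
    response: str


def validate(submission: IntentSubmission, existing: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means the submission passes.

    existing maps already-accepted (normalized) utterances to their intent name.
    """
    problems = []
    if not submission.response.strip():
        problems.append("Response must not be empty.")
    # Normalize case and whitespace so trivial variants count as duplicates.
    normalized = [" ".join(u.lower().split()) for u in submission.utterances]
    unique = {n for n in normalized if n}
    if len(unique) < MIN_UTTERANCES_PER_INTENT:
        problems.append(
            f"Provide at least {MIN_UTTERANCES_PER_INTENT} distinct example "
            f"questions (got {len(unique)})."
        )
    # Conflicting labels confuse the model, so flag cross-intent duplicates.
    for utterance in sorted(unique):
        owner = existing.get(utterance)
        if owner is not None and owner != submission.name:
            problems.append(f"'{utterance}' is already an example for intent '{owner}'.")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a contributor see and fix every problem with a submission at once.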
Challenge 2: Achieving state-of-the-art performance with less data
To keep your bot’s models up to date with a widening topic scope, you need a constant influx of newly labeled data. But the proprietary nature of that scope may rule out crowdsourced labeling services such as Amazon Mechanical Turk. Here are two ways to overcome this.
- Fine-tune an existing model: Many publicly available pretrained models demonstrate an impressive understanding of language, in large part because they have been trained on billions of utterances. Sentence-BERT, for example, is a modified version of the language model BERT that produces sentence embeddings optimized for fast computation on tasks like finding semantic textual similarity. After fine-tuning it on only a few thousand new utterances, we found that it could effectively distinguish between all sorts of proprietary questions.
- Employ data augmentation: Data augmentation encompasses a set of techniques that artificially generate data by programmatically modifying existing data. This automated approach is very useful when a model requires more training data than manual labeling can provide. By applying simple transformations like synonym replacement, random insertion, random swap, and random deletion, our team has increased the size of our training data by more than an order of magnitude.
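The four transformations named in the last tip are the operations of the Easy Data Augmentation (EDA) technique. Below is a minimal standard-library sketch of them, not the team’s actual pipeline. The tiny `SYNONYMS` table is a hypothetical stand-in for a real synonym source such as WordNet, the rates and counts are illustrative, and each augmented utterance is assumed to keep the intent label of the utterance it was generated from.

```python
import random
from typing import Dict, List

# Hypothetical synonym table; a real pipeline would use a fuller source.
SYNONYMS: Dict[str, List[str]] = {
    "reset": ["restore", "recover"],
    "account": ["profile", "login"],
    "tips": ["advice", "suggestions"],
}


def synonym_replacement(words: List[str], n: int, rng: random.Random) -> List[str]:
    """Replace up to n words that have synonyms with a randomly chosen synonym."""
    out = words[:]
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words: List[str], n: int, rng: random.Random) -> List[str]:
    """n times, insert a synonym of a random word at a random position."""
    out = words[:]
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words: List[str], n: int, rng: random.Random) -> List[str]:
    """n times, swap the positions of two randomly chosen words."""
    out = words[:]
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words: List[str], p: float, rng: random.Random) -> List[str]:
    """Delete each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return words[:]
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]


def augment(utterance: str, num_variants: int = 4, seed: int = 0) -> List[str]:
    """Generate augmented copies of an utterance, cycling through the four operations."""
    rng = random.Random(seed)
    words = utterance.lower().split()
    ops = [
        lambda w: synonym_replacement(w, 1, rng),
        lambda w: random_insertion(w, 1, rng),
        lambda w: random_swap(w, 1, rng),
        lambda w: random_deletion(w, 0.1, rng),
    ]
    return [" ".join(ops[k % len(ops)](words)) for k in range(num_variants)]


for variant in augment("how do I reset my account", num_variants=4, seed=1):
    print(variant)
```

Some variants will be ungrammatical; the idea is that small perturbations still carry the original intent, so the model sees more surface variation per labeled example.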
What’s next for community forum chatbots?
Chatbots are already an incredible tool in direct message experiences where they provide support on just one topic. However, we see many opportunities to bring this tool to community forums, where the number and size of topics are constantly expanding. Although this endeavor came with the challenges of relevance detection, question extraction, scalable data labeling, and getting the most out of limited data, it resulted in huge benefits for our users in the form of time saved. We believe that countless multi-topic support forums (e.g., Piazza and Stack Overflow) could also experience these benefits, and we hope that the lessons we learned can be applied to these use cases as well.
DISCLOSURE STATEMENT: © 2022 Capital One. Opinions are those of the individual author. Unless noted otherwise in this post, Capital One is not affiliated with, nor endorsed by, any of the companies mentioned. All trademarks and other intellectual property used or displayed are property of their respective owners.