When Microsoft launched Tay on Twitter in March 2016, the bot absorbed adversarial inputs and began producing offensive outputs within sixteen hours. Microsoft pulled it offline. The failure was not purely about content filtering — it was a design failure in turn-taking logic: Tay was built to mirror whatever conversational initiative users took, with no structural guard on who could steer the dialogue and in which direction.
Conversation flow refers to the sequence, rhythm, and initiative structure of an exchange. In human-to-human talk, conversation analysts Harvey Sacks, Emanuel Schegloff, and Gail Jefferson documented in their 1974 paper "A Simplest Systematics for the Organization of Turn-Taking for Conversation" that speakers use transition-relevance places (points where speaker change becomes possible) and adjacency pairs (question → answer, greeting → greeting) to organize who speaks when.
Chatbot designers must encode these same structures artificially. A bot that ignores adjacency pairs — say, responding to a question with another question — creates friction. A bot that never yields initiative leaves users feeling interrogated.
System-initiative: the bot asks structured questions in a fixed order. The error rate is low, but users feel constrained. Common in early IVR (Interactive Voice Response) systems.
User-initiative: the user types freely, and the bot must interpret any input. Maximum flexibility, highest ambiguity. Works when NLU coverage is broad.
Mixed-initiative: bot and user can both redirect the dialogue. This is the most natural mode, but it requires careful design so that the two parties do not stack questions on top of each other.
Google Duplex, demonstrated at Google I/O in May 2018, used mixed-initiative design to make restaurant reservations by phone. It yielded initiative when the human receptionist asked clarifying questions ("What time works?") and reclaimed it when confirmation was needed. The demo drew attention precisely because the turn-taking felt natural enough to be mistaken for human.
Research by Dialogflow's UX team and published case studies from Rasa identify five recurring breakdown types: premature closure (bot ends the conversation before the user's goal is met), topic drift (neither party maintains coherent focus), over-questioning (bot stacks multiple questions in one turn), dead ends (bot hits an unrecognized intent with no graceful recovery), and initiative conflict (both parties attempt to redirect simultaneously).
Of these, over-questioning is the easiest to fix in design: the rule is one question per turn. If you need three pieces of information, collect them across three turns.
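The one-question-per-turn rule can be sketched as a small slot-collection loop. This is a minimal illustration, not a production dialogue manager: the slot names, the prompts, and the `run_dialogue` helper are all hypothetical, and each user reply is assumed to fill exactly the slot that was just asked for.

```python
from typing import Optional

# Hypothetical slots for a table-booking bot; names and prompts are illustrative.
PROMPTS = {
    "date": "What date would you like?",
    "time": "What time works for you?",
    "party_size": "How many people will be joining?",
}


def next_missing_slot(filled: dict) -> Optional[str]:
    """Return the first slot that has not been collected yet, or None when done."""
    for slot in PROMPTS:
        if slot not in filled:
            return slot
    return None


def run_dialogue(user_replies: list) -> tuple:
    """Simulate a dialogue in which every bot turn asks exactly one question."""
    filled: dict = {}
    bot_turns: list = []
    for reply in user_replies:
        slot = next_missing_slot(filled)
        if slot is None:
            break  # everything collected; no further questions
        bot_turns.append(PROMPTS[slot])  # one question in this turn
        filled[slot] = reply             # assumption: the reply answers that question
    return bot_turns, filled


turns, slots = run_dialogue(["Friday", "7pm", "4"])
for turn in turns:
    print("BOT:", turn)
print(slots)
```

Three pieces of information take three bot turns here. A real bot would also validate each reply and accept answers that fill several slots at once.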
One question per turn. One task per question. Let the user complete before the system redirects. Together, these rules eliminate the majority of flow-breakdown complaints in production chatbot deployments.
You are working with a conversation flow analyst. Present a broken chatbot dialogue — or describe a scenario — and ask for a diagnosis. Then explore how to restructure the flow using the principles from Lesson 1. Complete at least 3 exchanges to finish the lab.
When Bank of America launched its virtual assistant Erica in 2018, the design team at Personetics spent months calibrating tone. Early prototypes used formal banking language: "Your inquiry has been received and will be processed." Focus groups found this distant and anxiety-inducing. The shipped version used warmer phrasing: "Got it. Let me pull up your balance." By 2023, Erica had surpassed one billion client interactions. The tone shift was not cosmetic; it measurably reduced call center escalations.
A chatbot persona is the consistent set of personality traits, communication style, and voice characteristics that shape every response. It is not a name and an avatar — those are surface elements. Persona operates at the sentence level: word choice, sentence length, formality register, use of hedges, use of humor, and error recovery style.
Cathy Pearl's 2016 book Designing Voice User Interfaces (O'Reilly) established that users form persistent mental models of a bot's "character" within three to five exchanges. Once formed, those models are hard to update. Inconsistency — where a bot is warm in one turn and clinical in the next — triggers what researchers call persona fragmentation, which correlates with task abandonment.
Formal end of the spectrum: no warmth signals. Direct, efficient, transactional. Best for expert users completing repetitive high-stakes tasks.
Neutral-professional middle: warm but businesslike. Acknowledges emotion without dwelling on it. Most common in enterprise customer service.
Casual end of the spectrum: high warmth, with humor and small talk. Carries a risk of inappropriate levity in serious contexts.
Woebot, a mental health chatbot developed at Stanford by Alison Darcy and launched commercially in 2017, carefully positioned itself in the neutral-professional range: warm enough to reduce disclosure barriers but not so casual that users over-relied on it for crisis support. A randomized trial published in JMIR Mental Health (2017) found that college students who used Woebot for two weeks reported reduced depression symptoms, with tone calibration cited as a key trust factor. Crucially, Woebot always disclosed its non-human nature, a transparency choice that increased engagement rather than reducing it.
In February 2024, a Canadian civil tribunal ruled against Air Canada after its chatbot gave a customer incorrect bereavement fare policy information. The bot's confident, persona-appropriate tone — without appropriate hedges — made the error more damaging. Air Canada argued the chatbot was "a separate legal entity" responsible for its own information; the tribunal rejected this. The case became a landmark in chatbot liability and the danger of confident tone in high-stakes domains.
Work with a persona design consultant to craft or evaluate chatbot voices. You can present a persona brief for critique, ask for help calibrating tone for a specific domain, or request before/after rewrites of chatbot responses. Complete at least 3 exchanges to finish.
A 2019 Salesforce study of 15,000 consumers found that 67% of customers reported that their opinion of a company improved when they received a thoughtful response to a problem — even if the problem wasn't fully solved. The finding applied to chatbots: recovery quality mattered more than zero-error rates. A bot that fails gracefully retains users; one that fails with a dead generic error loses them permanently.
Chatbot errors fall into three classes. Recognition errors occur when the NLU fails to match input to any known intent. Fulfillment errors occur when the intent is recognized but backend execution fails (API timeout, missing data). Understanding errors occur when the bot maps to the wrong intent — often worse than no match because the bot confidently does the wrong thing.
Error handling design must address all three differently. Recognition errors warrant explicit acknowledgment and re-prompting. Fulfillment errors require status transparency. Understanding errors require a check — "Just to confirm, you're asking about X?" — before proceeding.
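The three error classes and their distinct responses might be routed as in the sketch below. It makes several assumptions: the NLU is taken to return an intent label plus a confidence score, a low score stands in as a proxy for a possible understanding error (a true wrong-intent match cannot be detected directly), and the 0.7 threshold, the function names, and the message wording are all illustrative.

```python
from enum import Enum, auto
from typing import Optional


class ErrorKind(Enum):
    RECOGNITION = auto()    # no intent matched at all
    FULFILLMENT = auto()    # intent matched, but backend execution failed
    UNDERSTANDING = auto()  # intent may be wrong (low-confidence match as a proxy)


def classify(intent: Optional[str], confidence: float, backend_ok: bool,
             threshold: float = 0.7) -> Optional[ErrorKind]:
    """Map one turn's outcome to an error class, or None if nothing went wrong."""
    if intent is None:
        return ErrorKind.RECOGNITION
    if confidence < threshold:
        return ErrorKind.UNDERSTANDING
    if not backend_ok:
        return ErrorKind.FULFILLMENT
    return None


def recovery_message(kind: ErrorKind, intent_label: str = "") -> str:
    """Pick a recovery style that matches the error class."""
    if kind is ErrorKind.RECOGNITION:
        # Explicit acknowledgment plus a re-prompt.
        return "Sorry, I didn't catch that. Could you tell me what you need in a few words?"
    if kind is ErrorKind.FULFILLMENT:
        # Status transparency: the request was understood, the system failed.
        return ("I understood your request, but our system isn't responding right now. "
                "Please try again in a few minutes.")
    # Understanding: confirm before acting on a possibly wrong intent.
    return f"Just to confirm, you're asking about {intent_label}?"


kind = classify("refund", confidence=0.55, backend_ok=True)
print(recovery_message(kind, "a refund"))
```

The ordering in `classify` is a design choice: the confirmation check runs before any backend call, so the bot does not act on an intent it is unsure about.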
Industry practice, codified in Dialogflow's design guides and Nuance's developer documentation, recommends a maximum of three consecutive non-understanding events before automatic escalation to a human agent or alternative channel. Each failed attempt should use a different phrasing of the reprompt — repeating the same words verbatim is the most common and most frustrating error-handling mistake.
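A three-strikes policy with varied reprompts can be sketched as a small counter. The class name, the message wording, and the reading that escalation happens on the third consecutive miss are assumptions made for illustration; none of this is taken from Dialogflow's or Nuance's documentation.

```python
MAX_STRIKES = 3

# A differently worded reprompt for each failed attempt before escalation.
REPROMPTS = [
    "Sorry, I didn't get that. What can I help you with?",
    "I'm still not following. Could you put it another way?",
]
ESCALATION = "I'm having trouble understanding. Let me connect you to a person."


class NoMatchTracker:
    """Counts consecutive non-understanding events within one conversation."""

    def __init__(self) -> None:
        self.strikes = 0

    def on_match(self) -> None:
        # Any understood turn resets the counter: only consecutive misses count.
        self.strikes = 0

    def on_no_match(self) -> tuple:
        """Return (message, escalate) for the current miss."""
        self.strikes += 1
        if self.strikes >= MAX_STRIKES:
            return ESCALATION, True
        return REPROMPTS[self.strikes - 1], False


tracker = NoMatchTracker()
for _ in range(3):
    message, escalate = tracker.on_no_match()
    print(message, "| escalate:", escalate)
```

Indexing the reprompt list by strike count guarantees that no wording is repeated verbatim before the handoff.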
KLM's Facebook Messenger bot BlueBot, launched in 2017, handled over 15,000 conversations per week. The team publicly documented that its highest satisfaction scores came from escalation conversations — not successful self-service ones. Users rated the bot highly when it clearly identified its limits ("I'm not able to help with visa questions — let me connect you to a KLM agent") rather than attempting to answer outside its competency. Acknowledging limits, it turned out, was itself a competency.
Error messages are trust moments, not failure notices. Users who see a well-crafted recovery sequence — ownership, empathy, concrete next step — rate interactions higher than users who had zero errors but received robotic responses throughout.
Work with an error handling specialist to write and evaluate recovery sequences. Present a chatbot error message you've written, describe an error scenario, or ask for a complete three-strikes sequence for a specific domain. Complete at least 3 exchanges.
When Amazon launched Alexa in 2014, the device had no cross-session memory. Each conversation started fresh. By 2016, Amazon introduced persistent attributes in the Alexa Skills Kit, letting third-party skills store user preferences across sessions. By 2023, the Alexa Memory feature allowed users to explicitly instruct Alexa to remember facts. The nine-year arc from stateless to contextual illustrated the entire industry's trajectory, along with the engineering and privacy tradeoffs at every step.
Slot memory: retains entities collected within a single intent, such as name, date, or order number. Lost when the session ends or a new intent fires without proper slot passing.
Session context: persists across turns within one conversation session. Enables pronoun resolution ("I want to change it," where "it" is the order discussed three turns ago).
Cross-session memory: stored between sessions. Enables personalization but introduces data governance, GDPR/CCPA obligations, and user expectation management challenges.
In-session context enables anaphora resolution — the ability to resolve pronouns and ellipsis from previous turns. "Book me a flight to Berlin" → "Make it business class" — where "it" can only be resolved by referencing the prior turn's flight context. Without session context, the bot either fails or asks the user to repeat information they already gave.
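The Berlin example can be sketched with a session object that keeps the most recent intent and its slots in focus. This is a deliberately simple stand-in for anaphora resolution: it assumes "it" always refers to the most recently discussed intent, and the class, method, and slot names are hypothetical.

```python
class SessionContext:
    """Holds the intent currently in focus and the slots collected for it."""

    def __init__(self) -> None:
        self.active_intent = None
        self.slots: dict = {}

    def start(self, intent: str, **slots) -> None:
        """A new request puts a new intent in focus."""
        self.active_intent = intent
        self.slots = dict(slots)

    def refine(self, **changes) -> tuple:
        """Apply a follow-up such as 'make it business class' to the focused intent."""
        if self.active_intent is None:
            # Without session context, the pronoun has nothing to refer to.
            raise LookupError("nothing in context for 'it' to refer to")
        self.slots.update(changes)
        return self.active_intent, dict(self.slots)


ctx = SessionContext()
ctx.start("book_flight", destination="Berlin")  # "Book me a flight to Berlin"
print(ctx.refine(cabin="business"))             # "Make it business class"
```

A bot without this object would hit the `LookupError` branch on the second turn, which corresponds to failing or asking the user to repeat themselves.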
Research from the 2020 ConvAI challenge (a competition to build open-domain conversational agents) found that bots with context windows of five to seven previous turns outperformed stateless bots on user satisfaction scores by a margin of 30–40%, even when underlying language model quality was held constant. Context management, not language quality, was the dominant variable.
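A bounded context window of this kind is often implemented as a fixed-length queue of recent turns. The sketch below assumes a window of six turns (inside the five-to-seven range mentioned above) and a hypothetical `TurnWindow` class; how the window is fed to a model is left out.

```python
from collections import deque


class TurnWindow:
    """Keeps only the most recent turns; older turns fall out automatically."""

    def __init__(self, max_turns: int = 6) -> None:
        self.turns = deque(maxlen=max_turns)

    def add(self, speaker: str, text: str) -> None:
        self.turns.append((speaker, text))

    def as_text(self) -> str:
        """Render the retained turns, oldest first, for downstream processing."""
        return "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)


window = TurnWindow(max_turns=6)
for i in range(8):
    window.add("user" if i % 2 == 0 else "bot", f"turn {i}")
print(window.as_text())
```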
OpenAI introduced user memory in ChatGPT in early 2024, allowing the model to persist facts across conversations — "You prefer concise answers," "You're a Python developer." The feature immediately raised privacy concerns, with researchers noting that memory could be manipulated through prompt injection to store false or harmful facts. OpenAI responded by making memory visible and user-deletable. The case illustrated that cross-session memory is not purely a UX problem — it is simultaneously a trust, security, and data governance problem.
Memory is power, and power requires accountability. Every piece of context a bot retains must be justified by user benefit, disclosed transparently, and deletable on request. The difference between helpful personalization and surveillance is consent and control.
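The three requirements (justified, disclosed, deletable) can be made concrete in the shape of the memory store itself. This is a minimal in-memory sketch under stated assumptions: the `UserMemory` class and its method names are hypothetical, consent is passed in as a simple boolean, and nothing here addresses storage, encryption, or legal compliance.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    key: str
    value: str
    purpose: str  # the user benefit that justifies keeping this fact


class UserMemory:
    """Cross-session facts about one user, kept only with consent."""

    def __init__(self) -> None:
        self._items: dict = {}

    def remember(self, key: str, value: str, purpose: str, consent: bool) -> bool:
        """Store a fact only if the user consented and a purpose is stated."""
        if not consent:
            return False
        if not purpose.strip():
            raise ValueError("every stored fact needs a stated purpose")
        self._items[key] = MemoryItem(key, value, purpose)
        return True

    def disclose(self) -> list:
        """Everything the bot remembers, in a form that can be shown to the user."""
        return [f"{item.key} = {item.value} (kept for: {item.purpose})"
                for item in self._items.values()]

    def forget(self, key: str) -> bool:
        """Delete on request; returns True if something was actually removed."""
        return self._items.pop(key, None) is not None


memory = UserMemory()
memory.remember("answer_style", "concise", purpose="shorter replies", consent=True)
print(memory.disclose())
print(memory.forget("answer_style"))
```

Making `purpose` and `consent` required arguments means a caller cannot store a fact without addressing both.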
Work with a context architecture advisor to design or evaluate memory strategies for a chatbot. Describe a multi-turn scenario, ask how to implement slot carry-forward, or explore the privacy implications of cross-session memory in your domain. Complete at least 3 exchanges.