Natural Language Understanding
Semantic analysis
The ability of an agent to derive meaning from text (e.g., paragraphs, sentences, phrases). One example includes an agent identifying various parts of speech (e.g., verbs, nouns) in sentences.

Machine learning
The ability of an agent to learn, often by training to find patterns in large amounts of data. One example includes an agent learning to determine the sentiment (e.g., happy, sad) of a sentence by training on many sentences humans have labeled with particular sentiments.

Similarity scores
Values, often between 0 and 1, representing the similarity between two words or phrases. One example includes an agent determining the word "programming" is similar to "coding" (e.g., by computing the cosine similarity of the two word embeddings [5]), and therefore giving the pair a value (similarity score) close to 1.
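
As a minimal sketch of how such a score might be computed, the snippet below calculates cosine similarity with NumPy. The three-dimensional embedding vectors are made up for illustration; real word embeddings typically have hundreds of dimensions.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity score for two embedding vectors (1 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings, invented for this example.
programming = np.array([0.9, 0.1, 0.3])
coding = np.array([0.8, 0.2, 0.3])
banana = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(programming, coding))  # high, close to 1
print(cosine_similarity(programming, banana))  # much lower
```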

Large language models
Machine learning models that have been trained on enormous amounts of text (e.g., hundreds of gigabytes of text). These models can be used as starting points for conversational agents to understand language, and can be trained further (e.g., using transfer learning) by developers to enable agents to understand specific pre-determined phrases or "intents".

Transfer learning
A machine learning technique allowing developers to do small amounts of additional training on already pre-trained machine learning models (e.g., large language models) to produce models sufficient for their task. For example, agent developers may apply transfer learning to large language models to enable agents to understand specific pre-determined phrases or "intents" without having to train their own models from scratch.
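
A common transfer-learning pattern is to freeze a pre-trained model's weights and train only a small new output layer. The PyTorch sketch below illustrates the idea; the encoder is a stand-in for a real pre-trained language model, and the layer sizes and intent count are invented.

```python
import torch.nn as nn

# Stand-in for a pre-trained language model encoder (hypothetical sizes).
encoder = nn.Sequential(nn.Linear(768, 768), nn.ReLU())

# Freeze the pre-trained weights so additional training leaves them unchanged.
for param in encoder.parameters():
    param.requires_grad = False

# Small new head mapping encoder output to intent classes; only this trains,
# which is far cheaper than training a full language model from scratch.
num_intents = 5
model = nn.Sequential(encoder, nn.Linear(768, num_intents))
```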

Intents
Groups of phrases with similar meanings. For example, the phrases, "What games can I play?", "What type of games do you have?" and "Which games are available?" might be categorized as an Ask About Games intent. Agents can recognize such groups of phrases as particular intents, allowing them to respond the same way to multiple different phrases.
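
A minimal sketch of the idea follows, assuming a hypothetical agent with two intents; production agents use machine learning models that generalize beyond the training phrases, rather than exact matching.

```python
from typing import Optional

# Hypothetical intents, each defined by a group of example phrases.
INTENTS = {
    "AskAboutGames": [
        "what games can i play",
        "what type of games do you have",
        "which games are available",
    ],
    "Greeting": ["hello", "hi there", "good morning"],
}

def classify(utterance: str) -> Optional[str]:
    """Naive exact-match classifier; real agents generalize with ML models."""
    normalized = utterance.lower().strip(" ?!.")
    for intent, phrases in INTENTS.items():
        if normalized in phrases:
            return intent
    return None

print(classify("Which games are available?"))  # -> AskAboutGames
```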

Entities
Groups of words with similar characteristics that agents may identify in users' speech, and use in their responses. For example, an agent might recognize a "date" entity in the phrase, "Set a reminder for March 24th", or a name entity in the phrase, "Send a text to Morgan". Often, agents recognize entities using machine learning models.
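
As a toy illustration of entity recognition, the pattern below picks out a "date" entity; this hand-written regular expression is only a stand-in for the machine learning models agents typically use.

```python
import re

# Matches dates like "March 24th"; a stand-in for a learned entity model.
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}(?:st|nd|rd|th)?\b"
)

match = DATE_PATTERN.search("Set a reminder for March 24th")
if match:
    print("date entity:", match.group(0))  # -> date entity: March 24th
```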

Training
The process agents may use to learn. Often, agents train by analyzing large amounts of data (e.g., many sentences), predicting something about the data (e.g., predicting sentences have sad or happy sentiments), and updating their models based on whether their predictions were correct or incorrect.
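
This predict-and-update cycle can be sketched with a tiny perceptron-style sentiment model; the labeled sentences and learning rate below are invented for illustration.

```python
# Human-labeled sentences: +1 for happy sentiment, -1 for sad.
labeled_data = [
    ("what a wonderful day", +1),
    ("i am so sad today", -1),
    ("this is wonderful news", +1),
    ("such a sad story", -1),
]

weights = {}  # one weight per word, all starting at 0

for epoch in range(10):
    for sentence, label in labeled_data:
        words = sentence.split()
        score = sum(weights.get(w, 0.0) for w in words)
        prediction = 1 if score > 0 else -1
        if prediction != label:
            # Incorrect prediction: nudge each word's weight toward the label.
            for w in words:
                weights[w] = weights.get(w, 0.0) + 0.1 * label

print(weights["wonderful"], weights["sad"])  # positive vs. negative weight
```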

Testing
The process developers use to assess their agents and whether the agents sufficiently address their needs. For example, developers may test whether agents' machine learning models categorize phrases into the correct intents.

Constrained vs. unconstrained natural language
A spectrum from "constrained" to "unconstrained" describing how restricted agents are in their understanding of language. For example, constrained natural language agents only attempt to understand a limited number of phrases, whereas unconstrained natural language agents will attempt to understand any phrase. Generally, constrained natural language agents have high accuracy in their understanding, whereas unconstrained natural language agents have lower accuracy.
Conversation Representation
Textual representations
The mode of representing agent conversation visually using characters. Textual representations are common when developing agents. They can also be used to quickly identify any misrecognition of users' speech.

Undirected graphs
A type of visualization that includes nodes and connections. Undirected graphs may be used to represent language; for example, with nodes containing words and connections representing word relatedness. This can allow learners to quickly understand agents' perceptions of language.
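
For instance, the sketch below builds such a word-relatedness graph with the networkx library; the words and relatedness weights are invented for illustration.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Undirected graph: nodes are words, weighted edges are relatedness scores.
G = nx.Graph()
G.add_edge("programming", "coding", weight=0.92)
G.add_edge("programming", "software", weight=0.81)
G.add_edge("coding", "software", weight=0.85)
G.add_edge("banana", "fruit", weight=0.90)

nx.draw(G, with_labels=True)
plt.show()
```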

Histograms
A type of visualization that compares different values using bars. For example, a histogram may represent word similarity scores with different heights of bars. This can allow learners to quickly compare which words agents find most similar to each other.
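
A sketch of such a chart using matplotlib, with made-up similarity scores:

```python
import matplotlib.pyplot as plt

# Hypothetical similarity scores between "programming" and other words.
words = ["coding", "software", "keyboard", "banana"]
scores = [0.92, 0.81, 0.45, 0.12]

plt.bar(words, scores)
plt.ylabel("Similarity to 'programming'")
plt.ylim(0, 1)
plt.show()
```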

Event-driven program representations
A method of visualizing the logic behind events and agents' responses to these events, often through event definition code and event response code. For example, one piece of code might define when an event occurs (e.g., when an agent recognizes a greeting intent), and another piece of code might define how the agent responds (e.g., by exclaiming, "Hello there!"). This type of programming representation can quickly become complex, and may benefit from being transformed into a directed graph representation.
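
The separation between event definition and event response might look like the following sketch; the decorator-based registration scheme here is hypothetical, not any particular agent framework's API.

```python
handlers = {}

def on_intent(name):
    """Event definition: associate an intent name with a response handler."""
    def register(func):
        handlers[name] = func
        return func
    return register

@on_intent("Greeting")
def respond_to_greeting():
    return "Hello there!"

def dispatch(recognized_intent):
    """Event response: run whichever handler matches the recognized intent."""
    handler = handlers.get(recognized_intent)
    return handler() if handler else "Sorry, I didn't catch that."

print(dispatch("Greeting"))  # -> Hello there!
```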

Storyboards
A method of visualizing agent conversation, which includes agent phrases, human responses and data the agent has stored about conversation context. This method is user-centric, and allows developers to envision realistic conversations with contextual information.

Directed graphs
A type of visualization that includes nodes and arrows (i.e., directed connections). This visualization may be used to represent agents' logic and conversation flow. It is developer-centric, and allows developers to observe many paths of potential conversations.
Dialog Management
Turn-taking
The way in which conversation control is passed between the user and agent. For example, initially the user may hold conversation control and say, "What day is it today?"; after this, the agent holds control and may say, "It is July 11th, 2022". In this way, the agent and user take turns holding control.

Events
Occurrences that trigger an agent to act. For example, an event might include someone saying, "Hello, agent!", which might trigger the agent to respond with, "Why hello there, Hal!".

Entity-filling
The process of identifying and storing entities (i.e., groups of words with similar characteristics). For example, an agent may identify a date entity in the phrase, "Create a calendar event for September 25th", and store this for later use. Agents often use machine learning models to identify entities.
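
A minimal sketch of entity-filling, storing a recognized date in a hypothetical slot dictionary; the regular expression is again only a toy stand-in for a learned entity model.

```python
import re

slots = {}  # entities stored for later use in the conversation

def fill_date_slot(utterance: str) -> None:
    """Identify a date entity and store it for later use."""
    match = re.search(r"\b[A-Z][a-z]+ \d{1,2}(?:st|nd|rd|th)?\b", utterance)
    if match:
        slots["date"] = match.group(0)

fill_date_slot("Create a calendar event for September 25th")
print(slots)  # -> {'date': 'September 25th'}
```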

Conditions
Methods of controlling conversation flow, which can be described by if-statements. For example, one condition might be, "If it is the user's birthday, play 'Happy Birthday to You', otherwise, say, 'Good morning!'". Conditions allow agents to perform different functions depending on contextual data.
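
Expressed as code, that example condition might look like this sketch; the function and its inputs are hypothetical.

```python
import datetime

def morning_response(user_birthday: datetime.date) -> str:
    today = datetime.date.today()
    # Condition: branch on contextual data (the user's stored birthday).
    if (today.month, today.day) == (user_birthday.month, user_birthday.day):
        return "Playing 'Happy Birthday to You'!"
    return "Good morning!"
```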

Conversation State
The point an agent has reached in a given conversation, which depends on past conversation events. For instance, an agent may initially be in a "Start" state, and after being asked to open a banana bread recipe, it may transition to a "Banana Bread Recipe: Step 1" state. The agent's response will depend on the conversation state. For example, if it is in the "Banana Bread Recipe: Step 1" state and is asked to say the next step, it may say, "Preheat the oven to 350°F", whereas if it were in a nacho recipe state and asked the same question, it may say, "Preheat the oven to 425°F".

State machines
A method of organizing conversation control, which can be represented by a directed graph. Each node in the state machine is a conversation state. Each connection is a transition. When events occur (e.g., a user says, "Which games are available?"), agents will transition to different states (e.g., transition to a state called "List Available Games").
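
A dictionary-based sketch of such a state machine, with hypothetical states and intents:

```python
# Transition table: (current state, recognized intent) -> next state.
TRANSITIONS = {
    ("Start", "AskAboutGames"): "List Available Games",
    ("List Available Games", "ChooseGame"): "Play Game",
    ("Play Game", "Quit"): "Start",
}

state = "Start"

def handle(intent: str) -> None:
    """Follow the directed edge for this event, if one exists."""
    global state
    state = TRANSITIONS.get((state, intent), state)

handle("AskAboutGames")
print(state)  # -> List Available Games
```
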
Data Access and Conversation Context
Pre-programmed data
Data that an agent has stored prior to engaging with users. For example, a developer may provide agents with pre-programmed data about the titles of games they know how to play.

User-defined data
Data that an agent stores while engaging with users. For example, a user might tell an agent their birthday. The user's birthday is an example of a piece of user-defined data that an agent may store for later use.

Contextual data
Data from a particular conversation, which has been collected by an agent. This data helps the agent respond in a way that makes sense based on the prior conversation. For instance, a piece of contextual data might be that a user asked to order an orange shirt. Based on this contextual data, the agent may send a request to a store's database to determine whether orange shirts are in stock.

Agent modularization
The principle that agents do not share data or functions unless explicitly programmed to do so. For instance, if one agent has stored a user's name, another unrelated agent will not know that user's name unless it has been programmed to obtain data from the first agent (or the user also tells the second agent their name).

Device access
The ability of an agent to communicate with other technology, or the ability of a user to communicate with an agent. For instance, a developer may have created an agent which they can test and have conversations with using their personal computer. Unless this agent is deployed to an accessible location (e.g., the cloud), however, no other users would be able to access it or communicate with it. Another example is how agents often need permission from users to access other devices, like smart lights. If a user has not given the agent permission to control their smart lights, the agent will not be able to do so.

Cloud computing
When agents' computation occurs in remote locations (i.e., the cloud), as opposed to on developers' personal computers. The cloud is a network of many computers, which typically have large computing power and data storage available. Developers often use cloud computing to train machine learning models, since this involves large amounts of computation. Developers often also deploy agents to the cloud so that many others can access their agents.

Webhooks and APIs
Methods agents may use to access external information and systems. For example, an agent may utilize a webhook to receive real-time weather updates, or an API to communicate with a flight-booking website.
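
As a sketch of the API half of this, an agent might fetch a forecast over HTTP; the endpoint URL, parameters, and response fields below are entirely hypothetical.

```python
import requests

# Hypothetical weather API; the URL and JSON fields are made up.
response = requests.get(
    "https://api.example.com/weather",
    params={"city": "Boston"},
    timeout=10,
)
forecast = response.json()
print(f"It is currently {forecast['temperature']} degrees.")
```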

Flow and page modularization
Approaches for structuring agent conversation, which prevent or allow access to intents depending on the conversation state. For example, if a pet store agent had "Dog Inquiry" and "Cat Inquiry" conversation flows, and the conversation was on the "Describe Current Dog" page (i.e., state), the user likely could not access information about cats until they return to a page that allows the conversation to transition to the Cat Inquiry flow. This allows developers to structure the progression of agent conversations.
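
A minimal sketch of page-level gating, with hypothetical page and intent names:

```python
# Each page (state) lists the intents reachable from it.
PAGE_INTENTS = {
    "Pet Store Start": {"Dog Inquiry", "Cat Inquiry"},
    "Describe Current Dog": {"Next Dog", "Back To Start"},
}

def can_handle(page: str, intent: str) -> bool:
    """An intent is only accessible when the current page allows it."""
    return intent in PAGE_INTENTS.get(page, set())

print(can_handle("Describe Current Dog", "Cat Inquiry"))  # -> False
```
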
Human-Agent Interaction
Speech synthesis
The ability of an agent to create speech sounds. For instance, an agent may have trained on many human voice recordings to recognize the speech patterns for particular words. Based on these patterns, it may be able to generate similar patterns to create speech audio (i.e., synthesize speech).

Speech recognition
The ability of an agent to recognize language from human speech. For example, an agent may be able to convert human speech recordings to machine-readable text by training on many examples of human speech.

Recovery
The ability of an agent to continue a conversation regardless of misunderstandings or errors. For example, an agent may misrecognize the number "fifteen" as "fifty" in the phrase, "Send me fifteen pizzas". If the agent is able to recognize that the word "fifty" is unusual in this context, and follow up with the user by asking whether they really want to order fifty pizzas, the agent would be able to recover the conversation.
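
One simple recovery tactic is to ask for confirmation when a recognized value looks implausible; the threshold below is an invented example.

```python
def confirm_pizza_order(recognized_quantity: int) -> str:
    # Recovery: an unusually large value triggers a follow-up question
    # instead of an order the user may not have intended.
    if recognized_quantity > 20:  # hypothetical plausibility threshold
        return f"Did you really mean {recognized_quantity} pizzas?"
    return f"Ordering {recognized_quantity} pizzas."

print(confirm_pizza_order(50))  # -> Did you really mean 50 pizzas?
```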

Societal impact and ethics
How agents can affect human life and culture, as well as the moral considerations that come along with this. For example, if an agent's speech recognition system is trained using only male voices, the agent may not recognize female voices as accurately. This would have serious implications for the female users of the agent, and would not generally be considered fair.

Text-based interaction
When an agent interacts using characters, as opposed to audio, for input and output. Agents that use text-based interaction are often called chatbots. This mode of communication is typically very accurate, but oftentimes less efficient than voice-based interaction.

Voice-based interaction
When an agent interacts using audio, as opposed to characters, for input and output. This mode of communication is typically very efficient, but oftentimes less accurate than text-based interaction.

Multimodal interaction
When an agent interacts using multiple different modalities (e.g., voice, text, haptics). For instance, an agent may say, "So, you'd like to deliver your pizza to 484 Zorilla Street?" and display a map on a screen while it speaks. In this example, the agent is interacting multimodally through visuals and audio.

Task- vs. non-task-oriented
The extent to which an agent acts socially, or alternatively, with the purpose of completing specific goals. For instance, a pet robot agent may interact socially (i.e., with less task-orientation) by asking about the user's day and engaging in small talk. A customer service bot may interact with task-orientation by helping a user return a faulty product.

Deployment
The act of making an agent available to users. For instance, a developer may deploy their agent to a customer service website, or to an app store.

Effective conversation design
Methods to improve agent communication. For example, a developer may improve an agent's conversational flexibility by allowing users to interrupt it at any time. A list of such conversation design principles is shown in Table 2.