"Convo: What does conversational programming need?" Appendix

Convo: What does conversational programming need?

This Gist contains additional information about the study presented in "Convo: What does conversational programming need?" at VL/HCC 2020.

The study can be cited as follows:

Van Brummelen, J., Weng, K., Lin, P., & Yeo, C. (2020). Convo: What does conversational programming need?. In 2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC).

or with BibTeX:

@INPROCEEDINGS{vanbrummelen-convo,
  author={{Van Brummelen}, Jessica and Weng, Kevin and Lin, Phoebe and Yeo, Catherine},
  booktitle={2020 IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)}, 
  title={Convo: What does conversational programming need?}, 
  year={2020},
}

The paper can be found on IEEE Xplore. Also see the version on arXiv, which includes additional information about the study.

Footnote 1: Quantitative Results

We analyzed the quantitative results of the study using analysis of variance (ANOVA) for between-subjects analyses (e.g., comparing novice and advanced participants) and repeated measures ANOVA for within-subjects analyses (e.g., comparing input modality type).
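As a rough illustration of the between-subjects case, a one-way ANOVA F statistic is the ratio of the between-group mean square to the within-group mean square. The sketch below computes it from scratch. The function name `one_way_anova_f` and the sample values are hypothetical; they are not the study's data or analysis code, and the sketch does not cover the repeated measures variant.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for two groups (NOT data from the study).
group_a = [1, 2, 3]
group_b = [4, 5, 6]
f_stat, df_b, df_w = one_way_anova_f([group_a, group_b])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```

The two degrees of freedom returned here correspond to the pair of numbers reported alongside each F value in the results below.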

The type of input modality had a significant effect on participants' perception of the system. Our results show that both novice and advanced participants strongly preferred the text-based system over the voice-based system. Participants felt it was more difficult to complete the programming goals with the voice-based system, and were generally more satisfied with the text-based system.

Novice participants made significantly more incorrect utterances with the voice-based system (M=17.38) than with the text-based (M=1.38, F(1,38)=17.79, p=0.0001) and voice-or-text-based systems (M=4.77, F(1,38)=11.39, p=0.0017), whereas no significant difference was observed for advanced participants. In addition, novice participants were more satisfied with the voice-or-text-based system (M=2.61) than the voice-based system (M=3.44, F(1,35)=15.90, p=0.0003), and found the voice-or-text-based system (M=2.66) more efficient to use than the voice-based system (M=3.47, F(1,35)=14.18, p=0.0006). There was no significant difference in preference observed for advanced participants.

Figure A: Novice responses to Likert scale questions. Novices generally found voice useful and enjoyable. Refer to the limitations section in the paper for further discussion.

Figure B: Advanced user responses to Likert scale questions. Advanced user responses tended to be less favorable towards voice than novice responses.

Advanced participants perceived the voice-or-text-based system (M=2.94) to be more difficult to use than the text-based system (M=3.5, F(1,15)=6.36, p=0.02); no significant difference was found for novice participants. Overall, novice participants found the voice interaction of the voice-based and voice-or-text-based systems to be useful and enjoyable (see Fig. A), whereas advanced participants tended to disagree more with those statements (see Fig. B).

Prior programming knowledge and gender did not have a significant effect on completion time when considering all data. Novice participants and advanced participants completed the practice and novice stages in around the same time. There was also no significant difference between the number of voice utterances and text utterances during the novice stage. Advanced participants tended to use more text utterances than voice utterances during the advanced stage.

To investigate cognitive load effects, we examined the number of resets of the system (as participants mentioned they reset due to forgetting where they were in the program they were creating), time to goal completion, and number of times users asked for help. Note that we only analyzed the advanced stage for cognitive load, since the instructions were provided line-by-line in the novice stage (i.e., minimal cognitive load involved), whereas users needed to determine which steps to take next on their own in the advanced stage (i.e., significant cognitive load involved). There was no significant difference in the number of times participants asked for help across the voice-input, text-input, and voice-or-text systems. The input modality also did not have a significant effect on the number of resets or time to goal completion during the advanced stage.

Footnote 2: Fourteen Design Themes

Through open coding, we identified fourteen design themes, which fell into two main categories: positive feedback and recommendations.

We coded 651 occurrences of these themes. Representative quotations from each of the themes follow.

Quotations from the positive feedback category

Efficient (49/651):

  • "I liked how quick it was. Having to just speak to program is far quicker than typing [...]"
  • "It was super fast and I was able to type out shorter commands while speaking the longer ones"
  • "I liked being able to use the voice for longer commands, and the text for shorter commands or misunderstood commands"

Usable (48/651):

  • "I liked how straight-forward and logical it is because it translates the logic of the code into everyday speak."
  • "Easy to use just had to talk."
  • "It allows for easy navigation through procedures"
  • "I liked that I could just tell it what needs to be done."

Accessible (32/651):

  • "I liked the availability of the text option because it usually would take me a few attempts to get the voice working."
  • "It was super fast and I was able to type out shorter commands while speaking the longer ones"

Effective coding features (9/651):

  • "I liked that it tried to catch cases like 'not having a false condition'. I imagine this will be super useful in recommending base cases for recursion problems"
  • "I liked that I wasn't beholden to strict grammar (didn't complain about missing commas that would likely give good context) It was very speedy for simple actions and I had an idea of how it was working under the hood It played a cricket sound!"

Interesting (6/651):

  • "It feels cool to do this - I can imagine coding while driving or doing housework."
  • "It's pretty cool that I was able to construct a program with my voice!"

Quotations from the recommendations category: improving agent's output

Increase agent interaction (91/651):

  • "[I would add] a spellchecker, like if a word is spelled incorrectly it could say 'You said 'dune', did you mean 'done'?"
  • "It would be interesting to be able to ask the agent for information on the code I already wrote [...] so you can remember what was that about"

Add visualization (72/651):

  • "[I would add] some sort of visualization of the function being built up as interaction progresses"
  • "[I would add] a way to visualize where you are in the program, and a way to modify your previous lines that were misinterpreted."

Improve efficiency (30/651):

  • "It also seems quite inefficient to figure out the right way to express a statement in actual words that otherwise can be typed in a programming language [...]"
  • "Mostly, lots of typing which made it feel inefficient. (Caveat: this is coming from someone who codes often and appreciates the conciseness of code over natural language, or even things like being able to type "y" for "yes" in a terminal session...)"

Reduce cognitive load (12/651):

  • "I can't see my program and I have to remember what's going on, that will become infeasible very quickly."
  • "[...] the instructions just existed behind the scenes somewhere in the computer. While this wasn't a huge issue with simple programs, I imaging it could get really confusing when procedures have more instructions or more than one level of conditional nesting... this would probably start reaching the limits of working memory. In educational environments, the additional cognitive load may hinder learning as well."

Quotations from the recommendations category: improving users' understanding

Increase transparency (25/651):

  • "[I would ask] How do you recognize the voices?"
  • "Do you use any sort o [sic] machine learning to recognize the accents?"

Reduce ambiguity (12/651):

  • "I'm interested in how does the program differentiate similar commands."
  • "What are my options for this method?"
  • "1. What is the next step? 2. If I want to have a step repeat multiple times, what procedure should I do? 3. What variables belong inside/outside a loop? 4. Can I type it?"

Convey system purpose (9/651):

  • "Who is the intended audience and what sort of programs do you imagine them writing? [...]"
  • "[I would ask] How is this system going to be implemented? Where would you use this system?"

Quotations from the recommendations category: improving agent's recognition

Improve speech-to-text (190/651):

  • "Differentiating between voices and then telling the difference with accents [was a challenge for the system]"
  • "It seems like if speech recognition worked well, it would be a better choice, but having this is useful (especially in a noisy environment)."

Reduce NL constraints (66/651):

  • "Allowing more variability in what I can say to the agent to get it to do the same command would feel more natural."
  • "Maybe pragmatically more diverse? Right know the speaking style is still very "computery""
  • "[...] I expect more natural-language input support such as "nope", "no thanks", etc. would be valuable as well."
  • "If there was a richer set of predefined control sequences, that would be great - maybe customizable by the user?"

Figure A: Total number of occurrences for the top seven themes from advanced user responses and top seven from novice user responses. Novice responses emphasized transparency over efficiency. Note how the colors represent which user group(s) the theme came from (e.g., pink represents a top theme from novice users, dark blue represents a top theme from both novice and advanced users).

Figure B: Total number of occurrences for the top five themes from each system survey. The voice-input system responses emphasized efficiency; text-input, a need to improve efficiency; and voice-or-text, accessibility. Note how the bars' colors represent which input system(s) the theme came from (e.g., pink represents text-input system, and green represents voice-input and voice-or-text systems).

As shown in Fig. A, six of the top seven themes for novice and advanced users were the same, including improve speech recognition, increase agent interaction, and add visualization. Novice users emphasized increasing transparency over efficiency, and vice versa for advanced users.

Among input modalities, participants emphasized improving speech recognition and increasing agent interaction for all three systems (voice-input, text-input, and voice-or-text) (see Fig. B). The voice-input system responses emphasized efficiency; the text-input system responses emphasized improving efficiency; and the voice-or-text system responses emphasized accessibility. Both the voice-input and text-input system responses emphasized usability; both the voice-input and voice-or-text system responses emphasized adding a visualization; and both the text-input and voice-or-text system responses emphasized reducing NL constraints.
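The per-theme occurrence counts reported with the quotations in this footnote can be tallied to check that they account for all 651 coded occurrences and to see how they split across categories. The sketch below is only an arithmetic check: the counts are copied from the headings above, while the dictionary layout and variable names are our own.

```python
# Occurrence counts per theme, copied from the theme headings in this footnote.
theme_counts = {
    "positive feedback": {
        "Efficient": 49,
        "Usable": 48,
        "Accessible": 32,
        "Effective coding features": 9,
        "Interesting": 6,
    },
    "improving agent's output": {
        "Increase agent interaction": 91,
        "Add visualization": 72,
        "Improve efficiency": 30,
        "Reduce cognitive load": 12,
    },
    "improving users' understanding": {
        "Increase transparency": 25,
        "Reduce ambiguity": 12,
        "Convey system purpose": 9,
    },
    "improving agent's recognition": {
        "Improve speech-to-text": 190,
        "Reduce NL constraints": 66,
    },
}

# Sum the themes within each category, then across categories.
category_totals = {name: sum(themes.values()) for name, themes in theme_counts.items()}
total = sum(category_totals.values())

for name, count in category_totals.items():
    print(f"{name}: {count} of {total} ({count / total:.1%})")
```

The category totals come to 144 (positive feedback), 205 (improving agent's output), 46 (improving users' understanding), and 256 (improving agent's recognition), which together make up the 651 coded occurrences.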

Example Conversations from the User Study

The following conversation snippets show how developing a program with Convo works. The first example shows the "happy path" (i.e., scenario with no errors) for the novice goal in the user study, the second example shows a scenario with misrecognized speech (in the novice and then the advanced stage), and the third example shows a scenario with an unrecognized action (i.e., a participant asks Convo to do something it doesn't understand). Additionally, see this video demo of creating a procedure with Convo.

Happy Path Example

This example shows someone completing the novice goal with Convo. Each utterance is labeled with its speaker.

Convo: Great job! Now let’s create a procedure that can play sounds of a couple of pets. Let’s get started! How would you like to start?

User: Create a procedure called pet sounds

Convo: What do you want to happen in the procedure first?

User: Get user input and save it as pet

Convo: Added action to the procedure! Do you want to do anything else?

User: If the value of pet is bird, play the bird sound

Convo: Anything else if condition is true? You can say 'done' if you are finished.

User: Done

Convo: Would you like to do anything if condition is false? If so, what would you like to do first?

User: No

Convo: Do you want to do anything else in the procedure?

User: If the value of pet is cricket, play the cricket sound

Convo: Anything else if condition is true? You can say 'done' if you are finished.

User: Done

Convo: Would you like to do anything if condition is false? If so, what would you like to do first?

User: No

Convo: Do you want to do anything else in the procedure?

User: Done

Convo: I finished creating the procedure. You can say, "run pet sounds" to play it.
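For readers more used to text-based code, the procedure built in this conversation corresponds roughly to the Python sketch below. This is an illustrative translation, not Convo's internal representation: the function name `pet_sounds` and the injected `get_user_input` and `play_sound` callables are assumptions made so that the example is self-contained.

```python
def pet_sounds(get_user_input, play_sound):
    """Rough equivalent of the 'pet sounds' procedure from the conversation."""
    # "Get user input and save it as pet"
    pet = get_user_input()

    # "If the value of pet is bird, play the bird sound" (no false branch)
    if pet == "bird":
        play_sound("bird")

    # "If the value of pet is cricket, play the cricket sound" (no false branch)
    if pet == "cricket":
        play_sound("cricket")


# Stand-ins for real input and audio: record which sounds would be played.
played = []
pet_sounds(lambda: "cricket", played.append)
print(played)
```

The two conditionals are independent rather than an if/else chain, since the user answered "No" when asked about the false branch and then added the second condition at the procedure level.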

Misrecognized Speech Examples

In the previous example, if the utterance, "If the value of pet is cricket, play the cricket sound" was misrecognized, Convo would recognize that the user wasn't following the user study instructions and respond as follows.

...

Convo: Do you want to do anything else in the procedure?

User: If the value of pet is cricket, play the ticket sound

Convo: I think your action is slightly wrong. Please follow the instructions and try again.

User: If the value of pet is cricket, play the cricket sound

Convo: Anything else if condition is true? You can say 'done' if you are finished.

...

The above conversation was in the novice stage of the user study, which had line-by-line instructions for participants to follow. If the misrecognized speech occurred during regular program creation (e.g., in the advanced stage where there aren't line-by-line instructions), Convo would respond differently. First, it would check if it has a "ticket sound" in its sound files and since it doesn't, it would respond as follows.

...

Convo: Do you want to do anything else in the procedure?

User: If the value of pet is cricket, play the ticket sound

Convo: I cannot play the ticket sound. I might not have this sound file.

User: If the value of pet is cricket, play the cricket sound

Convo: Anything else if condition is true? You can say 'done' if you are finished.

...
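The two misrecognition behaviors above can be summarized in a small sketch. It is a simplified guess at the decision logic, not Convo's actual implementation: the `respond` function, the `AVAILABLE_SOUNDS` set, and the regular expression are hypothetical, and only the reply strings are taken from the conversations above.

```python
import re

# Hypothetical set of sound files; the examples only imply that "bird" and
# "cricket" exist and that "ticket" does not.
AVAILABLE_SOUNDS = {"bird", "cricket"}


def respond(utterance, expected=None):
    """Choose a reply to a 'play the X sound' utterance (illustrative only).

    `expected` is the scripted instruction in the novice stage, or None in the
    advanced stage, where there are no line-by-line instructions.
    """
    text = utterance.strip().lower()

    # Novice stage: any deviation from the scripted line is rejected.
    if expected is not None and text != expected.strip().lower():
        return ("I think your action is slightly wrong. "
                "Please follow the instructions and try again.")

    # Otherwise, check whether the requested sound file exists.
    match = re.search(r"play the (\w+) sound", text)
    if match and match.group(1) not in AVAILABLE_SOUNDS:
        return (f"I cannot play the {match.group(1)} sound. "
                "I might not have this sound file.")

    return ("Anything else if condition is true? "
            "You can say 'done' if you are finished.")


scripted = "If the value of pet is cricket, play the cricket sound"
heard = "If the value of pet is cricket, play the ticket sound"
print(respond(heard, expected=scripted))  # novice stage
print(respond(heard))                     # advanced stage
```

Passing `expected` models the novice stage, where the instruction check comes first; omitting it models regular program creation, where the only check in this sketch is the sound-file lookup.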

Unrecognized Action

If instead an action was not recognized (i.e., the participant asks Convo to do something it doesn't understand or something that hasn't been implemented yet), the conversation would be as follows.

...

User: Make a game

Convo: I didn't quite catch that. What action did you want me to add?

User: Make a procedure

Convo: What do you want to call the procedure?

...

In future iterations, Convo will utilize unconstrained natural language processing techniques and ask follow-up questions, as in the following conversation. In this way, participants will be able to define unrecognized actions (and variables) themselves.

...

User: Make a game

Convo: What's a game? Is it like a procedure or a variable?

User: It's a procedure

Convo: Okay, let's create a game that's like a procedure. What's the first step?

User: Ask the user for input

...
