*By @madmag77, created June 6, 2021 21:38*
# Evolution of E2E testing
1. What do we want from E2E testing?
2. E2E testing outcome
3. Current situation with E2E testing
4. Problems we have now with E2E testing
5. Ideal solution
6. The ideal solution outcome
7. Virtual users solution
8. Would the outcome be the same as in the ideal solution?
9. Some words about the design and implementation
10. Summary
## What do we want from E2E testing?
[[Software testing definition]]
Using E2E testing, we want to make sure that, from the real user's perspective, all flows work and that users can successfully reach their goals using our application.
## E2E testing outcome
As a result of testing we would have the following artifacts:
1. All the testing scenarios with the % of testers/runs who successfully finish them.
2. Bug reports with enough info to reproduce the bugs.
Using this outcome we can decide if the new version of our app is ready to be released in production.
## Current situation with E2E testing
Currently, in order to implement E2E testing, we need to prepare/update the testing scenarios, prepare test data (test accounts, test cards, etc), ask manual testers to test the application using the testing scenarios, or automate running those scenarios using different tools and methods.
## Problems we have now with E2E testing
There are quite a few problems with E2E testing. To list some:
1. Additional, sometimes considerable, effort to keep testing scenarios up to date every time a new version is released or the code changes.
2. With manual testing, we have to sacrifice some combinations of user parameters (country, language, etc.) to reduce the number of testing hours, which can make the results less reliable.
3. With automation, there are many technical challenges on each platform, especially mobile, mostly related to locating the proper UI elements and to the reliability of interacting with them and with the device/browser.
## Ideal solution
The question, then, is what the ideal solution could look like. How do we see perfect E2E testing?
What if all our real users used our application, did whatever they needed, and reported to us whenever they found a bug or an inconvenience? Instead of removing the app, leaving a bad rating in the store, or calling our support, they would send us well-prepared reports with enough information to reproduce and fix the bug.
This approach definitely satisfies the definition of E2E testing, and it doesn't have the problems from the previous chapter. It sounds like perfect E2E testing, at least from my point of view.
## The ideal solution outcome
In the case of the ideal solution, we would have the same artifacts as from usual E2E testing, plus some more:
1. Using the business analytics and metrics that we should have in our application anyway, we would know how many users participated, how many of them successfully did what they wanted, how many errors were caught, and a lot of other product-specific metrics that we would not have with usual E2E testing.
2. Bug reports with enough info to reproduce the bugs.
3. Reports about service load on our backend.
4. Reports about any errors in our systems, especially at the boundaries between systems.
5. We can even do some A/B testing before officially releasing the app :)
Sounds rather cool. Of course, there are difficulties around the use of special environments and the organization of the process, but it seems possible to solve them, as we've already done for E2E testing.
The next question is how we can make that dream come true without bothering our real users.
## Virtual users
What if we could invent a virtual user who would install our application or open our website and use it like a real user, with some purpose in their mind (or, more accurately, in our neural network)? Then we summon many such users, and they will do exactly what we described in the ideal solution.
However, there are two requirements that make the task rather hard:
1. Those users should behave differently in order to imitate different real users. And they shouldn't behave chaotically; they should be trying to reach their goal (buy a chair, search for a movie, or anything else).
2. To reduce the load on our environment and on the machinery that emulates these virtual users, we'd like to keep their number as small as possible but large enough to cover all the functionality.
One way of solving both is to set up different types of users that match different groups of real users. For example, if our app is a mega online store, we could define the following groups of users by their intentions:
1. Search for one specific thing a lot, add many options in the basket, then choose one and buy it (intent to choose one thing from one category and buy it ASAP)
2. Search for a group of similar things, rarely add something, almost never buy (intent to explore a broad category of goods)
3. Come by a link from outside and immediately buy (intent to buy the specific thing)
4. …
I’m not an expert in user behavior, it’s just what comes to my mind first.
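As a sketch, the groups above could be encoded as virtual-user profiles, with a tiny helper that spreads a small user budget evenly over them so every pattern is covered with few users. All names and fields here are hypothetical, not taken from a real system:

```python
# Hypothetical virtual-user profiles matching the groups above;
# all names and fields are illustrative, not taken from a real system.
PROFILES = [
    {"name": "focused_buyer",      # picks one thing from one category, buys ASAP
     "searches": "a lot", "basket_adds": "many", "buys": True},
    {"name": "category_explorer",  # browses a broad category, almost never buys
     "searches": "a lot", "basket_adds": "rare", "buys": False},
    {"name": "deep_link_buyer",    # arrives via an external link, buys immediately
     "searches": "none", "basket_adds": "one", "buys": True},
]

def pick_virtual_users(budget):
    """Spread a small budget of virtual users evenly over the profiles,
    so every behavior pattern is covered with as few users as possible."""
    return [PROFILES[i % len(PROFILES)]["name"] for i in range(budget)]
```

A real setup would weight the profiles by how common each group is among real users instead of cycling through them uniformly.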
The main idea is that there are common behavior patterns, and they can be reproduced by our virtual users. However, these are not testing scenarios. Testing scenarios are hardcoded and give the tester or testing pipeline clear instructions: tap this, input that, check this, and so on. Behaviors or patterns are much more general, sitting at a higher level of abstraction: search for a black office chair under 100 pounds with an overall rating higher than 4.5, check all the reviews for any mention of injuries, then buy it using {card} and {address}. A virtual user will then imitate real users: trying to understand the UI, finding the needed elements on the screen, and interacting with them the same way real users do.
Using this approach, we can summon a few virtual users per group and “ask” them to use our app the same way real users do.
## Would the outcome from the virtual users solution be the same as in the ideal solution?
The outcome would be pretty much the same: we will have the same stats about users who participated and successfully reached their goals, the same reports about any bugs that prevented our virtual users from reaching their goals, reports about backend errors that didn't impact user experience, and so on. What will be missing:
1. Load testing. It depends on the capacity of our environment and testing pipeline; theoretically, we could summon thousands or millions of virtual users, but the costs of such testing would be extremely high.
2. Proper business metrics about real users' behavior. Many factors impact our users' behavior, from the political situation in the world to the weather and modern trends. But nothing impacts our virtual users: they will behave almost the same way each time. Of course, we can retrain them if something really important changes in the world. So data about virtual users' behavior can't help product owners evolve the product, but it can help them measure how easy it is to do this or that thing in our application, by measuring the time virtual users spend reaching their goals. It's actually a really interesting idea to use virtual users for UX research of our apps: if an image on a button is not easily understandable for a virtual user, it will be the same for a real one.
## Some words about the design and implementation
How would we implement the virtual users?
Let’s see what we would need to make the virtual users do what we want. There could be the following layers:
1. The UI that we should already have in the app.
2. A system that understands the UI: it can extract all the UI elements, interact with them, and understand their purpose from the text or images on them.
3. A system that can expand a goal into a series of steps to reach it. If you want to buy a black chair, you need to search for one first, using filters and keywords, then check it out, and finally pay for it.
4. A very high-level system that translates our text into the virtual user’s goal.
We can think of layer 3 as the core of the system, layer 4 as the input, and layer 2 as an adapter from the core to any UI (web, mobile, PC apps, or even something special like medical equipment with touch screens or industrial monitoring systems with hardware buttons).
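To make the layering concrete, here is a minimal sketch of how the pieces could fit together in Python. The `Goal`, `Planner`, and `UIAdapter` names are my own illustrations, not an existing API:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Goal:
    """Structured goal produced by Layer 4 from free text (fields are illustrative)."""
    action: str
    subject: str
    properties: List[str] = field(default_factory=list)
    limits: Dict[str, float] = field(default_factory=dict)

class Planner:
    """Layer 3: expands a goal into a sequence of abstract steps (stub)."""
    def plan(self, goal):
        raise NotImplementedError

class UIAdapter:
    """Layer 2: performs one abstract step against a concrete UI (stub)."""
    def perform(self, step):
        raise NotImplementedError

def run_virtual_user(goal, planner, ui):
    """Drive one virtual user: plan the goal, then execute it step by step.
    Returns True only if every step succeeded."""
    return all(ui.perform(step) for step in planner.plan(goal))
```

With this split, the same `Planner` can drive a web, mobile, or desktop app just by swapping in a different `UIAdapter`.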
How could the example of buying a black office chair work here?
### Layer 4
Layer 4 gets the text input, analyzes it, and extracts all the important features. The result could look like this:
* Main goal: buy
* Subject: chair
* Subject properties: black, office
* Limitations: price (under 100 pounds), rating (better than 4.5), reviews (do not mention injuries).
* Account: …
* Card: …
I suppose the best way to do this is machine learning: train a model to extract the meaning from the text. There are probably already some solutions like this on the market.
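As a toy illustration of Layer 4's job, here is a naive regex-based extractor. It is a stand-in for a trained NLP model, and only the wording patterns of the chair example are handled:

```python
import re

def extract_goal(text):
    """Toy Layer-4 extractor: a regex stand-in for a trained NLP model.
    Only the wording patterns of the chair example are handled."""
    features = {"goal": None, "subject": None, "properties": [], "limits": {}}

    action = re.search(r"\b(buy|search for|explore)\b", text, re.I)
    if action:
        features["goal"] = action.group(1).lower()

    price = re.search(r"under (\d+)", text, re.I)
    if price:
        features["limits"]["max_price"] = int(price.group(1))

    rating = re.search(r"rating (?:above|bigger than) ([\d.]+)", text, re.I)
    if rating:
        features["limits"]["min_rating"] = float(rating.group(1))

    # Very naive: every word before a known subject noun (minus articles
    # and the action word) is treated as a property of the subject.
    subj = re.search(r"((?:\w+ )*?)(chair|sofa|desk)\b", text, re.I)
    if subj:
        features["subject"] = subj.group(2).lower()
        skip = {"a", "an", "the", features["goal"]}
        features["properties"] = [w for w in subj.group(1).lower().split()
                                  if w not in skip]
    return features
```

A production system would replace every one of these patterns with a learned model; the point is only the shape of the output.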
### Layer 3
This one is a bit tricky. It should act as a problem solver: for each task, it provides an algorithm, or sequence of steps, to solve it. In our example, the steps could look like this:
1. Open the search page.
2. Search for the subject “chair” with the keywords black and office.
3. Check the list of options; if there are none -> return an error.
4. Filter out all the items with a price above 100 or a rating below 4.5.
5. Iterate over the list and choose the first item whose reviews don’t mention injuries.
6. Proceed to checkout of the chosen item.
7. If it’s available, buy it using credentials {card}.
I can see two ways of implementing this:
1. Use real users’ analytics data, label a lot of different funnels and user journeys with goals and properties, and then use this data to train a NN that can do the opposite: given a goal and properties, suggest the sequence of events that leads to the goal.
2. Once layers 1 and 2 are ready, use reinforcement learning to train a NN to reach different goals in the real application.
I appreciate that I’m not an expert in machine learning and maybe there are many other ways to implement this layer but I’m sure that the task is very interesting, challenging, and, what is more important, promising as the results can be reused in many other fields like robotics, game AI and many more.
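For illustration only, a hardcoded version of such a planner for the "buy" intent could look like the sketch below; the learned approaches above would replace this hand-written mapping, and all names are hypothetical:

```python
def plan_buy(goal):
    """Hand-written Layer-3 planner for the 'buy' intent only; in a real
    system this mapping would be learned from analytics data or via
    reinforcement learning, as described above."""
    if goal["goal"] != "buy":
        raise NotImplementedError("only the 'buy' intent is sketched here")
    steps = [
        "open the search page",
        "search for '%s' with keywords %s" % (goal["subject"],
                                              ", ".join(goal["properties"])),
    ]
    # Each extracted limitation becomes a filtering step.
    for name, value in goal.get("limits", {}).items():
        steps.append("filter results by %s = %s" % (name, value))
    steps += [
        "pick the first item whose reviews pass the checks",
        "proceed to checkout",
        "pay with the test {card}",
    ]
    return steps
```

Even this trivial version shows the contract: structured goal in, ordered list of abstract steps out, with Layer 2 left to figure out how each step maps to the actual UI.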
### Layer 2
This layer is very important as it gives us the real interface to any application. What I mean here is that UI is rather different in web, mobile apps, and PC apps and the way we can interact with the UI is different too. Layer 2 abstracts it and provides us with something more high level.
In our example, Layer 2 will constantly analyze the page it sees (using screenshots, special tools like ‘adb’ for Android, or something else), extract all the UI elements, and “understand” them. When it receives the command “Open search page” from Layer 3, it looks up something that has the meaning of search (a button with the text “Search”, a magnifying-glass icon, a menu item with the text “Search”, etc.) and then activates it, either through special tools or just by tapping the screen at the element’s coordinates.
This layer alone can be rather useful for UI testing, as it makes it possible to interact with any UI universally, without depending on special ‘ids’ of elements or on special tooling for interacting with the internal UI representation.
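A toy sketch of this semantic lookup, assuming the screen has already been parsed into elements with text and icon descriptions. Both the cue table and the element fields are invented for illustration:

```python
# Cue tables mapping a command's meaning to textual/visual hints; both the
# cues and the element fields below are invented for illustration.
CUES = {
    "search": {"search", "find", "magnifying glass"},
    "checkout": {"checkout", "buy", "cart", "bag"},
}

def find_element(meaning, elements):
    """Return the first on-screen element whose text or icon description
    matches any cue for the given meaning, or None if nothing matches."""
    cues = CUES.get(meaning, {meaning})
    for element in elements:
        label = "%s %s" % (element.get("text", ""), element.get("icon", ""))
        if any(cue in label.lower() for cue in cues):
            return element
    return None
```

In a real Layer 2, the string matching would be replaced by a model that scores visual and textual similarity, and the returned element's coordinates would feed the tap/click machinery.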
We made a PoC for this layer, and it looks very promising; hopefully we can reveal more in the next articles.
### Layer 1
This is just the UI of our app and the bidirectional way of interacting with it. We already mentioned some options in the previous section.
## Summary
In this article, we discussed one possible direction for the evolution of E2E testing and suggested some ways of getting there.
We think that the future of E2E testing is in the hands of virtual users (AI) who can test our apps the same way our users use them every day.
Making that AI work is definitely a challenge, but each step towards the end state can bring its own value and reveal new, unknown ways of testing, using the apps, and even thinking about the apps.
First of all, I’m super happy if somebody has read this far, and I’ll be even happier to discuss this topic in the comments to the article or in any other appropriate place. Feel free to share your ideas, criticism, and comments.