wmealing/Leveraging non important flaws in exploit chains.txt

## Leveraging non important flaws in exploit chains
Abstract

This paper intends to demonstrate how to score the importance of lower impact flaws that can be chained together to
allow higher impact vulnerabilities to be exploited in a single package.  A common vocabulary and scoring system will
be established, and a few of the current high-profile exploit chains being used in pwn2win and the chrome exploit
challenge will be explained and scored in this system to show where they lie.

1. Introduction

Software vendors analyse and score security flaws individually, without considering the unfixed flaws that already
exist in the system.  Highly complex systems with long life cycles can have multiple analysts dealing with flaws, and
those analysts may be unable to keep the history of unfixed issues in mind when deciding which flaw to prioritize.

Using chains of exploits to attack a system is a common technique among attackers.  Chained exploits are complex and
far more difficult to defend against, and project developers often lack the bird's-eye view of a system needed to
gather the relevant data, or the agency to act on the data once it is gathered.

In an ideal world, every security flaw found would be fixed, but this is not a financially sustainable approach.
This document introduces a method of determining when to fix flaws based on how flaws are chained together to extend
an attacker's reach.

The rest of this paper first discusses related work in Section 2, and then describes our implementation in Section 3.
Section 4 describes how the approach could be evaluated. Section 5 presents our conclusions and describes future work.

2. Related Work

Tenable has a product named “Predictive prioritization”, a tool that attempts to rank flaws based on the
configuration of the host in the context of its use.  This is particularly useful for customized reporting based on
system configuration.

Dependency Drift is a metric that attempts to classify the risk of aging software dependencies.  It deals with the complexity of having to update the parent package and ensuring the software still 'works' with the updated releases.

https://www.sophos.com/en-us/medialibrary/Gated-Assets/white-papers/Sophos-Comprehensive-Exploit-Prevention-wpna.pdf
https://www.amazon.com/Chained-Exploits-Advanced-Hacking-Attacks/dp/032149881X
Computer incident response teams - CISCO PRESS ( talks about it)

2.1  Other related work.

The CVSSv3 scoring system classifies an individual flaw for easier understanding.  It is very useful for
understanding the flaw in a generic context, and it has greatly standardized the terminology and understanding of
attack mechanisms and of the impact of an exploit.

CVSS has gone through many revisions in an attempt to represent flaws with greater clarity.  Its maintainers have
achieved a high quality of documentation around the scoring system and widespread industry adoption.

3. Implementation

Traditional analysts define the risk level of each flaw as

  [risk level] = [value of resource] x [likelihood of exploit]

This risk level becomes one term in the priority we’re trying to calculate:

  [flaw fix priority] = [likelihood of being used in chain] x [component weighting] x [risk level]

This currently considers only individual components, and not libraries, which are sub-components of other larger binaries.  For example, glibc is a sub-component of almost every application.  This idea is tabled for a later discussion.
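
The two formulas above can be sketched in a few lines of Python. This is only an illustration of how the terms
combine: the class name, field names, flaw names and every number below are hypothetical, and the paper does not
yet define the scale of any of the inputs.

```python
from dataclasses import dataclass


@dataclass
class Flaw:
    name: str                  # hypothetical identifier, not a real flaw
    resource_value: float      # [value of resource]
    exploit_likelihood: float  # [likelihood of exploit], assumed to be in 0..1
    chain_likelihood: float    # [likelihood of being used in chain], assumed 0..1


def risk_level(flaw: Flaw) -> float:
    """[risk level] = [value of resource] x [likelihood of exploit]"""
    return flaw.resource_value * flaw.exploit_likelihood


def fix_priority(flaw: Flaw, component_weighting: float) -> float:
    """[flaw fix priority] = [chain likelihood] x [component weighting] x [risk level]"""
    return flaw.chain_likelihood * component_weighting * risk_level(flaw)


# Made-up inputs: the second flaw has the higher traditional risk level,
# but the first is far more likely to be useful inside a chain.
flaws = [
    Flaw("info-leak", resource_value=2.0, exploit_likelihood=0.5, chain_likelihood=0.5),
    Flaw("local-dos", resource_value=4.0, exploit_likelihood=0.5, chain_likelihood=0.125),
]

# Both flaws are assumed to live in the same component (weighting 1.0).
ranked = sorted(flaws, key=lambda f: fix_priority(f, 1.0), reverse=True)
for f in ranked:
    print(f.name, risk_level(f), fix_priority(f, 1.0))
```

With these invented numbers the flaw with the lower risk level is ranked first, which is the kind of reordering the
chain likelihood term is meant to produce.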

3.1 - Calculating the likelihood of being used in chain

An initial analysis of a sample of 9 exploit writeups produced the flaw categories below.  The terminology is used
across both kernel and userspace.  The categories are ranked by how often they are mentioned in the writeups.

* Uncontrolled read  ( 66% )
* Uncontrolled Write ( 33% )
* Controlled Write   ( 22% )
* Misconfiguration   ( 11% )
* Controlled Read    ( 11% )

Terminology such as 'buffer overflow' was grouped into 'uncontrolled write'. Multiple flaw types can be used in a
single exploit chain, as a keen reader would expect, which is why the percentages add up to more than 100%.
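
A tally of this kind could be produced along the following lines. This is a minimal sketch: the writeups listed
here are invented for illustration and are not the 9 writeups analysed above, and the function name is
hypothetical.

```python
from collections import Counter

# Made-up writeups, for illustration only. Each writeup is the set of flaw
# categories it mentions; a single chain may use several categories.
writeups = [
    {"uncontrolled read", "uncontrolled write"},
    {"uncontrolled read", "controlled write"},
    {"misconfiguration"},
    {"uncontrolled read"},
]


def category_share(samples):
    """Return the fraction of writeups that mention each category."""
    counts = Counter(category for sample in samples for category in sample)
    total = len(samples)
    return {category: n / total for category, n in counts.items()}


share = category_share(writeups)

# Print the categories from most to least mentioned.
for category, fraction in sorted(share.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{category}: {fraction:.0%}")
```

Because each writeup is counted once per category it mentions, the fractions are shares of writeups rather than
shares of flaws, and they are not expected to sum to 1.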

At any given time there is any number of known and unknown flaws in a package.  A dedicated attacker who finds a
higher impact flaw will likely need to use the known, unfixed low-to-moderate rated flaws to defeat protection
mechanisms or to leak secrets needed to exploit the higher impact flaw.

One problem that attackers face is that low-impact flaws are not always available through attacker-accessible
paths.  For this reason, a greater number of low-impact flaws across a code base increases the probability that at
least one of them can be used as part of the exploit chain.
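
The effect of flaw count on this probability can be illustrated with a deliberately simple model. The model is an
assumption made for this sketch and is not part of the method above: each low-impact flaw is treated as reachable
by the attacker independently, with the same probability p_reachable.

```python
def p_at_least_one_usable(n_flaws: int, p_reachable: float) -> float:
    """Probability that at least one of n_flaws is reachable by the attacker.

    Assumes (for illustration only) that every flaw is reachable
    independently with the same probability p_reachable.
    """
    if n_flaws < 0:
        raise ValueError("n_flaws must not be negative")
    if not 0.0 <= p_reachable <= 1.0:
        raise ValueError("p_reachable must be between 0 and 1")
    # Complement of "none of the flaws is reachable".
    return 1.0 - (1.0 - p_reachable) ** n_flaws


# The probability grows with every additional unfixed low-impact flaw.
for n in (0, 1, 2, 4, 8):
    print(n, p_at_least_one_usable(n, 0.5))
```

Real flaws are unlikely to be independent or equally reachable, so the numbers are only meant to show the shape of
the relationship: each additional unfixed flaw raises the chance that one is usable.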

|========================|
| Intelligence Gathering |
|========================|
            |
|========================|
| Defeating mitigation   |<─┬───────┐
|========================|  │       │
            |               │       │
|========================|  │       │
| Exploitation           |──┘       │
|========================|          │
            |                       │
|========================|          │
| Privilege Permanence   |──────────┘
|========================|
            |
|========================|
| Clean up               |
|========================|

The graph does not always flow in a single direction, and the process can frequently require going back to previous steps.  Because of this, the same minor impact flaws can be misused repeatedly, which increases their effective impact.
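
The diagram can be written down as a small directed graph, which makes the back-edges explicit. This is a sketch:
the stage names and edges are taken from the diagram, while the example walk and the helper names are hypothetical.

```python
from collections import Counter

# Stage graph as drawn above; the second entry for "Exploitation" and for
# "Privilege Permanence" is the back-edge to "Defeating mitigation".
STAGES = {
    "Intelligence Gathering": ["Defeating mitigation"],
    "Defeating mitigation": ["Exploitation"],
    "Exploitation": ["Privilege Permanence", "Defeating mitigation"],
    "Privilege Permanence": ["Clean up", "Defeating mitigation"],
    "Clean up": [],
}


def is_valid_walk(walk):
    """True if every consecutive pair of stages is an edge in STAGES."""
    return all(nxt in STAGES[cur] for cur, nxt in zip(walk, walk[1:]))


def stage_visits(walk):
    """Count how often each stage is entered during a valid walk."""
    if not is_valid_walk(walk):
        raise ValueError("walk does not follow the stage graph")
    return Counter(walk)


# A made-up attack that has to return to mitigation bypass twice.
walk = [
    "Intelligence Gathering", "Defeating mitigation", "Exploitation",
    "Defeating mitigation", "Exploitation", "Privilege Permanence",
    "Defeating mitigation", "Exploitation", "Privilege Permanence",
    "Clean up",
]
print(stage_visits(walk)["Defeating mitigation"])
```

In this invented walk the mitigation-defeating stage is entered three times, so a single low-impact flaw used at
that stage would be exercised three times within one chain.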

3.2  The proposed formula for weighting could be:

  [weighting] = ([unfixed count] / [size]) x [maturity rating]

Size.

The size of executable code within a component should also be a consideration.   A larger codebase with a few flaws
is less likely to offer an attacker a usable 'gadget' than a smaller codebase with the same number of flaws.

Maturity.

Project maturity should also be a consideration. Larger mature codebases with few unfixed flaws would reduce the
probability that one of these known flaws would be used as part of an exploit chain.  Considering the inverse,
smaller newer codebases with a few unfixed flaws would mean that there is a larger chance of one of these flaws being
used as part of a chain if this component were to be abused.
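
The weighting formula can be sketched as below. Several things here are assumptions rather than part of the
proposal: size is measured in thousands of lines of code, and the maturity rating is treated as a multiplier that
is lower for more mature projects, which is the direction needed for the formula to agree with the reasoning
above. All numbers are made up.

```python
def component_weighting(unfixed_count: int, size: float, maturity_rating: float) -> float:
    """[weighting] = ([unfixed count] / [size]) x [maturity rating]

    size: amount of executable code (assumed unit: thousands of lines).
    maturity_rating: assumed multiplier, lower for more mature projects.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return (unfixed_count / size) * maturity_rating


# Two hypothetical components with the same number of unfixed flaws.
large_mature = component_weighting(unfixed_count=4, size=512, maturity_rating=0.5)
small_new = component_weighting(unfixed_count=4, size=16, maturity_rating=2.0)
print(large_mature, small_new)
```

With these invented inputs the small, new component receives a much larger weighting than the large, mature one,
even though both carry the same number of unfixed flaws.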

4. Evaluation

This theory has yet to be validated, but it could be tested with existing data on any significant component that has a number of correctly categorized flaws.

The resulting values will provide a measure of which component is most at risk of 'degradation' and should be fixed.


5. Conclusions and Future Work

At the moment there is no method of understanding which lows and moderates in which component are worth fixing.  At an organization level we should work on reducing security debt, and without a measure to do so any recommendations or comparisons are blind guesswork.

We can further improve this by looking at cross-component states when binaries rely on libraries or services.


References

https://nimbleindustries.io/2020/01/31/dependency-drift-a-metric-for-software-aging/