High Definition Binning {#HD_binning}
=====================

Binning (or discretization) of variables is a well-established practice in building credit scorecards. It involves taking raw values, e.g. income, and cutting them into bins (discrete ranges) such as 2000-3000 and 3000-4000. Typically we would see an upward trend in Good/Bad (GB) odds as income levels go up. In this blog post I would like to explain a novel approach to binning that can produce very fine bins.

Automatic Binary Binning Algorithm (ABBA)
---------

To understand how High Definition Binning can be done, we first need to accept that there exist algorithms that can automatically bin the raw factors. One such algorithm is the Automatic Binary Binning Algorithm (ABBA). It can bin a variable subject to certain GB odds trends being satisfied, e.g. an upward trend in GB odds after binning.

Bootstrap
---------

High definition binning is basically an application of bootstrapping with the ABBA algorithm. Let's define what bootstrapping is: treat your dataset as the whole universe. If you have applied bootstrapping before, you might have the misconception that bootstrapping is about sampling. Actually it is not. Let me explain.

Suppose your dataset consists of only 3 records: call them A, B, and C. If those 3 records were your whole universe, then there are only $3^3 = 27$ ways to draw 3 records from it (with replacement, counting the order of the draws):

Sample No. | Sample
--------- | -----
1 | A, A, A
2 | A, A, B
3 | A, A, C
4 | A, B, A
... | ...
26 | C, C, B
27 | C, C, C

If you compute some summary statistic, such as the average income of the sample, then each of the 27 possible samples gives you a potentially different number. These 27 numbers form a distribution, and you can analyse this distribution and make inferences.

Now most datasets contain more than 3 rows; in credit scoring it is common to have tens of millions of rows of data to work with.
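
To make the three-record example concrete, here is a minimal Python sketch that enumerates all 27 samples and computes the mean income of each. The income values attached to A, B and C are made up purely for illustration; they are not from any real dataset.

```python
from itertools import product
from statistics import mean

# Hypothetical incomes for the three records A, B and C (illustrative values only).
incomes = {"A": 2000, "B": 3000, "C": 4000}

# Every ordered way to draw 3 records with replacement: 3^3 = 27 samples.
samples = list(product(incomes, repeat=len(incomes)))

# One summary statistic (the mean income) per possible sample.
sample_means = [mean(incomes[record] for record in sample) for sample in samples]

print(len(samples))               # 27
print(samples[0], samples[-1])    # ('A', 'A', 'A') ('C', 'C', 'C')
print(min(sample_means), max(sample_means))
```

The list `sample_means` is the full bootstrap distribution of the mean for this tiny universe. The number of samples grows as $n^n$ for $n$ records, so brute-force enumeration like this only works for toy examples.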
This is where sampling comes in. When your dataset is large, it is impractical to enumerate all possible samples (a dataset of $n$ rows has $n^n$ of them), so instead you sample with replacement a large number of times to obtain bootstrapped samples.

High definition binning
---------

In credit scoring each binning is basically a step function that maps a raw value to a weight of evidence (WOE). So it would look something like this.

http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous
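
As a rough illustration of the two ideas in this post, drawing one bootstrapped sample with replacement and treating a binning as a step function from raw value to WOE, here is a small Python sketch. The function names, the incomes, the cut points and the WOE values are all hypothetical and chosen only for the example; this is not the ABBA algorithm and it does not compute WOE from data.

```python
import random
from bisect import bisect_right


def bootstrap_sample(records, rng):
    """Draw len(records) records with replacement (one bootstrapped sample)."""
    return rng.choices(records, k=len(records))


def make_woe_step(edges, woes):
    """Return a step function mapping a raw value to the WOE of its bin.

    `edges` are the sorted interior cut points and `woes` holds one value per
    bin, so it must have len(edges) + 1 entries. A value equal to a cut point
    falls into the bin on its right.
    """
    if len(woes) != len(edges) + 1:
        raise ValueError("need exactly one WOE value per bin")

    def step(value):
        return woes[bisect_right(edges, value)]

    return step


# Hypothetical incomes and one bootstrapped sample of the same size.
rng = random.Random(0)
incomes = [1800, 2500, 2600, 3400, 3900, 4700]
resampled = bootstrap_sample(incomes, rng)

# Hypothetical binning: <2000, 2000-3000, 3000-4000, >=4000.
income_woe = make_woe_step(edges=[2000, 3000, 4000], woes=[-0.8, -0.2, 0.3, 0.9])
print(income_woe(2500))  # -0.2
```

In this sketch the WOE values rise with income, which mirrors the upward GB odds trend mentioned earlier; a real binning would estimate both the cut points and the WOE values from the data.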