@vkargov
Created November 8, 2018 05:27
C4.5 upper error limit formula
In his seminal book "C4.5: Programs for Machine Learning", Quinlan uses a criterion U_CF(E, N) for estimating error rates in the nodes of decision trees. It is the upper limit, at confidence level CF (25% by default), on the error rate of a node that covers N training cases, E of which are misclassified. This criterion is important because it drives the tree-pruning heuristic. The problem is that the book gives no explicit formula for it. However, the formula can be found in the source code for C4.5 and C5.0:
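U_CF(E, N) ≡ (E + RawExtraErrs(N, E))/N, with RawExtraErrs defined in pruning.c. Note that RawExtraErrs takes its arguments in the opposite order and returns the extra errors N × U_CF(E, N) − E rather than the rate itself, so E has to be added back before dividing by N. Only for E = 0 does this reduce to RawExtraErrs(N, 0)/N.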
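To verify the values quoted in the book, you can add the following macro to main() (or any other function in the C5.0 sources where RawExtraErrs and printf are visible) and call it with integer arguments, e.g. U_CF(1, 16, 0.157):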
#define U_CF(E, N, expectation) printf("U_CF(%d, %d) = %0.3f (expected to be " #expectation ")\n", (E), (N), ((E) + RawExtraErrs((N), (E))) / (N))