<!DOCTYPE html>
<!-- saved from url=(0071)https://scottroy.github.io/implementing-a-neural-network-in-python.html -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>
Implementing a neural network in Python | statsandstuff
</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/main.css">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/syntax.css">
<!-- Use Atom -->
<link type="application/atom+xml" rel="alternate" href="https://scottroy.github.io/feed.xml" title="statsandstuff">
<!-- Use RSS-2.0 -->
<!--<link href="https://scottroy.github.io/rss-feed.xml" type="application/rss+xml" rel="alternate" title="statsandstuff | a blog on statistics and machine learning"/>
//-->
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css(1)">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css(2)">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/font-awesome.min.css">
<script async="" src="./Implementing a neural network in Python _ statsandstuff_files/analytics.js"></script><script type="text/javascript" async="" src="./Implementing a neural network in Python _ statsandstuff_files/MathJax.js">
</script>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-135466463-1', 'auto');
ga('send', 'pageview');
</script>
<meta name="author" content="Scott Roy">
<meta property="og:locale" content="en_US">
<meta property="og:description" content="In this post, I walk through implementing a basic feed forward deep neural network in Python from scratch. See Introduction to neural networks for an overview of neural networks. The...">
<meta property="description" content="In this post, I walk through implementing a basic feed forward deep neural network in Python from scratch. See Introduction to neural networks for an overview of neural networks. The...">
<meta property="og:title" content="Implementing a neural network in Python">
<meta property="og:site_name" content="statsandstuff">
<meta property="og:type" content="article">
<meta property="og:url" content="https://scottroy.github.io/implementing-a-neural-network-in-python.html">
<meta property="og:image" content="https://scottroy.github.io/assets/img/backprop_prevoutput.png">
<meta property="og:image:secure_url" content="https://scottroy.github.io/assets/img/backprop_prevoutput.png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<style type="text/css">.MathJax_Hover_Frame {border-radius: .25em; -webkit-border-radius: .25em; -moz-border-radius: .25em; -khtml-border-radius: .25em; box-shadow: 0px 0px 15px #83A; -webkit-box-shadow: 0px 0px 15px #83A; -moz-box-shadow: 0px 0px 15px #83A; -khtml-box-shadow: 0px 0px 15px #83A; border: 1px solid #A6D ! important; display: inline-block; position: absolute}
.MathJax_Menu_Button .MathJax_Hover_Arrow {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 4px; -webkit-border-radius: 4px; -moz-border-radius: 4px; -khtml-border-radius: 4px; font-family: 'Courier New',Courier; font-size: 9px; color: #F0F0F0}
.MathJax_Menu_Button .MathJax_Hover_Arrow span {display: block; background-color: #AAA; border: 1px solid; border-radius: 3px; line-height: 0; padding: 4px}
.MathJax_Hover_Arrow:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_Hover_Arrow:hover span {background-color: #CCC!important}
</style><style type="text/css">#MathJax_About {position: fixed; left: 50%; width: auto; text-align: center; border: 3px outset; padding: 1em 2em; background-color: #DDDDDD; color: black; cursor: default; font-family: message-box; font-size: 120%; font-style: normal; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; z-index: 201; border-radius: 15px; -webkit-border-radius: 15px; -moz-border-radius: 15px; -khtml-border-radius: 15px; box-shadow: 0px 10px 20px #808080; -webkit-box-shadow: 0px 10px 20px #808080; -moz-box-shadow: 0px 10px 20px #808080; -khtml-box-shadow: 0px 10px 20px #808080; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
#MathJax_About.MathJax_MousePost {outline: none}
.MathJax_Menu {position: absolute; background-color: white; color: black; width: auto; padding: 5px 0px; border: 1px solid #CCCCCC; margin: 0; cursor: default; font: menu; text-align: left; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; z-index: 201; border-radius: 5px; -webkit-border-radius: 5px; -moz-border-radius: 5px; -khtml-border-radius: 5px; box-shadow: 0px 10px 20px #808080; -webkit-box-shadow: 0px 10px 20px #808080; -moz-box-shadow: 0px 10px 20px #808080; -khtml-box-shadow: 0px 10px 20px #808080; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
.MathJax_MenuItem {padding: 1px 2em; background: transparent}
.MathJax_MenuArrow {position: absolute; right: .5em; padding-top: .25em; color: #666666; font-size: .75em}
.MathJax_MenuActive .MathJax_MenuArrow {color: white}
.MathJax_MenuArrow.RTL {left: .5em; right: auto}
.MathJax_MenuCheck {position: absolute; left: .7em}
.MathJax_MenuCheck.RTL {right: .7em; left: auto}
.MathJax_MenuRadioCheck {position: absolute; left: .7em}
.MathJax_MenuRadioCheck.RTL {right: .7em; left: auto}
.MathJax_MenuLabel {padding: 1px 2em 3px 1.33em; font-style: italic}
.MathJax_MenuRule {border-top: 1px solid #DDDDDD; margin: 4px 3px}
.MathJax_MenuDisabled {color: GrayText}
.MathJax_MenuActive {background-color: #606872; color: white}
.MathJax_MenuDisabled:focus, .MathJax_MenuLabel:focus {background-color: #E8E8E8}
.MathJax_ContextMenu:focus {outline: none}
.MathJax_ContextMenu .MathJax_MenuItem:focus {outline: none}
#MathJax_AboutClose {top: .2em; right: .2em}
.MathJax_Menu .MathJax_MenuClose {top: -10px; left: -10px}
.MathJax_MenuClose {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 18px; -webkit-border-radius: 18px; -moz-border-radius: 18px; -khtml-border-radius: 18px; font-family: 'Courier New',Courier; font-size: 24px; color: #F0F0F0}
.MathJax_MenuClose span {display: block; background-color: #AAA; border: 1.5px solid; border-radius: 18px; -webkit-border-radius: 18px; -moz-border-radius: 18px; -khtml-border-radius: 18px; line-height: 0; padding: 8px 0 6px}
.MathJax_MenuClose:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_MenuClose:hover span {background-color: #CCC!important}
.MathJax_MenuClose:hover:focus {outline: none}
</style><style type="text/css">.MathJax_Preview .MJXf-math {color: inherit!important}
</style><style type="text/css">.MJX_Assistive_MathML {position: absolute!important; top: 0; left: 0; clip: rect(1px, 1px, 1px, 1px); padding: 1px 0 0 0!important; border: 0!important; height: 1px!important; width: 1px!important; overflow: hidden!important; display: block!important; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none}
.MJX_Assistive_MathML.MJX_Assistive_MathML_Block {width: 100%!important}
</style><style type="text/css">#MathJax_Zoom {position: absolute; background-color: #F0F0F0; overflow: auto; display: block; z-index: 301; padding: .5em; border: 1px solid black; margin: 0; font-weight: normal; font-style: normal; text-align: left; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; -webkit-box-sizing: content-box; -moz-box-sizing: content-box; box-sizing: content-box; box-shadow: 5px 5px 15px #AAAAAA; -webkit-box-shadow: 5px 5px 15px #AAAAAA; -moz-box-shadow: 5px 5px 15px #AAAAAA; -khtml-box-shadow: 5px 5px 15px #AAAAAA; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
#MathJax_ZoomOverlay {position: absolute; left: 0; top: 0; z-index: 300; display: inline-block; width: 100%; height: 100%; border: 0; padding: 0; margin: 0; background-color: white; opacity: 0; filter: alpha(opacity=0)}
#MathJax_ZoomFrame {position: relative; display: inline-block; height: 0; width: 0}
#MathJax_ZoomEventTrap {position: absolute; left: 0; top: 0; z-index: 302; display: inline-block; border: 0; padding: 0; margin: 0; background-color: white; opacity: 0; filter: alpha(opacity=0)}
</style><style type="text/css">.MathJax_Preview {color: #888}
#MathJax_Message {position: fixed; left: 1em; bottom: 1.5em; background-color: #E6E6E6; border: 1px solid #959595; margin: 0px; padding: 2px 8px; z-index: 102; color: black; font-size: 80%; width: auto; white-space: nowrap}
#MathJax_MSIE_Frame {position: absolute; top: 0; left: 0; width: 0px; z-index: 101; border: 0px; margin: 0px; padding: 0px}
.MathJax_Error {color: #CC0000; font-style: italic}
</style><style type="text/css">.MJXp-script {font-size: .8em}
.MJXp-right {-webkit-transform-origin: right; -moz-transform-origin: right; -ms-transform-origin: right; -o-transform-origin: right; transform-origin: right}
.MJXp-bold {font-weight: bold}
.MJXp-italic {font-style: italic}
.MJXp-scr {font-family: MathJax_Script,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-frak {font-family: MathJax_Fraktur,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-sf {font-family: MathJax_SansSerif,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-cal {font-family: MathJax_Caligraphic,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-mono {font-family: MathJax_Typewriter,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-largeop {font-size: 150%}
.MJXp-largeop.MJXp-int {vertical-align: -.2em}
.MJXp-math {display: inline-block; line-height: 1.2; text-indent: 0; font-family: 'Times New Roman',Times,STIXGeneral,serif; white-space: nowrap; border-collapse: collapse}
.MJXp-display {display: block; text-align: center; margin: 1em 0}
.MJXp-math span {display: inline-block}
.MJXp-box {display: block!important; text-align: center}
.MJXp-box:after {content: " "}
.MJXp-rule {display: block!important; margin-top: .1em}
.MJXp-char {display: block!important}
.MJXp-mo {margin: 0 .15em}
.MJXp-mfrac {margin: 0 .125em; vertical-align: .25em}
.MJXp-denom {display: inline-table!important; width: 100%}
.MJXp-denom > * {display: table-row!important}
.MJXp-surd {vertical-align: top}
.MJXp-surd > * {display: block!important}
.MJXp-script-box > * {display: table!important; height: 50%}
.MJXp-script-box > * > * {display: table-cell!important; vertical-align: top}
.MJXp-script-box > *:last-child > * {vertical-align: bottom}
.MJXp-script-box > * > * > * {display: block!important}
.MJXp-mphantom {visibility: hidden}
.MJXp-munderover, .MJXp-munder {display: inline-table!important}
.MJXp-over {display: inline-block!important; text-align: center}
.MJXp-over > * {display: block!important}
.MJXp-munderover > *, .MJXp-munder > * {display: table-row!important}
.MJXp-mtable {vertical-align: .25em; margin: 0 .125em}
.MJXp-mtable > * {display: inline-table!important; vertical-align: middle}
.MJXp-mtr {display: table-row!important}
.MJXp-mtd {display: table-cell!important; text-align: center; padding: .5em 0 0 .5em}
.MJXp-mtr > .MJXp-mtd:first-child {padding-left: 0}
.MJXp-mtr:first-child > .MJXp-mtd {padding-top: 0}
.MJXp-mlabeledtr {display: table-row!important}
.MJXp-mlabeledtr > .MJXp-mtd:first-child {padding-left: 0}
.MJXp-mlabeledtr:first-child > .MJXp-mtd {padding-top: 0}
.MJXp-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 1px 3px; font-style: normal; font-size: 90%}
.MJXp-scale0 {-webkit-transform: scaleX(.0); -moz-transform: scaleX(.0); -ms-transform: scaleX(.0); -o-transform: scaleX(.0); transform: scaleX(.0)}
.MJXp-scale1 {-webkit-transform: scaleX(.1); -moz-transform: scaleX(.1); -ms-transform: scaleX(.1); -o-transform: scaleX(.1); transform: scaleX(.1)}
.MJXp-scale2 {-webkit-transform: scaleX(.2); -moz-transform: scaleX(.2); -ms-transform: scaleX(.2); -o-transform: scaleX(.2); transform: scaleX(.2)}
.MJXp-scale3 {-webkit-transform: scaleX(.3); -moz-transform: scaleX(.3); -ms-transform: scaleX(.3); -o-transform: scaleX(.3); transform: scaleX(.3)}
.MJXp-scale4 {-webkit-transform: scaleX(.4); -moz-transform: scaleX(.4); -ms-transform: scaleX(.4); -o-transform: scaleX(.4); transform: scaleX(.4)}
.MJXp-scale5 {-webkit-transform: scaleX(.5); -moz-transform: scaleX(.5); -ms-transform: scaleX(.5); -o-transform: scaleX(.5); transform: scaleX(.5)}
.MJXp-scale6 {-webkit-transform: scaleX(.6); -moz-transform: scaleX(.6); -ms-transform: scaleX(.6); -o-transform: scaleX(.6); transform: scaleX(.6)}
.MJXp-scale7 {-webkit-transform: scaleX(.7); -moz-transform: scaleX(.7); -ms-transform: scaleX(.7); -o-transform: scaleX(.7); transform: scaleX(.7)}
.MJXp-scale8 {-webkit-transform: scaleX(.8); -moz-transform: scaleX(.8); -ms-transform: scaleX(.8); -o-transform: scaleX(.8); transform: scaleX(.8)}
.MJXp-scale9 {-webkit-transform: scaleX(.9); -moz-transform: scaleX(.9); -ms-transform: scaleX(.9); -o-transform: scaleX(.9); transform: scaleX(.9)}
.MathJax_PHTML .noError {vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
</style><style type="text/css">.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0}
.MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0}
.mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table}
.mjx-full-width {text-align: center; display: table-cell!important; width: 10000em}
.mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0}
.mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left}
.mjx-numerator {display: block; text-align: center}
.mjx-denominator {display: block; text-align: center}
.MJXc-stacked {height: 0; position: relative}
.MJXc-stacked > * {position: absolute}
.MJXc-bevelled > * {display: inline-block}
.mjx-stack {display: inline-block}
.mjx-op {display: block}
.mjx-under {display: table-cell}
.mjx-over {display: block}
.mjx-over > * {padding-left: 0px!important; padding-right: 0px!important}
.mjx-under > * {padding-left: 0px!important; padding-right: 0px!important}
.mjx-stack > .mjx-sup {display: block}
.mjx-stack > .mjx-sub {display: block}
.mjx-prestack > .mjx-presup {display: block}
.mjx-prestack > .mjx-presub {display: block}
.mjx-delim-h > .mjx-char {display: inline-block}
.mjx-surd {vertical-align: top}
.mjx-mphantom * {visibility: hidden}
.mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%}
.mjx-annotation-xml {line-height: normal}
.mjx-menclose > svg {fill: none; stroke: currentColor}
.mjx-mtr {display: table-row}
.mjx-mlabeledtr {display: table-row}
.mjx-mtd {display: table-cell; text-align: center}
.mjx-label {display: table-row}
.mjx-box {display: inline-block}
.mjx-block {display: block}
.mjx-span {display: inline}
.mjx-char {display: block; white-space: pre}
.mjx-itable {display: inline-table; width: auto}
.mjx-row {display: table-row}
.mjx-cell {display: table-cell}
.mjx-table {display: table; width: 100%}
.mjx-line {display: block; height: 0}
.mjx-strut {width: 0; padding-top: 1em}
.mjx-vsize {width: 0}
.MJXc-space1 {margin-left: .167em}
.MJXc-space2 {margin-left: .222em}
.MJXc-space3 {margin-left: .278em}
.mjx-chartest {display: block; visibility: hidden; position: absolute; top: 0; line-height: normal; font-size: 500%}
.mjx-chartest .mjx-char {display: inline}
.mjx-chartest .mjx-box {padding-top: 1000px}
.MJXc-processing {visibility: hidden; position: fixed; width: 0; height: 0; overflow: hidden}
.MJXc-processed {display: none}
.mjx-test {font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow: hidden; height: 1px}
.mjx-test.mjx-test-display {display: table!important}
.mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px}
.mjx-test.mjx-test-default {display: block!important; clear: both}
.mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
.mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left}
.mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right}
.mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
#MathJax_CHTML_Tooltip {background-color: InfoBackground; color: InfoText; border: 1px solid black; box-shadow: 2px 2px 5px #AAAAAA; -webkit-box-shadow: 2px 2px 5px #AAAAAA; -moz-box-shadow: 2px 2px 5px #AAAAAA; -khtml-box-shadow: 2px 2px 5px #AAAAAA; padding: 3px 4px; z-index: 401; position: absolute; left: 0; top: 0; width: auto; height: auto; display: none}
.mjx-chtml .mjx-noError {line-height: 1.2; vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
.MJXc-TeX-unknown-R {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: normal; font-weight: normal}
.MJXc-TeX-unknown-I {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: italic; font-weight: normal}
.MJXc-TeX-unknown-B {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: normal; font-weight: bold}
.MJXc-TeX-unknown-BI {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: italic; font-weight: bold}
.MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw}
.MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw}
.MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw}
.MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw}
.MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw}
.MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw}
.MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw}
.MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw}
.MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw}
.MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw}
.MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw}
.MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw}
.MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw}
.MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw}
.MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw}
.MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw}
.MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw}
.MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw}
.MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw}
.MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw}
.MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw}
@font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')}
@font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')}
@font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold}
@font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')}
@font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')}
@font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold}
@font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')}
@font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic}
@font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')}
@font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')}
@font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold}
@font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')}
@font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic}
@font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')}
@font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')}
@font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')}
@font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')}
@font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold}
@font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')}
@font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic}
@font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')}
@font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')}
@font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic}
@font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
@font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
@font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
@font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
@font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
@font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
@font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
@font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
</style></head>
<body><div id="MathJax_Message" style="display: none;"></div>
<div class="container">
<header class="masthead">
<h3 class="masthead-title">
<a href="https://scottroy.github.io/">statsandstuff</a>
<small class="masthead-subtitle">a blog on statistics and machine learning</small>
<div class="menu">
<nav class="menu-content">
<a href="https://scottroy.github.io/menu/about.html">About</a>
<a href="https://scottroy.github.io/menu/writing.html">Writing</a>
<a href="https://scottroy.github.io/menu/contact.html">Contact</a>
</nav>
<nav class="social-icons">
<a href="https://www.github.com/scottroy" target="_blank"><i class="fa fa-github" aria-hidden="true"></i></a>
<a href="https://www.linkedin.com/in/scott-roy/" target="_blank"><i class="fa fa-linkedin" aria-hidden="true"></i></a>
<a href="mailto:scott.michael.roy@gmail.com" target="_blank"><i class="fa fa-envelope" aria-hidden="true"></i></a>
<a href="https://scottroy.github.io/feed.xml"><i class="fa fa-rss-square" aria-hidden="true"></i></a>
</nav>
</div>
</h3>
</header>
<div class="post-container">
<h1>
Implementing a neural network in Python
</h1>
<img src="./Implementing a neural network in Python _ statsandstuff_files/backprop_prevoutput.png">
<p>In this post, I walk through implementing a basic feedforward deep neural network (DNN) in Python from scratch. See <a href="https://scottroy.github.io/introduction-to-neural-networks.html">Introduction to neural networks</a> for an overview of neural networks.</p>
<p>The post is organized as follows:</p>
<ul>
<li>Predictive modeling overview</li>
<li>Training DNNs
<ul>
<li>Stochastic gradient descent</li>
<li>Forward propagation</li>
<li>Back propagation</li>
</ul>
</li>
<li>Code</li>
</ul>
<p>The <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#predictive-modeling-overview">Predictive modeling overview</a> section discusses predictive modeling in general and how predictive models are fit. Deep neural networks are a type of predictive model and are fit like other predictive models. The <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#training-dnns">Training DNNs</a> section goes over computing derivatives of the loss function with respect to a DNN’s parameters. Finally, the code is given in the <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#code">Code</a> section.</p>
<h2 id="predictive-modeling-overview">Predictive modeling overview</h2>
<p>A DNN is a type of <em>predictive model</em>, so before we discuss training DNNs in particular, let’s briefly go over what predictive models are and how they are fit. The basic task in predictive modeling is this: given data <script type="math/tex" id="MathJax-Element-1">(x^{(i)}, y^{(i)})</script> consisting of <em>features</em> <script type="math/tex" id="MathJax-Element-2">x^{(i)}</script> and <em>labels</em> <script type="math/tex" id="MathJax-Element-3">y^{(i)}</script>, “learn” a model function <script type="math/tex" id="MathJax-Element-4">f</script> such that <script type="math/tex" id="MathJax-Element-5">y^{(i)} \approx f(x^{(i)})</script>. More precisely, we want the model that “best” satisfies <script type="math/tex" id="MathJax-Element-6">f(x^{(i)}) \approx y^{(i)}</script> for all training data <script type="math/tex" id="MathJax-Element-7">i \in \{1, \ldots, N\}</script>, where “best” is defined with respect to a <em>loss function</em>. For each mistake where <script type="math/tex" id="MathJax-Element-8">f(x^{(i)})</script> is not <script type="math/tex" id="MathJax-Element-9">y^{(i)}</script>, some loss <script type="math/tex" id="MathJax-Element-10">\ell^{(i)}</script> is incurred; for example, <script type="math/tex" id="MathJax-Element-11">\ell^{(i)} = ( f(x^{(i)}) - y^{(i)} )^2</script> is the squared error. The average loss on the dataset is</p>
<script type="math/tex; mode=display" id="MathJax-Element-12">\ell = (1 / N) (\ell^{(1)} + \ell^{(2)} + \ldots + \ell^{(N)}).</script>
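<p>As a concrete illustration, the per-example squared error and the average loss can be computed in a few lines. This is a minimal sketch, not code from the original post: the toy model <code>f</code>, the data values, and the helper names <code>squared_error</code> and <code>average_loss</code> are all assumptions made for the example.</p>

```python
def f(x):
    # Hypothetical model for illustration: predicts y = 2x.
    return 2.0 * x


def squared_error(prediction, target):
    # Per-example loss l^(i) = (f(x^(i)) - y^(i))^2.
    return (prediction - target) ** 2


def average_loss(model, xs, ys):
    # Average loss l = (1 / N) * (l^(1) + ... + l^(N)).
    losses = [squared_error(model(x), y) for x, y in zip(xs, ys)]
    return sum(losses) / len(losses)


# Made-up toy dataset of N = 4 examples.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 2.0, 3.0, 6.0]

loss = average_loss(f, xs, ys)
print(loss)  # 0.3125
```

<p>The per-example losses here are 0.25, 0, 1, and 0, so the average is 1.25 / 4 = 0.3125.</p>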
<p>Minimizing average loss on a <em>particular</em> dataset is usually not the goal (in fact, we can achieve zero loss by just “memorizing” the dataset). What we really care about solving is</p>
<script type="math/tex; mode=display" id="MathJax-Element-13">\min_f \ \textbf{E}_{(x,y)}(\ell(f(x), y)),</script>
<p>where the expectation is taken over the data distribution <script type="math/tex" id="MathJax-Element-14">(x, y)</script>. The optimal model is called the <em>Bayes model</em> and the corresponding loss is called the <em>Bayes error</em>.
The Bayes error is a hard limit on how well we can predict a response <script type="math/tex" id="MathJax-Element-15">y</script> from features <script type="math/tex" id="MathJax-Element-16">x</script> with respect to a loss <script type="math/tex" id="MathJax-Element-17">\ell</script>, and it is usually unknown.
For some tasks, like object detection or speech recognition, the Bayes error is near zero: humans can do these tasks with near-zero error, and human performance is an upper bound on the Bayes error. On the other hand, predicting whether a borrower will default on a loan given a few characteristics like the loan amount, income, and credit score has a higher Bayes error. Because the Bayes error depends on which features are available, we can lower it by using more informative features.
(As an aside, for a regression problem with square loss, the Bayes regressor is the conditional expectation <script type="math/tex" id="MathJax-Element-18">\textbf{E}(y \vert x)</script> and the Bayes error is the expected conditional variance <script type="math/tex" id="MathJax-Element-19">\textbf{E}(\textbf{Var}(y \vert x))</script>. Regression modeling therefore reduces to efficiently estimating/learning the conditional expectation.)</p>
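<p>The claim about the Bayes regressor can be checked numerically. The following is a minimal simulation sketch under an assumed data distribution (uniform <code>x</code>, response <code>y = 2x</code> plus Gaussian noise with standard deviation 0.5); the distribution, the function names, and the offset of 0.3 are all made up for illustration and do not come from the original post.</p>

```python
import random

random.seed(0)

NOISE_STD = 0.5  # assumed noise level, so Var(y | x) = 0.25 for every x


def sample_example():
    # Assumed data distribution: x ~ Uniform(-1, 1), y = 2x + Gaussian noise.
    x = random.uniform(-1.0, 1.0)
    y = 2.0 * x + random.gauss(0.0, NOISE_STD)
    return x, y


def bayes_regressor(x):
    # Conditional expectation E(y | x) under the assumed distribution.
    return 2.0 * x


def shifted_regressor(x):
    # Any other predictor, here the Bayes regressor plus a constant offset.
    return 2.0 * x + 0.3


def mean_squared_loss(model, data):
    # Average square loss of a model over a list of (x, y) pairs.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


data = [sample_example() for _ in range(20000)]

bayes_loss = mean_squared_loss(bayes_regressor, data)
shifted_loss = mean_squared_loss(shifted_regressor, data)

print("loss of E(y | x):      ", round(bayes_loss, 3))
print("loss of shifted model: ", round(shifted_loss, 3))
```

<p>Under these assumptions the first number should land near the noise variance 0.25, and the second near 0.25 + 0.3<sup>2</sup> = 0.34: no predictor can beat the conditional expectation on average, and what remains is the noise.</p>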
<p>For tractability, most machine learning and statistics (including deep learning) is parametric. This means we restrict our model to lie in a parametrized class <script type="math/tex" id="MathJax-Element-20">\mathcal{F} = \{ f_{\theta} : \theta \in \Theta\}</script> (e.g., all linear functions or all neural networks of a given architecture). We also minimize loss over a finite sample of data rather than over the full data distribution. These simplifications lead to <em>model class error</em> and <em>sample error</em>:</p>
<ul>
<li>
<p>Finding the best model in <script type="math/tex" id="MathJax-Element-21">\mathcal{F}</script> instead of the best model overall leads to model class error. Model class error can be reduced by using a richer model class. Note that if a simple model already achieves loss close to the Bayes error, using a more complicated model won’t help much.</p>
</li>
<li>
<p>Training on a sample of data instead of an infinite population leads to sample error and jeopardizes generalizability. Sample error is usually addressed by training on more data or by using regularization.</p>
</li>
</ul>
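<p>The two error sources in the list above can be seen in a small experiment. This is a rough sketch under assumptions that are not in the original post: the data come from a made-up linear model with Gaussian noise, the two model classes are constants and lines, and a large held-out sample stands in for the population. All function names are hypothetical.</p>

```python
import random

random.seed(1)


def make_data(n):
    # Assumed data distribution: x ~ Uniform(-1, 1), y = 2x + noise (std 0.5).
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + random.gauss(0.0, 0.5) for x in xs]
    return xs, ys


def mean(values):
    return sum(values) / len(values)


def fit_constant(xs, ys):
    # Model class {f(x) = c}: the least-squares fit is the sample mean of y.
    c = mean(ys)
    return lambda x: c


def fit_line(xs, ys):
    # Model class {f(x) = a * x + b}: closed-form least squares.
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b


def mse(model, xs, ys):
    # Average square loss of a fitted model on the given data.
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


test_xs, test_ys = make_data(5000)    # large sample standing in for the population
small_xs, small_ys = make_data(10)    # small training sample
large_xs, large_ys = make_data(2000)  # large training sample

results = {
    "constant, N=2000": mse(fit_constant(large_xs, large_ys), test_xs, test_ys),
    "line, N=10": mse(fit_line(small_xs, small_ys), test_xs, test_ys),
    "line, N=2000": mse(fit_line(large_xs, large_ys), test_xs, test_ys),
}
for name, loss in results.items():
    print(name, round(loss, 3))
```

<p>The constant model stays far from the noise floor of 0.25 no matter how much data it sees, which is model class error. The line fitted on 10 points typically does somewhat worse on held-out data than the line fitted on 2000 points, which is sample error. Regularization is not shown in this sketch.</p>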
<p>After these simplifications the learning problem is</p>
<script type="math/tex; mode=display" id="MathJax-Element-22">\min_{\theta \in \Theta} J(\theta) := \frac{1}{N} \sum_{i=1}^N \ell^{(i)}(\theta) + R(\theta).</script>
<p>Notice that the loss <script type="math/tex" id="MathJax-Element-23">\ell^{(i)}(\theta)</script> on the <script type="math/tex" id="MathJax-Element-24">i</script>th observation is now a function of the model parameters: previously the loss was a function of the model <script type="math/tex" id="MathJax-Element-25">f</script>, but <script type="math/tex" id="MathJax-Element-26">f</script> is now identified with its parameters <script type="math/tex" id="MathJax-Element-27">\theta</script>.
Also notice that we’ve included a regularization term <script type="math/tex" id="MathJax-Element-28">R(\theta)</script> to deal with sample error.
The most common form of regularization is L2 regularization, in which <script type="math/tex" id="MathJax-Element-29">R(\theta) = \alpha \vert \vert \theta \vert \vert_2^2</script> and the nonnegative constant α sets the strength of the penalty.</p>
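<p>To make the objective concrete, here is a minimal sketch of the regularized loss in Python with NumPy. The post has not fixed a model or a per-example loss at this point, so the linear model, the squared-error loss, and the names <code>l2_penalty</code> and <code>objective</code> are all assumptions chosen for illustration.</p>

```python
import numpy as np


def l2_penalty(theta, alpha):
    """L2 regularizer R(theta) = alpha * ||theta||_2^2."""
    return alpha * float(np.dot(theta, theta))


def objective(theta, X, y, alpha):
    """J(theta) = (1/N) * sum_i loss_i(theta) + R(theta).

    Assumption for this sketch: a linear model f(x) = x . theta with
    squared-error loss on each observation.
    """
    residuals = X @ theta - y
    per_example_loss = residuals ** 2
    return float(per_example_loss.mean()) + l2_penalty(theta, alpha)


# Toy data (hypothetical): three observations, two features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([1.0, 2.0])

print(objective(theta, X, y, alpha=0.0))  # data term only
print(objective(theta, X, y, alpha=0.1))  # data term plus L2 penalty
```

<p>On this toy data the parameters fit the labels exactly, so the data term is zero and the second value is just the penalty, 0.1 × (1² + 2²) = 0.5.</p>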
<p>Minimizing the regularized loss <script type="math/tex" id="MathJax-Element-30">J</script> over <script type="math/tex" id="MathJax-Element-31">\theta</script> may still be difficult. <em>Optimization error</em> occurs when we only find an approximate minimizer; this can be addressed by optimizing for more iterations (i.e., training for longer) or using a better optimization algorithm. The table below summarizes the different kinds of error in a predictive problem and how to improve each kind.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Error</th>
<th style="text-align: center">How to improve</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Bayes error</td>
<td style="text-align: center">Use better features</td>
</tr>
<tr>
<td style="text-align: center">Model class error</td>
<td style="text-align: center">Use a more complicated model</td>
</tr>
<tr>
<td style="text-align: center">Sample error</td>
<td style="text-align: center">Use regularization; get more data</td>
</tr>
<tr>
<td style="text-align: center">Optimization error</td>
<td style="text-align: center">Train longer; use a better optimization algorithm; reformulate loss/regularization to have properties more conducive to optimization like differentiability, Lipschitz continuous gradients, or strong convexity</td>
</tr>
</tbody>
</table>
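<p>Optimization error is easiest to see on a problem whose exact minimizer is known. The sketch below is a toy under stated assumptions, not the network trained later in this post: it uses ridge-regularized least squares on synthetic data, where the minimizer has a closed form, and measures how far plain gradient descent is from the optimal objective value after 5, 50, and 500 iterations. The function names, step size, and data are all hypothetical.</p>

```python
import numpy as np


def ridge_objective(theta, X, y, alpha):
    """J(theta) = mean squared error + alpha * ||theta||_2^2."""
    residuals = X @ theta - y
    return float(np.mean(residuals ** 2)) + alpha * float(theta @ theta)


def ridge_gradient(theta, X, y, alpha):
    """Gradient of ridge_objective with respect to theta."""
    n = X.shape[0]
    return (2.0 / n) * (X.T @ (X @ theta - y)) + 2.0 * alpha * theta


def gradient_descent(X, y, alpha, step, num_iters):
    """Run num_iters steps of fixed-step gradient descent from theta = 0."""
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        theta = theta - step * ridge_gradient(theta, X, y, alpha)
    return theta


# Synthetic data (hypothetical): 50 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
alpha = 0.1

# Exact minimizer: setting the gradient to zero gives
# (X^T X / n + alpha * I) theta = X^T y / n.
n = X.shape[0]
theta_star = np.linalg.solve(X.T @ X / n + alpha * np.eye(3), X.T @ y / n)
j_star = ridge_objective(theta_star, X, y, alpha)

# Optimization error = objective gap between the iterate and the minimizer.
gaps = []
for num_iters in (5, 50, 500):
    theta_k = gradient_descent(X, y, alpha, step=0.05, num_iters=num_iters)
    gaps.append(ridge_objective(theta_k, X, y, alpha) - j_star)
    print(num_iters, gaps[-1])
```

<p>The gap shrinks as the number of iterations grows, which is the "train longer" row of the table. The other rows are unaffected: even the exact minimizer still carries model class error and sample error.</p>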
<p>Before we discuss training DNNs, let’s quickly go over binary classification because it is formulated slightly differently than described above. In classification, the labels <script type="math/tex" id="MathJax-Element-32">y^{(i)} \in \{0, 1\}</script> indicate whether an event occurred or not (e.g., did a person default on their loan or did a user buy a product). 
Rather than model the labels <script type="math/tex" id="MathJax-Element-33">y^{(i)}</script> directly, the model returns <script type="math/tex" id="MathJax-Element-34">p = f(x)</script>, the probability that <script type="math/tex" id="MathJax-Element-35">y = 1</script> (see <a href="https://scottroy.github.io/ROC-space-and-AUC.html">ROC space and AUC</a> for a discussion of the difference between a classifier and a scorer). In the classification setting, the loss is usually based on the likelihood of observing the training data under the model, assuming the observations are independent. 
For example, given outcomes <script type="math/tex" id="MathJax-Element-36">y^{(1)} = 0</script>, <script type="math/tex" id="MathJax-Element-37">y^{(2)} = 1</script>, and <script type="math/tex" id="MathJax-Element-38">y^{(3)} = 0</script> and model probabilities <script type="math/tex" id="MathJax-Element-39">p^{(1)}</script>, <script type="math/tex" id="MathJax-Element-40">p^{(2)}</script>, and <script type="math/tex" id="MathJax-Element-41">p^{(3)}</script>, the likelihood of observing the data under the model is <script type="math/tex" id="MathJax-Element-42">P = (1 - p^{(1)}) \cdot p^{(2)} \cdot (1 - p^{(3)})</script>. 
We define the loss as the negative log likelihood <script type="math/tex" id="MathJax-Element-43">-\log P = -\log(1 - p^{(1)}) - \log p^{(2)} - \log(1 - p^{(3)})</script>. In general, the average negative log likelihood loss is</p>
<script type="math/tex; mode=display" id="MathJax-Element-44">\ell = (1 / N) (\ell^{(1)} + \ldots + \ell^{(N)}),</script>
<p>where <script type="math/tex" id="MathJax-Element-45">\ell^{(i)} = -y^{(i)} \log p^{(i)} - (1 - y^{(i)}) \log(1 - p^{(i)})</script>. This is also called cross-entropy loss and is the most popular loss function for classification tasks. As above, the negative log likelihood is a function of the model parameters <script type="math/tex" id="MathJax-Element-46">\theta</script>.</p>
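<p>As a rough illustration of the formula above, the averaged cross-entropy loss can be sketched in NumPy. The function name <code>binary_cross_entropy</code>, the clipping constant <code>eps</code>, and the sample labels and probabilities are assumptions made for this sketch; they do not come from the implementation discussed in this post.</p>

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Average negative log likelihood for binary labels y and probabilities p."""
    y = np.asarray(y, dtype=float)
    # Clip probabilities away from 0 and 1 so log() stays finite.
    # The value of eps is an assumption of this sketch.
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    # Per-example loss: -y log p - (1 - y) log(1 - p)
    per_example = -y * np.log(p) - (1.0 - y) * np.log(1.0 - p)
    # Average over the N examples: (1 / N) (l^(1) + ... + l^(N))
    return per_example.mean()

# Made-up labels and predicted probabilities, for illustration only.
y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.6])
loss = binary_cross_entropy(y, p)
print(loss)
```

<p>Confident, correct predictions contribute a loss near zero, while confident, wrong predictions are penalized heavily, which is why the clipping step matters in practice.</p>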
<h2 id="training-dnns">Training DNNs</h2>
<p>In deep learning, as in prediction tasks generally, fitting (learning) the model means minimizing the regularized loss function</p>
<script type="math/tex; mode=display" id="MathJax-Element-47">J(\theta) = \ell(\theta) + R(\theta) = (1/N) \sum_{i=1}^N \ell^{(i)}(\theta) + R(\theta)</script>
<p>over the parameters <script type="math/tex" id="MathJax-Element-48">\theta</script>. This is usually done with an iterative procedure such as gradient descent. In gradient descent, we initialize the parameters at some value and repeatedly step in the direction of the negative gradient:</p>
<ol>
<li>Initialize <script type="math/tex" id="MathJax-Element-49">\theta = \theta_0</script></li>
<li>Repeatedly update <script type="math/tex" id="MathJax-Element-50">\theta = \theta - r (\nabla J)(\theta)</script>, where <script type="math/tex" id="MathJax-Element-51">r</script> is the step size or learning rate</li>
</ol>
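<p>The two steps above can be sketched in a few lines of NumPy. To keep the example self-contained, it assumes a logistic regression model in place of a deep network and an L2 regularizer <code>R(theta) = (lam / 2) * ||theta||^2</code>; the function names, the synthetic data, and the values of <code>r</code>, <code>lam</code> and <code>n_steps</code> are all illustrative choices, not the ones used later in this post.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def objective(theta, X, y, lam):
    """J(theta) = average cross-entropy loss + assumed L2 regularizer."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1.0 - 1e-12)
    loss = np.mean(-y * np.log(p) - (1.0 - y) * np.log(1.0 - p))
    return loss + 0.5 * lam * np.sum(theta ** 2)

def gradient(theta, X, y, lam):
    """Gradient of J: average of per-example loss gradients plus gradient of R."""
    p = sigmoid(X @ theta)
    grad_loss = X.T @ (p - y) / len(y)  # (1/N) * sum of per-example gradients
    grad_reg = lam * theta              # gradient of (lam / 2) * ||theta||^2
    return grad_loss + grad_reg

def gradient_descent(X, y, theta0, r=0.5, lam=0.01, n_steps=200):
    theta = theta0.copy()               # step 1: initialize theta = theta_0
    history = [objective(theta, X, y, lam)]
    for _ in range(n_steps):            # step 2: theta = theta - r * grad J(theta)
        theta = theta - r * gradient(theta, X, y, lam)
        history.append(objective(theta, X, y, lam))
    return theta, history

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = rng.binomial(1, sigmoid(X @ true_theta)).astype(float)

theta, history = gradient_descent(X, y, theta0=np.zeros(3))
print(history[0], history[-1])
```

<p>Recording the objective after every step is a cheap sanity check: with a sufficiently small step size the values should never increase, and a rising or oscillating history usually means the learning rate is too large.</p>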
<p>Differentiation is a linear operator, so the gradient of <script type="math/tex" id="MathJax-Element-52">J</script> splits into a sum of gradients:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-610"><span class="MJXp-mi" id="MJXp-Span-611"></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-612">J</span><span class="MJXp-mo" id="MJXp-Span-613" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi" id="MJXp-Span-614"></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-615"></span><span class="MJXp-mo" id="MJXp-Span-616" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi" id="MJXp-Span-617"></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-618">R</span><span class="MJXp-mo" id="MJXp-Span-619" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-620" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-621">1</span><span class="MJXp-mrow" id="MJXp-Span-622"><span class="MJXp-mo" id="MJXp-Span-623" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-624">N</span><span class="MJXp-mo" id="MJXp-Span-625" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-munderover" id="MJXp-Span-626"><span><span class="MJXp-over"><span class=" MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-632" style="margin-right: 0px; margin-left: 0px;">N</span></span><span class=""><span class="MJXp-mo" id="MJXp-Span-627" style="margin-left: 0.111em; margin-right: 0.167em;"><span class="MJXp-largeop"></span></span></span></span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-628" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-629">i</span><span class="MJXp-mo" id="MJXp-Span-630">=</span><span class="MJXp-mn" id="MJXp-Span-631">1</span></span></span></span><span class="MJXp-mi" id="MJXp-Span-633"></span><span class="MJXp-msubsup" id="MJXp-Span-634"><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-635" style="margin-right: 0.05em;"></span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-636" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-637">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-638">i</span><span class="MJXp-mo" id="MJXp-Span-639">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-640" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi" id="MJXp-Span-641"></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-642">R</span><span class="MJXp-mo" id="MJXp-Span-643" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-53-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-713" class="mjx-math"><span id="MJXc-Node-714" class="mjx-mrow"><span id="MJXc-Node-715" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;"></span></span><span id="MJXc-Node-716" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span>