@kevinywlui
Last active September 19, 2019 06:21
<!DOCTYPE html>
<!-- saved from url=(0071)https://scottroy.github.io/implementing-a-neural-network-in-python.html -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>
Implementing a neural network in Python | statsandstuff
</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/main.css">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/syntax.css">
<!-- Use Atom -->
<link type="application/atom+xml" rel="alternate" href="https://scottroy.github.io/feed.xml" title="statsandstuff">
<!-- Use RSS-2.0 -->
<!--<link href="https://scottroy.github.io/rss-feed.xml" type="application/rss+xml" rel="alternate" title="statsandstuff | a blog on statistics and machine learning"/>
//-->
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css(1)">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/css(2)">
<link rel="stylesheet" href="./Implementing a neural network in Python _ statsandstuff_files/font-awesome.min.css">
<script async="" src="./Implementing a neural network in Python _ statsandstuff_files/analytics.js"></script><script type="text/javascript" async="" src="./Implementing a neural network in Python _ statsandstuff_files/MathJax.js">
</script>
<!-- Google Analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-135466463-1', 'auto');
ga('send', 'pageview');
</script>
<meta name="author" content="Scott Roy">
<meta property="og:locale" content="en_US">
<meta property="og:description" content="In this post, I walk through implementing a basic feed forward deep neural network in Python from scratch. See Introduction to neural networks for an overview of neural networks. The...">
<meta property="description" content="In this post, I walk through implementing a basic feed forward deep neural network in Python from scratch. See Introduction to neural networks for an overview of neural networks. The...">
<meta property="og:title" content="Implementing a neural network in Python">
<meta property="og:site_name" content="statsandstuff">
<meta property="og:type" content="article">
<meta property="og:url" content="https://scottroy.github.io/implementing-a-neural-network-in-python.html">
<meta property="og:image" content="https://scottroy.github.io/assets/img/backprop_prevoutput.png">
<meta property="og:image:secure_url" content="https://scottroy.github.io/assets/img/backprop_prevoutput.png">
<meta property="og:image:width" content="1200">
<meta property="og:image:height" content="630">
<style type="text/css">.MathJax_Hover_Frame {border-radius: .25em; -webkit-border-radius: .25em; -moz-border-radius: .25em; -khtml-border-radius: .25em; box-shadow: 0px 0px 15px #83A; -webkit-box-shadow: 0px 0px 15px #83A; -moz-box-shadow: 0px 0px 15px #83A; -khtml-box-shadow: 0px 0px 15px #83A; border: 1px solid #A6D ! important; display: inline-block; position: absolute}
.MathJax_Menu_Button .MathJax_Hover_Arrow {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 4px; -webkit-border-radius: 4px; -moz-border-radius: 4px; -khtml-border-radius: 4px; font-family: 'Courier New',Courier; font-size: 9px; color: #F0F0F0}
.MathJax_Menu_Button .MathJax_Hover_Arrow span {display: block; background-color: #AAA; border: 1px solid; border-radius: 3px; line-height: 0; padding: 4px}
.MathJax_Hover_Arrow:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_Hover_Arrow:hover span {background-color: #CCC!important}
</style><style type="text/css">#MathJax_About {position: fixed; left: 50%; width: auto; text-align: center; border: 3px outset; padding: 1em 2em; background-color: #DDDDDD; color: black; cursor: default; font-family: message-box; font-size: 120%; font-style: normal; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; z-index: 201; border-radius: 15px; -webkit-border-radius: 15px; -moz-border-radius: 15px; -khtml-border-radius: 15px; box-shadow: 0px 10px 20px #808080; -webkit-box-shadow: 0px 10px 20px #808080; -moz-box-shadow: 0px 10px 20px #808080; -khtml-box-shadow: 0px 10px 20px #808080; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
#MathJax_About.MathJax_MousePost {outline: none}
.MathJax_Menu {position: absolute; background-color: white; color: black; width: auto; padding: 5px 0px; border: 1px solid #CCCCCC; margin: 0; cursor: default; font: menu; text-align: left; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; z-index: 201; border-radius: 5px; -webkit-border-radius: 5px; -moz-border-radius: 5px; -khtml-border-radius: 5px; box-shadow: 0px 10px 20px #808080; -webkit-box-shadow: 0px 10px 20px #808080; -moz-box-shadow: 0px 10px 20px #808080; -khtml-box-shadow: 0px 10px 20px #808080; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
.MathJax_MenuItem {padding: 1px 2em; background: transparent}
.MathJax_MenuArrow {position: absolute; right: .5em; padding-top: .25em; color: #666666; font-size: .75em}
.MathJax_MenuActive .MathJax_MenuArrow {color: white}
.MathJax_MenuArrow.RTL {left: .5em; right: auto}
.MathJax_MenuCheck {position: absolute; left: .7em}
.MathJax_MenuCheck.RTL {right: .7em; left: auto}
.MathJax_MenuRadioCheck {position: absolute; left: .7em}
.MathJax_MenuRadioCheck.RTL {right: .7em; left: auto}
.MathJax_MenuLabel {padding: 1px 2em 3px 1.33em; font-style: italic}
.MathJax_MenuRule {border-top: 1px solid #DDDDDD; margin: 4px 3px}
.MathJax_MenuDisabled {color: GrayText}
.MathJax_MenuActive {background-color: #606872; color: white}
.MathJax_MenuDisabled:focus, .MathJax_MenuLabel:focus {background-color: #E8E8E8}
.MathJax_ContextMenu:focus {outline: none}
.MathJax_ContextMenu .MathJax_MenuItem:focus {outline: none}
#MathJax_AboutClose {top: .2em; right: .2em}
.MathJax_Menu .MathJax_MenuClose {top: -10px; left: -10px}
.MathJax_MenuClose {position: absolute; cursor: pointer; display: inline-block; border: 2px solid #AAA; border-radius: 18px; -webkit-border-radius: 18px; -moz-border-radius: 18px; -khtml-border-radius: 18px; font-family: 'Courier New',Courier; font-size: 24px; color: #F0F0F0}
.MathJax_MenuClose span {display: block; background-color: #AAA; border: 1.5px solid; border-radius: 18px; -webkit-border-radius: 18px; -moz-border-radius: 18px; -khtml-border-radius: 18px; line-height: 0; padding: 8px 0 6px}
.MathJax_MenuClose:hover {color: white!important; border: 2px solid #CCC!important}
.MathJax_MenuClose:hover span {background-color: #CCC!important}
.MathJax_MenuClose:hover:focus {outline: none}
</style><style type="text/css">.MathJax_Preview .MJXf-math {color: inherit!important}
</style><style type="text/css">.MJX_Assistive_MathML {position: absolute!important; top: 0; left: 0; clip: rect(1px, 1px, 1px, 1px); padding: 1px 0 0 0!important; border: 0!important; height: 1px!important; width: 1px!important; overflow: hidden!important; display: block!important; -webkit-touch-callout: none; -webkit-user-select: none; -khtml-user-select: none; -moz-user-select: none; -ms-user-select: none; user-select: none}
.MJX_Assistive_MathML.MJX_Assistive_MathML_Block {width: 100%!important}
</style><style type="text/css">#MathJax_Zoom {position: absolute; background-color: #F0F0F0; overflow: auto; display: block; z-index: 301; padding: .5em; border: 1px solid black; margin: 0; font-weight: normal; font-style: normal; text-align: left; text-indent: 0; text-transform: none; line-height: normal; letter-spacing: normal; word-spacing: normal; word-wrap: normal; white-space: nowrap; float: none; -webkit-box-sizing: content-box; -moz-box-sizing: content-box; box-sizing: content-box; box-shadow: 5px 5px 15px #AAAAAA; -webkit-box-shadow: 5px 5px 15px #AAAAAA; -moz-box-shadow: 5px 5px 15px #AAAAAA; -khtml-box-shadow: 5px 5px 15px #AAAAAA; filter: progid:DXImageTransform.Microsoft.dropshadow(OffX=2, OffY=2, Color='gray', Positive='true')}
#MathJax_ZoomOverlay {position: absolute; left: 0; top: 0; z-index: 300; display: inline-block; width: 100%; height: 100%; border: 0; padding: 0; margin: 0; background-color: white; opacity: 0; filter: alpha(opacity=0)}
#MathJax_ZoomFrame {position: relative; display: inline-block; height: 0; width: 0}
#MathJax_ZoomEventTrap {position: absolute; left: 0; top: 0; z-index: 302; display: inline-block; border: 0; padding: 0; margin: 0; background-color: white; opacity: 0; filter: alpha(opacity=0)}
</style><style type="text/css">.MathJax_Preview {color: #888}
#MathJax_Message {position: fixed; left: 1em; bottom: 1.5em; background-color: #E6E6E6; border: 1px solid #959595; margin: 0px; padding: 2px 8px; z-index: 102; color: black; font-size: 80%; width: auto; white-space: nowrap}
#MathJax_MSIE_Frame {position: absolute; top: 0; left: 0; width: 0px; z-index: 101; border: 0px; margin: 0px; padding: 0px}
.MathJax_Error {color: #CC0000; font-style: italic}
</style><style type="text/css">.MJXp-script {font-size: .8em}
.MJXp-right {-webkit-transform-origin: right; -moz-transform-origin: right; -ms-transform-origin: right; -o-transform-origin: right; transform-origin: right}
.MJXp-bold {font-weight: bold}
.MJXp-italic {font-style: italic}
.MJXp-scr {font-family: MathJax_Script,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-frak {font-family: MathJax_Fraktur,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-sf {font-family: MathJax_SansSerif,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-cal {font-family: MathJax_Caligraphic,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-mono {font-family: MathJax_Typewriter,'Times New Roman',Times,STIXGeneral,serif}
.MJXp-largeop {font-size: 150%}
.MJXp-largeop.MJXp-int {vertical-align: -.2em}
.MJXp-math {display: inline-block; line-height: 1.2; text-indent: 0; font-family: 'Times New Roman',Times,STIXGeneral,serif; white-space: nowrap; border-collapse: collapse}
.MJXp-display {display: block; text-align: center; margin: 1em 0}
.MJXp-math span {display: inline-block}
.MJXp-box {display: block!important; text-align: center}
.MJXp-box:after {content: " "}
.MJXp-rule {display: block!important; margin-top: .1em}
.MJXp-char {display: block!important}
.MJXp-mo {margin: 0 .15em}
.MJXp-mfrac {margin: 0 .125em; vertical-align: .25em}
.MJXp-denom {display: inline-table!important; width: 100%}
.MJXp-denom > * {display: table-row!important}
.MJXp-surd {vertical-align: top}
.MJXp-surd > * {display: block!important}
.MJXp-script-box > * {display: table!important; height: 50%}
.MJXp-script-box > * > * {display: table-cell!important; vertical-align: top}
.MJXp-script-box > *:last-child > * {vertical-align: bottom}
.MJXp-script-box > * > * > * {display: block!important}
.MJXp-mphantom {visibility: hidden}
.MJXp-munderover, .MJXp-munder {display: inline-table!important}
.MJXp-over {display: inline-block!important; text-align: center}
.MJXp-over > * {display: block!important}
.MJXp-munderover > *, .MJXp-munder > * {display: table-row!important}
.MJXp-mtable {vertical-align: .25em; margin: 0 .125em}
.MJXp-mtable > * {display: inline-table!important; vertical-align: middle}
.MJXp-mtr {display: table-row!important}
.MJXp-mtd {display: table-cell!important; text-align: center; padding: .5em 0 0 .5em}
.MJXp-mtr > .MJXp-mtd:first-child {padding-left: 0}
.MJXp-mtr:first-child > .MJXp-mtd {padding-top: 0}
.MJXp-mlabeledtr {display: table-row!important}
.MJXp-mlabeledtr > .MJXp-mtd:first-child {padding-left: 0}
.MJXp-mlabeledtr:first-child > .MJXp-mtd {padding-top: 0}
.MJXp-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 1px 3px; font-style: normal; font-size: 90%}
.MJXp-scale0 {-webkit-transform: scaleX(.0); -moz-transform: scaleX(.0); -ms-transform: scaleX(.0); -o-transform: scaleX(.0); transform: scaleX(.0)}
.MJXp-scale1 {-webkit-transform: scaleX(.1); -moz-transform: scaleX(.1); -ms-transform: scaleX(.1); -o-transform: scaleX(.1); transform: scaleX(.1)}
.MJXp-scale2 {-webkit-transform: scaleX(.2); -moz-transform: scaleX(.2); -ms-transform: scaleX(.2); -o-transform: scaleX(.2); transform: scaleX(.2)}
.MJXp-scale3 {-webkit-transform: scaleX(.3); -moz-transform: scaleX(.3); -ms-transform: scaleX(.3); -o-transform: scaleX(.3); transform: scaleX(.3)}
.MJXp-scale4 {-webkit-transform: scaleX(.4); -moz-transform: scaleX(.4); -ms-transform: scaleX(.4); -o-transform: scaleX(.4); transform: scaleX(.4)}
.MJXp-scale5 {-webkit-transform: scaleX(.5); -moz-transform: scaleX(.5); -ms-transform: scaleX(.5); -o-transform: scaleX(.5); transform: scaleX(.5)}
.MJXp-scale6 {-webkit-transform: scaleX(.6); -moz-transform: scaleX(.6); -ms-transform: scaleX(.6); -o-transform: scaleX(.6); transform: scaleX(.6)}
.MJXp-scale7 {-webkit-transform: scaleX(.7); -moz-transform: scaleX(.7); -ms-transform: scaleX(.7); -o-transform: scaleX(.7); transform: scaleX(.7)}
.MJXp-scale8 {-webkit-transform: scaleX(.8); -moz-transform: scaleX(.8); -ms-transform: scaleX(.8); -o-transform: scaleX(.8); transform: scaleX(.8)}
.MJXp-scale9 {-webkit-transform: scaleX(.9); -moz-transform: scaleX(.9); -ms-transform: scaleX(.9); -o-transform: scaleX(.9); transform: scaleX(.9)}
.MathJax_PHTML .noError {vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
</style><style type="text/css">.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0}
.MJXc-display {display: block; text-align: center; margin: 1em 0; padding: 0}
.mjx-chtml[tabindex]:focus, body :focus .mjx-chtml[tabindex] {display: inline-table}
.mjx-full-width {text-align: center; display: table-cell!important; width: 10000em}
.mjx-math {display: inline-block; border-collapse: separate; border-spacing: 0}
.mjx-math * {display: inline-block; -webkit-box-sizing: content-box!important; -moz-box-sizing: content-box!important; box-sizing: content-box!important; text-align: left}
.mjx-numerator {display: block; text-align: center}
.mjx-denominator {display: block; text-align: center}
.MJXc-stacked {height: 0; position: relative}
.MJXc-stacked > * {position: absolute}
.MJXc-bevelled > * {display: inline-block}
.mjx-stack {display: inline-block}
.mjx-op {display: block}
.mjx-under {display: table-cell}
.mjx-over {display: block}
.mjx-over > * {padding-left: 0px!important; padding-right: 0px!important}
.mjx-under > * {padding-left: 0px!important; padding-right: 0px!important}
.mjx-stack > .mjx-sup {display: block}
.mjx-stack > .mjx-sub {display: block}
.mjx-prestack > .mjx-presup {display: block}
.mjx-prestack > .mjx-presub {display: block}
.mjx-delim-h > .mjx-char {display: inline-block}
.mjx-surd {vertical-align: top}
.mjx-mphantom * {visibility: hidden}
.mjx-merror {background-color: #FFFF88; color: #CC0000; border: 1px solid #CC0000; padding: 2px 3px; font-style: normal; font-size: 90%}
.mjx-annotation-xml {line-height: normal}
.mjx-menclose > svg {fill: none; stroke: currentColor}
.mjx-mtr {display: table-row}
.mjx-mlabeledtr {display: table-row}
.mjx-mtd {display: table-cell; text-align: center}
.mjx-label {display: table-row}
.mjx-box {display: inline-block}
.mjx-block {display: block}
.mjx-span {display: inline}
.mjx-char {display: block; white-space: pre}
.mjx-itable {display: inline-table; width: auto}
.mjx-row {display: table-row}
.mjx-cell {display: table-cell}
.mjx-table {display: table; width: 100%}
.mjx-line {display: block; height: 0}
.mjx-strut {width: 0; padding-top: 1em}
.mjx-vsize {width: 0}
.MJXc-space1 {margin-left: .167em}
.MJXc-space2 {margin-left: .222em}
.MJXc-space3 {margin-left: .278em}
.mjx-chartest {display: block; visibility: hidden; position: absolute; top: 0; line-height: normal; font-size: 500%}
.mjx-chartest .mjx-char {display: inline}
.mjx-chartest .mjx-box {padding-top: 1000px}
.MJXc-processing {visibility: hidden; position: fixed; width: 0; height: 0; overflow: hidden}
.MJXc-processed {display: none}
.mjx-test {font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; text-indent: 0; text-transform: none; letter-spacing: normal; word-spacing: normal; overflow: hidden; height: 1px}
.mjx-test.mjx-test-display {display: table!important}
.mjx-test.mjx-test-inline {display: inline!important; margin-right: -1px}
.mjx-test.mjx-test-default {display: block!important; clear: both}
.mjx-ex-box {display: inline-block!important; position: absolute; overflow: hidden; min-height: 0; max-height: none; padding: 0; border: 0; margin: 0; width: 1px; height: 60ex}
.mjx-test-inline .mjx-left-box {display: inline-block; width: 0; float: left}
.mjx-test-inline .mjx-right-box {display: inline-block; width: 0; float: right}
.mjx-test-display .mjx-right-box {display: table-cell!important; width: 10000em!important; min-width: 0; max-width: none; padding: 0; border: 0; margin: 0}
#MathJax_CHTML_Tooltip {background-color: InfoBackground; color: InfoText; border: 1px solid black; box-shadow: 2px 2px 5px #AAAAAA; -webkit-box-shadow: 2px 2px 5px #AAAAAA; -moz-box-shadow: 2px 2px 5px #AAAAAA; -khtml-box-shadow: 2px 2px 5px #AAAAAA; padding: 3px 4px; z-index: 401; position: absolute; left: 0; top: 0; width: auto; height: auto; display: none}
.mjx-chtml .mjx-noError {line-height: 1.2; vertical-align: ; font-size: 90%; text-align: left; color: black; padding: 1px 3px; border: 1px solid}
.MJXc-TeX-unknown-R {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: normal; font-weight: normal}
.MJXc-TeX-unknown-I {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: italic; font-weight: normal}
.MJXc-TeX-unknown-B {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: normal; font-weight: bold}
.MJXc-TeX-unknown-BI {font-family: STIXGeneral,'Cambria Math','Arial Unicode MS',serif; font-style: italic; font-weight: bold}
.MJXc-TeX-ams-R {font-family: MJXc-TeX-ams-R,MJXc-TeX-ams-Rw}
.MJXc-TeX-cal-B {font-family: MJXc-TeX-cal-B,MJXc-TeX-cal-Bx,MJXc-TeX-cal-Bw}
.MJXc-TeX-frak-R {font-family: MJXc-TeX-frak-R,MJXc-TeX-frak-Rw}
.MJXc-TeX-frak-B {font-family: MJXc-TeX-frak-B,MJXc-TeX-frak-Bx,MJXc-TeX-frak-Bw}
.MJXc-TeX-math-BI {font-family: MJXc-TeX-math-BI,MJXc-TeX-math-BIx,MJXc-TeX-math-BIw}
.MJXc-TeX-sans-R {font-family: MJXc-TeX-sans-R,MJXc-TeX-sans-Rw}
.MJXc-TeX-sans-B {font-family: MJXc-TeX-sans-B,MJXc-TeX-sans-Bx,MJXc-TeX-sans-Bw}
.MJXc-TeX-sans-I {font-family: MJXc-TeX-sans-I,MJXc-TeX-sans-Ix,MJXc-TeX-sans-Iw}
.MJXc-TeX-script-R {font-family: MJXc-TeX-script-R,MJXc-TeX-script-Rw}
.MJXc-TeX-type-R {font-family: MJXc-TeX-type-R,MJXc-TeX-type-Rw}
.MJXc-TeX-cal-R {font-family: MJXc-TeX-cal-R,MJXc-TeX-cal-Rw}
.MJXc-TeX-main-B {font-family: MJXc-TeX-main-B,MJXc-TeX-main-Bx,MJXc-TeX-main-Bw}
.MJXc-TeX-main-I {font-family: MJXc-TeX-main-I,MJXc-TeX-main-Ix,MJXc-TeX-main-Iw}
.MJXc-TeX-main-R {font-family: MJXc-TeX-main-R,MJXc-TeX-main-Rw}
.MJXc-TeX-math-I {font-family: MJXc-TeX-math-I,MJXc-TeX-math-Ix,MJXc-TeX-math-Iw}
.MJXc-TeX-size1-R {font-family: MJXc-TeX-size1-R,MJXc-TeX-size1-Rw}
.MJXc-TeX-size2-R {font-family: MJXc-TeX-size2-R,MJXc-TeX-size2-Rw}
.MJXc-TeX-size3-R {font-family: MJXc-TeX-size3-R,MJXc-TeX-size3-Rw}
.MJXc-TeX-size4-R {font-family: MJXc-TeX-size4-R,MJXc-TeX-size4-Rw}
.MJXc-TeX-vec-R {font-family: MJXc-TeX-vec-R,MJXc-TeX-vec-Rw}
.MJXc-TeX-vec-B {font-family: MJXc-TeX-vec-B,MJXc-TeX-vec-Bx,MJXc-TeX-vec-Bw}
@font-face {font-family: MJXc-TeX-ams-R; src: local('MathJax_AMS'), local('MathJax_AMS-Regular')}
@font-face {font-family: MJXc-TeX-ams-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_AMS-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_AMS-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_AMS-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-cal-B; src: local('MathJax_Caligraphic Bold'), local('MathJax_Caligraphic-Bold')}
@font-face {font-family: MJXc-TeX-cal-Bx; src: local('MathJax_Caligraphic'); font-weight: bold}
@font-face {font-family: MJXc-TeX-cal-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-frak-R; src: local('MathJax_Fraktur'), local('MathJax_Fraktur-Regular')}
@font-face {font-family: MJXc-TeX-frak-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-frak-B; src: local('MathJax_Fraktur Bold'), local('MathJax_Fraktur-Bold')}
@font-face {font-family: MJXc-TeX-frak-Bx; src: local('MathJax_Fraktur'); font-weight: bold}
@font-face {font-family: MJXc-TeX-frak-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Fraktur-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Fraktur-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Fraktur-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-math-BI; src: local('MathJax_Math BoldItalic'), local('MathJax_Math-BoldItalic')}
@font-face {font-family: MJXc-TeX-math-BIx; src: local('MathJax_Math'); font-weight: bold; font-style: italic}
@font-face {font-family: MJXc-TeX-math-BIw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-BoldItalic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-BoldItalic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-BoldItalic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-R; src: local('MathJax_SansSerif'), local('MathJax_SansSerif-Regular')}
@font-face {font-family: MJXc-TeX-sans-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-B; src: local('MathJax_SansSerif Bold'), local('MathJax_SansSerif-Bold')}
@font-face {font-family: MJXc-TeX-sans-Bx; src: local('MathJax_SansSerif'); font-weight: bold}
@font-face {font-family: MJXc-TeX-sans-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-sans-I; src: local('MathJax_SansSerif Italic'), local('MathJax_SansSerif-Italic')}
@font-face {font-family: MJXc-TeX-sans-Ix; src: local('MathJax_SansSerif'); font-style: italic}
@font-face {font-family: MJXc-TeX-sans-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_SansSerif-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_SansSerif-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_SansSerif-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-script-R; src: local('MathJax_Script'), local('MathJax_Script-Regular')}
@font-face {font-family: MJXc-TeX-script-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Script-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Script-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Script-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-type-R; src: local('MathJax_Typewriter'), local('MathJax_Typewriter-Regular')}
@font-face {font-family: MJXc-TeX-type-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Typewriter-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Typewriter-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Typewriter-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-cal-R; src: local('MathJax_Caligraphic'), local('MathJax_Caligraphic-Regular')}
@font-face {font-family: MJXc-TeX-cal-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Caligraphic-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Caligraphic-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Caligraphic-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-B; src: local('MathJax_Main Bold'), local('MathJax_Main-Bold')}
@font-face {font-family: MJXc-TeX-main-Bx; src: local('MathJax_Main'); font-weight: bold}
@font-face {font-family: MJXc-TeX-main-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Bold.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-I; src: local('MathJax_Main Italic'), local('MathJax_Main-Italic')}
@font-face {font-family: MJXc-TeX-main-Ix; src: local('MathJax_Main'); font-style: italic}
@font-face {font-family: MJXc-TeX-main-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-main-R; src: local('MathJax_Main'), local('MathJax_Main-Regular')}
@font-face {font-family: MJXc-TeX-main-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Main-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Main-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Main-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-math-I; src: local('MathJax_Math Italic'), local('MathJax_Math-Italic')}
@font-face {font-family: MJXc-TeX-math-Ix; src: local('MathJax_Math'); font-style: italic}
@font-face {font-family: MJXc-TeX-math-Iw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Math-Italic.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Math-Italic.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Math-Italic.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size1-R; src: local('MathJax_Size1'), local('MathJax_Size1-Regular')}
@font-face {font-family: MJXc-TeX-size1-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size1-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size1-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size1-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size2-R; src: local('MathJax_Size2'), local('MathJax_Size2-Regular')}
@font-face {font-family: MJXc-TeX-size2-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size2-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size2-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size2-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size3-R; src: local('MathJax_Size3'), local('MathJax_Size3-Regular')}
@font-face {font-family: MJXc-TeX-size3-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size3-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size3-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size3-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-size4-R; src: local('MathJax_Size4'), local('MathJax_Size4-Regular')}
@font-face {font-family: MJXc-TeX-size4-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Size4-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Size4-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Size4-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-R; src: local('MathJax_Vector'), local('MathJax_Vector-Regular')}
@font-face {font-family: MJXc-TeX-vec-Rw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Regular.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Regular.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Regular.otf') format('opentype')}
@font-face {font-family: MJXc-TeX-vec-B; src: local('MathJax_Vector Bold'), local('MathJax_Vector-Bold')}
@font-face {font-family: MJXc-TeX-vec-Bx; src: local('MathJax_Vector'); font-weight: bold}
@font-face {font-family: MJXc-TeX-vec-Bw; src /*1*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/eot/MathJax_Vector-Bold.eot'); src /*2*/: url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/woff/MathJax_Vector-Bold.woff') format('woff'), url('https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/fonts/HTML-CSS/TeX/otf/MathJax_Vector-Bold.otf') format('opentype')}
</style></head>
<body>
<div class="container">
<header class="masthead">
<h3 class="masthead-title">
<a href="https://scottroy.github.io/">statsandstuff</a>
<small class="masthead-subtitle">a blog on statistics and machine learning</small>
</h3>
</header>
<div class="post-container">
<h1>
Implementing a neural network in Python
</h1>
<img src="./Implementing a neural network in Python _ statsandstuff_files/backprop_prevoutput.png">
<p>In this post, I walk through implementing a basic feedforward deep neural network (DNN) in Python from scratch. See <a href="https://scottroy.github.io/introduction-to-neural-networks.html">Introduction to neural networks</a> for an overview of neural networks.</p>
<p>The post is organized as follows:</p>
<ul>
<li>Predictive modeling overview</li>
<li>Training DNNs
<ul>
<li>Stochastic gradient descent</li>
<li>Forward propagation</li>
<li>Back propagation</li>
</ul>
</li>
<li>Code</li>
</ul>
<p>The <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#predictive-modeling-overview">Predictive modeling overview</a> section discusses predictive modeling in general and how predictive models are fit. Deep neural networks are a type of predictive model and are fit like other predictive models. The <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#training-dnns">Training DNNs</a> section goes over computing derivatives of the loss function with respect to a DNN’s parameters. Finally, the code is given in the <a href="https://scottroy.github.io/implementing-a-neural-network-in-python.html#code">Code</a> section.</p>
<h2 id="predictive-modeling-overview">Predictive modeling overview</h2>
<p>A DNN is a type of <em>predictive model</em>, so before we discuss training DNNs in particular, let’s briefly go over what predictive models are and how they are fit. The basic task in predictive modeling is this: given data <script type="math/tex" id="MathJax-Element-1">(x^{(i)}, y^{(i)})</script> consisting of <em>features</em> <script type="math/tex" id="MathJax-Element-2">x^{(i)}</script> and <em>labels</em> <script type="math/tex" id="MathJax-Element-3">y^{(i)}</script>, “learn” a model function <script type="math/tex" id="MathJax-Element-4">f</script> such that <script type="math/tex" id="MathJax-Element-5">y^{(i)} \approx f(x^{(i)})</script>. More precisely, we want the model that “best” satisfies <script type="math/tex" id="MathJax-Element-6">f(x^{(i)}) \approx y^{(i)}</script> for all training examples <script type="math/tex" id="MathJax-Element-7">i \in \{1, \ldots, N\}</script>, where “best” is defined with respect to a <em>loss function</em>.
Whenever the prediction <script type="math/tex" id="MathJax-Element-8">f(x^{(i)})</script> differs from the label <script type="math/tex" id="MathJax-Element-9">y^{(i)}</script>, some loss <script type="math/tex" id="MathJax-Element-10">\ell^{(i)}</script> is incurred; for example, <script type="math/tex" id="MathJax-Element-11">\ell^{(i)} = ( f(x^{(i)}) - y^{(i)} )^2</script> is the squared error. The average loss on the dataset is</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-129"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-130">ℓ</span><span class="MJXp-mo" id="MJXp-Span-131" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-132" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-133">1</span><span class="MJXp-mrow" id="MJXp-Span-134"><span class="MJXp-mo" id="MJXp-Span-135" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-136">N</span><span class="MJXp-mo" id="MJXp-Span-137" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-138" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-139"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-140" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-141" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-142">(</span><span class="MJXp-mn" id="MJXp-Span-143">1</span><span class="MJXp-mo" id="MJXp-Span-144">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-145" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-msubsup" id="MJXp-Span-146"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-147" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-148" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-149">(</span><span class="MJXp-mn" id="MJXp-Span-150">2</span><span class="MJXp-mo" id="MJXp-Span-151">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-152" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mo" id="MJXp-Span-153" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-154" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span 
class="MJXp-msubsup" id="MJXp-Span-155"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-156" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-157" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-158">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-159">N</span><span class="MJXp-mo" id="MJXp-Span-160">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-161" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-162" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-12-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-154" class="mjx-math"><span id="MJXc-Node-155" class="mjx-mrow"><span id="MJXc-Node-156" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span><span id="MJXc-Node-157" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-158" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-159" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-160" class="mjx-texatom"><span id="MJXc-Node-161" class="mjx-mrow"><span id="MJXc-Node-162" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">/</span></span></span></span><span id="MJXc-Node-163" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-164" class="mjx-mo"><span class="mjx-char 
MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-165" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-166" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-167" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-168" class="mjx-texatom" style=""><span id="MJXc-Node-169" class="mjx-mrow"><span id="MJXc-Node-170" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-171" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-172" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-173" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-174" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-175" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-176" class="mjx-texatom" style=""><span id="MJXc-Node-177" class="mjx-mrow"><span id="MJXc-Node-178" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-179" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" 
style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span><span id="MJXc-Node-180" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-181" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-182" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.347em;">…</span></span><span id="MJXc-Node-183" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-184" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-185" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-186" class="mjx-texatom" style=""><span id="MJXc-Node-187" class="mjx-mrow"><span id="MJXc-Node-188" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-189" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-190" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-191" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-192" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 
0.347em;">.</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-12">\ell = (1 / N) (\ell^{(1)} + \ell^{(2)} + \ldots + \ell^{(N)}).</script>
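<p>As a minimal sketch of the two formulas above (the toy data, the model <code>f</code>, and the helper names <code>squared_loss</code> and <code>average_loss</code> are illustrative assumptions, not part of the original post), the per-example square loss and its average over a dataset could be computed like this:</p>

```python
import numpy as np

def squared_loss(pred, y):
    """Per-example square loss: (f(x_i) - y_i)^2 for each example i."""
    return (pred - y) ** 2

def average_loss(f, X, y, loss=squared_loss):
    """Average loss (1/N) * (l_1 + l_2 + ... + l_N) of model f on (X, y)."""
    return float(np.mean(loss(f(X), y)))

# Hypothetical toy dataset and model, for illustration only.
X = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 8.0])

def f(x):
    # A hand-picked linear model; its predictions on X are 1, 3, 5, 7.
    return 2.0 * x + 1.0

# Only the last example is mispredicted (7 vs 8), so the average is 1/4.
print(average_loss(f, X, y))
```

<p>Any other per-example loss can be swapped in through the <code>loss</code> argument without changing the averaging step.</p>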
<p>Minimizing average loss on a <em>particular</em> dataset is usually not the goal (in fact, we can achieve zero loss by just “memorizing” the dataset). What we really care about solving is</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-166"><span class="MJXp-munderover" id="MJXp-Span-167"><span class=""><span class="MJXp-mo" id="MJXp-Span-168" style="margin-left: 0.333em; margin-right: 0.333em;">min</span></span><span class=" MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-169" style="margin-left: 0px;">f</span></span></span><span class="MJXp-mtext" id="MJXp-Span-170">&nbsp;</span><span class="MJXp-msubsup" id="MJXp-Span-171"><span class="MJXp-mrow" id="MJXp-Span-172" style="margin-right: 0.05em;"><span class="MJXp-mtext MJXp-bold" id="MJXp-Span-173">E</span></span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-174" style="vertical-align: -0.4em;"><span class="MJXp-mo" id="MJXp-Span-175">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-176">x</span><span class="MJXp-mo" id="MJXp-Span-177">,</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-178">y</span><span class="MJXp-mo" id="MJXp-Span-179">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-180" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-181">ℓ</span><span class="MJXp-mo" id="MJXp-Span-182" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-183">f</span><span class="MJXp-mo" id="MJXp-Span-184" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-185">x</span><span class="MJXp-mo" id="MJXp-Span-186" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-187" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-188">y</span><span class="MJXp-mo" id="MJXp-Span-189" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-190" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-191" style="margin-left: 0em; 
margin-right: 0.222em;">,</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-13-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-193" class="mjx-math"><span id="MJXc-Node-194" class="mjx-mrow"><span id="MJXc-Node-195" class="mjx-munderover"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span id="MJXc-Node-196" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">min</span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; padding-bottom: 0.141em; padding-left: 0.904em;"><span id="MJXc-Node-197" class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span></span></span></span></span><span id="MJXc-Node-198" class="mjx-mtext MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.293em; padding-bottom: 0.347em;">&nbsp;</span></span><span id="MJXc-Node-199" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-200" class="mjx-texatom"><span id="MJXc-Node-201" class="mjx-mrow"><span id="MJXc-Node-202" class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-B" style="padding-top: 0.347em; padding-bottom: 0.347em;">E</span></span></span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.275em; padding-right: 0.071em;"><span id="MJXc-Node-203" class="mjx-texatom" style=""><span id="MJXc-Node-204" class="mjx-mrow"><span id="MJXc-Node-205" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-206" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span 
id="MJXc-Node-207" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span><span id="MJXc-Node-208" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-209" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-210" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-211" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span><span id="MJXc-Node-212" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-213" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span><span id="MJXc-Node-214" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-215" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span id="MJXc-Node-216" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-217" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span><span id="MJXc-Node-218" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-219" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 
0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-220" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-221" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-13">\min_f \ \textbf{E}_{(x,y)}(\ell(f(x), y)),</script>
<p>where the expectation is taken over the data distribution <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-192"><span class="MJXp-mo" id="MJXp-Span-193" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-194">x</span><span class="MJXp-mo" id="MJXp-Span-195" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-196">y</span><span class="MJXp-mo" id="MJXp-Span-197" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-14-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-222" class="mjx-math"><span id="MJXc-Node-223" class="mjx-mrow"><span id="MJXc-Node-224" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-225" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span id="MJXc-Node-226" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span><span id="MJXc-Node-227" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-228" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-14">(x, y)</script>. The optimal model is called the <em>Bayes model</em> and the corresponding loss is called the <em>Bayes error</em>. 
The Bayes error is a hard limit on how well we can predict a response <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-198"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-199">y</span></span></span><span id="MathJax-Element-15-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-229" class="mjx-math"><span id="MJXc-Node-230" class="mjx-mrow"><span id="MJXc-Node-231" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span></span></span><script type="math/tex" id="MathJax-Element-15">y</script> from features <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-200"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-201">x</span></span></span><span id="MathJax-Element-16-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-232" class="mjx-math"><span id="MJXc-Node-233" class="mjx-mrow"><span id="MJXc-Node-234" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span></span></span></span><script type="math/tex" id="MathJax-Element-16">x</script> with respect to a loss <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-202"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-203">ℓ</span></span></span><span id="MathJax-Element-17-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-235" class="mjx-math"><span id="MJXc-Node-236" class="mjx-mrow"><span id="MJXc-Node-237" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span></span></span><script type="math/tex" id="MathJax-Element-17">\ell</script> and is usually unknown. 
For some tasks, such as object detection or speech recognition, we know the Bayes error is near zero because humans can do these tasks with near-zero error. On the other hand, predicting whether a borrower will default on a loan from a few characteristics such as the loan amount, income, and credit score has a higher Bayes error. Because the Bayes error depends on the features, we can lower it by using more informative features.
(As an aside, for a regression problem with square loss, the Bayes regressor is the conditional expectation <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-204"><span class="MJXp-mrow" id="MJXp-Span-205"><span class="MJXp-mtext MJXp-bold" id="MJXp-Span-206">E</span></span><span class="MJXp-mo" id="MJXp-Span-207" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-208">y</span><span class="MJXp-mo" id="MJXp-Span-209" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-210">x</span><span class="MJXp-mo" id="MJXp-Span-211" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-18-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-238" class="mjx-math"><span id="MJXc-Node-239" class="mjx-mrow"><span id="MJXc-Node-240" class="mjx-texatom"><span id="MJXc-Node-241" class="mjx-mrow"><span id="MJXc-Node-242" class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-B" style="padding-top: 0.347em; padding-bottom: 0.347em;">E</span></span></span></span><span id="MJXc-Node-243" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-244" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-245" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span><span id="MJXc-Node-246" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span id="MJXc-Node-247" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script 
type="math/tex" id="MathJax-Element-18">\textbf{E}(y \vert x)</script> and the Bayes error is the conditional variance <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-212"><span class="MJXp-mrow" id="MJXp-Span-213"><span class="MJXp-mtext MJXp-bold" id="MJXp-Span-214">Var</span></span><span class="MJXp-mo" id="MJXp-Span-215" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-216">y</span><span class="MJXp-mo" id="MJXp-Span-217" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-218">x</span><span class="MJXp-mo" id="MJXp-Span-219" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-19-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-248" class="mjx-math"><span id="MJXc-Node-249" class="mjx-mrow"><span id="MJXc-Node-250" class="mjx-texatom"><span id="MJXc-Node-251" class="mjx-mrow"><span id="MJXc-Node-252" class="mjx-mtext"><span class="mjx-char MJXc-TeX-main-B" style="padding-top: 0.396em; padding-bottom: 0.396em;">Var</span></span></span></span><span id="MJXc-Node-253" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-254" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-255" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span><span id="MJXc-Node-256" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span id="MJXc-Node-257" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 
0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-19">\textbf{Var}(y \vert x)</script>, averaged over the distribution of the features. Regression modeling therefore reduces to efficiently estimating, or learning, the conditional expectation.)</p>
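<p>The aside on the Bayes regressor can be checked numerically. The sketch below assumes a synthetic data distribution chosen only for illustration (<code>y = sin(x) + noise</code> with noise standard deviation <code>sigma</code>), so that the conditional expectation is known exactly. Under that assumption, the conditional-expectation predictor should reach a mean square loss close to the noise variance, and other predictors should do worse:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed synthetic distribution (not from the post): y = sin(x) + Gaussian noise.
sigma = 0.5
n = 200_000
x = rng.uniform(-3.0, 3.0, size=n)
y = np.sin(x) + sigma * rng.normal(size=n)

def mse(pred, target):
    """Average square loss of predictions against targets."""
    return float(np.mean((pred - target) ** 2))

bayes_pred = np.sin(x)                 # E(y | x) for this distribution
shifted_pred = np.sin(x) + 0.3         # a biased competitor
constant_pred = np.full(n, y.mean())   # ignores the features entirely

bayes_mse = mse(bayes_pred, y)         # should be close to sigma**2
shifted_mse = mse(shifted_pred, y)
constant_mse = mse(constant_pred, y)

print(bayes_mse, shifted_mse, constant_mse)
```

<p>The gap between <code>constant_mse</code> and <code>bayes_mse</code> is what the features buy us. No model trained on these features can do better on average than <code>bayes_mse</code>.</p>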
<p>For tractability, most machine learning and statistics (including deep learning) is parametric. This means we restrict our model to lie in a parametrized class <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-220"><span class="MJXp-mrow" id="MJXp-Span-221"><span class="MJXp-mi MJXp-cal" id="MJXp-Span-222">F</span></span><span class="MJXp-mo" id="MJXp-Span-223" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-224" style="margin-left: 0em; margin-right: 0em;">{</span><span class="MJXp-msubsup" id="MJXp-Span-225"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-226" style="margin-right: 0.05em;">f</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-227" style="vertical-align: -0.4em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-228">θ</span></span></span><span class="MJXp-mo" id="MJXp-Span-229" style="margin-left: 0.111em; margin-right: 0.167em;">:</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-230">θ</span><span class="MJXp-mo" id="MJXp-Span-231" style="margin-left: 0.333em; margin-right: 0.333em;">∈</span><span class="MJXp-mi" id="MJXp-Span-232">Θ</span><span class="MJXp-mo" id="MJXp-Span-233" style="margin-left: 0em; margin-right: 0em;">}</span></span></span><span id="MathJax-Element-20-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-258" class="mjx-math"><span id="MJXc-Node-259" class="mjx-mrow"><span id="MJXc-Node-260" class="mjx-texatom"><span id="MJXc-Node-261" class="mjx-mrow"><span id="MJXc-Node-262" class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.445em; padding-bottom: 0.347em; padding-right: 0.11em;">F</span></span></span></span><span id="MJXc-Node-263" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-264" class="mjx-mo MJXc-space3"><span class="mjx-char 
MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">{</span></span><span id="MJXc-Node-265" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.06em;"><span id="MJXc-Node-266" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.23em; padding-right: 0.071em;"><span id="MJXc-Node-267" class="mjx-texatom" style=""><span id="MJXc-Node-268" class="mjx-mrow"><span id="MJXc-Node-269" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span></span></span></span><span id="MJXc-Node-270" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.15em; padding-bottom: 0.347em;">:</span></span><span id="MJXc-Node-271" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-272" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.248em; padding-bottom: 0.396em;">∈</span></span><span id="MJXc-Node-273" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">Θ</span></span><span id="MJXc-Node-274" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">}</span></span></span></span></span><script type="math/tex" id="MathJax-Element-20">\mathcal{F} = \{ f_{\theta} : \theta \in \Theta\}</script> (e.g., all linear functions or all neural networks of a given architecture). We also minimize loss over a sample of data. These simplifications lead to <em>model class error</em> and <em>sample error</em>:</p>
<ul>
<li>
<p>Finding the best model in <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-234"><span class="MJXp-mrow" id="MJXp-Span-235"><span class="MJXp-mi MJXp-cal" id="MJXp-Span-236">F</span></span></span></span><span id="MathJax-Element-21-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-275" class="mjx-math"><span id="MJXc-Node-276" class="mjx-mrow"><span id="MJXc-Node-277" class="mjx-texatom"><span id="MJXc-Node-278" class="mjx-mrow"><span id="MJXc-Node-279" class="mjx-mi"><span class="mjx-char MJXc-TeX-cal-R" style="padding-top: 0.445em; padding-bottom: 0.347em; padding-right: 0.11em;">F</span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-21">\mathcal{F}</script> instead of the best model overall leads to model class error. Model class error can be reduced by using a more complicated model class. Note that if a simple model already achieves loss close to the Bayes error, using a more complicated model won’t help much.</p>
</li>
<li>
<p>Training on a finite sample of data instead of an infinite population leads to sample error and jeopardizes generalization. Sample error is usually addressed by training on more data or by using regularization.</p>
</li>
</ul>
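<p>The two error sources in the list above can be illustrated with a small experiment. This is only a sketch under assumed conditions: the data distribution (<code>y = sin(3x) + noise</code>), the polynomial model classes, and the sample sizes are all hypothetical choices made for the demonstration. A linear model fit on plenty of data still has high test loss (model class error), while a degree-7 polynomial fit on only 12 points has higher test loss than the same class fit on 2000 points (sample error):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1  # noise standard deviation; the Bayes error here is sigma**2

def sample(n):
    """Draw n examples from the assumed distribution y = sin(3x) + noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(3.0 * x) + sigma * rng.normal(size=n)
    return x, y

def fit_and_test(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit; returns average square loss on the test set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    return float(np.mean((np.polyval(coeffs, x_test) - y_test) ** 2))

x_test, y_test = sample(5000)            # stands in for the population
x_big, y_big = sample(2000)              # large training sample
x_small, y_small = x_big[:12], y_big[:12]  # tiny training sample

# Model class error: even with lots of data, a line cannot represent sin(3x).
linear_big = fit_and_test(1, x_big, y_big, x_test, y_test)
poly_big = fit_and_test(7, x_big, y_big, x_test, y_test)

# Sample error: the same degree-7 class, fit on 12 points, generalizes worse.
poly_small = fit_and_test(7, x_small, y_small, x_test, y_test)

print(linear_big, poly_big, poly_small)
```

<p>Here <code>poly_big</code> should sit near the noise variance, which shows that once the model class is rich enough and the sample is large enough, what remains is essentially the Bayes error.</p>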
<p>After these simplifications the learning problem is</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-237"><span class="MJXp-munderover" id="MJXp-Span-238"><span class=""><span class="MJXp-mo" id="MJXp-Span-239" style="margin-left: 0.333em; margin-right: 0.333em;">min</span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-240" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-241">θ</span><span class="MJXp-mo" id="MJXp-Span-242">∈</span><span class="MJXp-mi" id="MJXp-Span-243">Θ</span></span></span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-244">J</span><span class="MJXp-mo" id="MJXp-Span-245" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-246">θ</span><span class="MJXp-mo" id="MJXp-Span-247" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-248" style="margin-left: 0.111em; margin-right: 0.167em;">:=</span><span class="MJXp-mfrac" id="MJXp-Span-249" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mn" id="MJXp-Span-250">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-251">N</span></span></span></span></span></span><span class="MJXp-munderover" id="MJXp-Span-252"><span><span class="MJXp-over"><span class=" MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-258" style="margin-right: 0px; margin-left: 0px;">N</span></span><span class=""><span class="MJXp-mo" id="MJXp-Span-253" style="margin-left: 0.111em; margin-right: 0.167em;"><span class="MJXp-largeop">∑</span></span></span></span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-254" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-255">i</span><span class="MJXp-mo" id="MJXp-Span-256">=</span><span class="MJXp-mn" id="MJXp-Span-257">1</span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-259"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-260" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-261" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-262">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-263">i</span><span class="MJXp-mo" id="MJXp-Span-264">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-265" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-266">θ</span><span class="MJXp-mo" id="MJXp-Span-267" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-268" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-269">R</span><span class="MJXp-mo" id="MJXp-Span-270" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-271">θ</span><span class="MJXp-mo" id="MJXp-Span-272" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-273" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-22-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-280" class="mjx-math"><span id="MJXc-Node-281" class="mjx-mrow"><span id="MJXc-Node-282" class="mjx-munderover"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-op"><span id="MJXc-Node-283" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">min</span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 
0.236em; padding-bottom: 0.141em; padding-left: 0.222em;"><span id="MJXc-Node-284" class="mjx-texatom" style=""><span id="MJXc-Node-285" class="mjx-mrow"><span id="MJXc-Node-286" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-287" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.248em; padding-bottom: 0.396em;">∈</span></span><span id="MJXc-Node-288" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">Θ</span></span></span></span></span></span></span></span><span id="MJXc-Node-289" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span></span><span id="MJXc-Node-290" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-291" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-292" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-293" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.347em;">:<span class="mjx-charbox MJXc-TeX-main-R" style="padding-bottom: 0.314em;">=</span></span></span><span id="MJXc-Node-294" class="mjx-mfrac MJXc-space3"><span class="mjx-box MJXc-stacked" style="width: 1.088em; padding: 0px 0.12em;"><span class="mjx-numerator" style="width: 1.088em; top: -1.368em;"><span id="MJXc-Node-295" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span><span class="mjx-denominator" style="width: 1.088em; bottom: -0.711em;"><span id="MJXc-Node-296" class="mjx-mi"><span 
class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span></span><span class="mjx-line" style="border-bottom: 1.3px solid; top: -0.281em; width: 1.088em;"></span></span><span class="mjx-vsize" style="height: 2.078em; vertical-align: -0.711em;"></span></span><span id="MJXc-Node-297" class="mjx-munderover MJXc-space1"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-stack"><span class="mjx-over" style="font-size: 70.7%; padding-bottom: 0.258em; padding-top: 0.141em; padding-left: 0.577em;"><span id="MJXc-Node-304" class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span></span><span class="mjx-op"><span id="MJXc-Node-298" class="mjx-mo"><span class="mjx-char MJXc-TeX-size2-R" style="padding-top: 0.74em; padding-bottom: 0.74em;">∑</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; padding-bottom: 0.141em; padding-left: 0.21em;"><span id="MJXc-Node-299" class="mjx-texatom" style=""><span id="MJXc-Node-300" class="mjx-mrow"><span id="MJXc-Node-301" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-302" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-303" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span></span></span></span></span></span><span id="MJXc-Node-305" class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span id="MJXc-Node-306" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; 
vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-307" class="mjx-texatom" style=""><span id="MJXc-Node-308" class="mjx-mrow"><span id="MJXc-Node-309" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-310" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-311" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-312" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-313" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-314" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-315" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-316" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-317" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-318" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-319" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-320" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 
0.347em;">.</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-22">\min_{\theta \in \Theta} J(\theta) := \frac{1}{N} \sum_{i=1}^N \ell^{(i)}(\theta) + R(\theta).</script>
<p>Notice that the loss <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-274"><span class="MJXp-msubsup" id="MJXp-Span-275"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-276" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-277" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-278">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-279">i</span><span class="MJXp-mo" id="MJXp-Span-280">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-281" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-282">θ</span><span class="MJXp-mo" id="MJXp-Span-283" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-23-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-321" class="mjx-math"><span id="MJXc-Node-322" class="mjx-mrow"><span id="MJXc-Node-323" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-324" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-325" class="mjx-texatom" style=""><span id="MJXc-Node-326" class="mjx-mrow"><span id="MJXc-Node-327" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-328" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-329" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-330" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 
0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-331" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-332" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-23">\ell^{(i)}(\theta)</script> on the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-284"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-285">i</span></span></span><span id="MathJax-Element-24-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-333" class="mjx-math"><span id="MJXc-Node-334" class="mjx-mrow"><span id="MJXc-Node-335" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span></span></span></span><script type="math/tex" id="MathJax-Element-24">i</script>th observation is a function of the model parameters (before the loss was a function of the model <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-286"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-287">f</span></span></span><span id="MathJax-Element-25-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-336" class="mjx-math"><span id="MJXc-Node-337" class="mjx-mrow"><span id="MJXc-Node-338" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span></span></span></span><script type="math/tex" id="MathJax-Element-25">f</script>, but now <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-288"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-289">f</span></span></span><span 
id="MathJax-Element-26-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-339" class="mjx-math"><span id="MJXc-Node-340" class="mjx-mrow"><span id="MJXc-Node-341" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span></span></span></span><script type="math/tex" id="MathJax-Element-26">f</script> is identified with its parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-290"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-291">θ</span></span></span><span id="MathJax-Element-27-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-342" class="mjx-math"><span id="MJXc-Node-343" class="mjx-mrow"><span id="MJXc-Node-344" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span></span></span><script type="math/tex" id="MathJax-Element-27">\theta</script>). 
Also notice that we’ve included a regularization term <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-292"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-293">R</span><span class="MJXp-mo" id="MJXp-Span-294" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-295">θ</span><span class="MJXp-mo" id="MJXp-Span-296" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-28-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-345" class="mjx-math"><span id="MJXc-Node-346" class="mjx-mrow"><span id="MJXc-Node-347" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-348" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-349" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-350" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-28">R(\theta)</script> to deal with sample error. 
The most common form of regularization is L2 regularization in which <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-297"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-298">R</span><span class="MJXp-mo" id="MJXp-Span-299" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-300">θ</span><span class="MJXp-mo" id="MJXp-Span-301" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-302" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-303">α</span><span class="MJXp-mo" id="MJXp-Span-304" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mo" id="MJXp-Span-305" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-306">θ</span><span class="MJXp-mo" id="MJXp-Span-307" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-msubsup" id="MJXp-Span-308"><span class="MJXp-mo" id="MJXp-Span-309" style="margin-left: 0em; margin-right: 0.05em;">|</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mn" id="MJXp-Span-311">2</span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-310">2</span></span></span></span></span></span></span></span><span id="MathJax-Element-29-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-351" class="mjx-math"><span id="MJXc-Node-352" class="mjx-mrow"><span id="MJXc-Node-353" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-354" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; 
padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-355" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-356" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-357" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-358" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">α</span></span><span id="MJXc-Node-359" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span><span id="MJXc-Node-360" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span><span id="MJXc-Node-361" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-362" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span><span id="MJXc-Node-363" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-364" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">|</span></span></span><span class="mjx-stack" style="vertical-align: -0.315em;"><span class="mjx-sup" style="font-size: 70.7%; padding-bottom: 0.255em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-366" class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span></span><span class="mjx-sub" style="font-size: 70.7%; padding-right: 0.071em;"><span id="MJXc-Node-365" class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 
0.396em; padding-bottom: 0.347em;">2</span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-29">R(\theta) = \alpha \vert \vert \theta \vert \vert_2^2</script>.</p>
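<p>As a rough illustration of the regularized objective above, here is a minimal sketch that evaluates <em>J</em>(θ) with an L2 penalty. The linear model, the squared-error per-example loss, the toy data, and the name <code>regularized_objective</code> are all assumptions made for the example; the post itself does not fix a particular loss at this point.</p>

```python
import numpy as np


def regularized_objective(theta, X, y, alpha):
    """Return J(theta) = mean per-example loss + alpha * ||theta||_2^2.

    Assumes a linear model X @ theta and a squared-error loss; both are
    illustrative choices, not something prescribed by the text.
    """
    residuals = X @ theta - y
    per_example_loss = residuals ** 2          # l^(i)(theta) for each observation
    penalty = alpha * np.dot(theta, theta)     # R(theta) = alpha * ||theta||_2^2
    return per_example_loss.mean() + penalty


# Toy data (hypothetical): theta fits the data exactly, so only the penalty remains.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([1.0, 2.0])

j_unregularized = regularized_objective(theta, X, y, alpha=0.0)
j_regularized = regularized_objective(theta, X, y, alpha=0.1)
print(j_unregularized, j_regularized)
```

<p>With a perfect fit the data term vanishes, so the value of the objective comes entirely from the penalty, which grows with α and with the size of θ.</p>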
<p>Minimizing the regularized loss <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-312"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-313">J</span></span></span><span id="MathJax-Element-30-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-367" class="mjx-math"><span id="MJXc-Node-368" class="mjx-mrow"><span id="MJXc-Node-369" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span></span></span></span></span><script type="math/tex" id="MathJax-Element-30">J</script> over <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-314"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-315">θ</span></span></span><span id="MathJax-Element-31-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-370" class="mjx-math"><span id="MJXc-Node-371" class="mjx-mrow"><span id="MJXc-Node-372" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span></span></span><script type="math/tex" id="MathJax-Element-31">\theta</script> may still be difficult. <em>Optimization error</em> occurs when we only find an approximate minimizer; this can be addressed by optimizing for more iterations (i.e., training for longer) or using a better optimization algorithm. The table below summarizes the different kinds of error in a predictive problem and how to improve each kind.</p>
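<p>To make optimization error concrete, the sketch below runs plain gradient descent on an L2-regularized least-squares objective and compares the objective value after a few iterations and after many iterations with the exact minimum. The synthetic data, the squared-error loss, the step size rule, and all function names are assumptions chosen so that the exact minimizer is available in closed form; none of them come from the post.</p>

```python
import numpy as np


def objective(theta, X, y, alpha):
    """J(theta) = mean squared error + alpha * ||theta||_2^2 (assumed loss)."""
    residuals = X @ theta - y
    return (residuals ** 2).mean() + alpha * theta @ theta


def gradient(theta, X, y, alpha):
    """Gradient of the objective above with respect to theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y) + 2.0 * alpha * theta


def gradient_descent(X, y, alpha, n_iters):
    """Run n_iters steps of gradient descent from theta = 0."""
    hessian_half = X.T @ X / len(y)
    # Step size 1/L, where L is the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * (np.linalg.eigvalsh(hessian_half).max() + alpha))
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        theta = theta - step * gradient(theta, X, y, alpha)
    return theta


# Synthetic data (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
alpha = 0.1

# Exact minimizer, available here because the objective is quadratic.
theta_star = np.linalg.solve(X.T @ X / len(y) + alpha * np.eye(3), X.T @ y / len(y))
j_star = objective(theta_star, X, y, alpha)

# Optimization error = objective at the approximate minimizer minus the true minimum.
gap_short = objective(gradient_descent(X, y, alpha, 5), X, y, alpha) - j_star
gap_long = objective(gradient_descent(X, y, alpha, 500), X, y, alpha) - j_star
print(gap_short, gap_long)
```

<p>The gap after the longer run should be far smaller than after the short run, which is the "train longer" remedy in action. For a deep network no closed-form minimizer exists, so this gap cannot be measured directly.</p>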
<table>
<thead>
<tr>
<th style="text-align: center">Error</th>
<th style="text-align: center">How to improve</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">Bayes error</td>
<td style="text-align: center">Use better features</td>
</tr>
<tr>
<td style="text-align: center">Model class error</td>
<td style="text-align: center">Use a more expressive model class</td>
</tr>
<tr>
<td style="text-align: center">Sample error</td>
<td style="text-align: center">Use regularization; get more data</td>
</tr>
<tr>
<td style="text-align: center">Optimization error</td>
<td style="text-align: center">Train longer; use a better optimization algorithm; reformulate the loss or regularizer so that it has properties that make optimization easier, such as differentiability, Lipschitz-continuous gradients, or strong convexity</td>
</tr>
</tbody>
</table>
<p>Before we discuss training DNNs, let’s quickly go over binary classification because it is formulated slightly differently than described above. In classification, the labels <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-316"><span class="MJXp-msubsup" id="MJXp-Span-317"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-318" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-319" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-320">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-321">i</span><span class="MJXp-mo" id="MJXp-Span-322">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-323" style="margin-left: 0.333em; margin-right: 0.333em;">∈</span><span class="MJXp-mo" id="MJXp-Span-324" style="margin-left: 0em; margin-right: 0em;">{</span><span class="MJXp-mn" id="MJXp-Span-325">0</span><span class="MJXp-mo" id="MJXp-Span-326" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mn" id="MJXp-Span-327">1</span><span class="MJXp-mo" id="MJXp-Span-328" style="margin-left: 0em; margin-right: 0em;">}</span></span></span><span id="MathJax-Element-32-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-373" class="mjx-math"><span id="MJXc-Node-374" class="mjx-mrow"><span id="MJXc-Node-375" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-376" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-377" class="mjx-texatom" style=""><span id="MJXc-Node-378" class="mjx-mrow"><span id="MJXc-Node-379" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 
0.592em;">(</span></span><span id="MJXc-Node-380" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-381" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-382" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.248em; padding-bottom: 0.396em;">∈</span></span><span id="MJXc-Node-383" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">{</span></span><span id="MJXc-Node-384" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">0</span></span><span id="MJXc-Node-385" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span><span id="MJXc-Node-386" class="mjx-mn MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-387" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">}</span></span></span></span></span><script type="math/tex" id="MathJax-Element-32">y^{(i)} \in \{0, 1\}</script> indicate whether an event occurred or not (e.g., did a person default on their loan or did a user buy a product). 
Rather than model the labels <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-329"><span class="MJXp-msubsup" id="MJXp-Span-330"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-331" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-332" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-333">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-334">i</span><span class="MJXp-mo" id="MJXp-Span-335">)</span></span></span></span></span><span id="MathJax-Element-33-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-388" class="mjx-math"><span id="MJXc-Node-389" class="mjx-mrow"><span id="MJXc-Node-390" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-391" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-392" class="mjx-texatom" style=""><span id="MJXc-Node-393" class="mjx-mrow"><span id="MJXc-Node-394" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-395" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-396" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-33">y^{(i)}</script> directly, the model returns <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-336"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-337">p</span><span 
class="MJXp-mo" id="MJXp-Span-338" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-339">f</span><span class="MJXp-mo" id="MJXp-Span-340" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-341">x</span><span class="MJXp-mo" id="MJXp-Span-342" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-34-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-397" class="mjx-math"><span id="MJXc-Node-398" class="mjx-mrow"><span id="MJXc-Node-399" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span><span id="MJXc-Node-400" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-401" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.494em; padding-right: 0.06em;">f</span></span><span id="MJXc-Node-402" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-403" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">x</span></span><span id="MJXc-Node-404" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-34">p = f(x)</script>, the probability that <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-343"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-344">y</span><span class="MJXp-mo" id="MJXp-Span-345" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mn" 
id="MJXp-Span-346">1</span></span></span><span id="MathJax-Element-35-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-405" class="mjx-math"><span id="MJXc-Node-406" class="mjx-mrow"><span id="MJXc-Node-407" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span><span id="MJXc-Node-408" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-409" class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span></span></span><script type="math/tex" id="MathJax-Element-35">y = 1</script> (see <a href="https://scottroy.github.io/ROC-space-and-AUC.html">ROC space and AUC</a> for a discussion of the difference between a classifier and a scorer). In the classification setting, the loss is usually based on the likelihood of observing the training data under the model, assuming each observation is independent. 
For example, given outcomes <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-347"><span class="MJXp-msubsup" id="MJXp-Span-348"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-349" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-350" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-351">(</span><span class="MJXp-mn" id="MJXp-Span-352">1</span><span class="MJXp-mo" id="MJXp-Span-353">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-354" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mn" id="MJXp-Span-355">0</span></span></span><span id="MathJax-Element-36-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-410" class="mjx-math"><span id="MJXc-Node-411" class="mjx-mrow"><span id="MJXc-Node-412" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-413" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-414" class="mjx-texatom" style=""><span id="MJXc-Node-415" class="mjx-mrow"><span id="MJXc-Node-416" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-417" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-418" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-419" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 
0.297em;">=</span></span><span id="MJXc-Node-420" class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">0</span></span></span></span></span><script type="math/tex" id="MathJax-Element-36">y^{(1)} = 0</script>, <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-356"><span class="MJXp-msubsup" id="MJXp-Span-357"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-358" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-359" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-360">(</span><span class="MJXp-mn" id="MJXp-Span-361">2</span><span class="MJXp-mo" id="MJXp-Span-362">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-363" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mn" id="MJXp-Span-364">1</span></span></span><span id="MathJax-Element-37-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-421" class="mjx-math"><span id="MJXc-Node-422" class="mjx-mrow"><span id="MJXc-Node-423" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-424" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-425" class="mjx-texatom" style=""><span id="MJXc-Node-426" class="mjx-mrow"><span id="MJXc-Node-427" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-428" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span><span id="MJXc-Node-429" class="mjx-mo"><span class="mjx-char 
MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-430" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-431" class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span></span></span><script type="math/tex" id="MathJax-Element-37">y^{(2)} = 1</script>, and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-365"><span class="MJXp-msubsup" id="MJXp-Span-366"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-367" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-368" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-369">(</span><span class="MJXp-mn" id="MJXp-Span-370">3</span><span class="MJXp-mo" id="MJXp-Span-371">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-372" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mn" id="MJXp-Span-373">0</span></span></span><span id="MathJax-Element-38-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-432" class="mjx-math"><span id="MJXc-Node-433" class="mjx-mrow"><span id="MJXc-Node-434" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-435" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-436" class="mjx-texatom" style=""><span id="MJXc-Node-437" class="mjx-mrow"><span id="MJXc-Node-438" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; 
padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-439" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">3</span></span><span id="MJXc-Node-440" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-441" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-442" class="mjx-mn MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">0</span></span></span></span></span><script type="math/tex" id="MathJax-Element-38">y^{(3)} = 0</script> and model probabilities <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-374"><span class="MJXp-msubsup" id="MJXp-Span-375"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-376" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-377" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-378">(</span><span class="MJXp-mn" id="MJXp-Span-379">1</span><span class="MJXp-mo" id="MJXp-Span-380">)</span></span></span></span></span><span id="MathJax-Element-39-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-443" class="mjx-math"><span id="MJXc-Node-444" class="mjx-mrow"><span id="MJXc-Node-445" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-446" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-447" class="mjx-texatom" style=""><span id="MJXc-Node-448" class="mjx-mrow"><span id="MJXc-Node-449" class="mjx-mo"><span 
class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-450" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-451" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-39">p^{(1)}</script>, <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-381"><span class="MJXp-msubsup" id="MJXp-Span-382"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-383" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-384" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-385">(</span><span class="MJXp-mn" id="MJXp-Span-386">2</span><span class="MJXp-mo" id="MJXp-Span-387">)</span></span></span></span></span><span id="MathJax-Element-40-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-452" class="mjx-math"><span id="MJXc-Node-453" class="mjx-mrow"><span id="MJXc-Node-454" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-455" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-456" class="mjx-texatom" style=""><span id="MJXc-Node-457" class="mjx-mrow"><span id="MJXc-Node-458" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-459" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span><span id="MJXc-Node-460" class="mjx-mo"><span 
class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-40">p^{(2)}</script>, and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-388"><span class="MJXp-msubsup" id="MJXp-Span-389"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-390" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-391" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-392">(</span><span class="MJXp-mn" id="MJXp-Span-393">3</span><span class="MJXp-mo" id="MJXp-Span-394">)</span></span></span></span></span><span id="MathJax-Element-41-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-461" class="mjx-math"><span id="MJXc-Node-462" class="mjx-mrow"><span id="MJXc-Node-463" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-464" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-465" class="mjx-texatom" style=""><span id="MJXc-Node-466" class="mjx-mrow"><span id="MJXc-Node-467" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-468" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">3</span></span><span id="MJXc-Node-469" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-41">p^{(3)}</script>, the likelihood of observing the data under the model, treating the three observations as independent, is <span 
class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-395"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-396">P</span><span class="MJXp-mo" id="MJXp-Span-397" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-398" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-399">1</span><span class="MJXp-mo" id="MJXp-Span-400" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-401"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-402" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-403" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-404">(</span><span class="MJXp-mn" id="MJXp-Span-405">1</span><span class="MJXp-mo" id="MJXp-Span-406">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-407" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-408" style="margin-left: 0.267em; margin-right: 0.267em;">⋅</span><span class="MJXp-msubsup" id="MJXp-Span-409"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-410" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-411" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-412">(</span><span class="MJXp-mn" id="MJXp-Span-413">2</span><span class="MJXp-mo" id="MJXp-Span-414">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-415" style="margin-left: 0.267em; margin-right: 0.267em;">⋅</span><span class="MJXp-mo" id="MJXp-Span-416" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-417">1</span><span class="MJXp-mo" id="MJXp-Span-418" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-419"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-420" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" 
id="MJXp-Span-421" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-422">(</span><span class="MJXp-mn" id="MJXp-Span-423">3</span><span class="MJXp-mo" id="MJXp-Span-424">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-425" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-42-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-470" class="mjx-math"><span id="MJXc-Node-471" class="mjx-mrow"><span id="MJXc-Node-472" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.109em;">P</span></span><span id="MJXc-Node-473" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-474" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-475" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-476" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-477" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-478" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-479" class="mjx-texatom" style=""><span id="MJXc-Node-480" class="mjx-mrow"><span id="MJXc-Node-481" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-482" class="mjx-mn"><span class="mjx-char 
MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-483" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-484" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-485" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.002em; padding-bottom: 0.297em;">⋅</span></span><span id="MJXc-Node-486" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-487" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-488" class="mjx-texatom" style=""><span id="MJXc-Node-489" class="mjx-mrow"><span id="MJXc-Node-490" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-491" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span><span id="MJXc-Node-492" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-493" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.002em; padding-bottom: 0.297em;">⋅</span></span><span id="MJXc-Node-494" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-495" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-496" 
class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-497" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-498" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-499" class="mjx-texatom" style=""><span id="MJXc-Node-500" class="mjx-mrow"><span id="MJXc-Node-501" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-502" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">3</span></span><span id="MJXc-Node-503" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-504" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-42">P = (1 - p^{(1)}) \cdot p^{(2)} \cdot (1 - p^{(3)})</script>. 
We define the loss as the negative log likelihood <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-426"><span class="MJXp-mo" id="MJXp-Span-427" style="margin-left: 0em; margin-right: 0.111em;">−</span><span class="MJXp-mi" id="MJXp-Span-428">log</span><span class="MJXp-mo" id="MJXp-Span-429" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-430">P</span><span class="MJXp-mo" id="MJXp-Span-431" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-432" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mi" id="MJXp-Span-433">log</span><span class="MJXp-mo" id="MJXp-Span-434" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-mo" id="MJXp-Span-435" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-436">1</span><span class="MJXp-mo" id="MJXp-Span-437" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-438"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-439" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-440" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-441">(</span><span class="MJXp-mn" id="MJXp-Span-442">1</span><span class="MJXp-mo" id="MJXp-Span-443">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-444" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-445" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mi" id="MJXp-Span-446">log</span><span class="MJXp-mo" id="MJXp-Span-447" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-msubsup" id="MJXp-Span-448"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-449" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-450" style="vertical-align: 0.5em;"><span 
class="MJXp-mo" id="MJXp-Span-451">(</span><span class="MJXp-mn" id="MJXp-Span-452">2</span><span class="MJXp-mo" id="MJXp-Span-453">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-454" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mi" id="MJXp-Span-455">log</span><span class="MJXp-mo" id="MJXp-Span-456" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-mo" id="MJXp-Span-457" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-458">1</span><span class="MJXp-mo" id="MJXp-Span-459" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-460"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-461" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-462" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-463">(</span><span class="MJXp-mn" id="MJXp-Span-464">3</span><span class="MJXp-mo" id="MJXp-Span-465">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-466" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-43-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-505" class="mjx-math"><span id="MJXc-Node-506" class="mjx-mrow"><span id="MJXc-Node-507" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-508" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-509" class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-510" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.109em;">P</span></span><span id="MJXc-Node-511" class="mjx-mo MJXc-space3"><span class="mjx-char 
MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-512" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-513" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-514" class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-515" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-516" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-517" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-518" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-519" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-520" class="mjx-texatom" style=""><span id="MJXc-Node-521" class="mjx-mrow"><span id="MJXc-Node-522" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-523" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-524" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-525" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 
0.592em;">)</span></span><span id="MJXc-Node-526" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-527" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-528" class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-529" class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span id="MJXc-Node-530" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-531" class="mjx-texatom" style=""><span id="MJXc-Node-532" class="mjx-mrow"><span id="MJXc-Node-533" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-534" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">2</span></span><span id="MJXc-Node-535" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-536" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-537" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-538" class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-539" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-540" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 
0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-541" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-542" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-543" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-544" class="mjx-texatom" style=""><span id="MJXc-Node-545" class="mjx-mrow"><span id="MJXc-Node-546" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-547" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">3</span></span><span id="MJXc-Node-548" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-549" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-43">-\log P = -\log(1 - p^{(1)}) - \log p^{(2)} - \log(1 - p^{(3)})</script>. In general, the average negative log likelihood loss is</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-467"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-468">ℓ</span><span class="MJXp-mo" id="MJXp-Span-469" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-470" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-471">1</span><span class="MJXp-mrow" id="MJXp-Span-472"><span class="MJXp-mo" id="MJXp-Span-473" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-474">N</span><span class="MJXp-mo" id="MJXp-Span-475" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-476" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-477"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-478" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-479" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-480">(</span><span class="MJXp-mn" id="MJXp-Span-481">1</span><span class="MJXp-mo" id="MJXp-Span-482">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-483" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mo" id="MJXp-Span-484" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-485" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-msubsup" id="MJXp-Span-486"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-487" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-488" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-489">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-490">N</span><span class="MJXp-mo" id="MJXp-Span-491">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-492" style="margin-left: 0em; margin-right: 0em;">)</span><span 
class="MJXp-mo" id="MJXp-Span-493" style="margin-left: 0em; margin-right: 0.222em;">,</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-44-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-550" class="mjx-math"><span id="MJXc-Node-551" class="mjx-mrow"><span id="MJXc-Node-552" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span><span id="MJXc-Node-553" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-554" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-555" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-556" class="mjx-texatom"><span id="MJXc-Node-557" class="mjx-mrow"><span id="MJXc-Node-558" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">/</span></span></span></span><span id="MJXc-Node-559" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-560" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-561" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-562" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-563" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" 
style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-564" class="mjx-texatom" style=""><span id="MJXc-Node-565" class="mjx-mrow"><span id="MJXc-Node-566" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-567" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-568" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-569" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-570" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.347em;">…</span></span><span id="MJXc-Node-571" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-572" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-573" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-574" class="mjx-texatom" style=""><span id="MJXc-Node-575" class="mjx-mrow"><span id="MJXc-Node-576" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-577" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-578" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" 
style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-579" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-580" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.543em;">,</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-44">\ell = (1 / N) (\ell^{(1)} + \ldots + \ell^{(N)}),</script>
<p>where <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-494"><span class="MJXp-msubsup" id="MJXp-Span-495"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-496" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-497" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-498">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-499">i</span><span class="MJXp-mo" id="MJXp-Span-500">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-501" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-502" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-503"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-504" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-505" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-506">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-507">i</span><span class="MJXp-mo" id="MJXp-Span-508">)</span></span></span><span class="MJXp-mi" id="MJXp-Span-509">log</span><span class="MJXp-mo" id="MJXp-Span-510" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-msubsup" id="MJXp-Span-511"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-512" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-513" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-514">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-515">i</span><span class="MJXp-mo" id="MJXp-Span-516">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-517" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mo" id="MJXp-Span-518" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-519">1</span><span class="MJXp-mo" id="MJXp-Span-520" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span 
class="MJXp-msubsup" id="MJXp-Span-521"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-522" style="margin-right: 0.05em;">y</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-523" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-524">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-525">i</span><span class="MJXp-mo" id="MJXp-Span-526">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-527" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mi" id="MJXp-Span-528">log</span><span class="MJXp-mo" id="MJXp-Span-529" style="margin-left: 0em; margin-right: 0em;"></span><span class="MJXp-mo" id="MJXp-Span-530" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-531">1</span><span class="MJXp-mo" id="MJXp-Span-532" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-msubsup" id="MJXp-Span-533"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-534" style="margin-right: 0.05em;">p</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-535" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-536">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-537">i</span><span class="MJXp-mo" id="MJXp-Span-538">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-539" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-45-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-581" class="mjx-math"><span id="MJXc-Node-582" class="mjx-mrow"><span id="MJXc-Node-583" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-584" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-585" class="mjx-texatom" style=""><span 
id="MJXc-Node-586" class="mjx-mrow"><span id="MJXc-Node-587" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-588" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-589" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-590" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-591" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-592" class="mjx-msubsup"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-593" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; padding-right: 0.071em;"><span id="MJXc-Node-594" class="mjx-texatom" style=""><span id="MJXc-Node-595" class="mjx-mrow"><span id="MJXc-Node-596" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-597" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-598" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-599" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-600" 
class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-601" class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span id="MJXc-Node-602" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-603" class="mjx-texatom" style=""><span id="MJXc-Node-604" class="mjx-mrow"><span id="MJXc-Node-605" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-606" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-607" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-608" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-609" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-610" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-611" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-612" class="mjx-msubsup MJXc-space2"><span class="mjx-base" style="margin-right: -0.006em;"><span id="MJXc-Node-613" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em; padding-right: 0.006em;">y</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0.082em; 
padding-right: 0.071em;"><span id="MJXc-Node-614" class="mjx-texatom" style=""><span id="MJXc-Node-615" class="mjx-mrow"><span id="MJXc-Node-616" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-617" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-618" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-619" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-620" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.543em;">log</span></span><span id="MJXc-Node-621" class="mjx-mo"><span class="mjx-char"></span></span><span id="MJXc-Node-622" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-623" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-624" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-625" class="mjx-msubsup MJXc-space2"><span class="mjx-base"><span id="MJXc-Node-626" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.494em;">p</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-627" class="mjx-texatom" style=""><span id="MJXc-Node-628" class="mjx-mrow"><span id="MJXc-Node-629" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; 
padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-630" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-631" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-632" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-45">\ell^{(i)} = -y^{(i)} \log p^{(i)} - (1 - y^{(i)}) \log(1 - p^{(i)})</script>. This is also called cross-entropy loss and is the most popular loss function for classification tasks. As above, the negative log likelihood is a function of the model parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-540"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-541">θ</span></span></span><span id="MathJax-Element-46-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-633" class="mjx-math"><span id="MJXc-Node-634" class="mjx-mrow"><span id="MJXc-Node-635" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span></span></span><script type="math/tex" id="MathJax-Element-46">\theta</script>.</p>
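<p>The averaged cross-entropy loss above can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's implementation: the function name <code>cross_entropy</code>, the clipping constant <code>eps</code>, and the reading of each <code>p</code> entry as the predicted probability that the label is 1 are assumptions made for the example.</p>

```python
import math

def cross_entropy(y, p, eps=1e-12):
    """Average binary cross-entropy: (1/N) * sum of per-example losses.

    y: list of 0/1 labels; p: list of predicted probabilities (assumed to be
    the probability that the label is 1). eps is a hypothetical safeguard.
    """
    total = 0.0
    for y_i, p_i in zip(y, p):
        # Clip so that log(0) is never evaluated.
        p_i = min(max(p_i, eps), 1.0 - eps)
        # Per-example loss: -y log p - (1 - y) log(1 - p)
        total += -y_i * math.log(p_i) - (1 - y_i) * math.log(1 - p_i)
    return total / len(y)

# A confident correct prediction costs less than a confident wrong one.
good = cross_entropy([1, 0], [0.9, 0.1])
bad = cross_entropy([1, 0], [0.1, 0.9])
```

<p>Predicting 0.5 for every example gives a loss of log 2 regardless of the labels, which is a handy sanity check for a freshly initialized binary classifier.</p>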
<h2 id="training-dnns">Training DNNs</h2>
<p>In deep learning, as in prediction tasks generally, fitting (learning) the model means minimizing the regularized loss function</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-542"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-543">J</span><span class="MJXp-mo" id="MJXp-Span-544" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-545">θ</span><span class="MJXp-mo" id="MJXp-Span-546" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-547" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-548">ℓ</span><span class="MJXp-mo" id="MJXp-Span-549" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-550">θ</span><span class="MJXp-mo" id="MJXp-Span-551" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-552" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-553">R</span><span class="MJXp-mo" id="MJXp-Span-554" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-555">θ</span><span class="MJXp-mo" id="MJXp-Span-556" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-557" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-558" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-559">1</span><span class="MJXp-mrow" id="MJXp-Span-560"><span class="MJXp-mo" id="MJXp-Span-561" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-562">N</span><span class="MJXp-mo" id="MJXp-Span-563" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-munderover" id="MJXp-Span-564"><span><span class="MJXp-over"><span class=" MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-570" style="margin-right: 0px; margin-left: 
0px;">N</span></span><span class=""><span class="MJXp-mo" id="MJXp-Span-565" style="margin-left: 0.111em; margin-right: 0.167em;"><span class="MJXp-largeop">∑</span></span></span></span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-566" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-567">i</span><span class="MJXp-mo" id="MJXp-Span-568">=</span><span class="MJXp-mn" id="MJXp-Span-569">1</span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-571"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-572" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-573" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-574">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-575">i</span><span class="MJXp-mo" id="MJXp-Span-576">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-577" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-578">θ</span><span class="MJXp-mo" id="MJXp-Span-579" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-580" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-581">R</span><span class="MJXp-mo" id="MJXp-Span-582" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-583">θ</span><span class="MJXp-mo" id="MJXp-Span-584" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-47-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-636" class="mjx-math"><span id="MJXc-Node-637" class="mjx-mrow"><span id="MJXc-Node-638" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 
0.078em;">J</span></span><span id="MJXc-Node-639" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-640" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-641" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-642" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-643" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span><span id="MJXc-Node-644" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-645" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-646" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-647" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-648" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-649" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-650" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-651" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 
0.592em;">)</span></span><span id="MJXc-Node-652" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-653" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-654" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-655" class="mjx-texatom"><span id="MJXc-Node-656" class="mjx-mrow"><span id="MJXc-Node-657" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">/</span></span></span></span><span id="MJXc-Node-658" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-659" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-660" class="mjx-munderover MJXc-space1"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-stack"><span class="mjx-over" style="font-size: 70.7%; padding-bottom: 0.258em; padding-top: 0.141em; padding-left: 0.577em;"><span id="MJXc-Node-667" class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span></span><span class="mjx-op"><span id="MJXc-Node-661" class="mjx-mo"><span class="mjx-char MJXc-TeX-size2-R" style="padding-top: 0.74em; padding-bottom: 0.74em;">∑</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; padding-bottom: 0.141em; padding-left: 0.21em;"><span id="MJXc-Node-662" class="mjx-texatom" style=""><span id="MJXc-Node-663" class="mjx-mrow"><span id="MJXc-Node-664" 
class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-665" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-666" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span></span></span></span></span></span><span id="MJXc-Node-668" class="mjx-msubsup MJXc-space1"><span class="mjx-base"><span id="MJXc-Node-669" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-670" class="mjx-texatom" style=""><span id="MJXc-Node-671" class="mjx-mrow"><span id="MJXc-Node-672" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-673" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-674" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-675" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-676" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-677" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-678" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 
0.445em;">+</span></span><span id="MJXc-Node-679" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-680" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-681" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-682" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-47">J(\theta) = \ell(\theta) + R(\theta) = (1/N) \sum_{i=1}^N \ell^{(i)}(\theta) + R(\theta)</script>
<p>over the parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-585"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-586">θ</span></span></span><span id="MathJax-Element-48-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-683" class="mjx-math"><span id="MJXc-Node-684" class="mjx-mrow"><span id="MJXc-Node-685" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span></span></span><script type="math/tex" id="MathJax-Element-48">\theta</script>. This is often done with an iterative procedure such as gradient descent. In gradient descent, we initialize the parameters at some value and repeatedly step in the direction of the negative gradient:</p>
<ol>
<li>Initialize <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-587"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-588">θ</span><span class="MJXp-mo" id="MJXp-Span-589" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-590"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-591" style="margin-right: 0.05em;">θ</span><span class="MJXp-mn MJXp-script" id="MJXp-Span-592" style="vertical-align: -0.4em;">0</span></span></span></span><span id="MathJax-Element-49-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-686" class="mjx-math"><span id="MJXc-Node-687" class="mjx-mrow"><span id="MJXc-Node-688" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-689" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-690" class="mjx-msubsup MJXc-space3"><span class="mjx-base"><span id="MJXc-Node-691" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span></span><span class="mjx-sub" style="font-size: 70.7%; vertical-align: -0.212em; padding-right: 0.071em;"><span id="MJXc-Node-692" class="mjx-mn" style=""><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">0</span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-49">\theta = \theta_0</script></li>
<li>Repeatedly update <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-593"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-594">θ</span><span class="MJXp-mo" id="MJXp-Span-595" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-596">θ</span><span class="MJXp-mo" id="MJXp-Span-597" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-598">r</span><span class="MJXp-mo" id="MJXp-Span-599" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi" id="MJXp-Span-600">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-601">J</span><span class="MJXp-mo" id="MJXp-Span-602" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-603" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-604">θ</span><span class="MJXp-mo" id="MJXp-Span-605" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-50-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-693" class="mjx-math"><span id="MJXc-Node-694" class="mjx-mrow"><span id="MJXc-Node-695" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-696" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-697" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-698" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">−</span></span><span id="MJXc-Node-699" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-math-I" 
style="padding-top: 0.199em; padding-bottom: 0.297em;">r</span></span><span id="MJXc-Node-700" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-701" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-702" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span></span><span id="MJXc-Node-703" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-704" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-705" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.494em; padding-bottom: 0.297em;">θ</span></span><span id="MJXc-Node-706" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span><script type="math/tex" id="MathJax-Element-50">\theta = \theta - r (\nabla J)(\theta)</script>, where <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-606"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-607">r</span></span></span><span id="MathJax-Element-51-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-707" class="mjx-math"><span id="MJXc-Node-708" class="mjx-mrow"><span id="MJXc-Node-709" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.199em; padding-bottom: 0.297em;">r</span></span></span></span></span><script type="math/tex" id="MathJax-Element-51">r</script> is the step size or learning rate</li>
</ol>
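<p>The two steps above can be sketched as a short loop. This is only an illustration under stated assumptions: the names <code>gradient_descent</code>, <code>grad_J</code>, and <code>num_steps</code> are hypothetical, the fixed number of iterations stands in for a real stopping rule, and the toy quadratic objective is chosen only because its minimizer is known.</p>

```python
def gradient_descent(grad_J, theta0, r=0.1, num_steps=100):
    """Run num_steps updates of theta = theta - r * grad_J(theta)."""
    theta = list(theta0)  # step 1: initialize theta = theta_0
    for _ in range(num_steps):
        g = grad_J(theta)
        # Step 2: move against the gradient, scaled by the learning rate r.
        theta = [t - r * g_j for t, g_j in zip(theta, g)]
    return theta

# Toy objective (an assumption for this example):
# J(theta) = (theta_0 - 3)^2 + (theta_1 + 1)^2, minimized at (3, -1).
def grad_J(theta):
    return [2 * (theta[0] - 3), 2 * (theta[1] + 1)]

theta_hat = gradient_descent(grad_J, [0.0, 0.0])  # close to [3.0, -1.0]
```

<p>On this objective each update shrinks the distance to the minimizer by a constant factor, so a step size that is too large (here, anything above 1) would make the iterates diverge instead.</p>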
<p>Differentiation is a linear operator, so the gradient of <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-608"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-609">J</span></span></span><span id="MathJax-Element-52-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-710" class="mjx-math"><span id="MJXc-Node-711" class="mjx-mrow"><span id="MJXc-Node-712" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span></span></span></span></span><script type="math/tex" id="MathJax-Element-52">J</script> splits into the average of the per-example gradients plus the gradient of the regularizer:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-610"><span class="MJXp-mi" id="MJXp-Span-611">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-612">J</span><span class="MJXp-mo" id="MJXp-Span-613" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi" id="MJXp-Span-614">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-615">ℓ</span><span class="MJXp-mo" id="MJXp-Span-616" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi" id="MJXp-Span-617">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-618">R</span><span class="MJXp-mo" id="MJXp-Span-619" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-620" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-621">1</span><span class="MJXp-mrow" id="MJXp-Span-622"><span class="MJXp-mo" id="MJXp-Span-623" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-624">N</span><span class="MJXp-mo" id="MJXp-Span-625" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-munderover" id="MJXp-Span-626"><span><span class="MJXp-over"><span class=" MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-632" style="margin-right: 0px; margin-left: 0px;">N</span></span><span class=""><span class="MJXp-mo" id="MJXp-Span-627" style="margin-left: 0.111em; margin-right: 0.167em;"><span class="MJXp-largeop">∑</span></span></span></span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-628" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-629">i</span><span class="MJXp-mo" id="MJXp-Span-630">=</span><span class="MJXp-mn" id="MJXp-Span-631">1</span></span></span></span><span class="MJXp-mi" id="MJXp-Span-633">∇</span><span class="MJXp-msubsup" id="MJXp-Span-634"><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-635" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-636" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-637">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-638">i</span><span class="MJXp-mo" id="MJXp-Span-639">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-640" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi" id="MJXp-Span-641">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-642">R</span><span class="MJXp-mo" id="MJXp-Span-643" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span><span class="mjx-chtml MJXc-display MJXc-processed" style="text-align: center;"><span id="MathJax-Element-53-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0" style="font-size: 113%; text-align: center;"><span id="MJXc-Node-713" class="mjx-math"><span id="MJXc-Node-714" class="mjx-mrow"><span id="MJXc-Node-715" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-716" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.078em;">J</span></span><span id="MJXc-Node-717" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-718" class="mjx-mi MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-719" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span><span id="MJXc-Node-720" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-721" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 
0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-722" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-723" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-724" class="mjx-mo MJXc-space3"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-725" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span><span id="MJXc-Node-726" class="mjx-texatom"><span id="MJXc-Node-727" class="mjx-mrow"><span id="MJXc-Node-728" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">/</span></span></span></span><span id="MJXc-Node-729" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span><span id="MJXc-Node-730" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span><span id="MJXc-Node-731" class="mjx-munderover MJXc-space1"><span class="mjx-itable"><span class="mjx-row"><span class="mjx-cell"><span class="mjx-stack"><span class="mjx-over" style="font-size: 70.7%; padding-bottom: 0.258em; padding-top: 0.141em; padding-left: 0.577em;"><span id="MJXc-Node-738" class="mjx-mi" style=""><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em; padding-right: 0.085em;">N</span></span></span><span class="mjx-op"><span id="MJXc-Node-732" class="mjx-mo"><span class="mjx-char MJXc-TeX-size2-R" style="padding-top: 0.74em; padding-bottom: 0.74em;">∑</span></span></span></span></span></span><span class="mjx-row"><span class="mjx-under" style="font-size: 70.7%; padding-top: 0.236em; 
padding-bottom: 0.141em; padding-left: 0.21em;"><span id="MJXc-Node-733" class="mjx-texatom" style=""><span id="MJXc-Node-734" class="mjx-mrow"><span id="MJXc-Node-735" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-736" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.101em; padding-bottom: 0.297em;">=</span></span><span id="MJXc-Node-737" class="mjx-mn"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">1</span></span></span></span></span></span></span></span><span id="MJXc-Node-739" class="mjx-mi MJXc-space1"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-740" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-741" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.584em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-742" class="mjx-texatom" style=""><span id="MJXc-Node-743" class="mjx-mrow"><span id="MJXc-Node-744" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-745" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-746" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">)</span></span></span></span></span></span><span id="MJXc-Node-747" class="mjx-mo MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.297em; padding-bottom: 0.445em;">+</span></span><span id="MJXc-Node-748" class="mjx-mi MJXc-space2"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; 
padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-749" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">R</span></span><span id="MJXc-Node-750" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="margin-top: -0.145em; padding-bottom: 0.347em;">.</span></span></span></span></span></span><script type="math/tex; mode=display" id="MathJax-Element-53">\nabla J = \nabla \ell + \nabla R = (1/N) \sum_{i=1}^N \nabla \ell^{(i)} + \nabla R.</script>
<p>The above expression shows why gradient descent can be prohibitively expensive in big data applications: each gradient computation requires computing <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-644"><span class="MJXp-mi" id="MJXp-Span-645">∇</span><span class="MJXp-msubsup" id="MJXp-Span-646"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-647" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-648" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-649">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-650">i</span><span class="MJXp-mo" id="MJXp-Span-651">)</span></span></span></span></span><span id="MathJax-Element-54-Frame" class="mjx-chtml MathJax_CHTML MJXc-processed" tabindex="0" style="font-size: 113%;"><span id="MJXc-Node-751" class="mjx-math"><span id="MJXc-Node-752" class="mjx-mrow"><span id="MJXc-Node-753" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.396em;">∇</span></span><span id="MJXc-Node-754" class="mjx-msubsup"><span class="mjx-base"><span id="MJXc-Node-755" class="mjx-mi"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.396em; padding-bottom: 0.347em;">ℓ</span></span></span><span class="mjx-sup" style="font-size: 70.7%; vertical-align: 0.513em; padding-left: 0px; padding-right: 0.071em;"><span id="MJXc-Node-756" class="mjx-texatom" style=""><span id="MJXc-Node-757" class="mjx-mrow"><span id="MJXc-Node-758" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 0.592em;">(</span></span><span id="MJXc-Node-759" class="mjx-mi"><span class="mjx-char MJXc-TeX-math-I" style="padding-top: 0.445em; padding-bottom: 0.297em;">i</span></span><span id="MJXc-Node-760" class="mjx-mo"><span class="mjx-char MJXc-TeX-main-R" style="padding-top: 0.445em; padding-bottom: 
0.592em;">)</span></span></span></span></span></span></span></span></span><script type="math/tex" id="MathJax-Element-54">\nabla \ell^{(i)}</script> for <em>every</em> observation in the training data. The usual solution is to replace the gradient with a noisy, but cheap, stochastic approximation:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-652"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-653">g</span><span class="MJXp-mo" id="MJXp-Span-654" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-655" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mn" id="MJXp-Span-656">1</span><span class="MJXp-mrow" id="MJXp-Span-657"><span class="MJXp-mo" id="MJXp-Span-658" style="margin-left: 0.111em; margin-right: 0.111em;">/</span></span><span class="MJXp-mo" id="MJXp-Span-659" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-660">B</span><span class="MJXp-mo" id="MJXp-Span-661" style="margin-left: 0.167em; margin-right: 0.167em;">|</span><span class="MJXp-mo" id="MJXp-Span-662" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-munderover" id="MJXp-Span-663"><span class=""><span class="MJXp-mo" id="MJXp-Span-664" style="margin-left: 0.111em; margin-right: 0.167em;"><span class="MJXp-largeop">∑</span></span></span><span class=" MJXp-script"><span class="MJXp-mrow" id="MJXp-Span-665" style="margin-left: 0px;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-666">i</span><span class="MJXp-mo" id="MJXp-Span-667">∈</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-668">B</span></span></span></span><span class="MJXp-mi" id="MJXp-Span-669">∇</span><span class="MJXp-msubsup" id="MJXp-Span-670"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-671" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-672" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-673">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-674">i</span><span class="MJXp-mo" id="MJXp-Span-675">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-676" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mi" 
id="MJXp-Span-677">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-678">R</span><span class="MJXp-mo" id="MJXp-Span-679" style="margin-left: 0em; margin-right: 0.222em;">,</span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-55-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-55">g = (1 / \vert B \vert) \sum_{i \in B} \nabla \ell^{(i)} + \nabla R,</script>
<p>where <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-680"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-681">B</span></span></span><span id="MathJax-Element-56-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-56">B</script> is a (random) batch of training data. This yields stochastic (or mini-batch) gradient descent.</p>
<p>(It is important that <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-682"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-683">B</span></span></span><span id="MathJax-Element-57-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-57">B</script> is a random batch of training data so that <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-684"><span class="MJXp-mrow" id="MJXp-Span-685"><span class="MJXp-mtext MJXp-bold" id="MJXp-Span-686">E</span></span><span class="MJXp-mo" id="MJXp-Span-687" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-688">g</span><span class="MJXp-mo" id="MJXp-Span-689" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-690" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi" id="MJXp-Span-691">∇</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-692">J</span></span></span><span id="MathJax-Element-58-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-58">\textbf{E}(g) = \nabla J</script>, that is, so that the stochastic gradient is an unbiased estimate of the full gradient. In practice, this means shuffling the training data before breaking it into batches.)</p>
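<p>The mini-batch update described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the implementation developed in this post: the helper names <code>minibatch_indices</code> and <code>stochastic_gradient</code> are hypothetical, and the model is a toy linear least-squares problem with an L2 regularizer, chosen so that the per-sample gradient has a one-line closed form.</p>

```python
import numpy as np


def minibatch_indices(n_samples, batch_size, rng):
    """Yield index arrays that partition a fresh random shuffle of the data."""
    permutation = rng.permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        yield permutation[start:start + batch_size]


def stochastic_gradient(w, X, y, batch, lam):
    """Compute g = (1/|B|) * sum_{i in B} grad l^(i) + grad R on a toy model.

    Assumed toy setup (not from the post):
    l^(i) = 0.5 * (x_i . w - y_i)^2 and R = 0.5 * lam * ||w||^2.
    """
    residual = X[batch] @ w - y[batch]
    grad_loss = X[batch].T @ residual / len(batch)
    grad_reg = lam * w
    return grad_loss + grad_reg


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # 200 observations, 3 features
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w                         # noiseless targets for the toy problem

w = np.zeros(3)
learning_rate, lam = 0.1, 0.0
for epoch in range(50):
    # Reshuffle at the start of every epoch, then sweep over the batches.
    for batch in minibatch_indices(len(X), batch_size=20, rng=rng):
        w -= learning_rate * stochastic_gradient(w, X, y, batch, lam)
```

<p>Each step touches only the 20 observations in the current batch rather than all 200, which is the source of the savings. When the batches have equal size, averaging the stochastic gradients over one full sweep recovers the full gradient exactly.</p>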
<p>We have now given all the details needed to train an arbitrary predictive model in a big data setting. To flesh out the details for deep learning, we just need to discuss how to compute <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-693"><span class="MJXp-mi" id="MJXp-Span-694">∇</span><span class="MJXp-msubsup" id="MJXp-Span-695"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-696" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-697" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-698">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-699">i</span><span class="MJXp-mo" id="MJXp-Span-700">)</span></span></span></span></span><span id="MathJax-Element-59-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-59">\nabla \ell^{(i)}</script>, the gradient of the loss on a single training sample with respect to the network’s parameters. Before discussing back propagation (the way we compute <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-701"><span class="MJXp-mi" id="MJXp-Span-702">∇</span><span class="MJXp-msubsup" id="MJXp-Span-703"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-704" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-705" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-706">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-707">i</span><span class="MJXp-mo" id="MJXp-Span-708">)</span></span></span></span></span><span id="MathJax-Element-60-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-60">\nabla \ell^{(i)}</script>), we discuss forward propagation, which also lets us introduce notation.</p>
<h3 id="forward-propagation">Forward propagation</h3>
<p>Forward propagation is how we compute <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-709"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-710">f</span><span class="MJXp-mo" id="MJXp-Span-711" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-712">x</span><span class="MJXp-mo" id="MJXp-Span-713" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-61-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-61">f(x)</script>, the network’s prediction for an observation with features <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-714"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-715">x</span></span></span><span id="MathJax-Element-62-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-62">x</script>. 
(In binary classification tasks, it’s how we compute <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-716"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-717">p</span><span class="MJXp-mo" id="MJXp-Span-718" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-719">x</span><span class="MJXp-mo" id="MJXp-Span-720" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-63-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-63">p(x)</script>, the probability that <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-721"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-722">y</span><span class="MJXp-mo" id="MJXp-Span-723" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mn" id="MJXp-Span-724">1</span></span></span><span id="MathJax-Element-64-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-64">y = 1</script> given features <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-725"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-726">x</span></span></span><span id="MathJax-Element-65-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-65">x</script>.)</p>
<p>We let <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-727"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-728">L</span></span></span><span id="MathJax-Element-66-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-66">L</script> denote the number of layers in the network and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-729"><span class="MJXp-msubsup" id="MJXp-Span-730"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-731" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-732" style="vertical-align: -0.4em;">l</span></span></span></span><span id="MathJax-Element-67-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-67">n_l</script> denote the number of units in layer <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-733"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-734">l</span></span></span><span id="MathJax-Element-68-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-68">l</script> for <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-735"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-736">l</span><span class="MJXp-mo" id="MJXp-Span-737" style="margin-left: 0.333em; margin-right: 0.333em;">∈</span><span class="MJXp-mo" id="MJXp-Span-738" style="margin-left: 0em; margin-right: 0em;">{</span><span class="MJXp-mn" id="MJXp-Span-739">0</span><span class="MJXp-mo" id="MJXp-Span-740" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mn" id="MJXp-Span-741">1</span><span class="MJXp-mo" id="MJXp-Span-742" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mo" id="MJXp-Span-743" 
style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-744" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-745">L</span><span class="MJXp-mo" id="MJXp-Span-746" style="margin-left: 0em; margin-right: 0em;">}</span></span></span><span id="MathJax-Element-69-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-69">l \in \{0, 1, \ldots, L\}</script>.
We let <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-747"><span class="MJXp-msubsup" id="MJXp-Span-748"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-749" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-750" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-751">[</span><span class="MJXp-mn" id="MJXp-Span-752">0</span><span class="MJXp-mo" id="MJXp-Span-753">]</span><span class="MJXp-mo" id="MJXp-Span-754">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-755">i</span><span class="MJXp-mo" id="MJXp-Span-756">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-757" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-758"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-759" style="margin-right: 0.05em;">x</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-760" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-761">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-762">i</span><span class="MJXp-mo" id="MJXp-Span-763">)</span></span></span></span></span><span id="MathJax-Element-70-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-70">a^{[0](i)} = x^{(i)}</script> be the input (for the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-764"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-765">i</span></span></span><span id="MathJax-Element-71-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-71">i</script>th data point), <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-766"><span class="MJXp-msubsup" id="MJXp-Span-767"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-768" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" 
id="MJXp-Span-769" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-770">[</span><span class="MJXp-mn" id="MJXp-Span-771">1</span><span class="MJXp-mo" id="MJXp-Span-772">]</span><span class="MJXp-mo" id="MJXp-Span-773">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-774">i</span><span class="MJXp-mo" id="MJXp-Span-775">)</span></span></span></span></span><span id="MathJax-Element-72-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-72">a^{[1](i)}</script> be the activations from the first layer, <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-776"><span class="MJXp-msubsup" id="MJXp-Span-777"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-778" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-779" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-780">[</span><span class="MJXp-mn" id="MJXp-Span-781">2</span><span class="MJXp-mo" id="MJXp-Span-782">]</span><span class="MJXp-mo" id="MJXp-Span-783">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-784">i</span><span class="MJXp-mo" id="MJXp-Span-785">)</span></span></span></span></span><span id="MathJax-Element-73-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-73">a^{[2](i)}</script> be the activations from the second layer, and so on. 
Notice that <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-786"><span class="MJXp-msubsup" id="MJXp-Span-787"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-788" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-789" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-790">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-791">l</span><span class="MJXp-mo" id="MJXp-Span-792">]</span><span class="MJXp-mo" id="MJXp-Span-793">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-794">i</span><span class="MJXp-mo" id="MJXp-Span-795">)</span></span></span></span></span><span id="MathJax-Element-74-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-74">a^{[l](i)}</script> is a vector of length <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-796"><span class="MJXp-msubsup" id="MJXp-Span-797"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-798" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-799" style="vertical-align: -0.4em;">l</span></span></span></span><span id="MathJax-Element-75-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-75">n_l</script>. 
The output is <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-800"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-801">f</span><span class="MJXp-mo" id="MJXp-Span-802" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-803"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-804" style="margin-right: 0.05em;">x</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-805" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-806">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-807">i</span><span class="MJXp-mo" id="MJXp-Span-808">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-809" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-810" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-811"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-812" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-813" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-814">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-815">L</span><span class="MJXp-mo" id="MJXp-Span-816">]</span><span class="MJXp-mo" id="MJXp-Span-817">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-818">i</span><span class="MJXp-mo" id="MJXp-Span-819">)</span></span></span></span></span><span id="MathJax-Element-76-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-76">f(x^{(i)}) = a^{[L](i)}</script>, the activations from the last layer. In a feed-forward network, the activations are defined recursively:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-820"><span class="MJXp-mtable" id="MJXp-Span-821"><span><span class="MJXp-mtr" id="MJXp-Span-822" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-823" style="text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-824"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-825" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-826" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-827">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-828">l</span><span class="MJXp-mo" id="MJXp-Span-829">]</span><span class="MJXp-mo" id="MJXp-Span-830">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-831">i</span><span class="MJXp-mo" id="MJXp-Span-832">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-833" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-834"></span><span class="MJXp-mo" id="MJXp-Span-835" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-836"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-837" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-838" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-839">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-840">l</span><span class="MJXp-mo" id="MJXp-Span-841">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-842"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-843" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-844" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-845">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-846">l</span><span class="MJXp-mo" id="MJXp-Span-847">−</span><span class="MJXp-mn" id="MJXp-Span-848">1</span><span class="MJXp-mo" id="MJXp-Span-849">]</span><span 
class="MJXp-mo" id="MJXp-Span-850">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-851">i</span><span class="MJXp-mo" id="MJXp-Span-852">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-853" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-msubsup" id="MJXp-Span-854"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-855" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-856" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-857">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-858">l</span><span class="MJXp-mo" id="MJXp-Span-859">]</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-860" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-861" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-862"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-863" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-864" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-865">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-866">l</span><span class="MJXp-mo" id="MJXp-Span-867">]</span><span class="MJXp-mo" id="MJXp-Span-868">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-869">i</span><span class="MJXp-mo" id="MJXp-Span-870">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-871" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-872"></span><span class="MJXp-mo" id="MJXp-Span-873" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-874"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-875" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-876" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-877">[</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-878">l</span><span class="MJXp-mo" id="MJXp-Span-879">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-880" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-881"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-882" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-883" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-884">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-885">l</span><span class="MJXp-mo" id="MJXp-Span-886">]</span><span class="MJXp-mo" id="MJXp-Span-887">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-888">i</span><span class="MJXp-mo" id="MJXp-Span-889">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-890" style="margin-left: 0em; margin-right: 0em;">)</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-77-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-77">% <![CDATA[
\begin{aligned}
z^{[l](i)} &= W^{[l]} a^{[l-1](i)} + b^{[l]} \\
a^{[l](i)} &= g^{[l]}(z^{[l](i)})
\end{aligned} %]]></script>
<p>Here <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-891"><span class="MJXp-msubsup" id="MJXp-Span-892"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-893" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-894" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-895">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-896">l</span><span class="MJXp-mo" id="MJXp-Span-897">]</span></span></span></span></span><span id="MathJax-Element-78-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-78">W^{[l]}</script> is an <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-898"><span class="MJXp-msubsup" id="MJXp-Span-899"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-900" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-901" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mo" id="MJXp-Span-902" style="margin-left: 0.267em; margin-right: 0.267em;">×</span><span class="MJXp-msubsup" id="MJXp-Span-903"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-904" style="margin-right: 0.05em;">n</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-905" style="vertical-align: -0.4em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-906">l</span><span class="MJXp-mo" id="MJXp-Span-907">−</span><span class="MJXp-mn" id="MJXp-Span-908">1</span></span></span></span></span><span id="MathJax-Element-79-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-79">n_l \times n_{l-1}</script> matrix and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-909"><span class="MJXp-msubsup" id="MJXp-Span-910"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-911" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow 
MJXp-script" id="MJXp-Span-912" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-913">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-914">l</span><span class="MJXp-mo" id="MJXp-Span-915">]</span></span></span></span></span><span id="MathJax-Element-80-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-80">b^{[l]}</script> is an <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-916"><span class="MJXp-msubsup" id="MJXp-Span-917"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-918" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-919" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mo" id="MJXp-Span-920" style="margin-left: 0.267em; margin-right: 0.267em;">×</span><span class="MJXp-mn" id="MJXp-Span-921">1</span></span></span><span id="MathJax-Element-81-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-81">n_l \times 1</script> vector that linearly transform the outputs <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-922"><span class="MJXp-msubsup" id="MJXp-Span-923"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-924" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-925" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-926">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-927">l</span><span class="MJXp-mo" id="MJXp-Span-928">−</span><span class="MJXp-mn" id="MJXp-Span-929">1</span><span class="MJXp-mo" id="MJXp-Span-930">]</span><span class="MJXp-mo" id="MJXp-Span-931">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-932">i</span><span class="MJXp-mo" id="MJXp-Span-933">)</span></span></span></span></span><span id="MathJax-Element-82-Frame" class="mjx-chtml MathJax_CHTML 
MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-82">a^{[l-1](i)}</script> from the previous layer. The function <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-934"><span class="MJXp-msubsup" id="MJXp-Span-935"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-936" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-937" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-938">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-939">l</span><span class="MJXp-mo" id="MJXp-Span-940">]</span></span></span></span></span><span id="MathJax-Element-83-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-83">g^{[l]}</script> is a nonlinear activation function and is applied elementwise.</p>
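<p>The recursion above translates almost line for line into NumPy. The following is a minimal sketch under assumptions that are not taken from the post: the function name <code>forward</code>, the layer sizes, the choice of ReLU and sigmoid activations, and the random initialization are all hypothetical. Activations are stored as column vectors of shape <code>(n_l, 1)</code> so that the bias, which has the same shape, adds directly.</p>

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(a0, weights, biases, activations):
    """Apply z^[l] = W^[l] a^[l-1] + b^[l], a^[l] = g^[l](z^[l]) for l = 1..L.

    Returns the list of pre-activations [z^[1], ..., z^[L]] and the list of
    activations [a^[0], ..., a^[L]], where a^[0] is the input.
    """
    a = a0
    zs, acts = [], [a]
    for W, b, g in zip(weights, biases, activations):
        z = W @ a + b        # affine step; W has shape (n_l, n_{l-1})
        a = g(z)             # elementwise nonlinearity
        zs.append(z)
        acts.append(a)
    return zs, acts


rng = np.random.default_rng(0)
layer_sizes = [3, 4, 1]      # n_0, n_1, n_2 (illustrative values), so L = 2
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros((n_out, 1)) for n_out in layer_sizes[1:]]
activations = [relu, sigmoid]

x = rng.normal(size=(3, 1))  # one observation as a column vector
zs, acts = forward(x, weights, biases, activations)
prediction = acts[-1]        # f(x) = a^[L]
```

<p>Because the bias broadcasts across columns, the same function also accepts an <code>(n_0, m)</code> matrix whose columns are <code>m</code> observations, in which case column <code>i</code> of each returned array holds the values for observation <code>i</code>.</p>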
<p>In code, we’ll process a batch of observations at a time. For simplicity, suppose our batch is the first <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-941"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-942">m</span></span></span><span id="MathJax-Element-84-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-84">m</script> observations <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-943"><span class="MJXp-mo" id="MJXp-Span-944" style="margin-left: 0em; margin-right: 0em;">{</span><span class="MJXp-mn" id="MJXp-Span-945">1</span><span class="MJXp-mo" id="MJXp-Span-946" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mn" id="MJXp-Span-947">2</span><span class="MJXp-mo" id="MJXp-Span-948" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mo" id="MJXp-Span-949" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-950" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-951">m</span><span class="MJXp-mo" id="MJXp-Span-952" style="margin-left: 0em; margin-right: 0em;">}</span></span></span><span id="MathJax-Element-85-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-85">\{1, 2, \ldots, m\}</script>. 
For each observation <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-953"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-954">i</span></span></span><span id="MathJax-Element-86-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-86">i</script> in the batch, we store <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-955"><span class="MJXp-msubsup" id="MJXp-Span-956"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-957" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-958" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-959">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-960">l</span><span class="MJXp-mo" id="MJXp-Span-961">]</span><span class="MJXp-mo" id="MJXp-Span-962">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-963">i</span><span class="MJXp-mo" id="MJXp-Span-964">)</span></span></span></span></span><span id="MathJax-Element-87-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-87">z^{[l](i)}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-965"><span class="MJXp-msubsup" id="MJXp-Span-966"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-967" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-968" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-969">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-970">l</span><span class="MJXp-mo" id="MJXp-Span-971">]</span><span class="MJXp-mo" id="MJXp-Span-972">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-973">i</span><span class="MJXp-mo" id="MJXp-Span-974">)</span></span></span></span></span><span id="MathJax-Element-88-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script 
type="math/tex" id="MathJax-Element-88">a^{[l](i)}</script> as columns in a matrix:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-975"><span class="MJXp-mtable" id="MJXp-Span-976"><span><span class="MJXp-mtr" id="MJXp-Span-977" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-978" style="text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-979"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-980" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-981" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-982">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-983">l</span><span class="MJXp-mo" id="MJXp-Span-984">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-985" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-986"></span><span class="MJXp-mo" id="MJXp-Span-987" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mrow" id="MJXp-Span-988"><span class="MJXp-mo" id="MJXp-Span-989" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">[</span></span><span class="MJXp-mtable" id="MJXp-Span-990"><span><span class="MJXp-mtr" id="MJXp-Span-991" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-992" style="text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-993"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-994" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-995" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-996">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-997">l</span><span class="MJXp-mo" id="MJXp-Span-998">]</span><span class="MJXp-mo" id="MJXp-Span-999">(</span><span class="MJXp-mn" id="MJXp-Span-1000">1</span><span class="MJXp-mo" id="MJXp-Span-1001">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1002" 
style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1003"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1004" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1005" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1006">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1007">l</span><span class="MJXp-mo" id="MJXp-Span-1008">]</span><span class="MJXp-mo" id="MJXp-Span-1009">(</span><span class="MJXp-mn" id="MJXp-Span-1010">2</span><span class="MJXp-mo" id="MJXp-Span-1011">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1012" style="padding-left: 1em; text-align: center;"><span class="MJXp-mo" id="MJXp-Span-1013" style="margin-left: 0em; margin-right: 0em;">…</span></span><span class="MJXp-mtd" id="MJXp-Span-1014" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1015"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1016" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1017" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1018">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1019">l</span><span class="MJXp-mo" id="MJXp-Span-1020">]</span><span class="MJXp-mo" id="MJXp-Span-1021">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1022">m</span><span class="MJXp-mo" id="MJXp-Span-1023">)</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1024" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">]</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-1025" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1026" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-1027"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1028" 
style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1029" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1030">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1031">l</span><span class="MJXp-mo" id="MJXp-Span-1032">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1033" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1034"></span><span class="MJXp-mo" id="MJXp-Span-1035" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mrow" id="MJXp-Span-1036"><span class="MJXp-mo" id="MJXp-Span-1037" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">[</span></span><span class="MJXp-mtable" id="MJXp-Span-1038"><span><span class="MJXp-mtr" id="MJXp-Span-1039" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1040" style="text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1041"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1042" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1043" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1044">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1045">l</span><span class="MJXp-mo" id="MJXp-Span-1046">]</span><span class="MJXp-mo" id="MJXp-Span-1047">(</span><span class="MJXp-mn" id="MJXp-Span-1048">1</span><span class="MJXp-mo" id="MJXp-Span-1049">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1050" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1051"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1052" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1053" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1054">[</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-1055">l</span><span class="MJXp-mo" id="MJXp-Span-1056">]</span><span class="MJXp-mo" id="MJXp-Span-1057">(</span><span class="MJXp-mn" id="MJXp-Span-1058">2</span><span class="MJXp-mo" id="MJXp-Span-1059">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1060" style="padding-left: 1em; text-align: center;"><span class="MJXp-mo" id="MJXp-Span-1061" style="margin-left: 0em; margin-right: 0em;">…</span></span><span class="MJXp-mtd" id="MJXp-Span-1062" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1063"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1064" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1065" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1066">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1067">l</span><span class="MJXp-mo" id="MJXp-Span-1068">]</span><span class="MJXp-mo" id="MJXp-Span-1069">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1070">m</span><span class="MJXp-mo" id="MJXp-Span-1071">)</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1072" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">]</span></span></span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-89-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-89">% <![CDATA[
\begin{aligned}
Z^{[l]} &= \begin{bmatrix} z^{[l](1)} & z^{[l](2)} & \ldots & z^{[l](m)} \end{bmatrix} \\
A^{[l]} &= \begin{bmatrix} a^{[l](1)} & a^{[l](2)} & \ldots & a^{[l](m)} \end{bmatrix}
\end{aligned} %]]></script>
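<p>As a small sketch of this column layout in NumPy: the three observations and the name <code>A0</code> below are made up for illustration and are not taken from the post's own code. Each observation is a length-<code>n_0</code> vector, and stacking them as columns gives a matrix of shape <code>(n_0, m)</code>.</p>

```python
import numpy as np

# Hypothetical batch: m = 3 observations, each with n_0 = 2 features.
observations = [
    np.array([1.0, 2.0]),
    np.array([3.0, 4.0]),
    np.array([5.0, 6.0]),
]

# Each observation becomes one column, so the result has shape (n_0, m).
A0 = np.column_stack(observations)

print(A0.shape)   # (2, 3)
print(A0[:, 1])   # the second observation: [3. 4.]
```

<p>Keeping observations in columns means column <code>i</code> of every later matrix refers to the same observation.</p>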
<p>With this notation, forward-propagating a batch requires recursively computing</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1073"><span class="MJXp-mtable" id="MJXp-Span-1074"><span><span class="MJXp-mtr" id="MJXp-Span-1075" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1076" style="text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-1077"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1078" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1079" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1080">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1081">l</span><span class="MJXp-mo" id="MJXp-Span-1082">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1083" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1084"></span><span class="MJXp-mo" id="MJXp-Span-1085" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1086"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1087" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1088" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1089">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1090">l</span><span class="MJXp-mo" id="MJXp-Span-1091">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1092"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1093" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1094" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1095">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1096">l</span><span class="MJXp-mo" id="MJXp-Span-1097">−</span><span class="MJXp-mn" id="MJXp-Span-1098">1</span><span class="MJXp-mo" id="MJXp-Span-1099">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1100" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-msubsup" 
id="MJXp-Span-1101"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1102" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1103" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1104">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1105">l</span><span class="MJXp-mo" id="MJXp-Span-1106">]</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-1107" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1108" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-1109"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1110" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1111" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1112">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1113">l</span><span class="MJXp-mo" id="MJXp-Span-1114">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1115" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1116"></span><span class="MJXp-mo" id="MJXp-Span-1117" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1118"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1119" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1120" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1121">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1122">l</span><span class="MJXp-mo" id="MJXp-Span-1123">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1124" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1125"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1126" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1127" style="vertical-align: 0.5em;"><span class="MJXp-mo" 
id="MJXp-Span-1128">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1129">l</span><span class="MJXp-mo" id="MJXp-Span-1130">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1131" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-1132" style="margin-left: 0em; margin-right: 0.222em;">,</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-90-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-90">% <![CDATA[
\begin{aligned}
Z^{[l]} &= W^{[l]} A^{[l-1]} + b^{[l]} \\
A^{[l]} &= g^{[l]}(Z^{[l]}),
\end{aligned} %]]></script>
<p>where <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1133"><span class="MJXp-msubsup" id="MJXp-Span-1134"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1135" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1136" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1137">[</span><span class="MJXp-mn" id="MJXp-Span-1138">0</span><span class="MJXp-mo" id="MJXp-Span-1139">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1140" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mrow" id="MJXp-Span-1141"><span class="MJXp-mo" id="MJXp-Span-1142" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">[</span></span><span class="MJXp-mtable" id="MJXp-Span-1143"><span><span class="MJXp-mtr" id="MJXp-Span-1144" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1145" style="text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1146"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1147" style="margin-right: 0.05em;">x</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1148" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1149">(</span><span class="MJXp-mn" id="MJXp-Span-1150">1</span><span class="MJXp-mo" id="MJXp-Span-1151">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1152" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1153"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1154" style="margin-right: 0.05em;">x</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1155" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1156">(</span><span class="MJXp-mn" id="MJXp-Span-1157">2</span><span class="MJXp-mo" id="MJXp-Span-1158">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1159" 
style="padding-left: 1em; text-align: center;"><span class="MJXp-mo" id="MJXp-Span-1160" style="margin-left: 0em; margin-right: 0em;">…</span></span><span class="MJXp-mtd" id="MJXp-Span-1161" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-1162"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1163" style="margin-right: 0.05em;">x</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1164" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1165">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1166">m</span><span class="MJXp-mo" id="MJXp-Span-1167">)</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1168" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.172em;"><span class="MJXp-right MJXp-scale8" style="font-size: 1.689em; margin-left: -0.01em;">]</span></span></span></span></span><span id="MathJax-Element-91-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-91">% <![CDATA[
A^{[0]} = \begin{bmatrix} x^{(1)} & x^{(2)} & \ldots & x^{(m)} \end{bmatrix} %]]></script> is the matrix of input observations. (In computing <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1169"><span class="MJXp-msubsup" id="MJXp-Span-1170"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1171" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1172" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1173">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1174">l</span><span class="MJXp-mo" id="MJXp-Span-1175">]</span></span></span></span></span><span id="MathJax-Element-92-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-92">Z^{[l]}</script> above, <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1176"><span class="MJXp-msubsup" id="MJXp-Span-1177"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1178" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1179" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1180">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1181">l</span><span class="MJXp-mo" id="MJXp-Span-1182">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1183"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1184" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1185" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1186">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1187">l</span><span class="MJXp-mo" id="MJXp-Span-1188">−</span><span class="MJXp-mn" id="MJXp-Span-1189">1</span><span class="MJXp-mo" id="MJXp-Span-1190">]</span></span></span></span></span><span id="MathJax-Element-93-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" 
id="MathJax-Element-93">W^{[l]} A^{[l-1]}</script> is an <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1191"><span class="MJXp-msubsup" id="MJXp-Span-1192"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1193" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-1194" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mo" id="MJXp-Span-1195" style="margin-left: 0.267em; margin-right: 0.267em;">×</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1196">m</span></span></span><span id="MathJax-Element-94-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-94">n_l \times m</script> matrix and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1197"><span class="MJXp-msubsup" id="MJXp-Span-1198"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1199" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1200" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1201">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1202">l</span><span class="MJXp-mo" id="MJXp-Span-1203">]</span></span></span></span></span><span id="MathJax-Element-95-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-95">b^{[l]}</script> is an <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1204"><span class="MJXp-msubsup" id="MJXp-Span-1205"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1206" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-1207" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mo" id="MJXp-Span-1208" style="margin-left: 0.267em; margin-right: 0.267em;">×</span><span class="MJXp-mn" id="MJXp-Span-1209">1</span></span></span><span 
id="MathJax-Element-96-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-96">n_l \times 1</script> vector. The addition is done with broadcasting (NumPy behavior), which adds <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1210"><span class="MJXp-msubsup" id="MJXp-Span-1211"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1212" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1213" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1214">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1215">l</span><span class="MJXp-mo" id="MJXp-Span-1216">]</span></span></span></span></span><span id="MathJax-Element-97-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-97">b^{[l]}</script> to each column of <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1217"><span class="MJXp-msubsup" id="MJXp-Span-1218"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1219" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1220" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1221">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1222">l</span><span class="MJXp-mo" id="MJXp-Span-1223">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1224"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1225" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1226" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1227">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1228">l</span><span class="MJXp-mo" id="MJXp-Span-1229">−</span><span class="MJXp-mn" id="MJXp-Span-1230">1</span><span class="MJXp-mo" id="MJXp-Span-1231">]</span></span></span></span></span><span id="MathJax-Element-98-Frame" 
class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-98">W^{[l]} A^{[l-1]}</script>.)</p>
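<p>The batched recursion can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the implementation developed later in the post: the function names <code>forward</code>, <code>relu</code> and <code>identity</code>, the layer sizes, and the choice of activations are all hypothetical. Each bias is stored with shape <code>(n_l, 1)</code> so that broadcasting adds it to every column.</p>

```python
import numpy as np

def relu(Z):
    # Elementwise nonlinearity (an assumed choice for this sketch).
    return np.maximum(Z, 0.0)

def identity(Z):
    # Assumed activation for the last layer of this sketch.
    return Z

def forward(A0, weights, biases, activations):
    """Forward-propagate a batch; returns the lists [Z1..ZL] and [A0..AL]."""
    As = [A0]
    Zs = []
    for W, b, g in zip(weights, biases, activations):
        # W @ A has shape (n_l, m); b has shape (n_l, 1) and is
        # broadcast across the m columns.
        Z = W @ As[-1] + b
        Zs.append(Z)
        As.append(g(Z))
    return Zs, As

rng = np.random.default_rng(0)
layer_sizes = [4, 3, 1]   # hypothetical n_0, n_1, n_2
m = 5                     # batch size

weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.standard_normal((n_out, 1)) for n_out in layer_sizes[1:]]
A0 = rng.standard_normal((layer_sizes[0], m))

Zs, As = forward(A0, weights, biases, [relu, identity])
print([A.shape for A in As])   # [(4, 5), (3, 5), (1, 5)]
```

<p>Storing every <code>Z</code> and <code>A</code>, rather than only the final output, is a deliberate choice here: the chain-rule computation in back propagation needs these intermediate values.</p>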
<h3 id="back-propagation">Back propagation</h3>
<p>To train the network, we need to compute the derivative of the loss with respect to the network parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1232"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1233">θ</span><span class="MJXp-mo" id="MJXp-Span-1234" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mo" id="MJXp-Span-1235" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1236"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1237" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1238" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1239">[</span><span class="MJXp-mn" id="MJXp-Span-1240">1</span><span class="MJXp-mo" id="MJXp-Span-1241">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1242" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1243"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1244" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1245" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1246">[</span><span class="MJXp-mn" id="MJXp-Span-1247">1</span><span class="MJXp-mo" id="MJXp-Span-1248">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1249" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1250"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1251" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1252" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1253">[</span><span class="MJXp-mn" id="MJXp-Span-1254">2</span><span class="MJXp-mo" id="MJXp-Span-1255">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1256" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1257"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-1258" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1259" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1260">[</span><span class="MJXp-mn" id="MJXp-Span-1261">2</span><span class="MJXp-mo" id="MJXp-Span-1262">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1263" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mo" id="MJXp-Span-1264" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-1265" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1266"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1267" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1268" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1269">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1270">L</span><span class="MJXp-mo" id="MJXp-Span-1271">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1272" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1273"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1274" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1275" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1276">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1277">L</span><span class="MJXp-mo" id="MJXp-Span-1278">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1279" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-99-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-99">\theta = (b^{[1]}, W^{[1]}, b^{[2]}, W^{[2]}, \ldots, b^{[L]}, W^{[L]})</script>. This is called back propagation, but is really just the chain rule.</p>
<p>As with forward propagation, we will start with the single-observation case. Thinking recursively, suppose we already know</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1280"><span class="MJXp-mfrac" id="MJXp-Span-1281" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1282">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1283"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1284" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1285" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1286">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1287">i</span><span class="MJXp-mo" id="MJXp-Span-1288">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1289">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1290"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1291" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1293"><span class="MJXp-mo" id="MJXp-Span-1294">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1295">l</span><span class="MJXp-mo" id="MJXp-Span-1296">]</span><span class="MJXp-mo" id="MJXp-Span-1297">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1298">i</span><span class="MJXp-mo" id="MJXp-Span-1299">)</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1292">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1300" style="margin-left: 0em; margin-right: 0.222em;">,</span></span></span><span class="mjx-chtml MJXc-display 
MJXc-processing"><span id="MathJax-Element-100-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-100">\frac{\partial \ell^{(i)}}{\partial a^{[l](i)}_j},</script>
<p>the derivative of the loss on the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1301"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1302">i</span></span></span><span id="MathJax-Element-101-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-101">i</script>th observation for each unit <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1303"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1304">j</span><span class="MJXp-mo" id="MJXp-Span-1305" style="margin-left: 0.333em; margin-right: 0.333em;">∈</span><span class="MJXp-mo" id="MJXp-Span-1306" style="margin-left: 0em; margin-right: 0em;">{</span><span class="MJXp-mn" id="MJXp-Span-1307">1</span><span class="MJXp-mo" id="MJXp-Span-1308" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mo" id="MJXp-Span-1309" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-1310" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1311"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1312" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-1313" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mo" id="MJXp-Span-1314" style="margin-left: 0em; margin-right: 0em;">}</span></span></span><span id="MathJax-Element-102-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-102">j \in \{1, \ldots, n_l\}</script> in the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1315"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1316">l</span></span></span><span id="MathJax-Element-103-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" 
id="MathJax-Element-103">l</script>th layer. Since we are limiting our discussion to a single observation, we’ll drop the observation index <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1317"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1318">i</span></span></span><span id="MathJax-Element-104-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-104">i</script> from the unit notation, keeping it only on the loss, and write:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1319"><span class="MJXp-mfrac" id="MJXp-Span-1320" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1321">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1322"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1323" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1324" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1325">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1326">i</span><span class="MJXp-mo" id="MJXp-Span-1327">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1328">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1329"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1330" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1332"><span class="MJXp-mo" id="MJXp-Span-1333">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1334">l</span><span class="MJXp-mo" id="MJXp-Span-1335">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1331">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1336" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-105-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" 
id="MathJax-Element-105">\frac{\partial \ell^{(i)}}{\partial a^{[l]}_j}.</script>
<p>We now discuss how to compute:</p>
<ol>
<li>The derivative of the loss with respect to the parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1337"><span class="MJXp-msubsup" id="MJXp-Span-1338"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1339" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1340" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1341">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1342">l</span><span class="MJXp-mo" id="MJXp-Span-1343">]</span></span></span></span></span><span id="MathJax-Element-106-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-106">b^{[l]}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1344"><span class="MJXp-msubsup" id="MJXp-Span-1345"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1346" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1347" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1348">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1349">l</span><span class="MJXp-mo" id="MJXp-Span-1350">]</span></span></span></span></span><span id="MathJax-Element-107-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-107">W^{[l]}</script> in the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1351"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1352">l</span></span></span><span id="MathJax-Element-108-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-108">l</script>th layer.</li>
<li>The derivative of the loss with respect to the previous layer’s units.</li>
</ol>
<h4 id="parameter-derivatives">Parameter derivatives</h4>
<p>The figure below illustrates how the parameters in layer <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1353"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1354">l</span></span></span><span id="MathJax-Element-109-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-109">l</script> affect the loss.</p>
<p><img src="./Implementing a neural network in Python _ statsandstuff_files/backprop_params.png" alt="Diagram of how the parameters in layer l affect the loss"></p>
<p>Since <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1355"><span class="MJXp-msubsup" id="MJXp-Span-1356"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1357" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1359"><span class="MJXp-mo" id="MJXp-Span-1360">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1361">l</span><span class="MJXp-mo" id="MJXp-Span-1362">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1358">j</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1363" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1364"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1365" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1366" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1367">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1368">l</span><span class="MJXp-mo" id="MJXp-Span-1369">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1370" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1371"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1372" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1374"><span class="MJXp-mo" id="MJXp-Span-1375">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1376">l</span><span class="MJXp-mo" id="MJXp-Span-1377">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-1373">j</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1378" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-110-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-110">a_j^{[l]} = g^{[l]} ( z_j^{[l]} )</script>, the chain rule gives:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1379"><span class="MJXp-mtable" id="MJXp-Span-1380"><span><span class="MJXp-mtr" id="MJXp-Span-1381" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1382" style="text-align: right;"><span class="MJXp-mfrac" id="MJXp-Span-1383" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1384">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1385"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1386" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1387" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1388">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1389">i</span><span class="MJXp-mo" id="MJXp-Span-1390">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1391">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1392"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1393" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1395"><span class="MJXp-mo" id="MJXp-Span-1396">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1397">l</span><span class="MJXp-mo" id="MJXp-Span-1398">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1394">j</span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1399" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" 
id="MJXp-Span-1400"></span><span class="MJXp-mo" id="MJXp-Span-1401" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1402" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1403">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1404"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1405" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1406" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1407">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1408">i</span><span class="MJXp-mo" id="MJXp-Span-1409">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1410">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1411"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1412" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1414"><span class="MJXp-mo" id="MJXp-Span-1415">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1416">l</span><span class="MJXp-mo" id="MJXp-Span-1417">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1413">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-1418" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1419">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1420"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1421" style="margin-right: 0.05em;">a</span><span 
class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1423"><span class="MJXp-mo" id="MJXp-Span-1424">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1425">l</span><span class="MJXp-mo" id="MJXp-Span-1426">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1422">j</span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1427">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1428"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1429" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1431"><span class="MJXp-mo" id="MJXp-Span-1432">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1433">l</span><span class="MJXp-mo" id="MJXp-Span-1434">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1430">j</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-1435" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1436" style="padding-top: 0.3em; text-align: right;"></span><span class="MJXp-mtd" id="MJXp-Span-1437" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1438"></span><span class="MJXp-mo" id="MJXp-Span-1439" style="margin-left: 0.333em; margin-right: 
0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1440" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1441">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1442"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1443" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1444" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1445">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1446">i</span><span class="MJXp-mo" id="MJXp-Span-1447">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1448">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1449"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1450" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1452"><span class="MJXp-mo" id="MJXp-Span-1453">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1454">l</span><span class="MJXp-mo" id="MJXp-Span-1455">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1451">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1456" style="margin-left: 0.267em; margin-right: 0.267em;">⋅</span><span class="MJXp-mo" id="MJXp-Span-1457" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1458"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1459" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1460" style="vertical-align: 
0.5em;"><span class="MJXp-mo" id="MJXp-Span-1461">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1462">l</span><span class="MJXp-mo" id="MJXp-Span-1463">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-1464"><span class="MJXp-mo" id="MJXp-Span-1465" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-1466" style="vertical-align: 0.5em;">′</span></span><span class="MJXp-mo" id="MJXp-Span-1467" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1468"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1469" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1471"><span class="MJXp-mo" id="MJXp-Span-1472">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1473">l</span><span class="MJXp-mo" id="MJXp-Span-1474">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1470">j</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1475" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-1476" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-111-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-111">% <![CDATA[
\begin{aligned}
\frac{\partial \ell^{(i)}}{\partial z_j^{[l]}} &= \frac{\partial \ell^{(i)}}{\partial a_j^{[l]}}\frac{\partial a_j^{[l]}}{\partial z_j^{[l]}} \\
&= \frac{\partial \ell^{(i)}}{\partial a_j^{[l]}} \cdot (g^{[l]})'(z_j^{[l]}).
\end{aligned} %]]></script>
<p>Putting these derivatives into a gradient vector, we have</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1477"><span class="MJXp-msubsup" id="MJXp-Span-1478"><span class="MJXp-mi" id="MJXp-Span-1479" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1480" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1481"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1482" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1483" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1484">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1485">l</span><span class="MJXp-mo" id="MJXp-Span-1486">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1487"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1488" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1489" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1490">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1491">i</span><span class="MJXp-mo" id="MJXp-Span-1492">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-1493" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1494"><span class="MJXp-mi" id="MJXp-Span-1495" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1496" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1497"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1498" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1499" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1500">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1501">l</span><span class="MJXp-mo" id="MJXp-Span-1502">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1503"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1504" style="margin-right: 
0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1505" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1506">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1507">i</span><span class="MJXp-mo" id="MJXp-Span-1508">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-1509" style="margin-left: 0.267em; margin-right: 0.267em;">∗</span><span class="MJXp-mo" id="MJXp-Span-1510" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1511"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1512" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1513" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1514">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1515">l</span><span class="MJXp-mo" id="MJXp-Span-1516">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-1517"><span class="MJXp-mo" id="MJXp-Span-1518" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-1519" style="vertical-align: 0.5em;">′</span></span><span class="MJXp-mo" id="MJXp-Span-1520" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1521"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1522" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1523" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1524">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1525">l</span><span class="MJXp-mo" id="MJXp-Span-1526">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-1527" style="margin-left: 0em; margin-right: 0em;">)</span><span class="MJXp-mo" id="MJXp-Span-1528" style="margin-left: 0em; margin-right: 0.222em;">,</span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-112-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; 
mode=display" id="MathJax-Element-112">\nabla_{z^{[l]}} \ell^{(i)} = \nabla_{a^{[l]}} \ell^{(i)} * (g^{[l]})'(z^{[l]}),</script>
<p>where <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1529"><span class="MJXp-mo" id="MJXp-Span-1530" style="margin-left: 0.267em; margin-right: 0.267em;">∗</span></span></span><span id="MathJax-Element-113-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-113">*</script> denotes elementwise multiplication and the derivative <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1531"><span class="MJXp-mo" id="MJXp-Span-1532" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-1533"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1534" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1535" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1536">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1537">l</span><span class="MJXp-mo" id="MJXp-Span-1538">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-1539"><span class="MJXp-mo" id="MJXp-Span-1540" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-1541" style="vertical-align: 0.5em;">′</span></span></span></span><span id="MathJax-Element-114-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-114">(g^{[l]})'</script> is applied to each entry of the vector separately.</p>
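<p>As a rough illustration of the elementwise formula above, here is a minimal NumPy sketch. The names <code>sigmoid</code>, <code>sigmoid_prime</code>, and <code>grad_z</code> are hypothetical helpers chosen for this example, and the sigmoid activation and the toy loss are assumptions made only so that the result can be checked against a finite-difference estimate.</p>

```python
import numpy as np


def sigmoid(z):
    """Assumed activation g for this sketch."""
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    """Derivative g' of the sigmoid, applied elementwise."""
    s = sigmoid(z)
    return s * (1.0 - s)


def grad_z(grad_a, z, g_prime):
    """Gradient of the loss w.r.t. z, given the gradient w.r.t. a = g(z).

    Implements grad_z = grad_a * g'(z), where * is elementwise.
    """
    return grad_a * g_prime(z)


# Toy values for a layer with three units (made up for illustration).
z = np.array([-1.0, 0.0, 2.0])
grad_a = np.array([0.5, -2.0, 1.0])

dz = grad_z(grad_a, z, sigmoid_prime)

# Finite-difference check with a toy loss that is linear in a, so that
# its gradient with respect to a is exactly grad_a.
def toy_loss(z_vec):
    return np.dot(grad_a, sigmoid(z_vec))

eps = 1e-6
numeric = np.zeros_like(z)
for j in range(z.size):
    step = np.zeros_like(z)
    step[j] = eps
    numeric[j] = (toy_loss(z + step) - toy_loss(z - step)) / (2 * eps)

print(dz)
print(numeric)
```

<p>Because each unit's activation depends only on its own input, no matrix product is needed at this step; the two printed vectors should agree up to finite-difference error.</p>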
<p>The parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1542"><span class="MJXp-msubsup" id="MJXp-Span-1543"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1544" style="margin-right: 0.05em;">b</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1546"><span class="MJXp-mo" id="MJXp-Span-1547">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1548">l</span><span class="MJXp-mo" id="MJXp-Span-1549">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1545">j</span></span></span></span></span></span></span></span><span id="MathJax-Element-115-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-115">b_j^{[l]}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1550"><span class="MJXp-msubsup" id="MJXp-Span-1551"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1552" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1556"><span class="MJXp-mo" id="MJXp-Span-1557">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1558">l</span><span class="MJXp-mo" id="MJXp-Span-1559">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-1553"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1554">j</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1555">k</span></span></span></span></span></span></span></span></span><span id="MathJax-Element-116-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" 
tabindex="0"></span><script type="math/tex" id="MathJax-Element-116">W_{jk}^{[l]}</script> affect the loss through <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1560"><span class="MJXp-msubsup" id="MJXp-Span-1561"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1562" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1564"><span class="MJXp-mo" id="MJXp-Span-1565">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1566">l</span><span class="MJXp-mo" id="MJXp-Span-1567">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1563">j</span></span></span></span></span></span></span></span><span id="MathJax-Element-117-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-117">z_j^{[l]}</script> (see figure above). 
For <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1568"><span class="MJXp-msubsup" id="MJXp-Span-1569"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1570" style="margin-right: 0.05em;">b</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1572"><span class="MJXp-mo" id="MJXp-Span-1573">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1574">l</span><span class="MJXp-mo" id="MJXp-Span-1575">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1571">j</span></span></span></span></span></span></span></span><span id="MathJax-Element-118-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-118">b_j^{[l]}</script>, the chain rule gives:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1576"><span class="MJXp-mtable" id="MJXp-Span-1577"><span><span class="MJXp-mtr" id="MJXp-Span-1578" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1579" style="text-align: right;"><span class="MJXp-mfrac" id="MJXp-Span-1580" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1581">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1582"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1583" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1584" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1585">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1586">i</span><span class="MJXp-mo" id="MJXp-Span-1587">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1588">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1589"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1590" style="margin-right: 0.05em;">b</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1592"><span class="MJXp-mo" id="MJXp-Span-1593">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1594">l</span><span class="MJXp-mo" id="MJXp-Span-1595">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1591">j</span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1596" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" 
id="MJXp-Span-1597"></span><span class="MJXp-mo" id="MJXp-Span-1598" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1599" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1600">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1601"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1602" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1603" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1604">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1605">i</span><span class="MJXp-mo" id="MJXp-Span-1606">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1607">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1608"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1609" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1611"><span class="MJXp-mo" id="MJXp-Span-1612">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1613">l</span><span class="MJXp-mo" id="MJXp-Span-1614">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1610">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-1615" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1616">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1617"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1618" style="margin-right: 0.05em;">z</span><span 
class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1620"><span class="MJXp-mo" id="MJXp-Span-1621">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1622">l</span><span class="MJXp-mo" id="MJXp-Span-1623">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1619">j</span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1624">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1625"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1626" style="margin-right: 0.05em;">b</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1628"><span class="MJXp-mo" id="MJXp-Span-1629">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1630">l</span><span class="MJXp-mo" id="MJXp-Span-1631">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1627">j</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-1632" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1633" style="padding-top: 0.3em; text-align: right;"></span><span class="MJXp-mtd" id="MJXp-Span-1634" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1635"></span><span class="MJXp-mo" id="MJXp-Span-1636" style="margin-left: 0.333em; margin-right: 
0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1637" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1638">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1639"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1640" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1641" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1642">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1643">i</span><span class="MJXp-mo" id="MJXp-Span-1644">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1645">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1646"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1647" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1649"><span class="MJXp-mo" id="MJXp-Span-1650">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1651">l</span><span class="MJXp-mo" id="MJXp-Span-1652">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1648">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1653" style="margin-left: 0.267em; margin-right: 0.267em;">⋅</span><span class="MJXp-mn" id="MJXp-Span-1654">1</span></span></span><span class="MJXp-mtr" id="MJXp-Span-1655" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1656" style="padding-top: 0.3em; text-align: right;"></span><span class="MJXp-mtd" id="MJXp-Span-1657" style="padding-left: 0em; 
padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1658"></span><span class="MJXp-mo" id="MJXp-Span-1659" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1660" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1661">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1662"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1663" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1664" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1665">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1666">i</span><span class="MJXp-mo" id="MJXp-Span-1667">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1668">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1669"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1670" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1672"><span class="MJXp-mo" id="MJXp-Span-1673">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1674">l</span><span class="MJXp-mo" id="MJXp-Span-1675">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1671">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1676" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-119-Frame" 
class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-119">% <![CDATA[
\begin{aligned}
\frac{\partial \ell^{(i)}}{\partial b_j^{[l]}} &= \frac{\partial \ell^{(i)}}{\partial z_j^{[l]}}\frac{\partial z_j^{[l]}}{\partial b_j^{[l]}} \\
&= \frac{\partial \ell^{(i)}}{\partial z_j^{[l]}} \cdot 1 \\
&= \frac{\partial \ell^{(i)}}{\partial z_j^{[l]}}.
\end{aligned} %]]></script>
<p>This means <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1677"><span class="MJXp-msubsup" id="MJXp-Span-1678"><span class="MJXp-mi" id="MJXp-Span-1679" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1680" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1681"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1682" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1683" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1684">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1685">l</span><span class="MJXp-mo" id="MJXp-Span-1686">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1687"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1688" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1689" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1690">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1691">i</span><span class="MJXp-mo" id="MJXp-Span-1692">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-1693" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1694"><span class="MJXp-mi" id="MJXp-Span-1695" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1696" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1697"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1698" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1699" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1700">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1701">l</span><span class="MJXp-mo" id="MJXp-Span-1702">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1703"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1704" 
style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1705" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1706">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1707">i</span><span class="MJXp-mo" id="MJXp-Span-1708">)</span></span></span></span></span><span id="MathJax-Element-120-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-120">\nabla_{b^{[l]}} \ell^{(i)} = \nabla_{z^{[l]}} \ell^{(i)}</script>.</p>
<p>Similarly for <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1709"><span class="MJXp-msubsup" id="MJXp-Span-1710"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1711" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1715"><span class="MJXp-mo" id="MJXp-Span-1716">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1717">l</span><span class="MJXp-mo" id="MJXp-Span-1718">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-1712"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1713">j</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1714">k</span></span></span></span></span></span></span></span></span><span id="MathJax-Element-121-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-121">W_{jk}^{[l]}</script>:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1719"><span class="MJXp-mtable" id="MJXp-Span-1720"><span><span class="MJXp-mtr" id="MJXp-Span-1721" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1722" style="text-align: right;"><span class="MJXp-mfrac" id="MJXp-Span-1723" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1724">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1725"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1726" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1727" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1728">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1729">i</span><span class="MJXp-mo" id="MJXp-Span-1730">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1731">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1732"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1733" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1737"><span class="MJXp-mo" id="MJXp-Span-1738">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1739">l</span><span class="MJXp-mo" id="MJXp-Span-1740">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-1734"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1735">j</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-1736">k</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-1741" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1742"></span><span class="MJXp-mo" id="MJXp-Span-1743" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1744" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1745">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1746"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1747" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1748" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1749">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1750">i</span><span class="MJXp-mo" id="MJXp-Span-1751">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1752">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1753"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1754" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1756"><span class="MJXp-mo" id="MJXp-Span-1757">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1758">l</span><span class="MJXp-mo" id="MJXp-Span-1759">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1755">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-1760" style="vertical-align: 0.25em;"><span 
class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1761">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1762"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1763" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1765"><span class="MJXp-mo" id="MJXp-Span-1766">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1767">l</span><span class="MJXp-mo" id="MJXp-Span-1768">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1764">j</span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1769">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1770"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1771" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1775"><span class="MJXp-mo" id="MJXp-Span-1776">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1777">l</span><span class="MJXp-mo" id="MJXp-Span-1778">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-1772"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1773">j</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1774">k</span></span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-1779" style="vertical-align: baseline;"><span class="MJXp-mtd" 
id="MJXp-Span-1780" style="padding-top: 0.3em; text-align: right;"></span><span class="MJXp-mtd" id="MJXp-Span-1781" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1782"></span><span class="MJXp-mo" id="MJXp-Span-1783" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1784" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1785">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1786"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1787" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1788" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1789">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1790">i</span><span class="MJXp-mo" id="MJXp-Span-1791">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1792">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1793"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1794" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1796"><span class="MJXp-mo" id="MJXp-Span-1797">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1798">l</span><span class="MJXp-mo" id="MJXp-Span-1799">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1795">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1800"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1801" 
style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1803"><span class="MJXp-mo" id="MJXp-Span-1804">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1805">l</span><span class="MJXp-mo" id="MJXp-Span-1806">−</span><span class="MJXp-mn" id="MJXp-Span-1807">1</span><span class="MJXp-mo" id="MJXp-Span-1808">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1802">k</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1809" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-122-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-122">% <![CDATA[
\begin{aligned}
\frac{\partial \ell^{(i)}}{\partial W_{jk}^{[l]}} &= \frac{\partial \ell^{(i)}}{\partial z_j^{[l]}} \frac{\partial z_j^{[l]}}{\partial W_{jk}^{[l]}} \\
&= \frac{\partial \ell^{(i)}}{\partial z_j^{[l]}} a_k^{[l-1]}.
\end{aligned} %]]></script>
<p>Putting these derivatives into a gradient matrix, we have</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1810"><span class="MJXp-msubsup" id="MJXp-Span-1811"><span class="MJXp-mi" id="MJXp-Span-1812" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1813" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1814"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1815" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1816" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1817">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1818">l</span><span class="MJXp-mo" id="MJXp-Span-1819">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1820"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1821" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1822" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1823">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1824">i</span><span class="MJXp-mo" id="MJXp-Span-1825">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-1826" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-1827"><span class="MJXp-mi" id="MJXp-Span-1828" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1829" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-1830"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1831" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1832" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1833">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1834">l</span><span class="MJXp-mo" id="MJXp-Span-1835">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1836"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1837" style="margin-right: 
0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1838" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1839">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1840">i</span><span class="MJXp-mo" id="MJXp-Span-1841">)</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-1842"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1843" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1844" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1845">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1846">l</span><span class="MJXp-mo" id="MJXp-Span-1847">−</span><span class="MJXp-mn" id="MJXp-Span-1848">1</span><span class="MJXp-mo" id="MJXp-Span-1849">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1850">T</span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-123-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-123">\nabla_{W^{[l]}} \ell^{(i)} = \nabla_{z^{[l]}} \ell^{(i)} a^{[l-1]T}</script>
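<p>(Recall that a column vector times a row vector is a matrix, the outer product: its (j, k) entry is the j-th entry of the column times the k-th entry of the row, which matches the componentwise formula above.)</p>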
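<p>As a minimal sketch of the two parameter-gradient formulas above, assuming NumPy and 1-D arrays for a single training example: the helper name <code>layer_param_grads</code> and the argument names <code>dz</code> and <code>a_prev</code> are hypothetical, chosen here for illustration and not taken from the post's own implementation.</p>

```python
import numpy as np


def layer_param_grads(dz, a_prev):
    """Per-example parameter gradients for one layer (illustrative helper).

    dz     : gradient of the loss with respect to z^[l], length n_l
    a_prev : activations a^[l-1] of the previous layer, length n_(l-1)
    """
    dz = np.asarray(dz, dtype=float).reshape(-1)
    a_prev = np.asarray(a_prev, dtype=float).reshape(-1)

    # Bias gradient: each z_j depends on b_j with derivative 1,
    # so the gradient with respect to b equals the gradient with respect to z.
    db = dz.copy()

    # Weight gradient: entry (j, k) is dz[j] * a_prev[k],
    # i.e. the outer product of a column (dz) with a row (a_prev).
    dW = np.outer(dz, a_prev)
    return dW, db


# Small made-up example: 2 units in layer l, 3 units in layer l-1.
dz = np.array([0.5, -1.0])
a_prev = np.array([1.0, 2.0, 3.0])
dW, db = layer_param_grads(dz, a_prev)
print(dW.shape)  # (2, 3), the same shape as W^[l]
```

<p>The outer product gives a gradient with the same shape as the weight matrix, so it can be subtracted from the weights directly in a gradient step.</p>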
<h4 id="previous-layer-derivatives">Previous layer derivatives</h4>
<p>The figure below shows how the previous layer <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1851"><span class="MJXp-mo" id="MJXp-Span-1852" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1853">l</span><span class="MJXp-mo" id="MJXp-Span-1854" style="margin-left: 0.267em; margin-right: 0.267em;">−</span><span class="MJXp-mn" id="MJXp-Span-1855">1</span><span class="MJXp-mo" id="MJXp-Span-1856" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-124-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-124">(l-1)</script> affects the loss.</p>
<p><img src="./Implementing a neural network in Python _ statsandstuff_files/backprop_prevoutput.png" alt="Diagram showing how the previous layer (l-1) affects the loss"></p>
<p>A particular unit <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1857"><span class="MJXp-msubsup" id="MJXp-Span-1858"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1859" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1861"><span class="MJXp-mo" id="MJXp-Span-1862">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1863">l</span><span class="MJXp-mo" id="MJXp-Span-1864">−</span><span class="MJXp-mn" id="MJXp-Span-1865">1</span><span class="MJXp-mo" id="MJXp-Span-1866">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1860">j</span></span></span></span></span></span></span></span><span id="MathJax-Element-125-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-125">a_j^{[l-1]}</script> in the previous layer affects the loss through every unit in the current layer <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-1867"><span class="MJXp-msubsup" id="MJXp-Span-1868"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1869" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1871"><span class="MJXp-mo" id="MJXp-Span-1872">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1873">l</span><span class="MJXp-mo" id="MJXp-Span-1874">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1870">1</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1875" 
style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1876"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1877" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1879"><span class="MJXp-mo" id="MJXp-Span-1880">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1881">l</span><span class="MJXp-mo" id="MJXp-Span-1882">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1878">2</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1883" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-mo" id="MJXp-Span-1884" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-1885" style="margin-left: 0em; margin-right: 0.222em;">,</span><span class="MJXp-msubsup" id="MJXp-Span-1886"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1887" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 2.132em; vertical-align: -0.912em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1892"><span class="MJXp-mo" id="MJXp-Span-1893">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1894">l</span><span class="MJXp-mo" id="MJXp-Span-1895">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-1888"><span class="MJXp-msubsup" id="MJXp-Span-1889"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1890" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-1891" style="vertical-align: -0.4em;">l</span></span></span></span></span></span></span></span></span></span><span 
id="MathJax-Element-126-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-126">z_1^{[l]}, z_2^{[l]}, \ldots, z_{n_l}^{[l]}</script>.</p>
<p>The multivariate chain rule therefore gives</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-1896"><span class="MJXp-mtable" id="MJXp-Span-1897"><span><span class="MJXp-mtr" id="MJXp-Span-1898" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-1899" style="text-align: right;"><span class="MJXp-mfrac" id="MJXp-Span-1900" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1901">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1902"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1903" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1904" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1905">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1906">i</span><span class="MJXp-mo" id="MJXp-Span-1907">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1908">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1909"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1910" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1912"><span class="MJXp-mo" id="MJXp-Span-1913">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1914">l</span><span class="MJXp-mo" id="MJXp-Span-1915">−</span><span class="MJXp-mn" id="MJXp-Span-1916">1</span><span class="MJXp-mo" id="MJXp-Span-1917">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1911">j</span></span></span></span></span></span></span></span></span></span></span></span><span 
class="MJXp-mtd" id="MJXp-Span-1918" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-1919"></span><span class="MJXp-mo" id="MJXp-Span-1920" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-1921" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1922">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1923"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1924" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1925" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1926">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1927">i</span><span class="MJXp-mo" id="MJXp-Span-1928">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1929">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1930"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1931" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1933"><span class="MJXp-mo" id="MJXp-Span-1934">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1935">l</span><span class="MJXp-mo" id="MJXp-Span-1936">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1932">1</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-1937" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1938">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1939"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-1940" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1942"><span class="MJXp-mo" id="MJXp-Span-1943">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1944">l</span><span class="MJXp-mo" id="MJXp-Span-1945">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1941">1</span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1946">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1947"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1948" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1950"><span class="MJXp-mo" id="MJXp-Span-1951">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1952">l</span><span class="MJXp-mo" id="MJXp-Span-1953">−</span><span class="MJXp-mn" id="MJXp-Span-1954">1</span><span class="MJXp-mo" id="MJXp-Span-1955">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1949">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1956" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mfrac" id="MJXp-Span-1957" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1958">∂</span><span 
class="MJXp-msubsup" id="MJXp-Span-1959"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1960" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1961" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-1962">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1963">i</span><span class="MJXp-mo" id="MJXp-Span-1964">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1965">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1966"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1967" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1969"><span class="MJXp-mo" id="MJXp-Span-1970">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1971">l</span><span class="MJXp-mo" id="MJXp-Span-1972">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1968">2</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-1973" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1974">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1975"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1976" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1978"><span class="MJXp-mo" id="MJXp-Span-1979">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1980">l</span><span 
class="MJXp-mo" id="MJXp-Span-1981">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-1977">2</span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1982">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1983"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1984" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-1986"><span class="MJXp-mo" id="MJXp-Span-1987">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1988">l</span><span class="MJXp-mo" id="MJXp-Span-1989">−</span><span class="MJXp-mn" id="MJXp-Span-1990">1</span><span class="MJXp-mo" id="MJXp-Span-1991">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1985">j</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-1992" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mo" id="MJXp-Span-1993" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-1994" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mfrac" id="MJXp-Span-1995" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-1996">∂</span><span class="MJXp-msubsup" id="MJXp-Span-1997"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-1998" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-1999" 
style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2000">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2001">i</span><span class="MJXp-mo" id="MJXp-Span-2002">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2003">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2004"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2005" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 2.132em; vertical-align: -0.912em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2010"><span class="MJXp-mo" id="MJXp-Span-2011">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2012">l</span><span class="MJXp-mo" id="MJXp-Span-2013">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2006"><span class="MJXp-msubsup" id="MJXp-Span-2007"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2008" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-2009" style="vertical-align: -0.4em;">l</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mfrac" id="MJXp-Span-2014" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2015">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2016"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2017" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 2.132em; vertical-align: -0.912em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2022"><span class="MJXp-mo" 
id="MJXp-Span-2023">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2024">l</span><span class="MJXp-mo" id="MJXp-Span-2025">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2018"><span class="MJXp-msubsup" id="MJXp-Span-2019"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2020" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-2021" style="vertical-align: -0.4em;">l</span></span></span></span></span></span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2026">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2027"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2028" style="margin-right: 0.05em;">a</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2030"><span class="MJXp-mo" id="MJXp-Span-2031">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2032">l</span><span class="MJXp-mo" id="MJXp-Span-2033">−</span><span class="MJXp-mn" id="MJXp-Span-2034">1</span><span class="MJXp-mo" id="MJXp-Span-2035">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2029">j</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2036" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2037" style="padding-top: 0.3em; text-align: right;"></span><span class="MJXp-mtd" id="MJXp-Span-2038" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span 
class="MJXp-mi" id="MJXp-Span-2039"></span><span class="MJXp-mo" id="MJXp-Span-2040" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-2041" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2042">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2043"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2044" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2045" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2046">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2047">i</span><span class="MJXp-mo" id="MJXp-Span-2048">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2049">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2050"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2051" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2053"><span class="MJXp-mo" id="MJXp-Span-2054">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2055">l</span><span class="MJXp-mo" id="MJXp-Span-2056">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-2052">1</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2057"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2058" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span 
class="MJXp-mrow" id="MJXp-Span-2062"><span class="MJXp-mo" id="MJXp-Span-2063">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2064">l</span><span class="MJXp-mo" id="MJXp-Span-2065">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2059"><span class="MJXp-mn" id="MJXp-Span-2060">1</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2061">j</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-2066" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mfrac" id="MJXp-Span-2067" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2068">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2069"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2070" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2071" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2072">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2073">i</span><span class="MJXp-mo" id="MJXp-Span-2074">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2075">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2076"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2077" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2079"><span class="MJXp-mo" id="MJXp-Span-2080">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2081">l</span><span class="MJXp-mo" id="MJXp-Span-2082">]</span></span></span></span></span><span class=" 
MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mn" id="MJXp-Span-2078">2</span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2083"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2084" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 1.86em; vertical-align: -0.64em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2088"><span class="MJXp-mo" id="MJXp-Span-2089">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2090">l</span><span class="MJXp-mo" id="MJXp-Span-2091">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2085"><span class="MJXp-mn" id="MJXp-Span-2086">2</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2087">j</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-2092" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mo" id="MJXp-Span-2093" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-2094" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mfrac" id="MJXp-Span-2095" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2096">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2097"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2098" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2099" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2100">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2101">i</span><span class="MJXp-mo" id="MJXp-Span-2102">)</span></span></span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; 
margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi" id="MJXp-Span-2103">∂</span><span class="MJXp-msubsup" id="MJXp-Span-2104"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2105" style="margin-right: 0.05em;">z</span><span class="MJXp-script-box" style="height: 2.132em; vertical-align: -0.912em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2110"><span class="MJXp-mo" id="MJXp-Span-2111">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2112">l</span><span class="MJXp-mo" id="MJXp-Span-2113">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2106"><span class="MJXp-msubsup" id="MJXp-Span-2107"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2108" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-2109" style="vertical-align: -0.4em;">l</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2114"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2115" style="margin-right: 0.05em;">W</span><span class="MJXp-script-box" style="height: 2.132em; vertical-align: -0.912em;"><span class=" MJXp-script"><span><span style="margin-bottom: -0.25em;"><span class="MJXp-mrow" id="MJXp-Span-2121"><span class="MJXp-mo" id="MJXp-Span-2122">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2123">l</span><span class="MJXp-mo" id="MJXp-Span-2124">]</span></span></span></span></span><span class=" MJXp-script"><span><span style="margin-top: -0.85em;"><span class="MJXp-mrow" id="MJXp-Span-2116"><span class="MJXp-msubsup" id="MJXp-Span-2117"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2118" style="margin-right: 0.05em;">n</span><span class="MJXp-mi MJXp-italic MJXp-script" id="MJXp-Span-2119" style="vertical-align: -0.4em;">l</span></span><span class="MJXp-mi 
MJXp-italic" id="MJXp-Span-2120">j</span></span></span></span></span></span></span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-127-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-127">% <![CDATA[
\begin{aligned}
\frac{\partial \ell^{(i)}}{\partial a_j^{[l-1]}} &= \frac{\partial \ell^{(i)}}{\partial z_1^{[l]}} \frac{\partial z_1^{[l]}}{\partial a_j^{[l-1]}} + \frac{\partial \ell^{(i)}}{\partial z_2^{[l]}} \frac{\partial z_2^{[l]}}{\partial a_j^{[l-1]}} + \ldots + \frac{\partial \ell^{(i)}}{\partial z_{n_l}^{[l]}} \frac{\partial z_{n_l}^{[l]}}{\partial a_j^{[l-1]}} \\
&= \frac{\partial \ell^{(i)}}{\partial z_1^{[l]}} W_{1j}^{[l]} + \frac{\partial \ell^{(i)}}{\partial z_2^{[l]}} W_{2j}^{[l]} + \ldots + \frac{\partial \ell^{(i)}}{\partial z_{n_l}^{[l]}} W_{n_lj}^{[l]}
\end{aligned} %]]></script>
<p>Stacking these partial derivatives, one for each component of the previous layer's activation, into a vector, we have</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2125"><span class="MJXp-msubsup" id="MJXp-Span-2126"><span class="MJXp-mi" id="MJXp-Span-2127" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2128" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2129"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2130" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2131" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2132">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2133">l</span><span class="MJXp-mo" id="MJXp-Span-2134">−</span><span class="MJXp-mn" id="MJXp-Span-2135">1</span><span class="MJXp-mo" id="MJXp-Span-2136">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2137"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2138" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2139" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2140">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2141">i</span><span class="MJXp-mo" id="MJXp-Span-2142">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-2143" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2144"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2145" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2146" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2147">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2148">l</span><span class="MJXp-mo" id="MJXp-Span-2149">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2150">T</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2151"><span class="MJXp-mi" id="MJXp-Span-2152" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2153" 
style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2154"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2155" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2156" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2157">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2158">l</span><span class="MJXp-mo" id="MJXp-Span-2159">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2160"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2161" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2162" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2163">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2164">i</span><span class="MJXp-mo" id="MJXp-Span-2165">)</span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-128-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-128">\nabla_{a^{[l-1]}} \ell^{(i)} = W^{[l]T} \nabla_{z^{[l]}} \ell^{(i)}</script>
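<p>As a quick sanity check, the component-wise sum and the matrix form above can be compared numerically. The following is a minimal sketch, assuming NumPy; the layer sizes and the random stand-in for the gradient with respect to z are made up for illustration and are not taken from the post.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

n_prev, n_l = 3, 4                      # hypothetical sizes of layers l-1 and l
W = rng.standard_normal((n_l, n_prev))  # W^{[l]}, so that z^{[l]} = W^{[l]} a^{[l-1]} + b^{[l]}
dz = rng.standard_normal(n_l)           # stand-in for the gradient of the loss w.r.t. z^{[l]}

# Component form: d loss / d a_j = sum over k of (d loss / d z_k) * W_kj
da_loop = np.array(
    [sum(dz[k] * W[k, j] for k in range(n_l)) for j in range(n_prev)]
)

# Vector form: W^T times the gradient w.r.t. z
da_vec = W.T @ dz

print(np.allclose(da_loop, da_vec))
```

<p>Each entry of the result pairs column j of the weight matrix with the gradient with respect to z, which is why the transpose appears in the vector form.</p>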
<h4 id="back-propagation-on-a-batch">Back propagation on a batch</h4>
<p>Summarizing the derivatives from the previous sections, for a single observation we have:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2166"><span class="MJXp-mtable" id="MJXp-Span-2167"><span><span class="MJXp-mtr" id="MJXp-Span-2168" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2169" style="text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2170"><span class="MJXp-mi" id="MJXp-Span-2171" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2172" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2173"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2174" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2175" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2176">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2177">l</span><span class="MJXp-mo" id="MJXp-Span-2178">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2179"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2180" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2181" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2182">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2183">i</span><span class="MJXp-mo" id="MJXp-Span-2184">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2185" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2186"></span><span class="MJXp-mo" id="MJXp-Span-2187" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2188"><span class="MJXp-mi" id="MJXp-Span-2189" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2190" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2191"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2192" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2193" 
style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2194">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2195">l</span><span class="MJXp-mo" id="MJXp-Span-2196">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2197"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2198" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2199" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2200">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2201">i</span><span class="MJXp-mo" id="MJXp-Span-2202">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-2203" style="margin-left: 0.267em; margin-right: 0.267em;">∗</span><span class="MJXp-mo" id="MJXp-Span-2204" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2205"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2206" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2207" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2208">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2209">l</span><span class="MJXp-mo" id="MJXp-Span-2210">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-2211"><span class="MJXp-mo" id="MJXp-Span-2212" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-2213" style="vertical-align: 0.5em;">′</span></span><span class="MJXp-mo" id="MJXp-Span-2214" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2215"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2216" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2217" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2218">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2219">l</span><span class="MJXp-mo" id="MJXp-Span-2220">]</span></span></span><span class="MJXp-mo" 
id="MJXp-Span-2221" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span class="MJXp-mtr" id="MJXp-Span-2222" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2223" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2224"><span class="MJXp-mi" id="MJXp-Span-2225" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2226" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2227"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2228" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2229" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2230">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2231">l</span><span class="MJXp-mo" id="MJXp-Span-2232">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2233"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2234" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2235" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2236">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2237">i</span><span class="MJXp-mo" id="MJXp-Span-2238">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2239" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2240"></span><span class="MJXp-mo" id="MJXp-Span-2241" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2242"><span class="MJXp-mi" id="MJXp-Span-2243" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2244" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2245"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2246" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2247" style="vertical-align: 
0.5em;"><span class="MJXp-mo" id="MJXp-Span-2248">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2249">l</span><span class="MJXp-mo" id="MJXp-Span-2250">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2251"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2252" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2253" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2254">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2255">i</span><span class="MJXp-mo" id="MJXp-Span-2256">)</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2257" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2258" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2259"><span class="MJXp-mi" id="MJXp-Span-2260" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2261" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2262"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2263" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2264" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2265">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2266">l</span><span class="MJXp-mo" id="MJXp-Span-2267">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2268"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2269" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2270" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2271">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2272">i</span><span class="MJXp-mo" id="MJXp-Span-2273">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2274" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2275"></span><span class="MJXp-mo" 
id="MJXp-Span-2276" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2277"><span class="MJXp-mi" id="MJXp-Span-2278" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2279" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2280"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2281" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2282" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2283">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2284">l</span><span class="MJXp-mo" id="MJXp-Span-2285">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2286"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2287" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2288" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2289">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2290">i</span><span class="MJXp-mo" id="MJXp-Span-2291">)</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2292"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2293" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2294" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2295">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2296">l</span><span class="MJXp-mo" id="MJXp-Span-2297">−</span><span class="MJXp-mn" id="MJXp-Span-2298">1</span><span class="MJXp-mo" id="MJXp-Span-2299">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2300">T</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2301" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2302" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2303"><span class="MJXp-mi" id="MJXp-Span-2304" style="margin-right: 0.05em;">∇</span><span 
class="MJXp-mrow MJXp-script" id="MJXp-Span-2305" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2306"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2307" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2308" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2309">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2310">l</span><span class="MJXp-mo" id="MJXp-Span-2311">−</span><span class="MJXp-mn" id="MJXp-Span-2312">1</span><span class="MJXp-mo" id="MJXp-Span-2313">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2314"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2315" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2316" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2317">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2318">i</span><span class="MJXp-mo" id="MJXp-Span-2319">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2320" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2321"></span><span class="MJXp-mo" id="MJXp-Span-2322" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2323"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2324" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2325" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2326">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2327">l</span><span class="MJXp-mo" id="MJXp-Span-2328">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2329">T</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2330"><span class="MJXp-mi" id="MJXp-Span-2331" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2332" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2333"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-2334" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2335" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2336">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2337">l</span><span class="MJXp-mo" id="MJXp-Span-2338">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2339"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2340" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2341" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2342">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2343">i</span><span class="MJXp-mo" id="MJXp-Span-2344">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-2345" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-129-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-129">% <![CDATA[
\begin{aligned}
\nabla_{z^{[l]}} \ell^{(i)} &= \nabla_{a^{[l]}} \ell^{(i)} * (g^{[l]})'(z^{[l]}) \\
\nabla_{b^{[l]}} \ell^{(i)} &= \nabla_{z^{[l]}} \ell^{(i)} \\
\nabla_{W^{[l]}} \ell^{(i)} &= \nabla_{z^{[l]}} \ell^{(i)} a^{[l-1]T} \\
\nabla_{a^{[l-1]}} \ell^{(i)} &= W^{[l]T} \nabla_{z^{[l]}} \ell^{(i)}.
\end{aligned} %]]></script>
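<p>The four single-observation formulas above translate almost line for line into NumPy. The sketch below is illustrative only: the function name <code>backward_step</code>, the sigmoid activation, and the toy loss (half the squared norm of the layer's activation) are assumptions chosen so the result can be checked against a finite difference.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backward_step(da, z, a_prev, W, g_prime):
    """One layer of backpropagation for a single observation (column vectors)."""
    dz = da * g_prime(z)   # elementwise product with the activation derivative
    db = dz                # the bias gradient equals the gradient w.r.t. z
    dW = dz @ a_prev.T     # outer product, shape (n_l, n_prev)
    da_prev = W.T @ dz     # gradient passed back to layer l-1
    return dz, db, dW, da_prev

rng = np.random.default_rng(1)
n_prev, n_l = 3, 2         # hypothetical layer sizes
W = rng.standard_normal((n_l, n_prev))
b = rng.standard_normal((n_l, 1))
a_prev = rng.standard_normal((n_prev, 1))

# Forward pass through the layer
z = W @ a_prev + b
a = sigmoid(z)

# Toy loss 0.5 * ||a||^2, whose gradient with respect to a is a itself
da = a
dz, db, dW, da_prev = backward_step(da, z, a_prev, W, sigmoid_prime)

def loss(W_):
    return 0.5 * np.sum(sigmoid(W_ @ a_prev + b) ** 2)

# Central finite difference on a single weight entry
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[0, 1] += eps
W_minus[0, 1] -= eps
numeric = (loss(W_plus) - loss(W_minus)) / (2 * eps)

print(np.isclose(numeric, dW[0, 1], atol=1e-6))
```

<p>Keeping every quantity as a column vector makes the outer product in the weight gradient come out with the same shape as the weight matrix itself.</p>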
<p>As before, we consider a batch consisting of the first <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2346"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2347">m</span></span></span><span id="MathJax-Element-130-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-130">m</script> observations and let <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2348"><span class="MJXp-msubsup" id="MJXp-Span-2349"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2350" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2351" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2352">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2353">l</span><span class="MJXp-mo" id="MJXp-Span-2354">]</span></span></span></span></span><span id="MathJax-Element-131-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-131">Z^{[l]}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2355"><span class="MJXp-msubsup" id="MJXp-Span-2356"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2357" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2358" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2359">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2360">l</span><span class="MJXp-mo" id="MJXp-Span-2361">]</span></span></span></span></span><span id="MathJax-Element-132-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-132">A^{[l]}</script> be matrices whose <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2362"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2363">i</span></span></span><span 
id="MathJax-Element-133-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-133">i</script>th columns are the network values in layer <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2364"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2365">l</span></span></span><span id="MathJax-Element-134-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-134">l</script> evaluated on the <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2366"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2367">i</span></span></span><span id="MathJax-Element-135-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-135">i</script>th observation.</p>
<p>We also let</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2368"><span class="MJXp-mtable" id="MJXp-Span-2369"><span><span class="MJXp-mtr" id="MJXp-Span-2370" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2371" style="text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2372">d</span><span class="MJXp-msubsup" id="MJXp-Span-2373"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2374" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2375" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2376">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2377">l</span><span class="MJXp-mo" id="MJXp-Span-2378">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2379" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mrow" id="MJXp-Span-2380"><span class="MJXp-mo" id="MJXp-Span-2381" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.267em;"><span class="MJXp-right MJXp-scale7" style="font-size: 2.067em; margin-left: -0.05em;">[</span></span><span class="MJXp-mtable" id="MJXp-Span-2382"><span><span class="MJXp-mtr" id="MJXp-Span-2383" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2384" style="text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2385"><span class="MJXp-mi" id="MJXp-Span-2386" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2387" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2388"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2389" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2390" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2391">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2392">l</span><span class="MJXp-mo" id="MJXp-Span-2393">]</span><span class="MJXp-mo" id="MJXp-Span-2394">(</span><span 
class="MJXp-mn" id="MJXp-Span-2395">1</span><span class="MJXp-mo" id="MJXp-Span-2396">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2397"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2398" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2399" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2400">(</span><span class="MJXp-mn" id="MJXp-Span-2401">1</span><span class="MJXp-mo" id="MJXp-Span-2402">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2403" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2404"><span class="MJXp-mi" id="MJXp-Span-2405" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2406" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2407"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2408" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2409" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2410">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2411">l</span><span class="MJXp-mo" id="MJXp-Span-2412">]</span><span class="MJXp-mo" id="MJXp-Span-2413">(</span><span class="MJXp-mn" id="MJXp-Span-2414">2</span><span class="MJXp-mo" id="MJXp-Span-2415">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2416"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2417" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2418" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2419">(</span><span class="MJXp-mn" id="MJXp-Span-2420">2</span><span class="MJXp-mo" id="MJXp-Span-2421">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2422" style="padding-left: 1em; text-align: center;"><span class="MJXp-mo" id="MJXp-Span-2423" style="margin-left: 0em; margin-right: 0em;">…</span></span><span 
class="MJXp-mtd" id="MJXp-Span-2424" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2425"><span class="MJXp-mi" id="MJXp-Span-2426" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2427" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2428"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2429" style="margin-right: 0.05em;">z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2430" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2431">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2432">l</span><span class="MJXp-mo" id="MJXp-Span-2433">]</span><span class="MJXp-mo" id="MJXp-Span-2434">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2435">m</span><span class="MJXp-mo" id="MJXp-Span-2436">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2437"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2438" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2439" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2440">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2441">m</span><span class="MJXp-mo" id="MJXp-Span-2442">)</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-2443" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.267em;"><span class="MJXp-right MJXp-scale7" style="font-size: 2.067em; margin-left: -0.05em;">]</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2444" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2445" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2446">d</span><span class="MJXp-msubsup" id="MJXp-Span-2447"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2448" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2449" style="vertical-align: 
0.5em;"><span class="MJXp-mo" id="MJXp-Span-2450">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2451">l</span><span class="MJXp-mo" id="MJXp-Span-2452">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2453" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mrow" id="MJXp-Span-2454"><span class="MJXp-mo" id="MJXp-Span-2455" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.267em;"><span class="MJXp-right MJXp-scale7" style="font-size: 2.067em; margin-left: -0.05em;">[</span></span><span class="MJXp-mtable" id="MJXp-Span-2456"><span><span class="MJXp-mtr" id="MJXp-Span-2457" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2458" style="text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2459"><span class="MJXp-mi" id="MJXp-Span-2460" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2461" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2462"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2463" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2464" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2465">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2466">l</span><span class="MJXp-mo" id="MJXp-Span-2467">]</span><span class="MJXp-mo" id="MJXp-Span-2468">(</span><span class="MJXp-mn" id="MJXp-Span-2469">1</span><span class="MJXp-mo" id="MJXp-Span-2470">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2471"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2472" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2473" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2474">(</span><span class="MJXp-mn" id="MJXp-Span-2475">1</span><span class="MJXp-mo" id="MJXp-Span-2476">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2477" style="padding-left: 1em; 
text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2478"><span class="MJXp-mi" id="MJXp-Span-2479" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2480" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2481"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2482" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2483" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2484">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2485">l</span><span class="MJXp-mo" id="MJXp-Span-2486">]</span><span class="MJXp-mo" id="MJXp-Span-2487">(</span><span class="MJXp-mn" id="MJXp-Span-2488">2</span><span class="MJXp-mo" id="MJXp-Span-2489">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2490"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2491" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2492" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2493">(</span><span class="MJXp-mn" id="MJXp-Span-2494">2</span><span class="MJXp-mo" id="MJXp-Span-2495">)</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2496" style="padding-left: 1em; text-align: center;"><span class="MJXp-mo" id="MJXp-Span-2497" style="margin-left: 0em; margin-right: 0em;">…</span></span><span class="MJXp-mtd" id="MJXp-Span-2498" style="padding-left: 1em; text-align: center;"><span class="MJXp-msubsup" id="MJXp-Span-2499"><span class="MJXp-mi" id="MJXp-Span-2500" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2501" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2502"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2503" style="margin-right: 0.05em;">a</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2504" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2505">[</span><span class="MJXp-mi 
MJXp-italic" id="MJXp-Span-2506">l</span><span class="MJXp-mo" id="MJXp-Span-2507">]</span><span class="MJXp-mo" id="MJXp-Span-2508">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2509">m</span><span class="MJXp-mo" id="MJXp-Span-2510">)</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2511"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2512" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2513" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2514">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2515">m</span><span class="MJXp-mo" id="MJXp-Span-2516">)</span></span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-2517" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.267em;"><span class="MJXp-right MJXp-scale7" style="font-size: 2.067em; margin-left: -0.05em;">]</span></span></span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-136-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-136">% <![CDATA[
\begin{aligned}
dZ^{[l]} &= \begin{bmatrix} \nabla_{z^{[l](1)}} \ell^{(1)} & \nabla_{z^{[l](2)}} \ell^{(2)} & \ldots & \nabla_{z^{[l](m)}} \ell^{(m)} \end{bmatrix} \\
dA^{[l]} &= \begin{bmatrix} \nabla_{a^{[l](1)}} \ell^{(1)} & \nabla_{a^{[l](2)}} \ell^{(2)} & \ldots & \nabla_{a^{[l](m)}} \ell^{(m)} \end{bmatrix}
\end{aligned} %]]></script>
<p>be matrices of gradients. Notice that each column is the gradient of a different function <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2518"><span class="MJXp-msubsup" id="MJXp-Span-2519"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2520" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2521" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2522">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2523">i</span><span class="MJXp-mo" id="MJXp-Span-2524">)</span></span></span></span></span><span id="MathJax-Element-137-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-137">\ell^{(i)}</script>.</p>
<p>From the single-observation case and the definitions of <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2525"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2526">d</span><span class="MJXp-msubsup" id="MJXp-Span-2527"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2528" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2529" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2530">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2531">l</span><span class="MJXp-mo" id="MJXp-Span-2532">]</span></span></span></span></span><span id="MathJax-Element-138-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-138">dZ^{[l]}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2533"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2534">d</span><span class="MJXp-msubsup" id="MJXp-Span-2535"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2536" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2537" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2538">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2539">l</span><span class="MJXp-mo" id="MJXp-Span-2540">]</span></span></span></span></span><span id="MathJax-Element-139-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-139">dA^{[l]}</script>, applying the single-observation equations column by column gives</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2541"><span class="MJXp-mtable" id="MJXp-Span-2542"><span><span class="MJXp-mtr" id="MJXp-Span-2543" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2544" style="text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2545">d</span><span class="MJXp-msubsup" id="MJXp-Span-2546"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2547" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2548" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2549">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2550">l</span><span class="MJXp-mo" id="MJXp-Span-2551">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2552" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2553"></span><span class="MJXp-mo" id="MJXp-Span-2554" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2555">d</span><span class="MJXp-msubsup" id="MJXp-Span-2556"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2557" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2558" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2559">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2560">l</span><span class="MJXp-mo" id="MJXp-Span-2561">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2562" style="margin-left: 0.267em; margin-right: 0.267em;">∗</span><span class="MJXp-mo" id="MJXp-Span-2563" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2564"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2565" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2566" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2567">[</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-2568">l</span><span class="MJXp-mo" id="MJXp-Span-2569">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-2570"><span class="MJXp-mo" id="MJXp-Span-2571" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-2572" style="vertical-align: 0.5em;">′</span></span><span class="MJXp-mo" id="MJXp-Span-2573" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2574"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2575" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2576" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2577">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2578">l</span><span class="MJXp-mo" id="MJXp-Span-2579">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2580" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span class="MJXp-mtr" id="MJXp-Span-2581" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2582" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2583">d</span><span class="MJXp-msubsup" id="MJXp-Span-2584"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2585" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2586" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2587">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2588">l</span><span class="MJXp-mo" id="MJXp-Span-2589">−</span><span class="MJXp-mn" id="MJXp-Span-2590">1</span><span class="MJXp-mo" id="MJXp-Span-2591">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2592" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2593"></span><span class="MJXp-mo" id="MJXp-Span-2594" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2595"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-2596" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2597" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2598">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2599">l</span><span class="MJXp-mo" id="MJXp-Span-2600">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2601">T</span></span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2602">d</span><span class="MJXp-msubsup" id="MJXp-Span-2603"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2604" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2605" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2606">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2607">l</span><span class="MJXp-mo" id="MJXp-Span-2608">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2609" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-140-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-140">% <![CDATA[
\begin{aligned}
dZ^{[l]} &= dA^{[l]} * (g^{[l]})'(Z^{[l]}) \\
dA^{[l-1]} &= W^{[l]T} dZ^{[l]}.
\end{aligned} %]]></script>
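<p>The two batch equations above can be checked numerically. The following is a minimal NumPy sketch, not the post's implementation: the sigmoid activation, the layer sizes, and the random stand-in values for <code>W</code>, <code>Z</code> and <code>dA</code> are all assumptions made for illustration. It computes the batch form once and then repeats the single-observation equations on each column to confirm that the two agree.</p>

```python
import numpy as np

def sigmoid_prime(z):
    # Derivative of the sigmoid activation (an illustrative choice of g).
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

rng = np.random.default_rng(0)
n_prev, n_l, m = 4, 3, 5             # hypothetical layer sizes and batch size
W = rng.normal(size=(n_l, n_prev))   # stand-in for W^{[l]}
Z = rng.normal(size=(n_l, m))        # stand-in for Z^{[l]}, one column per observation
dA = rng.normal(size=(n_l, m))       # stand-in for dA^{[l]}

# Batch form: elementwise product, then a single matrix multiplication.
dZ = dA * sigmoid_prime(Z)
dA_prev = W.T @ dZ

# Column-by-column form: apply the single-observation equations m times.
dZ_cols = np.empty_like(dZ)
dA_prev_cols = np.empty((n_prev, m))
for i in range(m):
    dZ_cols[:, i] = dA[:, i] * sigmoid_prime(Z[:, i])
    dA_prev_cols[:, i] = W.T @ dZ_cols[:, i]

batch_matches_columns = bool(
    np.allclose(dZ, dZ_cols) and np.allclose(dA_prev, dA_prev_cols)
)
print(batch_matches_columns)
```

<p>The batch form replaces the Python loop with one elementwise product and one matrix product, which is the main reason to work with whole matrices rather than individual columns.</p>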
<p>For the parameters <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2610"><span class="MJXp-msubsup" id="MJXp-Span-2611"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2612" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2613" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2614">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2615">l</span><span class="MJXp-mo" id="MJXp-Span-2616">]</span></span></span></span></span><span id="MathJax-Element-141-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-141">b^{[l]}</script> and <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2617"><span class="MJXp-msubsup" id="MJXp-Span-2618"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2619" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2620" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2621">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2622">l</span><span class="MJXp-mo" id="MJXp-Span-2623">]</span></span></span></span></span><span id="MathJax-Element-142-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-142">W^{[l]}</script>, we are interested in the derivatives of the average batch loss <span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math" id="MJXp-Span-2624"><span class="MJXp-msubsup" id="MJXp-Span-2625"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2626" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2627" style="vertical-align: 0.5em;"><span class="MJXp-mtext" id="MJXp-Span-2628">batch</span></span></span><span class="MJXp-mo" id="MJXp-Span-2629" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" 
id="MJXp-Span-2630" style="vertical-align: 0.25em;"><span class="MJXp-box MJXp-script"><span class="MJXp-mn" id="MJXp-Span-2631">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box MJXp-script"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2632">m</span></span></span></span></span></span><span class="MJXp-mo" id="MJXp-Span-2633" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2634"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2635" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2636" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2637">(</span><span class="MJXp-mn" id="MJXp-Span-2638">1</span><span class="MJXp-mo" id="MJXp-Span-2639">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-2640" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-mo" id="MJXp-Span-2641" style="margin-left: 0em; margin-right: 0em;">…</span><span class="MJXp-mo" id="MJXp-Span-2642" style="margin-left: 0.267em; margin-right: 0.267em;">+</span><span class="MJXp-msubsup" id="MJXp-Span-2643"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2644" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2645" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2646">(</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2647">m</span><span class="MJXp-mo" id="MJXp-Span-2648">)</span></span></span><span class="MJXp-mo" id="MJXp-Span-2649" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span id="MathJax-Element-143-Frame" class="mjx-chtml MathJax_CHTML MJXc-processing" tabindex="0"></span><script type="math/tex" id="MathJax-Element-143">\ell^{\text{batch}} = \frac{1}{m} (\ell^{(1)} + \ldots + 
\ell^{(m)})</script>:</p>
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2650"><span class="MJXp-mtable" id="MJXp-Span-2651"><span><span class="MJXp-mtr" id="MJXp-Span-2652" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2653" style="text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2654"><span class="MJXp-mi" id="MJXp-Span-2655" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2656" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2657"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2658" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2659" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2660">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2661">l</span><span class="MJXp-mo" id="MJXp-Span-2662">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2663"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2664" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2665" style="vertical-align: 0.5em;"><span class="MJXp-mtext" id="MJXp-Span-2666">batch</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2667" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2668"></span><span class="MJXp-mo" id="MJXp-Span-2669" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-2670" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mn" id="MJXp-Span-2671">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2672">m</span></span></span></span></span></span><span class="MJXp-mtext" 
id="MJXp-Span-2673">rowSum</span><span class="MJXp-mrow" id="MJXp-Span-2674"><span class="MJXp-mo" id="MJXp-Span-2675" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.289em;"><span class="MJXp-right MJXp-scale6" style="font-size: 2.156em; margin-left: -0.09em;">(</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2676">d</span><span class="MJXp-msubsup" id="MJXp-Span-2677"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2678" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2679" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2680">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2681">l</span><span class="MJXp-mo" id="MJXp-Span-2682">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2683" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.289em;"><span class="MJXp-right MJXp-scale6" style="font-size: 2.156em; margin-left: -0.09em;">)</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2684" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2685" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2686"><span class="MJXp-mi" id="MJXp-Span-2687" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2688" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2689"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2690" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2691" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2692">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2693">l</span><span class="MJXp-mo" id="MJXp-Span-2694">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2695"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2696" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2697" 
style="vertical-align: 0.5em;"><span class="MJXp-mtext" id="MJXp-Span-2698">batch</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2699" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2700"></span><span class="MJXp-mo" id="MJXp-Span-2701" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-2702" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mn" id="MJXp-Span-2703">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2704">m</span></span></span></span></span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2705">d</span><span class="MJXp-msubsup" id="MJXp-Span-2706"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2707" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2708" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2709">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2710">l</span><span class="MJXp-mo" id="MJXp-Span-2711">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2712"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2713" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2714" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2715">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2716">l</span><span class="MJXp-mo" id="MJXp-Span-2717">−</span><span class="MJXp-mn" id="MJXp-Span-2718">1</span><span class="MJXp-mo" id="MJXp-Span-2719">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2720">T</span></span></span><span class="MJXp-mo" id="MJXp-Span-2721" style="margin-left: 0em; margin-right: 
0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-144-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-144">% <![CDATA[
\begin{aligned}
\nabla_{b^{[l]}} \ell^{\text{batch}} &= \frac{1}{m} \text{rowSum}\left( dZ^{[l]} \right) \\
\nabla_{W^{[l]}} \ell^{\text{batch}} &= \frac{1}{m} dZ^{[l]} A^{[l-1]T}.
\end{aligned} %]]></script>
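<p>These two parameter gradients can also be checked against a direct average over observations. The sketch below is illustrative only: the sizes and random stand-in values are assumptions, and it takes the single-observation gradients to be <code>dz</code> for the bias and the outer product of <code>dz</code> with the previous layer's activation for the weights, as in the single-observation case.</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 4, 3, 5                # hypothetical layer sizes and batch size
dZ = rng.normal(size=(n_l, m))          # stand-in for dZ^{[l]}
A_prev = rng.normal(size=(n_prev, m))   # stand-in for A^{[l-1]}

# Matrix form of the batch gradients.
db = dZ.sum(axis=1, keepdims=True) / m  # rowSum: add up the entries of each row
dW = dZ @ A_prev.T / m

# Average of the per-observation gradients, one column at a time.
db_avg = np.zeros((n_l, 1))
dW_avg = np.zeros((n_l, n_prev))
for i in range(m):
    dz = dZ[:, [i]]          # column i of dZ, kept as an (n_l, 1) array
    a_prev = A_prev[:, [i]]  # column i of A_prev, kept as an (n_prev, 1) array
    db_avg += dz / m
    dW_avg += (dz @ a_prev.T) / m   # outer product of the two columns

gradients_match = bool(np.allclose(db, db_avg) and np.allclose(dW, dW_avg))
print(gradients_match)
```

<p>Using <code>keepdims=True</code> keeps the bias gradient as a column vector, so it has the same shape as a bias stored as an <code>(n_l, 1)</code> array.</p>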
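<p>(To see the second equation, expand the matrix product as a sum of outer products, one per observation: the <script type="math/tex">i</script>th term pairs the <script type="math/tex">i</script>th column of <script type="math/tex">dZ^{[l]}</script> with the <script type="math/tex">i</script>th column of <script type="math/tex">A^{[l-1]}</script>, which is the gradient of <script type="math/tex">\ell^{(i)}</script> with respect to <script type="math/tex">W^{[l]}</script>.) Summarizing the batch back propagation equations, we have:</p>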
<span class="MathJax_Preview" style="color: inherit;"><span class="MJXp-math MJXp-display" id="MJXp-Span-2722"><span class="MJXp-mtable" id="MJXp-Span-2723"><span><span class="MJXp-mtr" id="MJXp-Span-2724" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2725" style="text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2726">d</span><span class="MJXp-msubsup" id="MJXp-Span-2727"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2728" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2729" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2730">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2731">l</span><span class="MJXp-mo" id="MJXp-Span-2732">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2733" style="padding-left: 0em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2734"></span><span class="MJXp-mo" id="MJXp-Span-2735" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2736">d</span><span class="MJXp-msubsup" id="MJXp-Span-2737"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2738" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2739" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2740">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2741">l</span><span class="MJXp-mo" id="MJXp-Span-2742">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2743" style="margin-left: 0.267em; margin-right: 0.267em;">∗</span><span class="MJXp-mo" id="MJXp-Span-2744" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2745"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2746" style="margin-right: 0.05em;">g</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2747" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2748">[</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-2749">l</span><span class="MJXp-mo" id="MJXp-Span-2750">]</span></span></span><span class="MJXp-msup" id="MJXp-Span-2751"><span class="MJXp-mo" id="MJXp-Span-2752" style="margin-left: 0em; margin-right: 0.05em;">)</span><span class="MJXp-mo MJXp-script" id="MJXp-Span-2753" style="vertical-align: 0.5em;">′</span></span><span class="MJXp-mo" id="MJXp-Span-2754" style="margin-left: 0em; margin-right: 0em;">(</span><span class="MJXp-msubsup" id="MJXp-Span-2755"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2756" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2757" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2758">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2759">l</span><span class="MJXp-mo" id="MJXp-Span-2760">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2761" style="margin-left: 0em; margin-right: 0em;">)</span></span></span><span class="MJXp-mtr" id="MJXp-Span-2762" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2763" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2764">d</span><span class="MJXp-msubsup" id="MJXp-Span-2765"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2766" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2767" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2768">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2769">l</span><span class="MJXp-mo" id="MJXp-Span-2770">−</span><span class="MJXp-mn" id="MJXp-Span-2771">1</span><span class="MJXp-mo" id="MJXp-Span-2772">]</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2773" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2774"></span><span class="MJXp-mo" id="MJXp-Span-2775" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-msubsup" id="MJXp-Span-2776"><span 
class="MJXp-mi MJXp-italic" id="MJXp-Span-2777" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2778" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2779">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2780">l</span><span class="MJXp-mo" id="MJXp-Span-2781">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2782">T</span></span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2783">d</span><span class="MJXp-msubsup" id="MJXp-Span-2784"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2785" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2786" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2787">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2788">l</span><span class="MJXp-mo" id="MJXp-Span-2789">]</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2790" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2791" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2792"><span class="MJXp-mi" id="MJXp-Span-2793" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2794" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2795"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2796" style="margin-right: 0.05em;">b</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2797" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2798">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2799">l</span><span class="MJXp-mo" id="MJXp-Span-2800">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2801"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2802" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2803" style="vertical-align: 0.5em;"><span class="MJXp-mtext" id="MJXp-Span-2804">batch</span></span></span></span><span 
class="MJXp-mtd" id="MJXp-Span-2805" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2806"></span><span class="MJXp-mo" id="MJXp-Span-2807" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-2808" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mn" id="MJXp-Span-2809">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2810">m</span></span></span></span></span></span><span class="MJXp-mtext" id="MJXp-Span-2811">rowSum</span><span class="MJXp-mrow" id="MJXp-Span-2812"><span class="MJXp-mo" id="MJXp-Span-2813" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.289em;"><span class="MJXp-right MJXp-scale6" style="font-size: 2.156em; margin-left: -0.09em;">(</span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2814">d</span><span class="MJXp-msubsup" id="MJXp-Span-2815"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2816" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2817" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2818">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2819">l</span><span class="MJXp-mo" id="MJXp-Span-2820">]</span></span></span><span class="MJXp-mo" id="MJXp-Span-2821" style="margin-left: 0em; margin-right: 0em; vertical-align: -0.289em;"><span class="MJXp-right MJXp-scale6" style="font-size: 2.156em; margin-left: -0.09em;">)</span></span></span></span></span><span class="MJXp-mtr" id="MJXp-Span-2822" style="vertical-align: baseline;"><span class="MJXp-mtd" id="MJXp-Span-2823" style="padding-top: 0.3em; text-align: right;"><span class="MJXp-msubsup" id="MJXp-Span-2824"><span class="MJXp-mi" 
id="MJXp-Span-2825" style="margin-right: 0.05em;">∇</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2826" style="vertical-align: -0.4em;"><span class="MJXp-msubsup" id="MJXp-Span-2827"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2828" style="margin-right: 0.05em;">W</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2829" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2830">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2831">l</span><span class="MJXp-mo" id="MJXp-Span-2832">]</span></span></span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2833"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2834" style="margin-right: 0.05em;">ℓ</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2835" style="vertical-align: 0.5em;"><span class="MJXp-mtext" id="MJXp-Span-2836">batch</span></span></span></span><span class="MJXp-mtd" id="MJXp-Span-2837" style="padding-left: 0em; padding-top: 0.3em; text-align: left;"><span class="MJXp-mi" id="MJXp-Span-2838"></span><span class="MJXp-mo" id="MJXp-Span-2839" style="margin-left: 0.333em; margin-right: 0.333em;">=</span><span class="MJXp-mfrac" id="MJXp-Span-2840" style="vertical-align: 0.25em;"><span class="MJXp-box"><span class="MJXp-mn" id="MJXp-Span-2841">1</span></span><span class="MJXp-box" style="margin-top: -0.9em;"><span class="MJXp-denom"><span><span class="MJXp-rule" style="height: 1em; border-top: none; border-bottom: 1px solid; margin: 0.1em 0px;"></span></span><span><span class="MJXp-box"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2842">m</span></span></span></span></span></span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2843">d</span><span class="MJXp-msubsup" id="MJXp-Span-2844"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2845" style="margin-right: 0.05em;">Z</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2846" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2847">[</span><span class="MJXp-mi MJXp-italic" 
id="MJXp-Span-2848">l</span><span class="MJXp-mo" id="MJXp-Span-2849">]</span></span></span><span class="MJXp-msubsup" id="MJXp-Span-2850"><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2851" style="margin-right: 0.05em;">A</span><span class="MJXp-mrow MJXp-script" id="MJXp-Span-2852" style="vertical-align: 0.5em;"><span class="MJXp-mo" id="MJXp-Span-2853">[</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2854">l</span><span class="MJXp-mo" id="MJXp-Span-2855">−</span><span class="MJXp-mn" id="MJXp-Span-2856">1</span><span class="MJXp-mo" id="MJXp-Span-2857">]</span><span class="MJXp-mi MJXp-italic" id="MJXp-Span-2858">T</span></span></span><span class="MJXp-mo" id="MJXp-Span-2859" style="margin-left: 0em; margin-right: 0.222em;">.</span></span></span></span></span></span></span><span class="mjx-chtml MJXc-display MJXc-processing"><span id="MathJax-Element-145-Frame" class="mjx-chtml MathJax_CHTML" tabindex="0"></span></span><script type="math/tex; mode=display" id="MathJax-Element-145">% <![CDATA[
\begin{aligned}
dZ^{[l]} &= dA^{[l]} * (g^{[l]})'(Z^{[l]}) \\
dA^{[l-1]} &= W^{[l]T} dZ^{[l]} \\
\nabla_{b^{[l]}} \ell^{\text{batch}} &= \frac{1}{m} \text{rowSum}\left( dZ^{[l]} \right) \\
\nabla_{W^{[l]}} \ell^{\text{batch}} &= \frac{1}{m} dZ^{[l]} A^{[l-1]T}.
\end{aligned} %]]></script>
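<p>As a rough illustration, the four batch equations above can be sketched in NumPy for a single layer. The helper name <code class="highlighter-rouge">layer_backward</code> and its argument names are assumptions made for this example; they are not part of the code in the next section.</p>

```python
import numpy as np

def layer_backward(dA, g_prime, W, A_prev):
    """One layer of batch back propagation (hypothetical helper).

    dA      : (n_l, m) derivative of the loss w.r.t. this layer's activations
    g_prime : (n_l, m) activation derivative evaluated at Z for this layer
    W       : (n_l, n_prev) weights of this layer
    A_prev  : (n_prev, m) activations of the previous layer
    """
    m = dA.shape[1]                          # examples are stored as columns
    dZ = dA * g_prime                        # elementwise product
    dA_prev = W.T @ dZ                       # passed back to the previous layer
    db = dZ.sum(axis=1, keepdims=True) / m   # (1/m) rowSum(dZ)
    dW = dZ @ A_prev.T / m                   # (1/m) dZ A_prev^T
    return dA_prev, dW, db

rng = np.random.default_rng(0)
n_prev, n_l, m = 3, 2, 5
W = rng.standard_normal((n_l, n_prev))
A_prev = rng.standard_normal((n_prev, m))
dA = rng.standard_normal((n_l, m))
g_prime = np.ones((n_l, m))                  # derivative of a linear activation
dA_prev, dW, db = layer_backward(dA, g_prime, W, A_prev)
print(dA_prev.shape, dW.shape, db.shape)
```

<p>The product <code class="highlighter-rouge">dZ @ A_prev.T</code> is the sum over the batch of the per-example outer products, which is the outer product expansion mentioned above.</p>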
<h2 id="code">Code</h2>
<p>This section goes over implementing a neural network in Python/NumPy. The code is fairly comprehensive, but it is intended for pedagogical (not production) purposes. As such, things like error checking and unit tests (e.g., gradient checking with finite differences) are not implemented. The code supports:</p>
<ul>
<li>Arbitrary feed forward architectures</li>
<li>Input normalization</li>
<li>Arbitrary loss and activation functions</li>
<li>Batch gradient descent</li>
<li>L2 regularization</li>
</ul>
<p>The code does not support:</p>
<ul>
<li>Batch normalization</li>
<li>Momentum</li>
<li>Adam</li>
<li>Dropout</li>
<li>Normalizing input in a streaming fashion (as opposed to computing <code class="highlighter-rouge">mu</code> and <code class="highlighter-rouge">sig</code> over the entire dataset)</li>
<li>Automatic learning rate decay and early stopping using a validation set</li>
</ul>
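<p>For reference, normalizing the input with statistics computed over the entire dataset might look roughly like the sketch below. The function name <code class="highlighter-rouge">normalize_inputs</code>, the small <code class="highlighter-rouge">eps</code> guard, and the one-example-per-column layout are assumptions for this example, not necessarily what the class in this post does.</p>

```python
import numpy as np

def normalize_inputs(X, eps=1e-8):
    """Standardize each feature (row) of X using whole-dataset statistics.

    X has shape (n_features, m): one example per column (an assumption here).
    eps is a hypothetical guard against dividing by zero for constant features.
    """
    mu = X.mean(axis=1, keepdims=True)
    sig = X.std(axis=1, keepdims=True)
    return (X - mu) / (sig + eps), mu, sig

X = np.array([[1.0, 2.0, 3.0],
              [10.0, 10.0, 10.0]])
X_norm, mu, sig = normalize_inputs(X)
print(X_norm.mean(axis=1))
```

<p>The same <code class="highlighter-rouge">mu</code> and <code class="highlighter-rouge">sig</code> would need to be reused at prediction time, which is why the sketch returns them alongside the normalized data.</p>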
<h3 id="activation-functions">Activation functions</h3>
<p>An activation function has a signature like</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">activationFunction</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>and returns a dict with the key <code class="highlighter-rouge">value</code> and, when <code class="highlighter-rouge">return_derivative</code> is true, the key <code class="highlighter-rouge">derivative</code>. The input <code class="highlighter-rouge">x</code> is a NumPy array. The outputs <code class="highlighter-rouge">value</code> and <code class="highlighter-rouge">derivative</code> are also NumPy arrays with the same shape as <code class="highlighter-rouge">x</code>. Below are implementations of some common activation functions.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">tanh</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="n">np</span><span class="o">.</span><span class="n">power</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">sigmoid</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">+</span> <span class="n">np</span><span class="o">.</span><span class="n">exp</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">))</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="n">a</span> <span class="o">*</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">a</span><span class="p">)</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">relu</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">maximum</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">)</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">&gt;=</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">linear</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="n">x</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">g</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span>
    <span class="k">return</span> <span class="n">r</span>
</code></pre></div></div>
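<p>Although gradient checking is not part of the code in this post, an activation function's <code class="highlighter-rouge">derivative</code> can be sanity-checked against a central finite difference. The sketch below is only an illustration: <code class="highlighter-rouge">check_derivative</code> and the step size <code class="highlighter-rouge">h</code> are hypothetical, and <code class="highlighter-rouge">tanh</code> is redefined in condensed form so that the block runs on its own.</p>

```python
import numpy as np

def tanh(x, return_derivative=True):
    # Condensed copy of the tanh activation, following the same contract.
    r = {"value": np.tanh(x)}
    if return_derivative:
        r["derivative"] = 1 - r["value"] ** 2
    return r

def check_derivative(f, x, h=1e-6):
    """Largest gap between f's analytic derivative and a central difference.

    Hypothetical helper; h is an assumed step size, not a tuned value.
    """
    analytic = f(x)["derivative"]
    upper = f(x + h, return_derivative=False)["value"]
    lower = f(x - h, return_derivative=False)["value"]
    numeric = (upper - lower) / (2 * h)
    return np.max(np.abs(analytic - numeric))

x = np.linspace(-2.0, 2.0, 9).reshape(3, 3)
max_error = check_derivative(tanh, x)
print(max_error)
```

<p>The same check applies to <code class="highlighter-rouge">sigmoid</code> and <code class="highlighter-rouge">linear</code>. For <code class="highlighter-rouge">relu</code>, test points at or very near 0 should be avoided, because the derivative there is a convention rather than a limit of difference quotients.</p>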
<h3 id="loss-functions">Loss functions</h3>
<p>A loss function has a signature like</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">lossFunction</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>
<p>The inputs <code class="highlighter-rouge">A</code> (predictions) and <code class="highlighter-rouge">Y</code> (labels) are NumPy arrays of the same shape. The output is a dictionary with the key <code class="highlighter-rouge">value</code> and, when <code class="highlighter-rouge">return_derivative</code> is true, the key <code class="highlighter-rouge">derivative</code>. Both are NumPy arrays of the same shape as the two inputs <code class="highlighter-rouge">A</code> and <code class="highlighter-rouge">Y</code>, so <code class="highlighter-rouge">value</code> holds the loss of each individual prediction rather than a single averaged number. The array <code class="highlighter-rouge">derivative</code> must be the derivative of the loss function with respect to the predictions <code class="highlighter-rouge">A</code>.</p>
<p>The cross-entropy and square loss functions are implemented below. The cross-entropy loss below does not properly handle the edge cases where a prediction in <code class="highlighter-rouge">A</code> is exactly 0 or 1 (the logarithm and the derivative are then undefined), which should be corrected before productionizing the code.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">xent</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">l</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">Y</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span><span class="o">-</span><span class="n">Y</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">A</span><span class="p">))</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">l</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">dA</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">Y</span> <span class="o">/</span> <span class="n">A</span> <span class="o">-</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">Y</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">A</span><span class="p">))</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">dA</span>
    <span class="k">return</span> <span class="n">r</span>

<span class="k">def</span> <span class="nf">square</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="n">r</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
    <span class="n">l</span> <span class="o">=</span> <span class="p">(</span><span class="n">Y</span> <span class="o">-</span> <span class="n">A</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span>
    <span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span> <span class="o">=</span> <span class="n">l</span>
    <span class="k">if</span> <span class="n">return_derivative</span><span class="p">:</span>
        <span class="n">dA</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">*</span> <span class="p">(</span><span class="n">Y</span> <span class="o">-</span> <span class="n">A</span><span class="p">)</span>
        <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span> <span class="o">=</span> <span class="n">dA</span>
    <span class="k">return</span> <span class="n">r</span>
</code></pre></div></div>
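<p>One common way to guard the cross-entropy loss against the edge cases mentioned above is to clip the predictions away from 0 and 1 before taking logarithms. The sketch below is an assumption-laden variant, not the implementation used in this post: the name <code class="highlighter-rouge">xent_clipped</code> and the tolerance <code class="highlighter-rouge">eps</code> are made up for the example.</p>

```python
import numpy as np

def xent_clipped(A, Y, return_derivative=True, eps=1e-12):
    """Cross-entropy with predictions clipped to [eps, 1 - eps].

    Hypothetical variant of xent; eps is an assumed tolerance.
    """
    A = np.clip(A, eps, 1 - eps)   # keep log and division away from 0 and 1
    r = {"value": -(Y * np.log(A) + (1 - Y) * np.log(1 - A))}
    if return_derivative:
        r["derivative"] = -(Y / A - (1 - Y) / (1 - A))
    return r

A = np.array([[0.0, 1.0, 0.5]])    # the first two predictions are edge cases
Y = np.array([[0.0, 1.0, 1.0]])
out = xent_clipped(A, Y)
print(np.isfinite(out["value"]).all(), np.isfinite(out["derivative"]).all())
```

<p>Clipping trades a small bias near the boundaries for finite values. Another option, which changes more of the code, would be to compute the loss directly from the pre-activation of the output layer.</p>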
<h3 id="dnn-class">DNN class</h3>
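<p>Below is an implementation of a DNN class. Weights and biases are stored in a dict keyed by layer (<code class="highlighter-rouge">W1</code>, <code class="highlighter-rouge">b1</code>, and so on), and forward propagation can return a cache of the activations and activation derivatives that back propagation needs. Backpropagation is the most complicated method and uses the batch equations we derived in the previous sections.</p>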
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">DNN</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_size</span><span class="p">,</span> <span class="n">layer_sizes</span><span class="p">,</span> <span class="n">activation_functions</span><span class="p">,</span> <span class="n">loss</span><span class="p">):</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">layer_sizes</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">input_size</span> <span class="o">=</span> <span class="n">input_size</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">layer_sizes</span> <span class="o">=</span> <span class="n">layer_sizes</span>
        <span class="c"># In many cases, it's useful to view the input as the "zeroth" layer.</span>
        <span class="c"># We define a private variable _layer_sizes for this.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span> <span class="o">=</span> <span class="p">[</span><span class="n">input_size</span><span class="p">]</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">layer_sizes</span><span class="p">)</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">activation_functions</span> <span class="o">=</span> <span class="n">activation_functions</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">loss</span> <span class="o">=</span> <span class="n">loss</span>

    <span class="k">def</span> <span class="nf">_initialize_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="c"># He initialization: like Xavier initialization, but scaled by sqrt(2 / fan_in) for ReLU.</span>
        <span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
        <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="p">)):</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">"W"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sqrt</span><span class="p">(</span><span class="mf">2.0</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="p">[</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="p">[</span><span class="n">l</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="p">[</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
            <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">"b"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">_layer_sizes</span><span class="p">[</span><span class="n">l</span><span class="p">],</span> <span class="mi">1</span><span class="p">))</span>

    <span class="k">def</span> <span class="nf">_loss_lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"xent"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">xent</span>
        <span class="k">elif</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"square"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">square</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nb">Exception</span><span class="p">(</span><span class="n">name</span> <span class="o">+</span> <span class="s">" is not an implemented loss function."</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_activation_lookup</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"tanh"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">tanh</span>
        <span class="k">elif</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"sigmoid"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">sigmoid</span>
        <span class="k">elif</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"relu"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">relu</span>
        <span class="k">elif</span> <span class="n">name</span> <span class="o">==</span> <span class="s">"linear"</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">linear</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nb">Exception</span><span class="p">(</span><span class="n">name</span> <span class="o">+</span> <span class="s">" is not an implemented activation function."</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">_forward_propagate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">return_cache</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
        <span class="n">cache</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
        <span class="n">A</span> <span class="o">=</span> <span class="n">X</span>
        <span class="k">if</span> <span class="n">return_cache</span><span class="p">:</span>
            <span class="n">cache</span><span class="p">[</span><span class="s">"A0"</span><span class="p">]</span> <span class="o">=</span> <span class="n">A</span>
        <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
            <span class="c"># Z = W A + b; b has shape (n_l, 1) and broadcasts across the m columns.</span>
            <span class="n">Z</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s">"W"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)],</span> <span class="n">A</span><span class="p">)</span> <span class="o">+</span> <span class="n">params</span><span class="p">[</span><span class="s">"b"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span>
            <span class="n">g</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_activation_lookup</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">activation_functions</span><span class="p">[</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">])(</span><span class="n">Z</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="n">return_cache</span><span class="p">)</span>
            <span class="n">A</span> <span class="o">=</span> <span class="n">g</span><span class="p">[</span><span class="s">"value"</span><span class="p">]</span>
            <span class="k">if</span> <span class="n">return_cache</span><span class="p">:</span>
                <span class="c"># Keep what back propagation needs: activations and activation derivatives.</span>
                <span class="n">cache</span><span class="p">[</span><span class="s">"A"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="n">A</span>
                <span class="n">cache</span><span class="p">[</span><span class="s">"gPrime"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="n">g</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span>
        <span class="k">if</span> <span class="n">return_cache</span><span class="p">:</span>
            <span class="k">return</span> <span class="n">A</span><span class="p">,</span> <span class="n">cache</span>
        <span class="k">return</span> <span class="n">A</span>

    <span class="k">def</span> <span class="nf">_back_propagate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">cache</span><span class="p">):</span>
        <span class="n">grads</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">()</span>
        <span class="c"># Examples are stored as columns, so m is the batch size.</span>
        <span class="n">m</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
        <span class="n">A</span> <span class="o">=</span> <span class="n">cache</span><span class="p">[</span><span class="s">"A"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span><span class="p">)]</span>
        <span class="n">r</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_loss_lookup</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">loss</span><span class="p">)(</span><span class="n">A</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">return_derivative</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
        <span class="n">grads</span><span class="p">[</span><span class="s">"dA"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span><span class="p">)]</span> <span class="o">=</span> <span class="n">r</span><span class="p">[</span><span class="s">"derivative"</span><span class="p">]</span>
<span class="n">loss</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">r</span><span class="p">[</span><span class="s">"value"</span><span class="p">])</span>
<span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">):</span>
<span class="n">grads</span><span class="p">[</span><span class="s">"dZ"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="n">grads</span><span class="p">[</span><span class="s">"dA"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">*</span> <span class="n">cache</span><span class="p">[</span><span class="s">"gPrime"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span>
<span class="n">grads</span><span class="p">[</span><span class="s">"dA"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">)]</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">params</span><span class="p">[</span><span class="s">"W"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">grads</span><span class="p">[</span><span class="s">"dZ"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)])</span>
<span class="n">grads</span><span class="p">[</span><span class="s">"dByW"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dZ"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)],</span> <span class="n">cache</span><span class="p">[</span><span class="s">"A"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="o">-</span><span class="mi">1</span><span class="p">)]</span><span class="o">.</span><span class="n">T</span><span class="p">)</span>
<span class="n">grads</span><span class="p">[</span><span class="s">"dByb"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o">/</span> <span class="n">m</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dZ"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">return</span> <span class="n">grads</span><span class="p">,</span> <span class="n">loss</span>
<span class="k">def</span> <span class="nf">_update_parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">learning_rate</span> <span class="o">=</span> <span class="mf">0.1</span><span class="p">,</span> <span class="n">l2_regularization</span> <span class="o">=</span> <span class="mf">0.0</span><span class="p">):</span>
<span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">nlayers</span> <span class="o">+</span> <span class="mi">1</span><span class="p">):</span>
<span class="n">params</span><span class="p">[</span><span class="s">"W"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">-=</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dByW"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">+</span> <span class="n">l2_regularization</span> <span class="o">*</span> <span class="n">params</span><span class="p">[</span><span class="s">"W"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)])</span>
<span class="n">params</span><span class="p">[</span><span class="s">"b"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">-=</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="p">(</span><span class="n">grads</span><span class="p">[</span><span class="s">"dByb"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)]</span> <span class="o">+</span> <span class="n">l2_regularization</span> <span class="o">*</span> <span class="n">params</span><span class="p">[</span><span class="s">"b"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">l</span><span class="p">)])</span>
<span class="k">return</span> <span class="n">params</span>
<span class="k">def</span> <span class="nf">train</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">,</span> <span class="n">Y</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">n_epochs</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.1</span><span class="p">,</span> <span class="n">l2_regularization</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
<span class="c"># Normalize input</span>
<span class="bp">self</span><span class="o">.</span><span class="n">mu</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">sig</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">std</span><span class="p">(</span><span class="n">X</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">X_std</span> <span class="o">=</span> <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">sig</span>
<span class="c"># Initialize parameters</span>
<span class="bp">self</span><span class="o">.</span><span class="n">_initialize_parameters</span><span class="p">()</span>
<span class="n">losses</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">n_obs</span> <span class="o">=</span> <span class="n">X</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">n_full_batches</span> <span class="o">=</span> <span class="n">n_obs</span> <span class="o">//</span> <span class="n">batch_size</span>
<span class="n">last_batch_size</span> <span class="o">=</span> <span class="n">n_obs</span> <span class="o">-</span> <span class="n">batch_size</span> <span class="o">*</span> <span class="n">n_full_batches</span>
<span class="k">for</span> <span class="n">epoch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_epochs</span><span class="p">):</span>
<span class="c"># Shuffle data</span>
<span class="n">order</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">n_obs</span><span class="p">)</span>
<span class="k">for</span> <span class="n">batch</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_full_batches</span><span class="p">):</span>
<span class="c"># Define batch</span>
<span class="n">batch_ind</span> <span class="o">=</span> <span class="n">order</span><span class="p">[(</span><span class="n">batch</span> <span class="o">*</span> <span class="n">batch_size</span><span class="p">):((</span><span class="n">batch</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">batch_size</span><span class="p">)]</span>
<span class="n">X_batch_std</span> <span class="o">=</span> <span class="n">X_std</span><span class="p">[:,</span> <span class="n">batch_ind</span><span class="p">]</span>
<span class="n">Y_batch</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[:,</span> <span class="n">batch_ind</span><span class="p">]</span>
<span class="c"># Forward/back propagate to compute parameter gradients</span>
<span class="n">A</span><span class="p">,</span> <span class="n">cache</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_forward_propagate</span><span class="p">(</span><span class="n">X_batch_std</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">return_cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">grads</span><span class="p">,</span> <span class="n">loss</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_back_propagate</span><span class="p">(</span><span class="n">X_batch_std</span><span class="p">,</span> <span class="n">Y_batch</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">cache</span><span class="p">)</span>
<span class="c"># Update parameters</span>
<span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_update_parameters</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">l2_regularization</span><span class="p">)</span>
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="p">)</span>
<span class="k">if</span> <span class="n">verbose</span> <span class="ow">and</span> <span class="p">(</span><span class="n">batch</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Loss after epoch "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">epoch</span><span class="p">)</span> <span class="o">+</span> <span class="s">" batch "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">batch</span><span class="p">)</span> <span class="o">+</span> <span class="s">": "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">loss</span><span class="p">))</span>
<span class="k">if</span> <span class="n">last_batch_size</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
<span class="c"># Define batch</span>
<span class="n">batch_ind</span> <span class="o">=</span> <span class="n">order</span><span class="p">[(</span><span class="n">n_full_batches</span> <span class="o">*</span> <span class="n">batch_size</span><span class="p">):]</span>
<span class="n">X_batch_std</span> <span class="o">=</span> <span class="n">X_std</span><span class="p">[:,</span> <span class="n">batch_ind</span><span class="p">]</span>
<span class="n">Y_batch</span> <span class="o">=</span> <span class="n">Y</span><span class="p">[:,</span> <span class="n">batch_ind</span><span class="p">]</span>
<span class="n">A</span><span class="p">,</span> <span class="n">cache</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_forward_propagate</span><span class="p">(</span><span class="n">X_batch_std</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">return_cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">grads</span><span class="p">,</span> <span class="n">loss</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_back_propagate</span><span class="p">(</span><span class="n">X_batch_std</span><span class="p">,</span> <span class="n">Y_batch</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">cache</span><span class="p">)</span>
<span class="bp">self</span><span class="o">.</span><span class="n">params</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_update_parameters</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">grads</span><span class="p">,</span> <span class="n">learning_rate</span><span class="p">,</span> <span class="n">l2_regularization</span><span class="p">)</span>
<span class="n">losses</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">loss</span><span class="p">)</span>
<span class="k">if</span> <span class="n">verbose</span> <span class="ow">and</span> <span class="p">(</span><span class="n">n_full_batches</span> <span class="o">%</span> <span class="mi">100</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Loss after epoch "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">epoch</span><span class="p">)</span> <span class="o">+</span> <span class="s">" batch "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">n_full_batches</span><span class="p">)</span> <span class="o">+</span> <span class="s">": "</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">loss</span><span class="p">))</span>
<span class="k">return</span> <span class="n">losses</span>
<span class="k">def</span> <span class="nf">predict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">X</span><span class="p">):</span>
<span class="n">X_std</span> <span class="o">=</span> <span class="p">(</span><span class="n">X</span> <span class="o">-</span> <span class="bp">self</span><span class="o">.</span><span class="n">mu</span><span class="p">)</span> <span class="o">/</span> <span class="bp">self</span><span class="o">.</span><span class="n">sig</span>
<span class="n">A</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_forward_propagate</span><span class="p">(</span><span class="n">X_std</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="n">return_cache</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="k">return</span> <span class="n">A</span>
</code></pre></div></div>
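<p>A common sanity check for the recurrences in <code>_back_propagate</code> is to compare them against finite differences. The sketch below does this for a single sigmoid layer rather than the full class. The helper names (<code>sigmoid</code>, <code>mean_xent</code>, <code>analytic_grads</code>, <code>numeric_grad</code>) are hypothetical, and the cross-entropy form is an assumption, since the loss lookup is defined elsewhere in the post.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_xent(W, b, X, Y):
    """Mean cross-entropy of one sigmoid layer (assumed loss definition)."""
    A = sigmoid(np.dot(W, X) + b)
    return np.mean(-(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

def analytic_grads(W, b, X, Y):
    """Gradients from the same recurrences as _back_propagate, for one layer."""
    m = X.shape[1]
    A = sigmoid(np.dot(W, X) + b)
    dA = -(Y / A) + (1 - Y) / (1 - A)   # derivative of the loss w.r.t. A
    dZ = dA * (A * (1 - A))             # dA times g'(Z) for the sigmoid
    dW = (1 / m) * np.dot(dZ, X.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    return dW, db

def numeric_grad(f, P, eps=1e-6):
    """Central finite differences of the scalar f() with respect to array P."""
    G = np.zeros_like(P)
    for idx in np.ndindex(*P.shape):
        old = P[idx]
        P[idx] = old + eps
        f_plus = f()
        P[idx] = old - eps
        f_minus = f()
        P[idx] = old                    # restore the perturbed entry
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 20))
Y = (rng.random((1, 20)) > 0.5).astype(float)
W = 0.1 * rng.standard_normal((1, 3))
b = np.zeros((1, 1))

dW, db = analytic_grads(W, b, X, Y)
dW_num = numeric_grad(lambda: mean_xent(W, b, X, Y), W)
db_num = numeric_grad(lambda: mean_xent(W, b, X, Y), b)
max_err = max(np.max(np.abs(dW - dW_num)), np.max(np.abs(db - db_num)))
print(max_err)  # should be tiny if the recurrences are right
```

<p>The same idea extends to the full network by perturbing one entry of <code>params</code> at a time, although that is slow and only worth doing on a very small network.</p>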
<h3 id="toy-data">Toy data</h3>
<p>To test the DNN class, we define a toy dataset below.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">n</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">m</span> <span class="o">=</span> <span class="mi">100000</span>
<span class="n">v1</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">v2</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">X_train</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">m</span><span class="p">)</span>
<span class="n">Y_train</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">minimum</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">v1</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X_train</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">v2</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X_train</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">X_test</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">randn</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="n">m</span><span class="p">)</span>
<span class="n">Y_test</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">minimum</span><span class="p">((</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">v1</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X_test</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">dot</span><span class="p">(</span><span class="n">v2</span><span class="o">.</span><span class="n">T</span><span class="p">,</span> <span class="n">X_test</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="nb">float</span><span class="p">),</span> <span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
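<p>A point is labeled 1 when it lies in at least one of two random half-planes through the origin, so the negative class is the wedge that both half-planes miss. The classes are therefore not linearly separable, and logistic regression will not work very well (see the plot of the test data below).</p>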
<p><img src="./Implementing a neural network in Python _ statsandstuff_files/dnn-toy-data.png" alt="Plot of the toy test data"></p>
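<p>To make the comparison concrete, here is a minimal sketch of a logistic regression baseline on the same kind of data, written as a single sigmoid unit fit by full-batch gradient descent. It is not part of the DNN class. As assumptions for reproducibility, it fixes <code>v1</code> and <code>v2</code> to orthogonal vectors instead of drawing them at random, uses a smaller sample, and picks the step size and iteration count by hand; <code>make_labels</code> is a hypothetical helper.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: fixed, orthogonal v1 and v2 so the run is reproducible;
# the post draws them at random.
v1 = np.array([[1.0], [0.0]])
v2 = np.array([[0.0], [1.0]])

def make_labels(X):
    """Label is 1 if the point lies in at least one of the two half-planes."""
    in_first = (np.dot(v1.T, X) > 0).astype(float)
    in_second = (np.dot(v2.T, X) > 0).astype(float)
    return np.minimum(in_first + in_second, 1)

X = rng.standard_normal((2, 20000))
Y = make_labels(X)

# Logistic regression: one sigmoid unit trained on the mean cross-entropy.
w = np.zeros((1, 2))
b = 0.0
for _ in range(500):
    A = 1.0 / (1.0 + np.exp(-(np.dot(w, X) + b)))
    dZ = A - Y                               # gradient of the loss w.r.t. Z
    w -= 0.5 * np.dot(dZ, X.T) / X.shape[1]
    b -= 0.5 * np.mean(dZ)

A = 1.0 / (1.0 + np.exp(-(np.dot(w, X) + b)))
baseline_accuracy = np.mean((A > 0.5).astype(float) == Y)
print(baseline_accuracy)
```

<p>With these particular vectors the negative class is a quarter-plane, so no single straight boundary can classify every point and the printed accuracy should fall well short of 100%.</p>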
<h3 id="train-a-dnn-on-toy-data">Train a DNN on toy data</h3>
<p>Below we use our DNN class to define a network architecture and train it on the toy data.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">input_size</span> <span class="o">=</span> <span class="n">n</span>
<span class="n">layer_sizes</span> <span class="o">=</span> <span class="p">[</span><span class="mi">8</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>
<span class="n">activation_functions</span> <span class="o">=</span> <span class="p">[</span><span class="s">"relu"</span><span class="p">,</span> <span class="s">"relu"</span><span class="p">,</span> <span class="s">"sigmoid"</span><span class="p">]</span>
<span class="n">loss</span> <span class="o">=</span> <span class="s">"xent"</span>
<span class="n">dnn</span> <span class="o">=</span> <span class="n">DNN</span><span class="p">(</span><span class="n">input_size</span><span class="p">,</span> <span class="n">layer_sizes</span><span class="p">,</span> <span class="n">activation_functions</span><span class="p">,</span> <span class="n">loss</span><span class="p">)</span>
<span class="n">dnn</span><span class="o">.</span><span class="n">train</span><span class="p">(</span><span class="n">X_train</span><span class="p">,</span> <span class="n">Y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">64</span><span class="p">,</span> <span class="n">n_epochs</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">l2_regularization</span><span class="o">=</span><span class="mf">0.01</span><span class="p">)</span>
</code></pre></div></div>
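<p>The <code>train</code> method returns one loss value per mini-batch, and those values are noisy. A moving average is one simple way to see the trend. The sketch below uses a hypothetical helper <code>moving_average</code> and a short made-up list in place of real training output.</p>

```python
import numpy as np

def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(values, dtype=float), kernel, mode="valid")

# Made-up stand-in for the list returned by dnn.train(...).
example_losses = [4.0, 2.0, 3.0, 1.0]
smoothed = moving_average(example_losses, window=2)
print(smoothed)  # [3.  2.5 2. ]
```

<p>In practice we would capture the return value, for example <code>losses = dnn.train(...)</code>, and smooth it with a window of a few hundred batches before plotting.</p>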
<p>The classifier achieves about 98% accuracy on the test data (the exact figure varies from run to run because the toy data are drawn at random), which we compute with the following code:</p>
<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A = dnn.predict(X_test)
np.mean((A &gt; 0.5).astype(float) == Y_test)
</code></pre></div></div>
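<p>Because a point is labeled 1 when it falls in either half-plane, at least half of the points are expected to be positive, and overall accuracy can hide weak performance on the smaller class. The sketch below breaks accuracy down by class. The helper <code>accuracy_by_class</code> and the tiny arrays are hypothetical, chosen only to show the computation.</p>

```python
import numpy as np

def accuracy_by_class(A, Y, threshold=0.5):
    """Overall accuracy plus accuracy restricted to each true class."""
    preds = (A > threshold).astype(float)
    correct = (preds == Y)
    return {
        "overall": float(np.mean(correct)),
        "class_0": float(np.mean(correct[Y == 0])),
        "class_1": float(np.mean(correct[Y == 1])),
    }

# Made-up predictions and labels with the same (1, m) layout as the post.
A_demo = np.array([[0.9, 0.2, 0.6, 0.4, 0.8, 0.7]])
Y_demo = np.array([[1.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
report = accuracy_by_class(A_demo, Y_demo)
print(report)  # class_0 is 0.5, class_1 is 0.75, overall is 4/6
```

<p>On the toy data the call would be <code>accuracy_by_class(dnn.predict(X_test), Y_test)</code>.</p>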
<span class="post-date">Written on September 17th, 2019 by Scott Roy</span>
</div>
<div class="related">
<h1>You may also enjoy:</h1>
<ul class="related-posts">
<li>
<h3>
<a href="https://scottroy.github.io/introduction-to-neural-networks.html">
Introduction to neural networks
</a>
</h3>
</li>
</ul>
</div>
</div>
<footer class="footer">
<a href="https://www.github.com/scottroy" target="_blank"><i class="fa fa-github" aria-hidden="true"></i></a>
<a href="https://www.linkedin.com/in/scott-roy/" target="_blank"><i class="fa fa-linkedin" aria-hidden="true"></i></a>
<a href="mailto:scott.michael.roy@gmail.com" target="_blank"><i class="fa fa-envelope" aria-hidden="true"></i></a>
<a href="https://scottroy.github.io/feed.xml"><i class="fa fa-rss-square" aria-hidden="true"></i></a>
<div class="post-date"><a href="https://scottroy.github.io/menu/about.html">statsandstuff | a blog on statistics and machine learning by Scott Roy</a></div>
</footer>
</body></html>