All the Backpropagation derivatives

[5] Derivative w.r.t weights (2)

[5] Derivative of the cost function w.r.t. the weights ‘w’

This derivative can be computed in two different ways: we can use the chain rule or compute it directly. We will do both, as it provides good intuition for the backprop calculation.

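To use the chain rule to get derivative [5], we note that we have already computed the following (written here for a single training example):

$$\frac{\partial C}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}, \qquad \frac{\partial a}{\partial z} = a(1-a), \qquad \frac{\partial z}{\partial w} = x$$

The product of the first two equations gives us

$$\frac{\partial C}{\partial z} = \frac{\partial C}{\partial a}\,\frac{\partial a}{\partial z} = \left(-\frac{y}{a} + \frac{1-y}{1-a}\right)a(1-a) = a - y$$

If we then continue using the chain rule and multiply this result by

$$\frac{\partial z}{\partial w} = x$$

then we get

$$\frac{\partial C}{\partial w} = \frac{\partial C}{\partial a}\,\frac{\partial a}{\partial z}\,\frac{\partial z}{\partial w} = (a - y)\,x$$

which is nothing more than

$$dw = (a - y)\,x$$

or, written out longhand and averaged over the m training examples,

$$\frac{\partial C}{\partial w} = \frac{1}{m}\sum_{i=1}^{m}(a - y)\,x$$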
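The chain-rule result can be checked numerically. The sketch below is only an illustration under stated assumptions: a scalar weight `w`, a sigmoid activation a = 1/(1+e^-z) with z = wx + b, and the per-example cross-entropy cost. The function names and sample values are hypothetical. It multiplies the three chain-rule factors and compares the product with the compact form (a - y)x.

```python
import math


def sigmoid(z):
    """Sigmoid activation a = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def chain_rule_dw(w, b, x, y):
    """Per-example dC/dw assembled from the three chain-rule factors."""
    z = w * x + b
    a = sigmoid(z)
    dC_da = -y / a + (1.0 - y) / (1.0 - a)  # cross-entropy w.r.t. a
    da_dz = a * (1.0 - a)                   # sigmoid w.r.t. z
    dz_dw = x                               # z = w*x + b w.r.t. w
    return dC_da * da_dz * dz_dw


def closed_form_dw(w, b, x, y):
    """The simplified result (a - y) * x."""
    a = sigmoid(w * x + b)
    return (a - y) * x


# Hypothetical values, chosen only for illustration.
w, b, x = 0.5, -0.2, 1.5
for y in (0.0, 1.0):
    print(y, chain_rule_dw(w, b, x, y), closed_form_dw(w, b, x, y))
```

The two printed values on each line should agree up to floating-point rounding, which is the cancellation the chain-rule algebra predicts.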

So that’s the ‘chain rule way’. Now let’s compute ‘dw’ directly:

To compute directly, we first take our cross-entropy cost function

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left[\,y\ln(a) + (1-y)\ln(1-a)\,\right], \qquad a = \frac{1}{1+e^{-z}}$$

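The first log term ‘ln(a)’ can be expanded to

$$\ln(a) = \ln\!\left(\frac{1}{1+e^{-z}}\right) = \ln(1) - \ln\left(1+e^{-z}\right)$$

which, since ln(1) = 0, simplifies to:

$$\ln(a) = -\ln\left(1+e^{-z}\right)$$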

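The second log term ‘ln(1-a)’ can be written as

$$\ln(1-a) = \ln\!\left(1 - \frac{1}{1+e^{-z}}\right) = \ln\!\left(\frac{e^{-z}}{1+e^{-z}}\right)$$

Taking the log of the numerator (we will leave the log of the denominator as it is) we get

$$\ln(1-a) = \ln\left(e^{-z}\right) - \ln\left(1+e^{-z}\right) = -z - \ln\left(1+e^{-z}\right)$$

This result comes from the rule of logs, which states: log(p/q) = log(p) - log(q).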
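Both log expansions are easy to sanity-check numerically. The following is a minimal sketch, assuming the sigmoid activation a = 1/(1+e^-z); the helper name `log_identity_gaps` and the sample values of z are hypothetical. It returns the difference between each side of the two identities, which should be zero up to rounding.

```python
import math


def sigmoid(z):
    """Sigmoid activation a = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_identity_gaps(z):
    """Return (lhs - rhs) for the two log expansions at a given z."""
    a = sigmoid(z)
    log_term = math.log(1.0 + math.exp(-z))  # ln(1 + e^-z)
    # Identity 1: ln(a) = -ln(1 + e^-z)
    gap_ln_a = math.log(a) - (-log_term)
    # Identity 2: ln(1 - a) = -z - ln(1 + e^-z)
    gap_ln_one_minus_a = math.log(1.0 - a) - (-z - log_term)
    return gap_ln_a, gap_ln_one_minus_a


# Hypothetical sample points, chosen only for illustration.
for z in (-3.0, -0.5, 0.0, 0.7, 4.0):
    print(z, log_identity_gaps(z))
```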

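Plugging these formulas back into our original cost function we get

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left[-y\ln\left(1+e^{-z}\right) + (1-y)\left(-z - \ln\left(1+e^{-z}\right)\right)\right]$$

Expanding the term in the square brackets we get

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left[-y\ln\left(1+e^{-z}\right) - z - \ln\left(1+e^{-z}\right) + yz + y\ln\left(1+e^{-z}\right)\right]$$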

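The first and last terms, ‘y ln(1+e^-z)’, have opposite signs and cancel out, leaving:

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left[-z - \ln\left(1+e^{-z}\right) + yz\right]$$

Which we can rearrange by pulling the ‘yz’ term to the outside to give

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left(yz - \left[z + \ln\left(1+e^{-z}\right)\right]\right)$$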

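Here’s where it gets interesting. We exponentiate the ‘z’ inside the square brackets and then immediately take its log, which leaves its value unchanged because ln(e^z) = z:

$$\left[\ln\left(e^{z}\right) + \ln\left(1+e^{-z}\right)\right]$$

Next we can take advantage of the rule for a sum of logs, ln(p) + ln(q) = ln(p·q), combined with the rule for a product of exponentials, e^p · e^q = e^(p+q). Summing the logs gives

$$\ln\!\left(e^{z}\left(1+e^{-z}\right)\right)$$

followed by multiplying out the exponentials:

$$\ln\!\left(e^{z} + e^{z}e^{-z}\right) = \ln\!\left(e^{z} + e^{0}\right) = \ln\left(1+e^{z}\right)$$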

Pulling the ‘yz’ term back inside the brackets we get:

$$C = -\frac{1}{m}\sum_{i=1}^{m}\left[yz - \ln\left(1+e^{z}\right)\right]$$
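As a quick check that nothing was lost in the rearrangement, the simplified cost -(1/m) Σ [yz - ln(1+e^z)] can be compared against the original cross-entropy form. This is a minimal sketch, assuming a sigmoid activation and averaging over m examples; the function names and the sample labels and pre-activations are hypothetical.

```python
import math


def sigmoid(z):
    """Sigmoid activation a = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(ys, zs):
    """Original form: -(1/m) * sum[y ln(a) + (1 - y) ln(1 - a)]."""
    m = len(ys)
    total = 0.0
    for y, z in zip(ys, zs):
        a = sigmoid(z)
        total += y * math.log(a) + (1.0 - y) * math.log(1.0 - a)
    return -total / m


def simplified_cost(ys, zs):
    """Rearranged form: -(1/m) * sum[y z - ln(1 + e^z)]."""
    m = len(ys)
    total = sum(y * z - math.log(1.0 + math.exp(z)) for y, z in zip(ys, zs))
    return -total / m


# Hypothetical labels and pre-activations, chosen only for illustration.
ys = [1.0, 0.0, 1.0, 0.0]
zs = [2.0, -1.0, -0.3, 0.8]
print(cross_entropy(ys, zs), simplified_cost(ys, zs))
```

The two printed costs should match up to rounding. For large positive z, `math.exp(z)` can overflow, so this direct transcription is meant for checking the algebra rather than for production use.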

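Finally we note that z = Wx + b, so ∂z/∂W = x. Taking the derivative w.r.t. W:

$$\frac{\partial C}{\partial W} = -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{\partial}{\partial W}(yz) - \frac{\partial}{\partial W}\ln\left(1+e^{z}\right)\right]$$

The first term ‘yz’ becomes ‘yx’ and the second term becomes:

$$\frac{\partial}{\partial W}\ln\left(1+e^{z}\right) = \frac{e^{z}}{1+e^{z}}\,x$$

Note that the fraction in the 2nd term is nothing but the sigmoid itself:

$$\frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}} = a$$

Which gives a final result of

$$\frac{\partial C}{\partial W} = -\frac{1}{m}\sum_{i=1}^{m}\left[yx - ax\right]$$

We can rearrange by pulling ‘x’ out to give

$$\frac{\partial C}{\partial W} = -\frac{1}{m}\sum_{i=1}^{m}(y - a)\,x$$

which gives the final result, the same as the chain-rule result:

$$\frac{\partial C}{\partial W} = \frac{1}{m}\sum_{i=1}^{m}(a - y)\,x$$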
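The final expression can also be compared against a finite-difference estimate of the gradient. The sketch below is an illustration under stated assumptions, not a definitive implementation: it uses a scalar weight `w` rather than a matrix W, the simplified cost -(1/m) Σ [yz - ln(1+e^z)] with z = wx + b, and hypothetical sample data and step size `eps`.

```python
import math


def sigmoid(z):
    """Sigmoid activation a = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def cost(w, b, xs, ys):
    """Simplified cross-entropy: -(1/m) * sum[y z - ln(1 + e^z)], z = w*x + b."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        z = w * x + b
        total += y * z - math.log(1.0 + math.exp(z))
    return -total / m


def analytic_dw(w, b, xs, ys):
    """Derived gradient: (1/m) * sum[(a - y) * x]."""
    m = len(xs)
    return sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / m


def numeric_dw(w, b, xs, ys, eps=1e-6):
    """Central finite-difference estimate of dC/dw (eps is an assumed step)."""
    return (cost(w + eps, b, xs, ys) - cost(w - eps, b, xs, ys)) / (2.0 * eps)


# Hypothetical data, chosen only for illustration.
xs = [0.5, -1.2, 2.0, 0.1]
ys = [1.0, 0.0, 1.0, 0.0]
w, b = 0.3, -0.1
print(analytic_dw(w, b, xs, ys), numeric_dw(w, b, xs, ys))
```

If the derivation is right, the two printed numbers should agree to several decimal places; a visible mismatch would point to a sign or indexing slip somewhere in the algebra.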
