Machine learning questions and answers

Exercises

1. What are the main benefits of creating a computation graph rather than directly executing the computations? What are the main drawbacks?

Deep learning frameworks that generate computation graphs, like TensorFlow, have several things going for them.

For starters, computation graphs will compute the gradients automatically. This saves you from having to do lots of tedious calculus by hand.

Another huge plus is that they are optimized to run on your computer's GPU. If this weren't the case, you'd need to learn either CUDA or OpenCL and write lots of C++ by hand, which is not an easy thing to do.

These frameworks also have utilities built in to share models between devices, as well as to reuse parts of models from larger projects, so-called "transfer learning."

Finally, using a framework like TensorFlow gives you access to TensorBoard, a visualization tool to help you debug your neural networks.

The downside of deep learning frameworks is that they add several layers of abstraction. That makes learning not just the framework, but deep learning itself, more difficult. The extra layers of abstraction also make debugging less intuitive than writing an ML algorithm from scratch would be.

(See p. 243)

2. Is the statement a_val = a.eval(session=sess) equivalent to a_val = sess.run(a)?

Yup. These two statements are the same. Let's look at some code:

import tensorflow as tf

x = tf.Variable(3, name="x")

sess = tf.Session()
sess.run(x.initializer)
print(sess.run(x) == x.eval(session=sess))  # True: both evaluate x in sess
sess.close()

(See p. 245)

3. Is the statement a_val, b_val = a.eval(session=sess), b.eval(session=sess) equivalent to a_val, b_val = sess.run([a, b])?

No. This one is a little tricky.

The statement a_val, b_val = a.eval(session=sess), b.eval(session=sess) runs the computation graph once to compute a_val and a second time to compute b_val. So it's running the computation graph twice!

On the other hand, the statement a_val, b_val = sess.run([a, b]) evaluates both a and b in a single run of the graph, so any computation they share is done only once.
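
Here's a minimal sketch of the difference (the names w, a, and b are just for illustration):

import tensorflow as tf

w = tf.constant(3)
a = w + 2
b = a * 5

with tf.Session() as sess:
    # Two separate runs: the subgraph for a (w + 2) is computed twice.
    a_val = a.eval(session=sess)
    b_val = b.eval(session=sess)

    # One run: TensorFlow evaluates a and b together, computing w + 2 only once.
    a_val, b_val = sess.run([a, b])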

For more details check out the tf.Session docs at: https://www.tensorflow.org/api_docs/python/tf/Session

(See p. 247)

4. Can you run two graphs in the same session?

You cannot run two graphs in the same session.

You would have to merge both graphs into a single graph to create that effect.
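
A quick sketch of why (illustrative only; graph1 and graph2 are made-up names):

import tensorflow as tf

graph1 = tf.Graph()
with graph1.as_default():
    a = tf.constant(1)

graph2 = tf.Graph()
with graph2.as_default():
    b = tf.constant(2)

with tf.Session(graph=graph1) as sess:
    print(sess.run(a))  # fine: a belongs to graph1
    # sess.run(b) would raise an error because b lives in graph2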

(See p. 246)

5. If you create a graph g containing a variable w, then start two threads and open a session in each thread, both using the same graph g, will each session have its own copy of the variable w or will it be shared?

In local TensorFlow, each session has its own copy of every variable, so each thread's session gets its own copy of w (even though they share the same graph structure). It's a bit more complicated in distributed TensorFlow, where variable values are stored in containers managed by the cluster. If both sessions connect to the same cluster and use the same container, they will share the same value for w.
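
A minimal local sketch of the same idea, using two sessions in a single thread (the names are just for illustration):

import tensorflow as tf

g = tf.Graph()
with g.as_default():
    w = tf.Variable(0)
    increment = tf.assign_add(w, 1)

# Two local sessions over the same graph: each has its own copy of w.
sess1 = tf.Session(graph=g)
sess2 = tf.Session(graph=g)
sess1.run(w.initializer)
sess2.run(w.initializer)
sess1.run(increment)
print(sess1.run(w))  # 1
print(sess2.run(w))  # 0 -- sess2's copy is untouched
sess1.close()
sess2.close()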

6. When is a variable initialized? When is it destroyed?

A variable is initialized when you call its initializer. It is destroyed when the session ends. In distributed TF, variables live in containers on the cluster, so closing a session will not destroy a variable. Instead, you'd need to clear the container.
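
A small sketch of the local case:

import tensorflow as tf

v = tf.Variable(42)

with tf.Session() as sess:
    # Evaluating v before running its initializer would raise an error.
    sess.run(v.initializer)   # v gets its value here
    print(sess.run(v))        # 42
# The session closes when the block exits, and v's value is gone with it.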

7. What is the difference between a placeholder and a variable?

Variables are operations that hold values. These values can persist across successive runs of the graph. Variables are also mutable; you can modify their value using the assignment operation tf.assign(). A typical use case for variables is to hold model parameters or to count training steps.

Placeholders hold information about the type and shape of the tensor they represent. However, they have no value. If you try to evaluate an operation that depends on a placeholder, you must feed TensorFlow the value of the placeholder using the feed_dict argument, or else TF will complain. A typical use case for placeholders is to feed training or test data to TF during the execution phase.
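
A short sketch of both side by side (the names are just for illustration):

import tensorflow as tf

weights = tf.Variable(tf.zeros((3,)))               # has a value, persists across runs
features = tf.placeholder(tf.float32, shape=(3,))   # no value, must be fed
output = weights + features

with tf.Session() as sess:
    sess.run(weights.initializer)
    print(sess.run(output, feed_dict={features: [1., 2., 3.]}))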

8. What happens when you run the graph to evaluate an operation that depends on a placeholder but you don’t feed its value? What happens if the operation does not depend on the placeholder?

If you run the graph to evaluate an operation that depends on a placeholder and you do not feed that placeholder its value, you will get an exception. If the operation does not depend on the placeholder, no exception is raised.
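
For example (a minimal sketch):

import tensorflow as tf

p = tf.placeholder(tf.float32)
q = p * 2.0                 # depends on the placeholder
r = tf.constant(5.0) * 2.0  # does not

with tf.Session() as sess:
    print(sess.run(r))                      # fine, no placeholder involved
    print(sess.run(q, feed_dict={p: 3.0}))  # fine, the placeholder is fed
    # sess.run(q) without feed_dict would raise an exception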

9. When you run a graph, can you feed the output value of any operation, or just the value of placeholders?

When you run a graph you can feed the output value of any operation, not just the value of placeholders.
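
A minimal sketch (b here is an ordinary operation, not a placeholder):

import tensorflow as tf

a = tf.constant(2.0)
b = a * 3.0
c = b + 1.0

with tf.Session() as sess:
    print(sess.run(c))                       # 7.0, b is computed normally
    print(sess.run(c, feed_dict={b: 10.0}))  # 11.0, b's output is overridden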

10. How can you set a variable to any value you want (during the execution phase)?

If you want to set a variable to any value you want:

  • Create the variable and initialize it to some value
  • Create a placeholder for the updated value
  • Create an assignment operation using tf.assign()

When the session is running, you can evaluate the assignment operation and feed it the updated value.

Here's some code:

import tensorflow as tf

x = tf.Variable(100)
x_new = tf.placeholder(shape=(), dtype=tf.int32)
x_assign = tf.assign(x, x_new)

with tf.Session():
    x.initializer.run()
    print(x.eval())                        # 100
    x_assign.eval(feed_dict={x_new: 200})  # feed the new value to the assignment op
    print(x.eval())                        # 200

11. How many times does reverse-mode autodiff need to traverse the graph in order to compute the gradients of the cost function with regards to 10 variables? What about forward-mode autodiff? And symbolic differentiation?

Reverse-mode autodiff needs to traverse the graph only twice (one forward pass and one backward pass) in order to compute the gradients of the cost function, regardless of the number of variables.

Forward-mode autodiff needs one traversal per variable, so ten traversals for ten variables.

Symbolic differentiation builds a different graph to compute the gradients, so it does not traverse the original graph at all (except when constructing the new gradient graph).
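
In TensorFlow, tf.gradients() uses reverse-mode autodiff, so asking for gradients with respect to many variables does not multiply the work the way forward-mode would. A minimal sketch (the variables and cost function are made up for illustration):

import tensorflow as tf

w1 = tf.Variable(1.0)
w2 = tf.Variable(2.0)
cost = w1 ** 2 + 3.0 * w1 * w2

# Reverse-mode: one forward and one backward traversal yield the
# gradients with respect to every variable at once.
grads = tf.gradients(cost, [w1, w2])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grads))  # [8.0, 3.0]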

12. Implement Logistic Regression with Mini-batch Gradient Descent using TensorFlow. Train it and evaluate it on the moons dataset. Try adding all the bells and whistles:

  • define the graph within a logistic_regression() function that can be reused easily,
  • save checkpoints using a Saver at regular intervals during training, save the final model at the end of training,
  • restore the last checkpoint upon startup if training was interrupted,
  • define the graph using nice scopes so the graph looks good in TensorBoard,
  • add summaries to visualize the learning curves in TensorBoard,
  • try tweaking some hyperparameters such as the learning rate or the mini-batch size and look at the shape of the learning curve.
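
One possible starting point (a rough sketch, not a complete solution; the scopes, hyperparameters, and checkpoint paths are placeholders, and the restore-on-startup and TensorBoard FileWriter parts are left out):

import numpy as np
import tensorflow as tf
from sklearn.datasets import make_moons

def logistic_regression(X, y, learning_rate=0.01):
    n_features = int(X.get_shape()[1])
    with tf.name_scope("model"):
        theta = tf.Variable(tf.random_uniform([n_features, 1], -1.0, 1.0), name="theta")
        logits = tf.matmul(X, theta, name="logits")
        y_proba = tf.sigmoid(logits)
    with tf.name_scope("train"):
        loss = tf.losses.log_loss(y, y_proba)          # cross-entropy loss
        optimizer = tf.train.GradientDescentOptimizer(learning_rate)
        training_op = optimizer.minimize(loss)
        loss_summary = tf.summary.scalar("log_loss", loss)
    return y_proba, loss, training_op, loss_summary

moons, moons_labels = make_moons(n_samples=1000, noise=0.1)

X = tf.placeholder(tf.float32, shape=(None, 2), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
y_proba, loss, training_op, loss_summary = logistic_regression(X, y)
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 1000
batch_size = 50
n_batches = len(moons) // batch_size

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            indices = np.random.randint(len(moons), size=batch_size)
            X_batch = moons[indices]
            y_batch = moons_labels[indices].reshape(-1, 1)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if epoch % 100 == 0:
            saver.save(sess, "/tmp/logreg.ckpt")       # checkpoint path is a placeholder
    saver.save(sess, "/tmp/logreg_final.ckpt")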