Charlene Bultoc has just started a post-doc at an important neuroscience institute. She is researching a new methodology to analyse signals in the brain detected through a combination of CT and MRI. Using image-processing techniques she can reduce the whole dataset to a grid of 20x20 arrays.
Her theory is that the average of such signals through the sagittal plane is constant over time, so she has written some software to calculate this. She decided to write the software in Python so she could share it (via GitHub, rsd-sagital_average) with people from other labs. She didn't know as much Python when she started as she does now, so you can see that evolution in her program.
Charlene is an advocate of reproducibility, and as such she has been keeping track of which versions she's run for each of her results. "That's better than keeping just the date!" you can hear her saying.
So for each batch of images she processes she creates a versions.txt file with content like:

```
scikit-image == 0.16.2
scikit-brain == 1.0
git://git.example.com:brain_analysis.git@dfc801d7db41bc8e4dea104786497f3eb09ae9e0
git://github.com:UCL-RITS/rsd-sagital_average.git@02fd8791c93b630fc2aecd198e4e4a3a9b441eda
numpy == 1.17
```
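A file like that can be produced by hand, but it is easy to script. The sketch below is one way such a file could be generated automatically; the package list, output filename, and the `git@` line format are illustrative assumptions, not Charlene's actual tooling.

```python
# Sketch (assumption): automatically record package versions and the current
# git commit into a versions.txt-style file.
from importlib import metadata
import subprocess


def record_versions(packages, path="versions.txt"):
    """Write installed package versions, plus the current git commit if any."""
    lines = []
    for pkg in packages:
        try:
            lines.append(f"{pkg} == {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {pkg}: not installed")
    # Record the commit of the repository we are running from, when inside one.
    try:
        result = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            lines.append(f"git@{result.stdout.strip()}")
    except FileNotFoundError:
        pass  # git is not installed; skip the commit line
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


record_versions(["pip", "numpy"])
```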
With that information she can go and run the same analysis again and again and be as reproducible as she can.
However, she's found that sagital_average has a problem... and she needs to re-analyse all the data processed since the bug was introduced. Re-running the analysis for everything she's produced is not viable: each run takes three days to execute (assuming the resources are available on the university cluster), and she has more than 300 results.
We can help her either by lending her our laptops, or by finding when the bug was introduced and re-running only the analyses that need it.
Finding when the bug was introduced seems the quickest way. Download the repository with her sagital_average.py script and use git bisect to find when the script started to give wrong results.
Do it manually first (as explained in the notes), and then create a simple test that you could use in a more automatic way.
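If you have never bisected before, you can rehearse the workflow on a throw-away repository first. The shell sketch below builds a toy history with a known bug-introducing commit and lets git bisect run find it; every name, path, and commit message here is made up for illustration.

```shell
# Build a tiny repository: good commit, bug-introducing commit, later work.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo && cd demo
git config user.email charlene@example.com   # placeholder identity
git config user.name "Charlene"

echo "correct" > result.txt
git add result.txt && git commit -qm "good version"
good=$(git rev-parse HEAD)

git commit -qm "unrelated change" --allow-empty
echo "wrong" > result.txt
git add result.txt && git commit -qm "bug introduced"
bug=$(git rev-parse HEAD)
git commit -qm "later work" --allow-empty

# Bisect between the known-bad HEAD and the known-good first commit.
# The test command exits 0 (good) while result.txt still says "correct",
# and non-zero (bad) once the bug is present.
git bisect start HEAD "$good"
git bisect run grep -q correct result.txt
found=$(git rev-parse refs/bisect/bad)   # the first bad commit bisect identified
git bisect reset
test "$found" = "$bug" && echo "bisect found the bug-introducing commit"
```

Running the verdict command by hand at each step (git bisect good / git bisect bad) is the manual version of exactly this loop.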
All versions of the program read and write CSV files. Charlene has improved the program considerably over time, but kept the same defaults (input filename brain_sample.csv and output brain_average.csv). She has always "tested" her program with the brain_sample.csv file provided in the repository. However (and that's part of the problem!), the effect of the bug is not noticeable with that file.
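To see how a test file can hide a bug, consider a purely hypothetical off-by-one error (an illustration only, not necessarily Charlene's actual bug): if the averaging loop skipped the last row, an all-zeros sample would still give the right answer, while a sample with a non-zero last row would not.

```python
import numpy as np


def buggy_average(data):
    """Column-wise average with a hypothetical off-by-one bug."""
    total = np.zeros(data.shape[1])
    for i in range(data.shape[0] - 1):   # bug: should be range(data.shape[0])
        total += data[i, :]
    return total / data.shape[0]


bland = np.zeros((20, 20))                 # an input like this hides the bug...
assert np.allclose(buggy_average(bland), bland.mean(axis=0))

revealing = np.zeros((20, 20))
revealing[-1, :] = 1                       # ...but this one exposes it
assert not np.allclose(buggy_average(revealing), revealing.mean(axis=0))
```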
- Fork Charlene's repository and clone your fork.
- Run the latest version of the code with the existing input file.
- Create a new input file to figure out what the bug is.
Hint: You can generate a file that shows the error as:

```python
import numpy as np

data_input = np.zeros((20, 20))
data_input[-1, :] = 1
np.savetxt("brain_sample.csv", data_input, fmt='%d', delimiter=',')
```

Note that you need to create the brain_sample.csv file each time you run the program.
- Use git bisect manually until you find the introduction of the error.
- Write some code to use bisect automatically.
Hint: To do it automatically you can use subprocess. For example, subprocess.run(["ls", "-lh"]) will execute the ls command with the -lh argument. Use assert or a similar function (e.g., np.testing.assert_array_equal) to make the program fail when the output is wrong.
- What should Charlene do now that she's found the bug?
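One way to put those pieces together is a small test script that git bisect run can call at each step. This sketch assumes the default filenames from the exercise, and assumes the correct output for this input is a row of 20 values equal to 1/20 = 0.05 (the exact output format depends on Charlene's script); the filename test_sagital.py used below is made up.

```python
# Sketch of a test script for `git bisect run` (filenames and expected
# output are assumptions based on the exercise description).
import subprocess
from pathlib import Path

import numpy as np


def make_input():
    # Regenerate the revealing sample on every run, in case the script
    # under test has altered or overwritten it.
    data_input = np.zeros((20, 20))
    data_input[-1, :] = 1
    np.savetxt("brain_sample.csv", data_input, fmt='%d', delimiter=',')


def check_version():
    """Run the currently checked-out script and assert on its output.

    A failing assertion (or a crash) exits non-zero, which git bisect run
    treats as a "bad" commit.
    """
    subprocess.run(["python", "sagital_average.py"], check=True)
    result = np.loadtxt("brain_average.csv", delimiter=',')
    # Assumed expected result: each sagittal average is 1/20 = 0.05.
    np.testing.assert_array_almost_equal(result, np.full(20, 0.05))


make_input()
# Only run the full check when we are inside Charlene's repository.
if Path("sagital_average.py").exists():
    check_version()
```

With a script like this in place, git bisect run python test_sagital.py would drive the whole search without any manual good/bad verdicts.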