Saturday, 1 November 2014

How not to debug, Part II

In my last post, I gave some general hints and tips on how not to set yourself up for debugging: practicing ergonomics, getting into the right mindset, following proper coding practices, writing documentation, commenting code, and more. Today, I'll apply all of those to some real problems I encountered over the years and show how concepts from class lectures, the textbook, and my previous post apply. Without further ado, let's get started.

Example problem

Last month I was working on an assignment for my artificial intelligence class. In this assignment, the user was placed in a room filled with 20 doors and each door was assigned three properties:
  1. Cold or hot.
  2. Noisy or quiet. 
  3. Safe or unsafe.
Each door would be assigned one from each pair, so a door could be cold, noisy, and safe, or hot, quiet, and safe, and so on. I ran into two significant bugs while working on this project.
  1. Noisy doors should play their sound when within close proximity. Regardless of proximity, the doors were playing their sound.
  2. Doors were not being assigned the variables correctly. 
I will analyze bug number 1 first. As mentioned, a noisy door should only be heard when the player is close. However, the sound plays regardless of proximity. This is my current scene:


Noisy doors problem

Four doors are cold and one is hot. They all default to safe, and only the hot one is supposed to be noisy. From where I stand in the scene currently, a lion roar sound plays, but it should only play when I am directly in front of the door. Because the title of today's post is how not to debug, let's go over some ways not to debug this particular problem.

What not to do: Start changing code.

While it may seem very, very attractive to just start playing with values, checking initialization, moving stuff around, and so on, this is not how you should start on this problem. When you immediately start changing code without looking into how or why it fails, you build a habit of not thinking logically about your problem. This in turn can waste your time and create artificial stress. You could easily play with little segments of code for hours before actually finding the problem, at which point you may not understand how you fixed it, or you may have accidentally created even more problems.

What to do: Recreate the problem, then research it or narrow it down.


Recreate the problem, or at least try to. Try to remember exactly how you got to the problem and reproduce those steps. When you do recreate it, write the steps down or commit them to memory, because this will help you develop a hypothesis later.

After you recreate the problem, you can do one of two things, and I believe both are valid depending on the circumstances: Google (research) the problem, or use all the tools available (breakpoints, the call stack, and so on) to narrow it down so you at least get an idea of where it lives. I only advocate researching first when, based on the defect and how you triggered it, the problem could plausibly be fixed by a quick search of the documentation or Google.

Actually Solving it

So instead of just changing variables, I look into the problem. I place breakpoints at initialization, at the play function, and at other points of interest; view the call stack; and write a few lines of code to print (with cout) whether some information is initialized correctly, a practice I should have adopted earlier.
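
A minimal sketch of that kind of state dump, assuming a hypothetical `DoorSound` struct (the field names are illustrative, not taken from the assignment's code). Collecting the emitter position and falloff settings on one line makes it easy to confirm initialization at a glance:

```cpp
#include <sstream>
#include <string>

// Hypothetical snapshot of the values worth checking; not the assignment's real types.
struct DoorSound {
    float x, y, z;        // emitter position
    float minDistance;    // distance at which falloff begins
    float attenuation;    // how quickly volume drops past minDistance
    unsigned channels;    // 1 = mono, 2 = stereo
};

// Formats the state on one line so it can be sent to std::cout or a log file.
std::string describe(const DoorSound& s) {
    std::ostringstream out;
    out << "pos=(" << s.x << "," << s.y << "," << s.z << ")"
        << " minDistance=" << s.minDistance
        << " attenuation=" << s.attenuation
        << " channels=" << s.channels;
    return out.str();
}
```

Calling `std::cout << describe(sound) << '\n';` right after initialization and again just before the play call shows whether anything changed in between.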

I see that the sound is playing from exactly where it should be, the listener is exactly where it should be, and the sound is set up correctly in terms of how it resonates, how it falls off, and its volume. So if it is not an apparent problem with the code I wrote, it could be a problem elsewhere, such as:
  • SFML
  • I screwed up on the audio file export
  • Could still be a problem with my code, but it's looking very unlikely
If it is a problem with SFML, then I can look in two places: Google and the documentation. I resort to Google first, and after scanning through a few different links, I find that SFML only spatializes mono sounds; a stereo sound plays at the same volume no matter where the listener is. My audio file was exported as stereo. After a quick export of the file in mono, I load it in, test, and bam, it works. While the way I found the fix was valid, another way would have been to load up the documentation for SFML's sound class and simply skim it. It says so plainly there.
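
Re-exporting the file was the fix here. If re-exporting is not convenient, a similar result could be had by averaging the two channels in code. The sketch below is a hypothetical helper, not part of the assignment and not an SFML call; it assumes 16-bit samples interleaved as left, right, left, right:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Averages each left/right pair into a single mono sample.
// Assumes interleaved 16-bit stereo input; a trailing unpaired sample is ignored.
std::vector<std::int16_t> downmixToMono(const std::vector<std::int16_t>& stereo) {
    std::vector<std::int16_t> mono;
    mono.reserve(stereo.size() / 2);
    for (std::size_t i = 0; i + 1 < stereo.size(); i += 2) {
        // Widen to int so the sum cannot overflow 16 bits.
        const int sum = static_cast<int>(stereo[i]) + static_cast<int>(stereo[i + 1]);
        mono.push_back(static_cast<std::int16_t>(sum / 2));
    }
    return mono;
}
```

The resulting samples could then be loaded as a one-channel buffer, which is the form the spatialization needs.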

The problem was relatively simple, but a lot of programmers fall into the trap of debugging incorrectly by hunting for a problem they don't understand.

Doors not being assigned truth table correctly

I had spent hours into the night coding the assignment of variables for the doors. Essentially, the doors are assigned variables based on a probability. See the chart below.

Hot   Noisy   Safe Door   Percentage of Doors
Y     Y       Y           0.05
Y     Y       N           0.10
Y     N       Y           0.03
Y     N       N           0.21
N     Y       Y           0.06
N     Y       N           0.11
N     N       Y           0.40
N     N       N           0.04

The problem I encountered was that the doors were not randomizing properly. A quick rundown of the program: upon initialization, it creates the room and loads the door property text file. For each property, it uses the percentage value to determine what the door should be. I have two std::vectors: one contains the ID values for each set of properties (1, 2, 3, and so on), and the other contains the probabilities (represented as int values, so I multiply the percentages by 100).

The program sorts the percentages from lowest to highest, reorders the IDs so that each ID still matches its sorted value (see below), and generates a random value. If the value lands between any of the percentages, it selects the corresponding set of properties to give to a door. An example of what was happening: if I set the probability of a door being hot, noisy, and safe to 100%, the program would not land on that set; instead it would land on a different set of properties.

Before Sort:

Hot   Noisy   Safe Door   Percentage of Doors   ID
Y     Y       Y           0.05                  1
Y     Y       N           0.10                  2
Y     N       Y           0.03                  3
Y     N       N           0.21                  4
N     Y       Y           0.06                  5
N     Y       N           0.11                  6
N     N       Y           0.40                  7
N     N       N           0.04                  8

After Sort:

Hot   Noisy   Safe Door   Percentage of Doors   ID
Y     N       Y           3                     3
N     N       N           4                     8
Y     Y       Y           5                     1
N     Y       Y           6                     5
Y     Y       N           10                    2
N     Y       N           11                    6
Y     N       N           21                    4
N     N       Y           40                    7
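
One way to keep the IDs lined up with their percentages during a sort is to sort them as pairs rather than as two separate vectors. This is a sketch of that idea, not the code from the assignment; the function name `sortByWeight` is made up for illustration, and the input is assumed to be the integer percentages and IDs from the "Before Sort" table:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts (weight, id) pairs by ascending weight so each ID travels with its weight.
// weights[i] is the integer percentage for ids[i]; if the vectors differ in
// length, the extra entries in the longer one are ignored.
std::vector<std::pair<int, int>> sortByWeight(const std::vector<int>& weights,
                                              const std::vector<int>& ids) {
    std::vector<std::pair<int, int>> rows;
    const std::size_t n = std::min(weights.size(), ids.size());
    rows.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        rows.emplace_back(weights[i], ids[i]);
    }
    // std::pair compares by first, then second, so this orders by weight and
    // breaks ties by ID.
    std::sort(rows.begin(), rows.end());
    return rows;
}
```

Fed the eight rows from the first table, this produces the ID order shown in the "After Sort" table: 3, 8, 1, 5, 2, 6, 4, 7.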


What not to do: Panic, immediately change code, blame yourself.

Changing code, panicking, stressing, and so on are the last things you want to do, because you're wasting your time. In fact, if you really have your heart set on being a bad debugger, make sure that your critical functions, such as the one setting door probabilities, have zero (or meaningless) comments, poor naming conventions, and no plan. With a bug like this, if you panic, change some code, slap some stuff together, or try to hack it, you'll waste a lot of time and lose track of what you're doing.

What to do: Recreate the problem, search for patterns, and hypothesize.

A problem like this, where probabilities are not being applied correctly, is very tricky because you most likely have a logic error somewhere. The best thing to do is make sure you have a plan, if you don't already, for how your program should flow and what each block of code should do. Describe and justify your code! Try to break it by thinking of cases under which it could fail. This can really help you wrap your head around the problem.

Next, play with your text file a bit. Try a few different cases and see whether it works perfectly in some and not so perfectly in others. After that, check the door-setting function using breakpoints. In my program, I print out the initially loaded text file, the result of the door sorting algorithm, and which random value is generated and where it lands. When trying to solve the problem, this is probably a good place to start.

Actually Solving it

This bug gave me a headache because sometimes logic errors just screw you up big time. The first thing I wanted was to understand the problem in greater depth. Here is a general rundown of what I did to try to understand it:

  • Recreated the bug several times to see whether the failing cases were related to each other or appeared to break arbitrarily.
  • Analyzed the program's output to see whether it was landing on the incorrect set function.
  • Went back to my original plan, tried to determine cases where my code would break, and checked whether my case fit.
  • Analyzed the door's load function.
What was important about solving this bug was enumerating and eliminating possibilities. Very quickly I was able to determine that my script loader was not to blame. The program was generating the random value correctly, so that wasn't it either. It sorted correctly, but through my output and the breakpoints I determined that the final value used to decide which set function to call was incorrect. If the value used for selecting the door's properties was incorrect, then the problem most likely lies within the block of code where that value is produced.

As a hypothesis, I state that the value used to select the appropriate ID is being randomized correctly, but it is matching the wrong entry. Here is the block of code associated with ID selection.


Essentially, it generates a random value, creates a new door, and then tries to place the random number within the sorted IDs. After going over this function several times, I found that it works as intended. The problem doesn't appear to be here. What about the sort function? If the problem is that the ID is not selecting the right door, what if I'm not sorting correctly?


Again, after thorough analysis, the sort function appears to be sorting things as intended. So where could this problem exist?

Taking a step back

A good way to debug poorly is to never step back from your code to clear your mind. If you keep pushing yourself harder and harder to understand a problem that you simply aren't understanding at that moment, you'll doom yourself to stress and anxiety. This was one of those bugs that I needed to step back from and take back to the drawing board. I thought very carefully about the sorting function and began to ask myself questions such as: why am I even sorting in the first place, what does it get me, why am I using an ID value, and so on.

What I came to realize about the sorting was that I had an inherent flaw in the organization of my code. Essentially, I had this block of code:


This was responsible for setting the properties of the door. It wasn't the problem itself, but it was a clue as to what my problem was. Essentially, I was sorting the doors but NOT updating these if statements to match the new sorted order. Which raises the question: why am I even sorting in the first place? Was there an easier way to get the random number generation I needed without having to write a convoluted mess of code just to make sure the if statements lined up?
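
To make the mismatch concrete: after sorting, position 0 holds ID 3 (hot, quiet, safe), so any branch that tests the position instead of the ID hands out the properties of ID 1 (hot, noisy, safe). The sketch below is a reconstruction under that assumption, not the original code; `DoorProperties` and `propertiesForId` are hypothetical names, and the table rows follow the chart earlier in this post:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

struct DoorProperties {
    bool hot;
    bool noisy;
    bool safe;
};

// Looks up the property set by its ID (1 through 8) from the chart, so the
// result does not depend on where that ID ended up after sorting.
DoorProperties propertiesForId(int id) {
    static const std::array<DoorProperties, 8> table = {{
        {true,  true,  true },  // ID 1
        {true,  true,  false},  // ID 2
        {true,  false, true },  // ID 3
        {true,  false, false},  // ID 4
        {false, true,  true },  // ID 5
        {false, true,  false},  // ID 6
        {false, false, true },  // ID 7
        {false, false, false},  // ID 8
    }};
    if (id < 1 || id > 8) {
        throw std::out_of_range("door property ID must be between 1 and 8");
    }
    return table[static_cast<std::size_t>(id - 1)];
}
```

Keying the lookup on the ID rather than on the sorted position means the order of the vectors no longer matters to this step.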

The idea then became a mindset of doubt, which is healthy. I doubted my very sorting function, asking: what if you're wrong? How can I prove you wrong? Going back to the drawing board, I knew that all the probabilities should total 100%. So why not leave the values unsorted and use them as is? For example, if I have three values, 0.2, 0.5, and 0.3, why sort them? If I multiply by 100 I get 20, 50, and 30, and they add up to 100.
I tested several scenarios on paper and determined that by adding value 0 to value 1 in the vector, then adding that sum to value 2, and so on, I get a correct scale on which to generate numbers that will, hopefully, lead to correct random selection. With 20, 50, and 30, the running totals are 20, 70, and 100, so a random number from 0 to 99 picks the first set when it is below 20, the second when it is below 70, and the third otherwise. Time to comment out my sort function, write the new code, and test that hypothesis.
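
The running-total idea can be sketched as follows. This is an illustration, not the code I actually wrote for the assignment; the names `cumulativeWeights` and `pickIndex` are made up, and the weights are assumed to be the integer percentages in their original, unsorted order:

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <numeric>
#include <vector>

// Turns integer percentages into running totals, e.g. {20, 50, 30} -> {20, 70, 100}.
std::vector<int> cumulativeWeights(const std::vector<int>& weights) {
    std::vector<int> totals(weights.size());
    std::partial_sum(weights.begin(), weights.end(), totals.begin());
    return totals;
}

// Returns the index of the bucket that `roll` lands in. Assumes totals is
// non-empty and roll is in [0, totals.back()); a roll at or above the final
// total falls back to the last index.
std::size_t pickIndex(const std::vector<int>& totals, int roll) {
    // Find the first running total strictly greater than the roll. A
    // zero-weight entry repeats the previous total, so it is never chosen.
    const auto it = std::upper_bound(totals.begin(), totals.end(), roll);
    if (it == totals.end()) {
        return totals.empty() ? 0 : totals.size() - 1;
    }
    return static_cast<std::size_t>(std::distance(totals.begin(), it));
}
```

The roll itself can come from `std::uniform_int_distribution<int>(0, totals.back() - 1)` with a `std::mt19937` engine. The returned index points straight at the matching row of the unsorted table, so no ID bookkeeping is needed. The standard library's `std::discrete_distribution` covers the same job if you would rather not maintain the totals yourself.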

It worked!

Long story short, the program worked as intended and held up under several tests. Bugs fixed. Hooray!

In conclusion

I hope you learned something about what not to do when debugging. When coding, remember: don't panic, don't cry, don't pull your hair out. Relax, think logically, and do your best. Also, read the documentation, write your own documentation, comment your code, and use good naming conventions. Thank you for reading!
