(piano music)
- Hi, I'm Caroline Taymor
and I'm here to talk to you
about getting unstuck,
using the scientific method
for debugging.
I'm a software engineer
at Pivotal Cloud Foundry
and I want to talk to you
about my favorite tools
for debugging.
So who here has gotten
really stuck while debugging?
I see a majority of hands which is great
because I think it's a
pretty universal experience
and I think if you haven't
yet, you will someday.
I know I definitely have.
So I want to share with
you today my favorite tool
in my toolbox for how
to deal with that moment
when you're really, really stuck.
It's the scientific method, the process of
making your debugging
a science experiment.
So today we're gonna talk
about the what, the how,
the why and then when?
What is the scientific method?
How do you use it for debugging?
Why is it such a valuable tool?
And when do you know to pull
it out of your tool box?
What is the scientific method?
The scientific method is just a fancy term
for the process of doing science.
Which is really cool,
because I think science is really cool.
So it starts with a step
of gathering knowledge,
what is already known about your topic?
If you are researching
stars or frogs, it's what,
what research have other
scientists already done,
what is known about your research area?
And then you start asking questions,
which are really what don't
we yet know about this topic?
What is there still to learn?
What did the previous research
find which is really weird and interesting
and you want to dig in more to?
And then you make a hypothesis.
A hypothesis is just an educated guess,
it's a statement about
what you think might be
the answer to your
question and when you do,
when you're doing scientific research,
you phrase your hypothesis
as a null hypothesis,
which is to say you state
that there's no correlation
between two variables,
and then you try to disprove that
to provide evidence that
there is some statistical
correlation between the two variables.
And then you design an experiment.
What information do you need
to disprove your hypothesis?
What information can you
collect which will give you
some sense that it's incorrect?
And as you're designing your experiment,
it's important to be very,
very detailed because
when you're doing scientific
research, your research
doesn't have any weight
until it's been replicated
by other scientists.
So you need to be explicit
enough in every step
that someone else can
do the exact same thing.
And then you run your
experiment following your
experimental procedure and
you take really good notes
because if you don't take
notes on what you see,
then you'll have a really
hard time figuring out
what you learned.
And I really think the step
of taking good notes is part of where
the scientific method really shines
as a tool for debugging.
After you've run your experiment
and you've taken good notes,
then you come to a conclusion.
Did what you observe
about your topic disprove your hypothesis?
Maybe it didn't disprove your hypothesis
which might lend some
support for the idea that
the hypothesis might be true.
Or it might have shown
that your question was
really irrelevant and
uninteresting or that your
hypothesis is just like really off-target
and you don't really know.
And all of those are great results because
then you can come back to
the gathering knowledge step
and you know a little
bit more than you did.
Even if what you know is that's
not an interesting research
question or this hypothesis
is definitely not
true,
those are useful things to know.
And then you also share
your knowledge out.
What good is scientific
research if it stops in your lab
or your living room or your office?
When you're doing scientific research,
this looks like publishing your results
in a peer-reviewed journal and
we'll talk a little bit later
about what this looks like
when you're doing
debugging on your software.
So how do you debug with
the scientific method?
Well, the first thing is that throughout
the whole process, it's really important to
write it down;
that's part of what makes
this such a useful tool.
I don't think it matters a
lot how you write it down,
sometimes I use my notebook,
I'm more of a stickers-on-notebook,
stickers-on-laptop person, so
you can see my cool stickers.
The whiteboard is a really
perfect tool if you're
collaborating with others,
if you're pair programming,
if you're talking with other team mates.
I like stickies if I'm
working with people who are
a little bit resistant
to using this method
and then I can make my
experimental procedure
on sticky notes at my desk and my pair can
be less enthusiastic about
using the scientific method,
and it works great.
My favorite way of writing
it down right now is actually
just in my text editor, it's integrated,
it's right where I'm doing my debugging,
it's really integrated
with my process flow
and it's really easy to
copy information out of
code or out of logs straight
into my notes and then
it's really easy to copy
relevant pieces of those notes
into a bug tracker or
into Slack or other ways
of sharing out with people
so it works really well.
I don't actually think that
you have to write it down,
and I know I said write it down,
and I'm gonna tell you that
again and again, but I
think the crucial step
is that you in some way
take some of the information
that's floating in your head
and move it out of your head
in some concrete way.
So I love writing, it
works really well for me,
but I think you could
get the same results by
talking into a tape recorder,
as long as you're being thoughtful
about what you're saying:
whatever you would write down,
you could say that out loud.
I know that not everyone
loves writing as much as I do maybe.
So then you start with that first step,
gathering existing knowledge.
What does this look like
when you're debugging your software?
It looks like doing a brain dump
of everything you know about your bug
and you want to start with
the user-facing impact.
How did you notice this bug?
A customer went to the
admin page and got a 500
and they sent us an email
with not enough information.
And you want to write down everything
else you've discovered
in the process of debugging so far.
I often forget about this
when I'm starting debugging
and then I only pull it
out when I'm really stuck,
so I've often been working
on the bug for an hour
or a day or a week, so I've
learned some things and
forgotten some of them,
but you want to write down
everything that you
remember that you know.
And you want to include
like weird log lines
you've seen, other strange
behavior that seems
maybe related, just sort of
stream of consciousness,
write it all down, and as
you start writing it down,
I think it comes kind
of naturally that you
start asking questions, because
at first you write down
the things that are definitely happening.
And then you start writing down the things
that you think are maybe happening
but you're not really sure.
So for instance, maybe my
service is spamming the database
and that's why the whole
thing's falling over.
I'm not sure.
That's a great question to write down.
And you might have two or three questions,
you might have twenty,
you might have an overwhelming
number of questions,
but once you start
getting a few questions,
hopefully one of them will
start being interesting
or you can sort of give
up and be like oh my gosh,
I have seven million questions,
I'm just gonna pick one.
And that's sort of the
second half of this:
you pick a single question.
And a couple of other
interesting questions to ask,
when you're thinking
about those questions, are:
how do I know what I think is true?
So I stated my facts before,
the things that I know are true.
How do I know they're true?
Are they true?
Potentially the most valuable
question you can ask is:
is the thing I think is true actually true?
That's a great question
to pick as your one question
to start with.
You can come back to the
others, you don't have to like
be afraid that they're going away forever,
they're still on your paper.
So once you've picked a single question,
then you make a hypothesis
about it, an educated guess.
When I'm using the scientific
method for debugging,
I sometimes play a little
fast and loose with it,
I think of it as a general
framework, not a rigorous
scientific approach so
occasionally I frame it as a
null hypothesis, but sometimes
I just frame it as a more
general statement about
what I think is happening
in the system, a guess at the
answer to that question.
And I think it's helpful either way.
If you have more than one
idea about what's happening,
that's okay, too, you can
just pick one, flip a coin,
doesn't matter, you
just need some statement
against which you can test.
So then you design an experiment,
and it's important again
to be really detailed.
Here it's less that you're
gonna publish your experiment
in a peer-reviewed journal
for other people to replicate
unless you have a different
type of bug than I do,
but you'll want to refer back to it
yourself and share it with teammates,
and so it's important to
write down in detail the
steps that you want to take.
And the reason that this
is a helpful step is
because it's much easier to
answer: what information do I need
to disprove my hypothesis?
I don't have to fix the bug,
I just have to prove
that the problem is not
that my service is spamming the database.
That's all I need to find out.
And as you're designing your
experiment, you also want
to write down what you expect to see:
what log lines, what behavior
if you restart the service,
what you would expect to see
if your hypothesis is true
or if it's false.
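So to make that concrete, the skeleton of an entry in my notes might look something like this, and to be clear, this is a made-up sketch using the spamming-the-database example, not a real bug:

```
Question: Is my service spamming the database?
Hypothesis: The service opens a new connection per request and never
  closes it, and that's what's knocking the database over.
Procedure:
  1. Count open connections from the service to the database.
  2. Restart the service, send ten requests, count again.
Expect if true: the connection count grows with each request
  and never shrinks.
Expect if false: the connection count stays flat after the
  requests finish.
```

The exact headings don't matter; what matters is that the steps and the expectations are written down before you run the experiment.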
And then you run the experiment.
And you take really good notes.
What do you take notes on?
You want to take notes on
what you expected to see.
Did you see all those
things that you wrote down
in your experimental procedure?
And you definitely want to
write down all the things
that you didn't expect to see,
and especially the
oh my gosh, I had no idea
my software could do that
and I really don't know why moments,
those are also
really important to write down.
I often find other, unrelated
bugs while I'm following
this process, definitely write those down
in your bug-tracker, not
now, don't get distracted.
Those are for later, but write them down,
you have new bugs, yeah, it's software.
I'm not sure it's great.
One of the things that's really
helpful in this taking notes
process is I love to grab
annotated log snippets.
So you don't need to read
the details of what
the logs here are doing,
but the interesting thing is
I grabbed a chunk of logs
with all their timestamps
from the logs of the system
that was taking too long,
it was timing out and I didn't know why,
and then I took a note of
hey, the time between the
beginning of this process, the staging,
and the end of the creation
phase took 52 seconds
and I know I have a
three-minute timeout for
a process of which this
is only a small part
and in a healthy system this
takes three or four seconds.
I don't know what's going
on, but it's interesting.
And this is helpful because
if you've just grabbed
the logs then tomorrow
you're not gonna know why
those logs were interesting.
So if you just write a short
snippet about why the logs
are interesting, it's really valuable.
This bug was like a month
ago and I still know what it
was doing because I wrote two sentences
at the top of the log.
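Just to give you a flavor, an annotated snippet in my notes looks something like this; the log lines here are invented placeholders, but the timings are the ones from my real bug:

```
Note: staging start to the end of the creation phase took 52 seconds.
The whole operation has a three-minute timeout, and on a healthy
system this part takes three or four seconds. No idea why yet.

10:02:01 staging started
10:02:53 creation phase complete
```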
So then you come to a conclusion.
Was your hypothesis disproved?
Do you feel like you know
what the cause of your bug is?
Maybe you still have
no idea what the cause
of your bug is, and that's
actually great, because
before you had 6,000
possibilities and now you have
5,999, and that's actually
a lot fewer possibilities
for how your software could be breaking.
And it helps you move forward.
If you've figured
out how to solve your bug,
that's great, go on and
fix it, and don't forget the
share-out phase that we'll
talk about in a moment.
But if you haven't figured out
why your software's breaking,
if your hypothesis was disproved,
or it held up in a way where
you still don't really
know what's going on,
then you circle back to the
gathering knowledge phase,
because you now know more about your bug.
You know that your hypothesis
is untrue and that's not
what's actually going on.
And because
you've taken detailed notes
on exploring a corner of
the system that's related to
your bug in some way,
you've probably learned
a whole lot of other things.
So you may have new questions.
You may have new knowledge
that prompts some questions.
You may also have no new
questions, but then
you can refer back to the
questions that you set aside
before, and maybe one of them
looks a heck of a lot more
interesting or a heck of a lot
more likely as the cause.
And then there's the share
what you learned phase.
So what is important to
share when you're debugging?
There's a lot of things that you can learn
when you're debugging,
certainly all those new bugs
that hopefully you put
in your bug-tracker,
you can share those with your team,
your product manager.
You may have seen problems
that others may see later, so
I sometimes will be working
on a bug and I'll discover
that the issue is not
actually a bug in the code,
it's that the system was
misconfigured, and another customer
might come along and configure
their system in the same way
that we know won't work.
So it's really helpful
to tell my teammates:
this is what I saw, here's the answer,
and now the next time
you see the same thing,
you'll have an idea in your
head of what the problem is.
You may have things that you need to share
with another team, it
may not be your software
that had the bug in it, it
may be another team's software
and that's a great thing to share.
And my friend Ray Krantz
helped me see that I think
one of the most helpful
things you can share
from this process
is your experimental procedure, because
you've essentially just written down a
playbook for how to solve
really hard bugs in your
software, which can be a great
onboarding tool for a new
person who doesn't have any idea
how to solve bugs in what
is now their software,
but they don't know it yet because
it's only their third day on the job,
so it can be a really great
teaching tool as well.
So,
story time.
I want to share with you a
little bit of how I've used this
on some interesting bugs that I had.
So I was working on a
project and we had a process
where we were taking some
customer input and customer
data and shelling out to Node
to run some code on it in
a JavaScript sandbox.
I know this sounds like really wild
and probably like a terrible idea
but we were doing it
for pretty good reasons.
And we had a customer who
was reporting that
they were starting this process
and then it was just
hanging for like five hours.
And we expected that sometimes
it could take a little while,
it could take a minute,
it could take two minutes,
you know, our web app was
designed to handle that,
we had like a spinning little
bar to show that you're doing
asynchronous stuff in the
background and like don't worry,
it'll return and five
hours later, no returning.
So I knew vaguely what section
of the code base it was in,
and I mean vaguely, like,
it's somewhere in this third of it,
and so my first question
that I wrote down was:
what part of the lifecycle is hanging?
And I thought maybe what's
happening is there's something
funky in the JavaScript
that's running in the sandbox
and it's not returning. I
don't know, I know Rails better than
JavaScript, and what we're
doing here is kind of wild,
so maybe this is the problem.
And we didn't have very good
logging, so I couldn't tell
from our logs what part
of the system it was, so I
wrote down a procedure: I'm going to add
log lines. And we were able to
get data from the customer
that was similar enough to
their production data that
we were able to replicate the problem
on a test system, which is
great.
And so I added a bunch
of log lines, just like:
we're at this part, we're at the
about-to-shell-out-to-Node step,
we're shelling out to Node,
we got back from the JavaScript,
just where-in-this-lifecycle-are-we
log lines. And
I restarted the Rails server,
and it didn't hang in
the JavaScript code,
it never got there, so this
hypothesis was not true.
And I had learned something new.
So then the question was:
maybe the data's too big?
We knew that our test data
was a lot smaller than our
customer's data; we had seen
this before, other teams were
testing with larger
data sets than we were,
and we had this vague sense
that maybe our customer's data
was even bigger, and so we thought:
maybe when the data's too big,
shelling out to Node hangs?
And so we tried it again
in the exact same situation
except with a much smaller
data set, and that worked.
And so we were starting to
get a sense that this is
really what the problem was.
And then we circled back to
the gathering knowledge stage
and we did more research
and we found out that Node
has a buffer limit that
was way smaller than our
customers' data set, which was way bigger
than we had ever imagined.
And so we
realized that the problem was that we were
not piping our data properly
and never flushing the stream,
so it was just sitting in
the Node buffer, and we were
able to fix the bug by properly using
the IO library, you know, Open3,
and not forgetting to flush our data.
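This isn't our actual code, but here's a minimal Ruby sketch of the same class of bug: if you shell out and never drain the child's output pipe, the child blocks as soon as the OS pipe buffer fills, and your process hangs. Open3.capture3 avoids that by reading the output while the child runs.

```ruby
require "open3"
require "rbconfig"

# A child process that writes more output than a pipe buffer holds
# (pipe buffers are often around 64 KB on Linux).
big_writer = [RbConfig.ruby, "-e", "STDOUT.write('x' * 200_000)"]

# The risky pattern: spawn the child, never drain its stdout, then
# wait for it to exit. Once the pipe buffer fills, the child blocks
# on write, the parent blocks on wait, and everything hangs,
# much like our five-hour hang:
#
#   _in, out, _err, wait_thr = Open3.popen3(*big_writer)
#   wait_thr.value  # can block forever, because nobody reads `out`
#
# The safe pattern: capture3 drains stdout and stderr in background
# threads while the child runs, so the buffer never fills up.
out, _err, status = Open3.capture3(*big_writer)
puts "read #{out.bytesize} bytes, child exited #{status.exitstatus}"
```

The flushing half of our fix matters on the writing side, too: if you feed data to the child's stdin, you have to flush and close it, or the child sits there waiting for input that never arrives.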
So I sort of hope this shows
how you can use the cycle.
Like your first question
might not be the relevant one,
and that's what's useful about it.
So
why is the scientific method
helpful for debugging?
Well the first reason is collaboration
and it's great for getting unstuck.
And once you're unstuck, it
helps you keep moving forward.
Why is the scientific method a
great tool for collaboration?
Well it can help you and your teammates
get on the same page.
If you
have been working on a bug
for a while and you're frustrated
and you go to a teammate
and ask them for their help,
your notes are a really
valuable tool for helping
them come onto the bug
in an efficient manner.
You can tell them just what
you're working on right now,
and you don't need to tell
them the entire history,
but you have your notes, so
you can refer back to them
when you need to,
when they become relevant.
You also can avoid telling
them all the rabbit holes.
If I'm telling a coworker,
just stream of consciousness,
about a bug I'm stuck
on, I often go through
the process that I've
gone through to get here
over the last two days and
that has a lot of dead ends
that aren't useful information
for my coworker and so
you can, you can refer to
your notes, realize they're
dead ends before you tell
your coworker this information
and just skip the dead ends.
It's also really helpful
because, oh, I think I switched those slides, well,
good notes do make it
much easier for your teammates
to help you, because of
that getting on board, but
getting your teammates on the same page
is also really helpful
if you're disagreeing about
what the cause of the bug is.
I do a lot of pair programming,
I spend most of my day
pair programming and sometimes
when my pair and I are
working on a bug, we have
really different understandings
of what the bug is and we
might have really different
questions, we might have
really different hypotheses
and so this procedure,
where we have to agree
on one question and one
hypothesis, forces us to
communicate clearly about what
we think is happening,
and it gives us a really
generous way to give way to
each other without feeling
embarrassed or feeling like
we're not being respected,
because we can say, okay,
we came up with 20 questions,
we have to pick just one,
so let's investigate your question.
If it turns out to be
correct, that's great,
we'll fix the bug and if it's
not then we can come back
to my question and it gives
you a really generous way to
communicate when you're
sort of really disagreeing.
We talked a little bit about
why good notes make it
so much easier for your
teammates to help you.
So it also helps you
get unstuck, and
the reason it does this is
that it narrows your focus.
Little rocket ship, I love that.
Writing down what you know helps
you organize your thoughts.
When you're feeling really
stuck, your thoughts are often
just going in circles,
they're getting a little
overwhelming, and you
just kind of don't know
what's going on, and by
writing down your thoughts
and then being able to look
at them afterwards,
that sort of externalization
process is really helpful
for organizing your thoughts,
because you look at all the
things that you've stream-of-consciousness
dumped,
and that gives you a
little bit of emotional space
to step back from feeling
terrified that you can't
solve this bug,
and it lets you notice which things
you still think are important
after you've put them down
on paper.
And it turns out you might
decide some of them are not
important, and you might
be like, oh, that one
that I wasn't even really
thinking about,
I think that's where my question is,
I think that's the problem.
One other way it really
helps you get unstuck
is that the question how do I disprove
this hypothesis is actionable.
How do I fix this bug, when
it's a really overwhelming bug,
is not actionable; there's
no clear action you can take.
But very often, when you have
narrowed your focus down,
if you take our example
from before, I think that my
service is spamming the
database, then you can say, okay,
I can look at how many
connections there are from my service
to the database, and it's
much clearer what information
you need to collect if you've
narrowed your focus down
that much. It's just much
easier to work out
what information I need
to disprove this
tiny question than to fix the whole bug,
and so it helps you move forward.
Indeed, moving forward.
Once you've gotten unstuck,
it also helps you keep moving forward;
once you've gotten past that frozen place,
you can keep going.
One of the most valuable ways
the scientific method is really
helpful is that it prevents you from
repeating yourself in two ways.
The first is when your
teammate comes in after you've
asked them for help, and they're like, hey,
did you try restarting
it? And you're like, yeah,
I spent all afternoon restarting
it and that didn't work,
and, like, do you think I'm an idiot?
And you know that your
teammate's trying to help, because
you've asked the same kind of
question of someone else,
but you're really pretty
frustrated about it.
I hear some laughs so I
think other people have
had that experience.
And so with the scientific method,
you have your notes and
you have your experiment,
so you have your great
notes from when you spent
all afternoon yesterday
trying restarting it, and
your coworker gets to see
the results, what happened,
because that's what they
really want to know.
They don't think you're an idiot,
they just wanna know
themselves what happened
when you restarted it, and
so you have your notes,
and they can read your
notes, and then you can skip
that whole process and
jump to where you are now,
which is eight hours later than
the time when you were
trying restarting it,
with new knowledge.
The second way it's great is that
it prevents you from repeating yourself.
Who here has been working
on a bug for a while
and then all of a sudden
you realize you're doing
the same thing you did three
days ago to try and debug it?
Again, I see a number of hands,
I've definitely done this
and so having that record of
what questions you've asked,
what experiments you've
run as you're debugging,
helps you
not go back and do the
same things pointlessly
because you're lost in
this fog of oh my gosh,
I've been working on this bug forever
and I don't really know and
it all just blurs together
and is it three days or is
it ten years, I don't know.
So it helps prevent that
repeating in the fog.
It's also great because each
iteration moves you forward
in a way that's really
observable and concrete.
Every time you've answered your
question and you've followed
your experiment and you come
back to gathering knowledge,
it's much clearer than if
you aren't taking notes on it
and you aren't following
sort of a framework,
it's much clearer to see
that you are making progress,
even if you now, instead
of 6,000 potential reasons
for your bug, only have 5,999,
five thousand nine
hundred and ninety-nine,
and that's still way too
many, but it's really clear
that it's one less, and so it
helps you not get frustrated
and it helps you move forward.
So when should you use the
scientific method for debugging?
Another story.
So this method really started
to crystallize in my mind
when I was working on a project
where we were shipping a
Ruby on Rails app as a VM image
for five different
infrastructures as a service,
and each one had to be
packaged differently.
And
we were using as our VM image
a standard Ubuntu
image from Canonical,
and then we were using Packer and Chef
and a bunch of other
tooling to put all our stuff
on the VM image.
And we decided that we
wanted, instead of using this
standard Ubuntu image from
Canonical, we wanted to use
a VM image that was
based on that same image,
but that another team was
building and they had done
all kinds of security
hardening and so we could get
all their work for free, right?
So the best
general procedure we could
think of for this
was: go into Packer,
which is a tool for building
VM images, swap out the URL
from the Ubuntu one to the
one our team was building,
try and build it,
try and boot it,
and hope it works.
And let me tell you that
it doesn't work that way
and we spent about a month
doing this and sitting there
and being like, okay, the VM didn't boot.
We can't even SSH in,
what's going on?
And so it was really a month
of the whole team being
incredibly stuck and
incredibly frustrated and
we learned a lot and we
got it done eventually, but
I started pulling together
these tools of, like,
take notes when you're
frustrated, and, you know,
throw out a variable or focus
on one variable, and like,
all these tools that I had
that worked individually, and
started pulling them together
into thinking of it
like the scientific method,
because we just kept
getting stuck and I had to
develop ways for us to
move forward, because we
needed to move to this
new image for our VM.
So I think there's two
types of getting stuck.
And I actually think the
scientific method is really helpful
for both of them.
This idea of two types of
stuckness and that when you're
debugging or just building
software or really just
living life you often are
alternating between them
comes from my friend Jesse Alford.
So the first type of stuck
is when you have too much
information and you can
recognize that you have
too much information and
that's why you're stuck because
you feel overwhelmed or
bewildered or you're saying
to yourself it could be any
one of any thousand things.
Or you're noticing impossible
things that definitely
can not happen in your
software, you're really sure
it can't do that but it
seems to be doing it anyways
and are there leprechauns
in your software?
I don't know.
Maybe it's poltergeists.
The other type of stuck
is when you have too little information.
I can't think of anything,
I have no more ideas.
You feel frustrated, you feel stalled.
The scientific
method has aspects which will
help with both, so when you have
too little information,
writing down everything you
know is a way of realizing
you actually have some information.
And so it's very helpful.
And when you have too much information,
when you pick a single
question and you pick a
hypothesis, you're narrowing
your focus, and so even
though it could be one
of a thousand things,
now you have just one
little thing to look at.
So it's actually really
helpful in both phases
of being stuck, and
you're usually alternating
between the two as you go
through a debugging process.
And the scientific method sort of
goes with you the whole time
and can be your friend for both.
So we've covered the what,
the process of doing science;
the how, with lots of writing;
the why, that the scientific method is
so helpful for collaborating,
getting unstuck and moving
forward; and the when, to use it
when you're feeling
frustrated or overwhelmed on
really complex bugs.
I hope the next time you're stuck,
you try gathering knowledge,
asking questions,
making a hypothesis,
designing an experiment and running it,
taking really good notes,
coming to a conclusion,
circling back to gather knowledge again,
and sharing out with your team,
and definitely make sure
to write it all down.
I have a reference guide for this
on my website that you can refer
back to while you're debugging,
so that you don't have to rely on the slides.
I'm not big on sharing slides,
I think that they're
very low fidelity, so
this guide can be really helpful for
referring back to.
And I would love to hear
on Twitter if
you find the scientific method
helpful or have any improvements
on it for your debugging,
and thank you.
I think we have a few
minutes for questions
if folks would like.
(applause)
Yeah, so the question is how
do you know when to stop?
I talked about spending a
month on this bug and like
how do you know when
it's something external?
I think that's a really hard question
and I think that it has
a lot of judgment call
involved in it.
I would definitely not
recommend spending a month
on a bug without going
and talking to other teams
and other people.
We were doing a lot of going
and talking to other teams
and other people who knew
more about this as well
during this process.
Probably if you're spending
more than a few hours
stuck before at least
having a conversation with a
coworker, or, you know, a rubber
duck or something,
that's probably too long.
I don't have any clear
guidelines; I think
it's definitely a judgment
call based on your software
and based on your team
and your product team and
how important this bug is.
Maybe we thought it was
an important bug and we
thought we could fix it
in two hours, but it's not
that serious if it's gonna
take us a week.
It's really a judgment call, yeah.
Yeah, so the question was
do I estimate in advance
how long it will take me
to solve the bug, because often
as engineers we underestimate
how long it will take us
to fix things; we think we can fix
it in an hour and that's
totally unrealistic.
I don't usually
try and make concrete estimates for bugs;
I generally try and develop
sort of a working framework
with my team, with my product manager,
with my team lead
if I'm not the team lead,
or, if I am the team lead,
with engineering leadership:
a general
framework of at what point
we should start checking
in pretty regularly. So
on my current team, that's
usually, like, two days.
Like if a bug is taking us
more than two days to fix,
then I try to have
a daily conversation about
how is it going, are there other people
or teams that need help,
or that could help, you know,
is it still worth keeping
going on this bug,
but I think that's really
something that develops
with each team and so it's a
conversation to sort of have
with your co-workers about
your team's cultural
understanding of when
we should start having regular
conversations about how
long the bug is taking.
Yeah, other questions?
Okay, well I know nobody will
be sad about going to lunch
a couple of minutes early.
Thank you so much.
(applause)
