Good morning and welcome to 
DevOpsDays Chicago 2020!
...
Sasha: Hi, everyone.
I am wishing so much that you 
all could be here with us in 
person. But welcome to the first
ever virtual DevOpsDays Chicago.
I'm your host, Sasha Rosenbaum.
Matt: I'm your host Matt 
Stratton.
Margaret: And I'm your host 
Margaret Valtierra.
Matt: DevOpsDays Chicago is a 
chance for our local
DevOps community to come 
together once a year to learn 
from each other.
And while this year we have a 
virtual presence that doesn't 
require you to
actually be in Chicago, our 
focus is still on the Chicago 
DevOps community.
But we look at it this way:
This year we get to invite
our DevOps friends from all over
the world to join us.
Our hope today is that with a 
combination
of great talks, live chats, and 
interactive
breakout sessions, participants 
and viewers will be able to 
expand
their knowledge and consider 
different areas of the DevOps 
spectrum.
Covering topics such as remote 
culture, chaos engineering,
inclusive leadership and
resilience engineering.
Sasha: We are the people you see
on the screen today.
But this conference is brought 
to you by 14 awesome organizers 
this year.
Let us make some introductions in
no particular
order.
Matt: Jerry Cattell. Brian 
Devlin.
Kevin Reedy, Chris Reed.
Margaret: Albert Cheung, 
Christian Herro, Hari Vedam,
Trevor Hess.
Sasha: Josh Goldman, Brian
Andersen, and, of course, the
notorious DevOps Yak.
Matt: Inclusivity and respect 
are core values of DevOps and 
DevOpsDays.
Accessibility is important
to our events and we want to
ensure that all community
members can experience our
virtual event.
All of the talks and fireside 
chats are gonna be live 
captioned today.
These captions can be found in 
YouTube in the live stream.
Plus you could view them by
visiting
DevOpsDayschi.org/captioning.
Margaret: Conversation and 
collaboration aren't limited to 
in-person events. Social media 
is extra valuable this year. We 
want participants to join in on 
Twitter.
Remember to use the
hashtag DevOpsDays, and tag 
DevOpsDayschi.
For fun, also check out the 
DevOpsDays
DevOps Yak.
Sasha: Let's talk about the history
just a tiny little bit.
The first ever DevOpsDays was in
Ghent in 2009
and there are dozens of events
happening globally today.
DevOpsDays is an integral part of
the entire DevOps movement.
We would not have DevOps as we 
know it without the first event 
in Ghent.
And the global impact of these 
events which are first
and foremost about the community
of
practitioners, not tools
and vendors,
has greatly influenced the
spirit of the movement.
Last year, we had an organizers 
event
in Ghent to celebrate a decade 
of DevOps.
There were more organizers than 
there were participants at the 
first ever DevOpsDays event.
This year we have a talk from 
Rafael Gomes, one of the
core DevOpsDays organizers and 
the main organizers
for DevOpsDays Brazil.
This talk will give a glimpse into
growing a technical community,
overcoming major obstacles and 
changing people's lives for the 
better.
Matt: The very first DevOpsDays 
Chicago took place in 2014.
We had about 250 people 
attending. It took place in the
Sears Tower, yes, the Sears
Tower.
And it was the first time that 
this group of people worked 
together on an event, or pretty 
much almost anything.
What started as a bunch of 
strangers in a conference room 
in the office of
a small cloud consulting company
turned into a key annual event 
for the Chicago DevOps 
community.
And over the years, DevOpsDays
Chicago has evolved and changed.
We have been at various venues
all around the city. We added in
a live mascot. We started having
workshops.
And we're always improving, but 
still focusing on that 
participant experience.
Over the years, we have heard 
from countless
people whose careers have been 
improved from this event.
Now, I have a lot of great 
memories from the very first 
DevOpsDays Chicago, but this is one
that always makes me smile.
One of my former coworkers came 
up to me at the event and said,
you know, this DevOps stuff
seems great, but clearly this is
really for large enterprises.
And shortly thereafter, another 
friend of mine and
I were talking, this is a great 
event, Matt, but all this DevOps
stuff, this is for small 
startups, right?
And as we know, they're both 
right. After all, DevOps is
for everyone. We are now in the 
seventh iteration of DevOpsDays 
Chicago.
And with over 1200 registrants 
this year, to say
that we've expanded since that 
first year might be a little bit
of an understatement.
Since the very beginning, we 
have always considered 
DevOpsDays Chicago to be a 
participant conference.
If we were all together in 
person, this is the time I would
tell you to
look at your badge and see that 
it says "Participant." Y'all got
your badges, right?
We sent them to all YouTube 
users in the world? Maybe you 
didn't get yours.
Take a look at your pretend 
badge and see that it says 
participant.
When we were making the decision
to make DevOpsDays Chicago this 
year be a
virtual event, the most
important thing was to embody the
spirit of the community
all participating together.
Margaret: Typically DevOpsDays 
Chicago is an in-person event.
Since we made it virtual for 
2020, we decided to limit it to 
just one day.
Just like DevOps, it's not about
the tools, DevOpsDays is not 
just about the content.
The most important part of the 
conference is that it brings our
technical community together.
DevOpsDays Chicago
2020 is much more than a set of
talks about DevOps.
We have fireside chats with
the speakers and virtual
breakouts in Discord.
After each talk is a fireside
chat with the speaker, streamed
live.
Also after each talk, at the
same time, we have
a virtual breakout in Discord.
Today's program is online at our
website,
DevOpsDays.org/chicago.
Sasha: We have amazing speakers 
for you today.
And just to make things a little
more fun, we
made personal stickers for our 
speakers this year which is what
you see as the avatars here.
We will hear talks on a wide 
range of topics as we aim to 
bring together a wider technical
audience.
None of these talks will be a 
deep dive into configuring 
Kubernetes.
However, we will talk about the 
stories of
human experience, learning, and 
growth in IT.
We are tremendously grateful to 
our speakers
for investing the time and 
effort it takes to create talks.
And we're looking forward to
diving even deeper into the
topics during the fireside chats.
Please refer to the program on 
our website at
DevOpsDays.org/chicago/program to see the
schedule details at any time.
And big thanks to Kelly Mahoney 
for the great artwork.
Find her on Twitter @sosplush.
Margaret: This year we have 
Ashton Rodenhiser from Mind's 
Eye Creative again to capture a
visual representation of the
talks.
She is drawing graphic recordings
for each talk and sharing
them in Discord throughout the day.
Find her on Twitter also at
@mindseyeccf.
Matt: As mentioned before, one 
of the key things that makes
DevOpsDays Chicago special is 
we're focused on the community 
interacting.
In our traditional event, it's 
Open Spaces.
For this virtual event, we are 
using the Discord platform.
You can chat with other 
participants all day as well as 
discuss a talk with a speaker.
Plus we have specific breakout 
sessions happening all
day long based upon topics that 
you, the participants, suggest.
Use the breakout channel in 
Discord to suggest a breakout 
topic or vote for your favorite.
These breakouts will be 
available both as text-only
chats and as audio-visual
breakouts, as represented
by Margaret here.
And each of these video breakout
channels will have a volunteer 
moderator in it.
We have a great group of
moderators joining us today with
decades of experience in
events and discussions.
The moderators are there to make
sure everything goes smoothly. 
Thanks, mods!
If you haven't yet gotten 
access, visit
event.devOpsDayschi.org
to register for instant
access to our
Discord system.
Sasha: Code of conduct matters 
and this doesn't change for a 
virtual event.
We will expel any COC violators
from the conference.
To quote Effective DevOps,
people don't always understand 
the impact of their words and 
actions.
Individuals need to understand 
the
direct link between their
behaviors and the impact to 
others.
And please remember, impact is 
more important than intentions.
Please know that I personally 
and many other folks 
participating
in DevOpsDays
events feel safe in this
community because we know that 
the
COC enforces a harassment-free 
environment.
Please take a moment to read and
acknowledge the code of conduct 
if you haven't already.
If you have to report a code of
conduct violation, you can send 
an email to
help@devOpsDayschi.org.
Or post a mention of @organizers in
the event-help channel in 
Discord.
You can request an organizer to 
send a direct private message to
you to get more information.
Please do not post the details 
of your report in
the event-help channel, but ask
an organizer to send you
a private message using
@organizers as a tag for your
post.
Margaret: This year, our 
sponsors took a gamble on us. 
We've never done a virtual event
before.
But our sponsors trusted we 
could pull this off.
We got to try exciting things
thanks to the backing of 
sponsors. Thank you, sponsors.
Hopefully you have been in 
Discord and seen the list of
channels, grouped by topics and
types.
Sponsor channels are company 
name-sponsor. Please say hi.
And check them out because 
there's great giveaways going on
throughout the day today.
Matt: If you need help from 
organizers at any time, there's 
a few ways to get it.
You can always email the 
organizers at 
help@devopsdayschi.org.
And additionally, the event help
channel in Discord is a great 
place to get your questions 
answered.
So, if you haven't joined us
in Discord yet, remember, you 
can do so at any time today.
Just register, for free, at
events.devOpsDayschi.org for
instant access.
Sasha: At the end of the day 
today, all registered attendees 
will get
an email with a link to take a 
participant survey.
Please share your thoughts with 
us, it will only take 5
minutes.
We want to improve the event for
next year, and since
this is our first time doing it
virtually, please fill out the
survey.
Sasha: And now it is my absolute
honor to introduce our first 
speaker, Lani
Phillips, a Vice President at 
Microsoft, giving a talk about 
authentic conversations.
Lani is an amazing speaker and a
remarkable leader.
I have personally found Lani to 
be an inspiring
role model back when I worked
for Microsoft Chicago.
A role model that each one of us
needs in our careers. Lani,
please take it away.
Lani:  Welcome to DevOpsDays 
Chicago!
You have an amazing agenda 
planned on all things
DevOps with sessions like 
computational thinking and 
cutting cloud costs in a 
COVID-19 world.
Which we all know is top of mind
for everyone right now.
I am so very humbled that so 
many
of you see the importance in 
starting
the day engaging in a topic that
permeates
all sectors in all industries.
For a community who is so deeply
focused on your work
developing code and building the
next big thing.
We have to make space for people
to be seen and heard as well.
That's why today I invite you to
learn about authentic 
conversations.
My name is Lani Phillips and I 
am the Vice President
of US Channel Sales at 
Microsoft.
I want you to think about the
last time you found yourself in 
a situation
where you could share your 
personal perspectives or 
experience, but weren't sure if
you could be vulnerable.
If the other person or persons 
on the receiving
end could identify or empathize 
with your experience.
How can we make others feel more
comfortable engaging
in authentic conversations with 
us?
A couple of months ago, like 
many others, I hit the wall.
I was deeply saddened by all the
events in succession that 
happened.
COVID-19, Ahmaud Arbery, Breonna
Taylor, Christian Cooper, and 
George Floyd to name a few. I 
decided enough was enough.
And I had a responsibility to do
something right where I was 
planted.
It all started with a strong 
desire to
create space for my organization
to
share how they were feeling 
without judgment.
Allowing them to just take off 
the mask and just feel. Knowing 
they were not alone.
The audience quickly grew.
And what followed surprised us 
all.
What emerged was this level of
authenticity that allowed people
to have a glimpse
into their world through 
personal stories.
We know that empathy is 
developed through proximity.
This forum was just one small
way
to allow our organization to sit
at
the virtual table to share 
authentically, listen, and 
learn. I opened the discussion 
with my intention.
To create the space for
an authentic conversation about
how people were really feeling.
And why this is so important is
because according to the
Harvard Business Review, most 
leaders
respond with no response and 
hold the conviction that race, 
like politics, is best  
discussed elsewhere.
And although 78% of Black 
professionals
say they have experienced
discrimination or fear a loved 
one will,
38% feel it would never be
acceptable
at their companies to speak out 
about their experience of bias.
It was important to me that I
recognized others who have 
experienced injustice.
Our Hispanic, Latino
communities, LGBTQ+ communities, Native
Americans, people with any type
of difference.
I mustered up the courage to 
have this dialogue, knowing it 
had never been done before.
I decided to unapologetically
be me and try to bridge the 
divide that continues to widen
because everyone is so afraid to
say what needs to be said.
Living in fear of the 
unspeakable happening.
It was time for us to just have 
a real conversation about
race, difference, and the 
experience it creates.
To demonstrate our desire to 
have an authentic
conversation, I read this 
article written by Danielle
Cadet followed by a personal
story to demonstrate my 
willingness to be vulnerable.
When we opened it up, the 
silence was deafening. But I 
refused to speak.
Because I knew if I held on long
enough, someone would speak up. 
Then it happened. The first 
person spoke. And a beautiful 
conversation emerged.
Out of respect for everyone in 
those discussions, I will not 
dive into the details.
But I will say, that
many were able to have an 
authentic
conversation that opened their 
eyes to the
hardcore reality that for some 
changed their lives forever.
If we want to drive real change,
we must have that uncomfortable 
conversation.
There's a quote that says, "The 
key
to inclusion is understanding
who your employees really are."
Through these discussions you
develop shared understanding and
begin to move forward.
Now, let's talk about how do you
have authentic conversations? 
Number one.
Just show up and listen with 
empathy.
The first step is just be
willing to show up without 
distractions and be fully 
present. Just listen with 
empathy. You don't have to say 
anything. Just demonstrate you 
care.
Brené Brown says that
empathy communicates an
incredible healing message of,
"You're not alone."
Number two, be vulnerable.
Be willing to admit you are 
uncomfortable.
Recognize you do not have the 
same lived experience.
And acknowledge this might be 
new for you.
Emphasize you want to support 
them and learn.
Number three, active listening.
Listen to understand, not to
respond. Sit in the 
uncomfortable silence.
Just deeply listen and hear what
they are trying to say to you.
It may not be very succinct, it
actually might feel very clumsy,
but give them time to peel back 
those layers and share.
Your job is just to listen and 
seek clarity when needed.
Number four, validate their 
feelings. Do not judge their 
feelings or feel the need to 
comment.
They have a right to feel those 
feelings.
Just thank them for sharing and 
acknowledge it took courage for 
them to speak. You're not there 
to solve all the problems. Just 
to listen.
Continue to listen until they 
are done.
Number five, ask where you can 
help. Allow them to participate 
in the solution.
You don't have to come up with 
it alone. And number six, follow
through.
When they tell you what they 
need, put some action behind 
your words.
If they ask for help, then just 
follow through.
In closing, the key
learnings were self-awareness, a
desire
to learn more and be a better 
ally,
subsequent conference
conversations
that peel back more, and the 
sheer curiosity
among many to get to know their 
peers at a much deeper level.
Be willing to get uncomfortable 
and drive authentic 
conversations.
Remember, proximity breeds 
empathy.
Let us challenge ourselves to 
have deeper, more meaningful
conversations.
Only then will you see the 
gateway to driving real change. 
We all have to be the 
difference.
Now, how could you create the 
space
to engage in an authentic 
conversation? Thank you.
And I hope you enjoy your 
conference.
Sasha: Hi, welcome back.
So, you remember that community 
interaction is a key part of 
this event.
So, before we go into our first 
ever fireside chat with
Lani today, we're gonna play a
short video describing how you 
can join Discord and get the 
most out of this experience.
So, you can -- remember that you
can
join Discord at any time at 
event.devOpsDayschi.org.
And here's a short video 
explaining how it works.
≫ So, the first thing that you 
need to do to
take part in the DevOpsDays 
Chicago
virtual participant experience, 
is you're gonna need an account 
on Discord.
If you already have a Discord 
account, you can skip this step.
So, just hang on and bear with 
us and we'll get to the
part about how you join our 
Discord server in just a minute.
The first thing that you need
to do when you register
-- and you can do this at
Discord.com/register -- is fill out, you
know, all the usual fun stuff.
So, we go ahead and do this.
And one of the things that's 
gonna happen, is it takes you in
and logs you in. We can go ahead
and close this for a second.
You're gonna need to verify your
email address or you won't be 
able to have any access to our 
servers.
Once you have your email 
verified, you need
to join our Discord server.
The actual invite code that
you're gonna
need you will have received in 
your email or
if you log into Eventbrite and
check out the online event 
details, you'll find it there.
The code that you're going to 
see me using here is not our 
real code.
You need to register in order to
get the code. It's cool.
And we're gonna say we want to 
join a server.
Put in our invite code, and
it brings us in.
Now, what we need before we can 
see everything
else in the DevOpsDays Chicago 
Discord server, is we have to 
accept the code of conduct.
So, we can see here the quick 
version of our code of conduct, 
there is a link that will take 
us to the long one.
And in order to agree to the 
code of conduct, you just simply
have to click on this thumbs up 
button
here which will grant you access
to the DevOpsDays Discord 
servers. We'll go ahead and 
click that.
And now you can see, we suddenly
have access to all of our 
wonderful channels.
A couple important things to 
know.
So, any text channel has the 
little hashtag in front of it.
So, you can always refer back to
our code of conduct in the code
of conduct channel.
If you need direct help from an 
organizer,
especially for something related
to code of conduct, you
simply go to the event help 
channel and you can post a 
message in
here and you can tag organizers 
and say, hey, I need some help!
And what will happen is that 
will notify the organizers.
And one of them will reach out to
you via a private message to 
help you out some more.
But throughout the day, we're 
gonna have all these great 
breakout sessions.
If you want to propose one, you 
do that in the breakout-proposal
channel.
So, if you go in there, you can 
see an example, someone said,
I want to talk about using
AWS for my personal blog. I
might have a different idea.
Now, if you see a topic you 
would like to talk about,
you can always just react to
it with a little thumbs up kind 
of thing. This shows there are 
people interested in it.
In the breakout channels,
you will see all the breakouts
currently live for any existing
breakout.
For example, the
DevOps-for-yaks-breakout, if I
want to talk to people in there.
But what's interesting is you 
will see some breakout channels 
with this icon.
These are voice and video 
channels where they're a
little bit more interactive.
They are limited to 25 people
at a time and will have a 
moderator in them. But when I 
want to join them, I click on 
that. And I might have to give 
it permission to use my
microphone.
When I actually click on
that channel and I take a look,
we'll take a look and see if I
can connect my video.
And show up and it looks all 
fancy. Hello.
And you'll see other people in 
here and this is where we can go
ahead and have our conversation 
during that breakout.
When we're done, it will just 
disconnect and we can take it 
away.
Similarly, we have channels for 
the sponsors.
And our gold sponsors have
channels where they might have 
demos and other interactive 
things.
We recommend you visit our 
sponsors, we can't have the 
event without them.
And then also, you will see that
for every talk throughout the 
day, there's a dedicated text
channel.
If you have any questions for 
the speaker, please post them in
the channel.
You will see the organizers show
up in the organizer list and
they will be in blue.
These are people who can help 
you in all kinds of different 
ways.
If you have any kind of
problems, you need to report an 
issue with code of conduct.
We still recommend that you tag 
@organizers because a certain 
organizer might not be available
right then.
And likewise, our great 
moderators all show up in 
purple.
And these moderators will be 
joining in the video breakout 
sessions
just to help keep the 
conversation going along and try
to
make sure that we don't have any
problems or challenges. They're 
definitely here to help.
DevOpsDays Chicago is a 
participant conference.
And we want everybody, whether 
they're a moderator, a speaker,
a sponsor, or just connecting
and wanting to talk to each
other, we're all going to be 
interacting together throughout 
the day.
Sasha: Hi, and welcome to our 
first ever fireside chat at
virtual DevOpsDays Chicago 2020.
It's such a pleasure to have 
with us our first guest for 
today, Lani Phillips. Hi, Lani.
Lani:  Hi,
thank you, guys, for having me.
Sasha: It was so important to
have your voice heard at this 
conference today.
I'm so glad you could join us. 
I'm going to go straight into 
questions that came in.
2020 has been a challenging year
in so many ways for so many 
people and
brought so many societal 
tensions up to the surface.
And sometimes I feel like 
there's traumatic events 
unfolding every single day.
So, what can
leaders do to take care of their
teams in this continuously 
challenging situation?
Lani:  An important question to 
ask considering the times that 
we live in.
One of the things that I heard 
recently
that I really could identify 
with, if you think about the 
series of
things that have happened in 
succession in 2020, people are 
saying it's like post-traumatic 
stress.
And so, I had a moment where I 
just paused to think on that. 
And I said, you know? I guess 
for a lot of people it could be.
What I would tell leaders to do 
today, a couple
of things, one, before you jump
into any conversations, whether 
a one-on-one
or a group discussion, build in 
that time up front to do a 
check-in.
Check in with people, understand
how they're doing and allow them
to just vent a little bit.
And I think it will be a really 
healthy discussion for you to 
create that space.
Some of the other things that 
I've
seen people do is abandon 
agendas for meetings to
just create that space for 
people to talk,
share best practices, and to 
just create a safe space for 
those conversations.
The other thing that I would 
encourage you to do is
identify where the benefits are 
for your employees
. Because a lot of people right 
now are dealing with a lot of 
stress
. Just remind them that they 
have benefits that they can take
advantage of.
Whether that be if they need 
time off or if they want to
take any special cares program 
that you have in place.
To just remind them of what's 
available to them and encourage 
them to use it if they ever need
it.
And the third thing I would just
recommend that leaders do is 
meet people where they're at.
And just understand what
they're going through and 
recognize that everyone has a 
different experience.
So, I think those are probably 
the three tips I would have.
Matt: I think one thing that 
jumped out to me, you know,
when you talk about checking in,
and something I see that's 
really important
is giving people the opportunity
to be checked in with the way 
that's right for them.
So I know, as someone who has 
had challenges with things.
Sometimes you're not -- if I'm 
going to check in with Sasha,
I want to check in. You don't 
have to reply. You don't have to
-- I'm here whatever you need.
Maybe the answer is just I'm 
cool, leave me alone, and I 
think
giving people permission to not 
have to take that on. But making
it open. And like you said, meet
them where they're at. I think
that speaks to that. That's 
great.
Lani:  Absolutely.
I do think you have to remember,
Matt, that everyone is different.
Some people are not
comfortable talking about their 
emotions. And make that okay.
Just let them know that you're 
there and that you care.
Sasha: This is interesting and 
brings me into the next 
question.
In your talk, you mentioned that
we should listen with empathy 
and make sure that we're not 
judging people's feelings. 
Feelings just are, they can't be
wrong.
But sometimes you're faced with
something
that's very off-key or some
remark that's very
non-inclusive toward some
members of the community.
As a leader, how can you take 
care of that and
handle that and make sure you 
can handle that, especially in a
group
setting when there's more people
potentially being impacted?
Lani:  So, are the
comments being made in front of 
the group or to me in private?
Sasha: Let's address both maybe.
Lani:  Right?
If I were in a situation where 
it was one-on-one, and someone 
was
in the middle of emoting and it 
was off-color,
it just didn't look -- it didn't
sound inclusive and maybe even
disrespectful in some senses, I 
really mean what I say. In that 
moment, it's my job to just 
listen.
Let them vent and let them get 
it out.
And then once they've had an 
opportunity to do that,
I would then ask for permission 
to say, I want to tell you what 
I heard. And was that your 
intention.
And sometimes people are just in
the middle of feeling and their 
feelings are clumsy. They're not
comfortable talking about what's
on their mind.
And they had
no intention on communicating 
that, that's just how it came 
out.
So, I usually like to check in, 
to validate,
tell them what I heard, right?
And is that what you wanted me
to take from that.
If yes, what would you like to 
see me do with that
and I immediately go to a space 
of just asking them, what would 
you like to see
me do.
If there's any form of 
disrespect, I would find a way 
to
address that because I want to 
make sure that I created an 
inclusive environment.
So, that's how I would handle it
one-on-one.
If I were in a group and someone
said something that was really 
off-color, I would probably
immediately
assert myself to the front of 
that conversation before the 
rest of the room reacts.
And I would probably look for a 
moment and say, hey, you know 
what? Obviously there's a lot on
your mind.
And I want to make sure that we 
can allocate
the appropriate amount of time 
for you to be able to share your
thoughts and feelings and we can
work through that.
Let us find a time for us to 
have this conversation and I'm 
committed to do that by the end 
of this day.
And then I would quickly bring 
it to a close.
And then I would say, but, 
please, hold me
accountable because we will have
this conversation a little bit 
later and then I would 
transition the conversation.
Because you want to make sure 
that no one felt disrespected in
the room.
Matt: Our AV team is letting us 
know, this is great news, that 
there's some wonderful 
discussion happening in the 
channel about your talk.
So, our participants are talking
about this and having a great 
conversation.
Lani:  Good!
Matt: Exactly what we were 
hoping for.
Sasha: Another question that I 
can personally relate to.
As a white person, I don't 
always hear about the events 
affecting the Black community.
And that can be a problem 
because if I come into work all
shiny and bright-eyed and I talk
to you and, you know, I say, oh,
how are you doing today?
And I totally don't give you 
space to kind
of be where you are and feel 
what you feel, I am
not even aware
of what's happening.
So, the question is, I guess, 
how can we
see more -- more events that 
affect the Black community 
specifically?
And make sure that we're aware 
of what's happening in the 
world?
Lani:  Well, the first thing I 
want to make sure that you 
understand is that you're not 
alone.
You have members of the Black 
community who don't know 
everything that is going on.
And part of the reason for that 
is some of us have
had to stop  watching the news 
because there was
just so much negativity and it 
was just killing our spirits.
And we were just all just having
to carry the weight of that 
throughout the day.
And so, I want you to know that 
you're not alone.
That's not just
our Caucasian colleagues, all of
us are saying enough is enough. 
We may not all be up to speed on
everything.
The best thing to do, I always 
tell people go out on the 
Internet or
watch a news channel to try to 
get a summary of
all the things that have 
happened over the last 24 hours 
if you want to stay informed.
But I don't even want to 
encourage you to stay informed about
every single news story.
What I'm trying to get across
to people is just meet people 
where they're at.
And maybe right now in this 
time, because
we know it's crazy with the 
pandemic and all
the social injustice and all the
kids are back in school and
parents are worried and trying 
to be teachers to these kids, 
everybody's got emotions all 
over the place.
Just before you jump in, just 
stop and say, how are you today?
And recognize that when you ask 
that question
today, it could be different 
from tomorrow. Or how are you 
doing right now?
Because sometimes that can 
change moment-after-moment, you
know? So, I hope that helps.
Matt: I like that.
I like the idea that checking in
is not something you should do 
just because you're aware. We 
could all use it right now.
Lani:  Yes.
Matt: It's a fairly safe bet 
that any of your coworkers could
appreciate a check-in. Probably 
so.
Sasha: On a personal note, I had
a couple of personal
events
that showed me how much that affects other
people, just a few short months 
ago.
So, another question that's very
relevant right now,
is how would you say having
these conversations differs 
between an
in-person and a remote 
conversation?
Lani:  Well, you know, usually 
in-person,
there's just something about 
flesh-to-flesh, right
there in front of each other, 
eyeball-to-eyeball.
But what I've tried to do with 
all the conversations that I've 
talked about in the talk today 
has all been virtual.
And the one thing I can stress 
to you is just the power of the 
camera.
And really looking into the 
camera with no distractions,
so that person feels like
you're right there with them and
you're looking them directly in 
the eye. That is the most 
important thing.
Nothing will ever replace being 
able to reach out
and touch someone's skin to let 
them know you're right
there with them or give them a 
hug if they
need a hug, or hand them a 
Kleenex if they are shedding 
tears.
But there is a way to do it 
remotely where it just allows us
to just connect with them.
And when they are feeling 
emotional, just sit silent with 
them.
And just let them know that 
you're there and it's okay.
But like I said, I tell 
everybody, this works in the 
interim.
But I hope we go back to a world
where we have a mixture of both.
Because I just enjoy 
face-to-face and that 
connection.
But I really am starting to 
embrace the current situation 
because it's our new normal.
And we as leaders and as people 
have to find a way to leverage 
the
technology to still connect with
people and really connect
with them at an emotional level 
where they are.
Matt: It's about intentionality,
right?
It's like -- it's not a thing 
we're doing while we're doing 15
other things.
Lani:  Right. You've got to do 
it without distractions. It is 
the hardest thing that I do 
throughout the day.
But, you know, people are always
sending me little instant 
messages or chats.
And I just have to abandon that 
when I'm on a
call and I have to intentionally
just be present in the 
conversation.
Sasha: So, another question that
we have is do you think there 
needs to be
already psychological safety on 
a team before you can even 
attempt having a conversation?
Lani:  Yes.
I think in order for you to be 
able to allow yourself
to be completely vulnerable, you
would need
to see evidence that it's okay 
to be vulnerable
without any repercussions of any
kind.  So, I would say, yeah. 
But there are ways to test  it.
I think now what I enjoy is 
people use words like, you want 
to be vulnerable. I need a  
no-judgment zone.
You kind of set the ground
rules before you open up
and then you test the waters.
I do think, even if you don't
have psychological safety on the
team, you can ask for it, say, hey,
I need to create a safe space.
I need -- I need a space where I
can feel like no judgment and I 
can be vulnerable here.
Usually when you say things like
that, it makes other
people go, oh, of course, of
course.
And it makes them something more
-- they're more aware of
it and go, wow, that actually 
was pretty good.
And you might actually be the 
one, if
there isn't psychological 
safety, you may be the catalyst 
that
creates that for the team over 
time if you practice some of 
those tips.
But I understand for those 
environments where it's not 
quite that way, you might want 
to just test the waters.
So, why don't you try it by 
setting those boundaries
up front.
Matt: I think -- to reiterate on
that, especially from someone in
a more
privileged space than most of 
the work spaces, it's
a little safer, a little less
vulnerable for me to show 
vulnerability. It's less risky. 
And that's something I can do.
Because the thing with 
psychological safety to me is 
that it's having
the trust that you will not be 
punished or made fun
of or anything like that for 
speaking up. And it's a lot 
easier for me to speak up than 
other people.
So, it's a relative -- you know,
it's literally
the least I can do, you know, to
help model that and understand
that that behavior, psychological
safety, comes from modeling.
Lani:  Yeah, and here's the best
thing
we can do when someone is being 
extremely vulnerable and
they are sharing their heart, is
it doesn't have to be the 
leader. It can be another one of
their peers.
It's just to say, I want to take
a pause for a moment and thank 
you for doing that.
That really meant a lot to me. 
And that is in a safe space.
You just say that, you're 
already shifting the culture.
Matt: That -- I want to
reiterate that, because we have 
been having
a lot of conversations, a lot of
times we sit there, I'm
not a manager, I'm not an 
executive, I'm not whatever. I'm
an engineer.
This is an absolute thing that 
you can do that is as simple as 
thanking someone.
Lani:  Just thank them.
And I guarantee you what you 
will experience when you thank 
them is
a sigh of relief inside for 
them.
And what you will find is that 
other people will just naturally
respond.
Sasha: So, this is an 
interesting thing.
And I don't know that there's 
really an answer to that.
But I do -- I've heard
complaints from all sides of the
spectrum, right?
From white males and from Black 
ladies and
from anyone being like, I don't 
feel safe voicing
my real opinion, bringing my
real self to work because I'm 
afraid of repercussions.
And I just sometimes -- I don't 
know how to alleviate
that other than personally give 
them a safe space and a 
one-on-one conversation.
But I kind of don't know how to 
address that in a team setting.
Lani:  So, are you addressing it
as a manager? Or addressing it 
as just a peer?
Sasha: As a peer is very 
interesting because not everyone
has the authority on -- to
sort of speak up and create that
safe space by just exercising 
authority.
And we all need the ability to 
help people feel safe.
Lani:  I would say to you that 
everyone,
regardless of their title,
has the ability to create that 
space.
And you have the ability to 
empower anyone
that you speak with to be able 
to show up as their authentic 
self.
And one way to get people to 
show up as their authentic self 
is for you to model that 
behavior.
And then to just express genuine
curiosity
about who they are, what they're
about, what they enjoy. You 
know? Their creativity.
If it is something they're
wearing, if you're curious about
something, just ask.
And say, I really love this 
about you and I would love to 
experience more of it here at 
work. Share more of that with 
me.
And I think the more you 
encourage people to share, the 
more they will start to show up 
as themselves.
And then I have learned as a 
leader, it's important to be
able to recognize one of the 
things I loved to say is, hey.
When I say it's no judgment 
zone, I expect all of you to 
show up as your authentic self. 
We all have strengths.
We all have things we are 
working on.
But I want to take advantage of 
the strengths you have and 
figure
out how together we can take our
collective strengths to achieve 
a goal.
But I could do that even if I 
wasn't a leader.
I'm just doing that with one of 
my teammates and we're working 
on a project.
I want you to know we can all 
empower one another in this area
and begin to model the way.
Matt: Yeah.
That's -- people who know me are
like, I can't believe that Matt 
has nothing to say. But I have
nothing to say.
So, that just -- letting that 
sink in.
Sasha: And, you know, it's about
like also this year being 
traumatic.
We actually had a comment on 
Slack the
other day where someone was 
like, Sasha has no opinion. Like
I really didn't.
I don't feel sociable, I just 
can't deal with this right now. 
Take it away. You know? This 
totally happens.
I think one of the things we 
kind of didn't
discuss today is the LGBTQ 
community and being inclusive 
towards them.
And that is also something 
that's a very rough topic
and lots of different companies 
handling it differently.
So, I wonder, again,
what can leaders and teammates 
do to be more inclusive towards 
the LGBTQ community?
Lani:  Oh, my god, that's one 
community that I'm absolutely 
fascinated with. It's so much 
diversity within the community.
And I have so many friends and 
family members that are in that 
community
that I think the biggest thing 
we can do there is just to watch
our language.
Like we automatically assume 
when people are married that 
it's opposite sex.
So, create the space that maybe 
reference it as a partner.
I would also ask, because you 
have a world
today where people are not 
identified to any one specific 
gender.
And it's not the way we've 
looked at it in
years past, give them the space 
where you start to see people --
what pronoun you associate with.
I think that is so important. 
To, again, meeting people where 
they're at. Expressing genuine 
curiosity.
And not subjecting people to all
of the social norms we may have 
grown up with.
And be open to new ideas in the 
way that they express 
themselves.
And I think the biggest thing 
you can do is just invest in 
your own education to learn.
I mean, some of the things I did
is I tried to do some mentoring 
within the community. I do a lot
of reading in the community. I 
love to watch some of the shows.
One of my favorite is
-- I can't wait until the new 
season comes out.
I got deeply  embedded in their 
lives through the characters.
And it just made me
genuinely who they are and how I
can be a support system to them.
So, I think everybody can do
that.
Sasha: That is so wonderful.
We have another question that
just came in, which is,
how can an organization ensure
safety in this space? Do we 
control the dialogue? Do we put 
restrictions on the space?
How do we, as an enterprise,
handle these conversations?
And psychological --
Lani:  As an enterprise, I think
the biggest
thing we have to do is make sure
that we educate leaders on how 
to have these types of
conversations. And how to create
safe spaces.
You cannot minimize the fact 
that a lot of
leaders have not really been 
taught how to do this.
I do think for some who are 
higher on the
EQ in the balance of EQ and IQ, 
they probably find it a lot 
easier to do. But let's not 
assume that everyone knows how 
to do this. So, invest in 
education.
And I will tell you, I just 
decided,
like I said in the -- in the
clip where I do the talk,
it was just this
-- it was unrest inside of me
and I felt a responsibility to 
show up as a leader. And I 
called a town hall. And I didn't
do anything special.
I literally just showed up. And 
I listened. And I think we all 
can do that.
And then just check your biases.
And recognize that it's not our 
responsibility
to subject our opinions about 
things on people when they're in
the middle of feeling and 
emoting.
Our job is just to listen, 
provide a
supportive environment, and do 
that active listening to 
validate
the things that we've heard and 
figure out how we can help them.
But I do think they will need
leadership education.
Sasha: And I do think -- so, an 
important thing that I want to 
bring up is that these are EQ 
skills, right? You cannot read a
book and be better at this. You 
need to practice.
Because in this stress
situation, and a rough 
conversation is a stress
situation, your brain reacts 
automatically and you need to 
train yourself to react a 
certain way.
Which, again, I'm hearing just 
from your conversation, I'm 
seeing that you can handle this.
But I know so many leaders who 
just never have been exposed to 
that.
And I wish -- in my whole
career I have been
looking for that kind of 
training and it's been extremely
hard to find.
Lani:  Well, I have a few people
I can recommend.
Actually, they're probably going
to be tickled that I'm going to 
say this.
But there's actually a podcast 
called More in Common.
There are two guys who grew up
together
that are really  focusing on 
creating compassionate 
conversations. And teaching 
people how to do that in a safe 
space.
And I think you're gonna start 
to hear more and more about 
their work.
But I want you to know that 
there are a lot of people out 
there that recognize
the importance of being able to 
create these safe  spaces, how
to do it in a
psychologically-safe 
environment, and how
to have those difficult 
conversations without making 
everyone uncomfortable.
So, I think you're gonna find 
that there's
more and more people that are 
going to come out to solve this 
problem.
Sasha: And this brings me into 
our last question because we 
only have half a minute left.
Which are the best resources, 
maybe books, maybe resources
you can recommend maybe in 
addition to the podcast.
Matt: Despite that Sasha said 
you can't read a book to learn 
this.
But by the way, Lani, what books
would you recommend to learn 
this?
Lani:  I'm going to tell you 
right now, and actually here in
my office, I'm a big fan of 
Brene Brown. I'm reading Dare to
Lead.
Actually my entire leadership 
team and the MSUS leadership 
team is  reading this.
And I think what the book does 
is really good in teaching
you to have difficult 
conversations in terms of how to
think about them.
And really talks about being in 
the space of vulnerability and 
the power of it.
So, I think if I were going to 
give you one book
to read, anything in her work, 
you will find that's her sweet 
spot and I think she'll give you
a lot of great coaching there.
So, I'm probably sticking to
Brene Brown just for right now.
And then let's figure out a way 
to get you
tied to an organization. More in
Common
is an organization where you can
actually work with these 
companies and allow them
to help facilitate and do the 
proper coaching with your 
leadership
team and your organization to 
help you do it better.
Sasha: That's amazing.
And Lani actually has her own 
book coming out shortly which I 
wanted to mention here.
Lani:  Yes! I'm finally gonna do
it. 2021, still a little bit of 
a secret. But, yes, I'm doing my
first leadership book. I'm so 
excited. And it's really gonna 
be speaking to inclusion. And 
how to do it.
How to drive inclusion in the 
work place.
Sasha: All right. Thank you so 
much for being with us today. 
We're at time. So, thank you so 
much. It was so important to 
have you. And I'll see you soon.
Lani:  I'll see both of you soon.
Thank you both.
Matt: Don't forget, everyone 
watching, we're back at 10:05 
Central time.
But keep watching the stream for
a word from the
lovely, lovely sponsors. And 
surprises.
The party doesn't stop just 
because Sasha and I stop 
talking.
Up next:  
Building a DevOpsDays Culture in
a  Remote World
Emily Freeman
Matt: Wow, we are off to a 
fantastic start.
We've had a great talk, and I 
understand that we're having 
great discussions in Discord.
I can't tell because Trevor took
my phone away from me and told 
me I'm not allowed to look at 
that right now. So, from what I 
hear, everything sounds 
fantastic.
But speaking of fantastic, our 
next speaker is Emily Freeman.
Emily is the author of the 
best-selling book, DevOps for
Dummies.
And I hear she has one heck of a
biscuit recipe.
Emily: Somewhere along the last 
10 years, DevOps stopped
being a philosophy of how to 
develop software.
And started to be a sales tactic.
That shift, I think, has caused 
many
developers and operations
engineers to lose trust a little bit
that DevOps can actually make 
their jobs better.
Developing software is hard. It 
just is. We solve complex 
problems. Often with complex 
solutions. Every single day.
That complexity is what makes 
communicating
with your colleagues such a 
challenge.
Code isn't the hard part of 
software development. Instead, 
it's compiling. Literally.
The work of everyone on your 
team into a cohesive,
polished product that brings
real value to your customers.
I think everyone in this session
has stood through an hour-long
stand-up in a room with a
tall table and no chairs.
Argued for 15 minutes on whether
a feature was an 8 or a 13. 
Heck, you might have even done 
it this morning.
The truth is, any production 
database older than two
weeks is basically held together
by duct tape and positive 
thinking. It's true.
There are plenty of 
anti-patterns in our work, and
you've probably experienced most
of them. I'm Emily  Freeman.
I'm the author of DevOps for 
Dummies and a principal cloud
advocate at Microsoft.
I am  obsessed with
helping technology organizations
transform
themselves by creating company 
cultures in which diverse,
collaborative teams thrive.
Knight Capital is by far my 
favorite DevOps story.
Because you couldn't write a 
better story to show
why culture, not tooling, is the
foundation of DevOps.
In 2012, Knight Capital received
notice that the New York Stock 
Exchange was opening its dark 
pool called
the Retail Liquidity Program,
RLP, after getting the go ahead 
from the SEC.
In simple terms, a dark pool is 
sort
of a private market that  
operates outside of public 
scrutiny.
Knight Capital needed to 
integrate with the
new dark pool because they were 
what's referred to as a market 
maker.
They made money on the delta 
between buy and sell prices on 
stocks.
If trades took place in the dark
pool
and Knight Capital wasn't 
involved, it would lose money.
And risk their foothold in the 
market. That's not good.
Now, at the time,
Knight managed an average daily
US equity volume of 3.3 billion
trades worth roughly
$21 billion.
That's insane.
Knight had just over 30 days
between the
RLP approval and its go live 
date of August 1st, 2012.
With an incredibly short 
deadline, the engineering team 
got to work.
You can imagine the stress that 
development team experienced. I 
get a little sweaty just 
thinking about it. Whoo! That's 
fast.
But here's the kicker.
To integrate their system with 
the
RLP, Knight Capital would have 
to make changes to the trade
execution system.
Including the high-frequency
order router referred to as SMARS.
Short for smart market access
routing system. Naming is hard.
SMARS could execute thousands
of
orders per second and was 
designed to compare prices 
across all markets within a 
fraction of a second.
Now, as you might imagine, 
SMARS was a relatively complex
algorithm.
And SMARS was old, at least in
terms of software.
It contained unused legacy
code, sometimes referred to as 
dead code.
Part of this dead code contained
an
algorithm called Power Peg, which
was sunsetted in 2003,
nearly a decade earlier.
That's bad.
Power Peg was a test program
that bought high and sold low.
Now, if you're thinking, that 
kind of breaks the
basic fundamental rule of stock 
trading, you are correct!
Power Peg was never intended to
be used in a production 
environment.
Instead, it was used to sort of 
verify the behavior of other 
Knight trading systems.
Like a test or a staging 
environment.
The engineering team coded 
feverishly for nearly a month.
Planning to deploy the new 
system using a
feature flag a week before the 
deadline of August 1st.
Now, on the go live date, they 
would simply flip
a switch on the feature flag and
Knight would be trading in the 
RLP.
During that last week of  July, 
an
operations engineer manually 
deployed the new
RLP code in SMARS to eight
servers.
There was no peer review to 
ensure everything went well.
Nor was there an automated
system to provide an alert if 
the production code or behavior 
was inconsistent. At 9:30 a.m.
Eastern Standard Time on August 
1st,
engineers flipped the switch on 
the feature flag
and SMARS began to route orders 
through the RLP.
But everyone could see 
immediately that something was 
off.
Internal charts showed strange 
spikes in activity.
And 4 minutes later, the New
York Stock Exchange called.
Knight Capital was executing so
many
orders that the entire market's 
trading volume had doubled.
And perhaps more  alarmingly,
Knight appeared to be buying 
high and selling low.
Knight was losing thousands of 
dollars per second. Per second!
No outage is cheap, but that is 
a nightmare, folks.
Everyone jumped in to try and 
find the bug,
but no one could quite pinpoint 
the issue.
Orders were originating from the
new router code, but that's all 
they could see. The code looked 
right, tests passed. Everything 
looked like it should be 
working.
After 20 minutes and no 
solution, the CIO made the
decision to roll back the 
deployment. It wasn't ideal. 
There would be an explanation 
needed. But it would stop the 
bleed.
Only the rollback failed.
As soon as they released the 
last stable
version of the SMARS software,
trades spiked again.
When the developers shut down 
the entire system completely,
28 minutes after the market
opened, Knight
Capital had lost  $460 million.
That is over $16 million per 
minute. Knight Capital was 
finished.
They were eventually acquired 
for pennies on the dollar of 
their original value.
So, what happened?
Well, it turns out, the issue 
was that when
developers updated that legacy 
code, they repurposed
the feature flag that had
been used to disable that same 
code.
Unfortunately, the Ops engineer 
who deployed the
new build to the servers, he 
made an error during the manual 
deployment. Only 7 machines were
updated. Leaving one running the
old code.
Because the feature flag was 
reused, traffic
hitting that one machine called
the Power Peg program, that 
algorithm that bought high and 
sold low.
The rollback didn't have the 
intended effect because no one 
thought to turn off the feature 
flag.
In fact, the rollback 
intensified
the problem because when they 
reverted to that last
stable build, all eight servers 
were
running the legacy code instead 
of just that one.
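The failure mode described here can be sketched in a few lines. This is only an illustration under stated assumptions: the build labels, function names, and routing strings are hypothetical and are not Knight's actual code. It shows how one flag that means different things in two builds sends a partially deployed fleet down two different code paths, and why rolling back the build with the flag left on makes things worse.

```python
# Hypothetical illustration only: build labels and paths are assumptions.
OLD_BUILD = "legacy"   # in this build the flag still triggers the dead Power Peg path
NEW_BUILD = "rlp"      # in this build the same flag was repurposed for RLP routing


def handle_order(build: str, flag_on: bool) -> str:
    """Return which code path a server takes for one order."""
    if not flag_on:
        return "default-routing"
    # Same flag, different meaning depending on which build is running.
    return "rlp-routing" if build == NEW_BUILD else "power-peg"


def paths_taken(builds: list[str], flag_on: bool) -> dict[str, int]:
    """Count how many servers end up on each code path."""
    counts: dict[str, int] = {}
    for build in builds:
        path = handle_order(build, flag_on)
        counts[path] = counts.get(path, 0) + 1
    return counts


# Partial manual deploy: seven servers updated, one missed.
after_deploy = [NEW_BUILD] * 7 + [OLD_BUILD]
# Rolling back the build while the flag stays on: all eight run legacy code.
after_rollback = [OLD_BUILD] * 8

print(paths_taken(after_deploy, flag_on=True))
print(paths_taken(after_rollback, flag_on=True))
print(paths_taken(after_rollback, flag_on=False))
```

In this toy model, turning the flag off is the only change that takes every server off the Power Peg path, which matches the point that no one thought to turn off the feature flag.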
Now, these engineers, they 
weren't stupid people. They 
weren't using bad tooling.
This was one of the leading 
financial institutions of the
time and Knight hired some of 
the brightest people in the 
industry.
As with any service outage or 
degradation,
there were a number of 
contributing factors which I'll 
discuss in a moment.
Hiring great engineers is just 
one piece of the DevOps puzzle.
Allowing these engineers to 
thrive and to be the
absolute best performers they 
possibly could be,
that requires thoughtful
planning. And work.
DevOps is a philosophy,
a methodology, that prioritizes 
people over process and process 
over tooling.
When I ask people what's going 
wrong in
their DevOps practice, they 
almost never tell me that it's a
tooling issue.
And even when it is, it's 
usually about how their team is 
using that particular tool.
Rather than a problem with the 
product itself.
If you put tools first, all
you've accomplished is
automating and obfuscating the 
bad habits that already exist in
your organization.
Knight Capital didn't fail 
because they chose the wrong 
CI/CD tool.
They careened toward disaster 
because of a series of
relatively small poor decisions 
that compounded into a
nightmare scenario. I bought a 
house last year.
And the yard is a mess of weeds 
and
vines and, well, mostly
dandelions.
I have been picking them out of 
my yard in a slow, endless 
battle that I'm probably losing.
This little, tiny flower, it was
so magical
to me as a child, but has become
quite the nuisance in my 
adulthood.
But the dandelion I think 
embodies a lesson.
Weeds can
grow in the worst soil, clay,
rock, soil, they find a way.
But plants that give us life,
they need a hardy, rich
environment.
Farmers must care for the land,
balance the nutrients and acidity,
and rotate crops. It's hard
work. But the effort bears 
fruit.
DevOps is centered around a few 
core  principles.
You can think of these as the 
work you must do before you ever
think of planting the seeds of 
tooling.
It is in embodying and executing
on these principles,
when your actions reflect the
ideas, that you begin to truly
transform your organization. 
DevOps starts with leadership.
And I want to be clear that you 
don't have to be a manager to be
a leader.
Empower your team. Share 
responsibility.
Create cross-functional
collaboration across engineering
teams and disciplines.
Your team should feel free to
make independent
decisions based on their 
expertise. Collaboration starts 
with trust.
Your day-to-day interactions 
should be opportunities for 
people
to freely share their challenges
and pitch in to help.
Meetings and activities should 
have a human component so that 
the people on your team can 
start to build rapport.
The bottom line is that it's 
much easier to
get through product and 
architecture disagreements if 
you trust the people with whom 
you work.
Cross-functional teams in which 
members have
complementary skill-sets are
ideal for establishing a culture
where information is shared 
freely.
And it eliminates the sort of
"Not my job" attitude.
One of the best ways to build 
these is to
hire people who don't just have 
the technical skill-set
you need, but also have this 
sort of natural curiosity.
These engineers often embody a 
dogged determination to figure 
it out.
And they will grow to support 
each other over time.
Create opportunities for people 
to teach each other more about 
their disciplines.
This is especially important for
developers who need to learn
about infrastructure and 
configuration management,
as well as operations
engineers who need to better
understand the complexities and 
difficulty in
writing reliable code. Failure 
is unavoidable. It just is.
You cannot outthink or outsmart 
failure, as much as we try, it 
just doesn't work.
If you have one operations 
engineer for
every 20 developers, you're 
going to have an issue. People 
burn out. Especially when 
they're on-call.
So, Ops folks need to create 
open
access to telemetry and logs so 
developers can actually learn.
And  developers, in turn, need 
to support the
operations team by being added 
to the on-call rotation.
Practicing failure is
incredibly important.
My favorite exercise is to take
the
most senior engineer on your 
team you know, the one. They 
have been there forever. They 
know the system in and out. 
Yeah. Take her laptop away. Send
her on a long vacation. I guess 
staycation.
And see how the rest of the team
actually manages without her.
You'll learn a lot about the 
human dependencies
in your system as well as the 
technical ones. And both are 
important.
Communication is something 
undervalued in tech.
We focus so much on that sort of
raw
engineering intelligence, that 
we overlook the ability of
some to convey technical 
concepts to
others.
But I think it's communication
that separates the truly
phenomenal disruptive teams from
those who
simply maintain the status quo.
Try to make time to communicate 
over video rather than just 
email or chat.
It's a more efficient and 
nuanced form
of communication that conveys 
more than just the words. Though
be aware, there is a fatigue.
So, don't overinundate them with
video calls.
Provide your engineers training 
in those sort of
"soft skills." They're actually
quite hard.
Convey their concepts to teams 
like sales and marketing.
One of the challenges DevOps 
faces is its implementation.
It is much easier to
conceptualize the values I just 
discussed than it is to inject 
them into your work environment.
The reason implementation is 
hard is because we all work at 
different companies.
And each of those companies has 
different people and resources, 
constraints.
I really wish I could hand you a
checklist
at the end of which you
could declare DevOps victory! 
And be done.
Instead, there are a few things 
we can learn
from Knight Capital and 
implement them right now to 
strengthen your DevOps practice.
Old, unused legacy code should 
always be removed. Delete, 
delete, delete! We have source 
control for a reason.
You can always go back and find 
code if you actually need it.
But I promise you, you won't.
In that same vein, please
never measure developer efficacy
by lines of code written.
Sometimes the best thing an 
engineer can do is delete
thousands of lines of code and 
dozens of files.
Make new feature flags for each 
component or service or update.
If you're unfamiliar, a feature 
flag is sort of
-- it's a wrapper of a specific 
feature or service or product.
And a binary flip is stored in 
the database to tell
the program whether to use that code
in production or to keep
it hidden.
It is a great way to accomplish 
continuous deployment
in your system by allowing 
engineers to
constantly deploy their new 
software without actually 
releasing it to your users.
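As a minimal sketch of the feature-flag idea just described, assuming a hypothetical in-memory `FLAGS` dictionary standing in for the database row, and a made-up flag name `rlp_routing`:

```python
# Minimal feature-flag sketch. FLAGS stands in for the database value the
# talk mentions; the flag name and functions are hypothetical.
FLAGS: dict[str, bool] = {"rlp_routing": False}


def is_enabled(flag: str) -> bool:
    """Unknown flags default to off, so unreleased features stay hidden."""
    return FLAGS.get(flag, False)


def route_order(order_id: int) -> str:
    """Wrap the new feature in the flag; the old path remains the default."""
    if is_enabled("rlp_routing"):
        return f"order {order_id}: new RLP path"
    return f"order {order_id}: existing path"


print(route_order(1))          # code is deployed but not released
FLAGS["rlp_routing"] = True    # "flipping the switch" on go-live day
print(route_order(2))
```

Giving each new feature its own flag name, rather than reusing an old one, keeps the meaning of every flag the same across every build that might still be running.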
Rushed deadlines are never a 
good idea.
Sure,  Knight Capital didn't 
know for sure that the New York 
Stock Exchange would receive 
approval to open the dark pool. 
But they knew it was in the 
works.
And they could have been 
preparing for this ahead of 
time.
Yes, if, you know, the SEC had 
blocked
the dark pool, Knight would have
"wasted" engineering hours.
But I think sometimes it's worth
the risk.
Deployments and releases is 
where tooling like Azure
DevOps really starts to shine.
That poor lone engineer who
deployed the code
manually across eight machines 
made a mistake.
We make dozens of mistakes a 
day, I certainly do.
And constantly relying on humans
to behave successfully is risky 
at best.
Opt for deployments through a
CI/CD pipeline as much as 
possible.
Every manual task involved in a 
software release is an 
opportunity for error.
But make sure that those 
automated processes are  
supported by a robust test 
suite.
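One small automated check a pipeline could run after a deploy is a comparison of the build each server reports against the build that was meant to ship. The sketch below is an assumption-laden illustration: the host names, version strings, and the idea that each host can report its version are all hypothetical.

```python
from collections import Counter


def find_version_drift(reported: dict[str, str], expected: str) -> list[str]:
    """Return the hosts whose reported build differs from the expected one."""
    return sorted(host for host, version in reported.items() if version != expected)


def verify_deploy(reported: dict[str, str], expected: str) -> None:
    """Fail loudly if any host is still on the wrong build."""
    drift = find_version_drift(reported, expected)
    if drift:
        seen = dict(Counter(reported.values()))
        raise RuntimeError(f"deploy incomplete on {drift}; versions seen: {seen}")


# Hypothetical fleet of eight hosts where one was missed during the rollout.
fleet = {f"host-{i}": "v2" for i in range(1, 9)}
fleet["host-8"] = "v1"

print(find_version_drift(fleet, "v2"))
```

A pipeline step that calls `verify_deploy` before the feature flag is flipped would stop the release instead of relying on a person to notice the missed machine.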
It's clear that  Knight Capital 
did not
write tests to accompany each 
new feature and bug fix.
That said, Knight engineers had 
received alerts from their 
system that something was wrong.
Earlier in that morning of 
August 1st, the system
generated 97 emails that 
identified
an error described as "Power Peg
disabled." But the emails were
obscure.
They had no clear call to action
and were mostly ignored.
If your alerts are confusing or
unclear, or your team simply
receives too many of them,
they'll develop alert fatigue.
And essentially, people just 
stop paying attention.
Which is about the worst 
possible reaction.
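One way to picture a clearer alert is to fold repeated messages into a single alert that carries a call to action. This is a sketch only: the `Alert` fields, the `collapse` function, and the suggested action text are hypothetical and are not taken from Knight's systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    summary: str   # what is wrong
    action: str    # what the reader should do next
    count: int     # how many raw events were folded into this alert


def collapse(events: list[str], actions: dict[str, str]) -> list[Alert]:
    """Fold repeated raw events into one alert each, with a call to action."""
    counts: dict[str, int] = {}
    for event in events:
        counts[event] = counts.get(event, 0) + 1
    default_action = "Escalate to on-call: no runbook entry yet"
    return [
        Alert(summary=event, action=actions.get(event, default_action), count=n)
        for event, n in counts.items()
    ]


# 97 identical emails become one alert; the action text is an invented example.
raw_events = ["Power Peg disabled"] * 97
runbook = {"Power Peg disabled": "Pause the rollout and check the build on every server"}

for alert in collapse(raw_events, runbook):
    print(f"[{alert.count}x] {alert.summary} -> {alert.action}")
```

The design choice here is that every alert must name a next step, even if that step is only "escalate", so that nothing arrives as noise with no owner.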
Finally, practice incident 
response.
Knight Capital had no plan for
how to handle a service outage.
The CIO and his team were
making
decisions in real-time based on 
the data that they had. And it 
wasn't good enough.
You need a plan for how your 
team will handle
incidents as they
arise.
No, you can't write a runbook
for every single thing that 
comes up.
But the more assistance you 
provide engineers and the more
practice in handling small scale
incidents, the better prepared
they'll be when a truly horrific
situation strikes. DevOps isn't 
a silver bullet.
But it is a framework in which 
you can design company culture 
so that everyone can thrive.
I believe engineers who are 
proud of their work and
proud of their work place 
produce software that is
better, faster, and more 
reliable. Thank you.
Matt: Fantastic.
Just a quick reminder, all of 
our great breakouts are 
happening in Discord right now.
So, if you aren't in there, you 
are missing  out.
Make sure to register for 
instant access at
event.devOpsDayschi.org. But!
If you're here, and that's 
amazing, we are now talking
to our speaker who we just saw 
an amazing talk from, 
everything's amazing, amazing, 
amazing, amazing. Okay. Got it 
all out. So, Emily -- Emily 
Freeman.
Emily, thanks for taking some 
time with us this morning.
Emily: Yeah. I'm so happy to be 
here.
Matt: Kind of chat about that.
One thing, for, you know,
people who are very
observant will notice, you know,
that there's kind of a little 
bit
of a dissonance because Emily's 
recording looks similar, but not
quite the same.
And I might be  messing with 
you.
Margaret: In the meantime, I 
think Emily pulled off an art 
heist.
Emily: I traveled around the 
world.
Matt: Around the world, quickly.
So, speaking of like stealing 
things, I'm gonna
steal some questions from -- 
that people
are wanting to know about your 
talk and your things, right?
Emily: I love it.
Matt: Here's the thing, right?
Most of us have made changes to 
production before. And by most 
of  us, I mean, all of us.
Because I mean, if you're not 
making changes to production, 
what the
hell are you actually doing, 
right?
Emily: Yeah.
Matt: But changing production is
really  stressful.
Emily: It is.
Matt: And I believe we don't -- 
we can't prevent incidents, 
right?
We are not going to be able to 
avoid them, they're going to 
happen.
But that being said, what's a 
key takeaway for us to
think about to make sure that we
don't,
you know, bankrupt our entire
company by pushing one line of 
buggy code.
What's like the -- yeah?
Emily: Could you imagine being 
in that room? The amount of 
stress and sweat.
It would just be like -- huh -- 
the intensity of that.
Just what I said, number one, 
feature flags are awesome. Love 
our friends at LaunchDarkly. 
Don't reuse the feature flags. 
Point number one.
And then I think the other error
was this sort of manual deploy, 
right? Especially with no check.
Like there are cases  where, you
know, folks still need to 
manually  deploy. That's okay. 
But maybe have someone around.
And when you can, please move 
toward an automated system.
That's where you're looking at 
CI/CD, you're looking at a 
pipeline.
And that does a couple things 
for you, right?
It automatically runs the 
changes for you and deploys for 
you.
But it also, you know, you can 
set up your tests and that can 
be
like, you know, everything from 
functionality of
the actual code to security 
gates to make sure
that you're not doing anything, 
you know, that puts you at risk.
So, all of that runs 
automatically.
You're not relying on humans to 
be perfect all the time.
Because we know we're flawed.
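A minimal sketch of the pipeline idea described above, assuming hypothetical gate names (`unit_tests`, `security_scan`) and a stand-in `deploy` callable. It is not taken from any real CI/CD product; it only shows the shape of "run the checks automatically, deploy only if they all pass":

```python
from typing import Callable, List, Tuple

# A gate is a (name, check) pair; the check returns True when it passes.
# Gate names below are illustrative assumptions, not from the talk.
Gate = Tuple[str, Callable[[], bool]]


def run_pipeline(gates: List[Gate], deploy: Callable[[], None]) -> str:
    """Run every gate in order and deploy only if all of them pass."""
    for name, check in gates:
        if not check():
            # Stop at the first failing gate; nothing is deployed.
            return f"blocked by {name}"
    deploy()
    return "deployed"


deployed = []  # records what was shipped, standing in for a real deploy
gates = [
    ("unit_tests", lambda: True),      # functionality of the actual code
    ("security_scan", lambda: False),  # a security gate that fails here
]
result = run_pipeline(gates, lambda: deployed.append("v1"))
print(result)
```

Because the failing gate stops the run, no human has to remember to check before the deploy step.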
Margaret: If you were in that 
situation where you were 
offering advice to a future
Knight Capital, what would you 
say is a good starting point? 
Because they're all really good 
points. But they only had a 
month.
So, what do you do day one?
Emily: I don't know. Part of my 
problem is the fact that they 
only had a month. I feel like 
this was a leadership failure.
I understand like they didn't 
want to waste money by building 
a
system for something had it not 
gotten approval.
But I think you can kind of rely
on assumptions a little bit and 
make that investment. In the 
worst case scenario, you waste 
engineering hours.
But you can also think of that 
as like an exercise for those 
engineers.
Any time people are taking on a 
challenge, they get better at 
what they do. That's important. 
Stressing people out. Mistakes 
happen in time crunches, it just
happens.
So, avoiding that sort of one 
month scenario would be ideal.
But first steps, I don't know.
What do you think, Matt?
Matt: That was the thing I 
wanted to double down on.
Seems like the answer is first 
thing you do is invent
a time machine and go back and 
don't have a one month crunch. 
So, barring that, I think that's
sort of the thing too.
Because whether you sit  there, 
okay, this happened because 
there was a failure in 
leadership and all this stuff.
But all those individual 
contributors which are the ones
that got put on the line for the
buggy code, what do you do -- 
can't just be like, well. 
Leadership  sucks, right?
I think a lot of folks want to 
think about like when you are --
whether you are
empowered or not or you feel 
that you are or not, whatever it
is.
When we have that leadership 
challenge, but as an individual 
contributor. You know?
To me, I kind of think about 
like, you know, Emily brought up
a bunch of points.
And my answer to which one 
should I start with is yes?
And I don't mean all of them, 
but I mean, any of them.
Emily: Start anywhere.
Matt: There's not a wrong 
answer.
Emily: Yeah.
Matt: And that's what that comes
back to.
The hardest part about trying to
get better is avoiding that 
analysis paralysis.
I've talked to plenty of 
customers, they want to do this 
transformation. We've got to 
think about every single thing. 
What about when this happens?
Listen, you're going to spend 18
months thinking
about all the freaking ifs and 
you're going to miss some of 
them anyway.
Emily: Yes.
Matt: Might as well just get 
started and try some stuff.
Margaret: The hardest
part of running is putting the 
shoes on and getting out the 
door.
Emily: Yes.
Looking at the transformations, 
pulling a
small piece out, stay away from 
the cart service, things that 
earn you money.
But pull something small out and
run an experiment on it
. Implement CI/CD, implement a 
robust test suite, the whole 
thing with that.
And you're gonna kind of learn 
those lessons along the way
and you'll get sharper and 
faster and you can apply it to 
the sort of bigger services.
But it also, that experiment 
serves as a
really strong data point to sell
other engineers
, to sell executives, whomever 
you need to convince.
Matt: You just brought something
up. And this is gonna be ground 
breaking. So, everybody get  
ready.
Emily: Okay. Ready.
Matt: I promised when I hit 
10,000 followers on Twitter I 
would reveal the secret to 
DevOps success.
Emily, you just teed me up for 
it. Get ready.
Emily: Okay.
Matt: The secret to DevOps 
success is piloting, like Emily 
said. But the secret to that 
secret is cheat. Stack the deck 
for success.
Emily: I love it.
Matt: Emily nailed it. Don't 
pick the most important project,
product or feature.
But also don't pick something 
that nobody gives a crap about 
either. It has to have business 
value.
But here's the thing, don't 
staff that experiment
or that pilot with the people 
who think DevOps is dumb or
they're cranky or whatever. Find
the people that are along.
Because you're going to find 
rough edges and you need
the people that are, okay, that 
didn't quite work the way we 
want. Versus people that are a 
little resistant. As soon as 
they hit the bump.
We all know the co-worker that 
is, I could have told you this 
would happen.
In fact, I did tell you many 
times and you ignored me.
Emily: Yeah.
It also reminds me of growth  
mindset versus fixed mindset.
And I feel like you have to own 
that growth mindset.
Coming from Carol Dweck, growth 
mindset.
My example, I am not naturally 
talented at math. I wish I was, 
but I'm not. When I was growing 
up, I had a fixed mindset.
The minute I hit like 
trigonometry, I
was like, not for me, 
I'm just not good at 
trigonometry. 
A growth mindset is, oh, this 
is a challenge. I can learn from
this.
View it as a puzzle, a fun 
thing.
Margaret: That's a good way to 
empower people which you talk 
about empowering teams.
Emily: Yes, absolutely.
Matt: And you can do that -- oh.
I was going to say, the key is 
we talk about measurement is 
part of DevOps. Here's why it 
matters. You want to show 
success.
Because it was great, stack the 
deck and cheat. Now you have to 
get everybody else to come 
along. The way to get them to 
come along is now you're showing
results. I'm from Missouri. Show
me, right? Like get the details.
We have an interesting question 
-- maybe it's more of a -- here, 
it wouldn't be a tech 
conference without it, it's not 
really a question so much as 
it's a comment.
Emily: I love it, though. Let's 
just be honest.
Matt: Someone says, you know, I 
know the story of Knight 
Capital.
But I like to think they were a 
division of Wayne Enterprises. 
So, I don't know -- you know 
where that goes.
But that does lead into a 
question someone in the chat has
asked and
said, Emily, did you work at 
Knight Capital?
Emily: No. Good god!
I think I would have gotten a to
go -- that would have been the 
end of my -- you know what? I'm 
out.
Bye.
Margaret: So stressful.
Her hair would be gray.
Emily: It would literally 
change. A reverse Benjamin 
Button. The quick caper.
The guy in Indiana Jones, he 
chose poorly.
Matt: To avoid looking like that
as a manager, what should a 
manager do
when they realize they found 
themselves embedded in
release schedules that they 
can't
match without risking Knight 
watch? What do you do?
Emily: Obviously leadership has 
a greater burden on them to push
back on certain things.
But also sometimes engineers 
they put themselves in the code 
monkey box. Well, executed. I 
don't have ownership over this.
And sometimes if leadership 
isn't giving you that sort of
empowerment or autonomy, one, 
consider finding a new job.
But if that's not possible, the 
second option is to really kind 
of take that on yourself. You 
know?
Don't be afraid to speak up and 
say, listen.
If you want this done well and 
correctly with all this money on
the line, this is what it's 
going to take.
And we can't do it in a month.
Like, there needs to be some 
forethought and some pushback.
And also, like, not asking for 
permission all the time, right?
I think one of the best things 
you can do, especially when 
there's
a small blast radius, is just do 
the thing. And then beg 
for forgiveness if it doesn't 
work. Do the thing.
Don't ask permission to write 
tests with your code. That's 
good engineering.
Do the job that you feel is 
necessary and you're proud of 
and kind of convince the people 
around you why that's the better
approach.
Margaret: So, you're promoting 
skunkworks, right?
Emily: Yes. I think so.
Matt: So, again, kind of a 
thing came in.
When we talk about deleting 
code, deleting legacy code.
But what do you think about 
thinking about not just legacy 
code, but legacy processes? 
About like --
Emily: Ooo -- yes.
Matt: And that is almost a 
little scarier to  do, right?
The code is like that's just the
machine stuff, but the processes
, that's where the people live.
Emily: Yes.
Matt: What are some ways to 
think about when you want to 
start getting rid of these
legacy processes in a way that 
doesn't get people really 
worried about their cheese 
moving?
Emily: Yeah. This one's I think 
much harder than deleting code.
Because deleting code, and you 
could easily undo the code -- we
have source  control.
Matt: Also, you just pressed 
delete. There is literally a 
button.
There is no delete my legacy 
process.
Emily: Okay.
Matt: Mandate.
Emily: Processes, it gets 
trickier because someone is 
probably clinging to that 
process for dear life.
Like they're like, this is how 
it's always been done.
And what we have to realize and 
have empathy for, we have our 
comfort levels. You see this a 
lot on teams.
Every team I've ever been on has
like three communication tools.
Okay, I  use -- and I use -- and
I use Slack, whatever.
And so, it's like, everyone's 
gone to their corner. This is my
tool. I will not use that other 
tool.
It's like, is this really the 
hill we're gonna die on? This is
it? This is the thing?
But the root of it  is, you have
this comfort level in the thing 
that feels best for you.
And sometimes it's, you know, a 
UI that your brain kind of works
naturally with. Sometimes it's 
just habits. Like this is what 
I've always done.
But kind of having empathy for 
that and then using that sort of
data in storytelling.
My big thing is convincing 
someone to do something, 
persuasion, it's an emotional 
argument.
Even if, you know, we're 
engineers,
we're not emotional, we're 
intellectual.
Matt: Speak for yourself, I am 
not intellectual at all.
Emily: But your lizard brain, 
it's up there and responds to 
that emotion. Hope and fear and 
all of that.
So, if you can use that and 
trigger that, and support
it with data, it's a compelling 
argument.
Margaret: Brings it back to the 
quick wins that we were talking 
about. Stack the deck type of  
things. Look at what we did. It 
worked well.
Emily: Yes.
Matt: Someone in the chat did 
have a comment about my secret 
to DevOps success.
Cheat to win, and my 
response to that is do you have 
10,000 followers on Twitter? No,
I don't think so.
But more practically, and more 
useful, could you
give any examples of where 
you've seen like good
examples of teams that are 
deleting code on the reg, right?
Where that's happening?
Emily: Yeah.
I mean, like -- I'm not going to
name companies.
Matt: Yeah.
Emily: Here's a list of good 
DevOps companies and a list of 
bad -- could you imagine.
Matt: We all have that list. We 
just don't say it out loud.
Emily: 100%. I'll never share 
that. Yeah.
I think when you look at people 
that are moving fast, you know, 
while
still not crashing and burning 
every week, so, these are
people like if you're deploying 
regularly, you're probably
in this -- you're in these good 
habits already
. And so, that will kind of 
increase
your ability to kind of 
implement some of these things. 
Yeah.
I think -- I really think  
deleting code is
something that senior and 
principal engineers should be 
doing most.
You know, when you're thinking 
about the architecture, when 
you're thinking
about the long-term product 
roadmap and like what the base 
and
the sort of standards of your 
codebase should be, that's
where at that level of technical
leadership, they should be the 
ones going through and really 
reviewing things,
while teaching your 
less-experienced engineers how 
to do so.
But it's easier for the new 
folks to be writing
code to get into the rhythm, and
the people who know
the codebase a little bit better
to be -- delete, delete, delete.
Margaret: That's a good point.
Matt: Not that the point of the 
chat is to have a conversation 
with Discord without being in 
it.
Emily: I kind of like it.
Matt: Apparently the reply was 
that, prepare to win is a better
choice of words. And that could 
be, that could be.
But I would -- would say the way
that we approach
game play in my family is play 
to win, don't have fun.
So, this is also, by the way, 
why I am
an emcee and not actually 
speaking at a DevOpsDays
event at this point because 
nobody should be listening to 
me.
I want to bring it back around 
from
I'll -- not all kidding aside, 
but some kidding aside.
You talked a little bit before, 
I want to bring it back -- it 
was brought
up -- the connection between why
growth mindset is
so, like, connected to DevOps 
transformation.
Emily: Yeah.
Matt: Can we dig into that a 
little bit more?
Why is it such an apt metaphor
or modality or frame of 
thinking?
Emily: Because this is hard.
Everything that you do is really
stinking hard.
If you are  approaching 
something, you're going to hit 
speed bumps along the way.
Everyone here has, you know, 
said like, oh, the feature, 
that's a 3.
I'll be done by 2, it's fine.
And 4 comes around, oh, shit, 
you said it's a 3. Whatever. How
do I kind of make up for this?
We have all had the opposite as 
well, which is why estimating is
just hard. But I really think 
it's true. This 13 took 30 
minutes, awesome.
Yeah, I think you're going to 
run into speed bumps, some 
technical, some people.
And kind of being able to not 
view that speed
bump as, you know, this 
insurmountable challenge, you 
know?
Look at -- it is a mole hill, 
not Everest.
You can get over it or around 
it, or through it. Figure it 
out.
But having that sort of growth 
mindset really helps you 
approach those things.
Margaret: Zooming out on the 
bigger picture. Like what you 
mentioned, it was about code.
But having junior people really 
zoom in and experienced people 
can zoom out and bringing that 
perspective to each other.
Emily: Oh, my gosh, yes.
I think that's valuable.
Matt: We have time for one more 
question. It's going to be a 
biggy.
So, thinking about hybrid 
organizations.
Where you've got multiple 
companies working together.
Maybe you've got
systems  integrators -- no, I'm 
talking about like you're 
contracting to
a consulting company, in the 
government especially there's
a lot of work with different 
SIs, and you have multiple SIs 
in the same agency.
Emily: What's an SI?
Matt: System integrator. It's 
another company, another 
company. Right?
What are some of the things when
you're dealing with a hybrid 
organization like that? With 
those organizations, they have 
different cultures and 
priorities. Like how do we 
DevOps this stuff?
Emily: I think you just go in 
and yell at everyone.
You tell them exactly what your 
plan is, it's the best. 
You say I have 10,000 
followers and you black out.
Matt: This entire conference has
really gone off the rails at 
this point.
Emily: I love it. It's online 
conferences. We can't be with 
each other.
We might as well have fun the 
way we can.
Matt: Yeah.
Emily: So, I think it's tricky.
I think you can kind of apply 
growth mindset to this.
You are all about to see a tail,
that's Walter.
I think using your persuasion, 
but also being open. When you're
looking at disagreements.
Whether this is like the 
approach or like a product 
disagreement.
Kind of really going into it 
with an open mind
and thinking about this is 
something that you're curious 
about.
Being curious about the other 
people's opinions and 
perspectives and listen. And 
don't view it as I've got to win
this.
Because when you go into an 
argument like that. This is my 
corner, I will die on this hill.
This is mine.
Someone will be declared the 
victor, but you lose a ton of 
social capital.
You lose that trust with the 
people you're speaking to.
I think being open to their 
points and finding those 
compromises along the way is 
really powerful.
Matt: Emily, thank you so much 
for taking your time this 
morning.
Taking your time to answer 
questions and chat with people 
in the Discord. Since you're 
allowed to do that and I'm not 
right now. But -- but this has 
been really, really great.
Everybody, we are going to be 
back at 10:55 central time.
But stick around on this stream,
there's even more fun and 
amazing things happening.
And we will see you all in a 
little bit. Thanks, Emily.
Margaret: Thanks, Emily!
Next: Prevent Heroism: 
How to Work Today to Reduce 
Work Tomorrow
Quintessence Anx
Margaret: Quintessence Anx is a 
developer advocate
for PagerDuty and mentors folks 
with the organization
she founded, Inclusive Tech 
Buffalo. Quintessence?
Quintessence: Hello, my name is 
Quintessence, I'm a developer 
advocate at
PagerDuty and I'm here to talk 
to you about preventing
heroism or reducing work today 
to prevent work tomorrow.
To get started, I'm going to 
talk a little bit about how I 
was inspired to give this talk 
in the first place.
One day I was listening to 
the Developer Tea podcast and the
person was interviewing writer 
Dan Heath about his book, 
Upstream.
And started with a health care 
parable and it went a little 
like this.
You and a friend are having a 
picnic beside the river. 
Suddenly you hear a shout from
the water, there's a child 
drowning and
you go into the water, grab the 
child and bring them to the 
shore.
But as you're putting them down,
you hear another shout
and another child and so back 
into the water you go.
While you and your friend are 
continuing to pull children out 
of the water, you start 
struggling to keep up.
Until once you look up and see 
your friend getting out of the 
water and walking away. Where 
are you going?
I'm going upstream to tackle 
whoever's throwing these kids in 
the water. Okay.
So, upstream.
Upstream work is the proactive 
and
preventive work that resolves
the need for reactive, or 
downstream, work.
When I'm talking for the
remainder of the discussion, I'm
using these terms in that sense.
What does that look like for  
us?
We're in IT, and not health care
parables, not diving into water 
for work. And we have things we 
discuss.
And now I'm regaling you with 
a downstream story: the angry
SysAdmin.
And the angry SysAdmin
has a lot of skills and they're 
asked a lot of questions and 
they're asked and iterate and 
learn more each time.
But as you keep inundating them 
with work, and
it's usually the same one or two
people,
they are more skilled 
disproportionately to everyone 
else.
And they get more work and work 
overtime because the
ad hoc requests are not what 
they're internalizing their day 
job is.
Depending what they're doing, 
Kubernetes, or the 
sys stuff, the access for users 
in a company.
Whatever it is, they're not 
doing that, they're doing this.
To compensate, they do that 
after hours. It's not ideal.
But just because they're 
amazing, customers want to meet 
them. Why meet the people when 
you want to meet the brains?
And unfortunately, that applies 
internally too because you're 
brought into the meetings as the
subject matter expert.
Why talk to anyone else when you
can talk to the person who can 
give answers and hopefully take 
their advice.
And all of this means we have 
basically created Brent.
For those who haven't read
The Phoenix Project, Brent is 
the placeholder for the angry 
sysadmin. He can't 
free himself.
If he had time to train, that 
would be his upstream work.
Let's talk about Brent, or more 
specifically, our Brent-alike. 
What does this mean? Brent's 
stressed. Not even a little bit.
And they're gonna have a lot of 
impact from that stress.
You're going to have some 
internalization about what 
high-stress environments are 
like
, because even if you're 
watching this or listening to 
this and you're
not normally in a high-stress 
environment, right now we might 
be in a high-stress environment.
There's one or two minor global
crises happening concurrently in
the background and affecting us 
in different ways.
That's what a high-stress 
environment is like to a varying
degree.
The angry sysadmin is angry. 
They have other emotions like 
resentment and so forth. And 
because of all of this, their 
relationships are very strained.
Not just personally, but 
professionally also because 
they're trying to work with 
other
people and they both know they 
need to be asked the questions, 
but wish you wouldn't.
And, again, that feeds too.
And part of it's being fed by 
everyone else.
You don't want to bother Brent, 
and you're really sorry
and that's how it is because 
that's who has the expertise. 
This leads to issues with 
Brent or Brent-alike's time 
management skills.
You might have a single person or a 
small team that has
a certain capacity in 
hypothetically ideal
situations and now they're 
overcapacity perpetually.
And now you have a focus problem
because when you're getting a 
ton
of requests, they may not be
prioritized well, and if you 
can't do the upstream
work of prioritization, they're 
inbound, you can do little
bits of them kind of 
concurrently instead of doing 
them and then doing another one 
more serially.
So, in an attempt to avoid 
dropping the balls and trying to
divide your focus, it's actually
going to decrease the overall 
quality of your work.
There's also something called 
allostatic load.
This is a term that started
floating around because of the 
pandemic, in places
and situations where people may 
be quarantining and stuff.
Allostatic load is the wear on 
the body in response to 
long-term chronic stress.
It's the brain fog you're 
probably feeling as well as the 
sleep deprivation.
It's the state where you're too
awake to truly sleep,
you can't do time management, can't
do intense cognitive tasks at 
all.
But beyond the Brent-a-like, 
business. 
Talk about upstream work and 
how this can impact everything.
And Brent and the Brent-a-likes,
they're employees to someone.
Maybe that's really bad for 
Brent, but in a
certain corporate culture, they 
don't care why it's bad for 
Brent. They want to know why it's
bad for them.
You have the diminishing returns, 
you have the brain fog so
you can't think as quickly or 
clearly as would be ideal. 
You're solving problems slower.
As a result, the business can 
promise fewer things to anyone 
using their product or products.
Or if your customers are 
internal, for example, if you're
doing infrastructure with other 
Devs, you're
still promising fewer things, 
just to an internal-facing 
environment.
When these sorts of things 
happen, you get  burnout.
Which is commonly talked about 
in industry, when employees get 
burned out, they walk. They walk
away to a situation that doesn't
burn them out.
So, if the business is seeing 
these employees with these
stress cycles and they don't 
respond, those  employees leave 
and that has a cost too.
Because you have the lead time 
to hire the replacement or
replacements and you have the 
lead time to train those 
replacements.
And those cycles may not be 
short depending on the seniority
level you're hiring for.
Could be anywhere from 2 weeks 
to 2 months to 4 months.
And then once you actually have 
the person, you maybe have a 90-
day onboarding period even if 
they have the requisite
knowledge because they're learning
about your business in that 
time.
And all of this feeds into 
reputation.
There's a cost to the reputation
too. There are two main sides to
the reputation coin.
On the one side, it's going to 
be difficult to replace people
if you start earning a 
reputation of burning them out 
and not caring
about their wants and needs 
to be functional employees to 
you.
But on the opposite side of the 
reputation coin, you have the
customers, be they internal or 
external, when they're not 
getting their
questions answered from support,
or not getting features through 
the pipeline
successfully, they're going
to take their wealth and put it 
in a different company.
That will impact the finance for
that business and impact the 
valuation of that 
business.
There are volumes of work about 
stress, there are volumes of work
about
business and this talk isn't 
entirely scoped to that.
However in my resources slide, I
have links to things you can 
read if you want to pursue this 
topic more.
But back to upstream and 
downstream.
If your business team or 
whatever is not already focusing
on
upstream work, you and/or your 
team are going to have to upsell
the cultural change.
But just like nothing happens in
a vacuum as the phrase goes, for
DevOps, nothing happens in a 
silo. You need to get buy-in, 
right?
So, if you're listening to this 
talk and maybe on a team
but don't have decision making 
power, you have to think about 
your
boss and their bosses and 
whomever
you have to upsell to and what 
they need.
You can build a business case 
based on the stress on the 
individual or
team and the broader business 
impact and cycle through more 
resources so you can say this is
bad and it's bad here.
And once you do that, you can 
start to outline a plan to 
actually instantiate that 
change. Which is what I actually
want to get to talking about.
So, shifting your work.
When you get started, you need 
to brainstorm the work. This is 
not  planned.
You don't want to say, oh, I 
only want to think about this 
work, all the work. Don't scope 
it, don't do anything else.
Get your paper, your whiteboard,
whatever you're going to do.
And just start writing down 
tasks, they can be incredibly 
broad, or specific.
After you brainstorm for 5, 10 
minutes,
whatever makes sense for you or
your team, start categorizing by
streams. Which of the pieces of 
work are reactive? Which of the 
pieces of work are proactive?
Or to use the terms I'm 
using, downstream and upstream? 
Where does the work fall?
For now, for this cultural 
shift, you can ignore anything 
that already falls into 
upstream. It's already there. 
That is your happy place where 
you want the rest of the work to
go.
Looking at the downstream 
work, reassess and see what 
patterns it has.
Typically you have some things 
that are  repeating and not all 
of them will cause
people  pain.
You might have repetitive work 
that, oh, I have to run this 
script or put it in a cron job. 
Going home.
And that's all you have to do.
And then there are other types 
of reactive work
where you might have a database 
or a cluster that goes down on 
the regular
and it's a huge time sink and
majorly  stressful and maybe 
involves other
teams and keeps going to the 
public and goes on and on and 
on.
Look for patterns and see what's
repeating and categorize by pain
level.
But you also want to start 
thinking about
how you might resolve these 
problems and switch them to the 
other stream.
You know, for example, if you're
not doing any sort of 
pipelining,
you're doing manual deploys, 
building a pipeline and
maintaining that pipeline for 
the deploys would be switching
manual to automated, switching 
downstream to upstream.
Typically when you're 
remediating reactive downstream
work, it's going to be either 
short-term or long-term and 
permanent or temporary.
Now, in this case, the way I'm 
using short-term and long-term,
it's not how long the solution 
is good for, that's the 
permanent versus temporary. It's
how long it takes it to happen.
As a quick example, let's say 
that
you know that some microservice 
is flappy and
it's always flappy so it 
qualifies as a relatively high 
pain downstream situation.
But the remediation is, if I 
delete this configuration
it no longer does this and it's 
only using these files or 
whatever. You fix the issue.
You know you can do a quick 
deletion and now you  fixed the 
issue.
So, it's a permanent solution 
that was quick to implement. 
That's permanent short-term.
A permanent long-term would be 
when
people talk about
cloud migrations, monoliths to 
microservices architectures. 
It's legacy, and it's 
long-term because
it's not done in a day or sprint. 
It's done in phases.
That would be the permanent, 
long-term.
Looking at temporary solutions, 
might be
doing something like, oh, fixed 
an outage. Rebooted the cluster.
You don't know why it went down.
The short-term solution got 
everything back up and now you 
have to switch modes into
something that's long-term so 
that it doesn't go down again 
later.
The long-term, but temporary 
solution is something that
totally doesn't happen ever when
you write that script, that 
shell script. Just use it one 
time, definitely only that one 
time.
The whole company was definitely
never going to become dependent 
on it.
It was definitely going to be 
rolled into the main application, 
it never was. 
It's still technically 
temporary in the backlog 
somewhere. But it never dies.
When you categorize this work, 
you need to plan and prioritize.
When you notice some of the up 
and downstream tasks are going
to be high-pain and you're doing
it repeatedly and it
takes a lot of stress, versus 
low pain, or
needs to be done immediately 
because there's a massive
impact, versus later because of 
the immediate  things, you can 
stage out your work and behave 
accordingly.
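One way to picture the categorize-then-prioritize step is the sketch below. The `Task` fields, the 1-to-5 pain scale, and the sample backlog are all assumptions made up for illustration; the talk does not prescribe a scoring scheme:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    stream: str   # "upstream" (proactive) or "downstream" (reactive)
    pain: int     # hypothetical scale: 1 (mild) to 5 (severe)
    urgent: bool  # needs to be done immediately because of its impact


def plan(tasks: List[Task]) -> List[Task]:
    """Order downstream work: urgent items first, then highest pain first."""
    # Work that is already upstream is left alone; it is where we want to be.
    downstream = [t for t in tasks if t.stream == "downstream"]
    return sorted(downstream, key=lambda t: (not t.urgent, -t.pain))


backlog = [
    Task("run weekly report script", "downstream", 1, False),
    Task("cluster goes down on the regular", "downstream", 5, True),
    Task("manual deploys", "downstream", 4, False),
    Task("maintain deploy pipeline", "upstream", 1, False),
]
ordered = [t.name for t in plan(backlog)]
print(ordered)
```

The sort key is only one possible policy; a team could just as reasonably weigh effort or how often the task repeats.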
Capacity plan while doing this.
Don't plan for an eight-person 
team if you only have six.
We have two job reqs open and 
they will be filled soon, or 
some definition of soon. Make 
sure you're planning on the 
resources you have.
If you get the other people, you
have a shorter timeline, 
rather than a short timeline 
that you've expanded to a longer 
one. I definitely avoided using
a logo on that slide.
Once you're doing the work, you 
need to start to iterate. 
Because sometimes things take 
longer to
actually finish than you expect 
or shorter to finish than you 
expect.
And you can add more tasks or 
remove tasks accordingly.
But if you plan for a quarter 
and never revisit
that during the quarter, at the 
end of the quarter, you might be
feeling some pain. Just -- just 
a small amount.
And make sure when you're 
iterating, you're planning 
around known difficulties.
Not all difficulties are 
necessarily painful in the way 
we normally think of as pain.
It may be something like an 
internal conference that's 
coming up and everybody is 
super-excited.
But all of your  engineers need 
to make demos.
If you don't reduce their other 
capacity, that's a problem.
Or this Tweet, this company has 
a
pandemic tax, they're  adding a 
buffer into their sprints. I 
think it says 20%.
They're saying we're reducing 
your workload by a certain
percentage because we understand
what brain fog and stress is.
We're reducing your capacity 
anyway because there's something
happening and we're planning for
it.
That's the type of iterating and
planning you want to be doing 
around your workload.
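The capacity arithmetic can be sketched roughly as follows. The function name, the points-per-person figure, and the default 20% buffer are illustrative assumptions (the speaker only recalls the Tweet as saying roughly 20%):

```python
def sprint_capacity(people: int, points_per_person: int,
                    buffer_percent: int = 20) -> int:
    """Plannable capacity for the people you actually have, minus a buffer.

    Integer arithmetic keeps the result exact; buffer_percent is a
    hypothetical knob, not a recommended value.
    """
    if not 0 <= buffer_percent <= 100:
        raise ValueError("buffer_percent must be between 0 and 100")
    return people * points_per_person * (100 - buffer_percent) // 100


# Plan for the six people on the team today, not the eight you hope to have.
current = sprint_capacity(people=6, points_per_person=10)
hoped_for = sprint_capacity(people=8, points_per_person=10)
print(current, hoped_for)
```

Planning against `current` and treating any new hires as a bonus keeps the plan honest if the open reqs take longer to fill.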
And once all that's 
accomplished, you can  totally 
go back to not playing the goose
game on your lunch breaks. But, 
yes. How do you measure it?
Because now you're thinking, oh,
I can do all these cool things
, but how do I validate that 
those cool things are actually 
having results?
Because when you expect results, 
you want to make sure 
everything's correlating.
And you want to prove that 
there's a reason to keep doing 
it,
otherwise you're going to be
adding labor or changing labor
and it might not look like
there's a reason to the people
you're trying to convince.
So, let's talk metrics for a 
moment.
How do you measure upstream 
work?
And the reason this is an 
important question to ask is 
because since
upstream work isn't a direct 
thing that you're holding,
it's an avoidance, in a way, of
things that you're not having
happen, you need to think of
metrics a bit differently.
Say there's an intersection, 
there's
a lot of accidents, it's changed
to a traffic light. You're not
measuring the accidents.
Any individual accident might
have been avoided by something 
else.
Someone looking up, not crossing
in the pedestrian crosswalk.
What you want to look at is the 
reduction in the accidents 
you're tracking.
When you have metrics, you start
thinking of them this way
because we have lots of DevOps 
KPIs and such.
But they're measuring events
that happen, and you're
measuring the deltas to track a 
correlation between events that 
did not happen.
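A minimal sketch of that delta, assuming you have a count for a baseline period and a count for a comparable later period. The function name and the accident counts are made up for illustration.

```python
def reduction_percent(baseline_count, current_count):
    """Percentage drop in tracked events relative to a baseline period.

    Hypothetical helper: both counts must cover comparable periods and the
    same scope (one intersection, not the whole county).
    """
    if baseline_count <= 0:
        raise ValueError("need a positive baseline to compute a delta")
    return (baseline_count - current_count) * 100 / baseline_count


# 12 accidents in the baseline quarter, 9 in the quarter after the light.
print(reduction_percent(12, 9))
```

A negative result means the tracked events went up, which is also worth knowing when you revisit the goal.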
So, some things to keep in mind 
with that. Really understand 
your needs.
You don't want to say, oh, I
want this traffic
light, but if you're
not solving the accident
problem, maybe you didn't need a
traffic light, and maybe you
shouldn't invest in building it
and changing the structure of
that intersection. You also want to
make sure that you're measuring 
the actual need.
So, when you're saying, okay, I 
am
reducing the accidents with the 
traffic light, make sure you 
know what you're measuring.
Because if you don't communicate
that to anyone who is 
responsible for the
measuring, you say, hey, measure
stuff around this intersection, 
they're going to give you all 
sorts of data. And it might not 
help you.
If it is not scoped
correctly, you get all the
accident data for the county.
You don't want it for the
county.
It's going to bury that
intersection and you won't know
if it's resolved or not.
Or they might give you something
that's too narrow or a different
type of data, like tickets at the
intersection. But that doesn't 
tell you about accident 
reduction either. When you're 
saying, what's the question? How
am I measuring it?
Make sure the scope and the
actual type of measurement is
matching what you think it is,
and revisit it in case it isn't.
You also need to establish
baselines. This one's harder.
If you're solving a problem 
you're not measuring, you're
not going to get a good delta.
If the light was put in an
intersection where there was no
known history of
accidents, you
can't say accidents last week,
last month, last year. But you
can start tracking now. It's 
starting now.
It's not the same type of 
baseline, but it's better than 
no baseline. And this goes back 
to knowing your capacity as 
well.
Again, don't plan on resources 
that you don't have when you're 
making these measurements.
It doesn't make sense to say, I 
want an accident reduction of
50% by tomorrow.
Maybe? Maybe you'll have an 
accident reduction of 50% 
tomorrow. You might not. It 
depends on a lot of things. It 
could be a sample size.
If it's a smaller intersection 
in a smaller city or town, the
dataset being smaller might 
actually drop it by 50%.
But in a larger dataset, it's a
single data point. You don't
know if it's correlating to
anything.
It's too short of a time period
to actually give you
meaningful information. Make
sure you're planning this way.
When you understand your 
capacity, you can understand 
what your rate of change is. 
When you're understanding what 
your rate of change is, track 
it.
Maybe not 50% tomorrow, but what
about 50% by end of year? And 
maybe that makes more sense.
But when you're revisiting it, 
as I may
have mentioned a couple of 
times, when you're revisiting
it, you might notice, oh, it's 
dropping by 1% a month. It's not
going to drop by 50% per year. 
But you can establish a 
different goal when you see that
happening. It's not a nothing, 
it's not
a zero.
But you need to readjust your
goals
to be more realistic.
And then set your goals.
If you don't know how fast 
you're going to make it through
backlogs and things, you can 
make an educated guess for the 
first goal.
But never forget to revisit it 
because now you're paying 
attention.
You're going to reset those 
goals at whatever cadence makes 
sense.
It might be sprints, might be 
QBRs, whatever it is,
just make sure that you're
giving a look back and making 
sure that it makes sense.
Iterate always.
Because you want to make sure 
that the changes you're making
are actually making the results 
you want.
So, if the traffic light
didn't reduce the accidents, 
that's unfortunate.
It's probably not worth the
expense to remove the traffic
light that's there now. But we
still have to focus on the 
problems. Now we're iterating it
and trying it again. Maybe they 
need something else.
So, from that same sort of view 
of metrics, we're going to apply
a new perspective.
So, there are lots of
well-known DevOps KPIs, I'm going to
re-mention a couple of them and 
some other ones as examples to 
help you get started.
So, here are my metrics
thoughts, with service level
agreements, change failure
rate, and question volume, as
well as keyword mentions and
unplanned work rate. I'm going
over each in turn. Service level
agreements. We know them, we love
them. We don't really love them.
But we make promises based on 
them.
When we make promises based on 
them, our customers, internal
or external clients, whatever 
they are, they have an 
expectation.
Quick and easy example, support 
might have an SLA of responding 
to
emails in an hour and phone 
calls within 10 minutes. 
Something like that. And you 
might say, well, okay, we can 
get that down.
Going back to reducing whatever 
is making the things get 
backlogged over there.
You're not reducing the service 
level compliance because you're 
not compliant necessarily. 
You're reducing it to have a 
better experience. So, there are
two types of work there.
In this situation, if you find
that the support team is
unable to hit those SLAs,
they need to be adjusted and
promises adjusted.
If they're meeting the
SLAs, but clients are like, that's
a long time to wait on the
phone,
I can't go to a cafe while
waiting. You know, 2020.
But you can say, I'm reducing my
time promises in a way that 
makes sense.
Even if it's not broken, you still
need it to make sense. But doing
that, you're doing upstream 
work. What about this team?
Can we change support, whatever,
so that they can make our new 
compliance arrangements?
You also want to talk about the 
change failure rate.
So, change failure rate is 
typically measured for DevOps
or Devs when you have failed 
deployments that make it all the
way through to production 
without being caught. So, it's a
failed deployment.
If you're doing successful 
upstream work,
maybe by pipelining if you don't
have them, testing if you're not
doing it,
increasing your test service --
surface, et cetera, all of these
things you should see a change 
failure rate that goes down.
And so, you're focusing on the 
results of the measurement.
But you need to articulate, 
going back to what you
need to know, you need to 
articulate what you're trying to
measure with this metric.
If you have a baseline of a lot
of failed deployments
and your first attempt at
upstream work is a pipeline,
watch it with that in mind.
Make sure that you're scoping it
to that, though.
Because if you have a bunch of 
teams and deployments and
you're aggregating the data,
like the traffic light, it's not
going to help you.
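The scoping point can be sketched as follows: the same deployment records give a very different picture per team than in aggregate. The team names, the record shape, and the helper name are assumptions for illustration only.

```python
from collections import defaultdict


def change_failure_rate_by_team(deployments):
    """Failed deployments divided by total deployments, per team.

    `deployments` is a hypothetical list of (team, failed) pairs, where
    `failed` is True when the deployment failed in production.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for team, failed in deployments:
        totals[team] += 1
        if failed:
            failures[team] += 1
    return {team: failures[team] / totals[team] for team in totals}


deployments = [
    ("payments", True), ("payments", True),
    ("payments", False), ("payments", False),
    ("search", False), ("search", False), ("search", False),
    ("search", False), ("search", False), ("search", False),
]

by_team = change_failure_rate_by_team(deployments)
overall = sum(failed for _, failed in deployments) / len(deployments)

# The aggregate hides that one team fails half of its deployments.
print(by_team, overall)
```

If the new pipeline was only rolled out to one team, that team's rate is the one to watch against its own baseline.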
Monitoring question volume is a
little bit helpful.
This is away from traditional
DevOps
KPIs, but something you can use
to track progress.
It's inbound through Intercom to
customer
support, or public and
private Slack channels, however 
you're receiving information.
Depending on what you're trying
to address, question volume 
should decrease.
Maybe you adjusted the docs 
because you realized a lot of 
questions were being asked about
the docs.
Maybe you fixed a certain lack 
of a feature or maybe resolved a
bug as a feature.
Whatever you're doing, you're
going to see questions
change based on the information
you're putting out, and make sure
it's tracking.
Relatedly, but not necessarily
the same, are the keyword
mentions.
Keyword mentions could include
non-questions if this is something
you're interested in keeping 
track of.
Something like this might be,
hey, I see a lot of people
complaining about Terraform on
our public community forum. Fix
that. It's not a question, it's
a statement.
But it's not a positive one.
Alternatively, maybe
they're praising Terraform: see
what the team put out. I want to
be cool like you.
But when they make the changes, 
you see the keyword mentions
with the temperature of the mood
increase, we need to do more of 
that, support that.
Unplanned work rate is something
that is a very key
cornerstone to downstream work
because downstream is
reactive, and in an IT scenario,
reactive is
mostly alerts because something 
happened that you didn't plan 
for.
Unplanned work is a pretty 
direct measure of broadly 
everything that's hitting you. 
You'll need to break it down, 
though.
Once you get an unplanned work
rate,
that's a number. You start to
categorize it and see if
something is
ballooning.
It might turn out that the 
overall unplanned
work rate is high, but maybe 
it's one thing at 90%. Don't 
stress if it's just one.
It's important to track it, 
because as your unplanned work
rate drops, planned work goes 
up.
Planned work is easier to work 
with because you know the scope 
and
duration more accurately than
something
that's gone out with no known 
cause yet.
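As a rough sketch of the breakdown described above: compute the overall unplanned work rate, then categorize the unplanned items to see whether one category is ballooning. The item fields (`planned`, `category`) and the category names are hypothetical, not taken from any particular ticketing tool.

```python
from collections import Counter


def unplanned_work_rate(work_items):
    """Return (share of items that were unplanned, unplanned count per category)."""
    if not work_items:
        return 0.0, Counter()
    unplanned = [item for item in work_items if not item["planned"]]
    rate = len(unplanned) / len(work_items)
    breakdown = Counter(item["category"] for item in unplanned)
    return rate, breakdown


# Ten items in a sprint: five planned, five unplanned.
work_items = (
    [{"planned": True, "category": "feature"}] * 5
    + [{"planned": False, "category": "disk-alerts"}] * 4
    + [{"planned": False, "category": "cert-expiry"}]
)

rate, breakdown = unplanned_work_rate(work_items)
top_category, top_count = breakdown.most_common(1)[0]

# Half the work was unplanned, and most of that is a single category.
print(rate, top_category, top_count)
```

Tracking the same two numbers each sprint shows whether upstream fixes are actually shrinking the biggest category.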
So, to summarize, upstream work
is proactive, preventative
work. Downstream is reactive.
Upstream work reduces stress and
the extended impact.
Shifting and categorizing your 
downstream work helps you reduce
it.
Know what you're asking and why 
you're asking it so that you can
define the success, quantify 
with metrics.
If you want to do additional 
readings on any of this, please 
take a look at this slide.
The link included has all the 
resources on it and that same 
link is on my thank you slide.
Thank you for watching, I'm in 
the chat to answer questions.
And again, my name is 
Quintessence.
Have a wonderful DevOpsDays
Chicago.
Margaret: Remember, breakout 
sessions are happening now on 
Discord.
This round includes blockchain
DevOps, DevOps and data
sovereignty,
full remote war stories,
incident management
and incident command system, and
starting DevOps from scratch.
All right.
Sasha: Hey, Quinn, it's been a 
long time.
Quintessence: Literally since 
yesterday.
I guess if we don't count 
pandemic lockdown.
Sasha: In-person events used to be
nice.
We used to be able to go have 
dinner.
Quintessence: I know. I remember
all the food I eat at places I 
visit. It's my favorite thing.
Sasha: I know, right.
It's like, I can't even remember
the last time I went to a 
restaurant.
Quintessence: Same.
Actually, I do remember, it was 
DevOpsDays New York City.
The first and last
event of the year.
Sasha: I was so glad we were all
there together.
It's kind of a nice -- a nice
last thing to do in person.
So, I -- I really enjoyed your
talk.
And I've actually, you know, 
kind of spoken on sort of 
similar topics before.
And the perspective I usually 
take on
this is that we actually 
encourage people to be hears all
the time. Right?
Companies implicitly and
explicitly pay people to -- 
reward people for these 
behaviors, right?
so,
whether or not it's singing 
praise to the person who worked 
the most hours or the smartest
person in the room or whatever 
it is, there's a lot of
, you know, we encourage 
inclusivity, but we reward 
competition type of thing
, right?
Quintessence: Yes.
Sasha: So, the question is, how 
do we
convince companies or managers 
to stop rewarding bad behaviors
and heroism?
Quintessence: So, a couple of 
different parts to that.
There's first you have to 
convince them that the behavior 
is bad.
Because it doesn't feel bad to
have someone who knows things.
You're like, oh, thank goodness!
Someone knew something.
The systems were on fire and 
someone knew something. That 
feels like a praise-worthy 
event. As a one-off, it is.
It's when it is every time. An individual
situation isn't the problem.
But also when you're trying to 
get that buy-in, you know, I 
think it's important to have 
consistent messaging.
Let's say someone did dive in 
and they're the person who knows
things, great.
You need to adjust their
workflow so that they can
distribute.
Because if there's a situation
that occurs where they're the
one
that took action because they're
the only one who can,
even though maybe they're out of
hours or on PTO or whatever,
you're forcing them to repeat
that burnout behavior and it's
not visible to the business.
Anyone beyond maybe their direct
manager wouldn't be aware
of that. It's not their job to
keep track of it.
It's the manager's job.
You have to have buy-in from the
manager to
sell upwards what they're doing 
to shift the workflow and why
it's relevant and show the 
decreased burden in onboarding.
You don't burn out employees, 
you keep them.
I have had a long tenure at a 
company,
there's an average 2- # year job
cycle.
As recently as a decade ago,
people would stay for a decade 
at a job. Some of that is  
burnout-related. You could sell 
that. We could do that again, if
only.
And you could upsell it to the 
business that way.
Margaret: You mentioned turnover
as a metric.
What other tools as a manager or
a non-manager can you use to 
sell it upstream?
Quintessence: You're kind of 
looking at the way the metrics 
touch what you directly want.
And you're looking at how they 
move in the opposite direction 
from what you want.
So, for example, you want more 
features. That's something 
people say a lot in Dev. We want
more features.
But our feature releases are 
down again. And again and again.
And the reason they're down 
again and again is because of 
high turnover and onboarding.
They are down, but only if we
allow it to be just how it is.
If we prevent that,
actually we could do more
releases faster, and not
just in the "we need to automate
a pipeline" kind of way, which
is also a thing, but also in the
"protecting our humans" way.
Our humans will be able to focus
better and produce
higher quality work and fewer 
errors et cetera even
without the automation if we 
guard their brains.
Sasha: That's interesting. 
There's another part of this.
We just talked about, how do we 
discourage managers from 
encouraging heroism? But the 
truth is that we like to be 
heroes, right?
Each one of us enjoys being in 
the spotlight
and being the smart person in 
the room and being
rewarded for solving the 
problems and that sense of 
importance that comes with it.
So, how do we convince people to
stop acting that way?
Quintessence: I think you 
touched on it a little bit.
You started by saying, oh, well,
we like to be the smart person 
in the room.
But then you said, because the 
reward feels good. You have to 
change the reward system.
Because we're not reacting to 
being the smartest person in the
room by itself.
Because if that's not a rewarded
behavior, we don't respond to it.
Our brains are
reward-oriented. Right? So, we're
looking for that dopamine.
And if we shift to where the 
reward is, then we're gonna 
follow that shift.
So, if the reward is less about,
oh, you solved
this problem super-fast, you're 
the best person ever, you're the
10x engineer. Ah -- right?
And you shift the reward to say,
oh,
thank you for waiting until the
next day to get back to me. I know I
messaged you out of your hours.
You explicitly state it.
You don't let it passively 
happen. It's going to feel weird
at first.
But you have to explicitly state
the change in reward structure
or incentivize. Or with unlimited
PTO,
people aren't directly punished
for taking time off.
But if you shift the reward 
structure to, I'm so glad you 
took the time off.
At a different
company,
Logz, I was visiting their Tel
Aviv office, and
there was a discussion between
the American and the EMEA
employees.
Someone was like, oh, I worked
through being sick and I got --
and one of the EMEA
employees said, maybe you would
have worked better if you just
took the time off. Just
completely deadpanned.
But that's the shift in reward 
structure, right?
You would reward taking the time
off when you're
sick as opposed to reward 
pushing through.
Margaret: That's a good example 
of culture. Continental and
workplace culture.
If you don't work through your 
sick time, you silly Americans. 
But also, you know, you hero 
worshipers.
And I would say that Israelis 
are also very hero cultured. So,
that kind of surprised me.
So, I wanted to say like the 
other question I
had -- I wanted to ask --
Sasha: The other question I 
wanted to ask is when we're 
interviewing for a new job,
what are the warning signs to 
identify this is a company 
that's all about heroism.
Like, I really don't want to 
work here because I'll probably 
get
burnt out in, you know, the next
few months?
Quintessence: Yeah, you would 
start to try -- similar to the
way you can't measure what you 
want, you can't ask
what you want, unless you're in 
a comfortable interviewing 
situation.
How would you handle it if I 
didn't respond to your question 
on PTO? Because they're going to
know the right answer to give 
anyway. Right?
But you're going to want to 
start to ask around it. In your 
implications.
Start brainstorming before you 
get to that level, section, 
whatever of your interview 
process. You can start to think,
okay.
I need to be able to guard my 
time.
Start asking them how they view 
PTO, or how to truly unplug. And
their expectation.
When I unplug, am I expected to 
have Slack on my phone, 
whatever?
So, that I'm still reachable, or
can I truly disconnect?
When you start to ask around 
your expected
behaviors, you can see if they
expect you to never truly
be offline. That can be a red
flag.
Or start to see --
you could usually ask how they
handle it if you missed a deadline,
dropped the ball, whatever.
What kind of supports are in 
place? How do I reprioritize? Do
you reprioritize?
Try to ask around and get them 
to
reveal the process instead of 
the canned right
answers and that will reveal red
flags.
And at least in corporate
America, it's heroic, and
that's not even IT-specific. You
might have to do tradeoffs.
Okay, they don't expect me to
truly unplug.
But there are mitigating factors,
and you
might be making decisions like
that too.
Margaret: And like you said, 
processes.
Another process I've found 
that's helpful to look for is 
on-call rotations. Is there 
on-call? Do you have that 
concept, how do you escalate?
If there's not a good answer, 
that might mean everyone's 
on-call all the time. Red flag.
Or if there is an on-call 
rotation, do you respect it?
If you're not on-call, do you 
actually unplug?
Quintessence: Right. That would 
be situational to your role.
How do they pull people into
the loop?
At PagerDuty, with the
incident command,
we have one that's not a subject
matter expert, explicitly so
they can do their thing.
And how advanced or
granular they are.
Margaret: I forgot you worked 
for PagerDuty when I slipped in 
on-call.
Quintessence: I thought that was
on purpose.
Sasha: This is an indicative 
question.
If the company doesn't have an 
on-call, then everyone is
on-call all the time.
And you just immediately know
it's going to happen.
No matter how far you are from
production, you're going to
end up fixing an issue on a
Saturday night after the second
margarita.
That doesn't bode well for 
anybody.
Quintessence: No, it doesn't.
Sasha: One piece of advice I hear
often is, make
people go on vacation and see 
how the team deals without them.
I've never seen a company 
actually execute it.
I've never seen a company except
for people who are regulated who
have to do it actually force 
people out on vacation.
So, how can we handle this 
capacity planning and how
do we create a process to ensure
that people actually
take time off and don't become 
Brents?
Quintessence: So, that a little 
bit falls on the direct manager.
Because you can't expect people 
a degree or two or five like 
separated to be keeping track. 
So, when you start to socialize 
the manager.
But the manager is going to be 
mirroring behaviors.
If the manager freely
takes PTO, the employees will. If
the manager unplugs, the
employees will.
The employees are going to start
to mirror the manager or
whomever they're directly 
reporting to to try to get 
whatever the cultural baseline 
is.
Sometimes the encouragement can 
be very passive like that.
You're setting up and modeling
whatever you want to see and
kind of getting people to
follow.
But if you notice people aren't
following, you're setting up and
modeling this behavior, but
people aren't
acting on it for a variety of
reasons, you can still start to
poke them.
For example, my birthday was in 
June, and
I didn't initially want to take 
a
day off for it because I was 
starting in PagerDuty in May. 
And my
manager was on Slack, why aren't
you taking a day off
for your birthday? Because I 
just started. I don't care.
We don't do that here.
That's a kind and emotionally 
gentle way of doing it.
She doesn't let that slide and
passively allow me
to think or be under the 
impression that it would be a 
problem when it wasn't a 
problem.
So, that's another thing that 
people that are higher in the 
power hierarchy can do.
Where they can just take action.
Sasha: One concern that I
have with this recommendation is
it puts the responsibility
personally on a manager. Right?
And people join companies,
leave managers.
Quintessence: That's why I'm not
a manager.
Sasha: And you see sometimes 
managers come from the outside 
and have a totally
different culture than your 
organization and that can impact
the entire downstream from them.
Quintessence: Yes.
Sasha: And I do think there's 
got to be something we can do as
a company to ensure that
we encourage the behavior in 
these managers so we can protect
their people from burnout.
Quintessence: It's just shifting
it further upstream.
Because every manager reports to
someone all the way up to the 
C-Suite.
When you start making that 
change, if you have the execs 
behaving a way and then the 
managers below them mirroring
it and so on. It's downstream 
from them.
But the company culture, the
people
with veto power, the employees
can't push
back if they want to take PTO
and they're
told no or retaliated against,
that's that.
But if you have people higher
and higher up that are 
normalizing the behavior, it 
will start to follow through the
rest of the organization.
But beyond that, you want to 
make sure managers are getting 
training that's relevant. It's 
not something to think about as 
a manager.
Like a Dev or an Ops manager, 
you're thinking about the direct
problems that you're addressing,
the Jira tickets or whatever.
And not about everything 
surrounding. Managers are 
thinking about people problems.
If they're hourly, clocking in 
and out. Oh, did I do
a meetup with them?
And thinking about action items 
and training them around the
remaining human needs that they 
need to be paying attention to 
as a manager.
Margaret: An anecdote to Sasha's
question,
she's worried about leadership
and doing it from the top down.
I have been lucky on my
current team to have experience 
of doing it through peer 
pressure. My team right now is 
super-awesome. We joke with each
other. Hey, what are you doing 
online? Get offline.
Quintessence: Yeah.
Margaret: We're peers and we can
joke with each other.
About 8 months ago I took 
maternity leave. My coworker saw
me checking email. I have AD 
access, I will turn off your 
email.
Quintessence: Nice.
Margaret: It was a nice, like, I
punish that behavior. But we see
your work and thank you. But we 
got it. And knowing that --
Quintessence: And that's, again,
creating that pressure. Yeah, 
yeah, yeah. And that's creating 
the pressure. So, you can do 
that.
The reason I focus on managers 
is because they're kind of in 
their own little island.
Like if you want to do 
organizational change, you kind 
of have to start where you're at
and push out. Or start at the 
top and go down.
You can't easily, like, I as
the developer advocate, if our 
CEO wasn't
doing XYZ, I'm not going to be 
able to convince everybody down 
here to do it.
I might be able to convince our 
team because we do have close 
relationships. Let's do this as 
part of our rules of engagement.
And then we might say, hey, this
works for us. Other teams, do you
want to add that to your rules 
of engagement?
And that would slowly distribute
and push up.
But you can't successfully as a
single person push down or up.
Margaret: I have a question and 
I want to tie that into your 
metrics. Because that's another 
thing that I really think is 
important.
Starting with what you have and 
then how do you turn
that into metrics that you can 
show off?
Quintessence: Right.
It's important to make sure that
the metrics are there -- the 
metrics might be too broad or 
too granular.
Make sure that you're checking 
that what you think you're 
measuring is
telling you what you need it to.
But make sure you're pushing
through on setting the metric.
So, for example, let's say 
you're trying to do something 
with human capacity and you want
to increase capacity.
You want to push out from
that and say, what do we need to
do to free up more people
without adding head count, 
right?
You're going to start looking 
around measurements for that. We
don't need these people. Fewer 
of this and more of that.
And tally the metrics around 
what you're going to be doing.
And shift that to your available
engineering hours.
Sasha: So, one of the questions 
that came up is we're currently 
in
a pandemic environment and not 
everyone is lucky enough to work
for a great organization 
already.
And not everyone has a luxury of
quitting their job
and finding a new
organization.
There are organizations where 
command and control is the 
foremost
thing and management thinks
developers are lazy and they're
working from home and need to be
controlled and we need to 
encourage, you know, total 
surveillance type of thing. 
Right?
How do you even bring up the
idea
that, you know, we need to be 
concerned with burnout and brain
fog and worried about hero 
culture?
Quintessence: I would start 
small and push out. So, you 
might just start with yourself.
Because, again, if you're 
working in an organization 
that's developed incredibly 
toxic behaviors.
People say an organization isn't
going to change culture
unless they're brand new or in a
bankruptcy moment. Otherwise they
don't have a reason to. It works
well enough or sustains.
If you're pushing against what 
works well enough to sustain.
There's a book by Dominica that 
talks about making work visible.
If you say I'm burning out, but
they don't know what you do.
More specifically than writing code,
they don't know what you
do, they don't know why it's 
happening, they can't take 
action even if they're motivated
to do so.
Maybe you can write up a report 
for a week or a month or 
whatever makes sense.
Start making the work more 
visible to yourself and scope it
out and push outward. Now we 
have all this work.
I'm telling you what my work is,
which
parts are redundant and which 
are removable and how to shift 
into the moving parts. There's
no easy button in that
situation. It's going to be a 
slow push.
I think it can slowly be done, 
especially when you start to get
buy-in
from other teammates and other 
teams that are probably 
suffering too, right?
And when you start getting that 
done, that's how you would push 
out.
Sasha: Yeah, so, another 
question is, do we have any 
resources
on your metrics that we talked 
about that people can find 
online?
Quintessence: Yeah, I linked to 
my talk in the channel
in the Discord and it has all 
the resources and some 
supplementary reading that went 
into the talk. Including that. 
Yes.
Sasha: All right.
So, we're at time and we'll be 
right back with our first round 
of Ignites.
And please don't forget that you
can visit our sponsors
channels and talk to the 
sponsors who
made this show possible. We
couldn't have done the
conference without the
sponsors. Thank you so much,
Quinn, it was a pleasure. See
you again in person.
Quintessence: I know, sometime
soon-ish.
It would be great to see you.
Sasha: Thank you.
Quintessence: Bye.
≫ Next: Ignites Round One
Sasha: Hi, everyone, welcome 
back.
The next thing on our schedule 
is Ignite talks.
At in-person events,
Ignites are 20 slides that
auto-advance every 15 seconds.
Because of the length, many
folks think Ignite talks are 
easier than full length talks.
But they're challenging because 
of how precise you have to be 
with your timing.
Thanks to this event being 
virtual,
this year we asked our speakers 
to create talks that are
exactly 5 minutes long and they 
could experiment with the rest 
of the format.
Here are the speakers for the 
first round of Ignites.
Laura Santamaria, developer
advocate at
LogDNA.
Ren e Henry Quinn, clarity
solutions.
Jeff Smith, in the Chicago
community
and at Cheyenne Centro in
Chicago.
And Jeremy
Meiss, CircleCI. Laura, over to
you.
Laura: Thanks.
Hi, my name is Laura Santamaria,
developer advocate at LogDNA.
I think you should sit down in 
the middle of a production 
incident. You can meet coworkers
and learn how everything is put 
together. Why do I think this 
way?
Well, we all deal with
production incidents every day,
don't we? Hopefully not every 
day.
But you're able to learn so much
about all the components 
end-to-end and how they're put 
together. Let's start with a 
story.
A long time ago in a corporate 
office far away,
I actually left basic
orientation to see
a whole bunch of Ops people 
sitting at a table while 
everyone else was filing into a 
conference room. Something there
should be a bit of a warning. 
Hm. I wonder what was going on?
So, I went over, plopped myself 
down and started writing as
fast as I could.
People thought I was crazy to 
start drinking from the firehose
that fast. They may be a little 
bit correct.
However, I do think it was a 
really great way to learn about 
everything going on.
The very first thing to keep
in mind is to stay calm.
You don't want to get everyone
else worked up because
you are just freaking out that
this is your first production 
incident. It's okay. Don't 
worry.
You want to just know, 
everything will be fine.
Then go ahead, grab a seat, 
don't be shy.
People might not know you
very well, but this is a great 
way for them to get to know you.
You sit down, start asking
intelligent
questions and start listening.
Listening is the big part.
The best thing to do is to offer
to scribe and write down 
everything that's going on. Why 
is that a good idea?
Well, you learn about all the 
components, because you have to,
and you know what's going on 
because everyone's telling you.
So, it's a really great way to 
do it. But don't worry about 
being perfect.
If you have typos, if you're not
sure what a word is, get it all 
down with context.
Someone will be able to figure out
where you went wrong and they 
will tell you what you're  
missing. Great way to learn, 
actually.
Also, write down everything 
that's going on.
You don't know what small detail
could actually
have caused the problem or lead 
to a solution.
So, get it all down where
someone else can see it.
While you're writing everything 
down, don't be afraid to prompt 
for updates.
People get really focused on 
what's going on and they don't 
think to ask, hey. Do you know 
what's going on yet?
Hey, what's going on? So, try 
looking over someone's shoulder.
My personal favorite, I like to 
describe current state. Why?
I'm gonna get something wrong 
and everyone's favorite
pastime on the Internet is 
to tell someone they're wrong on
the Internet.
I'm going to get updates, learn 
something new and everyone is 
going to participate.
Also, in person, I try
to sit between the two most 
experienced people in the room.
They have to talk over or 
through me
to get through what's going on 
and I can hear all the 
troubleshooting steps. It's 
a way to learn.
I'm going to take copious 
personal notes along with the 
notes for the incident. That's 
to look up later.
Hey, I don't know what that 
means, I'm going to write it 
down and figure out what it is.
That's the next point, be sure 
to only ask
crucial questions.
Not "what's that?" a thousand and 
one times, not devil's advocate
questions.
Those sorts of things will get 
people really annoyed at you and
you won't learn anything.
But if you happen to have some 
prior knowledge, don't be afraid
to speak up.
Your past experience and your 
prior knowledge coming to this 
incident
are going to help make or break 
a
new feature, a new fix, 
whatever it might take. Because 
you bring a fresh perspective.
In fact, you're an outsider 
looking in.
This is that perfect golden time
to share the information that 
you have
and help make sure that the fix 
is a permanent one.
Along with that, as you're in 
the middle of all this, maybe 
you're bored. Offer to help 
monitor the fixes. Offer to help
monitor the environment.
You're gonna learn something 
new, you're gonna give somebody 
a chance to breathe
and you're gonna make somebody 
really happy that you're sitting
there
helping them out with all of 
that monitoring that you're 
doing.
As you're sitting there, make 
sure that afterwards
you participate in the 
postmortem incident
report, the post-incident 
meetings, anything that you can 
be part of.
Because you're gonna learn 
something new and you're gonna 
be able to
participate as a full team 
member because you spent all of 
that time doing all of that.
And don't forget, I mean, after 
all, this is a
learning exercise for you as 
well as helping to fix whatever 
production incident is actually 
happening.
So, sometimes sitting there 
drinking out of that firehose is
useful.
Especially if you're going to be
at the helm of the ship next.
For me, I learned a ton, met 
everyone, figured out how it all
worked. Pretty good for my third
week on the job. Thanks! Have a 
great day.
Henry: Thank you.
So, I'm Henry Quinn and this is 
Leading a Digital Transformation
at the Speed of Government.
My first job right out of school
was as the sole programmer for 
the Connecticut district courts 
at the federal level.
I worked at a beautiful art deco
building in New Haven like this 
slide.
I was helping with jury teams, 
civil and criminal dockets, 
opinions, docket timelines. I 
was sitting on a treasure-trove 
of data.
And you might be wondering, what
did you build with all of that 
data, Henry? And the answer is, 
absolutely nothing.
I built a couple of tools for HR 
and
got invited to go to Disney 
World to speak about them.
But we had this mountain of data
and did nothing
with it until I threatened to 
leave and find a new job.
A project was given to me to 
placate me. Getting a judge 
started is a big investment.
That's $15 to $20 million a year 
spent on judges getting ready 
every day.
That seems like a problem to 
solve.
The catch was the app was 
written in ColdFusion.
Static app, 16 courts by the end
of the year,
and again, it was written in 
ColdFusion
in 2017, just insane. In order
to get them stood up, no 
automation.
Doing everything perfectly by 
hand, it took a week top to 
bottom to get one court 
onboarded.
Getting 50
courts onboarded would take the 
next year. It was untenable. I 
talked to other developers having 
the same issue. There was a lot 
of buzz about containers and how
they can solve our problems.
And through talking to the other
judiciary
developers, that was probably 
the move to get through this 
particular hurdle. Unfortunately
my boss said hell no to 
containers.
You have to set up the 
application the way I 
architected it, a static 
application. And I was spending 
my time working on features. 
Static apps can be hard to 
scale.
This is the first step of my 
leadership hierarchy failing me.
Still, we managed to get funding.
We were handcuffed to DC and
an administrative office, they 
had a lot of ideas on what we 
needed to be doing.
That's part of the politics game
of working with the district 
courts, like the federal system.
And there were those all over 
the country, we had to run in
the shadows and step out of the 
leadership hierarchies just to 
get our work done. It was
infinitely easier to go a little
rogue than to color in the 
lines.
Ask forgiveness, not 
permission.
I was only one of three working 
with ColdFusion and other 
documents.
And I managed to shove it into 
containers and made it work on 
Kubernetes. That's about the 
point when the money started 
flowing. Our handler loved that 
we were able to make it 
scalable.
Gave us money to hire more 
developers to help finish up the
application.
We got invited to alpha test
a new cluster to be a new 
strategy for the entire federal 
court system.
But again, instead of
spending the money on people, it
was spent to buy back time on 
the app.
The folks in DC couldn't do 
much more than cheer
us on as I was going through a 
personal hell.
However, spinning up the 
environment with one command in 
the
containers made it so I could 
work remotely and not on 
ColdFusion.
I banged out and got it through 
security testing and good to go.
And I could finally push forward 
and make good progress.
Finally hired developers to 
focus on the code,
which helped me get the deployment
time down from a week to 10 
minutes.
That was a 99% savings I was 
not authorized to make.
I had to step outside of the 
leadership hierarchy to fulfill 
it.
But there was a shift at the 
top, someone high up
in DC resigned and they made 
their passion
projects a priority, and I broke
all the rules,
but government red tape still 
managed to get in the way. 
Really annoying. We were looking
for resources just to onboard 
the next three. In a government 
agency with
near unlimited resources and 
data, we can do everything right
and still lose.
Could have saved 15 million a 
year, but it was hard to find a 
few thousand in our budget. 
That's when I decided I needed 
to leave.
I documented everything, passed 
off the build scripts and flew 
to San Diego.
I rented a car, drove to 
Seattle, it was a good way to 
clear your head. I recommend it 
if everything gets back to 
normal.
I taught myself to program and 
found a way to
save the government $15 
million a year, and got the rug 
yanked out. That's before I got 
involved in the community.
If there's a takeaway, think 
about all you can
do if you take initiative, lead 
from the bottom and step outside
of your leadership hierarchies 
when you need to.
With the last 15 seconds, thank 
you so much for tuning
into DevOpsDays Chicago and I 
hope you reach out and talk 
about
gov tech or building healthy 
tech communities.
Thanks again!
Jeff: Hi, I'm here to talk to 
you today at my
Ignite talk, cutting cloud costs
during COVID-19
and a few strategies, three in 
particular to reduce spend.
I'm Jeff Smith, director of 
production operations at
Centro here in Chicago.
This slide was taking longer 
than I anticipated.
And I wrote a book, Operations 
Anti-Patterns,
DevOps Solutions, 
please buy it so I can feed my 
family.
The first method was the Amazon 
EC2 reserved instances.
And reserved instances are a 
billing construct on AWS's 
side.
You commit to a particular level
of compute and enter into an
agreement. With that agreement, 
there's reduced pricing.
Typically the things that you 
make a decision on are a 1 or 3 
year term.
You commit to the spend monthly 
regardless of if you use it or 
not.
And then you have different 
payment options, all up front, 
no up front or a mix of the two.
Depending on the level of 
commitment, Amazon passes 
different levels of savings.
You can see all up front is 
highest benefit with 41% off.
But even without up front, 36% 
savings over the on demand. 
That's easy to save money. But 
it's reservation frustration.
Again, reminding you, you are
committing to the spend whether 
you use the instances or not.
It's easy in a pandemic to no 
longer need the compute you 
needed
previously but you're stuck with
that stuff anyway. Be careful 
buying reserved
instances.
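To make the commitment risk concrete, here is a minimal sketch of the break-even arithmetic. It assumes the 41% and 36% discounts quoted above and a simplified model in which on-demand cost scales linearly with hours used. The function name is illustrative and not part of any AWS tooling.

```python
def break_even_utilization(discount):
    """Fraction of the term an instance must run for a reservation to pay off.

    Simplified model (an assumption, not AWS's billing logic): a reservation
    costs (1 - discount) of the full-time on-demand price whether or not the
    instance runs, while on-demand cost scales with the fraction of time used.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return 1 - discount


# Discounts as quoted in the talk for the two payment options.
for label, discount in [("all up front", 0.41), ("no up front", 0.36)]:
    threshold = break_even_utilization(discount)
    print(f"{label}: reservation wins only above {threshold:.0%} utilization")
```

Under this model, an instance that ends up running less than about 64% of the term would have been cheaper on demand than on a no-up-front reservation.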
Another thing was shutting down 
what we're not using. That's not
particularly new for us.
But when we looked at it, we got
aggressive, shutting down 
instances after business hours. 
From the hours of 7 p.m. to 7 
a.m.
Monday through Friday and over 
the weekend, it was a 37% 
reduction in
cost versus the reserved 
instance price. Not even 
the on demand price. $39.42 
per month for reserved 
instances, or $24.48 with the 
instances shut down. 
Spot instances were 
the third thing.
That's AWS selling you excess 
compute.
That's aggressive discounts, 
sometimes as high as 90%. 
Mileage may vary.
But for us, we were seeing a 
minimum of 37% reduction versus 
the on demand pricing.
But the other nice thing about 
this is, if you mix the
spot instance working hours with
just shutting instances down in 
the evening, we were looking at 
$4.92 per instance if we only 
left it running Monday through 
Friday.
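The schedule arithmetic behind these figures can be checked with a short sketch. It assumes instances run 7 a.m. to 7 p.m. on weekdays only and takes the monthly per-instance figures as quoted in the talk (39.42 reserved, 24.48 scheduled on demand, 4.92 scheduled spot). The function names are illustrative.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_running_hours(start_hour=7, stop_hour=19, days=5):
    """Hours per week an instance runs on a weekday-only daytime schedule."""
    return (stop_hour - start_hour) * days


def percent_reduction(baseline, reduced):
    """Percentage saved when moving from baseline cost to reduced cost."""
    return (baseline - reduced) / baseline * 100


running = weekly_running_hours()
print(f"running {running} of {HOURS_PER_WEEK} hours "
      f"({running / HOURS_PER_WEEK:.0%} of the week)")

# Monthly per-instance figures as quoted in the talk.
reserved, scheduled, scheduled_spot = 39.42, 24.48, 4.92
print(f"scheduled vs reserved: "
      f"{percent_reduction(reserved, scheduled):.1f}% lower")
print(f"scheduled spot vs reserved: "
      f"{percent_reduction(reserved, scheduled_spot):.1f}% lower")
```

The first comparison lines up with the roughly 37% reduction mentioned for shutting instances down outside business hours.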
The nightmare scenario with spot
instances is, if demand
gets high and they need
extra capacity, your instance 
could get terminated. Instances 
could get shut down.
One way to protect yourself is 
instance flexibility. You can 
request multiple instance types.
If one becomes in high demand, 
it will roll over to another 
instance type and another.
And even if you have to oversize
your instances, it's still
a savings over the on demand 
price. The other is mixed 
instance policy.
You can request within the same 
group on demand as well as spot 
instance.
I need three on
demand instances to ensure work 
is being served, but fill out 
the rest with spot instances.
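As a rough sketch of the two protections just described, the helper below builds a mixed instances policy with several instance types and an on-demand base of three. The field names follow the shape of the EC2 Auto Scaling `MixedInstancesPolicy` structure as I understand it. The launch template name, the instance types, and the helper itself are hypothetical, and nothing here calls AWS.

```python
def mixed_instances_policy(template_name, instance_types, on_demand_base):
    """Build a MixedInstancesPolicy mapping for an Auto Scaling group.

    instance_types gives the group several types to fall back on;
    on_demand_base is the number of instances always served on demand,
    with everything above that base requested as spot.
    """
    if not instance_types:
        raise ValueError("need at least one instance type")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": template_name,
                "Version": "$Latest",
            },
            # One override per acceptable instance type.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            # 0 means all capacity above the base is spot.
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "capacity-optimized",
        },
    }


# Hypothetical template name and instance types, for illustration only.
policy = mixed_instances_policy(
    "web-workers", ["m5.large", "m5a.large", "m4.large"], on_demand_base=3
)
print(policy["InstancesDistribution"])
```

A mapping like this would typically be passed as the mixed instances policy when creating or updating the group, but check the current API reference before relying on the exact field names.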
With spot instances, you get a 
nice notification that lets you 
shut down your applications 
cleanly. You can monitor when 
you get this response.
You know that you need to shut 
down your workload because the 
instance is going to be 
terminated.
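One way to watch for that notification is to poll the instance metadata service from the instance itself. The sketch below assumes the documented `spot/instance-action` metadata path and the token-based (IMDSv2) request flow, and treats a 404 as "no interruption pending". The function names, polling interval, and callback are illustrative assumptions, not how the speaker's team did it.

```python
import json
import time
import urllib.error
import urllib.request

METADATA = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=2):
    """Return the raw interruption notice, or None if none is pending."""
    token_req = urllib.request.Request(
        f"{METADATA}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        req = urllib.request.Request(
            f"{METADATA}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no interruption scheduled
            return None
        raise


def parse_notice(body):
    """Turn the JSON notice into an (action, time) tuple, or None."""
    if body is None:
        return None
    data = json.loads(body)
    return data["action"], data["time"]


def wait_for_interruption(fetch, on_notice, poll_seconds=5,
                          sleep=time.sleep, max_polls=None):
    """Poll until a notice appears, then call on_notice(action, time)."""
    polls = 0
    while max_polls is None or polls < max_polls:
        notice = parse_notice(fetch())
        if notice is not None:
            on_notice(*notice)
            return notice
        polls += 1
        sleep(poll_seconds)
    return None
```

On an instance, this might be started as `wait_for_interruption(fetch_instance_action, drain_and_stop)`, where `drain_and_stop` is a hypothetical callback that shuts the workload down cleanly.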
So, with spot instances, people 
are nervous, where do I start?
Work with interruptible 
workloads, background
infrastructure tools and 
stateless applications where you
can implement them.
We used our Jenkins CI build 
slaves.
Because it's ephemeral, and 
you don't care about the 
instance.
If you have different node types
and labels in your CI tool, you 
can limit the blast radius.
And the impact on developers is 
really that they need to rerun 
the build. You can get a lot of 
mileage out of that.
People also ask about spot 
instances and the frequency of 
interruption.
Since May, less than 2% 
interruption rates. It's 
typically in batches.
See a chunk of interruptions on 
one day during a 4-hour period.
Now, there's a few 
caveats.
One is RDS is usually the 
biggest spend component, there's
no spot instance.
And the sleep-wake, you have to 
coordinate how the instances 
come up
and down to make sure 
dependencies are available for 
things like web services.
Highly recommend the AWS Config
tools as well so that you're 
meeting tag requirements.
There's nothing worse than 
looking at huge chunks of spend 
that's a no-tag key environment.
I know that's a lot to talk 
about in a short period of time.
But if you want to hit me up on 
Twitter, I'm
@DarkAndNerdy and would love 
to chat with you more about it.
Thanks!
Jeremy: Thank you.
Hi, I'm Jeremy Meiss
and from '95 to '97 I
worked in Kansas City doing
the eCommerce for the Mission: 
Impossible and Independence Day 
movies
and worked with Apple on
early
eCommerce initiatives. I was 
hired
as tech support and built out 
great CDs
with Netscape Navigator, Gopher, 
Archie,
Winsock, and progressed
to sysadmin.
And we were running the new 
CAT5
cable to the offices and let the
contractors connect them to the 
wall outlets and number them 
there. You can see the issue 
there.
One of the many problems was 
whenever
there were issues, we didn't 
know which outlet was connected 
to which port.
The CTO said we should identify 
which patch panel port went to 
which outlet at each desk.
The fact that this should have 
been done from the very 
beginning and by those of
us who actually ran the wiring 
instead of the contractors was 
lost on him.
And it wasn't the first nor was 
it the last such example in my 
three years there.
And we routinely butted heads 
around things like that.
Now, I  mentioned we. Which of 
course meant me.
I sat at my desk and
mentally calculated how long it 
was going to take to use the
tone generator on the roughly 75
connections to map to the patch 
panel port behind me.
As I calculated, roughly 8 to 10
hours to do that.
Meaning I was looking at almost 
a day and a half at least being
fully devoted to this asinine 
project.
My 20-year-old attention span 
was not amused.
I remember sitting and looking 
at patch panels behind me. And I
had a eureka moment.
I reached up, unplugged a cable,
looked back, 
turned to face the desk, looked 
at the phone and waited. Sure 
enough, I got a phone call.
Now, I've reproduced the 
conversation as best I can here.
It was a bit like this.
Complete with Eudora, Geocities,
and I'll let you take the next 
few seconds to marvel
at the sheer genius of my 
20-year-old brain.
I had found the magical 
solution.
Over the next 2 hours, our
building experienced 
intermittent network outages
that affected everyone in groups
of 10, I wasn't a monster.
Everyone needs to feel that 
sense of community over random 
times.
However, I successfully 
completed
the task with the minimal 
amount of effort.
I implemented a few of the core 
DevOps
principles that Emily Freeman 
laid out in her fantastic
book, DevOps for Dummies, before 
they were a thing. Let's review 
them now.
Encouraging teamwork, instead of
taking on the project myself, I 
officially roped others in.
They unknowingly provided the 
legwork, allowing
me to eat bonbons until the next
group of people recognized that 
they needed to join me on the 
project and give a call. 
Reducing silos. So, I live in 
Kansas.
The silos are a regular thing 
across the plains as you travel 
across I-70.
I can say that as a result of 
my brilliant
plan no new silos were created 
during or since. I would say 
that's successful. Practice 
systems thinking.
I holistically approached
the problem by using the 
resources, patch panels and
switches within easy reach, a 
pen ready to write down the 
location and it had plenty of 
ink.
Learning from failure.
I recognized the failure in not 
marking locations
when the wires were run, which 
left me
with a learning opportunity how 
to automate.
I learned that a ballpoint pen 
wasn't
nearly as effective as Sharpie 
so I learned from my mistake. 
Communicating.
Telephones, they used to be used
to talk on, not
text, do TikTok videos in your 
bathroom with the toilet up. I 
used it.
I communicated with multiple 
people over a few hours and
they were all extremely thankful
for my 
resourcefulness and skills. 
Accepting feedback.
I heard loud and clear, took the
feedback, rolled it into 
actionable results by plugging 
their cable back in.
I accepted they were happy with 
the service I had given and I 
would say that was a very 
productive morning.
So, the challenge with any new 
project, whether it's software 
or building
a deck, is how quickly you can 
repeat what you have just done.
Making changes along the way 
thus making way for the next 
sprint.
I was able to take a one- or 
two-day process and turn
it into a less-than-half-a-day 
success story by breaking it 
down into groups of ten. And 
finally, automation. This is 
where this 20-year-old really 
excelled.
I was able to automate the 
manual process of going
to 70 outlets to never leaving
my chair and having everyone in 
the building do the task for me.
I dare say that was also a good 
example of teamwork.
This was completely in jest and 
not at all the correct
application of DevOps principles
or really a good way of working 
for anyone or anything.
I have learned a few things 
about DevOps since then, or so I
hope. Thank you.
Sasha: Hi.
Thank you so much to our Ignite 
speakers.
And now it's actually time for 
lunch break.
But we will continue to have 
really fun content for you all 
on the stream while we're taking 
a lunch break.
And please feel free to continue
chatting in Discord the whole 
time.
And we will see you back on the 
stream at 1 p.m.
Hi, everybody. Up
next: The Bone 
Talk: Resilience and
resilience engineering explained,
Richard I.
Cook, M.D.
Matt: Welcome back to the 
afternoon of DevOpsDays Chicago.
I hope everybody had a great 
lunch.
If you didn't get your 
complimentary 
delivery to your house, that's 
probably because you missed that
check box on Eventbrite. Better 
luck next time! That being said,
so far the event is off to a 
great start. We've had a bunch 
of amazing talks this morning.
Lots and lots of participant 
conversations in Discord. It's 
all really exciting.
Don't forget to visit our 
sponsors in their sponsor 
channels in Discord. Say hi. A 
bunch of them are giving stuff 
away.
Like really giving stuff away, 
not lying about it like
I did just with the pizza and 
Italian beef.
But speaking of things that are 
not a lie, things that are true,
and we should all learn about, 
we're coming to our next talk.
And this talk is by Dr.
Richard Cook who is well known 
in the field of resilience 
engineering.
If you haven't seen it yet, I 
highly recommend checking out 
Dr.
Cook's article, How Complex 
Systems Fail, at 
how.complexsystems.fail. Really easy
to remember, right?
It is my great pleasure to 
introduce Dr. Richard Cook.
Dr. Cook: Hi, my name is Richard
Cook.
And we're gonna talk a little 
bit about
the marvelous
resilience of bone and its 
relationship to resilience 
engineering.
This is a simple talk about 
complex systems, well, as simple
as we can make it.
It is a complex system, it's got
lots of moving parts, there's a 
lot of complexity here.
But we're going to try to 
simplify it out so we can see 
the
really important bits that 
relate to resilience and also to
DevOps. And I'll begin by 
asserting this.
Bone is not a model of or a 
metaphor
for resilience.
Bone is
the archetype of resilience.
I hope that you think about bone
whenever the subject of 
resilience comes up.
Resilience in bone is the source
of information about what 
resilience really is.
And if you're looking at 
something and asking, is
this resilience or is this 
resilience engineering, I hope 
that bone will come to mind for 
you.
If bone is the archetype
of resilience, what then is 
resilience engineering?
We're going to try to answer 
that question and I think you 
might be a bit surprised at the 
answer.
Bone's resilience, first of all, 
is what I call
Woodsian, that
is, it fits David
Woods's resilience, the first 
being graceful extensibility,
the second is sustained 
adaptability.
The key features are that it's 
expensive, it requires
a continual input of energy and 
resources.
It's delicate, can be 
disrupted by loss of feedback, 
it's susceptible to disease. 
It's somewhat limited.
It has a peak and diminishes 
over time. It doesn't last 
forever.
It lasts for someone's 
lifetime. It's got a limit 
to it.
The most remarkable thing about 
bone, it's continuously 
remodeled. The adult skeleton is
replaced every 10 years.
The making of bone is in dynamic
balance with the destroying of 
bone.
So, you don't really notice that
it's undergoing remodeling.
To you the skeleton seems 
relatively static.
But that's because the balance
between new bone being created
and old bone being chewed up is 
kept so
that there's just about a 
constant amount of bone
and the structure feels very 
stable.
It's also important to note that
the demolition and destruction
of bone happens along lines of 
mechanical strain.
The kind of placement of
bone is dependent on where the 
mechanical strain lies.
Bone repair, what happens when 
you break a 
bone, is an extension of the 
remodeling process that engages 
some additional ecosystem 
niches. And we'll talk about 
repair.
And bone is the primary store 
for essential elements,
notably calcium and 
phosphorus.
Calcium is the most important, 
you have about a kilogram of it,
most of it is stored in bone.
But it is absolutely essential
to have the activities, 
intracellular and
extracellular that make up the 
activity of the body.
Finally, bone and the resilience
derives from signaling.
There's no master controller of 
all this, there's a
messy layered network with lots 
of crosstalk
between the organism, the 
anatomical regions,
the cellular and intercellular 
activities going on.
All of these are involved in 
various kinds of signaling that
helps direct bone and causes it 
to be laid down and chewed up.
Now, bone has both a complex 
macro and microarchitecture.
The macro architecture is 
something you have seen before.
The exterior of the bone, 
compact bone, is solid and 
stiff.
And then there's this kind of
woven, spongy material in the 
middle, cancellous bone.
At a microscopic level, a much 
more
complex architecture that 
relates to where the cells are 
actually located and how they're
communicating with each other. 
Bone undergoes continuous 
remodeling. The use of bone mass
is energetically efficient.
That is, we don't have a lot of 
calcium to make bone from.
So, we need to put bone exactly 
where it will do us the most 
good.
That is particularly along the 
lines of strain that the bone is
trying to resist.
Regular strain leads to a 
regular pattern.
That is, if you cut through the 
bones of multiple different 
people and
look at the sections 
side-by-side, they look very 
much the same. This pattern is 
tuned.
The organism is prepared for 
future stress by the kinds of 
bony patterns that are built up 
as bone is laid down.
The pattern is emergent.
It's not programmed, there's no 
molecular
signal that says make the bone 
in this particular pattern.
In this picture in the upper 
left, the head of the femur, you
can
see a line that crosses it about
midway through, that line
is actually produced by the
pattern of strain that's going 
on in the bone in a lifetime.
It's not programmed in a 
molecular
way, it's an emergent property 
in the bone.
This remodeling that goes on is 
very complicated.
There are cells which 
specifically lay down bone, 
they're called osteoblasts.
And cells that chew up bone and 
return the
calcium to circulation, those 
are osteoclasts.
And the balance between the two
cells leads to the
realignment of structural 
strength to fit the needs of the
body.
There's both local direction 
in terms
of the strain lines that are 
sensed by the cells and cause 
the
bone to be laid down along the 
lines, and
global modulation, in the 
sense that the parathyroid is 
controlling
the amount of calcium and the 
overall function of the 
osteoclasts.
It is this mechanism that 
repairs
microfractures that are 
happening all the time, and this
mechanism
allows us to recycle calcium
and potassium and have calcium 
ready for use.
Calcium has to be regulated in a
very narrow range.
This is done by having the 
osteoclasts constantly
chewing up bone using 
hydrochloric acid that they 
secrete and returning
that calcium back into the 
circulation where it can be used
for contraction.
As I said, there's only about a 
kilogram
of calcium in the body.
Most of it, 90% of it is in the 
bone.
It's absorbed from the
intestine under the influence of
vitamin D,
and secreted under the 
parathyroid hormone.
The macro and micro are 
interacting here.
The signals at the cellular 
level are
part of the overall signals that
are sensed
by the parathyroid and
controlled by the secretion of 
the parathyroid
hormone and calcitonin.
Bone is being laid down and 
chewed up in a ratio that is 
one-to-one.
Overall, you don't feel like 
your skeleton
is changing its 
characteristics very much.
That's fiendishly complex. There
are 20 signals provided that 
have influence over that.
And among those signals is one 
called
PTHrP, the parathyroid hormone 
related protein.
We'll come back to this in a 
little while.
But just to recognize that all
the
signals going on here is a very 
dense network
of communication by various 
kinds of chemicals that are 
being
sensed and excreted by
the cells that make up the 
bone's active surface.
Here's an example. Here's a 
25-year-old man.
Who reported that he, quote, 
kicked
a lamp post in a fit of anger 
and later noted that he had, 
quote, difficulty walking. One 
might suspect that alcohol was 
involved.
Then the radiograph on the left,
even those of you
who aren't very medically 
experienced will be able to see 
that there's clearly something 
wrong here.
And in fact this man has a 
mid-shaft fracture of his tibia.
Now, we know that this will heal
together over time after the 
initial injury.
There will be a period of time 
that lasts for a few
days during which there will be 
a hematoma,
a blood clot with lots of very
active
substances there, that call in 
cells of cartilage that
fill up that space and are 
gradually replaced
with this woven bone to form a 
callus. It's a thick, hefty 
structure.
If you have broken your radius 
or
your clavicle you will have felt 
there's a lump there for a 
period of time afterwards. 
That's the callus.
And the bone that's not 
incorporated into the
structure of the bone that's 
being repaired, over
time, the normal remodeling 
process,
guided again by strain, is going
to smooth down that
callus, taking away the bone and
being
efficient and it's smoothed down
until you don't really notice it
anymore.
This happens whether or not the 
bone is well aligned.
Here's a picture of the results 
of
an archaeological excavation
in Chile of about a 50-year-old 
woman from 1400 A.D. who had 
broken her femur. The break very
clearly has healed. The two ends
of the bone are well-connected.
But you see that it's got a very
rough kind of big chunky look to
it.
And noticeably in the middle 
picture,
you can see that the healed 
femur is now at least an inch 
and a half shorter than the 
other one.
This lady would have walked, if
she walked at all, with a really
profound limp. This is 
resilience.
The bone is knitting together, 
the
mechanisms we were talking about
caused that fracture to heal.
The problem is that it didn't 
heal in a very functional way.
Now, we don't have so much of a 
problem with this now because we
understand how bone works and 
how it will heal.
So, if you break something, like
your radius, you're going
to find that the bone will be 
first
of all reduced, the fracture 
will be reduced
so that the ends are in 
alignment and then
stabilized by the application of
a cast to hold
the ends in a position to allow 
that resilient
mechanism to be able to knit the
bone together in a functional 
way.
Mechanical stabilization is the 
key here.
The reduction is something that 
happens fairly quickly. But it 
takes time for the bone to heal.
And so, it must be stabilized in
that position for long
enough for the bone to build up 
that callus and regain some of 
its strength to hold it in 
alignment.
This business about finding ways
to fix or stabilize
bone is part of what we now 
think of as orthopedic surgery.
Much of
orthopedic surgery is about how 
to reduce and fixate broken 
bones
so that they will heal together.
On the left, external
fixators, outside of the skin, 
usually with
screws that are driven into the 
bone and connected together by
a rigid apparatus so that the 
two ends of the bone are held in
alignment.
Another is internal fixation.
Surgery is done, the skin is 
opened up, the bone is
exposed and then the bone is 
held in position either by 
applying a plate with some 
screws or putting in a nail. On 
the right-hand side, you see an 
example of this.
This is what's called an 
intramedullary nail.
The long, white object is a 
piece of
steel or titanium driven
down. Essentially you take the 
patient and hold them
in a position so that the head 
of the femur is
sticking out and you can get at 
it and you take the
nail and hammer and drive the 
nail down
the length until it bridges 
the space where the break has 
occurred.
And the little pin at the top is
driven in to help hold it in 
place.
You can see why I have chosen to
be
an anesthesiologist.
The key idea has been known for 
a very long time.
The idea of producing both 
reduction and
fixation goes back to the 
ancients.
In fact, these splints were 
found in a tomb from the time of
the Pharaohs from at least a few 
thousand BC.
And the Edwin
Smith papyrus, authored around
1600 BCE, talks about this 
explicitly.
The hieroglyphs are
reproduced on the left, and 
because your
hieroglyphics are probably 
rusty, I'll read it.
This is for somebody that's 
broken the upper arm.
The bones that are -- you should
place him prostrate on his 
back with something
folded between his shoulder 
blades.
And stretch apart the upper arm 
until the break falls into 
place.
You should make two splints of
linen and apply on the 
outside and under side of the 
arm.
And you should bind it up with 
this word, 
ymrw, which may have been the
ancient equivalent of plaster of 
Paris, but it's not translated 
yet.
And treat it every
day until he recovers.
This is sound medical advice.
If you do this, someone with a 
broken humerus is going to 
regain function of the arm.
If you don't, it is very likely 
they will never regain that 
function.
Is this a kind of resilience 
engineering?
The body is resilient, the 
bone will knit.
But the understanding of how 
that works allows us to
engineer a solution that makes 
it
so that the resilience plays out
in such a way so that the result
is a good one. Physicians don't 
heal broken bones.
What they do is realign the
broken bones so that the natural
process of healing,
this resilience of the body, 
plays out in such a way that the
end result is a good one.
This is a key notion about 
resilience engineering.
This is a kind of resilience 
engineering that's applied to an
already-resilient system.
It depends on the resilience 
that is already present in that 
system.
It requires some understanding 
of how resilience will play out.
For example, how long you have 
to keep the bone stabilized in 
a position to allow it to heal.
But it can be successful even 
without knowing much about the 
underlying mechanisms of the 
resilience.
The ancients obviously did not 
know about cells or any of the 
signaling going on.
But they were still successful 
at setting these bones. They 
could do resilience engineering.
There is benefit, however, to 
knowing what modulates the 
resilience.
Good nutrition aids bone healing
and this has been known for a 
very long time.
The marvelous resilience of 
bone, just to review,
is somehow related to its 
being a storehouse for calcium.
It's a function of 
continuous replacement.
That is, we don't call upon the 
resilience only when
something is broken, it's 
running there all the time and 
that provides
the opportunity for us to deal 
with something like a fracture.
That there's this process of 
taking away the old and
adding the new that gives us 
calcium -- access to calcium.
And the balance between 
destruction and construction has
local and global regulation.
There's a lot of regulatory 
features that make this 
resilience work.
But as observed in Tom
Brown's Schooldays,
life isn't all beer and 
Skittles. Bone isn't perfect.
There are diseases: osteogenesis
imperfecta.
The bones don't calcify and they
can weaken and easily break.
Paget's disease causes a kind of
moth-eaten
appearance on radiographs and
loss of structure in the bone.
Osteoporosis is a condition 
found in old age where the bone 
loses its density.
There are cancers of bone, for
example,
osteosarcoma, which are
essential casheses.
And rickets and malnutrition.
We don't see that much in this
country, but they were common a
hundred years ago.
And there's dysregulation,
hyperparathyroidism: the
parathyroid looks at the calcium
and it can get out of regulation
and you can chew up bone and 
have very high serum calciums.
Let's talk about that 
osteoporosis thing for just a 
moment. It's a very interesting 
area.
It's a problem that's seen with 
increasing age, especially in 
women.
It's responsible for a lot of 
morbidity and mortality in the 
US.
Many hip and vertebral fractures
are the result of osteoporosis. 
What is osteoporosis? It's a 
loss of bone density over time.
You see here on the left the 
piece of vertebrae from
a young person and to the right 
of that
is the
osteoporotic vertebrae from an 
adult.
Osteoporosis is more common in 
women, particularly after
menopause. This is because the
hormone
estrogen has a lot to do with 
the production of bone and the  
laying down of bone.
The problem here, of course, is 
that the balance is no longer 
good.
That is, the osteoclasts are 
doing more than the osteoblasts 
are doing. And so, over time 
bone mass is being lost.
There are some new therapies 
that have been developed to 
address this medical condition.
And they're mostly therapies 
that are trying to change
the balance between the 
osteoblastic
creation of bone and the 
osteoclastic reabsorption of 
bone. It doesn't make everything
better.
We just tip the balance slightly
in the favor of the osteoblast.
Or reduce the activity slightly 
of the osteoclast.
Reducing osteoclastic activity, 
sometimes by giving estrogen
or
bisphosphonates, has been done for
the last 20 years or so.
It's become clear that it's 
possible to use the specific 
signaling related
to the resilience of bone to 
increase osteoblastic activity.
Remember that I told you about
the PTHrP, the parathyroid
hormone-related protein.
It's possible to synthesize
this and there are some new 
drugs that actually take the
role of PTHrP and send signals 
that cause the osteoblasts to 
create more bone.
This idea of the
parathyroid hormone related 
protein being added is a 
powerful one.
I point out here that in 
comparison to things like adding
calcium
to your diet, these are
tiny, tiny amounts. To get
calcium to be
effective, you have to take 
grams and grams of the stuff.
But the parathyroid hormone is
given in microgram amounts, that's
a millionth of a gram, because 
it's actually one of the
signals about resilience that 
changes the function of the 
osteoblast.
There's studies that show that 
this actually works very well.
It reduces the risk of vertebral
fractures compared to placebo by
91%.
This is a tremendous improvement
and it's
substantially important because 
it may reduce the very severe
morbidity and mortality 
associated with vertebral
fractures and possibly hip 
fractures in the elderly. Again,
it's not all Skittles and beer. 
There are problems.
There are studies in animals 
that show with
large doses of this PTHrP, you 
can end
up producing tumors in some of 
the animals exposed.
It's not obvious this thing is 
without side effects and there 
may be risks attached.
This is another type of 
resilience engineering,
different than the first we saw.
This type of resilience
engineering acts on resilience
mechanisms themselves.
It goes inside of the things
that produce resilience and 
makes an adjustment to that.
It depends therefore on a much 
deeper understanding of 
resilience
than was the case when we were 
just employing
the existing resilience and
getting the results to our 
liking.
It has an effect of  sustaining 
the adaptive capacity.
It's not just about fixing 
broken things like orthopedic 
surgery has been.
It's a way of preventing
things in the future.
It sustains the adaptive 
capacity of the organism.
But also generates some new
types of hazards,
some of which are not yet well
understood.
So, we see that there are two 
different kinds of resilience 
engineering.
One that's about 3500 years old 
and one that's 5 years old.
The oldest one depends on 
resilience that's already 
present and our understanding of
how resilience plays out.
We apply engineering to an 
already-
resilient system, and we can be 
successful
without understanding too much 
of the mechanisms that underlie 
this resilience.
On the other hand, there's this 
newer type that acts 
directly on resilience 
mechanisms and depends on a 
qualified and deep understanding
of resilience.
And admittedly, probably 
generates some new types
of hazards because it's actually
tweaking the signaling that's 
going on.
What relevance does this have to
DevOps?
Well, the first I would -- the 
first observation I would make 
is there's a lot of claims about
resilience in IT. You'll see the
word used very often.
And perhaps even in some of your
own companies.
There are diverse uses of the
word that don't agree on what
its meaning is.
There's a lot of use of it that
implies that resilience is sort 
of a  super-reliability, or an 
increment to reliability.
That resilience is after 
reliability or what you get if 
you
go a little bit further. And if
you look at some of the 
marketing
materials that are prepared, it 
looks almost like
people have substituted the word
resilience for the word 
reliability.
However, there is resilience 
here, it's just not obvious to
us because it's above the line. 
We've talked about the 
difference between above the 
line and below the line.
Above the line being  people, 
the organization,
the activities, the cognitive 
work that surrounds the systems
that are running the technology 
that's below the line and this 
line of
representation that exists in 
between the two which is what we
look
at since we never can see 
anything actually functioning 
below the line.
There's lots of resilience in 
the system. But the resilience 
is in the people. The resilience
is in the DevOps activities.
The people constantly  tweaking 
the system, looking at it, 
understanding how to maintain 
it.
Doing things that are resilient 
in terms of the way that they 
influence the system.
And also exploit the resilience 
that's present there. There's 
resilience engineering going on 
here already.
Now, we've talked about this a 
little bit,
there's an article in ACM Queue
that you may know,
and there's another article 
coming
out in Ergonomics talking
about building and revising
adaptive capacity in incident 
response as a case of resilience
engineering.
But the key idea is that there 
is resilience in IT systems.
But the resilience is not in the
places where we are  normally 
looking for it. Resilience is 
above the line.
I hope that this introduction to
resilience has changed
your view about what
resilience is.
And I hope when resilience is
mentioned in the future that you
will think about bone.
And when resilience engineering 
is discussed, that you'll
think about orthopedic surgery 
and giving
PTHrP to forestall fractures in
the future.
These are really, really good 
examples that you can hold on to
and use when someone says, oh, 
we have resilience.
Or we're doing resilience 
engineering.
My idea is unless they match up 
very well with what's
going on in bone and the 
resilience engineering around 
bone,
they're not really the kind of
resilience engineering that I'm 
thinking about most of the time.
Thank you very much for your 
time.
Sasha: Hi, everybody.
Welcome to a fireside chat with 
Richard Cook.
The first thing is we want to 
introduce our next round of 
breakouts.
So, the breakouts happening in
Discord right now are: Argo
CD, my personal favorite, Azure 
DevOps 
pipelines, DevOps and open 
source, remote work, what works 
and what doesn't. The longest 
name ever.
And sociotechnical system 
design. I want to know who 
proposed that. I don't, but 
please tell me on Twitter.
With that, I would like to 
welcome Richard Cook.
Thank you for joining us today
and thank you for the amazing 
presentation.
Dr. Cook: Thank you very much 
for having me.
Sasha: And I absolutely love the
tie, by the way.
Dr. Cook: Thank you. It's from 
my days in pediatrics.
You know, if you wear long ties 
the  2-year-olds can
get a hold of them, but bow ties
they can't get.
Sasha: So, let's jump right
into the questions.
So, the first question was kind 
of -- so, first of all, we
love the idea that the bone is
not
a metaphor for resilience, it's 
actually the archetype of 
resilience.
It's really good to sort of keep
ourselves true to what is 
actually happening in the world.
So, if resilience is something 
that happens all the time,
then how can we test the
organism or the organization for
the presence of that resilience?
Right? We don't want to break 
bones on purpose.
So, how can we see what the
resilience process currently is?
Dr. Cook: Let me begin first of all
by thanking you very much for 
the chance to come and be with 
people.
And thanks to all the people who
have interacted in Discord which
I've enjoyed a lot.
And also, thanks to many -- the 
many people behind the scenes 
who have done all this work. 
Because there's lots and lots of
folks who are involved in doing 
this.
And believe me, this is as big a
production as any
real, you know, face-to-face 
meeting.
In fact, probably more so than
a face-to-face
meeting because of all the up 
front work that's required.
And I think it's quite 
remarkable and I'm pleased to 
have been part of it here in 
Chicago.
I think resilience is always 
present and it's always 
functional.
It's -- it's what keeps the
world stitched together and 
makes it go.
And you can sort of see that in 
the bone example by
what happens if you don't have 
that natural resilience, it 
looks like disease.
You have osteogenesis imperfecta, and wearing
out at the end, osteoporosis and
so on.
And those disease states are 
ones that we sometimes see in 
our work. And you'll sometimes 
see in tech.
But mostly what you see is 
resilience because people are 
very active.
Trying to move things around, 
make adjustments and so forth to
keep the systems going.
And so, I would say that you 
live in
a resilient world where 
resilience plays an
important part, but a bit like 
the bone which grows and that 
resilience
is in such balance and keeps the
skeleton
sort of stable for such a long 
time, we don't really know that 
it's ongoing.
We don't feel as though our 
skeleton is being torn
apart and rebuilt
continuously every 10
years, but it's true. That's the
same thing about the technical 
systems.
They're being torn apart and 
rebuilt continuously over a 
shorter period of time.
But because it's in balance, 
it's producing
the sense of a stable, sort of 
smooth operation. It's anything 
but.
But it's very active as anyone 
can tell you.
I think we see the resilience 
play out and we see people
dealing with breakages and 
rebuilding the system and doing 
all those things.
But we rarely pay attention 
because the overall
result is, I've got this body,
about
the same shape it had yesterday 
and let's keep going.
Matt: Yeah, it goes back to like
continually thinking about that 
it's about expressing  
resilience, right? It's not 
creating it.
Like I  said, it's constantly 
there, but it's how do we see 
it? How do we encourage it, 
maybe?
Or how do we allow it to express
itself? Maybe.
Dr. Cook: So, there are definitely
things that can enhance or erode
resilience.
And you've seen some of them in 
your own experience.
You've seen circumstances where 
the team gets really in an
unstable state, it's, you know, 
you lost a bunch of
people who are critical, you
haven't been able to get all the
new people into their roles yet.
You have this sense of sort of 
being on the edge of a 
potentially big
problem that you really don't 
want to sponsors at that moment.
This -- these are -- you're 
dealing with resilience during 
that period of time.
You're trying to add and enhance
the resilience.
You're also trying to be able to
use the resilience
in ways that allow you to 
accomplish what you need to 
accomplish.
But I think that the key thing 
here is that that's a natural 
process. We are the 
beneficiaries of it. We get the 
benefits of resilience.
But we rarely pay much attention
to what's going on behind the 
scenes.
And so, we can, without knowing 
it, erode it very badly or fail to
invest in it, or not do much
to sustain it.
Essentially the equivalent of 
poor nutrition or lack of 
exercise, if you will.
And we can end up building 
systems that have lost much
of their resilience that is 
needed to deal with the kinds of
events that are gonna happen in 
the future. That's what we all 
are concerned about.
Sasha: So, one of the biggest 
takeaways from the talk is that 
resilience and reliability are 
not the same thing, right? And 
we do often confuse them in the 
industry.
And a lot of times when we talk 
about
increasing resilience, we 
actually mean increasing 
reliability.
Now, in terms of like that 
second mechanism, right?
Increasing the signal and 
actually
affecting resilience, how can we
do this in our organizations and
our systems?
Dr. Cook: Yeah, it's probably the
case that it requires us to
understand resilience at a much,
much deeper and more fundamental
level
in the same way that we had that
first kind
of resilience engineering which 
involves, you know, reduction and
fixation for thousands of years.
But we've only found out about 
the second kind here in the last
decade.
And I think we're clearly in 
tech, and in much of
organizational stuff, in the 
first 1500  years, not the last 
10 for the most part.
I think you do it in a variety 
of ways, though.
And much of this has to do with 
trying to pay attention to what 
you think the sources of 
resilience are actually going to
be.
And I would think that the kind 
of care that
you take with  bringing people 
into the organization, your 
attempts to
keep people working together to 
build some structures in
which people can move and adjust
to
help other people share the 
adaptive capacity that they 
have.
I think these are all techniques
that are involved in making 
resilience work.
There's this paper that Beth
Long from New Relic and I have
coming out in
November, it's a good example. 
I'll wait for the paper and you 
can look at it. But basically, 
there are people doing this.
And not always because they 
start out with the mindset 
saying we're going to increase 
resilience.
They're just dealing with 
problems in the way they're  
dealing with problems.
It turns out what they're 
dealing with is actually trying 
to enhance and increase 
resilience.
Matt: I think that goes back to 
when we talk about that
not like to kind of, you know, 
we'll, you know, woolgather what
resilience is, but talk about 
incident prevention.
And the number of incidents is
a metric that is
influencing or revealing how 
resilient your organization is.
Which to me in a way, number
one, it's really -- it's not, 
right? Because resilience is how
you rebound.
It's like, in fact how could you
know if you're resilient if you 
have no incidents at all, right?
So, what are -- but can we 
expand a little bit on that?
We're saying that incident 
avoidance is
not a way to understand whether 
or not your organization is 
resilient.
Sasha: So, kind of what is the 
metric that we want to apply 
here, right?
Dr. Cook: Yes.
I think this is pretty poorly 
aligned for metrics. At least at
the current stage of 
understanding.
But let me put you -- tell you 
why I think this is difficult. 
People use the word resilience. 
It's become a popular term.
So, it's used a lot in a lot of 
places.
And in many cases it's just, you
know, essentially
reliability plus, or reliability
plus, plus.
It's an increment, a slightly 
longer, slightly stronger version,
that sort of thing.
If you look at the people -- at 
the work of folks like
Dave Woods and other people who 
have done really basic work on
what resilience is, they would 
say resilience
is not really about rebound or 
just the ability to return back
to where we were before the 
event happened, but it's about 
being able to
make adaptations in order to be 
able to change the way the
system is functioning so that it
can handle these new kinds of 
situations.
And I think you might find an 
example for many
people in the ways that you've 
had to adapt to  COVID. COVID 
was unexpected.
We didn't -- nobody really 
thought we were
gonna have to all work from home
starting at the end of January.
Most people didn't have any
appreciation for what the
disruption was gonna
look like in an economic sense
or a business sense.
Certainly many of the big
providers of services,
especially in the cloud,
did not prepare very well for
all these things. Because they 
couldn't anticipate them.
And yet, what we saw
over a short period of time
was a huge amount of
adaptation taking place that
allowed us to resume or continue
the functions more or less
uninterrupted.
You had a lot of head start, for
you it was an extension. It was 
still a big extension and a big 
change. But actually, it was 
made -- that's an adaptation.
That's a kind of resilience.
When you have to adapt what you 
are doing to achieve these other
goals,
that's real big capital R
resilience in the biggest way. 
And you're doing it. You're -- 
we see it.
This is a -- this is in some 
ways an ideal test case for us.
What's happened in the response 
to COVID is a clear 
demonstration of
sustained adaptive capacity:
the people and
organizations, structures and
systems can adapt to deal with 
this new set of circumstances.
Now, the key thing in 
Woods' notion here is:
can it be sustained?
Are we in a position now where 
the next shock that happens, we 
will be able to adapt again?
Or have we consumed all of our 
resilience in adapting to this 
one shock?
How much are we investing at 
this moment in being able to
weather the next storm, the next
disruption, the next difficulty 
we have?
That would be that
sustained adaptability that
Woods is talking about, the high-end
sense of resilience. A lot of 
people are thinking about this.
When I talk to leaders and a lot
of workers, they're thinking 
about, how
to get into the position where 
we can deal with the next 
problem.
Whether that's a major weather
event, some sort of political 
disturbance.
Whatever is going to happen, 
we're going to have to adapt 
again.
This isn't the last time we're 
going to need to draw on 
resilience to adapt.
And so, investing in that  
adaptive capacity going forward 
is
a really key  idea.
Matt: I think that's interesting
to me too because I look at it, 
and the cynic in
me says,  like, so,
we did this, but then when 
everything settles,
my -- leadership goes back to
-- back to then there's a thing 
that happened. And we can 
continue to work the same way.
Not seeing it as an opportunity,
the world right now,
everything that's happening is
an amazing use case to teach 
everybody about resilience, 
right? You know? But are we 
going to learn from that?
Dr. Cook: What do you think?
Matt: Or did we get lucky?
Dr. Cook: What do you think 
DevOpsDays is?
Matt: Well, I hope we're 
learning.
Dr. Cook: This is it.
Matt: Right.
Dr. Cook: This is it right here.
This. This thing is about that.
And that's what -- if you look 
at the list of talks and the 
discussions and the breakouts.
It's all about adapting and how 
to adapt. You're doing it.
This is what it -- this is what 
it takes.
Sasha: So, it's an interesting 
analogy between software and 
hardware.
Like hardware is something that 
is relatively static and hard to
change.
Whereas software is something 
that's relatively easy to adapt.
And this is kind of like -- 
DevOps is our software here. In 
this industry, right?
That  learning experience that 
we can apply from today to
tomorrow to gain some more 
resilience.
Matt: Someone's gonna interpret 
this as Sasha said you can't 
DevOps on bare metal. We're 
going to have a Twitter fight 
now.
Dr. Cook: I think it turns out that
if you look at the history,
particularly of the 1950s and
1960s, turns out software is hard
to adapt and modify and keep 
going. That's one of the great 
realizations.
It's not that we make it up 
however we want it. There's big 
commitments here.
And the persistence
of the shell should tell you
something really fundamental
about changes under the hood.
Let me show you the picture of 
the tree. Can we call up the 
tree?
If you look at the picture of 
the tree, what you see
is that there's this living 
organism that has adapted
to the circumstances in which it
finds itself, confronting this 
kind of rigid structure, right? 
That's what we're talking about.
Is it  pretty? Maybe not. It's 
not the way you want things to 
be. Is it effective? Does it 
allow the tree to grow? Yes.
This is, I think, the lesson. 
Adaptation doesn't mean that we 
always get what we want.
Or that the systems behave the
way that we want them to.
But it does mean that we are 
busy trying to make these things
work.
And those combined efforts
across a whole group of people,
many of whom are in the audience
and participating today,
is what actually makes this 
possible.
It's DevOps that makes it 
possible to continuously adapt 
and to sustain that 
adaptability.
And that's why it's so good to 
be with you today.
Sasha: So, another question I 
would like to  ask, we're 
learning from biology here a lot
in terms of observation, right? 
Parallels and metaphors.
But can we learn from biology in
terms of how to apply this?
Like how to get better at 
systems engineering?
Dr. Cook: I think the first
observation would be that you 
can study a lot about
resilience and learn a great 
deal without understanding the
molecular
biology of osteoclasts and 
osteoblasts.
Orthopedic surgery, a
well-developed
field, knew almost nothing about
the molecular basis of these 
things until recently. But it's been
effective since 1500 B.C. or
before.
So, you can do a lot by 
observing the patterns of how
resilience plays out and 
building the engineering of 
your organizations to take
advantage of that. And I would
encourage you to do that.
I would also encourage you to 
think about the kinds of things 
that are likely to help.
Remember if you break a bone and
then starve the organism
of nutrition, the bone will
heal very poorly and very 
slowly.
And you might not think, 
immediately, oh,
there's this connection between
nutrition and bone strength, but
there is. And it's a really 
important one.
If you watch for those 
connections, you'll be able to 
see them.
I think in some of your 
organizations, some of your 
experience will tell you
places where there's lots of 
nutrition and good feeding going
on that helps that resilience 
play out in a very effective 
way.
Matt: I think, yeah, when we 
think about how to -- how to 
observe this. And that might be 
kind of the thing.
When we want just in practice if
we want to get
better at this, what we're 
getting better at, we have to
start by -- by observing it, by 
understanding the resilience of 
the organization.
Especially when you think about
someone who is an individual 
contributor, someone who is, you
know, getting shit done, you 
know, kind of thing.
What are some of those
tactical things we can do to see
where this is happening?
Because it helps us feel better 
about it, right.
It helps us understand  things 
are happening.
Or we might not see because
we are measuring by the number 
of sev1s we receive.
Dr. Cook: That doesn't bring
attention to resilience. You might
have to do that
for a variety of reasons, but 
that's not going to lead to a 
much better understanding of 
resilience.
But here is a different kind of 
view.
If you start from the -- a 
position that
resilience is present and 
functional and that your systems
are working
because of that, then the task 
for you is to
uncover those mechanisms and 
understand them better. Not to 
go out and make them from 
scratch. But to rather 
understand how they play out.
And that seems to me to be the 
next challenge in front of us.
We have to understand more about
how this resilience actually 
works to make things happen.
How it allows us to be prepared 
to adapt to something like this 
COVID crisis that we've had.
And how it -- we could enhance 
the qualities
of the system that we end up 
with by being
aware of that and doing what is 
effectively the same thing as 
open reduction or internal 
fixation.
What does it mean to put a cast 
on the organization
after it's had a break so that 
it can heal in a functional way?
Again, these don't -- it's not a
one-to-one  mapping, it's a 
little hard to do the metaphor.
But what I hope happens is
that as you go through your
daily work and you see something
that looks
like resilience, and you say, is
that resilience? You think back 
to the bone example.
And I think that will help you 
see more clearly into the 
systems that
you're working with and have an 
appreciation for that.
The dynamic balance constantly
being
destroyed and created along the 
lines of stress
. That for me is a very powerful
image and I have been captivated
by that for many years.
Sasha: So, I think that I love 
that we are having this 
conversation.
Because I think like even after 
watching your talk, I've
learned a lot just from us just 
having this dialogue right now.
And one of the biggest things is
just learning through 
observation, right? Like the 
resilience is already there.
How can we learn what it is and 
help it manifest itself?
So, thank you so much for being 
with us.
And also, thank you for --
thank you to John also for
answering questions on Discord.
Matt: John has been doing more 
on Discord than just trolling. 
He's been helping out and being 
productive.
I can only imagine that maybe 
half of his
responses are links to PDFs.
Sasha: Speaking of links to 
PDFs, we should share more for 
the folks who want to explore 
the topics. Of course, the 
recording is going to be 
available. Thanks so much for 
being with us today.
Dr. Cook: Thanks so much. It's a
privilege to be with you.
It's an honor to be able to
talk with you. You guys -- you 
folks are now where it's at. 
You're the ones that are doing 
it.
And the resilience that exists 
in organizations, in these 
companies,
is represented by the people
who are on this Discord right 
now. And you're an incredibly 
valuable resource.
And I want you to know that 
those of us who study
resilience are turning to your 
field and looking at it in order
to better understand resilience.
We get our ideas from biology 
and other places, but frankly, 
we're watching you very  
closely. Because we're learning 
so much from what you are doing.
So, thank you very much.
Matt: Thank you very much.
Sasha: Thank you.
Matt: We will be back at 2:00
central time.
Stay tuned for some more fun 
videos and go visit our sponsor 
channels.
They're friendly, they're giving
stuff away.
Next: Computational
Thinking
Sarah Aslanifar
And welcome back. Next up
is Sarah Aslanifar.
The CEO and founder --  sorry --
of a
Chicago-based company giving a 
talk about computational 
thinking.
Sarah has successfully taught 
computational
thinking to her kids, and to 
paraphrase Albert
Einstein, if you can explain a 
topic to a 6-year-old, that 
means you truly understand it.
Let's hear from Sarah.
Sarah: What does Computational 
Thinking mean?
It's many things that interact
with each other and it's a 
complex system.
As with all complex systems, 
when modeling it in oral or
written language, we risk losing
some essentials.
Instead of me just jumping in
and trying to give
you a definition of 
computational thinking,
I want to walk through examples
of what does not fit the
system.
And then look at history and 
understanding where we were and
where we are today, and based on
that, I'll give you my
definition of computational 
thinking.
And then we can look at examples
of how
to use computational thinking in
our day-to-day life and job.
Take a look at this bus in this 
example.
I took this from a series called
Brain Games.
In this research, they showed the
bus to adults and children and
asked if the bus was going to 
move forward, which direction 
would it go. I'll give you a 
second to think.
So, when they asked this of
small kids who take the bus to 
school every day, the correct  
answer, it's
important to know that, the 
correct answer came to them 
almost automatically.
They  looked at this, they did 
not see a door into the  bus, 
therefore the door must be on 
the other side.
And the side you're seeing right
now is the side that the driver 
sits on. The answer is, it really
depends. Do you live in the US 
or in the UK? So, it depends on 
where you are.
And the culture and the rules, 
the answer could be different, 
right?
So, examples like this, that the
answer really depends on some 
common sense or culture. They're
not really computational 
thinking. You're not running 
that many algorithms.
You might overanalyze, but
you're not running any
algorithms to understand and 
have the correct answer to this.
Another example that doesn't
fit computational thinking is 
art, or things that have to do 
with subjective opinion.
This picture was sold for $250 
million. And I would love to 
understand why.
But I will never have anything 
to prove that this picture is 
worth $450 million. Obviously it
did to somebody, right?
So, anything that has to do with
subjective opinion,
you cannot -- you cannot say
that's computational thinking.
Money itself, I would argue, is 
very subjective.
Another great example, and this 
is my favorite, is impulsive 
behavior.
When you're in a mosh pit, you 
are not
thinking computationally,
unless you're this physics
student. He
loved attending mosh pits, and 
was far away and
looked at the mosh pit and the 
behavior and just learned the 
concept.
It hit him: this is a collective
behavior. He went home and 
started building this model. 
This model to simulate what's 
happening.
And I think this shows you that 
it's very different
being in the mosh pit than modeling
the patterns of movement of mosh
pits.
So, some have described 
computational thinking as 
thinking like a programmer.
I want you to think about that a
little bit.
If computational thinking is 
thinking like a programmer, and 
you
hire enough full stack 
developers, then
you should have no problem 
solving all the problems, 
anything
that needs that computational 
thinking, correct?
It just happens that sometimes
we hire the full StackOverflow 
developers.
And these are developers that, 
you know, they have an error in 
their code
and they take that error, Google
it,
probably land on some
StackOverflow page, copy the 
first
answer they see without really 
understanding why that error was
there or why the solution was 
suggested.
They copy/paste in their code,
they run again.
They run into another different 
error and they just repeat
that behavior without really 
taking time to understand what 
is actually going on.
A little bit about me.
I'm Sarah Aslanifar, I'm 
co-founder
of Tested Minds and interim CTO
at CarAdvise.
For the past 15 years I was 
really keen on learning
programming languages and tools.
And, of course, lots of solving 
problems. I was fascinated.
C, C++, Python, Clojure, Java.
And suddenly I  started to 
think, what's really the purpose
behind learning all of these 
languages and all of these 
tools?
And the moment of truth for me 
was when we had kids. My husband
and I had kids.
Suddenly I had to think, how 
would I prepare my kids to
grow up and be successful not 
knowing what is ahead of us in 
20 years?
Especially in such a fast-paced 
world that we live in. How would
I teach my kids to be 
successful?
My bet was that I can teach them
how to
learn and how to adapt and how 
to differentiate between 
subjective and objective 
opinions.
And how to analyze the messages 
that they
see in the media or on the 
computer, on the news, and find 
the truth.
So, I wanted -- that's 
especially important 
these days, when most schools are
moving towards, like, remote 
schools.
So, kids are spending a lot more
time
on iPads and the computer and we
need to really teach
them that the computer is not 
just for games or just to do 
school homework. They can use 
these machines to think with.
So, that brings us to this quote
from
John Culkin. He says, life 
imitates art. We shape our 
tools, and thereafter they shape
us. This is a huge claim.
So, my takeaway from this 
is
that there is information in the
artifacts that we produce as 
cultures and that information 
changes us and changes our 
culture over time. So, again, 
it's a very big claim. But 
there's some supporting evidence
to this.
For that reason, let's go back 
and look at the history of Homo 
sapiens.
Most researchers date Homo 
sapiens
to around 200,000 years ago in 
Africa.
For all but a small 
fraction of that time,
they passed knowledge in oral 
culture from one generation to 
another.
And when you have oral -- when 
you rely on the oral cultures, 
the message always changes from 
speaker to a listener.
And when the message changes, 
there is no objective truth.
And that's why these
times are called pre-history. We
don't know much.
We can't call our ancestors 
illiterate, there was no form of
writing or reading even 
invented.
Walter Ong spent most of his
professional career exploring 
the impact of cultures moving
to literacy and the impact on 
their culture and their 
education.
If I were to show you this 
slide, you know, with a few 
tools -- and I just gave you the 
answer --
with a few tools and a log, 
and ask you
which item does not belong, 
which one do you pick? I bet 
most of you picked the log. The 
log does not belong.
Because we have a concept of 
abstractions.
But in oral cultures, they 
don't have abstractions 
like a set of tools.
When they look at this picture, 
they see
a couple tools that they can use
to cut the logs, but not so much
to do with the hammer.
So, they actually choose the 
hammer. The hammer does not 
belong.
In oral cultures, concepts are 
used in a way that
minimizes abstractions, focusing
to the greatest extent
possible on objects and the
situations that are known by the
speaker.
And this example shows you that 
the writing actually 
restructures your brain.
So, that telephone game and the 
oral culture continued for about
150,000 years. And that's why we
call that pre-history.
Again, we don't have a lot of 
artifacts
to really prove and understand 
our ancestors, the way they 
lived, the things they knew.
We don't see any evidence in 
writing about
writing until 5 to 6,000 years 
ago.
And the earliest appears right 
about a thousand years after 
that.
And even though at that time a 
form of writing and reading 
existed,
that doesn't -- that didn't 
mean that everybody saw that 
it's necessary for them to pick 
up and learn that.
In fact, only a few
people, scribes and religious 
leaders, political
-- politicians -- saw it 
necessary to learn how to read 
and write.
And for the rest of the 
commoners, that was just fine. 
They weren't allowed to read and
write. Because reading and 
writing was information.
And information was a form of 
power.
So, things started to
change for craft literacy when
printing innovation begins in 
about the mid-15th
century, when Gutenberg invented 
the printing press in Germany.
And that, by orders of magnitude,
made it easier for us to capture
knowledge.
For the knowledge to be 
captured, mass produced, and be 
preserved over time and space.
So, for once now this changed 
what was printed, by whom. What 
was read? And what was 
thought?
To put that in perspective, when
you look at some of the data,
in about 1400 the population
was about 400 million.
And by 1800, our population just
about doubled, a little bit 
more. What happened to printed 
books?
We went from zero, at least 
close to zero,
to 1 billion in the 18th century.
And the effect that had on 
cultures and this data is for 
Europe.
You can see that we went from 
less than 20%
of people knowing how to read 
and
write to, in some cases, above 
80%.
And it's very important to know 
that
these new readers in this 
growing population are working 
with new ideas.
And this was our transition from
the Middle Ages.
So, here Marshall
McLuhan is talking about the 
effect of media, especially 
writing, on Greek culture.
In the same way, printing made 
it possible.
It didn't merely encode that.
And it is important to note that
what was
printed wasn't just the same 
idea of the oral and craft 
literacy period.
And also, we're not talking 
about books just like 
Shakespeare.
We're talking about books like 
this, the trigonometric 
functions, written in 1817.
This is data.
This book was extremely error 
prone if
at all possible to be written by
scribes and religious leaders. 
But the printing press makes this 
happen.
And people like Matthias
Bernegger, who was a child 
growing up
with the printing press, he was 
able to write
his ideas down and produce this 
book which was used
by astronomers and navigators 
and building architects.
So, let's come forward a little 
bit. This is a more recent 
example.
In the 1940s,
people like Betty Stafford were
recruited in World War II to 
work for earlier versions of 
NASA.
And these people were
praised for doing more in one 
morning than an aerospace 
engineer could have finished in 
one day.
Here you can see she is
adopting ideas about a 
conversion of a number to a 
natural log.
Of course she has tools, has 
calculators. But most of the 
thoughts and the ideas were in 
her head.
So, World War II also funded a
massive amount of R&D into human
and mechanical computing. It's 
to solve problems for automated 
fire control.
And also, they start investing a
lot of money into electronic 
computers.
So, a great example: the ENIAC.
The first computers, also women,
who were programmers,
not only had to have a
really good understanding of the
problem, they had to come up 
with a good solution for it.
And they had to have a great 
knowledge of the actual 
computers to be
able to rewire and change these 
cables
in order to put their ideas and 
program.
It's important this was 
electronic and you could 
reprogram them. But 
reprogramming was not as fast.
So, what they really needed was 
a good abstraction.
To let them really focus on the 
problem at hand rather
than understanding the details 
and the implementation of the 
computer.
And speaking of abstractions, 
when
you have too little abstraction,
it loses the power.
And when you have too much 
abstraction,
that really makes you distracted
from the problem at hand. In 
this case, it's literally a 
leaky abstraction.
So, the decision on what amount 
of abstraction
is right, it really depends on 
your
job and what you need to 
accomplish.
So, in about 1945,
the von Neumann architecture 
gives us programs.
At this point, we don't need to 
worry about the details and
implementations and
actual mechanic of the computer,
we can focus on the problem. And
those problems, those ideas, 
become code.
And that code becomes  data.
Now we have an object that we 
can think with and we can
use to express our ideas, to 
test our ideas. So, here is 
another example.
In this example, this is from 
Betty who is
probably understanding or 
modeling a downwash and writing
her ideas. You can see that she 
has a drawing.
And on the other side, this is 
from Dr.
Lorena Barba,
who is learning or 
understanding computational 
fluid dynamics.
Side by side, you can see what 
Betty had to
go through to get her ideas tested
versus Lorena, who can open the
Jupyter notebook, visually see 
how
the code implements and tweak it
as much as she wants. So, 
that feedback was very fast.
So, just to give you an idea
what happened is that these 
objects
are as important in media as 
alphabet and
printing press for our 
ancestors.
We just came from orality to 
literacy to computability.
We are among the first 
generation in our history that 
is
able to interactively explore our
ideas
and our concepts on a machine 
that executes
those instructions with no 
subjective judgment and no 
permission required.
And this makes our ideas 
falsifiable and tangible. We 
have an object that we can use 
to think with.
And even better, we can share 
those results almost
instantly with anybody anywhere 
in the world.
So, your ability to understand 
those
abstractions we talked about, 
your tools or programming 
languages, really 
depends on your job function and
your role.
In some cases, you may just be a 
power user.
Copy and paste some ideas and 
use a library without needing to
know what's going on behind 
that. And that's okay. Are you 
okay with  that?
In other cases, you might 
actually open every single 
library
or every single tool that you're
using to have a great and deep 
understanding of those libraries
or tools.
In that case, I would say you 
probably are
a professor at the university in
the computer science department.
Are you okay with that?
So, what's important to note 
here is that you don't stop when
you have a perfect model.
You stop when you have an 
imperfect model that you 
understand.
But the more abstractions we 
understand, the
larger our toolbox is and the 
better we can think.
So, learning a programming 
language,
the things and tools that you 
use in day-to-day life,
it just makes you faster and 
makes you more productive
to focus on the problem you're 
actually trying to solve.
All models are wrong. But some 
are useful.
So, if I were
to define computational 
thinking, this is my
definition, computational 
thinking is an iterative system 
of
generative reasoning in which 
people build models of subject
in a notation capable of being 
executed by
a computer objectively and
automatically with observable 
and falsifiable  outputs.
I'll give you a second to just 
digest that. So, it's really 
easy to take that for granted.
To take that capability for 
granted.
But as we saw through the 
history, that shows that in
our entire civilization, we are 
among --
this has been just
doable in less than a century 
and we are there.
As Brian says, an example would 
be handy just about now. Let's 
look at even in DevOps work.
One example of this is, let's 
imagine you have to create
a network by going through the AWS 
console and clicking through to 
build what you need.
What part of that process is 
executable by a computer? What 
if you make a mistake? How do 
you know that you made a 
mistake?
And how do you -- how are you 
planning to transfer
that knowledge to a person who 
might have to support that after
you?
So, by turning this manual 
process into something
that's precise, infrastructure as
code,
it helps capture these steps
and the
knowledge to create and 
re-create this environment. And 
in everything that you're 
planning on doing.
A good example of this is 
Terraform.
Terraform is declarative.
You don't have to understand 
every detailed implementation 
behind Terraform.
You could use it as a tool to 
tell it where to implement that 
plan and what's your end goal?
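
The declarative idea described here can be sketched in a few lines of Python. This is only an illustration of "state the end goal and let the tool work out the steps"; it is not how Terraform is implemented and it does not use any Terraform API. The function name `plan` and the resource names and settings are made up for the example.

```python
def plan(current, desired):
    """Compare two descriptions of an environment and list the steps
    needed to move from `current` to `desired`.

    Both arguments are plain dicts mapping a (hypothetical) resource
    name to its settings.
    """
    actions = []
    # Resources that are wanted but do not exist yet.
    for name in sorted(desired.keys() - current.keys()):
        actions.append(("create", name))
    # Resources that exist but whose settings differ from the goal.
    for name in sorted(desired.keys() & current.keys()):
        if current[name] != desired[name]:
            actions.append(("update", name))
    # Resources that exist but are no longer wanted.
    for name in sorted(current.keys() - desired.keys()):
        actions.append(("delete", name))
    return actions


# Hypothetical environment: what exists now, and the declared end goal.
current = {"network": {"cidr": "10.0.0.0/16"}, "old_server": {"size": "small"}}
desired = {"network": {"cidr": "10.0.0.0/16"}, "web_server": {"size": "medium"}}

for action, name in plan(current, desired):
    print(action, name)
```

Because the end goal is data, the comparison can be run again at any time, which is what makes a mistake visible and the knowledge transferable, unlike a sequence of console clicks.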
So, now if Terraform is the
what,
I would say ADRs,
or architecture decision records, 
are the why behind it.
It's always great that you use 
code and that's executable by 
the computer.
But augment that with some 
documentation.
An ADR consists of context,
status, decisions,
and consequences of the action. 
What happened between you and 
the team? Were there pros and 
cons? What were the options?
Thinking about Terraform versus 
CloudFormation?
What was the reason that you 
decided to use one over the 
other?
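
As a rough sketch, the four parts of an ADR named above (context, status, decision, consequences) could be captured in a small Python structure and rendered as plain text. The class name `DecisionRecord`, the layout, and the sample wording are assumptions for illustration only; they are not a standard template and not the speaker's team's actual record.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionRecord:
    """A minimal, hypothetical shape for an architecture decision record."""
    title: str
    status: str
    context: str
    decision: str
    consequences: list = field(default_factory=list)

    def render(self):
        """Return the record as plain text that can sit next to the code."""
        lines = [
            self.title,
            "",
            "Status: " + self.status,
            "",
            "Context:",
            self.context,
            "",
            "Decision:",
            self.decision,
            "",
            "Consequences:",
        ]
        lines.extend("- " + item for item in self.consequences)
        return "\n".join(lines)


# Sample content is invented purely to show the shape of a record.
record = DecisionRecord(
    title="Describe environments as code",
    status="accepted",
    context="Environments are built by clicking through a console, "
            "so steps are easy to miss and hard to hand over.",
    decision="Capture each environment in infrastructure as code "
             "and review changes like any other code.",
    consequences=[
        "Environments can be re-created from the repository.",
        "The team needs time to learn the chosen tool.",
    ],
)

print(record.render())
```

Keeping the record in the same repository as the infrastructure code puts the "why" next to the "what".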
And there was an
experience defined by Josef 
Blake at his team at Spotify.
The why and the how of the ADR 
combined
with the why and the how of 
infrastructure as code makes a 
great start for knowledge 
transfer and onboarding.
So, a big  takeaway
from this talk is that you have 
an object that you can think 
with.
You can think with, you can 
explore, you don't have to wait 
weeks and days even sometimes.
You can build tests, you can 
write your testing, you can 
build simulators.
Just like the map is not the 
territory, your code is not the 
computational thinking.
Your test, simulations, 
documentation should help 
re-create
all of the computational 
thinking that went into the 
design.
And that's all I have. Thank you
for your time. We are hiring at 
CarAdvise. I would love to hear 
from you.
Whether it's about hiring or any
thoughts and comments that you 
have on computational thinking.
Thank you for your time.
Matt: We're back at WDPBS radio.
No, no, we're not. There's no 
radio. This is TV, right?
Like a TV suite. Welcome back. 
Fireside chat time.
Before we go into that, just a 
reminder, if
you're into breakout time, a 
couple breakout
sessions going: docs like code,
documentation in your pipeline.
Sounding like talk titles. How 
does cost impact your DevOps 
option? It does.
And DevOps and chaos 
engineering.
Or you can hang out with me and 
Margaret and Sarah. So, make 
your choice.
Margaret: You're here.
You made the right choice.
Matt: You made a choice.
Sarah,
welcome to the DevOpsDays emcees
get
slap happy afternoon.
Sarah: Thanks for having me.
And thanks for working hard to 
make this happen.
It's been an interesting time 
and a lot of talks and 
conferences that we were looking
forward to and they canceled.
So, the fact that we're here 
today and everybody is working 
really
hard to make it happen is really
amazing.
Matt: So, one of the things that
I really love about your talk is
it is so -- what I love and also
found frustrating,
as someone who is trying to 
come up with questions and 
stuff, is it is incredibly 
dense. There is so much material
in that. Which is fantastic. And
I feel like I need to watch it 
many times. It's great because 
it's recorded.
But one thing that someone in 
the chat brought this
up, but I would like to kind of 
run into it to think about is,
like, thinking about, you 
know, you're kind of
talking, like, think about mosh 
pit as a study of managing 
complex systems. Right?
There's a ton of individual 
actors doing whatever they want,
but with clear rules like 
nobody gets hurt that doesn't 
want to.
And make sure everybody leaves 
with their glasses intact, 
right? That are strongly 
enforced.
So, when you think about that as
--
when we think about complex 
systems and how do those places 
and
where do those things go in, 
maybe talk about that for a 
little bit -- expanding on
how -- I don't want to say a 
metaphor. It's not really a 
metaphor.
It really is one of those things,
right?
Sarah: It is.
When I was thinking about the 
examples, my husband Bobby, he 
brought up, I used to go to mosh
pits. I was like, what?
I'm a person who sits back and 
analyzes things, like, is that
safe? Where is the origin? What 
do people think? Do I belong 
there? I sometimes do 
overanalyze things.
But he was like, there is so 
much fun and I look at it and 
I'm like, what are you thinking?
Like, were you thinking to be in
there?
So, I think that was a really 
great example.
As you mentioned, people 
collectively move together, 
right?
But there's not really thinking 
behind it. No pattern behind it.
No matter what you do, you can't
exactly repeat those patterns 
and see the outcome.
It really is at the time and 
really looked at the movements 
of the crowd.
But what also is interesting 
about it is that when
you're in the mosh pit, you're 
not thinking. You're just moving
with the crowd.
A great example of this is when 
I mentioned
we had Silverberg, he used to be
going to the mosh pits.
And then one day he was standing
behind and looking at it from 
far,
and he was just like, this is 
collective behavior. And he had 
just learned it a couple weeks 
before.
So, he went back and he built a 
model
to, you know, model this mosh 
pit and this collective behavior
and see how
things interact with each other 
and therefore -- not exactly 
predict,
but see some of those movements.
Margaret: That reminds me, have 
you read Team of Teams?
Sarah: I have not.
Margaret: It's one of those good
organizational books. I'm going 
to mess this up.
It's complex versus something 
that is complicated.
Sarah: Yes.
Margaret: Complicated, you can 
break it apart into little 
pieces and you know how 
something looks. Like a bicycle.
It has a lot of parts, but you 
put them together and you know 
how it works. Complex, they use 
the pool example.
You shoot a pool ball into a 
number -- however many you queue
up. But you can't predict 
how it turns out. That's 
complex. Just like a mosh pit.
Like you can't really re-create 
it perfectly every time.
Sarah: It has a lot of moving 
parts.
So, you have to really break it 
out and really understand how 
each
piece works and what's the 
intent behind each of those.
Matt: And I think that's the 
thing.
Again, thinking about complex 
systems, there are things that 
are happening
individually but still
within some, quote, unquote, 
agreed upon rules, right? And 
the same thing.
Like, you know, so like Aaron 
said in the chat.
There are these rules and not 
like you sit down and discuss 
them at the beginning of the 
concert.
But it's just this known known 
that nobody gets hurt that 
doesn't want to get hurt. And 
you're working within these 
boundaries. But they're not 
formal boundaries.
They're just sort of intrinsic 
boundaries and rules and I think
that's the thing with complex 
systems.
Again, the complicated system 
has defined,
understood, quote, unquote, 
rules, right?
A complex system, they're not 
really defined that way.
Margaret: Yeah.
Sarah: And it's really hard when
you talk about this complex 
system. It's very hard to really
put it in words. Or build a 
model out of it, you know?
Because, again, like all models 
are wrong.
Matt: Yeah.
Sarah: The best thing you can do
is to build something, you know,
take it out there, test it, 
observe it,
see how it behaves in the 
environment, bring it back and 
test it again. In order to make 
it better and better. So, yeah.
I think -- like, even with
mosh pits, I'm thinking, 
like, I'm definitely a person who 
looks at rules.
Putting me in the middle of a 
mosh pit, I would be
squashed within 5 minutes 
because I'm trying to look 
around to see where should I go?
I personally would not survive 
that.
Matt: Right.
I think that's the thing with 
complex systems.
It's where our prefrontal cortex
works against us, we want 
pattern recognition and
when we think about complex 
systems, we want it to be when 
we turn
this dial, which will fix this  
thing, change this thing.
But it's all the contributing 
factors and the layers.
And that's contradictory, 
because
it's fundamentally
contradictory with 
how our brains work. Our 
survival is finding the 
patterns.
That's how we evolved to survive
when we were,
you know, trying to eat fruit 
and not get eaten by tigers and 
stuff.
But now we're trying to solve 
global warming and all this 
other stuff.
Turns out our brains are trying 
to not get eaten by tigers.
Run away from tigers, problem 
solved.
Don't be where tigers are.
Sarah: Yeah.
And one thing that's important 
in computational thinking is the
ability to build predictions.
To build a model to really test 
out your predictions.
There are two types of 
programming in our field. There
are times where we're just 
learning. What even is this 
API? What does it do?
What are these data even telling
me?
And there are times that we have
some clues about our data and we
want to make some sort of 
prediction.
So, you know, just thinking 
computationally about
pieces and have some ideas about
what our goal is,
that gives us the ability to 
build some model and test that.
And basically, just evaluate our
assumptions. Were we correct? 
Where did we get it wrong? Do we
need a bigger dataset? And also,
be able to share that with 
everybody.
So I think that's really one 
thing that if I want
people to take something with 
them, is that the
ability to really think deep 
about building their
models and making predictions 
and,
you know, sometimes you will 
get a little bit lazy and push
a lot of these modeling and 
testing stuff to the last minute
that we need.
But the fact is, these days it's
just so easy to do that.
And I encourage everybody to 
think about those, you know,
things well ahead of the time.
Margaret: I think about it every
Wednesday in the morning, every
other Wednesday in the morning 
when it is our sprint retro and 
I realize that we were really 
bad at predicting our points for
that sprint. And we're like, oh,
man. We were really consistent.
But really not accurate.
Sarah: Consistently wrong.
Margaret: Yeah.
Humans are not good at 
predicting, turns out.
Matt: Well, that's because we 
don't have a crystal ball.
And I think that's sort of the 
thing with that part is like 
that, you know, cognitive 
distortion of like fortune 
telling, right?
Which is like if we only had 
enough data, we could -- we can 
look for likely outcomes. But 
that's sort of the problem.
We can go down a whole other 
thing that Margaret just opened 
up.
But like that's a whole -- like 
all the story pointing, just so 
we can all agree, that's just 
all fiction, right?
Those are lies we tell 
ourselves so that we, like, feel
better about what we're doing.
But like at any point that 
someone could say, with what I 
know
at this point, I can tell you 
anything other than hard or not 
hard.
Margaret: Yeah.
Matt: You know?
But I think that's -- so, here's
the problem and I don't want to
go all the way down this path, 
but if you look at
what that stuff was intended 
for, it was to get a feel about 
it. But it's now turned into a 
measure of --
Margaret: Right.
Matt: Put it this way, if you 
think about the way that story 
pointing
and stuff was intended, it was 
totally fine if you were wrong.
Sarah: Yes.
Margaret: But you have to learn.
Matt: Margaret, you said it was 
8 points. You know? Why isn't it
done?
I don't remember the Fibonacci 
anyway.
Sarah: Like my last example.
Margaret: It is.
Matt: It is one, okay.
Margaret: How many hours is 
that? It's 8 points.
Sarah: This is a hot topic for 
me. I'm an interim CTO. So, a 
lot of these things we have to 
put in place. And that team is 
exceptional. They're amazing.
They're, you know, adapting as a
team and
trying to figure out some 
processes so that we can 
communicate these decisions.
When it comes to story points, to
me, it doesn't really matter
unless there's a stakeholder 
sitting on the other side that 
needs some predictions, right?
Aside from that, the stories are
in the backlog and you have to 
deliver what's -- what has to 
get delivered.
And I think the review of that 
makes
people a little -- just to pick 
up a little bit ahead of the 
time knowing what's coming.
Get your questions in
and think about how you want to 
approach building it, what data 
do you need to collect?
What predictions do you need to 
put in place and those are
things I very much value when it
comes to sprint and sprint 
planning.
Matt: That's the thing.
When you're using it as a tool 
for committing to a reasonable 
amount
of work, it's -- it's fine.
But it's not a number unto 
itself of your success. Right?
Margaret: Exactly.
Matt: You know what happens 
then? We start sandbagging.
Margaret: Exactly.
Matt: We'll find a way --
Margaret: Ticket's closed? Open 
a bunch more. Close them.
Matt: So, like, I it take on a 
reasonable amount of effort.
And the reality is if it turned 
out it was bigger than you 
thought it was, it shouldn't be,
you suck at predicting. It 
should be, oh, that was 
uncomfortable. What do you learn
from it, right?
Margaret: Exactly.
Matt: This is what we talked with 
Dr. Cook about. How are we 
learning from these things?
How are we becoming -- but 
that's the problem, all this  
measurement
is not about learning, it's 
about, you know, showing 
activity. Right?
Margaret: So, then there's the 
other measurement too. Like the
models that you talk about. 
Which is not just input out, 
input in. You know?
It's a little bit more 
computational.
Sarah: Right. It's a matter of 
learning and thinking.
And predicting how this is going
to behave in the real world.
A good example of this was at 
election
time right around February when 
everything fell in -- fell 
apart. I had to bring it up. 
This isn't my previous talk.
But we're software engineers. 
We can predict what went wrong.
You know, somebody didn't test 
their stuff.
Somebody maybe didn't test this 
with real users.
So, as
software engineers, we can make 
predictions, hey, if I was on 
that project, we would do these 
things and things would go 
better.
It's easy to say it.
For other people, it's not a 
bug, it's a feature.
But when it comes to you as a 
software
engineer and somebody who makes 
the decisions, you have to be 
prepared to answer those.
You have to look at your work, 
did I put a
bunch of tests and something 
happened that it broke in the 
real environment.
Or did I completely skip testing
because I have to meet my
story points at the end of the 
sprint,
you know?
Matt: Well, we have -- a 
question came in from the chat. 
And I would love to talk about 
this one.
But the question is, how
is computational thinking 
different than the scientific 
method?
Sarah: I think with the 
scientific -- well, 
computational thinking, again, 
it applies in many, many 
things, right? It -- quick 
example of this, actually. My 
kids are in first grade and 
third grade.
And a couple years ago, I went 
to school.
They have a program where parents 
go in and talk about their work.
So, I went in and I was like, 
all right.
Should I teach you about C
and C++ or Clojure. Let me think
about it.
Maybe teach them a little bit 
about how to think, how to 
approach the problem with a new 
tool.
So, you know, of course, my kids
are stuck with two engineer 
parents.
They have no other way to learn 
a lot of these algorithms
and like bucket sorting and 
bubble sorting and all of that.
So, I put the little activity 
together and went to school and 
I gave the kids, you know, a 
card. I took a deck of cards. 
There were 20 kids.
So, I just gave them a few 
cards, asked them to sort it as 
fast as they can.
Obviously the first thing they 
thought is to look for a one and
a two and a three.
As you grow the deck, this is 
gonna take a lot of time. So, 
how about we break up the work.
How about each of you get five 
cards and then we put buckets,
like 1 through 10, 10 through 
20.
You run around and put your 
cards in the correct bucket and 
then you take that bucket with 
your team member and sort it. 
So, we brought the idea.
And basically, broke it down and
fixed the problem.
And then we put together and 
within like 5 minutes we sorted 
all the cards that we wanted to.
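
The card activity described here is essentially a bucket sort. A minimal Python sketch, assuming the cards are whole numbers and each bucket covers a range of ten values; the function name `bucket_sort` and the sample deck are made up for illustration:

```python
def bucket_sort(cards, bucket_size=10):
    """Sort numbers by dropping them into range buckets,
    then sorting each bucket on its own."""
    if not cards:
        return []
    lowest = min(cards)
    buckets = {}
    # Step 1: everyone runs around and puts each card in the right bucket.
    for card in cards:
        index = (card - lowest) // bucket_size
        buckets.setdefault(index, []).append(card)
    # Step 2: each team sorts its own small bucket, and the buckets
    # are put back together in order.
    result = []
    for index in sorted(buckets):
        result.extend(sorted(buckets[index]))
    return result


deck = [17, 3, 12, 8, 20, 1, 15, 6, 11, 4]
print(bucket_sort(deck))  # [1, 3, 4, 6, 8, 11, 12, 15, 17, 20]
```

Each bucket is small, so sorting it is quick, and in the classroom version the buckets can be sorted by different teams at the same time.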
So, to teach them how to 
approach a problem.
Like, hey, I know that bubble 
sort is your instinct.
But how about if I teach you a 
new thing. Does that make your 
life different?
Does that help you look at the 
problem a little bit differently
when you go home? That was my 
hope.
And lo and behold the next day 
the teacher sent me
a message and said today the 
kids were writing about Martin 
Luther King.
And they came up with the idea, 
because of yesterday, that they 
wanted
to put buckets on the table -- 
and we're talking
about 6-year-olds -- they came 
up with the idea
about putting buckets on the 
table with topics.
And each child put their 
papers in the proper bucket.
And therefore they have their 
index and they put their book 
together.
So, with that 5 minutes of 
exercise to give these
kids new tools, a lot of them 
thought about it the
next day and thought about, you 
know, how do I solve this 
problem that I have like a 
hundred pages of papers? How do 
I organize them?
I think that's a point. You 
know?
I think with the computational 
thinking, it's really helping 
everybody
to think about their tools and 
how they're using their tools, 
how are they documenting?
How are they really transferring
that knowledge to the rest of 
the group is
extremely important.
Matt: I think that's really 
important to bring in.
Which it's not really a
-- it's a reframing and an 
approach.
It's having where does that 
impact how that work happens?
And not because you knew a new 
process,
but having that
other perspective is kind of 
firing the neurons a little bit
different to see how this could 
apply, right?
Because those intuitive leaps,
like the example you gave.
Nobody told those 6-year-olds 
that
you could sort things the same 
way you did the cards. They made
that leap.
That's what we do with our human
brain, right? But you have to 
have something to connect, 
right?
And be willing to do that.
Margaret: Yeah, I thought of an 
escape room,
a lot of them are architects, 
real architects, as they say. 
And they thought kind of the 
same. And one friend was a 
brewer. His perspective was so 
different.
One said line up the things and 
look at the shadow. Everybody 
was trying to make them the 
same height.
Look at them from a different 
perspective.
Matt: Having diverse backgrounds
and experiences might make 
teams stronger.
Margaret: That's crazy.
Matt: Who knew?
What do you think -- as, you 
know, because you spend a lot of
time talking about this.
This is kind of -- what's the 
thing that you think usually
surprises people when you're 
having conversations around this
approach?
Sarah: A lot of times that 
they're thinking 
computationally, but they don't 
know that they're doing it. You 
know? Like a lot of time you 
choose your tools for a reason. 
Or, you know, you have a 
process.
You have an algorithm running in
your head every time you do 
something.
But when you -- a lot of times 
when
you point at them that, hey, you
are
actually thinking 
computationally, you have an 
algorithm running in your head,
it's hard to -- it's like that 
common sense that's not so 
common. You have to tell people,
you're actually doing it. So, 
don't be afraid of it. That's 
one of the things that's 
interesting that comes often.
People are just thinking that
computational thinking and the 
buzzwords belong to people who 
are good with math
and they're computer scientists 
and know how to break a computer
apart and put it together and 
work even better. But the fact 
is that's not true.
I think all of us have -- we're 
at the point, I think one point
in my talk is that with the 
interface that we
have right now and the 
programming language that we are
using right now, you don't need
to have that deep understanding 
of operating system in order to 
do anything.
You know, a teacher or a physics
professor or physics student, 
they can open up a Jupyter
notebook and write a few lines 
of code to test their ideas.
And those are people who I call 
them power users.
You don't need to understand 
any library deep to understand 
exactly how it works.
You just want to run a couple 
data and get some ideas.
Feel free to just, you know, 
copy/paste and just get your 
ideas and see how it works. I 
think that's something that's 
very important.
And the encouragement to tell 
people, you know, just take 
advantage of it.
Running your ideas, testing 
your ideas, and sharing them 
with everyone around is so easy.
Take advantage of it and don't 
just push everything to the last
minute.
Margaret: That's very close to 
Emily's growth mindset that she 
mentioned earlier, and the math.
Matt: It's like it comes 
together. Like we built a 
program. Note, we did this on 
accident. But Sarah, thank you 
for an amazing talk. A great 
chat.
This has been definitely 
stretching the mind, you know?
We're shaking off, doing some 
thoughts there.
But we will be back at 2:50 
Central time.
Got another fun break video 
coming up.
And this is the part where we 
can see that my AV team
thinks they can pull a Ron 
Burgundy and I can say anything 
on the thing. Tip your sponsors 
with a visit to their voice 
channel.
So, do that, I guess, and we'll 
see you back at 2:50 Central 
time.
Margaret: Thanks, Sarah.
Next:  How to grow a local 
DevOps community without funding
Rafael Gomes
Margaret: Rafael Gomes is an 
impressive DevOpsDays organizer 
and really knows how to scale in
real life.
Not only is Rafael a core 
organizer
for DevOpsDays around the globe,
he's started events and
promotes DevOpsDays across South
America.
He's spread DevOpsDays events to
over 16 cities in his
beloved home country of Brazil 
in just two years.
Rafael: Hi, my name is Rafael 
Gomes. My nickname is Gomex. 
I'm an organizer of events here 
in Brazil.
Today I will talk about how to 
grow a local DevOps community 
without funding.
How we handle it here in Brazil.
Here is the agenda. You can see 
on this slide.
DevOpsDays started in 
Brazil in 2010.
Based on the comment of the 
organizers,
this DevOpsDays was a kind of 
failure because they didn't
have the expected amount of 
people attending this event this
year.
I think maybe that Brazil was 
not prepared to receive this 
kind of event.
This event happened one year
after the first DevOpsDays in 
the world, 2009 was in Ghent.
And in 2010 we had here in 
Brazil.
We started the DevOpsDays in 
2016 in Porto Alegre. I was 
living there at that time. We 
created a team to organize the 
event.
We started with the
baby steps idea, and at that 
time we
had 140 people attending this
event, with just two sponsors 
giving us money and
two sponsors without any money.
They gave us travel expenses 
and so on.
And one year later we had 
DevOpsDays in Salvador.
Salvador is the capital of
Bahia, one state in the north 
region.
In the north region, they 
have less money than the south.
Porto Alegre is in the south of 
Brazil, and then we
did that event one year after
and had 1990 attending at that
time. Without money.
And all the money we used to 
organize this event was from the
subscription.
I'm from Salvador and at that 
time I was back to
do the event in the north.
Here are the challenges that we 
have and then how we handle it 
when we started. The first 
challenge was Brazil is huge.
You can see here, and Europe 
behind.
This is a kind of problem that 
we cannot solve. That we just 
need to handle it.
The next challenge was, we 
didn't have a company to
handle all the financial process
to handle the money, to receive
money and share to
the organization of the cities 
in Brazil. And then we had two 
options.
The first one was the best 
option that we had. 
And we're still doing that 
these days.
We used the ticketing company 
to receive money and then share 
it with the organization.
It's completely legal and worked
well and we're still doing that 
today.
The second option was made by 
the organization from Brasilia 
city. It is the capital.
And this happened two times 
and then they didn't organize it
any more because they had a lot
of issues with this kind of 
organization.
They built a company to handle 
money and then they had a lot of
problems with that. Because the 
bureaucracy is too high.
Just to give you an idea,
to shut down a company in Brazil
is really expensive. And that 
was a really bad idea.
And I advise that you don't do 
this.
The next challenge was building 
a mutual trust with companies.
When we started DevOpsDays here 
in Brazil, the companies didn't 
know what this means.
Because of that, we needed to 
establish the idea and we could 
show.
After that, we could help it 
back. Then we did that and the 
companies started to
help us by sending and paying
speakers to talk at events and 
so on.
The next challenge was, no money
to start. 
When some organizations from 
some
cities start an event, they 
didn't have any money to start.
And because of it, we created a
kind of group in Telegram
to share money, to share ideas, 
to help each other.
When some events have extra 
money, they can share
with other events to help them 
to start and do other things.
One thing we did here is we 
created the idea of
funding underrepresented people 
to give a talk in another city.
Because we have really good
people, Black women, Black men, 
and
LGBT people and so on, but they 
didn't have the
possibility to speak because of 
the expense of travel.
Then we created this group to 
help these people to
represent these underrepresented
groups at the DevOpsDays. Did 
that.
We have a lot of good 
representation among the  events
that we have here in Brazil.
The next challenge was, a few 
good people talking about 
DevOps.
To solve that, we created a list 
of
people that have a company 
behind them that can pay these 
expenses.
And we can share this list
with the
new cities to offer good
speakers to be invited
to do the talk.
When we have these good speakers
there,
other companies can see the 
events with other eyes and it 
was really good to do that. And 
this worked really well here in 
Brazil.
Just one example,
Mateus Prado gave ten talks in 
the same year. Thank you, 
Mateus.
Another challenge is how to 
handle
marketing without money and 
without people?
And we used the same Telegram 
chat to help on that. We created 
a marketing strategy.
And when something
happened like a paper, and one 
event is starting
and another one, we created a 
strategy to share on this group.
And then have the organizers 
share it on social 
media and this worked really 
well.
And when some cities that they 
don't have enough
attention, when another 
organizer
that has some people with some 
expression in Brazil, they can 
have attention on this new city.
This works really good and is 
working really well. 
And we paid too, but it didn't
work well here.
And another -- the next 
challenge was
no money to sponsor people from 
other countries.
To pay -- to pay travel expenses
inside Brazil is really 
expensive.
To pay for another country is 
really, really, really huge.
We cannot afford.
This is a problem that we still 
didn't -- this is a problem that
we still didn't fix it. We still
have this problem.
We know how good it is to share 
knowledge between countries. 
But, yeah. We don't have money 
to do that, unfortunately.
Sao Paulo already has foreign 
speakers. But, yeah, another 
country is not possible. Here is
the list of achievements.
The first one is, we created a 
really huge community.
Now we can say that because of 
the
population of Brazil, now we are
talking about today
about the DevOps future in a lot
of cities.
We create a lot of meetups, a 
lot of local community.
We didn't concentrate on Sao 
Paulo
and Rio de Janeiro. 
We created a big community in 
a lot of cities around Brazil.
And the second one is, the 
people
from outside the big centers are 
talking about DevOps too.
Because it's really common here 
in
Brazil to have all the events 
just happening
in Sao Paulo and Rio de 
Janeiro.
Now people are outside and now 
we are talking about the DevOps,
not only about the tools. This 
was really good.
Because in the DevOps event, we 
started to talk about that. The 
DevOps was not only about tools.
And one good thing that we had 
here
was that when we started, we 
started thinking about this 
social impact of this kind of 
event.
We started to
think about how many women we 
had, Black people we had, poor 
people we had.
When we started, we started to 
think about that and then this 
was really, really, really good 
at the time.
The third one is we are changing
the people's lives.
Because offering knowledge about
this new
thing that is DevOps, we are
offering a better opportunity to
all the people.
We are offering opportunity to 
people outside the big centers 
to get this knowledge. And then 
get better jobs.
Getting a better job, they can 
get more money.
They can get more money to 
support their family and then
we can kind of help these lives,
right?
And here you can see the current
state they had. We started in 
2016 with two events.
Porto Alegre and Brasilia.
And then the next year, three, 
and then the next year, 11, and 
the next year, we have 14.
As you can see, I added south 
and north and north, northeast 
region.
You can see when -- in the next 
-- in each year, 
we focused to increase
the amount of events
happening in these regions 
without
funding, without money, without 
good jobs.
Because doing that, we are doing
a kind of
sharing knowledge, sharing 
opportunity with these cities.
And this -- this happened a lot.
I can see that here in
Salvador, we can
talk about the technology 
community before DevOpsDays and 
after DevOpsDays.
We have a lot of people here. 
We have a lot of people 
talking about how much they
learned about DevOps technology 
after DevOpsDays happened here. 
It was really, really good.
And this year that we are 
talking about, we just have two
or three DevOpsDays because it's 
a completely different year.
And then we didn't have enough 
DevOpsDays happen this year, 
unfortunately.
The next step is, first, 
increase our presence in the North 
and Northeast. That's not good, 
yeah. But we are trying to 
increase that number.
The second one is, increase 
DevOpsDays
in cities other than 
capitals. 
Because this is -- was really 
easy
-- not easy -- it's a little bit
easier to do DevOpsDays in a 
capital. Because they have more 
companies and so on.
But we need to do outside 
capitals too.
Because outside capitals is 
better because we can offer
more knowledge to the other 
people than capitals. Because 
capitals normally is the place 
you have events, right?
And to pay travel between cities
in the same state is still 
expensive to some people.
The next step is funding local 
meetups to help local community.
Because the local meetups, 
normally they don't have money
to afford the things.
And then we're trying to have --
we're trying to offer money to 
local DevOps too.
And then another is to invite 
good speakers from abroad to 
help us.
Because we know when we find 
people outside of Brazil, we can
share knowledge which is really 
good.
And the last one is increase our
participation in other countries
in Latin America.
And we are trying to be -- to 
establish -- to be -- to be 
there on other countries.
And invite them to be here to
exchange experience and 
knowledge about DevOps. 
But doing that is really
expensive because we did it too.
And here I need to talk about 
everyone that
helped me to build this huge  
community. I didn't build this 
alone.
I did it with a lot of people 
around Brazil.
We started in Porto  Alegre.
But this grew to a lot of cities
and then I received a lot of 
help.
I want to say thank you to 
everyone that helped me to do 
that.
I need to say a special
thank you to
you that helped me from the 
first day,
in the first day, he was there 
helping me to build this amazing
community.
He helped
at DevOpsDays
Porto Alegre 2016. He was not a 
member of the organization team.
He was there as an attendee.
And then he started to help 
without us asking him. He's an 
unbelievable guy.
And I want to say a special 
thank you to Samatrio.
And to everyone that helped me 
to build
this amazing community.
And  now -- now it is, right?
Now it is with this huge 
community helped me. A lot of 
people.
We had a lot of people talking 
about the good
things, trying to help each 
other.
Trying to help everyone in these
bad times here in Brazil. Thank 
you. Thank you very much. And 
now you can see my contact. You 
can see my blog.
You can see my Twitter, my 
nickname is Gomex.
You can find me on Twitter at 
Gomex and Twitter and LinkedIn 
and GitHub. And here is my 
email. Thank  you, thank you.
I see you in the next DevOpsDays
in some city.
If you want to come to Brazil, 
send me an email and we can 
figure out.
See you, bye, bye.
Sasha: Hi.
And welcome to our next breakout
session with Rafael Gomes. 
Before we jump into the chat,
I want to mention our breakouts 
that are going on in Discord.
So, one is keeping connected 
while remote.
The next one is Kubernetes and 
development workflow. 
The next one is post-COVID 
hobbies.
Someone's being very optimistic 
right now. And the last one is 
talk bay.
If you have never been to a talk
bay breakout, it's actually a 
really good thing to attend.
But what is also a really good 
thing to attend is this fireside 
chat.
So, you can stay with us and 
talk some more with Rafael. 
Thank you for joining us.
I'm so glad it worked out 
despite all the technical issues
and all things.
So happy to have you today.
Rafael: I'm happy to be here. 
I'm honored to be here, happy to
be here. Yeah, we had a lot of 
problems. I imagined I couldn't 
make it. But then I sent the 
movie and I'm really happy. 
Yeah.
As you can see, I have an 
English teacher
and he told me one time, 
Rafael, you speak fluently, but 
fluently wrong. But, yeah.
Sasha: Yeah, it's -- the measure
of success is, can people 
understand you, right?
Rafael: Yeah.
Sasha: That's the only -- so, 
the thing is, so, before --
actually, many of us were
in Ghent at the same time.
Me and Margaret and Rafael met 
in Ghent last year. 
And that's where we saw 
this talk.
And it kind of -- I really 
wanted to be
able to -- for us to share
that experience with other folks
around the globe and other folks
in Chicago.
And, you know, when you first 
submitted a talk, I
envisioned you being here with 
us in person and getting to 
experience the city as well.
But I guess we'll have to table 
that for next year.
I do kind of want to
refer to all the obstacles that 
you had to overcome.
Like sometimes being in the US, 
we forget about how difficult 
this can be
around the globe to grow a 
community, to
function without sponsors, to, 
you know, have completely 
different regulations to deal 
with and stuff like that.
And I know you mentioned many of
these things in
your talk.
To you personally, what was the 
biggest thing that you had to 
overcome during that first year?
Rafael: Definitely.
Because I -- my impression that 
the DevOps community, 
DevOps future here is not only
-- not only about the 
technology.
When you build it, with these 
communities, we did more than 
that.
Our focus was never tools or 
technology only. 
The part where we interact with
technology is
the part where we can offer a 
better opportunity
so people can have better jobs.
And this is -- this is 
completely different that you 
have before in Brazil.
Because we know, we 
saw that -- 
normally the good events happen
in the South, in the big 
cities.
And when we started to do that 
in the new cities
and the new places in the
North,
we started to change the 
situation of the
people. Not only the technology.
Our most important thing with 
the DevOps community
in Brazil is to help the market 
and the technology and so on.
Margaret: DevOps folks are big 
on metrics and numbers.
Do you track people who have had
great career success because of 
DevOpsDays?
Do you have numbers?
Rafael: I don't have numbers 
right now. But I can get these.
But I have one really good story
from Salvador. 
It's my city, the first 
capital of Brazil.
Now we are in the Northeast, a
poorer region than the South, 
right?
And then we had the first 
DevOpsDays here in Salvador.
And one guy, young guy, he 
started to work as a developer.
And then when he attended this 
first DevOpsDays
in Salvador, he told me, man,
it's unbelievable, the number -- 
the level of these events is 
just so great.
We never had these -- and so on.
And then in the next year, he 
submitted a talk
and was -- he was sharing
about the change he made in his 
company. It was his first job.
And he had the opportunity to 
get the information because of 
this DevOpsDays.
And then we changed -- we 
changed
his job, changed his life and 
then he had the opportunity
to do something different. 
In the other case, in the past, 
he needed to
pay expensive travel bills to 
attend a good event in another 
place. Yeah. I was -- that
was a talk about how good it is 
to have a good event outside 
of the big centers. It's about 
that.
To offer this level of events 
to new people and offer new 
opportunities.
Sasha: So, this is so important 
to me.
Because when you first spoke 
about this, how sort
of people outside of Brazil
tend to focus on Sao Paulo and
Rio and disregard every other
city in Brazil. Brazil is giant.
Or talking about South America, 
it's
a huge place, a continent with 
lots of people. There are IT 
centers and jobs and people who 
need a community and education.
And from the outside of it, no 
one ever sends a
speaker to a small city in 
Brazil. Something like that.
And so, I think I personally 
have become
an advocate for, hey, we should 
go to
this small city in Brazil or 
Chile or whatever. Yes, there is
a community there and it's 
important to be a part of it.
And this is something I'm 
personally very familiar with.
Because I'm from Ukraine, and 
Ukraine, again, is a big 
country.
And people only know Kiev and 
maybe one other city.
It's like, hold a reference in 
the conference is only through 
the capital. Yeah, there's 50 
million people in the country.
We can talk about it some more 
than that. So, one question that 
came in from the chat
is, were companies in Brazil 
asking for DevOps adoption?
Or was it really the people that
were driving the adoption?
Rafael: We had both cases, right?
In the past when we started. We 
had both cases.
Some companies already started 
to do something and then they 
were looking for people to do 
that.
But in that time, I didn't --
I didn't already have the DevOps
as a  role. I didn't see that in
the past.
But I saw both cases. 
Most of them are trying to do 
something without the knowledge,
without the skills.
And then they realized that 
DevOpsDays could be this place 
that we could find good people 
to hire. And then we started to 
focus on that. Oh, we are here.
It's a good opportunity to hire
people and then to the 
attendees, wow, it's a good 
opportunity to have a good job. 
And then we started to do this.
But this one is the one big
difference from the DevOpsDays 
events and the DevOpsDays 
community.
When you started that, we didn't
have any
problem to convince the good 
speakers from the DevOps 
community to travel to the small
cities. This is not a problem 
because in the other events it's
really tough.
Because, oh, how many attendees 
did we have? It's such a small 
community. I won't do that. No. 
Here we have the opposite.
We have a lot of speakers that 
are more ready to
attend events in the small 
cities than the big cities. 
Normally it's the opposite.
And then when you build this 
community, you create this idea 
to help each other.
To have the cities, and create 
that like in the
talk, created a list of speakers
that -- the company can pay to 
travel and, yeah.
It's completely different 
because of
it we can help -- can have
the company get the people to be
hired too.
Margaret: You started so many 
groups. At least 16 in Brazil.
How -- what's your advice for 
starting the very first event? 
What's the first thing to do?
Rafael: The first thing to do is 
-- is ask for Somatore's help.
My friend, helping the other 
DevOpsDays too.
And then other times when people 
reach out for help, 
I say, please, ask for Somatore's 
help too.
But here in Brazil normally the 
first advice
that I normally give to the 
people is baby steps. Don't try 
to do a big event. It's not like 
that.
The DevOpsDays is not about you 
have too many people.  No, no, 
it's not about that.
The idea of DevOpsDays is shared
knowledge and the local 
community. Try to do that.
If you have just 10 people, 15 
people, that's
okay. Your goal is to share 
knowledge. And then we try to do
that.
And then it's worked well. 
Because we didn't have any 
DevOpsDays in
Brazil that didn't have full 
attendance.
We sold out all DevOpsDays at 
least two weeks before the 
DevOpsDays happened here in 
Brazil. Yeah.
We never had open tickets in the
day of the DevOpsDays.
Sasha: So, it's really 
interesting.
Like I love that you're talking 
about a participant experience 
and
how important it is like never 
mind how many people you 
actually bring to the event.
It's something that I think, you
know, when we first
started doing DevOpsDays events,
it was really hard to explain to
companies. 
And there were many companies 
who wouldn't sponsor and 
wouldn't talk.
And, you know, kind of didn't 
pay -- didn't think that the 
metrics were good enough for an 
event like this. But it really 
is core. The ability to bring 
the local community together. 
And the ability to discuss 
things. And I think that's where
impact happens, right? Because 
you go to a big, big conference.
You kind of get lost in that. 
There's  50,000 sessions.
With many topics that are not 
particularly relevant to you.
And you come to DevOpsDays and 
you kind of feel part of the 
community.
And able to actually discuss 
your work
experiences with the company 
right next door.
I think this also just sharing 
an experience for this year. 
This was a tough decision to go 
virtual. Right?
Because we had to think through -- 
and we looked
at, you know, five different 
virtual platforms for holding 
this.
And we were trying to find one 
that would allow people
to connect and discuss and have 
as similar as possible an
experience as to what we do in 
real life. 
This was not the question, 
just sharing the DevOpsDays 
Chicago experience as well.
Rafael: Now that you talk about 
the companies, I remember 
something, my second advice was,
don't expect money from the 
companies. Don't expect that. If
you have it, that is fine. That 
will be good, you can buy good 
things. You can share. We did 
this a lot. We started without 
any money. Then received one 
sponsor. Okay. Now we can buy 
some T-shirts and we bought 
them. And with more money,
okay, we can have a better 
coffee break and we did it.
And we did it like that because 
if you expect too much money from 
the companies, you need to follow
their rules.
When we -- the thing that we try
not to do is something like 
that.
We are -- we're following the 
DevOpsDays community rule. Not a
company's rule. Not the  
market's rules.
Because of that, we try to do 
something smaller, 
something really simple and 
then we can grow when we receive
money. Because, yeah.
Because normally, we did -- we 
had this a lot in Brazil when it
started.
And when it started, we only 
went to the company, then the 
company,
 oh, okay.
But in the end, I need the list 
of the person
that attended the company -- the
events -- no, no. It's not 
possible. I won't have that. I 
will tell you how many people we
had. I won't give you the list. 
And then, okay. I don't want it.
Okay. I don't want it too.
I don't need you here.
It's -- you -- you ask me to 
sponsor and then this is my 
rule.
This was the most important 
thing that we have here
to remove the dependence of 
money to a big amount of money.
Because if you want to -- if you
want to
follow the -- your rule, we 
need to
understand that and we won't 
follow the market rule. 
Won't follow the rule too.
Yeah, that was the biggest 
thing that we needed to handle in 
the beginning. Understand that. 
Understand that we can do it.
When we did, the first
DevOpsDays here without any 
money, no money, no money at 
all. And then we finished, I 
realized, man.
We can do DevOpsDays in any city
of Brazil no matter where -- no 
matter how far we had. Let's do 
it!
And then we created a kind of 
framework to create a new event 
all over the place.
And then in the next year, we 
did DevOpsDays just increased a 
lot.
We had a lot of DevOpsDays in 
Brazil
because of these events. Because
of that, I just mentioned in my 
talk. Because that was that 
point that
we realized that it was 
completely possible to do that.
Sasha: It's interesting actually
to me because we often feel 
disempowered to do certain 
things.
Oh, we don't have the money, we 
don't have the leadership buy-in. 
We don't have the authority 
and title. And whatever.
Like, I can't do this  because 
-- XY and Z and so many 
obstacles.
But your experience really shows
that we could do this.
Like it is completely possible 
as long as you have  like-minded
people to
pull it off just by investing
a little bit of time and a 
little bit of effort and bring 
it all together. 
Again, once you have success, 
once you show success,
then people come to you and 
solve sort of the challenges for
you as well.
Rafael: Yeah. I have one 
experience on that.
In the first year, I had one 
company that I won't tell the 
name here for sure.
And then I saw some comments 
from this company that, 
oh, no, it's a really small 
event. I won't care.
I won't sponsor, I won't send 
any
speaker or any attendee to these
events. Okay.
In the next year, he asked me to
be a sponsor
because he saw how big we
did -- the things that we did in
Salvador, 
other speakers that we have, 
other knowledge that we shared, 
all that we did without money.
Oh, how can I sponsor you? Okay.
Come here. But yeah.
In the first year, we needed to 
show to everyone that it's 
completely possible.
Because in the end, we -- the 
point is not about money. 
In the end, the point is 
really about the knowledge.
Because when they started to do 
that, the people
didn't understand and didn't -- 
and didn't believe me. I was 
doing that. I was doing about 
the shared knowledge. It's not 
about the money, not about 
anything else.
And then when the people 
realized that, the people say, 
oh, my god. I would do that. I 
would do that.
Because when it's -- when I -- 
in the past, I tried
to share to a lot of people to 
do DevOpsDays around Brazil.
And then I told them, okay, if 
you need my help, please ask me. 
And the people said, no, I don't 
believe it. It's not possible.
They don't receive any money. 
Without anything, yeah. That's 
true.
Margaret: It's so obvious, 
you're so  enthusiastic about 
DevOpsDays and making these 
events possible.
That's clearly another key 
reason that all of Brazil and 
South America have done so well.
I'm excited talking to you about
it. How do you keep the personal
attitude going? Like this is 
great, it's gonna be great. I 
can help you.
Rafael: Yeah, because when we 
started, I realized that this 
is -- this
is possible to keep the
community as we think it, as we 
talked, right?
We started with the idea of the 
community being
to help each other and so on.
And when we found that a lot of 
people are thinking the same way
as me, and I didn't start this 
just in Brazil, a lot of people 
supported me.
When they started, they showed 
me that it's possible. We have a
lot of people that we can share 
that, right?
And then I realized that, this 
is a true thing.
It's not a -- it's not a lie.
Because the people normally when
they saw
something  altruistic like that,
then people think, oh, no, it's 
not possible. There is some 
money. No, it's  not. It's 
completely community stuff.
I'm from the open software 
community, from the Linux 
community. We have that in the 
past. We knew that it was 
possible. And then we just did 
it again.
But with some -- something
more than 
technology. The DevOps community
is more like that.
Limits on how it's kind of 
knitted, right? And now DevOps 
is like that.
It's more about the people than 
the community and technology.
Sasha: We're almost at time, but
I'm going to squeeze one more 
question in.
And that is, do you think the 
move to the virtual conference 
is a good thing?
Because maybe it helps save 
money and maybe, you know, it's
easier to pull it off without 
any trouble. 
But other than, obviously, we 
have a 
totally different event 
here.
Rafael: Yeah. I don't know.
Because the good things to have 
the conference is to have 
someone in person.
To meet someone in person. To 
have the experience.
Until now, I didn't have a good 
experience on doing that 
remotely.
Yeah, we didn't have it until 
now here in Brazil. But I think 
we keep trying to do that.
But I don't have an opinion 
because I didn't have a good
experience yet besides this one 
that I'm really enjoying here 
and
trying to -- I'm trying to learn
with you,
other people that are doing 
them, we can share other 
DevOpsDays. But yeah. This year 
in Brazil is a really tough  
year.
Not only about the pandemic, 
about politics, about all the 
things happening.
I think the Brazilian people now
don't have the mind to do 
something.
Because of that, I'm not trying 
to push anyone to do that. 
Everybody's busy, it's -- 
there's a lot of things 
happening right now. Handling 
the things. And if you can do 
it, okay. I can try to help. If 
not, that's not a problem. We 
can do this again
next year.
Sasha: Meet people where they're
at. Makes sense. So, we are at 
time. Thank you so much for 
joining us today. I really 
enjoyed having you here.
And, of course, we're gonna 
continue chatting on Discord.
We will return at 3:35 CST. And 
keep your eye on YouTube for 
more fun.
And please visit our awesome 
sponsors in Discord. They're the
ones that make it all possible.
Next:  Chaos:  Breaking your 
systems to make them unbreakable
Jason Yee
Sasha: Welcome back, everyone.
Our last talk of the day is 
brought to you by Jason Yee, 
director of advocacy at Gremlin.
Speaking about chaos  
engineering.
Over the years, Jason has helped
create many
technical conferences in the US 
building this community and
sharing his engineering 
experiences with the community.
And he also makes great 
chocolate. 
Jason, please take it away.
Jason: I want you to think about
a time of unmet expectations.
Think about a time where you 
knew how something was gonna go.
You had it planned out in your 
head.
And reality just didn't quite 
meet that.
As we know, sometimes reality 
just doesn't mesh. You have 
plans.
Maybe it's, for example, I was 
talking with a friend, talking 
about baking a cake. You have a 
recipe, you have instructions.
And we've all seen The Great 
British
Bake Off or Nailed It, and we 
know that the results don't 
always meet what we expect them 
to.
So, I want you to think about 
these unmet expectations. Keep 
that in the back of your head. 
And let's talk about complex 
systems.
The IT systems that we're 
building today are becoming more
complex.
If you think about architecture 
diagrams, for example, this 
might be what your application 
looks like. There's a lot of 
moving pieces. And guess  what?
This is just AWS's reference 
diagram for a WordPress website.
Used to be simple.
When I used to set these up, it 
was a server,
Apache, some PHP, maybe MySQL, 
and
now you have to think about 
auto-scaling and a bunch of other
things.
But that's a simple website.
If you start to think of what 
your companies are building,
what you're working on at work, 
it might look a little more like
this.
When something goes wrong, then,
it means chasing down what 
service was that? Which team 
owns that service?
And oftentimes you're digging 
through dependencies that are 
several layers deep.
The thing is that complexity 
keeps increasing.
As we've adopted DevOps, and 
started to move
to more automation and to 
breaking up teams and
breaking down silos, it means 
that you're interacting with 
more and more pieces.
And that complexity means that 
things naturally have a tendency
to break.
It means that as we build out 
complex
systems, these  incidents become
more frequent.
And because they're more 
frequent and because they're 
cascading, they cost more money.
Incidents are far more 
expensive.
If we take a look, the average 
US
company spends  $300,000 for 
every hour of downtime.
Super expensive.
So, at this point, it's where I 
introduce chaos engineering. 
Chaos engineering for the win, 
right?
Chaos engineering is what will 
solve your problems. Or at least
that's what I'm paid to say. Who
am I?
I'm Jason, the director of 
advocacy at a company called 
Gremlin. I have a
bunch of fun hobbies.
Because it's a pandemic, my 
pandemic hobby is 
chocolate-making.
Which is super fun to do and 
doesn't require me going 
outside.
And I've previously worked at a 
bunch
of great companies: Datadog,
MongoDB, O'Reilly Media,
the people with textbooks with 
animals on the cover. But when I
say this, none of it actually 
means that much. What it really 
means is I'm a good storyteller.
That's what it means to be an 
advocate or an evangelist.
And when I mention the 
companies that I've worked at,
I worked at places that hire 
smart people. Some of the 
smartest people I know. Which 
maybe means that I'm smart. 
Maybe not.
But it means that people listen 
to me.
When I say chaos engineering is 
what you should do, they 
naturally adopt that.
Now that I've destroyed all of 
my credibility, let me talk a 
little bit more about chaos 
engineering.
I used to give these talks about
chaos engineering. About how we 
did them at Datadog.
There's a link right here on the
slide about how we did
chaos engineering at Datadog and
it has a bunch of fantastic tips
and how we walked through the 
process and everything that we 
covered. And it's really great 
tips.
But when I say how we did chaos 
engineering at
Datadog, well, the footnote for 
that is, that's the royal We.
Meaning it's actually how they 
did chaos engineering. How the 
engineers at Datadog did it.
Because, again, I'm just a 
storyteller.
So, now I'm working at Gremlin. 
I left Datadog at the end of 
last year. I joined Gremlin in  
January.
And I joined it particularly 
because I think chaos 
engineering is a fantastic 
practice.
Gremlin, if you're not familiar 
with the company, we build a 
chaos engineering platform.
We make it easy and safe for 
people to do chaos engineering.
Just a quick shoutout, though, 
if you want to get
more involved in chaos 
engineering, go to 
Gremlin.com/community. You can 
find a ton of great resources 
there.
Not only about chaos 
engineering, but about 
reliability and building SRE 
practices.
But one of the things about 
joining Gremlin
is that I was now responsible 
for chaos engineering.
I lead the internal chaos
engineering practices within 
Gremlin.
It is no longer how they do it, 
but how we do it. A quick 
rundown of chaos engineering.
A lot of people have heard of 
Netflix's chaos monkey. A 
program that would randomly 
destroy or take down servers.
I hate to tell you, that's not 
really chaos engineering.
Or at least not how you should
be doing it.
Chaos engineering is really 
thoughtful, planned experiments 
designed to reveal the 
weaknesses in your systems.
Both your technical systems, 
things like your user 
experience. Where that may be 
broken.
Or if your monitoring or  
alerting is working properly.
Or if you're building more 
reliable, resilient
systems, things like circuit 
breakers or retries or 
timeouts.
But also for our human services.
Things like our processes. Who 
is on call? How does the 
escalation process work?
Is there documentation or are 
there runbooks when there's an 
incident?
And it's pretty much the 
scientific experiments that 
we're used to or that scientific
process.
You start with a hypothesis 
about how your system is gonna 
work. Particularly in terms of 
reliability.
But that hypothesis could just 
be simply that
if your data store crashes, your
application fails. But at least 
you get some alerting on it. It 
doesn't have to survive. But 
start with a reasonable 
hypothesis.
And then experiment to test 
whether what you think you know 
is true.
After you run that experiment, 
you analyze the results and you 
want to share them.
Because as with all things 
science, sharing that knowledge 
is the important part here.
Helping everybody learn through 
what you're doing.
And finally, after you analyze 
those, you can start to iterate 
and improve your systems.
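The loop described above (hypothesis, experiment, analyze and share, iterate) can be sketched in a few lines. This is only a minimal illustration, not Gremlin's API or the process Jason's team actually runs; the `Experiment` class, the `run` helper, and the toy `system` dictionary are all hypothetical names made up for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Experiment:
    hypothesis: str                 # what you believe will happen
    inject: Callable[[], None]      # causes the failure
    verify: Callable[[], bool]      # True if the hypothesis held
    rollback: Callable[[], None]    # restores the system afterwards
    notes: List[str] = field(default_factory=list)  # to share with the team


def run(exp: Experiment) -> bool:
    """Run one experiment and record the outcome in exp.notes."""
    exp.notes.append(f"hypothesis: {exp.hypothesis}")
    exp.inject()
    try:
        held = exp.verify()
    finally:
        exp.rollback()  # always undo the failure, even if verify raises
    exp.notes.append("result: " + ("held" if held else "refuted"))
    return held


# Toy system standing in for a real application and its alerting (assumption).
system = {"datastore_up": True, "alerts": []}


def crash_datastore() -> None:
    system["datastore_up"] = False
    system["alerts"].append("datastore down")  # toy monitor fires an alert


def restore_datastore() -> None:
    system["datastore_up"] = True


exp = Experiment(
    hypothesis="if the data store crashes, we get an alert",
    inject=crash_datastore,
    verify=lambda: "datastore down" in system["alerts"],
    rollback=restore_datastore,
)
held = run(exp)
```

The notes list is the part that gets shared afterward; whether the hypothesis held or was refuted, the write-up is what lets the rest of the team learn from it.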
But I want to come back to those
unmet expectations.  Right? 
Those things that didn't go 
quite right.
And if you were me, well, it was
taking
that knowledge from Datadog
about how we -- or how
they -- did chaos engineering 
and bringing it back into 
Gremlin.
Gremlin is more like your 
company than Datadog was.
Datadog was a fantastic
place, but Gremlin when I joined
was like your company. We 
started with chaos engineering, 
or it was interesting.
But as we grew, a lot of the 
engineers had never actually 
practiced chaos engineering. 
They had never even used our 
product.
So, as we grew, we had to evolve
and start to adopt these 
processes. And redefine them.
And as the owner of that, well, 
my expectations didn't always 
meet reality.
So, I want to share the top 
three challenges that I faced in
rolling out chaos engineering or
evolving chaos engineering in 
Gremlin.
And the first is just a lack 
of time.
There's always a struggle 
between building new
features in a product and 
working on reliability.
And if we're spending two 
hours
or more to prep and run a 
GameDay and run reports, and 
multiply
that by your entire engineering 
team, that's a lot of time, which 
equals a lot of money. It's 
extremely expensive.
The second is that there's a 
lack of process. As mentioned, 
we had a lot of engineers 
joining the company.
And so, there wasn't clarity 
around where to start. What 
should you attack? How big 
should your attack be?
Really, what's the process?
The third was a lack of priority.
As we're
running GameDays and doing chaos
engineering, it can generate a 
lot of tickets.
As engineers, we're designed
to see all of the flaws, every 
minor detail of the product 
we're building and we can 
generate a lot of tickets.
The first
few GameDays, they would have a 
dozen or more tickets.
And the engineering managers 
would complain to
me, how do you expect me to run
another GameDay, you have 
generated so many tickets,
there's no way I can meet the 
current sprint, let alone meet 
my target deadlines.
We had to come up with creative 
ways to solve these three 
challenges. The first solution 
was to make the process smaller.
If you're using 90 minutes to two
hours to run
a GameDay, can you cut that 
down? What if it was 30 minutes?
Also, if it involved the whole 
team, can we make it
smaller by only having a few 
team members
do it and then  rotating through
them so the entire team gets 
experience? So, that was the 
first challenge. How can we get 
this down to 30 minutes?
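To make the time argument concrete, here is a back-of-the-envelope comparison in engineer-hours. It is only a sketch: the team size of 20 is a made-up number, and the three-person, 30-minute session stands in for the "few team members" idea above.

```python
def engineer_hours(people: int, minutes: int) -> float:
    """Total engineering time consumed by one GameDay session."""
    return people * minutes / 60


TEAM_SIZE = 20  # hypothetical team size, not a figure from the talk

full_gameday = engineer_hours(TEAM_SIZE, 120)  # whole team, two hours
mini_gameday = engineer_hours(3, 30)           # three people, 30 minutes

print(f"full GameDay: {full_gameday} engineer-hours")
print(f"mini GameDay: {mini_gameday} engineer-hours")
```

Under those assumptions a single full-team session costs 40 engineer-hours, while a small rotating session costs 1.5.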
Well, one way to do that is to 
make runbooks.
We're all used to  running 
runbooks or following playbooks 
when an incident happens.
Tell us exactly what to do, in 
what order to do it so we don't 
forget anything.
We don't have to think about all
the minor issues when we're 
dealing with an incident.
The same thing happens for chaos
engineering.
If we have a runbook, it guides 
people through the process. 
Makes it a whole lot easier to 
hit that 30 minutes.
That runbook
leads into the final 
solution, that is to limit the 
scope.
As mentioned, we see all the 
bugs and so we can generate a 
lot of tickets.
With a runbook, we can focus 
people in it to ask one 
question.
And that's, what's the one thing
-- just one thing -- that
you can do that would make the 
biggest impact on reliability?
Take that one thing, make a 
Jira ticket for that and 
prioritize that.
So, I took these three ideas and
I put them
together and created a program 
that I called
MINI GameDays.
Mini gamedays use the tools 
we're used to using. At Gremlin,
we use Donut.
It's a Slack bot to build in the
organization.
It's a water cooler effect. 
Introducing you to people 
outside
Engineering, people in HR or 
legal or  marketing.
The great thing about Donut, 
though, is it creates a little 
Slack channel for you and a few 
other people.
In this case, three random 
engineers from each team.
It automatically generates a 
Google Calendar based on the 
times that they're free.
So, it will suggest that, hey, 
next Thursday you all have 30 
minutes free.
This is a good time to do that 
mini GameDay.
The other thing it does is 
creates a Zoom link.
It automatically gives you a 
nice place to communicate as 
you're running the GameDay.
When it comes to that runbook, I
created a Google Doc, or a 
Google form for that.
The great thing about using a 
Google form rather than a 
document
of just text is it now allows 
people to answer
or to place information directly
in that run book.
This makes it a whole lot easier to
do the  reporting afterward.
When it comes to that run book, 
this is how we break things 
down. Again, we're shooting for 
that  30-minute time frame.
So, we want to spend 5 minutes 
assigning roles and selecting 
the scenario.
Those roles, as mentioned, 
correspond to three engineers.
The first is what we call the 
Game Master.
The Game Master is the person 
that will launch the attack,
evaluate for any abort 
conditions or any
problems that the end user might
be having and actually
call a failure on the test or 
cancel the test.
The Responder is the person 
that's gonna actually be playing
the role of the end user.
So, they're gonna be evaluating 
the application. They'll be also
in the monitoring tool.
So, they'll also sort of play 
the role of your SRE
or your Ops responder.
If an incident needs to be 
declared, they'll also
declare that incident and 
implement any sort of run book 
that you might have on hand. The
third person is the Scribe. And 
this person is really there to 
take notes.
You want to take notes as you're
going because it's much
easier to capture the 
information in the moment than 
to try to
re-create it and rebuild it 
after the fact.
When it comes to scenarios, we 
try to keep things extremely 
simple.
Again, coming back to the scientific 
method, you have a hypothesis.
Simply work to verify what you 
think you know.
Some examples of this are 
raising the CPU
by a certain percentage and 
verifying that
it can be seen in the monitoring
tool.
Or killing a pod or a VM and 
seeing if it automatically 
restarts.
And when it restarts, if it 
connects to the right services, 
the right data  stores, and 
handles sessions properly.
Or blocking access to a 
third-party dependency
and verifying that you're 
getting the right alerts or the 
right errors.
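The first scenario (raise CPU, then check that the spike is visible) can be sketched with the standard library alone. This is a hedged illustration rather than how Gremlin runs a CPU attack: `burn_cpu` and `cpu_spike_is_visible` are hypothetical helpers, and `time.process_time` stands in for whatever query your monitoring tool exposes.

```python
import time
from typing import Callable


def burn_cpu(cpu_seconds: float) -> None:
    """Busy-loop until this process has consumed about cpu_seconds of CPU."""
    start = time.process_time()
    while time.process_time() - start < cpu_seconds:
        pass


def cpu_spike_is_visible(read_metric: Callable[[], float],
                         cpu_seconds: float = 0.2) -> bool:
    """Inject CPU load and check that the metric moved by at least that much."""
    before = read_metric()
    burn_cpu(cpu_seconds)
    after = read_metric()
    return after - before >= cpu_seconds


# Stand-in "monitoring": the process's own CPU clock (assumption).
working_monitor = time.process_time


# A monitor that never changes, like a dashboard wired to the wrong host.
def broken_monitor() -> float:
    return 0.0


seen = cpu_spike_is_visible(working_monitor, 0.05)
missed = cpu_spike_is_visible(broken_monitor, 0.05)
```

The second call shows the failure mode worth catching: the load was real, but the metric you were watching never moved.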
After that initial 5 minutes, 
the next 20 minutes is
the bulk of the time spent on 
running the attack and observing
those effects. And it's pretty 
simple.
It will change for each 
organization, but the general 
gist of
it is, notify your Ops team or 
notify whoever is on call that 
you're running a GameDay. After 
you do that, you can start the  
attack. And then validate your 
monitoring. Ensure that you can 
see that attack.
Ensure that it's  throwing the 
right alerts or errors.
And if it is throwing alerts or 
errors,
then look for any run books or
documentation that would help 
you solve that issue.
So, it's pretty
straightforward and often times 
it doesn't even take the full 20
minutes.
The final 5 minutes is spent to 
create the ticket.
And again, this comes back to 
that prioritization.
We ask, what's that one thing 
--
just one thing -- that you could
do to make the biggest impact on
reliability?
You take that one thing, you 
create your Jira ticket.
You record your Jira ticket in
the form.
From there, the managers can 
prioritize things and 
everything's great.
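The runbook described above is essentially a small structured record: a 5/20/5-minute agenda, three roles, and exactly one ticket. A minimal sketch of that shape follows; the talk used a Google Form rather than code, so the `MiniGameDayReport` class, the field names, and the sample values (including the ticket key `REL-1`) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Agenda from the talk: 5 minutes setup, 20 minutes attack, 5 minutes ticket.
AGENDA = [
    ("assign roles and select the scenario", 5),
    ("run the attack and observe", 20),
    ("create the one ticket", 5),
]
ROLES = ("Game Master", "Responder", "Scribe")


@dataclass
class MiniGameDayReport:
    roles: Dict[str, str]      # role name -> engineer
    scenario: str              # the simple hypothesis being verified
    observations: List[str]    # the Scribe's notes, taken in the moment
    ticket: str                # the single highest-impact follow-up

    def __post_init__(self) -> None:
        missing = [r for r in ROLES if r not in self.roles]
        if missing:
            raise ValueError(f"unassigned roles: {missing}")
        if not self.ticket.strip():
            raise ValueError("record exactly one ticket")


total_minutes = sum(minutes for _, minutes in AGENDA)

report = MiniGameDayReport(
    roles={"Game Master": "alice", "Responder": "bob", "Scribe": "carol"},
    scenario="kill one pod and confirm it restarts",
    observations=["pod restarted", "no alert fired"],
    ticket="REL-1",
)
```

Keeping `ticket` a single string rather than a list is the point of the design: the form itself enforces the one-thing rule.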
Now, during this time, you're, 
of course, gonna think of those 
secondary and tertiary things 
that you want to solve. Hold off
on those.
Work on that one prioritized 
ticket, solve it,
run another GameDay to verify 
that it works, and if it
works, then your new one thing 
now becomes that secondary item 
from that initial GameDay.
And it allows you to work things
down without having to
juggle dozens of tickets and 
work on prioritization.
Some of the other tools that we 
use,
we use Slite for notes, it's 
like Google Docs, but we like it
a little bit better.
The searching tends to be better
for us and we use markdown. We 
use Gremlin for chaos 
engineering.
For us, the goal is to help 
dogfood the product.
Get familiar with our own 
users' experiences.
So, just to recap, a few of the 
takeaways. Focus on smaller 
GameDay teams.
It's nice to have the entire 
engineering organization doing 
GameDays. That's great.
But it's really hard to schedule
and it takes a lot of time.
So, focus on smaller teams.
And then rotate through your 
engineering org.
Second, focus on  shorter 
GameDays.
As mentioned, spending a lot of 
time is extremely expensive for 
your company.
You can follow the mini GameDays
and hone things down to 30 
minutes.
You'll make things a whole lot 
cheaper in order to
run GameDays, and you'll have 
huge gains in terms of 
reliability.
Thirdly, simplify your attacks.
It's easy for us to want to go 
out and go crazy and
reproduce our past incidents, 
have 
cascading failures, and just 
blow everything up.
But really focus on simple 
attacks that validate what you 
think you know. How does your 
monitoring work? What does 
alerting look like?
If you're auto-scaling, or you're 
doing retries, test that 
those work the way you think 
they work.
And finally, focus on a singular
outcome.
What's the one thing you could 
do that would make the biggest 
impact on reliability?
In short, start small, build a 
practice.
A few of the resources that I've
mentioned, running GameDay, 
again, that's the video
of how we did things, or how 
they did things at Datadog. It's
still got a ton of useful 
information. So, I highly 
recommend that you watch it.
A few more resources. Gremlin.
com/community. Again, that has a
bunch of information about chaos
engineering. But also about SRE 
practices and reliability. You 
can try Gremlin for free.  
Gremlin.com/free. You can sign 
up for a free  account.
It gives you access -- limited
access to Gremlin to run black 
hole attacks and CPU attacks.
And finally, if you're 
interested in joining the larger
chaos engineering community, 
we're hosting Chaos Conf again 
this year. That's October 6-8. 
Encourage you to join the 
conference there. It's 
completely free. Thanks again. 
As always. Feel free to
reach out to me.
So, we're going to have some Q&A
time after this and I'll be 
hanging out on the Discord 
channel.
But if you think about after the
conference, feel
free to email me, yj
jye se e. And before we end, I 
want to do one more thing. I 
have a lot of privilege.
I have been given the privilege 
to be on stage and talk to you.
I like to use any privilege to 
shine a spotlight on others 
doing amazing work.
In this case, the Southern 
Poverty Law Center.
splcenter.org.
They have been doing work, 
fighting the Klan and other 
groups in the US.
I highly recommend that you 
check out their website, check 
out their work and support their
work. Thanks.
Matt: Yeah, this is better. All 
right.
Before we go into our
last fireside chat, upcoming 
breakout talks.
You could go chat about crazy 
Terraform  things. Diagramming 
for DevOps.
Sounds like Dialing for Dollars,
I don't know. Kubernetes network
policy.
Licensing open source versus 
paid and on-prem versus SaaS.
So, in other words, like all the
different ways to spend money or
not.
And then Yakamel, quote, fun is 
what it says. So --
Sasha: All of these must be fun.
Matt: There we go. Hey, Jason. 
Thanks for  joining us.
You know, we kind of felt like 
watching
our last talk, we thought you 
were going to try to sell us a 
Mac or something.
You missed an opportunity for, 
you know, one more thing. Yeah.
So, let's talk about
some chaos.
Jason: Yeah.
Sasha: Yeah.
Matt: With regards to chaos 
engineering, a question that I 
think comes up is why would you 
break things on purpose?
Why, you know, aren't we trying 
to avoid system failure? So, 
like, yes, you're breaking it on
purpose.
But it's still  broken, right?
Jason: Yeah, yeah. Absolutely. I
mean, so that's a question that 
I get a lot.
I talk to customers, I talk to 
prospects.
And everybody's just like, you 
know, why should I break things?
They just break on their own.
And my number one response
is that when things break on 
their own, they
break on their own timing and 
they break on their own 
schedule.
And if you have ever been 
on-call, would you
rather have something break in 
the middle of the day when 
you've
got colleagues around and you 
can fix it, or would you rather
have them break at 3 a.m. 
and wake you up?
It's obvious that if we can 
control how we break things and
learn from them and resolve 
them, then hopefully they don't
happen in the middle of the 
night when we're off trying to
sleep or celebrate, you know, a 
party or something like that.
Sasha: Or, you know, be relieved
that the DevOpsDays Chicago is
finally over and we can all get 
a drink.
Jason: Exactly.
I guess that's the other thing, 
when they break on their own, you
might be out drinking.
I don't know if other people 
have done this, I have had times
when I was drunk and needed to 
fix something.
I definitely shouldn't do this.
Sasha: I have definitely done 
that.
I don't drink anymore, but I 
have definitely experienced that
and not a good experience for 
everybody.
Based on what you said, comes 
back to the experience of 
getting good at this, right? 
Practicing the process.
Kind of like a football 
team, right?
Practicing a process 
100 times so when the 
real stuff
hits the fan, you know what to 
do.
Jason: Absolutely. Everything we
do takes practice. You're 
building up those skills.
And then along the way, you're 
refining them as well.
It's not enough to say we did
a fire drill and practiced 
on-call.
Every time you run it, the run 
book is out of date
or not quite as informative as 
it needs to be. Here are the 
things that it assumed I knew. 
The state of the world that it 
assumed.
Now there's a new service that's
in play or other things like  
that.
Matt: So, kind of one of the 
things that comes
to mind is, you know, if you're 
only testing the things that
you've thoughtfully planned, 
doesn't -- does that limit
your discovery to only those 
things you could think of to
try so to speak, right?
Jason: Yeah.
And I think that comes back to 
in my talk, that whole idea of
unmet expectations: as much as
you try to plan for things,
you inevitably find things that 
don't match reality.
One of the things I love doing 
for chaos engineering, you think
your application works in a 
certain way, right? So, you have
services and you're like, cool. 
I know that I'm dependent on my 
data store.
But there's this other service 
and it's only like -- it's  
non-critical. It provides some 
extra functionality. If that 
dies, you should still be able 
to operate.
And so, you go and you just 
isolate
that from the network and 
everything falls over, right?
So, you plan for certain things,
and really it comes down to 
validating what you think you 
know and if that's true.
And oftentimes what we think we 
know isn't correct. Or not in 
the way that we think it works, 
right?
And so, there's sort of this 
misconception that chaos 
engineering is figuring out how 
things break.
And I think that's -- that's 
sort of the glass half empty 
approach.
The glass half full approach is,
how does this actually work, 
right? Like how does your car 
work?
Well, we know it runs on tires. 
So, you can't take out a tire.
But if you start ripping things 
out from under the hood, will it
actually work? What's necessary?
What's just part of the 
air-conditioning? You don't know
that.
And oftentimes
as engineers, we see where 
things might
break and points of failure, but
ultimately, unless we try them 
out, it's all conjecture.
Matt: Isn't that sort of the 
thing, right?
You have the hypothesis. But 
your hypothesis is that this 
will not break, right?
From our understanding of the 
system, if I shut
off this interface or throttle 
this, my hypothesis is that 
things will continue to work.
And I always like to think, if
your hypothesis is
not that, then you should not 
run that experiment.
Jason: Exactly.
Matt: If the hypothesis is 
everything will go terrible and 
I will lose a million
dollars, you should fix that 
because you know that.
Sasha: For the expectation of 
breaking things, back when I 
worked for a small cloud 
company.
We had an outage because a 
lightning bolt struck a data 
center. Okay. Starting off like,
what?
So, my particular product, we 
thought it was
distributed between data centers
except
it had a part that wasn't 
distributed.
That was in that particular data
center and that was the part 
that just got
completely throttled and we 
couldn't do anything else 
because that one piece wasn't 
working.
So, it didn't matter that the 
other parts were geodistributed.
And that's a learning experience
you can't really predict.
But so, the thing about
that is, is that you don't want 
to do that to your customers, 
right?
When we do red teaming, we never
break things that will affect 
customers on purpose. Right?
Jason: Yeah.
Matt: That's right.
I guess -- and maybe this is 
oversimplified.
But, again, my point is to test 
the hypothesis that I've 
actually built a reliable 
system, right?
So, in that case, you know, in 
the case
of your non-distributed system 
as it
turns out, you had a hypothesis 
that it was distributed.
Assuming that the lightning 
storm was a chaos experiment
, and by the way, that's 
impressive
. When is Gremlin coming out 
with electrical storm on demand 
is what I want to know.
But you are testing and you 
would find out that your 
hypothesis was incorrect and you
learned a thing.
But if your hypothesis was, if I
knock out this data center
, everything will go to shit, 
yeah, don't test that hypothesis
. Like get yourself to the point
that your
hypothesis is that your stuff is
reliable.
Jason: Yeah.
But you also bring up that good 
point of trying not to impact 
your customers, right?
So, there are a number of times 
where you're testing and you do 
know that it's a little bit 
risky, right? Something could go
wrong.
So, everybody always  thinks of 
chaos engineering, oh, I just do
this in production, right? Yeah,
eventually you want to get 
there.
But it's like, I mean, you don't
just like write new code and 
deploy it straight to 
production.
So, what makes you think that 
you should do the same with 
chaos engineering? Start in a 
development environment. Move 
into staging, move up to 
production.
Get comfortable with this rather
than just like YOLO Ops.
Matt: So, when we think about 
adopting the practices around
this, like, what
do you see as kind of the 
percentage or the split or
the focus of,  like, your 
infrastructure
eng versus your app Dev like 
your
SRE versus SWE kind of is taking
on these chaos experiments as an
ownership for people?
Jason: Yeah.
So, it's funny because I had 
always -- because coming from 
like
that DevOps-like background and 
most of us
have like some Ops experience. I
have Ops experience.
But I'm doing these experiments,
blocking network traffic and 
like spiking CPU. That's like 
infra stuff.
I always  assumed it's Ops 
engineers and SRE folks that are
super into this.
And the funny thing is, the more
I talk to people
, it's actually been more 
software developers
and software engineers.
And my hunch is
that's because as we've moved 
into a more DevOps
world and breaking down silos 
and we have  cross-functional
teams and essentially software 
engineers are owning
those services, they're more 
invested in that idea of what 
happens, like, yeah, my code's 
good. But like what happens if 
something else happens? Like how
does that affect my application?
And I think they're also more 
sensitive as we've gone into a
more service-oriented world of 
like what happens when
somebody else's service or some 
other like third-party service 
that I'm using, what happens 
when that goes down?
So, it's been an interesting 
thing to discover that more and 
more
software engineers are looking 
into chaos engineering.
Sasha: This is actually 
interesting because we went into
titles.
And so, I have been saying this 
for the past year plus.
Like, I want a job title of the 
chaos monkey. Like I want to be 
paid to break things on purpose.
To be that person.
Matt: But you want to be called 
the monkey. Not a chaos 
engineer, you specifically 
want to -- yeah. Just so we're 
clear.
Wouldn't you be a chaos chat 
bot.
Sasha: Only if I have 10,000 
followers. Yes.
But so, I will say, it's an 
interesting problem to solve, 
right? It's kind of like
red teaming again. It's kind of 
like breaking things.
What -- how can I break this 
system the most ingenious way? 
Like how can I think of that 
lightning bolt? Like that 
experiment.
And I know that now there's 
companies who actually do this 
on purpose.
But also there's definitely a 
part of this where it should be 
a
human job because, you know, 
we've got imagination on our 
side.
Jason: So, I would 
say that it's yes and no.
For most people doing chaos 
engineering, you don't want to 
do that super-creative stuff.
You want to think about your 
systems and start with the  
basics of like, does my 
monitoring work? How do you know
your monitoring works? I worked 
at Datadog.
It could be complete BS, like 
random number generator.
You actually have to play with 
your system and be like, this 
should spike CPU.
Look at your dashboards and does
it do that? The simple stuff is 
what you want to start with.
Getting creative and replicating
incidents is definitely 
something that you can get to.
But I feel like that's less 
about testing your systems
than it is about
having a safe place to replay 
the
incident and test your processes
and how are people going to 
react
and what did they learn how to 
react in an incident?
Matt: I think it goes to 
reliability and resilience.
What we're talking about with 
chaos engineering with the 
experiments
with the 
hypothesis, we're testing 
reliability.
Jason: Yeah.
Matt: Whereas the whacky weird 
shit that the chaos monkey wants
to do, that's resilience.
You're not going to build 
technology -- like your way to
adapt to Godzilla
comes and kicks over the  
transformer is the adaptive 
capacity. You do that with the 
sociotechnical system. The chaos
is not about finding a way to 
break.
It's proving your -- your 
hypothesis
that you've built a reliable 
system.
Sasha: So, actually, this kind 
of brought up a question in
my mind which is, is chaos 
engineering, the way
you see it, at least, is it 
about systems? Or is it about 
people? Like what are --
Jason: I think it's about both.
Sasha: The system is resilient 
or are you 
trying to train people to be 
resilient in a case of failure 
or something?
Jason: Yeah. It's definitely 
about both. When you're starting
out, it usually is about the 
system.
Because, again, you're just 
trying to verify that the thing
you've built works in the way 
that it's supposed to or that 
you think it's supposed to.
But as you do that, like running
GameDays
is a great chance to practice 
your incident command, right?
And to start to build up those 
human processes as well.
So, definitely is a both.
Matt: It's a safe place to 
basically create that
sort of psychology, 
physiological
association of incident 
response, right?
So, here's a kind of a little 
chaotic maybe.
Is why the hell do you 
organize so many
conferences, Jason?
Jason: Why do I organize so many
conferences? I get roped into 
things. I don't know. Like, and 
I'm an idiot. Like clearly.
Like, I -- that was my whole 
slide point was like, I don't 
know why people listen to me.
Like, I have this credibility on
my slide.
But like, I'm honestly just kind
of an idiot. I just happen to do
some things really well. And I 
like community. So, maybe that's
it.
Is I'm just like addicted to 
building communities because I 
love interacting with people.
Or I can just blame
Sasha and Ken for --
Sasha: For at least one.
You know, it's funny how I think
it was in Brian's video,
you join a happy hour once and
you find yourself
doing community work for 10 
years.
Matt: Well and Brian it wasn't 
even a happy hour, he just got 
confused.
Jason: I mean, some of it 
too, y'all are great people.
The people that I've met through
DevOpsDays, whether it's
at like DevOpsDays events or 
just like other
events are some of the most 
incredible people that I've 
ever, like, hung out with. And 
that I have had the honor of 
calling  friends.
When you get to hang out with a 
bunch of people and spend a 
bunch
of time with them and build 
communities
and build spaces where people 
can hang
out and like build new 
friendships, like that's 
super-rewarding. So --
Sasha: And I think personally to
me, like I know how, you know, 
DevOpsDays in particular had an 
impact on my career.
But also like I've had these 
moments over the
years where people come up to me
and they're like, oh, my god, if
it wasn't for this conference I 
would never have X, right?
And you have this, like,
sentimental like, you know, I 
want to cry right now and hug 
you forever.
Like, because you realize that 
what you do actually matters
and it's bigger than, you know, 
pushing a line of code into 
production.
Matt: I think conference 
organizing is really about being
a force multiplier, right?
We're  bringing together all 
these resources and stuff
and being able to sort of expand
it and share it
and do everything at a greater 
level than you would just by 
yourself. You know?
Sharing your thoughts on, you 
know,
how to chaos test which Pokemon 
team you should be on. Or 
whatever.
What do you think is the 
likelihood that whiskey
and chocolate will become a 
factor
in an updated release of Pokemon 
Go? Like maybe as drops or 
anything?
Jason: Probably not that. But 
definitely a factor in my life. 
So, yeah.
Sasha: Am I supposed to say -- 
sorry.
Jason: I don't know.
Did you receive the chocolate, 
though?
Sasha: I did not, actually.
Jason: You should check your 
mail.
Matt: So, hundreds of people 
listening to the livestream, 
we're going to
take a minute to talk about
logistics of chocolate from 
Jason to Sasha.
Talk amongst yourselves, 
infrastructure as code, or 
infrastructure nor code. Go 
discuss.
Do you want to get a tracking 
number from him or something?
So, I'm gonna ask you one more
serious question.
So, is it a -- this is coming 
from the chat -- is
it a goal to automate a 
completed experiment as a future
test, right? Like you're 
continually testing, you know.
Like how does automated chaos 
engineering work or for that 
matter not work?
And does this alleviate some 
burden on the humans in the 
system?
Jason: Yeah. Yeah. Absolutely.
So, I mean, we talked about 
before, is it -- is
chaos engineering more for 
testing your technical systems 
or your human systems? And I said both.
But obviously for the technical 
side, right, like you're an 
engineer.
And any time you do something 
manually more than a few times, 
you should automate it. That's 
just standard engineering. And 
so, it's the same with chaos 
engineering, right?
If I have
a hypothesis that if my service 
goes down,
it should automatically restart 
and be added back to the load 
balancing  pool, that's an easy 
thing to test for. You can start
to just do that.
As it rolls out, one of your CD 
pipelines is deploy to the chaos
environment, kill it, see if it 
comes back.
So, you can start to do a lot of
the basic stuff as
automated tests.
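The hypothesis Jason describes (kill the service, see if it restarts and rejoins the load balancing pool) can be sketched as an automated check. This is a minimal Python sketch under stated assumptions, not any real chaos tool's API: `Service`, `LoadBalancerPool`, `Supervisor`, and `chaos_experiment` are hypothetical names, and the "restart" is simulated in memory rather than run against real infrastructure.

```python
class Service:
    """Toy stand-in for one deployed service instance (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.running = True


class LoadBalancerPool:
    """Toy load balancing pool that tracks member names (hypothetical)."""

    def __init__(self):
        self.members = set()

    def add(self, service):
        self.members.add(service.name)

    def remove(self, service):
        self.members.discard(service.name)


class Supervisor:
    """Simulates whatever restarts and re-registers a dead service."""

    def __init__(self, pool):
        self.pool = pool

    def reconcile(self, service):
        if not service.running:
            self.pool.remove(service)
            service.running = True  # simulated restart
        self.pool.add(service)


def chaos_experiment(service, pool, supervisor):
    """Hypothesis: a killed service restarts and is back in the pool."""
    service.running = False        # inject the failure
    supervisor.reconcile(service)  # give the system a chance to react
    return service.running and service.name in pool.members
```

In a pipeline, a stage like this would run after the deploy to the chaos environment and fail the build when `chaos_experiment` returns `False`.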
Sasha: So, you're going to add 
on to your unit tests and now 
you have system failure tests?
I don't know what to call them.
Matt: You're not testing 
failure, you're  testing the 
condition.
You're not trying to make it 
fail, but, again, if the
thing you're testing is if the 
network connection is throttled.
Sasha: Is there a name for this?
Is there --
Jason: Yeah, a bunch of people 
have called it chaos testing, or
reliability testing. There's no 
industry-defined term.
So, you can just make up some 
silly name.
Chaos engineering in itself is
a silly name.
Matt: Please feel free to fight 
about made up words on Twitter.
We're going to come up with a 
new name for something that 
probably already has seven 
names. Yeah. I think that's the 
thought leadering that we're 
going to leave here with.
So, Jason, thank you for sharing
your thoughts on the
Pokemons and the whiskeys and 
the chocolates and
also the chaos. That was our 
last fireside chat for 
DevOpsDays Chicago 2020. We got 
one more thing to go.
We will be back at 4:20 Central
time for our second round of 
Ignites. Jason is a prolific 
Ignite speaker.
So, I'm curious to -- yeah, let 
us know
in the chat what you think
of the Ignites we have been 
rolling so far today.
Jason: Yeah, they have been 
great so far.
Matt: We'll see you all for 
Ignites.
Next: Ignites Round Two
Matt: It is time for round two 
of our Ignite talks.
Our Ignite speakers for this 
round
are Aaron Aldrich who is an 
organizer
for DevOpsDays Hartford, New 
York City,
Boston, probably any other East 
Coast DevOpsDays that needs it.
Amy Negrette who is an expert in
building
serverless apps and an awesome 
member of the
AWS Chicago user group.
And Brendan O'Leary, GitHub.
And Renata Rocha, get your 
French skills ready for her 
talk. And here we go!
Aaron: Hey, thanks.
I'm Aaron Aldrich and I'm gonna 
talk to you about how
90s band Cake has been singing 
about DevOps this whole time.
So, that's right, you can put 
30 years of
DevOps down instead of just 10 
if you're keeping track at home.
First thing, this talk has a 
playlist, there's a link on your
screen,
that's going to create a 
soundtrack or keep the window hit. 
Welcome to prerecorded talks. It
could be anything. But we can 
work with this.
This concept is not new; you
may have heard Nick Harvey
talking about Run DMC in the 
'80s, and this carries it forward.
I want to talk first, this quote
inspired this as I was talking 
about  observability.
It's true, she doesn't care 
whether or not the data center 
is actually on fire just as long
as you're still delivering 
value. Think about that.
It really doesn't matter to your
customers what the technical
details are of your failing, 
they just want to make sure that
they're still getting what they 
paid for. When I go to
Netflix, I don't care if the 
whole US East is down, I want to
play Umbrella Academy, I'm bored
right now.
This is true, we want to deliver
SLOs to be
sure customers are  getting 
value, not just technical stuff 
is working today, right?
And blame is a four-letter 
word.
This is the namesake: blameless
postmortems are better than 
pointing fingers.
You want to learn stuff instead 
of figuring out who done it.
What better example than the 
long line of cars in LA traffic.
There's no root cause or a 
single point of failure in this 
system. There's a lot going on. 
We're trying to get to where 
we're going at the end of the 
day.
We accept the gray areas and the
fixes and get it working.
At the end of the day, we need 
to just deliver the value
. It's not me to blame, it's not
you to blame,
it's all because of us, but stuff goes -- does all 
but stuff goes -- does all 
because of you and because of 
me.
So, maybe the lyrics don't work,
or maybe
they work better. This YouTube 
link, it's a
talk about doing blameless 
better by being aware of the 
blames we have. I will put it in
the talk resources.
We can talk about how, say, Satan 
is our motor. How many of you 
use Amazon Web Services? This is
a sea of raised
hands, but I know everyone is 
nodding. And we all use Amazon 
Web Services from Amazon.
We recognize they're not a great
company to give money to.
They have some maybe 
anticompetitive practices and 
maybe have forced
workers to work during a 
pandemic with little to no sick 
leave. Yeah, kind of a problem. 
But we use them anyway.
And we don't want to wonder if 
that's a mistake, we don't want 
to
worry if we're going to stay 
together, we just want to build 
the services we want to build. 
Think about vendor lock-in. This
was big. We want to build it, 
don't worry if it's going to 
work.
Serverless is big, don't care
about the underlying details, 
the systems to run it, I want to
write code.
If it's called and works, if you
want more, maybe more resources 
show up. I don't want to think 
about it. I just want to make 
it. But all of this costs money,
right? The more and more we do.
We have all these services that 
cost money, data has to flow in 
and out.
We have to hope that our VC 
funds hold out to
pay our Amazon bills to keep the
services up and running.
We have to go
to re:Invent and learn 
about the new things.
If you're having trouble with 
the bill, tell Corey
I sent you and maybe they can 
kick something my way. Would be 
great.
They nailed it with stick shifts
and
safety belts, bucket seats 
have got to go.
We don't want manual processes, 
we don't want to
toil to do something that a 
computer can do 
without thinking about it.
We don't have a safe service or 
how to do agile correctly.
We want to figure out, am I 
delivering value and are we  
getting the right feedback and 
are people getting the things 
that we promise? Doesn't have to
be prescribed.
Just build the connections and 
the relationships.
And think about certifications, 
these are highly prescriptive. 
We pass the test. We don't 
understand why we did in the 
first place.
They're sacrificial rites, 
that's the point.
Welcome to the DevOps cult, 
right? That's the whole idea.
Deploy your Jenkins pipelines
and Ansible 
scripts and you can be a 
DevOps. As soon as you're 
born, you start dying. Might as 
well have a good time.
You can find me, the details at 
the talk there, and me at the 
other things on the slide screen
as well. And I'm in the chat 
channel.
So, see you around.
Amy: Hello, thank you and 
welcome to serverless monitoring
checklist.
We're going to go over how to 
monitor a serverless 
application.
My name is Amy Negrette,
I have been building apps for a 
little over 12 years and it's 
important for
me to know that they work and I 
can prove it.
So, what is serverless?
It sounds buzzwordy, but it's a
cloud architecture using cloud 
parts instead of making new 
ones. The general monitoring goal 
is to know what an app is doing.
Is it up? Down? Over- or 
under-utilized? And how is it 
doing?
An important thing to know is 
that it is not testing.
Monitoring explicitly identifies
a behavior that can be acted on 
through metrics.
So, traditionally, all of these 
logs
can be found by bootstrapping 
the server to dump all the logs 
into one place.
The requests, failures, 
utilization, all of those
we can know immediately and know
where to find it. However,
when we do it serverlessly, what
we end
up having is it ends up going to
different places depending on 
the service. Whether or not it's
down.
We may not care if it's 
under-utilized anymore.
And the only people that end up 
knowing what it's doing are your 
devs.
So, if it's down, that means
a component has gone 
non-responsive,
preventing the full function of 
a service and it will go down 
with or without you.
What we actually want are no 
surprises.
Some services are never 
down, like queues and
ETLs, and it's okay that they're 
always able to be
accessed or written to or read 
from because they're managed by 
the cloud.
Some services are often down, 
like
FaaS, and that's okay because 
they scale to zero when 
they're not in use.
The problem is when they're 
meant to be sometimes down, but 
are always up, because then it's
cheaper to put it back into a 
server.
For the ones that are 
always up, if they're down, the 
permissions changed.
And it's hard because of this to
tell when something is actually 
down.
So, what you want to check for 
are data gaps and periods of 
maximum requests.
Basically, you either stop being
able to access a service or 
nothing
is going through.
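The data gap check Amy mentions can be sketched in a few lines. This is a minimal Python sketch, assuming the metric datapoints have already been pulled out of whatever log or metrics service is in use; `find_data_gaps` and `max_silence` are hypothetical names, not part of any cloud provider's API.

```python
from datetime import datetime, timedelta


def find_data_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive datapoints are
    further apart than max_silence."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each datapoint with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    # Made-up datapoints: one per minute, then eight minutes of silence.
    start = datetime(2020, 9, 1, 12, 0)
    points = [start + timedelta(minutes=m) for m in (0, 1, 2, 10, 11)]
    for gap_start, gap_end in find_data_gaps(points, timedelta(minutes=5)):
        print(f"no data between {gap_start:%H:%M} and {gap_end:%H:%M}")
```

For a service that is expected to sit idle, `max_silence` would need to be longer than the longest normal idle window, otherwise every quiet period looks like an outage.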
Whether or not something is over
or  under-utilized,
it really means how much the 
service is being used.
Since serverless is pay by 
usage and
you don't pay when it's idle, 
it's okay that it's going to 
stay idle for long periods of 
time.
The only question is you need to
know when the windows are so you
know when it's actually down.
And also, if things that can
sometimes be down are running 
all the time, then
it's probably easier to just put
that stuff back
into a server instead of a 
batched service.
So, what you want to check
for are data gaps as well as 
timeouts and memory issues.
As for what it's doing, this 
can be in any number of places 
depending on the service.
If it's network traffic, it's 
going to be held in a
separate place from your 
application metadata.
And with another service, you 
aggregate both of these types of
logs together.
If you use a third party 
service, it can give you
an overall picture, but you end 
up paying a premium price for 
it. Still, it
may be worth it to get 
human-readable reports. What do 
you check for?
Network traffic and its 
application behavior
as it's being logged through 
your log service as well as 
service health. Sometimes it may 
be the cloud's fault. God knows 
it's happened before.
So, whose job is it to make sure
all of these things get 
monitored?
The application engineer 
needs to tell DevOps what these
limits are so they can know when
things are down and when things 
are just idle.
DevOps needs to know what these 
limits are so
they can reconcile the 
infrastructure performance with 
the application performance.
Basically, if you touch it, it's
your  responsibility.
So, overall, monitoring does 
have the same  goals, it just 
ends up going to different 
places.
And because it goes to different
places, it ends up being a 
shared responsibility.
Brendan: Oh, hey, yeah.
Hi, my name is Brendan O'Leary 
and I have a theory.
I think that everything I've 
ever needed to know about DevOps
I could have learned from XKCD.
And in fact, it's still teaching
me today.
For those who don't know the 
comic, you can find it.
Can anybody think of a better 
definition of DevOps than 
romance, sarcasm, math, and 
language?
It's a good thing it's Creative 
Commons licensed.
I have been doing DevOps for 
years and going to DevOpsDays 
for many of them. I hope to 
bring you a new perspective 
today. Remember that DevOps is 
over 10 years old. The first 
DevOpsDays was in 2009. I double
checked, that is more than 10 
years ago. And this comic here 
is already two years old itself.
Yet sometimes it can still feel 
like a Rickroll when you go
to a talk or a meeting about 
DevOps and learn something. This
talk is, of course, excluded 
from that.
And while it's tempting to make 
up stories about
entering notes into a teletype 
or having
my first SSH key be in Morse 
code, it's critical to remember
the before times.
It wasn't that long ago where I 
was brought into
a deployment methodology that was 
an 11-page Word document 
that told
you what to copy where and in 
what order because that 
mattered. There are folks living
with inefficient processes.
And worse, some of the bosses 
think DevOps is something you 
buy off a shelf.
It's our job to carry on DevOps 
so we can all learn from the 
past.
What can XKCD tell us about 
DevOps? The most important is 
outside the box thinking.
If we're the ones carrying the 
DevOps flag in our organization,
we need to own this.
Now, my marketing department, 
the event organizers and
my mother want to make it clear 
that you don't code inebriated. 
Think different.
That's an Apple and a Microsoft 
reference in one slide.
It's vital to know that 
everyone is faking it until they
make it.
Think of conference talks
as the Instagram of DevOps.
Remember looking at them, you're
seeing the best possible version
of that person or company. And 
here's
the thing, I myself referenced 
these comics about Git all the 
time. And I work at a company 
with Git in the name. It's okay 
to not know everything. Everyone
has processes that aren't 
perfect.
And even once we think we have a 
process down, there may be another 
problem up or down the stack.
In that way, our jobs are never 
really done.
You never have DevOps or 
complete a DevOps transformation
with apologies to consultants 
who bill that way.
And no matter how well you plan,
the systems fail at some point.
It may not be your fault, but it
is your  responsibility.
Left-pad being removed, or the 
doomsday scenario here, makes 
sense.
Test for and plan for losing every 
part of the stack that you're 
responsible for and much that 
you're not.
We have to also recognize that 
DevOps isn't really the thing 
that DevOps is trying to 
optimize for. It's the developer
productivity. An example here 
comes from years ago.
I was at a small medical
software company and walking 
from the CEO's office
to mine, in between was the 
server room, the door was open 
and that's never good.
They had a screwdriver in a 
server;
they were fixing it because 
they didn't know what to work on 
next. I had a moment of clarity.
I don't want my team, our 
engineers, spending time on the 
tools we
use.
I want them to spend time on 
work for customers.
The first ten years of DevOps 
saw an
explosion of tools and things to
put screwdrivers in, either 
literally or in the AWS console.
The next was very different and 
we have seen that.
Just like our phones do a lot
more than in 2008 or 2009, 
organizations that have
acquired these tools spent work 
stitching them together, 
managing
the integrations between them, 
maintaining and
upgrading them.
It's not a random set of tools, 
but a DevOps platform.
DevOps tools, companies, and 
software are consolidating in much
the same way. Where does that 
leave us?
Well, if I could leave you with 
one thing, it's that there's 
always more to learn. And that's
okay.
I've learned a lot, even putting
together this silly conversation
with cartoons.
Did you know there's a web-based
for XKCD. This talk is not 
original.
Randall, the creator of XKCD, 
has done one or two 
himself. And lastly, you 
can learn about learning. It's 
important to understand we're 
all learning. That's what makes 
us human.
Just because someone knows 
something that you do
with you should feel lucky.
That's obvious to most of us, 
but it's critical on the other
side of the equation, we are all
learning. Thank you so much for 
having me.
Renata: 
Welcome to Ceci n'est pas 
une pipeline.
This is a talk where I'm going 
to convince you that CI/CD is 
not about a pipeline.
CI/CD is a much bigger concept, 
it's about ideas and cultural 
change.
Qui suis-je? 
I'm Renata Rocha, 
I'm from Toronto and I have 
been doing
DevOps for 10 years and systems 
operations for 20. Okay. What is
this?
This is about what is CI/CD, 
what is not
CI/CD, and most importantly, why
it matters.
CI/CD is a concept.
It's not the pipeline that you 
were trying to build. We are 
trying to get into that. And why
do we care?
It's because there are so many 
implications of CI/CD all over 
the place.
And people are unhappy with 
that. Okay. What is CI/CD?
CI/CD stands for continuous
integration/continuous delivery 
or deployment, there's a blurry 
line there. And again, it's 
about the culture, okay? And 
what is not CI/CD?
It's not the shell scripts that 
you run on your machine. It's 
not a cloud migration.
And also, not uploading a YAML
file to your Kubernetes cluster.
And why do you want it?
You want it because it increases
your velocity, it increases your
testability, and it gives you 
broader control.
No one wants to fear deploying 
to prod on a Friday. No one 
wants to fear not having 
control.
And you also want to know tools.
I get the question many times.
You want to know whether 
Jenkins, CircleCI, 
GitHub Actions, whatever, if 
that is the best CI/CD tool. You
are the person who can give that
answer to yourself. The best 
tool is the one that fits your 
own requirements. The best tool 
is the one that works for your 
business.
The best tool is the one that 
fits your own use case. Do your 
proper research and find the 
tool that suits your needs.
Okay. What about AWS?
It's another entire animal.
AWS offers tiny puzzle
pieces that you put all 
together and you can build the
custom CI/CD solution for your 
needs. But I want control! Yes. 
Everyone wants control. 
Micromanagement is not control.
Manual steps reduce control and 
they give you a false sense of 
control.
What you want is to control the 
automation and not the humans. 
You cannot fear the robots. The 
robots, they are our friends. 
They are here to help you.
The only thing that the
robots are going to do is to 
repeat the steps that you have 
told them to do.
Robots help you build better, 
faster, stronger and more 
reliable systems. Robots are 
good for you. Robots are here to
stay.
And if you embrace robots in 
your
systems, your systems are going 
to become faster and more 
reliable.
What about errors and bugs?
Everyone is afraid of having an 
error and bug make it into 
production.
First of all, humans are the 
ones
who introduce the errors 
and bugs.
Automated systems have errors 
the same as a manual system.
And if you automate, 
you will catch the errors 
sooner rather than later.
Test early, test often, and test
everything, you're going to have
way less errors.
The idea is that it's a cultural
change
and one does not simply CI/CD. 
You educate your team.
You make your team crave that 
change that is going to lead
you into a CI/CD state of mind. 
If you lead by example, if you 
tell your peers, if you
listen to people, the magic of 
CI/CD is going to happen.
You're going to have the best 
environment for your workplace.
But you want it! Everyone wants 
things right now. Take a deep 
breath. Exhale.
Plan, research, and software 
comes last. You have to do your 
proper research.
And last but not least, repeat 
after me. It's not a pipeline. 
It's much more than that.
CI/CD, it's a mindset.
If you welcome new ideas, CI/CD 
is the end result.
Merci beaucoup, I'm
Renata Rocha,
this was Ceci n'est 
pas une
Sasha: Hello and welcome 
everyone for the last time 
today. Thank you so much for 
being a part of this event.
An email is coming out
to registered participants and 
individual talks will be coming 
out soon.
Please subscribe to our channel 
on YouTube to be notified when 
we release them.
Matt: This wouldn't have been 
possible without the work and 
support of everyone involved.
Thank you to AV Chicago for the 
amazing production. Thank you to
our speakers. Thank you to our 
sponsors.
Thank you to our volunteer 
moderators.
And, well, you know, it's like
normally this is
when we would bring all the 
organizers up on stage. But 
maybe we can.
Everybody wave!
Sasha: Smile and wave.
Matt: Smile and wave. And thank 
you most of all to you.
All of you who have been 
participating and watching our 
stream today and
being part of this very
special first virtual DevOpsDays
Chicago.
Sasha: DevOpsDays is so 
important to this entire 
organizer team.
We felt that it was more 
important than ever to create a
space for this community to 
connect.
Even if we had to do this 
differently.
So, thank you so much to 
everyone who
tuned in to the stream and 
everyone that
participated on Discord for 
being a part of the event.
≫ See you all next year!
