hi everyone welcome to today's webcast
we're going to give people another
minute or two to join the line
this has been a popular one so I want to
make sure everyone gets to start at the
beginning in the interim I'll note that
the chat panel which is in your
GoToWebinar control panel the bottom
most tab the chat panel we've actually
put two links in there so this is
totally optional but if you'd like to
download the training data set that
we'll be presenting today and the threat
hunting notebooks that we've built you
can actually do that using the URL in
the chat you'll go to Graphistry's
website and they have a blog post where
if you scroll to the bottom you'll find
links to download the data and the
threat hunting notebooks and you can
follow along again optional not required
but if you'd like to follow along
secondly there is a Slack channel a
dedicated Slack channel and Graphistry
and Corelight will be hanging
around after the webcast ends on that
Slack channel
to help out anyone who's tinkering
with the data or the notebooks if you'd
like to do that so that slack join link
is also in the GoToWebinar chat panel at
the bottom right and again we'll get
started in just a minute or so we're
going to give people a little extra time
to join
alright let's get started with the
webcast welcome today we're going to be
talking about threat hunting and we will
be presenting three data science
notebooks to help you find bad actors in
your network logs today's webcast is
presented by Corelight and Graphistry
I'm John Gamble Director of Product
Marketing here at Corelight and just a
quick housekeeping note before we get
started for those of you that have just
joined since I made this announcement
about a minute ago there are two links
in your chat panel at right
that's the GoToWebinar chat panel so if
you click the chat panel and expand it
you'll see two links one is to a
Graphistry blog post which at the
bottom has download links if you're so
inclined and would like to download the
training data and the threat hunting
notebooks to follow along not required
but if you'd like to do that those links
are available and there's also a link to
a slack channel pasted in that chat as
well which after the webcast ends we'll
be hanging around and can help people
that are tinkering with the data or
threat hunting notebooks or any
follow-up questions that we didn't have
time to answer so again those links are
in your GoToWebinar chat panel if you
would like to do that let's get started
the agenda today we'll be taking a quick
look at the tools that we're going to be
using just so we can context set
everybody and make sure everybody's on
the same page and understands the high
level process of what's happening and
then we're gonna spend most of the time
in the demos and notebooks we'll be going
through three specific scenarios showing
you how to hunt through SSL traffic for
suspicious behavior a second scenario
will be looking for insider threat
indicators in NTLM and SMB traffic and
the final and third scenario will be
hunting in malicious or hunting rather I
should say in DNS traffic looking for
malicious activity and we'll close with
a Q&A I would encourage you to ask
questions throughout and if we have time
we'll try to pause in real time and
clarify any questions about the data
you're seeing on screen but we have a
lot of material to get through so we'll
probably try to save most of the Q&A for
the end but you can use the
control panel to enter your questions as
well we will see them and we will get to
them as we can and if we can't we'll
follow up afterwards with you or in that
Slack channel that I mentioned today's
speakers you guys are in for a treat
we've got some real expertise on the
line today we've got Richard Chitamitre
of Corelight
prior to Corelight Chet
worked at Edward Jones as a senior
security analyst and before that spent
over a decade serving in the US Navy in
a number of cybersecurity roles
including work in the TAO group
which if you're familiar will let you
know that Chet has great
expertise when it comes to threat hunting
and cybersecurity operations and our
second panelist today is Leo Meyerovich
CEO and co-founder of Graphistry you
know Leo's data science expertise and
graph analytics expertise is just superb
and he's going to show you a number of
really incredible capabilities that the
Graphistry platform has in terms of
visualizing the data that Corelight
generates and not only is he the CEO and
co-founder of Graphistry but he's
done a lot of award-winning research
including hardening JavaScript security
policy verifiers and helping with the
first reactive web language research
around that so without further ado these
are the tools we're going to be using
today so threat hunting requires data
you're searching through the data
forming hypotheses but you need that
data and the data that we're using today
is Zeek logs Zeek is an open source network
security monitoring framework that takes
your raw traffic and turns it into rich
protocol organized logs Chet will be
showing you some data query screenshots
we'll be using Splunk today but you could
certainly use any SIEM tool or data
storage analytics tool that you like to
conduct raw queries but really we'll be
spending most of our time in
Graphistry looking at the rich
visualizations and interrogations of the
data that the Graphistry platform can
present following at home again as I
mentioned at the beginning of this
webcast there is a dedicated slack
channel it's pasted in the GoToWebinar
chat panel if you'd like to join while
the webcast will end at noon pacific
time we'll be hanging around in the
slack channel afterwards for anybody for
hands-on QA and any further help you
might need and the training data and the
threat hunting notebooks and queries
you'll see in today's presentation are also
available at that URL you see on screen
also pasted in the GoToWebinar chat
panel so you can download the data and
the threat hunting notebooks yourself and
experiment and follow along and a
recording of this broadcast will be made
available we'll be hosting it on our
YouTube channel it'll be available in
other places as well so if you'd like to
share this with colleagues or watch it a
second time and do the walkthrough
exercise yourself that's going to be an
option just a context set on Zeek
because we don't want to presume
intimate knowledge on the part of the
audience of the Zeek framework
there's a lot to learn but here's the
high level thing because if you don't
understand this some of the queries you
see might not make sense Zeek is an
interlinked logging framework it's a
connection-oriented logging language so
basically it looks at network sessions
connections and generates logs and
stamps those connections with unique IDs
so you can pivot across the protocols so
if someone establishes a connection a
connection log gets generated if there's
an HTTP session an HTTP session log
gets generated with that unique
identifier connected to the session so
security analysts can pivot
through that and see the full spectrum
of activity on the wire so you're going
to be seeing Chet and Leo make these
kind of pivots in the raw query and
in the Graphistry platform that's
what's happening behind the scenes
that's the power of the Zeek platform
it allows you to make these fast pivots
across things like hashes or timestamps
or that unique identifier that's
associated with connections
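The pivot described here, where every log written for one connection shares the same unique ID, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records are hand-made stand-ins for Zeek log entries, and `pivot_by_uid` is a hypothetical helper, not part of Zeek or any tool shown in the webcast.

```python
from collections import defaultdict

# Hypothetical, hand-made records standing in for Zeek log entries.
# Real logs carry many more fields; "uid" is the shared connection ID.
conn_log = [
    {"uid": "CAbc1", "id.orig_h": "10.0.0.5", "id.resp_h": "93.184.216.34", "id.resp_p": 80},
    {"uid": "CDef2", "id.orig_h": "10.0.0.7", "id.resp_h": "198.51.100.9", "id.resp_p": 443},
]
http_log = [{"uid": "CAbc1", "method": "GET", "host": "example.com", "uri": "/"}]
ssl_log = [{"uid": "CDef2", "version": "TLSv12", "server_name": "example.org"}]


def pivot_by_uid(logs):
    """Group records from several logs under the uid they share."""
    by_uid = defaultdict(dict)
    for log_name, records in logs.items():
        for record in records:
            by_uid[record["uid"]].setdefault(log_name, []).append(record)
    return dict(by_uid)


sessions = pivot_by_uid({"conn": conn_log, "http": http_log, "ssl": ssl_log})
for uid, related in sessions.items():
    # Show which logs recorded activity for each connection.
    print(uid, sorted(related))
```

Starting from any one record, the uid is enough to pull every other log line written for the same connection.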
let's jump into it scenario one we're
going to be looking at hunting for
suspicious activity in encrypted traffic
streams we picked this use case because
as you know network traffic in a lot of
corporate environments is increasingly
encrypted but so are the attacks the
attackers are also encrypting their
traffic whether that's encrypting their
c2 communications you know encrypting
the files that they're using to kind of
install and get placement on devices
there's a lot of encryption in attacks
as well and so where breaking and
inspecting isn't an option how do you
how do you tackle this threat well as
you'll see in a couple
minutes there's actually a lot
of insight you can generate without
breaking and inspecting the data but you
know this is a real issue today not only
getting visibility into encrypted
traffic but actually addressing
encrypted threats and this study that
was done by A10 Networks shows that
41% of organizations reported some
cyberattacks that were using SSL
encryption to evade detection so you
know your inline detection solution
like a firewall
or an IDS/IPS is not going to be
able to give you information on
encrypted traffic and threat alerting
you're going to have to go somewhere
else where's that somewhere else well
Zeek logs so Zeek generates a number of
logs the ones in particular we'll be
using for this hunt are the SSL log the
x.509 log which gives certificate
details we won't be touching on the SSH
log today but that also gets generated
as well these logs are really rich they
provide a bunch of great detail things
like the expiration date of the
certificate so you can see soon to
expire or already expired
certificates being used which may be an
indicator of risk you can actually hash
some of the SSL connections which Chet
will talk about more later Chet did you have
anything to add to these logs
definitely another
thing you can also use is the connection
log because the data is
encrypted you can also see data based
off of producer consumer ratios of you
know which direction traffic is flowing
inside this encrypted channel yeah great
point
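The producer consumer ratio mentioned here can be computed from byte counts alone, so it works even when the payload is encrypted. A minimal sketch, assuming the commonly used definition (originator bytes minus responder bytes, divided by their sum) and records with `orig_bytes` and `resp_bytes` fields as in Zeek's connection log; the sample values and the 0.5 threshold are made up for illustration.

```python
def producer_consumer_ratio(orig_bytes, resp_bytes):
    """Return a value from -1.0 (pure download) to +1.0 (pure upload)."""
    total = orig_bytes + resp_bytes
    if total == 0:
        return 0.0  # no payload in either direction
    return (orig_bytes - resp_bytes) / total


# Hypothetical connection records, not taken from the training data set.
connections = [
    {"uid": "C1", "orig_bytes": 900, "resp_bytes": 100},     # mostly sending
    {"uid": "C2", "orig_bytes": 2_000, "resp_bytes": 98_000},  # mostly receiving
]

# Illustrative threshold: flag connections that mostly push data outbound.
mostly_outbound = [
    c["uid"]
    for c in connections
    if producer_consumer_ratio(c["orig_bytes"], c["resp_bytes"]) > 0.5
]
print(mostly_outbound)
```

An internal host that mostly uploads over an encrypted channel is worth a closer look, since ordinary browsing tends to be download-heavy.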
the master log the NetFlow log on
steroids is that connection log which
unites all connections it can be used of
course in this context all right Chet I
want to turn it over to you and Leo so
Chet why don't you get started
you might be on mute oh yes
sorry so when we're talking about like
looking at data when you're doing a hunt
you always have to form a hypothesis and
your hypothesis is only as good as
the data that you're being provided so
when we look at it we're looking at data
on the network so one we'll look at
whether attackers are routing traffic
through SSL to evade detection and then
looking at whether or not they're you
know using self-signed certificates so
certificates that are you know
generated on the fly so that they can
try to encrypt that traffic because a
lot of times when they're doing this
type of data exfil they're not going to
have an actual certificate authority out
there on the web providing them good
certificates to use so one thing to look
for is to investigate self-signed
certificates looking at the generation
of like expired or soon-to-be expired
certificates so looking at when it was
issued and then how long it's you know
good for and then also looking at certs
using outdated TLS versions and then
also keeping an eye out for you know
weird canonical names you know
certificate authorities or you know
issuers things of that nature
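The certificate checks listed above can be written down as a small set of flags. This is a sketch under assumptions, not the queries used in the webcast: the dictionary keys are hypothetical names loosely modeled on certificate fields, "self-signed" is approximated as subject equal to issuer, and the 30-day window for "expires soon" is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def certificate_flags(cert, now, soon=timedelta(days=30)):
    """Return the heuristic flags that apply to one certificate record."""
    flags = []
    if cert["subject"] == cert["issuer"]:
        flags.append("self-signed")  # approximation: issuer signed its own cert
    if cert["not_valid_after"] < now:
        flags.append("expired")
    elif cert["not_valid_after"] - now <= soon:
        flags.append("expires-soon")
    return flags


# Fixed reference time so the example is reproducible.
now = datetime(2020, 1, 1, tzinfo=timezone.utc)
certs = [
    {"subject": "CN=odd.example", "issuer": "CN=odd.example",
     "not_valid_after": datetime(2019, 6, 1, tzinfo=timezone.utc)},
    {"subject": "CN=www.example.org", "issuer": "CN=Example CA",
     "not_valid_after": datetime(2020, 1, 15, tzinfo=timezone.utc)},
    {"subject": "CN=www.example.net", "issuer": "CN=Example CA",
     "not_valid_after": datetime(2021, 1, 1, tzinfo=timezone.utc)},
]
for cert in certs:
    print(cert["subject"], certificate_flags(cert, now))
```

Each flag alone is weak evidence; the hunt gets interesting when several apply to the same certificate.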
next slide please so when I'm looking at
a query I kind of wanted to look okay
what is going on with SSL so one of the
things that I wanted to look at is a lot
of data sets in terms of the unique
identifiers you know who originated it is
the connection outbound to 443 or to
anything other than 443 but then also
looking at the TLS version so if you're
familiar TLS 1.3 is going to come
out we had SSL version 3 which is the
oldest that is known on the Internet
being used for routing very little we have
1.0 which has been proven to be
easily broken and then we have 1.2 which
is more current outside of TLS 1.3
next slide please
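The two questions asked of each SSL record here (is it going to a port other than 443, and is it using an old protocol version) can be sketched as a simple review function. The version strings follow the style Zeek writes in its SSL log (for example `TLSv10`), but treat the exact values, the field names, and the sample records as assumptions for illustration.

```python
# Protocol versions treated as outdated in this sketch.
OUTDATED_VERSIONS = {"SSLv2", "SSLv3", "TLSv10"}


def review_ssl_record(record):
    """Return notes on why one SSL log record deserves a second look."""
    notes = []
    if record["id.resp_p"] != 443:
        notes.append("non-standard port")
    if record["version"] in OUTDATED_VERSIONS:
        notes.append("outdated protocol")
    return notes


# Hypothetical records, not taken from the training data set.
ssl_records = [
    {"uid": "C1", "id.resp_p": 443, "version": "TLSv12"},
    {"uid": "C2", "id.resp_p": 8443, "version": "TLSv10"},
    {"uid": "C3", "id.resp_p": 443, "version": "SSLv3"},
]
for record in ssl_records:
    print(record["uid"], review_ssl_record(record))
```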
so one of the things that I wanted to
look for was just validation status
looking at also versioning as well just
to get an idea of like what's going on
so as you can see I can kind of get a
good count of what's going on in the
network and looking at you know what's
running SSL version 3 is it updated you
know TLS 1.0 is there anything in
my network that still uses that
because if there is then I need to either
start investigating for you know
potential exploits or anything
of that nature and then also talk to my
certificate authority guys internally
the cert managers to say hey why
haven't we upgraded to you know 1.2 or
1.3 yet next slide please
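The count shown on this slide, validation status broken down by protocol version, amounts to a group-and-count. A minimal sketch in plain Python, assuming records with `version` and `validation_status` fields; the status strings and sample rows are illustrative, and the webcast itself ran this kind of count as a Splunk search.

```python
from collections import Counter

# Hypothetical SSL log records, not taken from the training data set.
ssl_records = [
    {"version": "TLSv12", "validation_status": "ok"},
    {"version": "TLSv12", "validation_status": "ok"},
    {"version": "TLSv10", "validation_status": "self signed certificate"},
    {"version": "SSLv3", "validation_status": "certificate has expired"},
    {"version": "TLSv10", "validation_status": "self signed certificate"},
]


def count_by_status_and_version(records):
    """Count records per (validation_status, version) pair."""
    return Counter((r["validation_status"], r["version"]) for r in records)


counts = count_by_status_and_version(ssl_records)
for (status, version), n in counts.most_common():
    print(f"{n:>3}  {version:<7} {status}")
```

Small counts in the old-version rows are the ones to chase, either as cleanup work for the certificate managers or as a lead for the hunt.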
so with this I was kind of looking okay
well let's look at self-signed and TLS
version 1.0 because if I'm looking at
those things that are more current now I
have an issue because let's say that I
already did some cleanup in my network
and nothing should be self-signed the
certificate authority is already managed
and I'm starting to see some you
know certs on the older version of
TLS so when I started doing the
query I've been looking at okay what
actually is running inside my company
and as you can see I'm seeing you know
something with Obama at us and
unfortunately it's looking like it's
coming from the Gaza Strip so now that's
kind of a good indicator of like okay
this is an encrypted connection let's
start doing further investigation next
slide so one of the ways to do it is
using you know Salesforce's JA3
script and what's nice about it is that
this will help you kind of identify
things that are abnormal in your network
by one fingerprinting the stuff
that you know is really good and building
a whitelist against it but then also doing
things like finding things like this
Obama certificate and then
fingerprinting that to see if it's
starting to communicate out you know
granted it might be good for now
potentially it's not doing anything
harmful but as you notice it through
your network now you can start alerting
and notifying and looking at what's
going on and then I would like to pivot it
to Leo for more information inside
Graphistry great thanks Chet Leo I'm
gonna give you screen sharing permission
here great thanks John and uh thanks
everyone for attending today this is a
pretty fun one for us all right
great
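The JA3 fingerprinting Chet mentioned a moment ago boils down to hashing a handful of fields from the TLS client hello. The sketch below follows the published JA3 recipe as I understand it (five comma-separated fields, values within a field joined by dashes, then MD5), but the input numbers and the allowlist are made up, and in practice the JA3 script computes this for you rather than code like this.

```python
import hashlib


def ja3_fingerprint(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from client hello values and return it with its MD5."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in group) for group in groups]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Hypothetical client hello values, chosen only for illustration.
ja3_string, ja3_hash = ja3_fingerprint(
    version=769, ciphers=[47, 53], extensions=[0, 10, 11],
    curves=[23, 24], point_formats=[0],
)

# An allowlist of fingerprints already known to be good in this network.
known_good = {ja3_hash}
unknown_hash = ja3_fingerprint(769, [49162], [0], [23], [0])[1]
print(ja3_string)
print("new client seen:", unknown_hash not in known_good)
```

Because the fingerprint comes from how the client negotiates TLS rather than from the payload, the same client software tends to produce the same hash wherever it connects.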
and so I should have just gone
fullscreen for everybody and so what
Chet was just sharing with Splunk is
probably familiar to a lot of folks on
the call who maybe are using Splunk
or ArcSight or
one of the new cloud SIEMs but some
folks may have also started to notice
that while that's pretty good in
kind of an independent setting some
teams are starting to adopt something
called data science notebooks and so
before I jump into that
exact example I wanted to introduce
what's going on here and a bit about why
these things are starting to get adopted
so what I'm showing here is just one of
the one of the data science notebook
environments out there this is called
Google Colab if you have a Google
account you can get it if you just
launch Graphistry you would
actually get something else called
Jupyter Jupyter's
probably the most popular and
that's always free and you can download
it for yourself and what's more it's
generally getting adopted by data
scientists developers a lot of advanced
analysts especially in environments
where you're doing like collaborative
analysis or rapid prototyping things
like that so before we think
about going into like exactly what that
means I just want to kind of give you a
feel for it so a notebook is kind
of like a Google Doc so you can have
like a URL and the fun thing is it
actually interleaves code and results
and the code could be any language
generally it's Python so here we just
have for example the beginning just sets
some credentials if you're
following at home well you'll be
able to see you could always just write
a bit of code if you have an
error it will complain but if you write
good code you can see also I was able to
edit the code and then we get the result
here and the power starts happening
essentially from just like an individual
level where you can for example install
SDKs actually just to be clear on that
just hit shift enter to run code or
there's always these run buttons but the
power starts to be that you can actually
interface with different databases we'll
be using Splunk today you can
actually do fusion so often you might
have let's say security logs in Splunk
and account data in SQL and then you
can actually start applying libraries
like for data analytics such as with
Graphistry maybe some machine learning
maybe you have some automation and you
can actually turn these into automations
and so
in this case when we focus on
hunting the notebooks actually
end up getting used in roughly two ways
one is just as an independent
analyst you might build up some
different techniques and kind of go
off-roading and play around but then as
you become a hunting team you want to
start adding methodology to your
organization we're seeing people
essentially build up arsenals of
notebooks you know not just for past
hunts but actually for different ways of
looking at different kinds of data so if
you're looking at DNS data if you're
looking at Windows logs just starting to
build up different kinds of
plays and making them reusable and
automatable through notebooks if you're
following along all you need to do is
put in the creds if you don't have a
Graphistry key go to the Slack we
updated the blog post where you can get
it and then if you have your own
Splunk put the logs in there
with the index being I
think Corelight tutorial I'm not
going to go through all that here I just
want to jump into the actual example so
kind of like what Chet was doing so
for the first hunt we're gonna look at
the encrypted traffic
particularly in this case from
the perspective of TLS and it feels kind
of similar to Splunk but this
is a little worse than Splunk where
you know you don't get the syntax
highlighting but the basic
idea is when we run the search we'll
just get the search result and then
turn it into something called
Python pandas which is basically data
tables that let you kind of manipulate
them pretty easily and that's about all
you really need to kind of get going
with notebooks and including with
Graphistry so for the case of this
example instead of doing one hunt at a
time
our focus with a lot of the
analysis today is instead of
essentially hunting and pecking could we
actually get a whole bunch of stuff and
see it together and then kind of more
quickly and effectively look at the
different dimensions of data so in
this case what that meant for example is
we just stuff all of the related
heuristics like expired
certificates things like that
into the search we're happy to take
let's say 50,000 results in this case we
got 5,000 results and then that's what
we're going to look at and as John was
noting we're focusing on connection logs
so as soon as we have a data table the
one of the big tricks with graph
analytics is actually pretty much
any table any CSV there's a useful graph
living in there where essentially
we can just start correlating different
dimensions and so one of the things
for Graphistry well we'll get to
in a second about the actual
visualization but one of the very
powerful things is that we can take any
they're called data frames we could
take any data table and then pick for
example hey I want to see the IPs mapped
out and I'm kind of curious for example
with the certs like the issuers and
subjects because I want to see if
there's any that kind of look funny
and I just want to see them all together
and I want the tool to sort of
cluster them together for me maybe I
want a bit of control over how
they're linked or I just
trust the tool maybe I override the
colors things like that so in this case
as soon as we have that data table
instead of kind of going through
line by line and likely
missing things we're gonna put it
together and let technology work with us
so basically I'd hit shift enter here I
would get that visualization this could
be a little bit of a small screen here so I
already popped it out in a new window
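The idea that any table has a graph living in it can be illustrated without any special library. In this sketch each chosen column value becomes a node, and two nodes are linked whenever they appear in the same row (event), with the edge weight counting how many events they share. This is plain Python for illustration only, not Graphistry's API; the column names and rows are made up.

```python
from collections import Counter
from itertools import combinations


def table_to_graph(rows, entity_columns):
    """Turn table rows into nodes (column, value) and co-occurrence edges."""
    nodes = set()
    edges = Counter()
    for row in rows:
        entities = [(col, row[col]) for col in entity_columns if row.get(col)]
        nodes.update(entities)
        # Link every pair of entities that share this event.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return nodes, edges


# Hypothetical search results, one row per event.
rows = [
    {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.9", "issuer": "CN=example-ca"},
    {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.9", "issuer": "CN=example-ca"},
    {"src_ip": "10.0.0.7", "dest_ip": "198.51.100.4", "issuer": "CN=example-ca"},
]
nodes, edges = table_to_graph(rows, ["src_ip", "dest_ip", "issuer"])
print(len(nodes), "nodes,", len(edges), "edges")
```

Entities that share many events end up with heavier edges, which is what lets a layout or clustering step pull related activity together.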
and so for this first
investigation we have that exact same
data table except now it's turned
into a graph and in this case what we
have for the graph is for our nodes we
have like I said IPs certs
things like that and edges are whenever
there's an event in common so if I
zoom out here we start to see that it
all got linked together it's a bit of a
mess so I might do something like for
example run a clustering algorithm and
so when we find for example you know two
IPs have a lot of events between each
other they're gonna get naturally pulled
together and then if they have nothing
to do with one another they'll drift
apart and so we start to see a bit of
structure here and then Graphistry has
all sorts of tools for understanding
that kind of thing and so we can
actually understand that
we don't need to look at all 5,000 events
we could actually see that there's
basically different areas of
incidents this still is a lot and so
what's really important for
Graphistry and other types of visual
analytics tools is to be able to quickly
work across different dimensions of the
data kind of do quick thought
experiments you don't have to go back to
Splunk and rewrite code things like that
we just take a bunch of data and play so
in this case for TLS and the particular
heuristics we're using we
have like two really interesting levers
the first thing is the version and so
one of the columns of our data that we
got back from Splunk was the TLS version
so for example like a 1.2 here we have 1.0
here and a TLS 1.3 here and so for the
network we're looking at we only saw
a little bit of TLS 1.3 which is good
that's the clean stuff so
it's probably a lot of benign stuff that
the heuristics didn't catch but we were
probably more curious about for example
the really old versions the
other thing we have here is actually the
validation status this also was
driving a bunch of our heuristics
these bars are kind of low so I'm
actually gonna amp it up a bit and so we
have for example expired
certs
and self-signed certs so I'm gonna start
playing with the data first thing I'm
gonna do is actually we can always color
on the fly so I'm actually gonna color
based on the validation status so if
we have any
activity like a coordinated activity
it's probably using the same certs and
so they'll be sort of related and so here
for example in red we have all of the
expired certs and here we have all the
self-signed ones and the other part is
again we probably want to facet
on let's just start
off now and organize the investigation
let's just focus on the really old TLS
and so for the really old TLS we see we
got a bunch and kind of like a good
combination of heuristics here is
old TLS and self-signed certs
it's just like a pretty conservative
thing so I just clicked on them the tool
created those filters for me
I could always back them out and now I
have a lot less stuff on screen we
actually see there's now a lot of
entities that don't have any events kind
of saying that they're
interesting so we
actually knock out all the ones that
don't have any events matching this
criteria now we've actually really
reduced the screen here so now
we've actually gone down to just
ten entities of interest and 30 edges
and I could do things like for example
this is a little messy so I might run
the clustering algorithm again just to
clean it up a little bit and so
now we can kind of take a look into
what's going on
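The two steps just shown, filtering events on a combination of heuristics and then knocking out entities with no matching events, can be sketched in plain Python. The field names, status strings, and sample events are assumptions for illustration; in the demo this was done by clicking on histogram bars rather than writing code.

```python
def filter_events(events, predicate, entity_columns):
    """Keep events matching the predicate and the entities they still touch."""
    kept = [event for event in events if predicate(event)]
    # Entities with no surviving event drop out automatically.
    entities = {(col, event[col]) for event in kept for col in entity_columns}
    return kept, entities


def old_tls_and_self_signed(event):
    """Conservative combination: old TLS version plus a self-signed cert."""
    return (event["version"] == "TLSv10"
            and event["validation_status"] == "self signed certificate")


# Hypothetical events, not taken from the training data set.
events = [
    {"src_ip": "10.0.0.5", "dest_ip": "203.0.113.9",
     "version": "TLSv10", "validation_status": "self signed certificate"},
    {"src_ip": "10.0.0.7", "dest_ip": "198.51.100.4",
     "version": "TLSv12", "validation_status": "certificate has expired"},
    {"src_ip": "10.0.0.8", "dest_ip": "203.0.113.9",
     "version": "TLSv10", "validation_status": "ok"},
]
kept, entities = filter_events(events, old_tls_and_self_signed, ["src_ip", "dest_ip"])
print(len(kept), "event(s),", len(entities), "entities left")
```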
so over here for example what we see is
one device
talking to another device we see the
email address here and this is actually
as Chet just showed we basically
very quickly without really having to
think were able to find the Obama
scenario and so if you just always had
your notebook for certificates it'd
be very easy to just be able to get to
this case the other incident for
that same exact work we saw there was
actually another incident in this case we
didn't have as many edges between
entities so it really just happened
one time I can actually go through that
a little more clearly so for the
Obama case when we had that Obama
email we see there's actually for
example for this IP we have three edges
here which means that for the IP we have
three events here anyway so this is kind of
the first hunt we can generalize it in a
bunch of different ways but hopefully
you're already starting to see that we
can kind of skip a lot of the
querying and kind of the random order of
just unorganized manual Splunk
searching and actually start getting a
methodical path here and actually start
getting it in a repeatable and shareable
way with that I'm gonna turn it back to
you John great thanks Leo
I wanted to add something what's nice
was that when we saw that Obama
certificate one of the things is that
because Graphistry immediately pulled
that out based on like the clustering of
the data we could always just google it
we can always say like hey I saw this
cert let's do a comparison let's pull
the x.509 certificate information and
then see if it's on a blacklist and
then we can kind of build you know
valuable intelligence from that
framework to see hey is this really you
know malicious or is it benign yeah
great point Chet so let's pivot into the
second scenario so we're gonna be doing
some hunting exercises looking for
evidence of insider threats and looking
specifically in NTLM and SMB traffic
that's Microsoft protocol traffic so why
would we be doing this well you know
insider threats whether they're
malicious insiders ie employees that are
going rogue or have left the company or
if it's an outside third party
compromising you know internal employee
credentials and then they become the
insider threat because their credentials
have been compromised it's a huge huge
problem and it's it's the cause of many
many breaches and so trying to look for
evidence early evidence of this behavior
in your organization is a great way to
catch this activity early before it does
too much damage and in fact
the survey here that CA Technologies did
found that over fifty percent of
organizations reported an insider attack
in the past year so it's just a
highly prevalent attack that's quite
common across all industries and
organization sizes and Chet maybe you
could comment a little bit about the logs
that you'll be using for these initial
queries yeah so what we're looking at right
now is a lot of the Microsoft logs that
are actually parsed by Corelight and
what's nice is that NTLM is the New
Technology LAN Manager which is used
for remote login so you know
authentication against domain
controllers but with that information
usually NTLM links to a bunch of other
things including you know file access
over SMB or mapping a share or even
using it against RDP as well and then
sometimes if you are authenticating
using you know DCE/RPC you can also see
that type of information used with NTLM
next slide please so when we look at
NTLM right so
NT LAN Manager logins are very
interesting because they identify
host behavior who's logging in from what
host you know what piece of
the intranet are they authenticating to
so I have an anecdote
when I worked at NSA I was at
NSA/CSS Hawaii where Snowden was at
the same time that Snowden did the leak
with things like this information it could have
been caught early on to show hey he's
logging in using other people's
credentials and he's scraping
information internally in the intranet
so we could have identified insider
threat based off of that type of
information but if you turn on NTLM
logging on a domain controller it's
going to be very heavy
so you're going to get a lot of logs a
lot of logging it's really hard to
capture but you know using Zeek logs you
can kind of profile in a very very small
format so I would start by identifying
NTLM activities just looking at user
behavior pivoting into the associated
connection protocol activities via the
UID so that we can identify what the NT
LAN activity is being used for then I
would start either mapping it with
something like Graphistry and auditing
so that way I could see hey now that I
have it all filled out this is
what could happen and then I would also
look for strange and unauthorized
devices so anything rogue and then
also you know user activity data
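The starting steps listed above (identify NTLM logins, then pivot through the UID to see what each login was used for) can be sketched as a small per-user profile builder. The field names loosely follow Zeek's NTLM and SMB mapping logs, but the records, user names, and the `build_user_profiles` helper are hypothetical and only meant to show the shape of the pivot.

```python
from collections import defaultdict


def build_user_profiles(ntlm_log, smb_mapping_log):
    """Map each user to the hosts they logged in from and shares they mapped."""
    shares_by_uid = defaultdict(set)
    for mapping in smb_mapping_log:
        shares_by_uid[mapping["uid"]].add(mapping["path"])

    profiles = defaultdict(lambda: {"hosts": set(), "shares": set()})
    for login in ntlm_log:
        if not login["success"]:
            continue  # only profile logins that actually succeeded
        profile = profiles[login["username"]]
        profile["hosts"].add(login["hostname"])
        # Pivot on the uid to find shares mapped over the same connection.
        profile["shares"] |= shares_by_uid[login["uid"]]
    return dict(profiles)


# Hypothetical log records, not taken from the training data set.
ntlm_log = [
    {"uid": "C1", "username": "alice", "hostname": "LAPTOP-1", "success": True},
    {"uid": "C2", "username": "alice", "hostname": "KIOSK-9", "success": True},
    {"uid": "C3", "username": "bob", "hostname": "DESK-4", "success": False},
]
smb_mapping_log = [
    {"uid": "C1", "path": r"\\fileserver\music"},
    {"uid": "C2", "path": r"\\fileserver\finance"},
]
profiles = build_user_profiles(ntlm_log, smb_mapping_log)
for user, profile in profiles.items():
    print(user, sorted(profile["hosts"]), sorted(profile["shares"]))
```

A profile that suddenly gains a new host or an unexpected share is the kind of change this hunt is looking for.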
next slide please so when I look at NTLM
I always want to do an examination
of what's going on so when I pull up you
know a random NTLM log there's always
that unique identifier and by pulling
that information I can start seeing you
know hey you know this username was
Sonos I saw that he was successful in
logging into something you know via
445 he was logging in via some
hostname called intent and then I was
looking at the domain name which is
workgroup so next slide please so I
pivoted off of that UID which is really
nice so now I can see all the logs that
are generated from that single
connection so based off the connection
there's one conn you know I had a
modified connection log to show some
other stuff I see the files that he
accessed you know in conjunction with the
SMB files access that he had as well
as the NTLM login used for the
entire connection as well as what mapped
shares he had next slide please so when
I pivot off of that unique identifier I
can start looking at things like what
files did he access so he opened a file
the Sonos guy and it looks like he went
to hack p.m. you know PMM and now I have
to think about this should probably be
you know just a music device why is it
accessing this type of share and I
can look at all this other information and
then just pivot back and then to the next
UID so next slide please so now I look
at what he was mapping with that NTLM
so it looked like he was mapping music
but at the same time when
he's doing that I'm looking at
where he's mapping
IPC$ so that is a hidden remote
share and that is something that we need to
be concerned about because you know what
was he using IPC$ for was it actually just to
map this you know this music share
or was it used for something else for
like DCE/RPC or some other things and
that's why we pivot back to things of
that nature next slide yep so I
would send it to Leo to show kind
of how this helps building out like user
behavior profiles with
Graphistry great thanks Chet Leo I'm going
to give you screen share and control
okay let's get my full screen
going here yes so I really like this
example for for a couple of reasons so
one thing is you know insider threat
like data exfiltration access to file
shares whether it's a file share a like
a wiki or anything like that on that
that's that's pretty critical for a lot
of organizations and as soon as we have
a you know user bump just jumping around
everything or we have a device
automatically doing that okay it could
be pretty hairy to actually understand
what's happening especially if you have
interleaved behavior may be multiple
malicious or a lot of benign and kind of
untangling it and and the second reason
why I really like this example you know
so just be clear they're like we're just
unit for SMB today but you could do for
other things as well but the other
reason I like it is kind of those hops
that just took they're totally
reasonable and what's interesting to me
is we can actually just write them out
and encapsulate it in a single Splunk
query and then shift it to be less about
hunting and pecking and more could
you just run it on everything with that
rough methodology and then instead of
trying to think of where to go next just
have it there for you and so in this
case what we did is if we look at the
core of this search we're just going to
look at that NTLM behavior in there
and then what we're gonna do is we're
gonna we're gonna pull out the the UID
and this also actually shows I think the
power of Zeek logs here
where once we have like all of those
NTLM logins and we have those UIDs then
without having the person do it we just
have the query do it we pivot into
all of the other Zeek logs to see what
other hits do we have for that UID so
it's not going to be just NTLM but
could be everything else and so we might
get connection logs we can get who knows
what else right and so in this case we're
gonna we're gonna get all those hits we
just asked for a thousand looks like we
actually didn't get a lot just 46 in a
real enterprise I would expect to see a
whole lot more activity here or
at least for this style of hunt and
what we do with it now where it does
become SMB specific is we are gonna as
before we're gonna take a look at the
IPS but now we're also going to start
looking at actually like network share
activity so in this case you might want
to look at like host name domain name
what's the files involved if I was gonna
do something like Dropbox logs or
something like that or a wiki I might
throw in those those URLs in here as
well right the user names all that kind
of stuff and you can model that all
together in this case we're again we're
just limiting it to SMB and
so then when we run it and pop it
out now instead of just having like that
data table and trying to figure out you
know is this one incident is it 10 incidents
is it like you know 20 incidents but
only like two of them are actually
interesting and the rest are
benign you know
what we actually get when I kind of zoom
out is something uh pretty approachable
if you're following at home what I
might actually do in a real-life
scenario like this is actually I
probably would actually want to open up
the time bar um there's often different
kinds of time data like you know
when we ingested it versus when the
actual event time is just as a note on the
actual data set we're working with is a
replay of the capture the flag and so
the time bar is actually a bit
artificial here so it's like one
event sequentially after the other but
normally what I would actually be able
to do is like hey you know maybe I wanna
I'm gonna start with the first activity
and then see what happened a bit later
and see and kind of so on and just kind
of like move along the the incidence or
I might want to do something like in
this case for example one thing I
actually often like to do is just
actually color by time to understand the
progression so in this case for example
what I might do instead of having to do
work it's always great when the
visualization does the work for me so in
this case what I'm gonna do is just
color the events cold to hot so we can
close that time bar because we don't
need it anymore and let's run that
clustering algorithm a little bit to
make sure it's a little clean and now
we can actually start reading this
it actually looks like there's pretty
distinctly two different incidents going
on at the the cold time period the
original one we have something going on
over here and then a bit later we had
kind of a progression of things where we
see a lot of this inner activity here
first and then it turned into kind of
more outer activity after and we can
actually kind of click drag to kind of
show that that distinction so in red we
sort of have this flower shape and then
in the middle we have I think we're
gonna see more of this inside of the
flower and then the first incident was
over here what's kind of cool is by
being able to actually see across time
in one visualization what I'm actually
able to say is like look it really is
two different incidents this one and
then that one they are a little bit
related I'm actually seeing that the
domain name is this work group and so
that may or may not be interesting and
now we can actually treat them
separately and try to understand them
separately
so if we start with uh with this guy
over here
what we see you know the devices
involved we see I'm guessing this is a
this is actually pretty interesting when
I'm looking at the host name that
looks pretty weird for a host name so
this is already this is like a rogue
device like this is already something
interesting to me and now we actually
know it's like this this rogue device we
know is trying to do something with the
browser this is kind of interesting I
might want to kind of go further now we
jump to this other one now we take a
look when we look at the colors we see
there's a bit of a rainbow going on so
for what we see is always these two
devices we have that full rainbow of
activity here and so it's kind of nice
to be actually be able to see that like
the activity here was going for that
full time interval between these two
once more only these two devices are
involved so that's why they're kind of
in the middle all the other entities are
basically coming from these two these
two and then when I look at what are the
other interesting things here I'm like
okay well the user is called Sonos
and so I think this Sonos is some sort of
like IoT like speaker kind of like a boombox
type of thing and and then when I
look at the what's actually being what
else is going on here I see all these
files like hack gif gif and ACK
JPEG dot string this is actually pretty
weird and so I don't really expect my
Sonos to be jumping on doing an NTLM
login on to my network shares and then
accessing these is actually pretty
strangely named files and then so I do
have to do a bit of thinking of like
what's actually this is just pretty
weird like my speakers shouldn't
be doing anything that looks like this
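The UID pivot walked through in this hunt (collect the UIDs from the NTLM log, then look for the same UIDs in every other Zeek log) can be sketched in a few lines of plain Python. This is only an illustrative sketch and not the Splunk query from the notebook: the records are hand-made stand-ins for parsed Zeek logs, the values are invented, and `pivot_on_uid` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical records standing in for parsed Zeek logs (values are invented).
# Zeek tags related entries across its logs with the same `uid`.
ntlm_log = [
    {"uid": "C1", "username": "sonos", "hostname": "host-a"},
    {"uid": "C2", "username": "alice", "hostname": "host-b"},
]
other_logs = {
    "conn": [
        {"uid": "C1", "id.orig_h": "10.0.0.5", "id.resp_h": "10.0.0.9"},
        {"uid": "C3", "id.orig_h": "10.0.0.7", "id.resp_h": "10.0.0.9"},
    ],
    "smb_mapping": [{"uid": "C1", "path": "\\\\10.0.0.9\\IPC$"}],
    "smb_files": [{"uid": "C1", "name": "hack.gif"}],
}


def pivot_on_uid(seed_records, logs):
    """Group every record from `logs` whose uid also appears in `seed_records`."""
    seed_uids = {rec["uid"] for rec in seed_records}
    hits = defaultdict(list)
    for log_name, records in logs.items():
        for rec in records:
            if rec.get("uid") in seed_uids:
                hits[rec["uid"]].append((log_name, rec))
    return dict(hits)


related = pivot_on_uid(ntlm_log, other_logs)
for uid, rows in related.items():
    print(uid, [log_name for log_name, _ in rows])
```

Each UID that shows up in the result is one session to review, with its connection, share-mapping and file records already gathered in one place.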
so in summary uh hopefully this was kind
of a fun example where we went all the
way from and taking what used to be a
manual hunt and in a pretty legit one
and showed that basically we can
actually do a lot using the Zeek
correlation IDs we can actually do a
couple of nested queries to actually
instead of just do one thing at a time
we can just pull them out all out
together and then being able to actually
then when we jump into the visualization
being able to a bit of quick setup to
actually be able to very clearly figure
out yep here's user one
here's user two actually on that note
maybe it's actually this kind of a cool
trick Chit was showing me earlier you
can actually for example color based on
the user which is another good indicator
for for different activities and so now
what we see is indeed user one is Sonos
and the rest is actually we don't know
that user but hopefully what you're
seeing is like you know this is
actually a small dataset but if we
actually load it in you know with
Graphistry you could pump in
hundreds of thousands of these and it's
still fine I mean you can do the same
exact thing so with that I'm gonna I'm
gonna turn it back to John for I
believe our final hunt here great thanks
Leo all right the final scenario
scenario three so we're going to be
looking at hunts in DNS traffic looking
specifically for evidence of DNS
tunneling so I think dns-based threats
are interesting because they're they're
quite prevalent there's there's quite a
lot of sophistication in some of the
ways that attackers are manipulating DNS
traffic there's a variety of different
ways that they can use it and hijack it
and I think it's it's it's a sleeper in
terms of threat recognition in the
community you know we don't see a lot of
high visibility and awareness and
strategies internally necessarily to
address this particular threat so we
definitely wanted to include it and
we're closing on this threat because we
think it's a really important one to be
aware of so you know specifically DNS
tunneling right DNS is the backbone of
the internet there's so much DNS traffic
flying in and out of a typical typical
corporate environment it's a great place
for attackers to hide and one way they
do that is by basically you know
including their either command and
control information and communications
or data exfiltration they'll actually
encode those in DNS traffic so you
know to the untrained eye if you were to
look at this it would look like a DNS
request and you would kind of think huh
that looks a little strange but whatever
there's there's a lot
these requests going on the reality is a
lot of attackers are using this vector
to to establish control in environments
and also exfiltrate data and so the
critical logs for this hunt of course are
the master connection log that Zeek
generates which is that kind of net flow
on steroids log that every connection
gets and the DNS log of course and the
DNS log here that Zeek generates is kind
of like a DNS server record on steroids
there is a ton of data fields here in a
Zeek DNS log that you're just not going
to get out of your typical DNS server
and that's if you have DNS server
logging even enabled which in many cases
for performance reasons isn't even
enabled but you're getting the five
tuple here you're getting you're getting
the full kind of I'm sorry excuse me
you're getting the actual response of
the DNS query which is typically missing
from a DNS server log as an example Chit
maybe you could talk about the hunting
hypothesis here yes so what John talked
about was straight on when we talk about
DNS it's both north-south and
east-west so DNS is utilized in most
company intranets if not all for
everything meaning you know hosts to
servers and then also outbound and
that's why it's really popular to use
because it's something that threat actors
can use to get in and out
north-south but also go across east-
west and be relatively undetected
because a lot of visibility isn't being
used in east-west domain so when we look
at it attackers will try to hide C2
data exfil a lot of things inside
DNS and then these DNS sessions will
probably exhibit non-human DNS behavior
which is usually in the form of really long
strings so I will go over these examples
in my queries so next slide please
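The hypothesis just stated (tunneled DNS tends to show long, machine-generated query strings) can be sketched as a small filter. This is a rough illustration under stated assumptions, not the query used in the webcast: the sample names are invented, `suspicious_queries` is a hypothetical helper, Shannon entropy is added here as one common way to separate machine-generated names from merely long ones, and the thresholds (50 characters, 3.5 bits per character) are arbitrary starting points that would need tuning on real traffic.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_queries(queries, min_length=50, min_entropy=3.5):
    """Return (query, length, entropy) for long, high-entropy names, longest first.

    Both thresholds are illustrative assumptions, not recommended values.
    """
    flagged = []
    for query in queries:
        entropy = shannon_entropy(query)
        if len(query) >= min_length and entropy >= min_entropy:
            flagged.append((query, len(query), round(entropy, 2)))
    return sorted(flagged, key=lambda row: row[1], reverse=True)


# Invented sample data: a normal name, a long but repetitive name,
# and a long name with an encoded-looking first label.
encoded_name = "a1b2c3d4e5f60718293a4b5c6d7e8f90a1b2c3d4e5f6.tunnel.example.com"
sample_queries = ["google.com", "a" * 60 + ".com", encoded_name]

flagged = suspicious_queries(sample_queries)
for query, length, entropy in flagged:
    print(length, entropy, query)
```

Sorting by length mirrors the idea of starting the hunt from the longest queries and pivoting on their UIDs from there.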
so one of the things that I wanted to
pull out was you know when we look at
DNS entropy levels right for the length
of a query the max length is 255
characters and so when I want to build a
query around these I know that anybody
who's typing google.com is just going to
google.com but when I start seeing
longer DNS queries I know either it's
one somebody clicking on the link or you
know maybe advertisement redirects so
based off of that kind of hypothesis now
you can start sorting to see okay
if I see one B two Z's probably not as
as being 18 or 13,000 215 so when I
looked at it I can still pivot off of
that unique identifier and then I can
pivot into all the DNS records to see
exactly what's going on and see the
queries as you can see below so when I
look at this query I see a lot of
interesting things that look like
command and control as well as encoded you
know string information you know lots of
subdomains on to a TLD and it looks like
it's going to scream cold .com if
I advance the slide I can start looking
ok well now what are the answers and
like John said if you're not logging if
you're not doing DNS debug logging on your
DNS servers you're not going to get the
answers and a lot of times you're not
going to know who actually made the
original request as well so you only
see the DNS server making requests
outbound
well now if you look at this data you
can see more encoded strings for
answers and when I think about it
when I do a DNS A record request
I should see an IP address so that way
it points me to the direction of where
that server is located not a more encoded
string so this looks like command and
control and then if I pivot back to
you know the connection log that that net
flow on steroids to see the summary of
everything I can see how many packets
you know went through how much
data was being transmitted across the
session and how long it was you know why
did it take around 2 seconds for this
DNS query to come back and why
was it you know looking in this manner
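The answer-side check described above (an A record lookup should come back with an IP address, not another encoded string) can be sketched with the standard `ipaddress` module. This is a minimal sketch with invented records: the field names `uid`, `query`, `qtype_name` and `answers` follow the Zeek dns.log, while `non_ip_a_answers` is a hypothetical helper. One caveat: legitimate responses can also carry CNAME names in the answers, so a hit here is a lead to review and not proof of tunneling.

```python
import ipaddress


def is_ip(value):
    """True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def non_ip_a_answers(dns_records):
    """Return (uid, query, odd_answers) for A lookups with non-IP answers."""
    suspicious = []
    for rec in dns_records:
        if rec.get("qtype_name") != "A":
            continue
        odd = [answer for answer in rec.get("answers", []) if not is_ip(answer)]
        if odd:
            suspicious.append((rec["uid"], rec["query"], odd))
    return suspicious


# Invented records shaped like parsed dns.log entries.
dns_records = [
    {"uid": "C1", "query": "example.com", "qtype_name": "A",
     "answers": ["93.184.216.34"]},
    {"uid": "C2", "query": "x.tunnel.example.com", "qtype_name": "A",
     "answers": ["dGhpcyBpcyBub3QgYW4gSVA"]},
    {"uid": "C3", "query": "example.com", "qtype_name": "TXT",
     "answers": ["v=spf1 -all"]},
]

suspicious = non_ip_a_answers(dns_records)
for uid, query, odd in suspicious:
    print(uid, query, odd)
```

The UIDs that come out of this check are the ones to carry back into the connection log for packet counts, bytes and duration.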
next slide please so then I can also
still take that information like you
know the host and pivot to information
like who was he talking to
next slide and then also potentially
what connection state history he had to
all of those other you know things
internally so now that I'm drawing back
this picture I can see you know he was
probably doing scanning so now that I'm
looking at it it's not just the DNS
exfiltration it looks like you know the
compromised host was used to scan and do
reconnaissance of the internal network
next slide please and then I can also go
back and reverse it to see who else has
talked to the DNS server who has
originated outbound and what's
interesting is I see that the DNS server
has outbound connections as well to
the 128 so next slide
and if I do if I pivot back to that
conn logs now we can see that DNS server
originated data which was in the form of
an ICMP you know six packets of ICMP and
that's very interesting because now I
know that something really definite
occurred on the network and I would like
to pivot to Leo to show how great it
looks in Graphistry great Thank You
Chit great yeah so we actually end up
working with a lot of folks around DNS
data and NetFlow data we'll see it both
like for internal and external as
scenarios let's say threat research
externally and so before we go deep into
that exact one I think it's interesting
to actually just think about DNS in
general so one version of the world is
we just put everything on screen and
just just and just figure out what's
going on but DNS is very inland
especially in a big corporate network or
if you're doing like mass internet scans
things like that like you're an ISP
or a telco it's pretty
overwhelming like you don't have
that many pixels on your screen and so
one of the first things we do in this
kind of analysis is we just take all
those results and then what we want to
do is just kind of get summaries like
you know like right between any two
IPs what's the max bytes or the sum bytes
like if we want to see data
exfil you want to know how many bytes
went out that kind of thing and so in
something like Splunk that's super easy
just use stats and then we just
basically pump that through and then and
then we can start mapping it out so
we were able to do that for this
network here I asked to just get 50,000
of these reductions and it looked like we
got about 13,000 so in this scenario
when we got the results here we
were able to kind of get a nice
visualizations here so um I'm gonna kind
of show that and this also is for a lot of
Graphistry users it's sort of
kind of the first aha where if you have
a network you have like you know ten
a hundred thousand a million devices I'm
gonna actually go from scratch here
and I'll kind of give you that experience of
just going all the way from scratch so
just being able to like take in a lot of
that network data and then just load it
in and see what happens it is pretty
amazing so what we're seeing here is is
actually still for Graphistry's
perspective this is still a pretty small
visualization so it's about eleven
thousand devices and thirteen thousand
summarized network communications and
and what I did in this one if you look
at the you see the edges have different
colors I did a I did a quick thing ahead
of time where I just said color the
edges based on the total amount of bytes
or actually the most bytes ever sent
actually but I probably should have done
it might be a little more interesting is
to do something like the sum bytes I
don't know if we have that or let's say
for example we look at the sum of the
response bytes and so being able to kind
of get a good summary of what's going on
here is pretty powerful um so a lot of
people we work with it's it's just kind
of the first time they could could
actually see everything imagine going
through a time line or whatever so now
if we get to back to the hunt basically
what we're trying to figure out is if
somebody is abusing DNS to basically
do something like command and control to
do data expel to do tunneling like is
basically is is it kind of not just
being used for domain them resolution
so all we did here is we took that same
query basically you know we were
summarizing all those communications and
the thing we want on this one that's a
little different is it so that we just
modify it a little bit is we want to
know when somebody makes that domain
name request how long was that request
in case they for example tried to sneak
a big message out that way
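The summarization described here (reduce the raw DNS records to one row per pair of IPs, keeping the longest query seen, and symmetrically the longest answer) can be sketched without Splunk. This is an illustrative stand-in for a `stats`-style reduction, not the notebook's actual query: the records are invented, the field names follow the Zeek dns.log, and `summarize_pairs` is a hypothetical helper.

```python
from collections import defaultdict


def summarize_pairs(dns_records):
    """Reduce DNS records to one summary row per (client, server) pair."""
    summary = defaultdict(
        lambda: {"count": 0, "max_query_len": 0, "max_answer_len": 0}
    )
    for rec in dns_records:
        row = summary[(rec["id.orig_h"], rec["id.resp_h"])]
        row["count"] += 1
        row["max_query_len"] = max(row["max_query_len"], len(rec.get("query", "")))
        longest_answer = max((len(a) for a in rec.get("answers", [])), default=0)
        row["max_answer_len"] = max(row["max_answer_len"], longest_answer)
    return dict(summary)


# Invented records shaped like parsed dns.log entries.
dns_records = [
    {"id.orig_h": "10.0.0.5", "id.resp_h": "10.0.0.53",
     "query": "google.com", "answers": ["142.250.0.1"]},
    {"id.orig_h": "10.0.0.5", "id.resp_h": "10.0.0.53",
     "query": "a" * 20 + ".example.com", "answers": ["bbbb" * 10]},
    {"id.orig_h": "10.0.0.6", "id.resp_h": "10.0.0.53",
     "query": "example.org", "answers": []},
]

pairs = summarize_pairs(dns_records)
for (src, dst), row in pairs.items():
    print(src, dst, row)
```

Each summary row becomes one edge between two devices, and a column such as `max_answer_len` is the kind of value an edge color can be driven from.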
likewise when we get the response back
how big is it so if somebody tried
to sneak back a really big response and
so the intuition is essentially nobody
ever really wants to type a really
long domain name so they shouldn't be
that long and the domain should resolve
to an IP address not something that's
another super long string and so we're
going to get that when we get those out
we
we just get the table I think this is
also a great example where if you're
sitting on a big enterprise network
manually doing this kind of stuff is
that's tough right that's just like a
lot of ground to cover but if you can
actually just load it all in and just
naturally see the patterns and
kind of have stuff shown to you you can
actually go go much faster so what we're
gonna do here is actually pretty much
the same exact graph the one
difference is we're
going to color the edges here based
off of the max answer lengths
and so let's take a quick look at that
here and I think it was right here and
so I just loaded that all in and so it's
the same as before but we did one more
thing where we're putting the the
questions and answers on screen here and
so here for example we're seeing the
response here in green this is
Sweetwater so we see a bunch of green
actually a bunch of other Sweetwater's
and then in blue the queries so this is
some kind of machine looks like nobody
would ever in their right mind try
to type out this query so these are probably
fake domain requests and what's
fun is when i zoom out I'm able to see
all of it and I actually see there's two
clusters of behavior because this stuff
got automatically clustered for me we
could let it settle a bit more what I'm
actually seeing here for example in this
incident very clearly these two devices
talking to each other they're sending
all of those like looks like the data
going in both directions and so that's
probably tunneling it's like a
back-and-forth activity and I'd be pretty
happy to take a look at what's going on
in that second one we actually have some
pretty long edges here we see pretty
naturally it's just these two devices
were involved and we can kind of do that
same same analysis here what's
interesting to me here it's just these
blue nodes and then I remember blue
nodes I think were queries yeah so it's
just queries and so if I was thinking
about the second incident here this
might be either a beaconing or a data
exfil and I might try to look at the
amount of like that bytes out something
like that and for what's going on here
so if I look at well anyway
we'll just leave
it at that in the interest of time but like
stepping back again what I'm hoping
you're kind of seeing
here is with one quick query you can
just have here's your DNS query or
here's your DNS notebook you can
actually just map out what's going on in
your network in general if you ever
wanted a 360 and then you can kind of
you know go the next level here I want
to look at from like for this type of
incident in particular and then you can
kind of very quickly instead of having a
whole bunch of these like interleaved
results you can actually just eyeball it
pretty quick and when you can't you can
kind of drill in and kind of go into any
individual thing and kind of do this
kind of reasoning I do want to do one
more quick note before we wrap up here
an interesting thing about a Graphistry
is I've been showing here a lot of
the notebook based workflows all these
visualizations you're seeing here
there's no reason you you have to do
them in the notebook you can just embed
them into whatever else you're doing and
then the second thing is we find out a
lot of folks who are using a SIEM or have
a custom dashboard system they want it
when they're doing that embedding they
actually don't want to they still want a
lot of that power of the notebook and so
one of the things we've been working on
is things like for example could I pivot
out from Splunk let's say on an IP
address or a malware alert and then
instead of jumping into a notebook with
all that Python boilerplate
can you actually just jump into the tool
where you have it eliminate the
boilerplate you just could write the
Splunk queries and have those get chained
together and then you can just generate
those which is basically our we've we've
been working in the notebook world for
two or three years now and then so this
is as we've been kind of looking at what
might be more appropriate as you go more
from a hunting team to an incident
response team it's something that might
make it a bit easier so with that I
think it's a turn it back to you John
great Thank You Leo you know I just
wanted to comment on something that I
thought was very interesting too so when
we look at like things like Splunk Splunk has
kind of an ingest limit and data
retention for X amount of days with
Graphistry we can actually just
import tons of data into it right and
look over longer periods of time
yeah so Graphistry sort of yes and
no so Graphistry if you're
familiar with BI tools it's sort of like
a middle tier so you're not gonna
want to send you know like a petabyte of
data to Graphistry we'll sit on top of
your data lake but interestingly and
then feel free to contact us about this
we're working with partners for example
BlazingDB and we're part of something
interesting called the NVIDIA RAPIDS
ecosystem where to give a sense of the
numbers if you want to kind of replace
your SIEM with end-to-end GPU
computing something like on Amazon the
ethernet just your network
connection on Amazon for one node might
be one to ten gigabytes a second
GPUs could actually crunch data at that
speed and so if you're kind of
interested in what's gonna be
replacing SIEM technology over the next
few years
for instance if you want instant analytics
that that's and then you actually want
to try it out on something like for
example your flow logs on Amazon please
reach out to us and we can actually get
you doing some pretty killer stuff
thanks Leo and Chit we'll just wrap up
here I think we'll have time for one or
two questions
you know just briefly about Corelight
you know what's our role in all of this
well Corelight was founded by the
creators of Zeek so if you want
Zeek data in your environment if you
want to be able to do these kind of
queries and pivots that Chit was showing
and and visualize the data in a graph
analytics tool that like Graphistry
you're going to need the data and in the
open source world that can take weeks or
months to tune Corelight sensors are out
of the box plug and play so you can
basically just set these sensors up
they'll be off and running
highly performant compared to open
source implementations and packed full
of enterprise features just to make it
really easy to use and even more
powerful in a lot of cases with some of
the integrations and capabilities we've
put in the sensors so again you know the
it might not be clear to those of
you who weren't familiar with
Zeek before this call but the kind of
queries and pivots and visualizations
that we were shown in this
presentation they're not possible
without Zeek data like you can't do that
with appliance logs like DNS logs
they're just you're not getting the full
picture in a format designed for fast
search pivot and visualization and
that's exactly where Zeek comes in and
Corelight is the enterprise solution
from the creators of Zeek to make Zeek
deployment and data generation really
easy so you can get it to your
favorite SIEM and your favorite analytics
tool downstream and I'm just so
impressed by what Graphistry can do
in terms of being able to show you the
macro and the micro in a very intuitive
visual format to be able to draw insight
out of it Leo did you have anything you
wanted to add yes so I I think um
hopefully just visually focus where you
look kind of follow along both whether
you you want to play with the notebooks
or you're building your own custom maps
or you want to do your own self serve
and your team wants to start automating
your investigations for for more that
post not the SOAR perspective of
what you do ahead of time but when
somebody picks it up how do you how do
you do that but the the other bit is a
for anybody interested in this stuff
some folks from the
Graphistry team will be sticking around
for the next hour or so online so just
go to the Graphistry blog
and you'll see a link to
the slack the updated link to the slack
channel and feel free to chat with
the team and we're happy to get you going
on data awesome thanks as
Leo mentioned if you'd like to follow up
with us that's the email right there you
can follow up at info@corelight.com or
info@graphistry.com for further demonstrations
or sales questions if you have them we
have about two minutes left so let's go
I tried to answer questions in real time
on the chat we had a flurry of questions
come in and I was kind of individually
responding let me just try to pick out
one or two that might be a benefit for
the broader audience we had a question
about encryption and and digging into
exactly what kind of data is being
parsed and shown here and searched so
just to kind of reiterate what I said at
the beginning Zeek is a network
security monitoring framework and
Corelight using that technology does not
break and inspect the traffic so all the
logs that you saw the SSL log the
x.509 log and all the pivots that Chit and
Leo were doing those are occurring
around data that was extracted from the
certificates of those encrypted traffic
streams and the nature of the encrypted
protocol handshakes being parsed as well
so there's no break and inspect
happening but as you saw they were able
to pull out and find you know
potentially malicious activity without
breaking and inspecting the traffic so
that's that's the power of this data
format and these search tools that you
were shown and then the last question
I'll leave for you Leo
we had a question here about Graphistry
are most of your users using notebooks
and can you explain how people are
generally using Graphistry what are the
main use cases yeah so we're seeing
use all over like cyber fraud
counter-terror all sorts of stuff
like on digital crime essentially what
we find is
uh let's see uh data scientists and
hunting teams will be doing the
notebook stuff but very quickly if
you're more like an incident response
team we are actually seeing people more
interested and more successful with just
setting up the basically setting up
plays using the those templates the
other thing I was just showing at the
end and and basic essentially the the
intuition is you need to respond fast
and you need to be able to let's say run
fifteen steps of data gathering and be
able to pivot around and so you don't
really have that time to write Python or
or SQL but you do have the time to
set up ahead of time so that when that
incident does come in you've got all
those plays set up ahead of time so it's
really more by the team great well with
that we're at the hour I hope I answered
most people's in line questions in the
chat box just just as a point of note a
couple people asked about recordings and
availability this is being recorded it
will be publicly available later I know
it was a lot of information to take in
so some of you asked for re-watching yes
we will shortly email you and give
you the ability to follow along and
re-watch this program if you'd like to
try out some of the queries you saw
yourself and visualizations thanks so
much everyone for joining the slack
channel that is in the chat link in the
sidebar is now live so Graphistry
folks and some Corelight folks will be
hanging around if you have particular
questions and if we didn't get to answer
your question I apologize I'll follow up
with you individually I see a couple
more that we weren't able to answer and
I'll just ping you individually so thank
you everyone for attending today I hope
you learned a lot I certainly did and
Chit and Leo thank you so much for
bringing your expertise there
yeah that was a lot of fun John thanks
for helping to organize this all right
that's a wrap
thanks yall thank you
