JEFFREY MEYERS:
Thank you for coming.
My name's Jeff Meyers.
I work at the Mayo
Clinic up in Minnesota.
We have a Cancer
Center Statistics team
where I've been working
for just about 9 years focusing
on macros, graph
template language, SQL,
and making reports.
Today I'm going to be
talking about a fun project
that I worked on at
Mayo Clinic in our
biomedical informatics
and statistics teams.
We have a large group
of R users and we have
a large group of SAS users.
And we can be a little bit
competitive with each other.
And once in a while we
have project speeches where
we do a combined presentation
style of anything
that you can do,
I can do better.
And this last time we
did it with graphics
where we could talk
about the new SG graphics
procedures or ggplot2 for them.
And part of that talk was giving
each other challenge plots,
so find something that we
would think the other group
would have problems with
making, giving it to them
and seeing what they can do, and
seeing how they would actually
make it.
It was during this where I
learned what an actual CIRCOS
plot is, because it's something
I've never personally used
or heard of.
And I had so much
fun trying to make it
that, although I had two weeks
to do it, I instead
wrote the code
in one stretch from about 6:00 PM
till 4:00 in the
morning, because I
got to go back to my old
trigonometry and mathematics
roots.
And it was just a fun time.
So I managed to convert it
into a macro and I thought
the process that I went through
in actually making the macro
and writing the code could
be found useful for others,
so I decided to go
around and present this.
I gave this talk at
PharmaSUG last year
and was invited to
give it again here.
So without further ado,
what is a CIRCOS graph?
So, again, something
I've never heard of.
It is a circular
graph that shows the--
it's a visual depiction of
connections between groups.
So in this picture here we
have 10 different groups.
We have curves that start from
one group and move to another.
They're used primarily in
genomics or genetic graphs
to show how different
pathways are transforming.
From my research into it, and
I'm not really the expert
on these,
there are four
main components.
There's an outer circle.
There's an inner circle.
There are Bezier curves.
And there are axes and labels.
And from the graph
it's basically
just a visual
representation of your data.
It's not really quantitative,
to say the least.
So walking through
each of the pieces.
This is what I call
the outer circle.
These are curved
rectangles that go
around the circle. The length of each
rectangle is proportionate
to the sample size.
So if we were
looking at this, we
could easily tell that our red
group in the bottom left corner
is much smaller than,
say, our light blue group
in the upper right corner.
We don't know quantitatively
or numerically how much smaller
it actually is.
But we know, proportionate
to our total sample size,
it represents quite
a few less paths.
The inner circle that
I have is representing
the number of paths that are
currently leaving that group.
So if we look at our bottom
three groups-- the blue, red,
and green on the left--
because the bar completely
matches the outer circle,
we know that every single
path is leaving that group
and we don't have any paths
that are entering that group.
For the last three
on the bottom right,
we know that all the paths
are entering and there's nobody
that's actually
leaving those groups.
For all the others it looks like
they're pretty well-balanced,
with the exception of maybe
the top group, where
2/3 are leaving and
a third are entering.
So a Bezier curve was
another type of curve
that I had never heard of.
And I was actually
trying to make these
without knowing what they were.
And until I got the hint
of what they were actually called
I wasn't able to actually
solve how to make one.
They are the curves
in the middle
of the graph that are
connecting one group to another.
The width of the curve
actually changes from
where it starts to the
middle to the end.
They get a little skinnier, then
potentially a little bit wider.
The color that I
have on them
matches where they
originate from.
So if you look at where they're
entering in the bottom right
corner, that purple bar is
coming from the purple group
without having to
trace the line back.
The curves are pulled towards
the center of the circle
to make that arc.
The longer the pathway is, the
closer to the center of the circle
that it gets.
Now, the actual widths
of them are determined
by the proportions of
the paths at their start
and where they finish--
their end point.
So, for example, if they're
leaving the red group
and entering that
purple group, we're
going from the red sample size
to the purple sample size.
So take the proportion of that
path. If the path
is, say, 40 patients
or something,
then 40 out of the
red group is going
to be a different proportion
of the red group than 40
out of the purple group.
So the width where it starts
and the width where it ends
will wind up being different.
So axes and labels.
I added these into this.
I don't normally
see axes on these
when I Google search
them, but I really
like having something
quantitative in graphs.
So I can put numbers to it.
So these axes have
major/minor tick marks.
The major tick mark represents
5% of the total population.
The minor tick mark represents
1% of the total population.
And the labels I have
rotated to face the center.
So now in my examples
I like to put
in the total number of
paths that each percentage
represents.
And I can easily go to a
group and count, left or right
in this example,
by fives and get
what percentage of my
sample size is in that group.
In
this example, labels
are rotated in order
to face the center.
I can also have them just go
straight horizontal if that's
preferred.
So that is all a
CIRCOS plot is. Now,
what did I have so
much fun doing in order
to actually make this graph?
Because there's
currently nothing in SAS
that can actually make pretty
much anything that we just saw.
And the answer is trigonometry.
So hopefully we can all
remember our trigonometric roots.
This is called the unit circle.
So it's simply just a circle
with radius 1, center at 0,
0 on the xy axis.
It's easy
to calculate distances
with sine and cosine functions.
A full rotation
around the circle
is 360 degrees or 2 pi radians--
starting with the x-axis as the
reference, so from 0, 0 to 1,
0 is where the reference
line is for radians.
And it's important to note
how to convert to radians.
Because the sine and
cosine functions in SAS
run off of radians
and not percentages.
So not as much fun to
deal with as percentages,
but it's what we have to do.
And I use the polygon plot.
So the polygon plot lets us plot
different xy coordinates, which
then get connected together.
And once they reach the same
endpoint that it started from,
it fills in that polygon
to give us a shape.
It's a pretty
straightforward one to use.
This is the graph
template language code.
I think it's just called
polygon in SGPLOT.
We just need an
x-coordinate, a y-coordinate,
and an ID variable.
The ID variable helps us
distinguish unique objects.
And that ID variable
comes in very handy
because we're going to have
a lot of different shapes
and if we had to have
a different set of xy
variables for each shape, that'd
be a very massive data set.
This lets us put
all of our shapes
into one set of x and y columns
and only have to really do
one polygon plot call.
So, the number of polygon points--
as you can imagine trying
to make a curve out
of laying a bunch of points
can be very difficult.
The more points that you have,
the smoother a curve gets.
With three, as you can
see, we make a triangle,
which isn't very curvy.
Five points, we get a
little bit more there.
Seven points, we're starting
to make a little bit of an arc.
In this
macro I set the default
to 10 because,
with SAS, that
makes it smooth without
going overboard and then
having thousands and thousands
of observations in the plot
data set.
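The trade-off between point count and smoothness can be put in numbers. The sketch below is in Python rather than SAS, purely to illustrate the geometry; the function name `max_gap` and the 90-degree example span are assumptions, not part of the macro. It reports the largest distance between a true arc and the straight segments that approximate it.

```python
import math

def max_gap(span_rad, n_segments, radius=1.0):
    """Largest distance between a circular arc and its straight-segment approximation."""
    # Each segment covers span/n radians; the worst gap is at the segment's middle.
    half_step = span_rad / n_segments / 2.0
    return radius * (1.0 - math.cos(half_step))

# Hypothetical quarter-circle bar drawn with more and more segments.
for n in (2, 4, 6, 10, 100):
    print(n, round(max_gap(math.pi / 2, n), 5))
```

The gap shrinks roughly with the square of the segment count, which is consistent with a modest default looking smooth once subpixel rendering is on.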
But this still wouldn't
work unless SAS
had this subpixel option.
So in the Graph
Template Language,
GTL, the option was added in
9.4, maintenance package three.
It smooths jagged curves
using subpixels, so the pixels
in between pixels.
It's necessary for this to work.
I'm hoping this is visible.
But with the subpixel
on, we can see
on the left it makes our curved
rectangles really smooth.
If you look on the
right hand side,
you can see, especially
in the bottom right
corner of that green
one, the curves
become a little
wonky, a little wavy.
And so that subpixel
option is probably
the keystone that really
lets this method work.
So back to trigonometry.
To go over our two
basic trig functions
we have-- our
x-coordinate can be
found by taking distance
times cosine of our angle.
And the y-coordinate
is the same thing,
but with the sine function.
Because, in the unit
circle, we always
have that distance of 1--
going from 0 to the
outside of the circle--
which makes the d calculation
there go away pretty much.
And, in this example,
we're looking at pi
over 3 radians, which gives us
that point in the upper right.
And so the x-coordinate for
that is cosine of pi over 3
and the y-coordinate
is sine of pi over 3.
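As a minimal sketch of those two trig functions (in Python rather than SAS, just to show the arithmetic), the helper names `polar_to_xy` and `percent_to_radians` below are assumptions made for illustration. The second helper shows one way to move between a percentage of a full rotation and radians.

```python
import math

def polar_to_xy(angle_rad, distance=1.0):
    """Return the (x, y) point `distance` away from (0, 0) along `angle_rad`."""
    return distance * math.cos(angle_rad), distance * math.sin(angle_rad)

def percent_to_radians(pct):
    """Convert a percentage of a full rotation (0 to 100) into radians."""
    return pct / 100.0 * 2.0 * math.pi

# The pi over 3 example on the unit circle: x is cos(pi/3), y is sin(pi/3).
x, y = polar_to_xy(math.pi / 3)
print(round(x, 4), round(y, 4))
```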
So, now, actually drawing
a curved rectangle,
I do this using data step,
do loops, and outputs.
And so I have to run a do
loop from our starting point
to our ending point around
the outside of the circle.
In this case, I'm going from
pi over 3 to pi over 9 radians.
If you were to actually
run this in SAS,
you have to use the constant
function to get the pi value,
but I didn't want to overload
the text on this slide.
So we're going from
pi over 3 to pi over 9
by the difference numerically
in those divided by some count,
where the division
by some count is going
to say how many points
we're going to wind up
having in that rectangle.
For this example, I have five.
You can bump it up to 100 if
you have a really big group
that you want to make smooth.
If you have really small
groups, you probably
don't need a lot of points.
It just comes down to how much
memory and processing time
that you want to use for this.
And so we're just going
to loop through those.
We're going to take
one for our distance
to the outside of
the unit circle,
and multiply it by the cosine of whatever
our i value is at the time,
do the same thing with y,
output that into a row,
and we have our outer part.
In order to get
the inner part now,
we have to walk backwards
from what we did.
Because if we go
the same direction--
pi over 3 to pi over 9--
the end point
of our outer circle
is going to
connect to the starting point
of our inner part of the
bar, and we're actually
going to make an
hourglass shape instead.
So we need to make sure that
we walk left to right at first
and then we walk it back,
right to left, to make
the polygon plot work properly.
So we just do the same thing.
We come back, this
time we're going
to multiply that by 0.8. This
is how I can
quantify how
wide my bars are.
Because it's just a
percentage of the unit circle.
So now these bars are 20% of
the width of my unit circle.
I can pop in any number
that I want there
to make them wider or skinnier.
And it works just fine
with my trig functions.
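The walk-out-then-walk-back idea can be sketched as follows. This is Python rather than a SAS data step, and the function name `curved_rectangle` and its parameter names are assumptions; the angles (pi over 3 to pi over 9), the five steps, and the 0.8 inner radius come from the description above.

```python
import math

def curved_rectangle(start, end, n_steps=5, outer=1.0, inner=0.8):
    """Return polygon vertices for a curved bar between two angles (radians)."""
    step = (end - start) / n_steps
    angles = [start + i * step for i in range(n_steps + 1)]
    # Outer edge: walk from the start angle to the end angle on the unit circle.
    outer_edge = [(outer * math.cos(a), outer * math.sin(a)) for a in angles]
    # Inner edge: walk back the other way so the outline does not cross itself
    # (walking the same direction twice would produce the hourglass shape).
    inner_edge = [(inner * math.cos(a), inner * math.sin(a)) for a in reversed(angles)]
    return outer_edge + inner_edge

bar = curved_rectangle(math.pi / 3, math.pi / 9)
for x, y in bar:
    print(round(x, 3), round(y, 3))
```

Each (x, y) pair would become one row under a shared ID value, so a single polygon plot call can draw every bar.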
Drawing the Bezier curves is
a little bit more complicated.
There's actually,
from what I found,
multiple different
polynomial levels for Bezier
curve from linear,
quadratic, cubic, quartic.
And it just depends
on how many points
do you need to have
in your Bezier curve.
In mine, I have a
starting point, P0.
I have a midpoint,
P1, and an endpoint, P2.
So I'm using the
quadratic Bezier formula,
which gives me this equation.
It's 1 minus t, squared, times our
starting point, plus 2 times
1 minus t times t times
our midpoint, plus t
squared times our endpoint,
with t going from 0 to 1,
where 0 is just our starting
point, and 1 is our end point,
and everything in between
is just a proportion
of the distance between those.
But, luckily, in
our situation, P 1
is always the center of
the circle, which is 0, 0.
So we can actually just get
rid of that whole middle piece
and simplify our formula to
be 1 minus t, squared, times
our starting point plus t
squared times our ending point,
with t from 0 to 1.
And we have to do this
with both our x-coordinates
and our y-coordinates.
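Written out, the quadratic formula is B(t) = (1 - t)^2 P0 + 2(1 - t)t P1 + t^2 P2, and with P1 at (0, 0) the middle term drops out. A minimal Python sketch (not the macro's SAS code; the function names and the sample angles are assumptions) checks that the simplified form agrees with the general one:

```python
import math

def quadratic_bezier(p0, p1, p2, t):
    """General quadratic Bezier point for t between 0 and 1."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0],
            a * p0[1] + b * p1[1] + c * p2[1])

def bezier_through_origin(p0, p2, t):
    """Simplified form when the middle control point is the center (0, 0)."""
    a, c = (1 - t) ** 2, t ** 2
    return (a * p0[0] + c * p2[0], a * p0[1] + c * p2[1])

# Hypothetical endpoints: 2*pi/3 and pi/9, both at 0.75 of the unit circle.
radius = 0.75
p0 = (radius * math.cos(2 * math.pi / 3), radius * math.sin(2 * math.pi / 3))
p2 = (radius * math.cos(math.pi / 9), radius * math.sin(math.pi / 9))
n_steps = 10
curve = [bezier_through_origin(p0, p2, i / n_steps) for i in range(n_steps + 1)]
```

The x and y coordinates are computed separately but with the same weights, which matches the point about running the formula once per coordinate.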
So in this example--
trying to color coordinate
our start and end points
with the cosine and
sine values in the code
descriptions-- so
our starting point
is at 2 pi over 3, which
is the far left side at the top of
that Bezier curve.
I wish I had a laser pointer,
so I could point out--
but hoping we can follow along.
So the top left corner of our
Bezier curve is 2 pi over 3.
And I multiply that by
0.75, because I only
want it to go out 3/4
of the unit circle
to give that gap
between the rectangle.
So we just go from 0 to
1 or however many points
we want to have in
our Bezier curve.
And we do the same do
loop process to get
the top part of the curve.
We use the same process
to make our rectangles,
to make the side of
the Bezier curve,
then we go back the
way we came in reverse
from our starting point
to our end point, which
looks to be about pi over
9 over to 3 pi over 9
to make the bottom
of the Bezier curve
and then make the last edge.
Connect them and then
we have our polynomial--
or our polygon.
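Putting the four edges together might look like the sketch below. It is Python rather than SAS, and the helper names, the edge order, and the 7 pi over 9 angle are assumptions chosen for illustration; the macro may walk the edges in a different order. The key point is that consecutive edges share endpoints, so the outline closes into one polygon.

```python
import math

def arc(a_from, a_to, radius, n_steps):
    """Points along the circle of `radius` from angle a_from to a_to."""
    return [(radius * math.cos(a_from + (a_to - a_from) * i / n_steps),
             radius * math.sin(a_from + (a_to - a_from) * i / n_steps))
            for i in range(n_steps + 1)]

def bezier(a_from, a_to, radius, n_steps):
    """Quadratic Bezier between two points on the circle, control point at (0, 0)."""
    x0, y0 = radius * math.cos(a_from), radius * math.sin(a_from)
    x2, y2 = radius * math.cos(a_to), radius * math.sin(a_to)
    pts = []
    for i in range(n_steps + 1):
        t = i / n_steps
        pts.append(((1 - t) ** 2 * x0 + t ** 2 * x2,
                    (1 - t) ** 2 * y0 + t ** 2 * y2))
    return pts

def ribbon(src_lo, src_hi, dst_lo, dst_hi, radius=0.75, n_steps=10):
    """Closed outline: curve out, arc along one end, curve back, arc along the other."""
    return (bezier(src_hi, dst_lo, radius, n_steps)
            + arc(dst_lo, dst_hi, radius, n_steps)
            + bezier(dst_hi, src_lo, radius, n_steps)
            + arc(src_lo, src_hi, radius, n_steps))

# One end spans pi/9 to pi/3; the other spans 2*pi/3 to 7*pi/9 (hypothetical).
outline = ribbon(math.pi / 9, math.pi / 3, 2 * math.pi / 3, 7 * math.pi / 9)
```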
Drawing a curved axis.
So a curved axis
is also something
that SAS doesn't currently have.
And what I've done is I've
taken what I've already done
and I've hijacked
it to make the axis.
So what I mean by that is I'm
going to take the outer edge
of the rectangles that
make our outer circles--
so if I go back,
that blue rectangle,
I'm going to use the
edge of that that makes--
that's connected
to the unit circle
and I'm going to draw
a SERIESPLOT over that.
In order to make
the tick marks, I'm
going to, again,
use SERIESPLOTS.
This time, for each tick mark,
I need to have two points.
I need the point that
touches the unit circle
and I need the point that
extends beyond the unit circle.
And I can control
how long they are
by just saying what
percentage of the unit circle
I plot them at.
And I need two points
for each tick.
I just need to go around the
circle and draw all the ticks.
And I can use my group option to
then plot an individual series
plot for each one of
those tick values.
So this is going to look like a
lot of code to read on a slide.
Essentially, we're just going
from one part of the unit
circle, pi over 9 to pi over 3.
We have to convert
this to percentages,
so I'm dividing
by 2 pi for this.
I'm just drawing everything
out without simplifying
the math, because I
want to go by 1% steps.
So, in this, I'm
converting the percentages.
I'm not going to
draw a tick mark at 0,
so I'm going to start at 1% and
draw the tick marks from there.
And I'm just going to
go by 0.01 or 1% steps.
My tick plus 1 is going to
be in my group variable.
So each time I go through
this, tick will increase,
give me a new ID
variable that I can
use to make a new tick mark.
Our x-coordinates and
y-coordinates are, again,
going to be drawn with
the sine and cosines--
where we're just going
to draw them on the unit
circle for the first point.
We're going to output that.
And this next
section is looking at
is this the fifth tick
because every five
ticks I want to make my
second point further away
to make it a major tick mark
instead of a minor tick mark.
So if mod tick 5 is
equal to 0, meaning
that it is a fifth
tick, then I'm
going to make it 1.025, or 2 and
1/2%, beyond the unit circle.
If it's not, then
I'm only going
to make it 1.0125, or 1.25%, beyond the
unit circle, so half as far out.
So my major marks will always
be twice as big as my minor tick
marks.
And it looks
something like this.
So if I go counterclockwise
around here,
I can count 1, 2,
3, 4, 5 tick marks.
So each one of those
has its own tick ID.
We have two points--
one on the unit circle,
one that's a little
bit further out.
And we just make our way
through each of the groups
and draw those in.
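A rough Python version of that tick loop is sketched below; it is not the macro's data step. The function name, the `group` argument, and the "group-tick" ID format are assumptions (the ID format mirrors the 1 dash 1, 1 dash 2 pattern described for the tick mark variable). Ticks fall every 1% of the full circle, starting 1% past the start angle, and every fifth tick reaches 1.025 instead of 1.0125.

```python
import math

def tick_marks(start, end, group=1):
    """Return (tick_id, x, y) rows, two rows per tick mark."""
    rows = []
    span_fraction = (end - start) / (2 * math.pi)   # share of the full circle
    n_ticks = int(span_fraction / 0.01 + 1e-9)      # one tick per 1%
    for tick in range(1, n_ticks + 1):
        angle = start + tick * 0.01 * 2 * math.pi
        # Major ticks (every fifth) reach twice as far as minor ticks.
        reach = 1.025 if tick % 5 == 0 else 1.0125
        tick_id = f"{group}-{tick}"
        rows.append((tick_id, math.cos(angle), math.sin(angle)))                  # on the circle
        rows.append((tick_id, reach * math.cos(angle), reach * math.sin(angle)))  # beyond it
    return rows

rows = tick_marks(math.pi / 9, math.pi / 3)
```

Grouping a series plot on the first column would then draw each two-point tick as its own short line.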
And, from there, we
use a SERIESPLOT and--
has everyone here used graph
template language or primarily
SGPLOT as plot?
So that eval function
is a really handy tool
that lets you do a calculation
on a column or a variable
that you currently have
in order to transform it
into another without having
to go back and remake it
in your data set.
In this case, I'm using the
eval to run an ifn, which
does a logical step
and gives something
if true, something else if false,
like an if/else function.
In this case, I have a
flag variable that says,
this is my axis, so these
coordinates go with the axis.
If axis is 1, then I
take the x-coordinate.
If it's not, I make it missing
and I don't plot those.
For the tick marks I do the same
thing.
I have x is xtick, y is
ytick with a SERIESPLOT
and I group them on
that tick identifier,
so I get separate ticks.
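The masking idea can be illustrated outside of GTL. In the Python sketch below, `ifn` is a stand-in written for this example (it is not the SAS function itself), and the rows and the `axis` flag values are made up. Coordinates that do not belong to the axis become missing, so a plot given the masked column would skip them.

```python
def ifn(condition, value_if_true, value_if_false):
    """Stand-in for an if/else expression that returns one of two numbers."""
    return value_if_true if condition else value_if_false

# Hypothetical rows: every shape shares the same x and y columns plus a flag.
rows = [
    {"x": 1.0, "y": 0.0, "axis": 1},
    {"x": 0.8, "y": 0.0, "axis": 0},
    {"x": 0.0, "y": 1.0, "axis": 1},
]

# Keep x only where the flag says the point is part of the axis outline.
axis_x = [ifn(r["axis"] == 1, r["x"], None) for r in rows]
axis_y = [ifn(r["axis"] == 1, r["y"], None) for r in rows]
```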
Drawing rotated labels.
So with labels I draw them
with the
TEXTPLOT statement.
I rotate them to face
the center of the circle.
For this, all you need is an
x and y-coordinate once again,
which, again, I used
sine and cosine to do.
Rotate-- I have to
use the arccosine,
so I have to get the inverse
of the angle I'm currently
at in order to get the
angle to rotate the text.
And the text variable
is just whatever
we want there, so our variable
group, variable name--
something to that effect.
Then we just throw them
into the TEXTPLOT statement.
So text is text, x is xlab, y
is ylab, rotate equals rotate,
we have that.
And the spot that I have to
pick from my x and y-coordinates
is, of course, the
midpoint of each group.
So we have to calculate whatever
halfway is between the blue,
and halfway is between
the red, and get
the coordinates for those.
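One way to get a label's position and rotation is sketched here in Python. Note that this swaps in a simpler technique than the arccosine approach described above: it takes the midpoint angle directly. The function name, the 1.1 label distance, and the convention that rotation is in degrees counterclockwise (with 0 meaning upright text) are all assumptions.

```python
import math

def label_position(start, end, distance=1.1):
    """Return (x, y, rotate_degrees) for a label centered on a group's arc."""
    mid = (start + end) / 2.0                 # halfway around the group's bar
    x = distance * math.cos(mid)
    y = distance * math.sin(mid)
    # Upright text at the top of the circle (90 degrees) needs no rotation;
    # elsewhere, turn the text so its bottom edge points at the center.
    rotate = math.degrees(mid) - 90.0
    return x, y, rotate

x_lab, y_lab, rot = label_position(math.pi / 9, math.pi / 3)
```

Precomputing the rotation per label is what allows a single text plot to handle every group.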
So another very
complicated part of this
was how do I order
these Bezier curves.
Ordering where the
curves start and stop
is important for
aesthetics and readability.
You want to prevent the same
group from crossing over itself
too much, because it
just doesn't look clean.
The closer groups should
be connected first.
And, of course, it's one
of the trickiest parts
of making the CIRCOS graph.
So here's an example looking
at clean ordering versus messy
ordering.
On the left, if I
focus on group 1,
the bottom there, I can see
none of the curves cross
over each other.
They connect to the
nearest ones first
without reaching across,
crossing over, anything.
Now, in the messy ordering,
if I look at the blue group,
they're going every which way.
They're crossing
over each other.
It just doesn't
look as professional
as the curve on the left.
So the way that I do that
is I take each group.
So, in this case,
this is group 1,
same as that blue group at
the bottom of this curve,
and I split it into
three sections.
There's an incoming
section, which
are paths coming
from other groups,
so group 2 coming into group 1.
There's paths that stay
within the same group,
so group 1 going to itself.
And then there's
groups that leave, so
going from group
1 out to group 2.
I fill in the incoming
bar from left to right.
I fill in the outgoing
bar from right to left.
So let's say I
focus in on group 1.
I want my outgoing group
to start on the far right,
so that when it connects to the
group immediately to its right,
it starts on the far
right of the group.
And it connects to the far
left of the group next to it.
So the staying bar only has
one potential path entering it.
I want that to be
the final one that
gets filled right
in the middle, so it
doesn't cross over anything.
So this is my algorithm that
I made in order to sort these.
This is from proc SQL, in
the ORDER BY statement.
I really like using proc
SQL for situations like this.
Because you don't actually have
to make variables to sort by,
you can actually make
code that does that.
So this has three layers to it.
The first layer is just:
is the group that it starts
at the same as the group
that it ends at?
If it's true, then it
gets the value of 1.
If it's false, it
gets the value of 0.
So when you sort this,
the 0's will come first
and the 1's will come last.
So this is how I
get my groups that
stay the same to be drawn last.
Second level is what's
the starting group.
So do we start at group
1 and we increase up--
2, 3, 4, 5, 6, 7, 8, 9, 10?
And this is possible just
because no matter what
the groups are-- if they're
text, they're character.
The macro takes them and
assigns them in numerical order.
So I don't worry about
alphabetical or anything.
But there's probably
some math algorithm
out there that
could pick where do
I start to get the least amount
of crossover in my graph.
Fortunately, this was
just a fun project for me,
so I didn't go that
far into detail and I just
had to pick somewhere to start.
So I went with group 1.
So then our last section is
using the ifn function again.
And it's a two part deal.
So it checks the
logic statement.
Is the before group
greater than the after group?
So is the starting
group greater
than the ending group?
An example of this would
be if you're in group 2
and you're traveling to group
1, then that would be true.
If you're traveling from group
1 to group 2, that is not true.
So when this is true, we
take the before level
minus the after level.
So if we're going from 2
to 1, 2 minus 1 gives us 1.
That's our number
for sorting on.
If that's not true,
so if we're going from
1 to 10 for example,
we would go to the last
row and take 1 minus 10,
which gives us
negative 9, we would
add our total number of
groups, and we'd get 1,
and that gives us
our sort order.
And the goal for this is
I want our outgoing curves
to go in reverse
around the circle,
so I want to connect to 10, then
9, then 8, then 7, 6, 5, 4--
just so none of my curves
cross over with each other.
So here's an example of
this with three groups instead of 10.
If you look at our not
sorted curve on the left,
you'll notice it just goes
in whatever SAS gave me--
going 1, 2, 3, 4, 5,
6, 7, 8, 9 around.
Our curves cross
over, no real order.
On the right, now, we're
going to go in order.
So the first thing I
want to do is look at 1--
when 1 connects to 2.
1 is less than 2, so if you take
1 minus 2 we get negative 1.
If we add 3, that
gives us a value of 2.
However, if we look
at 1 to 3, 1 minus 3
is negative 2 plus
3 gives us 1, which
is why 1 connects to 3 first
and then 1 connects to 2
because we'd go 1, 2, 2--
and the same way all
around the curve.
When we look at group
2, 2 minus 1 gives us 1,
so that would go first versus
2 minus 3 gives us negative 1,
we add 3, we get 2.
So that would go second.
And we get all the
way around the curve
and we come back to 1 to 1,
which is our seventh item.
2 to 2 is our eighth item.
And 3 to 3 is our ninth item.
So that's how we get the sort
order based off that algorithm.
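The three-layer sort can be reproduced with a tuple sort key. This is a Python sketch of the logic as described, not the macro's PROC SQL; the function name `sort_key` and the (before, after) tuple layout are assumptions. It recreates the three-group walk-through above.

```python
def sort_key(before, after, n_groups):
    """Three-level key: same-group paths last, then start group, then wrap-around offset."""
    same_group = 1 if before == after else 0
    if before > after:
        offset = before - after
    else:
        offset = before - after + n_groups
    return (same_group, before, offset)

# Every possible path among three groups, as (before, after) pairs.
paths = [(b, a) for b in (1, 2, 3) for a in (1, 2, 3)]
ordered = sorted(paths, key=lambda p: sort_key(p[0], p[1], 3))
for position, path in enumerate(ordered, start=1):
    print(position, path)
```

In the ten-group case, a path from 1 to 10 gets an offset of 1 - 10 + 10 = 1, so it is drawn first among group 1's outgoing curves.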
So a little bit easier now--
data set structure
and this is not
the structure of the
data set that you'd
have to use to run this macro.
This is the data set--
the structure of the
data that this macro
makes to make the graph.
So data set structure
to make the polygons
is this list of variables.
So the BFR stands for before.
So we have the
before ID and it's
the ID of the starting path.
So I always want to know
what the starting path is
for a Bezier curve because
I want it to be colored
the same as its starting group.
The ID is a unique ID for each
shape in the polygon plot.
So it could be
something arbitrary.
In my examples, I have C-1.
So circle 1 is the first
group's outer rectangle.
B comma 1 dash 4 is our Bezier
curve connecting 1 to 4.
So I just have the macro write
out all these different IDs,
so that it can make
them different shapes.
Outline is my flag
variable that says
that these coordinates represent
the outline of the unit circle
for the--
for our outer circle groups.
Outer flags, the coordinates
for that rectangle.
Inner flags, coordinates
for our inner rectangle.
Bezier flags the coordinates as
being part of the Bezier curve
and we have our x
and y-coordinates.
So for the labels we
have our text variable,
which just winds up
being our group values.
We have
X_TEXT for our x-coordinates
and y-coordinates
for our TEXT plot.
And we have a rotate
variable, which
lets us pick a rotation
value for the TEXT plot,
this way I don't have to
specify a different TEXT plot
for each label, because I can
precalculate the rotation here.
For our axis tick marks
we have tick mark, which,
again, is just a group ID.
So some examples, I have
here colon 1 dash 1,
so there'll be two rows
where it's 1 dash 1--
one for the point
on the unit circle,
one for the point
beyond the unit circle.
And it'll just go 1-1--
1 dash 1, 1 dash 2,
1 dash 3, 1 dash 4,
and so forth for each group.
Then increment up to 2 dash
1, 2 dash 2, and so forth.
Then XTICK and YTICK are just
our x or y-coordinates again.
So, now, getting
into the GTL code,
again, Graph Template
Language code.
The big bulk of doing this macro
was the data step processing.
The actual graph
template language code
should be fairly
straightforward.
I think the most complicated
part is right here
in the attribute map.
If you haven't used an
attribute map before,
it lets you specify
values almost like a format.
So you can list off all
your different values.
And here I am going from one
to however many groups I have--
in my example, it's been 10.
So from 1 to 10,
when value is 1,
give it the fill attribute of
some color of your choice.
So using the discrete
attribute map,
I'll assign these
different attributes
to each of the 10 groups.
I'll name my discrete
attribute map
ATTRS, which is similar to
how you would name a format
with proc format, so
you can call it later.
And it gets called by this
discrete attribute var
statement at the bottom here--
where ATTR var is our new
grouping variable that applies
the discrete attribute map.
Our var is the variable that's
going to contribute the values.
So BFRID is, again,
the ID of the group
that our curve starts at.
And the attribute map
it's going to reference
is the ATTRS map here.
So in order to draw
the Bezier curves,
I call the polygon plot.
And, again, I'm going to use
this eval function and we're
going to go back to that
Bezier flag variable
that I made because all
of my xy coordinates
are in two columns--
whether it's my rectangles,
my inner rectangles,
my Bezier curves-- they're
all buried in there.
And for this one I only want
to plot the Bezier curves
because I need the
transparency option.
If it wasn't for the
transparency option,
I could do it all in one
polygon plot statement,
but, unfortunately,
there's no variable I
can add for transparency value.
The reason why I have
the transparency values
is because if the curves
were not transparent,
they would be completely
opaque and they would just
start covering each
other up and it'd
be a little bit harder
to read the graph--
or quite a bit harder
to read the graph.
To draw the rectangles,
we do the same thing.
This time I'm going to say
where Bezier is not equal to 1
to get all the other
shapes that are contained
in those x and y variables.
Again, I'm going to
group on this ATTR var
in order to get the
colors that I want.
To draw the axes, I'm
going to use a SERIESPLOT.
I'm going to put in my
xy coordinates that make
the outline of the unit circle.
I'm going to set up
my line attributes
using my macro parameters.
And I'm going to group
them on the ID variable,
so that I get a
different SERIESPLOT
for each one of my groups
going around the circle.
If I didn't use that
group equals ID,
then the axis, when it ends
on one of the rectangles,
would draw a line to
the next rectangle,
which would go around then draw
a line to the next rectangle.
By using this group
option it just
stops at the end of each group.
To draw the tick marks
I do another SERIESPLOT,
use an x equals x tick, y equals
y tick, line attributes again,
and this time I'm
going to use the tick
mark as our group, so
that I get separate tick marks.
Again, if I didn't
use this group value,
each tick mark would
connect to the next
and make a bunch
of jagged triangles
going around the curve-- or
going around the circle, which
is not what we want.
To draw the labels,
it's the TEXTPLOT.
Again, we have an x and y.
We use our text value.
We set our text
attributes to be whatever
parameters we want them to be.
Rotate is our rotation variable.
And I found position
equals top, vcenter equals
bbox to be the way that
it best positions them
around the circle--
without colliding into anything.
So some examples that I
put together for this--
first of all, if we
can take a quick look,
this is the percent CIRCOS
macro call. It only
has three required parameters.
So your data just has
to be structured to have
a before and an after group.
So the example data that I
had when I first started this
was just two columns.
There was an A, B, C, D,
E, and an A, B, C, D, E,
and just whichever value
the patient had for start and
end. It was just two columns,
so pretty straightforward.
Data is just whatever
the data set name is.
There's image options,
so you can control
your height and your width.
You can make a JPEG, PNG,
TIFF, anything that you need.
Anti-aliasing and transparency.
So the anti-aliasing
is actually extremely
important with this graph.
Normally, it doesn't come
up in a lot of curves.
Anti-aliasing is how many
points or objects in my graph
can I have before it
runs out of memory
and starts making
everything look jagged.
In this graph, we
have potentially
thousands and
thousands of points
to make these polygons--
depending on your data.
So being able to set
anti-aliasing higher
could be very critical to
making your graph look correct.
You can output it to
a document, so if you
want to send it to an RTF, or
a PDF, HTML, Excel, whatever
you're working with,
you can do that.
DPI for your resolution.
Controlling the
appearance of the graph.
Group gap is the gap
between rectangles.
So as you're going
around the circle,
you can either have
all your rectangles
connecting to each other
if you set that to 0.
Or, if you want some
more space between them,
you can increase
that buffer size up.
Bar width lets you set the
width of the rectangles
as a proportion of
the unit circle.
Inner gap is the gap
between the inner circle
and the outer circle.
Start is where in the unit
circle do you want to start.
The graph defaults to
the bottom of the curve,
but you can always set
it to be any point going
around the circle.
Direction is do you want it
to go clockwise or do you
want it to go counterclockwise.
You can change font sizes, colors,
weights, orders of the groups,
colors of the groups.
And there's also a
way to subgroup down,
which we'll see in one--
in two of our examples here.
So a basic example that you
could actually take this home
and you can run this
if you wanted to--
I have a website,
linked in the presentation,
where you can actually
go and download the
macro code if you
wanted to try running this
yourself or look at it.
This just creates a
randomly generated
data set to make random
before and after groups.
Then we just plug it into the
macro and run it to get this.
This one is designed to
make it so our first three
groups don't have any
other paths entering them
and our last three don't
have any groups leaving
them just based off of the
random number generator.
And I'll point out the
bottom right corner,
because I don't think
I've pointed this out yet,
that number, 1.92 paths per
percent, is the only way I have
to give a quantitative number
to this graph.
So it just means that each
percentage point as you
run around is equal to 1.92.
It could be patients,
could be observations--
whatever's in your data set.
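The number in the corner is just the total count divided by 100. In the short Python sketch below, the total of 192 paths is inferred from the 1.92 figure rather than stated in the talk, and the function name is an assumption.

```python
def paths_per_percent(total_paths):
    """How many paths one 1% minor tick represents."""
    return total_paths / 100.0

total = 192                              # inferred: 1.92 paths per percent times 100
minor_tick = paths_per_percent(total)    # paths per minor tick (1%)
major_tick = 5 * minor_tick              # paths per major tick (5%)
print(minor_tick, major_tick)
```

Counting ticks along a group's bar and multiplying by this value gives a rough count for that group.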
So soccer players
changing teams.
So I found this data in one of
Sanjay's Graphically Speaking posts.
If you haven't looked
at these blogs,
there's a lot of helpful
resources that are out there.
They're made by Sanjay Matange,
who I heard has sadly retired,
if you're coming here.
This shows soccer
players changing teams
with certain regions
or countries.
And while, again,
it's not quantitative,
it still can give us
some fun information.
For example, if you look at the
USA, nobody is going to the US
to play soccer,
which, I guess, isn't
a huge surprise to anyone.
Surprisingly, a lot of
people are leaving England
to play in other countries.
A lot of people from
Spain-- are going to Spain
to play there instead
of people leaving Spain.
And a majority of
the other countries
are pretty
well-balanced of people
leaving to go to other
countries and other countries
joining them to play soccer.
So I work in clinical
oncology, so I
wanted to pull an
example to try to give
some reference for some way
that I could use a CIRCOS graph.
Because, again, I
just made this for fun
as part of a group
project, but I want
to find some way to use this.
So, in this case, I'm looking
at maximum-grade adverse events
for diarrhea in an advanced
colorectal cancer trial.
And the goal of
this is to see how
are grades distributed at
these two different time
points, so six weeks versus
the entire length of treatment.
And is there any surrogate--
or any kind of surrogate
value that six-- can six weeks
predict what they'll have
across their whole length
of treatment?
So sometimes we look at
analyses like this just
to see if we can cut back on
how long we can watch patients
for, how long we can treat
patients for, et cetera.
In this case, we're using
our subgroup variables.
So now we're having
two CIRCOS graphs-- one
on the left, one on the right,
connected to each other.
And six weeks, if you look at
that, we can see almost half
of our patients there do
not have any diarrhea, so
grade zero.
With adverse events, the higher
the number, the more severe
the adverse event is-- where
five actually means death.
Four is hospitalized.
Three is it severely
impacts their lifestyle.
So we have-- almost half
of them didn't have it.
The majority of them had
it as pretty much minor.
And then there is
a small group where
they had severe adverse events.
Now, if we look at
end of treatment,
we can see that they
become a little bit more
balanced-- where we have
almost an even number at zero,
one, two, and three.
And we still don't have a lot
of very severe adverse events.
But what we can look at is
it looks like almost 60%
of the zero group stayed zeros.
Pretty much 60%--
70% of grade one stayed grade
ones, same thing with twos.
Almost all the threes
stayed a three.
So while it's not quantitative,
there's no p values--
nothing like that-- we can
still see a visual pattern
in our data that most people
stayed in their six-week
adverse event group.
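The "stayed in the same grade" percentages read off the graph correspond
to row percentages in a six-week by full-treatment table. A minimal
sketch in Python, using made-up pairs that are purely illustrative and
are not the trial data:

```python
from collections import Counter

# Hypothetical (six_week_grade, full_treatment_grade) pairs.
# A maximum grade over the full treatment can never be lower than
# the six-week maximum, so the second value is always >= the first.
pairs = [(0, 0), (0, 0), (0, 0), (0, 1), (0, 2),
         (1, 1), (1, 1), (1, 2),
         (2, 2), (2, 3)]

totals = Counter(six for six, _ in pairs)
stayed = Counter(six for six, full in pairs if six == full)

# Share of each six-week grade that kept the same grade at end of treatment.
stay_rate = {grade: stayed[grade] / totals[grade] for grade in sorted(totals)}
```

In this toy data 60% of the grade zero group stays at zero, which is the
kind of number the width of a path returning to its own group conveys.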
Now, this was
actually a meta-study,
which happens to have
47 studies in it.
But I'm subsetting it
down to just these six.
You can look at how
patients are
going from six-week adverse
events to the full treatment
adverse events
within each study.
And one of the first things
that we can see by subgrouping
is that study
three is gigantic
compared to most of
the other studies when
it comes to sample size.
Looking within, it can
be a little bit tricky
when you get down
to the-- this small
to actually pick out
where people are going.
But, again, this is
just a visual graph
just to see what your data
looks like it's doing.
We can see, again,
that most patients seem
to be staying within their
six-week adverse event maximum
grade versus the full
treatment adverse event grade.
Now, within this
macro, you can--
I have made it so you can color
each one of these studies--
adverse events--
different colors
or you can make them
the same across each.
In this case, it works better
to have them all the same,
so we know that
grade zero is a grade
zero without having
to constantly look
at the outside of the circle.
So some other applications.
So as I've been
doing this, I like
to go out to
communities.sas.com,
which is another great website,
if you haven't been there yet.
It's a forum where you
can post questions.
You could try to help other
people answer their questions.
There's articles on
how to do cool stuff.
I was surfing there
at PharmaSUG and I
came across these two
potential graphs, pie charts
and donut charts.
They're generally very
simple visual graphs,
but SAS doesn't have a
lot of options for these.
Because they just don't do
a lot with circular graphs,
it seems, at this point.
But the same
trigonometry methods
that I've been using to
make the CIRCOS graph
can be applied to make
these graphs as well.
So for the pie chart--
so here's a basic pie chart.
It's available in the
graph template language
with the region layout,
which region layout is just
a way to provide space.
There's no x and
y-axis in there.
And I think the pie chart's
one of the only graphs you can
actually put in that layout.
It can't be combined
with other graph types
like text plot, or
scatter plot, or anything.
There's very limited
options to it.
Now, if this was the graph
that somebody wanted to make--
so they wanted to
have two pie charts
where the small groups that have
1%, 6%, 7%, in this example,
don't look like small slivers.
And he wanted to
blow them up, so he
could look at their proportions
in a second pie chart.
Now, you could potentially
try to do this making it
on a lattice layout
with two regions
and putting two pie
charts next to each other.
But then the problem
becomes how do
you connect them to each other?
Because he wanted
those lines that
made it look like you
were blowing up one pie
chart into the other.
So there really wasn't
a way to do that.
And so I took what
I've been doing
and instead of drawing
curved rectangles,
I'm going to draw a wedge.
Because this is another
pretty straightforward
shape that I didn't
find any plot statement
to be able to do in SAS.
Basically, all we
have to do is do
the outer part of our
rectangle on the unit circle,
and then we just put
a point at the origin,
and the polygon plot
will connect everything,
and we'll get our
wedge, and we just
keep adding wedges
going around the circle,
and we'll make our pie chart.
But the difference between
this one and those other ones
is we have an x and
y-axis for this pie chart.
So now we can actually go
in and add any kind of graph
that we want into this.
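The wedge construction can be sketched as follows. This is Python
rather than SAS, the function names are hypothetical, and the output is
only the list of polygon vertices that a polygon plot would then
connect.

```python
import math

def wedge(start_deg, end_deg, steps=30):
    """Vertices of one pie wedge: the origin, then points along the unit circle."""
    points = [(0.0, 0.0)]  # the single point at the origin
    for i in range(steps + 1):
        t = math.radians(start_deg + (end_deg - start_deg) * i / steps)
        points.append((math.cos(t), math.sin(t)))
    return points

def pie(shares):
    """One wedge per share, laid end to end around the circle."""
    total = sum(shares)
    polygons, angle = [], 0.0
    for share in shares:
        width = 360.0 * share / total
        polygons.append(wedge(angle, angle + width))
        angle += width
    return polygons

polygons = pie([1, 1, 2])  # a quarter, a quarter, and a half
```

Because every vertex is an ordinary x, y pair, other plot types can be
overlaid in the same coordinate space.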
Now, in order to
make that second one,
we go back to, again, do
it-- a little bit of math
in order to translate-- to move
a circle into another origin
in the same space, we just go
to each of the x-coordinates
and we just add the
same value to it.
So instead of having our origin
at 0, 0 for the left one,
we want our origin
to be at something
like 3, 0 on the right.
So we just add 3 to everything.
And it just moves that
second circle over.
Then all I have to do is just
find the start and end points
to that other group,
the gray group,
and just connect them to the top
and bottom of the other circle.
And then it looks like it's
exploding out like I want.
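A sketch of the shift and the connecting lines, in Python rather than
SAS. The offset of 3 comes from the talk; the wedge angles of -25 and 25
degrees for the gray group and the helper names are assumptions for
illustration.

```python
import math

def on_circle(angle_deg, cx=0.0, cy=0.0, radius=1.0):
    """Point at the given angle on a circle centered at (cx, cy)."""
    t = math.radians(angle_deg)
    return (cx + radius * math.cos(t), cy + radius * math.sin(t))

def translate(points, dx, dy=0.0):
    """Move every point by the same offset."""
    return [(x + dx, y + dy) for x, y in points]

# Outline of the left circle, then the same outline moved to origin (3, 0).
left = [on_circle(a) for a in range(0, 360, 10)]
right = translate(left, 3.0)

# Assumed gray wedge on the left pie, spanning -25 to 25 degrees.
# Each connector runs from an edge of that wedge to the right circle.
connectors = [
    (on_circle(25.0), on_circle(90.0, cx=3.0)),    # upper edge to top of right circle
    (on_circle(-25.0), on_circle(270.0, cx=3.0)),  # lower edge to bottom of right circle
]
```

Only the x-coordinates change in the shift, so the second circle keeps
its size and shape.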
So donut chart is
similar to a pie chart,
but instead of
wedges, we have rings.
Again, this one has--
trying to remember
what they call that--
a sheen to it
because that's what
the user is trying to
match what they have,
but you can make it flat
or anything like that.
So it's similar to pie
chart, it has multiple rings.
I haven't seen anything in
SAS that makes a donut plot--
at least in the SG procedures,
which is what I use.
You can use the CIRCOS
outer rectangle methods
and just use them at multiple
distances to draw these.
So in this case, I'm
going to do j from 1 to 2
or how many rings that you have.
And the key part
to look at for this
is the green text, if you
can read that out there.
So each loop is,
essentially, going
to draw the next rectangle
0.3 proportion closer
to the origin.
First one, j will be 1,
so it'll get rid of that,
and it'll draw it
on the unit circle,
and it'll just work its way in.
And then we just set group
equals ID in our polygon plot
to color them differently.
And we have the donut plot.
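The ring loop can be sketched like this, in Python rather than SAS. The
0.3 step and the j from 1 to 2 loop come from the talk; the ring width
of 0.25, the shares, and the names are assumptions.

```python
import math

def ring_segment(start_deg, end_deg, outer, width, steps=30):
    """Closed polygon for a curved rectangle between radius outer - width and outer."""
    inner = outer - width
    angles = [math.radians(start_deg + (end_deg - start_deg) * i / steps)
              for i in range(steps + 1)]
    outer_arc = [(outer * math.cos(t), outer * math.sin(t)) for t in angles]
    inner_arc = [(inner * math.cos(t), inner * math.sin(t)) for t in reversed(angles)]
    return outer_arc + inner_arc  # out along the outer edge, back along the inner edge

shares = [1, 1, 2]  # hypothetical group sizes, reused on every ring
polygons = {}
for j in range(1, 3):                 # j = 1, 2
    outer = 1.0 - 0.3 * (j - 1)       # j = 1 drops the offset: unit circle
    angle = 0.0
    for group_id, share in enumerate(shares):
        span = 360.0 * share / sum(shares)
        polygons[(j, group_id)] = ring_segment(angle, angle + span, outer, width=0.25)
        angle += span
```

Each key pairs a ring with a group ID, which is the value a polygon plot
would use to color the segments differently.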
So, in conclusion,
generating circular plots
is possible with
trigonometry functions.
Current methods are
really inefficient.
They could be improved with
the addition of a polar axis,
if SAS wanted to develop that.
CIRCOS plots are a
powerful visual tool, but
I don't think they're
meant to be quantitative.
But they can be
pretty and shiny,
put on things like posters.
They can be versatile to
situations outside of genomics.
And everything is available
if you follow this link, which
I think, if you
have the app, you
can actually pull up my slides
and click on that or just
search SAS communities
for percent
CIRCOS and you'll find
an article that I wrote
that has the background of
everything I just talked about,
my PharmaSUG paper,
the program itself,
and some examples-- at
least one example in there.
So thank you for
coming to my talk.
I need to remind you to
complete your session
survey inside the app.
If you have any questions
for me on anything,
you can just walk up
to the microphones
and feel free to ask.
I don't mind going back
to any of the slides
if you have a question on
something specific, so thanks
again.
