Maneesh Agrawala: THANK YOU, THANKS A LOT.
SO I'M MANEESH
AGRAWALA.
AND I THINK THAT ALL OF US CAN AGREE THAT
ONE OF THE BEST WAYS
TO COMMUNICATE QUANTITATIVE INFORMATION IS
TO VISUALIZE IT IN THE FORM OF
A CHART OR A GRAPH.
AND IN THE LAST 300 YEARS OR SO SINCE THESE
FORMS
WERE INVENTED, THEY HAVE BECOME PRETTY UBIQUITOUS.
WE SEE THEM ALL THE
TIME IN THINGS LIKE SCIENTIFIC PAPERS AND
TEXTBOOKS.
IN NEWSPAPER
ARTICLES AND MAGAZINES AND FINANCIAL REPORTS
AND ON AND ON AND ON AND ON.
THEY ARE ALL OVER THE PLACE.
AND ONE OF THE THINGS THAT WE HAVE BEEN
THINKING A LOT ABOUT IS WHAT MAKES THESE KINDS
OF VISUALIZATIONS SO
EFFECTIVE.
SO OKAY.
SO THIS IS WORK FROM WILLIAM PLAYFAIR.
IN 1786,
PLAYFAIR, AN ENGLISH ECONOMIST, PUBLISHED
THE COMMERCIAL AND POLITICAL
ATLAS.
AND THIS COMPARED THE ECONOMICS OF ENGLAND
TO THE REST OF THE
WORLD.
IT WAS THE FIRST PUBLISHED USE OF LINE CHARTS
AND BAR CHARTS.
NOW, THESE IMAGES HERE ARE FROM THE THIRD
EDITION OF THE BOOK
AND THIS WAS PUBLISHED IN 1801.
ONE OF THE THINGS THAT I FIND REALLY
INTERESTING ABOUT THESE IMAGES IS THAT EXCEPT
FOR THE TYPEFACE, THESE
LOOK REMARKABLY SIMILAR TO MODERN LINE CHARTS
AND BAR CHARTS.
THEY ARE
VERY, VERY SIMILAR.
AND SO ONE QUESTION HERE IS WHY DO THESE CHARTS
LOOK SO MODERN?
AND SO LET'S LOOK AT THIS EXAMPLE FROM PLAYFAIR
IN PARTICULAR.
IT SHOWS THE
BALANCE OF TRADE BETWEEN ENGLAND AND NORTH
AMERICA.
IN CASE YOU CAN'T
SEE IT, I KNOW IT'S SMALL, THE X AXIS SHOWS
TIME.
FROM THE LEFT IT'S
1700, OVER HERE ON THE RIGHT IT'S 1800.
SO IT'S A CENTURY OF YEARS.
AND
ON THE Y AXIS, THEY ARE SHOWING THE AMOUNT
OF MONEY OR VALUE THAT WAS
TRANSFERRED EITHER FROM ENGLAND TO NORTH AMERICA.
THAT'S THE LINE OF
EXPORTS THAT WE SEE HERE IN RED.
OR VALUE IMPORTED FROM NORTH AMERICA TO
ENGLAND.
THAT'S THE LINE OF IMPORTS HERE IN YELLOW.
AND SO IN THE 1700S IN THE EARLY 1700S, THE
EARLY PART OF THE
CENTURY, WE SEE THAT THE BALANCE OF TRADE
IS REALLY IN FAVOR OF NORTH
AMERICA.
AND THIS PINK REGION SHOWS JUST HOW MUCH MORE
IS BEING IMPORTED
FROM NORTH AMERICA THAN EXPORTED.
IN THE LAST HALF OF THE CENTURY, WE
SEE THAT THE BALANCE OF TRADE IS STRONGLY
IN FAVOR OF ENGLAND.
AND THESE
GREENISH REGIONS KIND OF REPRESENT HOW STRONGLY
THE BALANCE HAS SHIFTED.
AND WE ALSO SEE THAT IN THE MID 1770S, THERE
IS THIS
INTERESTING DIP IN THE AMOUNT OF TRADE.
THIS IS, OF COURSE, THE
REVOLUTIONARY WAR. AND SO TRADE ALMOST CAME
TO A STANDSTILL. YOU SEE
THERE IS A LITTLE BIT OF IT GOING ON, BUT IT DECLINED
PRETTY SHARPLY.
THEN
THE CHART ALSO SHOWS THAT RIGHT AFTER THE
WAR, TRADE PICKED RIGHT BACK UP
AND ENGLAND RETURNED TO HAVING TRADE HEAVILY
IN THEIR FAVOR.
OKAY?
ALL
OF THAT IS REALLY CLEAR TO SEE FROM THIS CHART,
LOTS AND LOTS OF DATA
PRESENTED ALL AT ONCE.
SO, YOU KNOW, WELL DESIGNED CHARTS LIKE THESE
ARE REALLY GREAT
FOR PEOPLE
LIKE YOU AND ME TO UNDERSTAND AND SEE THE
DATA.
WE SEE THE
DATA IN CONTEXT, WE SEE A LOT OF IT AT ONCE.
IT HELPS US MAKE SENSE OF
THE DATA AND UNDERSTAND IT. AND I THINK THIS
IS THE REASON THAT THESE
KINDS OF CHARTS AND GRAPHS HAVE ENDURED FOR
SO LONG.
BUT FOR MACHINES
LIKE THESE ROBOTS, IT'S DIFFICULT TO UNDERSTAND
THESE IMAGES.
AND SO A
BIG PART OF THE PROBLEM HERE IS THAT THE CHARTS
ARE REPRESENTED AS PIXELS
AND THE MACHINE HAS NO WAY TO ACCESS THE UNDERLYING
DATA.
AND THAT, I
THINK, IS A PROBLEM.
HERE IS ANOTHER CHART.
THIS IS A VISUALIZATION OF NIH'S
RESEARCH BUDGET PER DEATH SPENT ON VARIOUS
DISEASES IN 2005.
AND WE,
PEOPLE, CAN SEE THAT AIDS TAKES UP MUCH OF
THE BUDGET.
IT'S MUCH HIGHER
THAN ANY OTHER DISEASE.
BUT WE CAN'T REALLY TELL EVEN IF WE LOOK CLOSELY
WHETHER NIH SPENDS MORE ON DIABETES OR ON
ALZHEIMER'S.
THE AREAS OR
ANGLES ARE VERY, VERY SIMILAR AND IT'S HARD
TO TELL THE DIFFERENCE.
SO
WHILE THIS VISUALIZATION STILL COMMUNICATES
THE DATA TO PEOPLE, IT MIGHT
NOT BE THE MOST EFFECTIVE DESIGN.
AND THESE KINDS OF POORLY DESIGNED VISUALIZATIONS
CAN BE FOUND
ALL OVER THE PLACE.
HERE ARE A FEW EXAMPLES.
AND IF YOU GO TO VARIOUS
WEBSITES, YOU WILL FIND MORE COLLECTIONS OF
BAD VISUALIZATIONS.
THIS IS
MY FAVORITE.
THIS 3D PIE CHART FROM FOX NEWS,
FROM THE 2012 PRESIDENTIAL
CAMPAIGN.
AS, I THINK, SOME OF YOU HAVE NOTICED, FOR
SOME REASON, THE
PIE SLICES DON'T ADD UP TO 100%.
BUT GOING BACK TO OUR NIH EXAMPLE HERE, IF
WE HAD ACCESS TO
SOME OF THE UNDERLYING DATA, WE COULD FIX
SOME OF THE PROBLEMS OF THIS
PIE CHART.
FOR EXAMPLE, WE COULD LABEL SOME OF THE SLICES
PERHAPS WITH
THE AMOUNTS THAT THEY REPRESENT.
OR PERHAPS REDESIGN THE CHART
COMPLETELY, TRANSFORM IT INTO A BAR CHART,
FOR EXAMPLE, WHERE IT MIGHT BE
EASIER TO READ OFF THE DIFFERENCES IN VALUES.
BUT AGAIN, HERE AGAIN, THE PIXELS REALLY MAKE
IT DIFFICULT FOR
PEOPLE TO ACCURATELY GET ACCESS TO THE DATA
AND TO MANIPULATE THE CHART.
I HAVE NO WAY OF REFORMING THIS CHART IF ALL
I HAVE ARE THE PIXELS.
SO WHAT WE HAVE SEEN IS THAT PIXELS ARE NOT
A GOOD
REPRESENTATION FOR CHARTS AND GRAPHS.
THEY MAKE IT DIFFICULT FOR BOTH
THE PEOPLE AND THE MACHINES TO WORK WITH THE
UNDERLYING DATA AND
INFORMATION.
AND THE GOAL OF SOME OF OUR WORK HAS BEEN
TO DECONSTRUCT THESE
KINDS OF BITMAPS OF CHARTS INTO A REPRESENTATION
THAT ENABLES REDESIGN,
REUSE AND REVITALIZATION OF THE CHARTS.
SO THAT RAISES A QUESTION.
WHAT IS A GOOD REPRESENTATION FOR
CHARTS AND GRAPHS?
NOW, FORTUNATELY, VISUALIZATION RESEARCHERS
HAVE
ADDRESSED THIS PROBLEM.
WHAT I WOULD LIKE TO DO AT THIS POINT IS WALK
YOU THROUGH ONE OF THE MOST IMPORTANT IDEAS
I THINK TO HAVE EMERGED FROM
THE VISUALIZATION COMMUNITY OVER THE LAST
30 TO 40 YEARS.
IN FACT, IT STARTS WITH THINKING ABOUT HOW
TO REPRESENT DATA.
AND THIS REALLY HAPPENED IN THE 1940S.
STEVENS WAS A HARVARD
PSYCHOLOGIST AND DEVELOPED A THEORY FOR THE
SCALES OF MEASUREMENT.
WE
CAN THINK OF THESE AS DIFFERENT TYPES OF DATA.
AND HE DEFINED FOUR OF
THEM.
SO FIRST IS NOMINAL DATA.
THIS CONSISTS OF CATEGORIES OR LABELS.
CATEGORIES OF FRUIT ARE ONE EXAMPLE.
YOU CAN LABEL A FRUIT AS AN APPLE,
AN ORANGE, A PEAR OR A BANANA.
AND YOU CAN COMPARE THESE VALUES, YOU CAN
TEST FOR EQUALITY AND INEQUALITY.
ALL RIGHT?
THAT'S THE ONLY WAY YOU CAN COMPARE THEM.
NEXT IS
ORDERED OR ORDINAL DATA.
THIS HAS A RANKING OR AN ORDERING ON THE DATA,
GRADES OF MEAT ARE ONE EXAMPLE.
I WAS RECENTLY TOLD THAT WE PHASED OUT
GRADES OF MEAT IN THE U.S. BUT IN CANADA,
THEY STILL HAVE GRADES.
SO
IT -- YOU KNOW, I THINK THE RANKING IS STILL
IMPORTANT.
HOW MANY PEOPLE
KNOW ABOUT GRADES OF MEAT?
DO WE STILL HAVE THIS IN THE U.S.? YEAH?
YES?
NO?
DOESN'T MATTER.
THERE IS AN ORDERING, BUT THERE IS NO DISTANCE
BETWEEN THEM.
SO YOU CAN'T TELL -- THERE IS NO WAY TO ASSESS
THE
DISTANCE BETWEEN GRADE A AND GRADE AA.
SO YOU CAN COMPARE THINGS IN
TERMS OF THEIR ORDER.
NEXT IS QUANTITATIVE DATA.
THERE ARE TWO FORMS OF QUANTITATIVE
DATA.
ONE IS INTERVAL DATA.
AND HERE WE HAVE AN ARBITRARY ZERO
LOCATION.
WE CAN'T DIRECTLY COMPARE VALUES.
WE CAN ONLY COMPARE
DIFFERENCES BETWEEN VALUES.
AND THE SECOND TYPE OF QUANTITATIVE DATA IS
RATIO DATA.
HERE THERE IS A FIXED ZERO LOCATION.
IT HAS SEMANTIC MEANING.
SO YOU CAN DIRECTLY COMPARE VALUES.
SO FOR THE PURPOSES OF VISUALIZATION, WE USUALLY
COMBINE THESE LAST
TWO CATEGORIES AND WE THINK OF THE THREE MAIN
DATA TYPES.
THAT'S
NOMINAL, ORDINAL AND QUANTITATIVE.
OKAY?
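The four scales described above can be sketched in a few lines of Python. This is only an illustration: the names `Scale`, `allowed_comparisons` and `visualization_type` are hypothetical and do not come from the talk.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Stevens' four scales of measurement, weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The comparison each scale adds on top of the weaker scales.
_NEW_COMPARISON = {
    Scale.NOMINAL: "equality",
    Scale.ORDINAL: "order",
    Scale.INTERVAL: "difference",
    Scale.RATIO: "ratio",
}


def allowed_comparisons(scale: Scale) -> list[str]:
    """Return every comparison that is meaningful on the given scale."""
    return [_NEW_COMPARISON[s] for s in Scale if s <= scale]


def visualization_type(scale: Scale) -> str:
    """Collapse interval and ratio into the single 'quantitative' type."""
    if scale in (Scale.INTERVAL, Scale.RATIO):
        return "quantitative"
    return scale.name.lower()
```

For example, `allowed_comparisons(Scale.ORDINAL)` gives equality and order only, which matches the point that there is no distance between two grades.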
NEXT, WE NEED TO THINK ABOUT HOW TO REPRESENT
IMAGES.
IN THE
1960S, FRENCH CARTOGRAPHER JACQUES BERTIN
REALLY LAID THE FOUNDATION FOR
THINKING ABOUT CHARTS IN HIS CLASSIC BOOK
THE SEMIOLOGY OF GRAPHICS.
AND HE THOUGHT OF CHARTS AS COMPOSED OF TWO
THINGS.
A SET OF MARKS, SUCH
AS POINTS, LINES AND AREAS.
AND VISUAL VARIABLES OR ATTRIBUTES OF THE
MARKS.
THESE CONTROL THE APPEARANCE OF THE MARKS
AND ENCODE THE DATA.
SO, FOR EXAMPLE, WE CAN USE THE POSITION OF THE
MARK TO ENCODE TWO VALUES.
ONE FOR X POSITION.
ONE FOR Y POSITION AND SO ON.
SO WE CAN THINK OF A CHART THEN AS A SET OF
MARKS WITH MAPPINGS
BETWEEN THE DATA AND THE VISUAL VARIABLES
OR VISUAL ATTRIBUTES OF THE
MARKS.
AND A BAR CHART IS A SET OF LINE MARKS.
EACH BAR IS A LINE.
AND
DATA IS MAPPED TO THE SIZE OR LENGTH OF A
LINE.
A SCATTERPLOT CONSISTS
OF POINT MARKS WITH ONE DATA FIELD MAPPED TO
X POSITION, ANOTHER TO Y
POSITION.
WE CAN, OF COURSE, ADD ADDITIONAL VISUAL VARIABLES.
HERE WE
HAVE ADDED COLOR.
THIS IS A COLORED SCATTERPLOT.
WE ADDED A THIRD DATA
DIMENSION.
THIS SCATTERPLOT IS NOW SHOWING THREE DIMENSIONS
OF THE DATA.
AND WE CAN MAP TO SIZE.
WE HAVE A FOUR DIMENSIONAL SCATTERPLOT WITH
A
DIFFERENT DATA FIELD MAPPED TO EACH ONE OF THESE
ATTRIBUTES.
NOW, I'M GOING TO COME BACK IN A MOMENT TO
TELL YOU HOW WE
MIGHT DECONSTRUCT A CHART INTO THIS REPRESENTATION,
BUT I WANT
TO CONTINUE A BIT WITH BERTIN'S IDEAS BECAUSE
I THINK THEY ARE REALLY
IMPORTANT.
SO -- AND THE ONE TAKE AWAY THAT I WANT YOU
TO REMEMBER FROM
THIS TALK IS THAT WE CAN REALLY THINK OF A
CHART AS A COLLECTION OF
MAPPINGS FROM DATA TO THE VISUAL ATTRIBUTES
OF THE MARKS.
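One way to make this idea concrete is a small data structure: a mark type, some data rows, and a dictionary of mappings from visual variables to data fields. The sketch below is an assumption for illustration only. The `Chart` class, its field names and the sample rows are made up and are not the representation used in the work described in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class Chart:
    """A chart as a mark type plus mappings from visual variables to data fields."""
    mark: str                    # "point", "line" or "area"
    data: list[dict]             # one dict per data row
    mappings: dict[str, str] = field(default_factory=dict)  # visual variable -> data field

    def encoded_dimensions(self) -> int:
        """Number of distinct data fields the chart shows."""
        return len(set(self.mappings.values()))

    def render_marks(self) -> list[dict]:
        """Resolve the mappings into one set of visual attributes per row."""
        return [
            {"mark": self.mark, **{var: row[fld] for var, fld in self.mappings.items()}}
            for row in self.data
        ]


# Made-up rows, only to exercise the structure.
rows = [
    {"a": 1.0, "b": 2.0, "group": "g1", "weight": 5},
    {"a": 3.0, "b": 1.5, "group": "g2", "weight": 9},
]

# A four-dimensional scatterplot: x, y, color and size each carry one field.
scatter = Chart(
    mark="point",
    data=rows,
    mappings={"x": "a", "y": "b", "color": "group", "size": "weight"},
)
```

Adding or removing an entry in `mappings` is all it takes to add or drop a data dimension, which is the property the rest of the talk relies on.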
OKAY.
SO HERE IS A BIT MORE FROM BERTIN.
HE ALSO CONSIDERED
THE WAY THAT PEOPLE PERCEIVE THE VISUAL VARIABLES
AND THE MARKS.
HERE WE
HAVE A SET OF THREE POINT MARKS.
THE PLUSSES ARE LITTLE REPRESENTATIONS
OF POINTS.
AND WE CAN SEE THAT THE POINTS ARE EASILY
DISTINGUISHABLE
FROM ONE ANOTHER.
AND IN THIS CASE, THEY ARE COLLINEAR.
WE CAN COMPARE
THE DISTANCES BETWEEN THEM.
WE SEE THAT BC IS TWICE AS LONG AS AB.
AND
BERTIN USED THIS KIND OF REASONING TO SUGGEST
THAT POSITION IS REALLY
GOOD FOR ENCODING QUANTITATIVE DATA.
OKAY?
SO WHERE WE HAVE DIFFERENCES IN DISTANCES
BETWEEN THE DATA,
THIS KIND OF ENCODING USING POSITION IS REALLY
GOOD.
HE ALSO EXPLAINED
THAT GRAY VALUE IS PERCEIVED AS ORDERED.
SO GIVEN SEVERAL PATCHES OF
GRAY VALUES, WE CAN EASILY ORDER THEM BASED
ON WHETHER THEY ARE CLOSER TO
BLACK OR WHITE.
BUT IT TURNS OUT AS YOU CAN SEE IN THIS EXAMPLE
THAT VERY FINE
DISTINCTIONS BETWEEN GRAY VALUES ARE MUCH
HARDER FOR PEOPLE TO SEE.
IF
YOU LOOK AT THIS EXAMPLE IN HERE, POINTS THAT
ARE VERY CLOSE BY IN THIS
CONTINUOUS RAMP ARE HARD TO DISTINGUISH
IMMEDIATELY.
SO WHAT THIS MEANS TOGETHER IS THAT GRAY VALUE
IS GOOD FOR
ENCODING ORDERED DATA.
IT'S NOT SO GOOD FOR ENCODING QUANTITATIVE
DATA.
FINALLY WITH HUE, WE THINK OF HUE AS UNORDERED.
THERE IS NO NATURAL
ORDERING TO HUE. AND SO THIS IS REALLY BEST
FOR ENCODING NOMINAL DATA.
OKAY?
WE DO LEARN AN ORDERING ON HUES.
BUT THAT'S A LEARNED THING LIKE
THE RAINBOW ORDERING.
SO BERTIN RANKED ALL OF THESE VISUAL VARIABLES
IN
TERMS OF HOW WELL HE THOUGHT THEY COULD REPRESENT
THE THREE TYPES OF
DATA.
AND WHAT HE FOUND IS THAT POSITION AND SIZE
ARE GOOD FOR ALL THREE
DATA TYPES, NOMINAL, ORDINAL AND QUANTITATIVE.
THAT HUE WE SAW IS REALLY BEST FOR
NOMINAL AND ORDINAL, NOT SO GOOD FOR QUANTITATIVE
DATA AND SO ON.
THAT'S HOW YOU CAN READ OFF THIS TABLE.
AND I SHOULD POINT OUT
THAT BERTIN MOSTLY CONJECTURED THIS TABLE
BASED ON HIS EXPERIENCES IN
CONSTRUCTING CHARTS AND GRAPHS, AND IN LATER EXPERIMENTS
ON GRAPHICAL
PERCEPTION, MOST OF THESE RANKINGS WERE VERIFIED.
IN FACT, HERE IS AN EXAMPLE OF ONE OF THESE
EXPERIMENTS.
THIS IS AN EXPERIMENT DONE BY CLEVELAND AND MCGILL WHERE
THEY ASKED PEOPLE WHICH OF
THE TWO PIE SLICES MARKED WITH DOTS IS BIGGER
OR WHICH OF THE TWO BARS
MARKED WITH DOTS IS BIGGER.
AND ALSO HOW MUCH BIGGER THE BIGGER ONE WAS
THAN THE SMALLER ONE.
OKAY?
THEY WERE TRYING TO FIGURE OUT HOW ACCURATELY
PEOPLE COULD
EXTRACT VALUES ENCODED USING EITHER THE ANGLE
OF THE PIE SLICE OR THE
POSITION OF THE TOPS OF THE BARS.
AND AS YOU MIGHT IMAGINE, HERE ARE
THE RESULTS.
SO WHAT THEY FOUND IS THAT ANGLE ENCODINGS
ARE SIGNIFICANTLY
LESS ACCURATE THAN POSITION ENCODINGS OF THESE
BAR CHARTS.
ALL RIGHT?
AND HERE BASED ON A NUMBER OF SUCH EXPERIMENTS,
RESEARCHERS HAVE RANKED
THE EFFECTIVENESS OF DIFFERENT VISUAL VARIABLES.
THESE REALLY MATCH THE
CONJECTURED EFFECTIVENESS THAT BERTIN SET
UP.
THIS, I SHOULD POINT OUT,
IS FOR QUANTITATIVE DATA.
THERE ARE SIMILAR RANKS FOR ORDINAL AND NOMINAL
DATA.
SO GOING BACK TO THE PLAYFAIR CHART, WE CAN
IDENTIFY THE
UNDERLYING REPRESENTATION.
WE CAN FIGURE OUT THE DATA, THE MARKS AND
THE
MAPPINGS. IF WE HAD ACCESS TO THIS MAPPING-BASED
REPRESENTATION, WE WOULD
HAVE A VERY NICE REPRESENTATION FOR MANIPULATING
THE CHARTS.
AND ANOTHER IMPORTANT THING WHEN DESIGNING
THESE KINDS OF
CHARTS IS TO MAP THE MOST IMPORTANT DATA TO
THE MOST PERCEPTUALLY
EFFECTIVE VARIABLES.
HERE WE SEE IT'S USING THE MARK TYPE OF LINES
AND YEAR IS MAPPED TO POSITION BECAUSE YEAR
IS VERY IMPORTANT.
YEAR IS
ALONG THE X AXIS.
EXPORTS ARE ALONG THE Y POSITION.
IT'S THE SECOND
MOST IMPORTANT DATA FIELD IN THIS DATASET.
AND SO THOSE TWO TOGETHER
FORM THE LINE OF EXPORTS.
IF WE ARE WILLING TO HAVE MULTIPLE LINES,
THEN
WE CAN ALSO MAP IMPORTS TO THIS LINE POSITION
AND WE GET THE YELLOW LINE
OF IMPORTS.
AND SO THAT'S THE WAY YOU CAN THINK ABOUT
CONSTRUCTING THIS
KIND OF CHART AND MAKING A VERY EFFECTIVE
CHART BECAUSE IT'S BASED ON THE
PERCEPTUAL EFFECTIVENESS OF THE VISUAL VARIABLES.
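The design rule just described, most important data to the most effective visual variable, can be sketched as a simple pairing. The ranking list below is an illustrative assumption: the talk only states that position beats angle for quantitative data, and the exact ranking shown on the slide is not in the transcript. The function name `assign_encodings` is hypothetical.

```python
# Illustrative ranking for quantitative data, most effective first (an assumption).
QUANTITATIVE_RANKING = ["x", "y", "length", "angle", "area"]


def assign_encodings(fields_by_importance, ranking=QUANTITATIVE_RANKING):
    """Pair the most important field with the most effective visual variable, and so on."""
    if len(fields_by_importance) > len(ranking):
        raise ValueError("more data fields than available visual variables")
    return dict(zip(ranking, fields_by_importance))


# The Playfair example: year is the most important field, exports the second.
playfair = assign_encodings(["year", "exports"])
```

Here `playfair` maps `x` to `year` and `y` to `exports`, the two mappings that together form the line of exports.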
NOW, WE'VE BEEN DEVELOPING A SET OF TECHNIQUES
TO TRY AND
RECOVER THIS STRUCTURE, THE DATA, THE MARKS
AND THE MAPPINGS FROM A
PIXEL-BASED REPRESENTATION, AN IMAGE OF A
CHART.
AND SO WE WOULD LIKE TO
TAKE AN INPUT LIKE THIS, THIS PIE CHART THAT
IS PIXELS AND RECOVER THIS
STRUCTURE, THE DATA MARKS AND MAPPINGS.
AND THE REASON WE WANT TO DO
THIS IS BECAUSE IT FACILITATES REDESIGN.
SO I CAN START WITH MARKS AND
MAPPINGS, WITH AREAS FOR THE MARKS
AND BUDGET GOING TO ANGLE,
AND
CHANGE THE MARKS TO LINES WITH THE BUDGET GOING
TO THE LENGTH OF THE LINE,
AND I TURN IT INTO A BAR CHART.
OKAY?
AND SO THE MAPPING-BASED REPRESENTATION IS
VERY, VERY POWERFUL.
I CAN VERY QUICKLY AND EASILY DO THIS CHANGE
AND GET A COMPLETELY
DIFFERENT -- COMPLETELY DIFFERENT VISUAL CHART.
AND IN THIS CASE, I CAN
PERHAPS SEE A LITTLE MORE CLEARLY THAT NIH
SPENT A BIT MORE ON
ALZHEIMER'S THAN ON DIABETES.
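A minimal sketch of that redesign step, assuming the chart is held as a plain dictionary with a mark type and a mappings table. The `redesign` function and the dictionary layout are hypothetical, and the data rows are omitted rather than invented.

```python
def redesign(spec, new_mark, remap):
    """Return a new chart spec with a different mark and renamed visual variables.

    `remap` maps old visual variable names to new ones; unmapped variables are kept.
    The input spec is left untouched.
    """
    mappings = {remap.get(var, var): fld for var, fld in spec["mappings"].items()}
    return {**spec, "mark": new_mark, "mappings": mappings}


# Pie chart: area marks, budget encoded as angle.
pie = {"mark": "area", "mappings": {"angle": "budget"}}

# Bar chart: line marks, budget encoded as length.
bar = redesign(pie, new_mark="line", remap={"angle": "length"})
```

Only the mark and one mapping change. The data field `budget` is the same in both charts, which is why the redesign is cheap once the representation has been recovered.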
SO I WANT TO BRIEFLY TOUCH ON WHAT WE DO.
OUR APPROACH IS TO
TAKE A BITMAP IMAGE LIKE THIS BAR CHART HERE.
WE RUN IT THROUGH A
CLASSIFIER AND DETERMINE THE CHART TYPE.
SO A BAR CHART, A PIE CHART, A
SCATTERPLOT AND SO ON.
THEN WE HAVE DEVELOPED CHART SPECIFIC TECHNIQUES
TO TAKE THE
INPUT BAR CHART, ANALYZE WHERE THE FOREGROUND
DATA ENCODING RECTANGLES
ARE IN THIS CASE.
AND THEN FIGURE OUT BASED ON AN ANALYSIS OF
THE AXES
THE VALUE ASSOCIATED WITH EACH ONE OF THESE
BARS.
THAT ALLOWS US TO PULL
OUT THE DATA AND IMPLICITLY WE ALSO HAVE THE
MAPPINGS FROM THE MARKS --
FROM THE DATA TO THE MARKS.
THE ASSOCIATION BETWEEN THEM.
WE CAN DO A SIMILAR THING FOR PIE CHARTS.
AND THIS IS -- THESE
TECHNIQUES ARE USING A VARIETY OF MACHINE
LEARNING AND IMAGE PROCESSING
AND COMPUTER VISION TECHNIQUES.
I SHOULD MENTION THAT OUR TECHNIQUES ARE
NOT 100% ACCURATE.
OUR CLASSIFICATION, AT LEAST, IS PRETTY
CLOSE.
THE NUMBER TO LOOK AT IS THIS BINARY SVM.
IT RUNS AT 96% ACCURACY.
SO
THAT'S PRETTY GOOD.
THE EXTRACTION OF THE MARKS AND THE DATA,
THAT'S NOT
QUITE AS GOOD.
AND WHAT WE ARE SEEING HERE IS THAT WE GET
BETWEEN 60 AND
80% ACCURACY AT EXTRACTING THE MARKS DEPENDING
ON CHART TYPE.
AND THEN
BETWEEN 40 AND 55, 56% ON EXTRACTING THE DATA.
NOW, I SHOULD POINT OUT THAT THESE ARE CONSERVATIVE
NUMBERS
AND, YOU KNOW, THESE ARE -- WE ARE ONLY REPORTING
HERE CASES IN WHICH WE
EXACTLY CORRECTLY EXTRACTED ALL OF THE MARKS
AND ALL OF THE DATA.
AND
THERE ARE MANY, MANY CHARTS IN WHICH WE MISS
ON ONE OR TWO MARKS AND SO
ON.
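To see why an all-or-nothing count is conservative, here is a small sketch that compares it with a per-mark count. The per-mark measure and both function names are assumptions for illustration, not metrics reported in the talk, and the sample counts are made up.

```python
def exact_match_accuracy(results):
    """Fraction of charts where every mark was extracted correctly.

    `results` is a list of (correct_marks, total_marks) pairs, one per chart.
    """
    return sum(1 for correct, total in results if correct == total) / len(results)


def per_mark_accuracy(results):
    """Fraction of all marks, across all charts, that were extracted correctly."""
    return sum(correct for correct, _ in results) / sum(total for _, total in results)


# Made-up counts: two perfect charts, two charts that each miss one mark.
sample = [(5, 5), (4, 5), (6, 6), (9, 10)]
strict = exact_match_accuracy(sample)
lenient = per_mark_accuracy(sample)
```

With these made-up counts the strict score is 0.5 while 24 of the 26 marks are correct, so a chart that misses a single mark counts as a complete failure under the strict measure.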
IN ANY CASE, THERE IS MORE WORK TO BE DONE
HERE.
BUT THIS IS A
REASONABLE STARTING POINT.
AND WHEN WE DO EXTRACT THIS REPRESENTATION,
WE ARE ABLE TO DO REDESIGN LIKE I SHOWED YOU
EARLIER.
THIS IS ALL DONE
SIMPLY BY MANIPULATING THE MAPPINGS.
OKAY?
HERE IS ANOTHER EXAMPLE.
SO WE'VE REORDERED THE BARS FROM
LARGEST TO SMALLEST.
AND WE'VE ALSO ADJUSTED THE BASELINE.
IN THE
ORIGINAL CHART, THE BASELINE DIDN'T START
AT ZERO.
WHICH CAN BE
CONFUSING AND LEAD TO ERRORS IN INTERPRETATION.
HERE WE SET THE
BASELINE TO START AT ZERO.
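A short worked example shows why a baseline that does not start at zero can mislead, along with the reordering step. The function names and the numbers are made up for illustration.

```python
def apparent_ratio(a, b, baseline):
    """Ratio of the drawn bar lengths for values a and b, given the axis baseline."""
    return (a - baseline) / (b - baseline)


def reorder_largest_first(bars):
    """Sort (label, value) pairs from largest to smallest value."""
    return sorted(bars, key=lambda bar: bar[1], reverse=True)


# Made-up values: 60 versus 50 is a true ratio of 1.2.
truncated = apparent_ratio(60, 50, baseline=40)   # bars drawn 20 and 10 units long
honest = apparent_ratio(60, 50, baseline=0)       # bars drawn 60 and 50 units long

ordered = reorder_largest_first([("a", 50), ("b", 60), ("c", 55)])
```

With the baseline at 40 one bar looks twice as long as the other, while with the baseline at zero the drawn lengths keep the true ratio of 1.2.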
AND THEN HERE IS A VERY DIFFERENT STYLE OF
CHART.
THIS IS A DOT
PLOT THAT GIVES A CLEAR AND SPECIFIC VIEW
OF THE DISTRIBUTION OF THE DATA.
AND THIS IS ALL DONE SIMPLY BY MANIPULATING
THE MAPPINGS.
RECENTLY, WE
HAVE BEEN INVESTIGATING HOW TO EXTRACT THE
DATA MARKS AND MAPPINGS FROM A
CHART CONSTRUCTED PROGRAMMATICALLY USING D3.
D3 IS, I THINK, PRETTY MUCH
THE DE FACTO TOOL FOR CREATING INTERACTIVE
VISUALIZATIONS ON THE WEB
TODAY.
THERE ARE HUNDREDS AND THOUSANDS OF THESE
VISUALIZATIONS WIDELY
AVAILABLE.
SO IN THIS CASE, BY ANALYZING THE CODE, WE
CAN EXACTLY RECOVER
THE DATA AND THE MARKS AND EXPLICITLY RECOVER
THE MAPPINGS BETWEEN THEM.
AND THE NICE THING HERE IS THAT WE CAN START
WITH A BAR CHART LIKE THIS,
DO OUR DECONSTRUCTION TO GET THE MARKS AND
THE MAPPINGS AND THE DATA.
AND
THEN BY SIMPLY DELETING TWO OF THE MAPPINGS,
WE'RE ABLE TO PRODUCE A VERY
DIFFERENT LOOKING CHART, THIS DOT PLOT.
SO, AGAIN, THIS MAPPING-BASED
REPRESENTATION IS VERY, VERY POWERFUL.
SO AGAIN EVEN MORE RECENTLY WE STARTED TO
BUILD REUSABLE STYLE
TEMPLATES.
SO HERE ON THE LEFT ARE FOUR DIFFERENT D3
VISUALIZATIONS THAT
WE HAVE PULLED OFF THE WEB. AND WE'VE CONVERTED
THEM INTO THIS
MAPPING-BASED REPRESENTATION THAT SERVES AS
A STYLE TEMPLATE.
AND WE
CAN POUR NEW DATA INTO THESE TEMPLATES AND
GENERATE VISUALIZATIONS FOR
THE NEW DATA SET IN ANY ONE OF THESE STYLES.
AND WE DON'T HAVE TO STOP
THERE.
HERE WE STARTED WITH A NEW DATASET.
BUT WE CAN ALSO TAKE
EXISTING VISUALIZATIONS, DECONSTRUCT THEM
AND THEN POUR THEM INTO THE
TEMPLATES AND WE GET THE SAME DATA VISUALIZED
IN MANY, MANY DIFFERENT
STYLES.
AND THIS IS ALL DONE AUTOMATICALLY.
OKAY.
SO THE LAST THING I WANT TO LEAVE YOU WITH
IS SOME WORK
THAT WE'VE DONE ON REVITALIZING VISUALIZATIONS.
THIS IS GOING A LITTLE
BIT BEYOND REDESIGN.
WE ARE REALLY THINKING ABOUT HOW TO BRING
STATIC
CHARTS AND GRAPHS TO LIFE.
AND ENABLE INTERACTIONS THAT MIGHT SUPPORT
DIFFERENT KINDS OF CHART READING.
SO JUST LIKE CURTIS SHOWED EARLIER
WITH THE WORK ON THE PAPER OF THE FUTURE,
YOU CAN IMAGINE THESE KINDS
OF INTERACTIVE CHARTS THAT I WILL SHOW YOU
IN A SECOND IN A PAPER OF THE
FUTURE.
SO THE THINGS THAT WE ARE ADDING ARE WHAT WE
CALL GRAPHICAL
OVERLAYS. THESE ARE VISUAL ELEMENTS THAT ARE
DESIGNED TO SUPPORT AND
FACILITATE THE PERCEPTUAL AND COGNITIVE PROCESSES
INVOLVED IN CHART
READING.
AND WHAT I WOULD LIKE TO DO IS JUST SHOW YOU
A DEMO OF WHAT WE
HAVE DONE.
SO LET ME SHIFT OVER TO THIS.
OKAY.
SO HERE UP TOP IS A STATIC CHART.
WE SCRAPED THIS FROM
THE WEB AND IT WAS JUST A SET OF PIXELS.
WE DECONSTRUCTED IT AND NOW WE
CAN ADD IN REFERENCE STRUCTURES.
SO I WILL PRESS THAT.
AND WE ADD IN
THESE DIVISIONS AND THESE DIVISIONS CAN HELP
YOU READ OFF THE VALUES OF
THE BARS.
THEY HELP YOU PROJECT THE END OF THE BAR BACK
ON TO THE AXIS
AND READ OFF THE VALUE.
I CAN CHANGE THE NUMBER OF DIVISIONS LIKE
THIS.
AND I CAN MAKE IT AN UNDERLAY TO GIVE IT LESS
PROMINENCE AND I CAN ALSO
MAKE IT INTERACTIVE.
SO HERE I CAN JUST PLACE MY REFERENCE LINES
AT THE
END OF THE BAR TO HELP ME FOR THAT ONE BAR.
WE CAN ALSO DO HIGHLIGHTING.
SO HERE I HAVE INTERACTIVE HIGHLIGHTS.
WHEN I HOVER OVER A BAR, IT
DE-SATURATES EVERYTHING ELSE SO THAT I CAN
FOCUS ON IT AND READ IT MORE
ACCURATELY AND SO ON.
HERE THESE ARE REDUNDANT ENCODINGS.
SO THE VALUE OF THE BAR IS
WRITTEN NEXT TO IT.
RIGHT?
AND I CAN NOW READ THE VALUE DIRECTLY.
AND
THEN FINALLY SUMMARY STATISTICS.
SO HERE WE HAVE ADDED A LINE THAT
REPRESENTS THE MEAN OF ALL OF THE VALUES.
WE CAN DO THE MAX, OF COURSE.
AND THE MEDIAN.
SO THESE KINDS OF ADDITIONAL OVERLAYS CAN
REALLY SUPPORT
DIFFERENT KINDS OF READINGS OF THE DATA.
WE STARTED FROM PIXELS, SO WE
CAN TAKE EXISTING VISUALIZATIONS AND ADD THESE
ON TO THEM.
SO THERE ARE THREE THINGS THAT I WANT TO LEAVE
YOU WITH.
AND
THAT I HOPE YOU WILL REMEMBER FROM THIS TALK.
FIRST, WE SHOULD THINK OF
CHARTS AS A COLLECTION OF MAPPINGS BETWEEN
DATA AND VISUAL VARIABLES OF
THE MARKS.
AND THE MOST EFFECTIVE VISUALIZATIONS MAP
IMPORTANT DATA TO
PERCEPTUALLY EFFECTIVE VISUAL ATTRIBUTES OR
VARIABLES.
SECOND, WE CAN USE IMAGE PROCESSING TECHNIQUES
AND ALSO CODE
PROCESSING TECHNIQUES TO RECOVER THIS REPRESENTATION
FROM IMAGES AND
PROGRAMMATIC CHARTS.
AND FINALLY, THIS RECONSTRUCTED REPRESENTATION
REALLY ALLOWS FOR MANY DIFFERENT TYPES OF
REDESIGN, REUSE AND
REVITALIZATION.
MORE IMPORTANTLY THAN ALL THREE OF THESE TAKE
AWAYS EVEN IS
THAT A GOOD REPRESENTATION REALLY SUPPORTS
NEW WAYS OF THINKING ABOUT
DATA.
AND THIS IS SOMETHING THAT ARTISTS AND ART
THINK ABOUT A LOT.
THEY THINK ABOUT DIFFERENT KINDS OF REPRESENTATIONS
OF THEIR WORK.
AND
REPRESENTATION IS ALSO AT THE HEART OF SCIENCE.
AND SO, I THINK THIS IS
A DEEP POINT OF CONNECTION BETWEEN SCIENCE
AND ART, REALLY THINKING ABOUT
THE REPRESENTATION THAT YOU ARE USING AND
HOW THAT REPRESENTATION CHANGES
YOUR THINKING ABOUT THE WORK.
ALL RIGHT, I WOULD LIKE TO THANK THE PEOPLE
THAT MADE THIS WORK
POSSIBLE.
THE TWO PEOPLE PICTURED HERE WERE THE STUDENTS
THAT DID A LOT
OF THE HEAVY LIFTING THAT I SHOWED YOU. AND
FOR MORE INFORMATION ABOUT
ANY OF THESE PROJECTS, PLEASE CHECK OUT OUR
WEBSITE.
THANKS A LOT.
[APPLAUSE]
