I'M KAMIE ROBERTS, DIRECTOR OF THE NATIONAL
COORDINATION OFFICE FOR NITRD AND I HAVE HAD A
FABULOUS TIME OVER THE LAST
DAY AND A HALF WITH ALL THESE
PRESENTATIONS.
I WANT TO TAKE A MINUTE REAL
QUICK TO THANK SUSAN, IF SHE
IS STILL AROUND, AT NIH, FOR
HER --
[APPLAUSE]
>> WE ARE GRATEFUL FOR THAT.
I ALSO WANT TO THANK
THE ORGANIZING COMMITTEE.
AND ALL THE MODERATORS
WHO HAVE A BIG JOB TO DO AFTER
THIS BECAUSE THEY HAVE TO
WRITE OUT A ONE-PAGE REPORT ON
THEIR SPECIFIC AREA SO IT WILL BE
INTERESTING TO GET THOSE BACK.
I WANT TO THANK THE NIH TECH
SUPPORT, THEY HAVE BEEN
FABULOUS GETTING THE
PRESENTATIONS UP AND MAKING
SURE EVERYTHING IS WORKING AND
JACKIE YESTERDAY AND ADRIAN
TODAY WHO SHOWED UP TO DO THE
TECH SUPPORT AND EVERYTHING WE
NEEDED.
THANK YOU, EVERYBODY.
SO WE CAN ALL AGREE YESTERDAY
THERE WAS AN AMAZING
PRESENTATION AND DIVERSITY OF
VIEWS COMING OUT AND
INTERACTION BETWEEN EVERYBODY
WAS REALLY FABULOUS.
THERE IS EVIDENCE OF PROGRESS
TOWARD CONVERGENCE BUT STILL A LOT
OF WORK TO DO AND QUESTIONS AS
TO HOW MUCH CONVERGENCE WE NEED
TO HAVE, GIVEN SOME OF THE THINGS I
HEARD TODAY.
DOES THERE REALLY HAVE TO BE
CONVERGENCE OR IS IT SORT OF
AN OVERLAY OR HOW DOES IT
WORK?
THERE IS LOTS TO DO SO THIS
MORNING'S BREAKOUTS DELVED
INTO THIS A LITTLE DEEPER SO
WE WILL GO THROUGH AND ASK THE
MODERATORS TO GIVE US
INPUT.
AND AS PART OF MY JOB, WORKING WITH THE
OFFICE OF SCIENCE AND
TECHNOLOGY POLICY AT THE WHITE HOUSE,
THE THINGS YOU ARE COMING UP WITH
HERE, WHICH WE WILL DO THE
WORKSHOP REPORT ON, WILL BE
READ AT THAT LEVEL.
THE AGENCIES WITH HEC AND BIG
DATA, AND WE DO HAVE AN AI
INTERAGENCY WORKING GROUP
NOW, WILL HAVE AN
OPPORTUNITY TO READ THIS AS
WELL SO GIVE US GOOD THINGS.
SO I THINK WE WILL GET STARTED
WITH OPERATIONS, THAT IS GOOD.
>> WE OBVIOUSLY HAD A COUPLE
OF BUREAUCRATS ON OUR TEAM
BECAUSE WE CAPTURED OUR THOUGHTS IN
POWERPOINT.
SO THIS WAS A GREAT SESSION.
I REALLY ENJOYED THE PANELIST
PRESENTATIONS YESTERDAY AND
THE DISCUSSIONS WE HAD TODAY.
IT WAS REALLY INTERESTING SO
THANK YOU TO ALL OF THOSE WHO
PARTICIPATED IN THESE
DISCUSSIONS.
ONE THING THAT REALLY CAME OUT
TO ME WAS LOOKING THROUGH THE
LENS OF THE HIGH PERFORMANCE
COMPUTING CENTERS WAS AN
INTERESTING WAY TO LOOK AT THE
SUBJECT OF CONVERGENCE BECAUSE
IN SOME SENSE, THAT IS WHERE
THE RUBBER MEETS THE ROAD AND
THEY ARE SEEING CONVERGENCE
AND HAVING TO DEAL WITH IT
ALREADY.
DID A GREAT SERVICE TO
OUR DISCUSSIONS BY LISTING A
NUMBER OF CASES WE HAD BEEN
TALKING AROUND SO THESE
ARE ROUGHLY AS FOLLOWS.
FIRST OF ALL, THE USE CASE THAT
COMES FROM LARGE SCALE
EXPERIMENTS.
THE HUGE AMOUNT OF DATA
STREAMING FROM LARGE SCALE
EXPERIMENTS WHERE
HIGH PERFORMANCE COMPUTING IS NEEDED TO
DEAL WITH THE
DELUGE OF DATA COMING OFF OF
IT.
CASES WHERE MACHINE LEARNING
IS BEING DONE AND HAS FINALLY
REACHED A SCALE WHERE HIGH
PERFORMANCE COMPUTING IS NEEDED.
THEN THERE IS SIMULATION WHERE YOU
WANT TO EMBED SOME SORT OF
MACHINE LEARNING, OR MACHINE
LEARNING THAT IS DRIVING AN
ENSEMBLE OF SIMULATIONS, SO
MACHINE LEARNING IN A SIMULATION
OR SIMULATION CAMPAIGN.
BIG DATA SETS REQUIRING HIGH
PERFORMANCE COMPUTING.
SO THIS IS OFTENTIMES NOT DUE
TO THE SCALE OF THE COMPUTING
NEEDED BUT SCALE OF THE DATA
AND SO THERE ARE OTHER
FEATURES OF THE ARCHITECTURE
YOU MIGHT BE EXPLORING FOR
THIS BIG DATA ANALYSIS AND
THEN THE COLOCATION OF DATA
AND COMPUTE, SO THE IDEA THAT DATA MAY BE
SERVED AND YOU WANT HIGH PERFORMANCE
COMPUTATION ALONGSIDE THAT AND
COMPUTING CENTERS ARE THE BEST
PLACE TO DO THAT.
SCALABILITY OF TOOLS AND
CAPABILITY OF MACHINE LEARNING,
AI AND ANALYSIS SO CONVERGENCE
IN THE UNDERLYING SOFTWARE
STACK.
THE NEED FOR TRAINING OF NEW
USERS AND PEOPLE TO SUPPORT
THE NEW USERS AT THE HIGH
PERFORMANCE COMPUTING CENTERS.
INTERFACES AND ACCESS TO HIGH
PERFORMANCE COMPUTING SO
THERE WAS A LOT OF DISCUSSION
ABOUT HOW COMMAND LINES COULD
BE INTIMIDATING, THE BLINKING
CURSOR, FOR EXAMPLE, SO HAVING
MORE AVAILABLE FOR THE USERS, A
CONCEPT OF USING HIGH
PERFORMANCE COMPUTING. AND THEN
DATA CURATION CAME UP A LOT SO
NOT JUST ABOUT THE CYCLES BUT
TAKING CARE OF THE DATA THAT
IS DRIVING A LOT OF THE
SCIENCE.
AND THEN SOME THEMES AND
CHALLENGES, AND SORT OF THE
OVERARCHING THEME THAT CAME
OUT IS THE HIGH PERFORMANCE COMPUTING
CENTERS OF TOMORROW
WILL LOOK VERY DIFFERENT FROM
WHAT THEY LOOK LIKE TODAY AND
CONVERGENCE IS LOOKED AT FROM
A NUMBER OF DIFFERENT ANGLES:
DISTRIBUTED WORKFLOWS, TAKING
ON THE WORKFORCE FACILITY AND
STREAMING DATA FACILITY AND
HOW YOU MIGHT HAVE WORK FLOWS
WITH ANALYSIS GOING ON THE
EDGE THAT SPILLS OVER TO A
HIGH PERFORMANCE COMPUTING CENTER
AND THEN THE CONVERGENCE OF
MACHINE LEARNING AND
SIMULATION AND BIG DATA
ANALYSIS THAT IS HAPPENING
BASICALLY DRIVEN BY SIMULATION
DATA AND ANOTHER POINT IS THAT
THERE IS A NEED FOR MORE
COMPUTING AT THE EDGE BUT ALSO
SMARTER COMPUTING AT THE EDGE.
SO I GUESS I WOULD INVITE ANY
OF THE PARTICIPANTS TO LET ME
KNOW IF I HAVE MISSED ANYTHING
AND MAKE SURE IT GETS ADDED.
NEXT UP IS RANDY BRYANT.
>> I WOULD LIKE TO THANK
MIKAELA AND HER GROUP FOR THE
WORK ON THE SOFTWARE. IT IS ALL
EXTRACTED IN A GOOGLE
DOC SO I DON'T HAVE A
POWERPOINT.
I THINK ONE OF THE MORE
INTERESTING IDEAS THAT SEEMS
TO HAVE COME OUT IS THAT IT IS
NOT LIKE THERE IS ONE
BIG MACHINE.
THE REALITY IS WE WILL LIVE IN
AN ECOSYSTEM WHERE ALL NEEDS ARE
THERE. AND ALSO TO BRING IN, I
THINK, WHAT PETE BROUGHT IN
ABOUT EDGE COMPUTING: THAT
WHOLE MODEL OF BRINGING DATA
IN FROM VARIOUS SOURCES,
HAVING THEM GO THROUGH VARIOUS
LEVELS OF PROCESSING UNTIL
THEY REACH SOME MORE
CENTRALIZED FACILITY FOR
STORAGE AND ANALYSIS IS AN
IMPORTANT MODEL, AND WE REALLY
SHOULD THINK OF EDGE COMPUTING
AND ALL THE REST WORKING TOGETHER ON
DIFFERENT PARTS OF THE PROBLEM.
SO WITH THAT PERSPECTIVE, IT
GIVES YOU A PRETTY IMPORTANT
SET OF CHALLENGES THAT WE
NEED TO ADDRESS.
AND SO I THOUGHT ONE OF THE
REALLY INTERESTING IDEAS THAT
CAME OUT IS WITH HPC, WE HAVE
MPI AND MPI, WHAT IT GIVES YOU
IS THE ABILITY IN A SINGLE PLACE
TO SORT OF DESCRIBE THE ENTIRE
COMPUTATION.
SOMEWHAT FROM A TOP-DOWN POINT
OF VIEW MEANING THIS IS THE
OVERALL COMPUTATION, THIS IS
WHAT I AM TRYING TO DO, THIS
IS HOW IT IS PARTITIONED.
WHEREAS WITH MORE OF THE EDGE
COMPUTING STYLE OF WORK
CURRENTLY, IT IS A BOTTOM-UP
CONSTRUCTION WHERE YOU BUILD
THE SOFTWARE THAT WILL RUN ON
EDGE DEVICES, BUILD THE
SOFTWARE THAT WILL RUN ON THE
CORE, COBBLE THEM TOGETHER
WITH VARIOUS COMMUNICATION
PROTOCOLS AND IT BECOMES
BASICALLY A DISTRIBUTED SYSTEM
AND YOU HAVE TO WORRY ABOUT
FAULT TOLERANCE, BANDWIDTH
ISSUES AND YOU END UP WITH
THESE AD HOC BUT POTENTIALLY
HIGHLY ENGINEERED INTERFACES TO
GET IT TO WORK.
SO IS THERE AN MPI FOR THIS
MODEL, TOO, THAT GIVES YOU A
MORE TOP-DOWN PERSPECTIVE, THAT
LETS YOU WRITE CODE
PARTIALLY ON THE EDGE,
PARTIALLY IN THE CLOUD AND
PARTIALLY ON A LARGE MACHINE
AND GET THOSE TOGETHER TO
OPERATE AND DEAL WITH ISSUES
ABOUT PERFORMANCE, PERHAPS
ADJUST WHAT GOES ON IN WHICH
PLACE AND MAKE THAT MORE
STRAIGHTFORWARD?
I THOUGHT THERE WAS SOME
REALLY INTERESTING -- PART OF
THE GOAL WAS TO COME UP WITH
THE SOLICITATION THAT THE NSF
COULD PRODUCE.
THIS WOULD DEFINITELY BE THE
SOURCE OF AN INTERESTING
SOLICITATION SO I THOUGHT THOSE
WERE SOME VERY POWERFUL IDEAS.
THE OTHER INTERESTING THING I
THINK THAT CAME OUT WAS THE
WHOLE ISSUE OF PRODUCTIVITY.
RIGHT NOW THERE ARE THESE
VARIOUS FRAMEWORKS THAT HAVE
COME UP IN THE DIFFERENT
AREAS, ESPECIALLY IN THE DATA
ANALYSIS FRAMEWORK AND
SOMEWHAT MORE IN THE
SIMULATION PERSPECTIVE.
I GUESS THE GENERAL THINKING
IS THIS IS THE WAY THE WORLD
IS GOING TO WORK, THAT
SOFTWARE IS GOING TO BE MORE
KIND OF -- YOU WILL HIDE THE
COMPLEXITY OF MACHINES
UNDERNEATH THE FRAMEWORK AND
THAT IS VERY POWERFUL.
THE PROBLEM NOW IS IF I WANT
TO DO THESE CONVERGENCE KIND
OF THINGS WHERE I AM TRYING TO
MAKE BEST USE OF SOME AMOUNT OF
SIMULATION, SOME AMOUNT OF
MACHINE LEARNING AND THESE
DIFFERENT PARTS, NOW THE
FRAMEWORKS HAVE TO TALK TO
EACH OTHER AND THEY ARE NOT
REALLY DESIGNED FOR THAT RIGHT
NOW.
SO I THINK SORT OF HOW DO WE
DESCRIBE OR CREATE WORKFLOWS
THAT LET US CONNECT THESE
TOGETHER, MAKE SURE THAT THEY
HAVE APPROPRIATE APIS AND
INTERFACES THAT MAKE THIS
POSSIBLE, I THINK, IS ALSO
SORT OF A DIRECTION TO THINK
ABOUT FOR THE WHOLE ISSUE OF
SOFTWARE PRODUCTIVITY.
AND THEN FINALLY, I THINK AS
WE SAW HERE, I THINK THAT WE
NEED A BETTER UNDERSTANDING OF
USE CASES OR AT LEAST BETTER
COMPILATION OF WHAT ARE SOME
OF THE USE CASES THAT ARE
BEING LOOKED AT HERE BECAUSE I
FELT LIKE ESPECIALLY SOME OF
THE DISCUSSIONS YESTERDAY WERE
A LITTLE BIT, YOU KNOW, THE
DIFFERENT SIDE VIEWS OF AN
ELEPHANT, THAT EVERYONE HAD A
DIFFERENT THINKING ABOUT WHAT
KIND OF APPLICATIONS
THEY WERE REALLY TALKING ABOUT AND
GAVE A DIFFERENT PERSPECTIVE
AS A RESULT OF THAT.
THOSE ARE JUST SOME HIGH-LEVEL
THINGS BUT OF COURSE WE HAD
A LOT MORE
DISCUSSION THAN THAT.
>> SO THE HARDWARE GROUP HAD
SOME REALLY GREAT DISCUSSIONS
AND I WILL TRY TO HIT ON THE
COMMON SUGGESTIONS THAT
EMERGED BUT THOSE OF YOU WHO
WERE PRESENT, IF I FORGET
SOMETHING THAT YOU ARE
PARTICULARLY PASSIONATE ABOUT,
PLEASE REMIND ME.
SO I THINK THE -- THERE IS A
LOT IN COMMON WITH WHAT YOU
HAVE ALREADY HEARD, THERE
CERTAINLY IS A LOT OF
COMMONALITY THERE.
WE HAVE A LOT OF HYPOTHESES ABOUT
WHAT IT IS ABOUT AND THE
ACTION IS TO GATHER MORE DATA,
SO FOR THE HARDWARE AND OPERATING
SYSTEMS, GATHERING INFORMATION
ON WORKFLOWS TO BETTER
UNDERSTAND HOW THIS IS
CURRENTLY BEING USED WOULD BE
REALLY IMPORTANT.
ALSO I THINK AN OPPORTUNITY TO
START THINKING ABOUT SOME
COMMON END-TO-END BENCHMARKS,
KERNEL BENCHMARKS SO THEY ARE
LOOKING AT THE ENTIRE
PERFORMANCE OF WORKLOAD, NOT
NECESSARILY ANY SINGLE FIGURE
OF MERIT THAT YOU CAN USE TO
SAY THIS SYSTEM IS BETTER THAN
THIS SYSTEM, RATHER THAT THEY
BE USED TO HELP EXPOSE
BOTTLENECKS AND OTHER
LIMITATIONS IN THE HARDWARE
SYSTEMS.
THERE ARE A NUMBER OF AREAS
WHERE WE FEEL THAT THE CURRENT
HARDWARE IS LIMITED IN
ITS ABILITY TO SUPPORT BOTH HPC AND
BIG DATA OR AI WORKLOADS.
THESE INCLUDED THE
INTERCONNECT AND THIS IS
INTERCONNECT QUITE BROADLY,
SO THE NODE TO MEMORY, NODE TO
NODE, GENERALLY THE
FABRIC OF INTERCONNECTIONS.
HPC AND BIG DATA WORKLOADS
ARE MEMORY BOUND SO LOOKING AT
INNOVATIONS IN INTEGRATING
MEMORY IN PROCESSING WOULD BE
POTENTIALLY QUITE
TRANSFORMATIVE.
IT IS ALSO RECOGNIZED THAT WE
DON'T MAKE EFFECTIVE USE OF
THE SYSTEMS THAT WE HAVE.
THERE ARE A NUMBER OF REASONS
WHY THAT IS CHALLENGING.
CERTAINLY SOME OF THE SOFTWARE
ISSUES AT THE SAME TIME,
GATHERING ACTUAL INFORMATION
ABOUT HOW THINGS ARE
PERFORMING IS DIFFICULT.
SO ONE OF THE THINGS A LOT OF
US WOULD LIKE WOULD BE
PERFORMANCE COUNTERS BUT
PARTICULARLY FOR THE NETWORK
WHERE THAT SHARED DATA CAN BE
A SECURITY HOLE AS A SIDE
CHANNEL AND SO ONE SUGGESTION
WOULD BE TO ADDRESS THIS
PROBLEM AND FIGURE OUT WHAT IS
THE ACTUAL INFORMATION YOU
WOULD LIKE TO HAVE AS A
PERFORMANCE ENGINEER AT THAT
LEVEL AND HOW TO DELIVER IT IN
A WAY THAT DOES NOT COMPROMISE
YOUR SECURITY.
WE DON'T HAVE A SOLUTION FOR
THAT RIGHT NOW.
AND THEN FILE SYSTEMS.
THIS ALSO, AS WITH MANY OF
THESE THINGS, CROSSES PARTLY
INTO THE SOFTWARE BUT THE
REALITY IS THAT THE FILE
SYSTEMS HPC SYSTEMS TEND TO
RELY ON ARE INAPPROPRIATE FOR
HPC AS WELL AS BIG DATA AND
THERE HAS BEEN A LOT OF
PROGRESS IN DATA STORE
SYSTEMS, GENERALLY IN THE
BIG DATA AREA, AND TAKING
ADVANTAGE OF THAT, FINDING
WAYS TO MOVE FORWARD FROM AN
HPC SPACE BUT ALSO EXPLOITING
SOME OF THE THINGS THAT ARE
DONE IN BOTH OF THOSE WOULD BE
VERY BENEFICIAL IN BOTH HPC
AND BIG DATA.
THE OTHER THING THAT CAME UP
WAS THE ENGINEERING.
THAT WORKING PERFORMANCE,
THERE WAS DISAGREEMENT BUT THE
VIEW IN THE BREAKOUT
WAS IF YOU ARE
WILLING TO SPEND THE MONEY AND
THE TIME, YOU CAN MAKE IT
WORK.
HOW DO YOU MAKE THE SYSTEMS
MORE PERFORMANCE-ROBUST
WITHOUT A LOT OF MANUAL
INTERVENTION AND THAT I THINK
IS AN OPEN QUESTION.
WE ALSO TALKED ABOUT POWER
EFFICIENCY, ACKNOWLEDGING THIS
IS A CHALLENGE IN THE
COMMERCIAL WORLD AND THEY WILL
BE ADDRESSING IT BUT WE
BELIEVE THEY WILL PROBABLY
ADDRESS IT IN A MORE
EVOLUTIONARY WAY SO THERE ARE
OPPORTUNITIES TO TAKE WHAT WE
MIGHT CALL A MORE
DISCONTINUOUS APPROACH AND
EXPLORE WHAT THAT MEANS.
AND THEN FINALLY, AGAIN, WE
EXPECT A RICH SPACE OF
HARDWARE.
WE DON'T THINK THERE WILL BE A
SINGLE SYSTEM THAT WILL DO
EVERYTHING BUT THE SYSTEMS
WILL NEED TO WORK TOGETHER
BECAUSE THE WORK LOADS TEND TO
BE QUITE BROAD AND WE NOTED
THAT SYSTEMS ARE
REALLY DESIGNED FOR CERTAIN
DATA STRUCTURES AND METHODS SO
LOOKING AT THEM IN THOSE
TERMS, RATHER THAN SAYING
THIS IS A BIG AI SYSTEM, IS A
MORE EFFECTIVE WAY TO THINK
ABOUT THE CONFIGURATION OF THE
HARDWARE SO IT IS OPTIMIZED.
OKAY, SO FOR MY PARTNERS IN
THE BREAKOUT, ANYTHING ELSE
YOU WOULD LIKE ME TO ADD?
OKAY.
>> NOW WE'LL TAKE A QUICK
BREAK BEFORE TONY AND JUST ASK
IF THERE IS ANYTHING THE
AUDIENCE WANTED TO HEAR OR
DIDN'T HEAR OR SOMETHING YOU
HEARD AND THOUGHT OH, YEAH,
BUT THERE IS ALSO THIS THAT
MAY NOT HAVE COME OUT IN THE
BREAKOUT GROUP.
ANYBODY?
>> SO TALKING ABOUT
NETWORKING, WE FACE THAT ALL
THE TIME.
WE HAVE OPTICAL PILOTS AT --
OSHA.
WE HAVE MULTI-TIER LEVELS
OPERATING AT THE SAME TIME,
RIGHT?
SO I THINK -- AND WE CREATED
SOMETHING CALLED THE AMAZON
LINK FIBER LINK SO ACROSS THE
LINK, THE WORLD ACTUALLY, YOU
CAN SEE THAT.
SO I SEE 
A BAD CONNECT WITH
MAJOR PROVIDERS AND OTHER
VARIATIONS SO I THINK THERE IS
A WAY THAT THESE KINDS AS YOU
MENTIONED CAN WORK.
[ OFF MIC ]
>> YES, SO THE WAY WE TRIED TO
ADDRESS THAT, THERE IS
SOMETHING CALLED BIG MAC WHERE
YOU BREAK OUT IP, PCB AND
STORE THE BITS OF INFORMATION
ITSELF AND WE CAN DISCUSS MORE
OF THE DETAILS -- SO NO LONGER
THE QUANTITY, YOU CALL IT
FUNDAMENTALS.
SO WE SHOULD TALK ABOUT --
[ OFF MIC ]
[OVERLAPPING SPEAKERS]
>> YEAH, ANOTHER CONNECTION
MADE.
ANYBODY ELSE HAVE ANYTHING
THEY WANT TO --
[ OFF MIC ]
>> -- MENTIONED ABOUT TOOLS TO
HELP WITH HPC, INTERFACES THAT
WOULD MAKE IT EASIER TO USE
THESE SYSTEMS.
I THINK THAT WAS WHAT YOU WERE
GETTING AT, RIGHT?
THERE ARE SOME TOOLS THAT ARE
AROUND AND AVAILABLE AND WE
CAN TALK ABOUT IT OFFLINE.
>> ANYBODY ELSE?
OKAY, I AM GOING TO WELCOME
TONY UP TO GIVE US HIS WISDOM.
>> RIGHT, YEAH, YOU KNOW, IT HAS
BEEN AN INTERESTING MEETING
WITH LOTS OF PEOPLE KNOWING
A LOT OF STUFF AND FROM THE
FIRST TALKS, YOU KNOW, IN THE
FIRST SESSION ON THE SCIENCE
AND JEFF SNYDER'S FASCINATING
TALK WITH UBER AND
SELF-DRIVING VEHICLES THAT
LOOK PRETTY INTIMIDATING TO ME
AND I DON'T THINK I WILL BE
GETTING INTO A SELF-DRIVING
VEHICLE WITHOUT A SPARE DRIVER
FOR SOME TIME.
AND THEN WE HAD PEOPLE LOOKING
AT THE SOFTWARE CHALLENGES AND
I THOUGHT THAT WAS EXTREMELY
INTERESTING, TOO, JUST TO PICK
OUT TWO.
THERE WAS TALK ABOUT THE AI
INTELLIGENT CLOUD IN THE
MIDDLE BUT IT WAS CLEAR THAT
HPC DID NOT FIGURE ON HIS
DATABASE AT ALL AND AS WE
UNDERSTAND SCIENTIFIC
COMPUTING, THAT WAS NOWHERE TO
BE SEEN AND THAT WAS
ILLUSTRATED I THINK BY FRED'S
TALK THAT GAVE A VERY NICE
EXAMPLE OF WHAT WE ARE DOING
IN SCIENTIFIC COMPUTING ON
SYSTEMS YOU COULDN'T DO ON THE
COMMERCIAL CLOUD WITH THE
SOFTWARE STACKS THEY SUPPORT
SO I THINK THERE IS A
DISTINCTION TO BE MADE AND
THAT WAS EXTREMELY INTERESTING.
AND THEN THE HARDWARE
CHALLENGES AND OPPORTUNITIES,
LOTS OF PEOPLE WITH EXPERIENCE
ON OPERATING THESE SYSTEMS WHO
KNOW ABOUT THE BOTTLENECKS,
THAT WAS AGAIN AN INTERESTING
SESSION.
AND THEN FINALLY THE
CHALLENGES IN MODES OF
OPERATION AND STUFF LIKE
THAT.
SO WE HAVE HEARD ABOUT THOSE.
SO I JUST WANT TO MAKE A FEW
POINTS THEN IN TRYING TO SUM
UP MY IMPRESSIONS.
FIRSTLY, I THINK WE ARE SEEING
SOME CONVERGENCE OF HARDWARE
FOR HPC, MACHINE LEARNING AND
BIG DATA.
TO TAKE TWO EXAMPLES, ONE IS
RICK STEVENS, WHO WASN'T HERE.
HE IS COLLABORATING WITH NIH
ON CANCER PROJECTS AND HE HAS
THESE THINGS CALLED CANDLE
BENCHMARKS WHICH COVER THREE
DIFFERENT AREAS OF THE CANCER
CHALLENGE AND THEY RUN THOSE
ON THE BIG MACHINES AT ARGONNE
AND NOW OAK RIDGE AND ARE
LOOKING FOR WHAT THESE THINGS
CAN GIVE THEM AT
EXASCALE SO I THINK THEY SHOW
US THEY CAN PERFORM ON A
VARIETY OF HPC AND CHALLENGING
LIMIT AND
TO ME IT IS IMPRESSIVE
YOU CAN USE THE SAME
ARCHITECTURE AND ACTUALLY HAVE
IT BE FUNCTIONAL.
MY SECOND
POINT: I DON'T THINK HPC AND
SCIENTIFIC COMPUTING
ARE INTERESTING TO I.T.
COMPANIES.
YES, IT IS A SMALL PART OF
THEIR BUDGET AND YES, AMAZON
AND MICROSOFT AND GOOGLE DO
SOME RESEARCH BUT IN TERMS OF
THE SKIN IN THE GAME, IT IS A
VERY TINY PORTION SO THAT WAS
SORT OF WHAT I THINK WAS THE
DISCONNECT BETWEEN THE SESSION
AND THIS AGENDA WE HAVE
BECAUSE WE CARE ABOUT
SCIENTIFIC COMPUTING, THE
SYSTEMS YOU CAN'T EASILY
REPLICATE ANYWHERE ELSE.
THIRDLY, I LIKE VERY MUCH
PETE BECKMAN'S VISION AND I
HAVE HEARD IT THREE TIMES AND
MAYBE AM GETTING BRAINWASHED
ABOUT CLOUD HPC AND MAYBE I
DON'T UNDERSTAND IT BUT
KIRSTEN WAS TALKING ABOUT
SENDING PART OF OUR
INSTRUMENTS TO THE CLOUD OR
THE HPC CENTER AND I THINK YOU
NEED TO HAVE AN UNDERSTANDING OF
WHAT YOU CAN DO ON THE EDGE,
WHAT YOU CAN DO IN THE MIDDLE
WHICH PETE ON HIS SLIDE CALLED
THE FOG AND THEN IN THE HPC
CENTER.
SO I THINK WE HAVE A VISION
THAT ONE OF THE CHALLENGES FOR
THE AGENCY IS TO THINK ABOUT
THE FRAMEWORK WHERE YOU CAN
SUPPORT SOFTWARE
INFRASTRUCTURE AND SERVICES
THAT ENCOMPASS GOING FROM THE
EDGE TO CLOUD AND TO THE HPC
CENTER AND I THINK THAT'S A
REAL CHALLENGE AND A POSSIBLE
THEME FOR FUNDING AGENCIES.
NETWORKING, I MEAN, JUST TO
PICK UP BILL'S POINT, THIS IS
NOT NETWORKING WITHIN THE
SYSTEM, THIS IS NETWORKING
BETWEEN SITES FOR BIG DATA
TRANSFERS.
WE ROUTINELY TRANSFER
PETABYTES OF DATA FROM MY
CENTER, DONE RELATIVELY
QUICKLY AND USING WHAT WE
DIDN'T CALL THE SCIENCE DMZ, TO
USE THE ESNET TERM, PARTLY
BECAUSE THEY HAVE CREATED
THEIR OWN AND CALLED IT THE
FIREWALL BYPASS.
[LAUGHTER]
>> THIS DIDN'T SEEM TO ME VERY
WISE SO WE ARE NOW CALLING IT
RESEARCH DATA TRANSFER ZONE
AND I WOULD DISTINGUISH:
THE U.K. HAS VERY HIGH
BANDWIDTH BACKBONES BUT IT IS
REALLY FOR GENERAL USE FOR
UNIVERSITIES, THE NRENS AND
THINGS LIKE THIS.
IT HAS NOT GOT THE FOCUS ON
RESEARCH COMPUTING END-TO-END
AND I THINK IT IS POSSIBLE BUT
YOU NEED EASY TOOLS TO EXPLORE
THE BOTTLENECKS.
I THINK IT IS GREAT THAT NSF,
FOR EXAMPLE, HAS FUNDED
EXPLORING DMZ'S THROUGH THE
UNIVERSITIES BUT THERE IS
STILL A LOT OF IGNORANCE OUT
THERE AND I SEE STUDENTS GOING
BACK TO THE UNIVERSITIES WITH
ARMFULS OF TERABYTE DISKS IN
SORT OF THE SITUATION OF
SNEAKERNET.
SO THERE IS A LOT TO BE DONE
IN HAVING EASILY DEPLOYABLE
TOOLS TO FIND THE BOTTLENECK.
MAYBE YOUR SERVICE IS ALL
WRONGLY CONFIGURED, MAYBE IT
IS MICROSOFT'S FAULT AND THAT
IS POSSIBLE BUT YOU NEED TO
MAKE IT EASILY ABLE TO BE
FOUND OUT AND I THINK YOU CAN
ENCOURAGE WORK IN THE OTHER
AREA.
THE OTHER THING I CARE ABOUT
THAT WAS MENTIONED IS NIH AND
OTHER CENTRAL ORGANIZATIONS
REQUIRING RESEARCHERS TO
PUBLISH FAIR DATA.
THAT IS FINDABLE, ACCESSIBLE,
INTEROPERABLE AND REUSABLE AND
EXACTLY WHAT THAT MEANS AND
HOW YOU IMPLEMENT IT IS RATHER
IMPORTANT AND THE FAIR GROUP
TALK ABOUT MACHINE ACTIONABLE
METADATA AND HOW YOU
IMPLEMENT THAT AND PUT IN SOME
SEMANTICS.
WHEN I WAS AT MICROSOFT, THE
ONLY THING MICROSOFT AND GOOGLE
AGREED ON WAS A THING CALLED
SCHEMA.ORG, A WAY THAT YOU
CAN PUT A LITTLE SEMANTIC
INFORMATION IN A WEBSITE SO IF
YOU ARE LOOKING FOR
CASABLANCA, THE SEARCH ENGINE WILL
KNOW WHETHER THIS IS THE TOWN OR I
AM SEARCHING FOR THE MOVIE.
SO YOU CAN PUT A LITTLE
SEMANTIC INFORMATION.
AND SCHEMA.ORG SEEMS A WAY TO
ENCOURAGE COMMUNITIES TO ADD
THEIR CATEGORIES TO
SCHEMA.ORG AND DO IT THAT
WAY.
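As a rough illustration of the idea just described, the Python sketch below builds schema.org JSON-LD for the two meanings of "Casablanca" and wraps each in the script tag a web page would carry. The helper names (`movie_jsonld`, `city_jsonld`, `as_script_tag`) are hypothetical and chosen only for this example. `Movie`, `City`, `Country`, `name`, `datePublished` and `containedInPlace` are existing schema.org terms, but how any particular search engine uses them is an assumption, not something stated in the talk.

```python
import json

SCHEMA_CONTEXT = "https://schema.org"


def movie_jsonld(name, year):
    """Describe a film using the schema.org Movie type (hypothetical helper)."""
    return {
        "@context": SCHEMA_CONTEXT,
        "@type": "Movie",
        "name": name,
        "datePublished": str(year),
    }


def city_jsonld(name, country):
    """Describe a town using the schema.org City type (hypothetical helper)."""
    return {
        "@context": SCHEMA_CONTEXT,
        "@type": "City",
        "name": name,
        "containedInPlace": {"@type": "Country", "name": country},
    }


def as_script_tag(doc):
    """Wrap a JSON-LD document in the script element used to embed it in HTML."""
    return '<script type="application/ld+json">' + json.dumps(doc) + "</script>"


if __name__ == "__main__":
    # Same name, two different things: the "@type" field is what tells them apart.
    print(as_script_tag(movie_jsonld("Casablanca", 1942)))
    print(as_script_tag(city_jsonld("Casablanca", "Morocco")))
```

Both documents share the same `name`; only the `@type` value distinguishes the film from the town, which is the small piece of semantic information the remark refers to.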
THAT IS SOMETHING THAT CAN BE
EXPLORED ALTHOUGH IT MAY NOT
BE THE BEST WAY BUT IT IS
SOMETHING THAT IS ALLOWABLE IN
I.T.
AREAS.
DATA IS AN IMPORTANT AREA AND
REQUIRES A WAY OF LOOKING AT.
MY SIXTH POINT: THERE IS REUSABILITY
IN FAIR BUT THERE IS ANOTHER
R WHICH IS REPRODUCIBILITY AND
THAT IS ALSO IMPORTANT AND
WHEN YOU DO COMPUTATIONAL
SCIENCE, IT IS A LITTLE MORE
COMPLICATED BECAUSE YOU CAN
ATTACK A SCIENTIFIC PROBLEM
USING THIS ALGORITHM OR THAT
ALGORITHM, THEY ARE SUBTLE
BUT YOU NEED ACCESS TO THE
SOFTWARE SO HAVING THE
SOFTWARE LINKED TO YOUR DATA
IS IMPORTANT.
HOW MANY PEOPLE HAVE HEARD OF
OSTI?
FOR THOSE WHO HAVEN'T, IT WAS
SET UP IN 1947 IN RESPONSE TO
THE THEN-PRESIDENT ROOSEVELT
WHO RECOGNIZED ALL THE SCIENCE
THAT HAD BEEN DONE IN THE
SECOND WORLD WAR, NOT JUST OF
THE MANHATTAN PROJECT BUT
OTHER STUFF IN RADAR AND
EVERYTHING ELSE AND HE WANTED
JOBS AND BUSINESSES CREATED
FOR RETURNING SOLDIERS TO THE
POPULATION SO HE ASKED BUSH
WHO SET UP SOMETHING AND
GENERAL GROVES WHO DID THE
SECURITY FOR THE MANHATTAN
PROJECT, NOT SURE HE DID IT
VERY WELL BECAUSE STALIN AND
BERIA HAD TWO COPIES OF THE
PLANS BUT SETTING UP THE
TECHNICAL INFORMATION WHICH
WAS TO DISTRIBUTE THE RESULT
IN AN OPEN WAY OF ALL FUNDED
RESEARCH PROJECTS THAT WERE
NOT CLASSIFIED AND I THINK
AGENCIES LIKE THAT CAN HAVE A
ROLE IN FAIR DATA AND ALSO THE
SOFTWARE AND REPRODUCIBILITY.
SO I DON'T KNOW QUITE HOW THAT
TRANSLATES INTO AN ACTION BUT
THAT IS SOMETHING FOR THE
AGENCY TO THINK ABOUT.
THE LAST THING I WOULD LIKE TO
COMMENT ON IS ABOUT DATA
SCIENCE EDUCATION.
I MEAN, I THINK IT IS
IMPORTANT.
WE HAD A DISCUSSION ABOUT
WHETHER YOU SHOULD TEACH
OPERATING SYSTEMS IN A
COMPUTER SCIENCE COURSE
NOWADAYS.
SEEMS TO ME INCREDIBLE YOU
WOULDN'T BUT I CAN SEE THAT
WRITING A KERNEL OPERATING
SYSTEM IS ACTUALLY A RATHER
SPECIALIST SKILL SO MAYBE WE
NEED TO RETHINK ALL OF THOSE.
BUT IN DATA SCIENCE, I WOULD
DISTINGUISH AT LEAST THREE
DIFFERENT ROLES.
THERE IS THE DATA ENGINEER,
THAT IS THE PERSON WHO GETS THE
DATA FROM THE SATELLITE AND
DOES ALL THE SORT OF
CALIBRATION AND PUTTING THE
PATCHES TOGETHER TO BUILD UP A
DATA SET WHICH YOU CAN THEN
HAND OVER TO THE DATA
ANALYSTS.
SO THE DATA ENGINEER IS THE
PERSON WHO HAS THE SKILLS TO
GO AND GET THE DATA FROM THE
INSTRUMENTS, WHETHER IT IS A
SATELLITE OR A NEUTRON SOURCE
AND PUT IT INTO A FORM THAT
SCIENTISTS CAN BEGIN TO USE.
THE DATA ANALYSTS ARE PEOPLE WHO
CAN ACTUALLY GET RESULTS AND
NEW SCIENCE OUT OF THAT AND I
WOULD DISTINGUISH TWO TYPES.
ONE IS LIKE APPLIED MACHINE
LEARNING IN AI AND THAT IS
ACTUALLY WHAT I AM TRYING TO
SET UP AT MY GROUP IN THE UK
AT THE LAB.
WE WOULD LIKE TO FIND OUT HOW
THE EXISTING MACHINE LEARNING
ALGORITHMS WORK ON THE VARIOUS
ASPECTS OF DATA AND YOU WILL
FIND THERE ARE SOME GAPS WHERE
SOME WORK WELL, SOME DON'T
WORK SO WELL AND YOU WILL
FIND, I BELIEVE, AN AGENDA
INTO DOING RESEARCH INTO AI
WHICH IS A DIFFERENT ROLE AND
RATHER DIFFERENT FROM THE EXISTING ONE
AND THE QUESTION IS CAN YOU
MAKE THE TOOLS USABLE FOR
ORDINARY MORTALS WHERE THEY
WON'T MAKE FOOLISH MISTAKES
ABOUT APPLYING THESE METHODS
OR DEEP LEARNING TO THEIR DATA
AND I THINK THOSE ARE
INTERESTING CHALLENGES AND
EXACTLY HOW YOU EXPLORE THOSE
IN AN AGENCY
I AM NOT QUITE SURE, BUT DATA
SKILLS ARE WHERE WE NEED MORE
PEOPLE IN THE U.S. AND
ELSEWHERE TO FILL THE NEED FOR
MANAGING THESE HUGE AMOUNTS OF
DATA COMING IN AND THE GOOD
THING FROM MY PERSPECTIVE IS
YOU CAN TEACH PEOPLE IN THE
UNIVERSITY CONTEXT AND THEY
CAN HAVE AN ACADEMIC CAREER
AND THEY ARE ALSO EMPLOYABLE
IN THE WIDER POPULATION.
SO A WIDE RANGING CONFERENCE
AND GIVING ME LOTS TO THINK
ABOUT SO THANK YOU TO THE
ORGANIZERS.
THANK YOU.
KAMIE.
>> OKAY, WE HAVE TWO MINUTES.
ANY OTHER COMMENTS, QUESTIONS?
ANY QUESTIONS, COMMENTS?
PETER?
>> YES, I 
FIND MYSELF THINKING
IF SOMEONE GAVE YOU TEN
SECONDS, WHAT WOULD YOU SAY AS
A THE POTENT PULL, WHAT IS THE
PROBLEM ON THE INTERFACE
BETWEEN TRADITIONAL HEC AND
BIG DATA, IF YOU WANT TO CALL
IT THAT?
FOR EXAMPLE, OUR WORKING GROUP
I POSED IN THE HARDWARE AND
YOU CAN HAVE SPECIAL HARDWARE
LIKE YOU HAVE A BUS TO
TRANSPORT PEOPLE AND AN
AUTOMOBILE FOR ONE PERSON, NO
ONE QUESTIONS THAT SO MAYBE WE
DON'T HAVE A PROBLEM AT THE
INTERFACE.
BUT ANYWAY, IF ANYBODY HAS
SOME SORT OF SUMMARIZING
COMMENT ON THAT?
>> IN THE HARDWARE SESSION
THERE WAS TALK OF NEW CHIPS
COMING OUT AND I THINK YOU
WILL SEE THAT DEPLOYED AS MUCH
AT THE EDGES AS IN THE CENTER
AND I THINK THE INTERESTING
CHALLENGE IS FINDING OUT WHAT
YOU CAN DO AND SHOULD DO AT
THE EDGE RATHER THAN TRANSFERRING ALL THE
DATA.
SO I THINK THE EDGE IS A VERY
INTERESTING AREA TO EXPLORE.
>> OKAY, ANYBODY ELSE?
>> I HAVE A TAKE ON THAT, TOO.
IT IS ALL POSSIBLE BUT REALLY
HARD RIGHT NOW, ESPECIALLY THE
INTEGRATION OF THE EDGE AND
CONNECTED TO THE CLOUD.
SO I THINK IT SORT OF COMES
DOWN TO A PRODUCTIVITY
PROBLEM, WHICH IS CAN WE MAKE
THIS SO IT IS MORE EASILY DONE
AND THEREFORE WE CAN ENABLE
MORE PEOPLE TO DO IT FOR MORE
APPLICATIONS?
>> AND ALSO, YOU KNOW, PEOPLE
HAVE DEMONSTRATED THESE BIG
MACHINES CAN DO SOME PRETTY
WICKED MACHINE LEARNING
APPLICATIONS.
IT IS THE OLD YOU CAN CRUSH A
LOT OF FLIES WITH A HAMMER,
YOU DON'T NEED FLY SWATTERS
ANYMORE.
BUT YOU KNOW THE QUESTION
BECOMES SORT OF A -- AGAIN, AN
ECONOMIC MODEL, CAN WE DELIVER
THESE CAPABILITIES AT A LOWER
COST THAN THE BIG IRON
MACHINES COST?
>> OKAY, OH, WE HAVE ONE MORE?
>> THANK YOU.
SO IN CASE ANYONE IS
INTERESTED, WE JUST ABOUT AN
HOUR AGO ANNOUNCED OUR NEXT
SYSTEM AT NERSC WHICH WILL BE
CALLED PERLMUTTER AND IT IS A
SYSTEM THAT IS EXPLICITLY
DESIGNED TO HANDLE BIG DATA
WORK LOADS.
WE HAVE SOME TECHNICAL
INNOVATIONS THAT WE HAVE
EMPLOYED FOR THESE WORK LOADS
THAT I WAS NOT ALLOWED TO TALK
ABOUT BEFORE.
NOW THAT WE CAN TALK ABOUT IT
PUBLICLY, I CAN TALK TO YOU
ABOUT IT BUT I AM COGNIZANT
THAT IT IS MIDDAY AND I DON'T
WANT TO TAKE UP THE TIME IN
THIS MEETING.
>> CONGRATULATIONS.
>> GOOD TIMING.
OKAY, WELL THANKS VERY MUCH,
WE SHOULD HAVE THE WORKSHOP REPORT
OUT IN ABOUT A MONTH AFTER THE
MODERATORS HAVE THEIR PAPERS
TOGETHER AND WE SYNTHESIZE AND
DECIDE ON HOW THE AGENCY WILL
SUPPORT THAT.
THANK YOU VERY MUCH, I
APPRECIATE IT AND HAVE A GOOD
REST OF YOUR DAY.
