>> THANK YOU EVERYONE. GOOD MORNING. MY NAME
IS BISHON SINGH.
THANK YOU ALL FOR JOINING THE THIRD DATA SCIENCE
COLAB DEMO DAY.
WHEN WE LAUNCHED THIS ROUND OF COLAB, WE NEVER
THOUGHT WE WOULD HAVE TO HOST A DEMO DAY
VIRTUALLY. HOWEVER, I'M PLEASED TO SHARE WE
HAVE A RECORD AUDIENCE, WHICH THE NEXT SPEAKER
WILL TELL YOU ABOUT.
I GUESS I OWE COVID-19 A SMALL THANKS.
WITH THAT SAID, I HOPE YOU ARE STAYING HEALTHY
FROM THE SAFETY OF YOUR HOMES.
I'VE HAD THE PLEASURE OF BEING THE COLAB
CO-DIRECTOR FOR THIS PAST ROUND.
HAVING BEEN A GRADUATE IN THE FIRST COHORT
A FEW YEARS AGO, COLAB IS NEAR AND DEAR TO
ME SO I'M EXCITED FOR YOU TO SEE THE NEXT
ADDITION TO THE HHS PORTFOLIO OF DATA SCIENCE TOOLS,
WHICH OUR PARTICIPANTS BUILT DURING THEIR
EIGHT WEEKS OF THE COLAB BOOTCAMP.
A COUPLE QUICK HOUSEKEEPING NOTES BEFORE WE
GET STARTED.
PRESENTERS, UNMUTE YOURSELVES WHEN IT'S YOUR
TURN TO PRESENT AND MY COLLEAGUE, RACHEL MELLOW,
WILL BE KEEPING TRACK OF YOUR TIME SO YOU
WILL HEAR FROM HER BEFORE YOU BEGIN TO MAKE
SURE YOU ARE GOOD TO GO AND ALSO, IF YOU RUN
OVER TIME.
AUDIENCE MEMBERS, THIS EVENT IS BEING RECORDED
AND LIVE STREAMED.
IF YOU HAVE ANY ISSUES WITH THE WEBEX, SWITCH
OVER TO THE LIVESTREAM AT THE HHS LIVESTREAM
CHANNEL.
THE LINK SHOULD BE IN YOUR E-MAIL IF YOU REGISTERED
FOR THE EVENT.
I SENT IT OUT YESTERDAY.
AFTER THE MEETING, A RECORDING WILL BE AVAILABLE
ON THE HHS YOUTUBE CHANNEL SO FEEL FREE TO SHARE
AND WATCH AGAIN.
IF YOU HAVE ANY QUESTIONS, ASK THEM IN THE
WEBEX Q&A AND THEY CAN RESPOND TO YOU DIRECTLY
AND OTHERS CAN ALSO SEE YOUR Q&A, WHICH MAY
BE INFORMATIVE.
FINALLY, LIVE TWEET WITH US AT HHH AND USE
THE HASHTAG COLAB DEMO DAY.
I'LL TURN IT OVER TO SANJAY, THE EXECUTIVE
DIRECTOR OF INNOVATION IN THE OFFICE OF THE CTO,
WHO IS GOING TO COVER THE REIMAGINE HHS DATA
INITIATIVES AND WHERE THE DATA SCIENCE COLAB FITS.
>> SANJAY, GO RIGHT AHEAD.
>> GREAT, THANK YOU, CAN YOU HEAR ME OK?
>> GOOD MORNING AND THANK YOU FOR JOINING
US FOR OUR DEMO DAY. I'M THE EXECUTIVE
DIRECTOR AND I WANT TO WELCOME COLLEAGUES
FROM HHS, OTHER FEDERAL AGENCIES, AND OTHER
INTERESTED PARTNERS FROM ACROSS THE COUNTRY.
FROM WHAT I UNDERSTAND, THIS IS OUR LARGEST
ATTENDANCE YET.
I THINK YOU WERE TELLING ME IT'S JUST SHORT
OF 2,000 PEOPLE REGISTERED.
SO THIS IS REALLY TERRIFIC TO HAVE SO MUCH
INTEREST IN THE WORK WE'RE DOING AND SOME
OF THE RESULTS FROM THE DEMO DAY.
I WANT TO ALIGN COLAB WITH HHS' LARGER REIMAGINE
DATA SHARING STRATEGIES SO I'M GOING TO GO
THROUGH A FEW SLIDES THERE.
IT'S BEEN A COLLABORATIVE PARTNERSHIP WITH
OUR ASSISTANT SECRETARY FOR ADMINISTRATION'S
OFFICE OF BUSINESS MANAGEMENT AND TRANSFORMATION,
A KEY PLAYER IN THAT, AND I
REALLY WANT TO RECOGNIZE THEIR EXCEPTIONAL WORK
AND ALL THE STUFF THEY'VE DONE BEHIND THE
SCENES TO MAKE THIS STUFF HAPPEN.
>> THANK YOU.
SO REIMAGINE HHS IS A ROBUST TRANSFORMATION
EFFORT TO IMPROVE HOW THE DEPARTMENT OPERATES
AND IMPROVE ITS EFFICIENCY TO SERVE THE AMERICAN
PUBLIC.
ONE OF ITS 10 INITIATIVES IS CALLED THE DATA INSIGHTS
INITIATIVE AND IT FOCUSES ON IMPROVING HOW
WE DO DATA SHARING ACROSS THE DEPARTMENT.
THIS IS JUST A GENERAL FRAMING: TO IMPROVE
HOW WE AT HHS AND BEYOND SHARE, INTEGRATE
AND FEDERATE DATA TO BETTER INFORM POLICY-MAKING
AND OUR EVIDENCE-BASED DECISION-MAKING.
THE REASON WHY WE REALLY EMPHASIZE FEDERATE
IS THAT WE HAVE INCREDIBLE PROGRAMS AT HHS FROM
FDA TO CMS TO CDC AND THEY'VE ALL GOT THEIR
DATA SYSTEMS AND THEY DRIVE THE WORK THEY'RE
DOING, BUT WE'RE TRYING TO CREATE A BRIDGE
AND LINKAGES BETWEEN THAT DATA SO THAT WHEN
YOU LOOK AT OPIOIDS AND OTHER HEALTH AND HUMAN
SERVICE ISSUES, WE CAN BRING THAT DATA TOGETHER
IN A LIGHTWEIGHT WAY TO REALLY HELP ANSWER
QUESTIONS THAT YOU CAN'T REALLY ANSWER INDEPENDENTLY.
THIS REPRESENTS SOME OF THE WORK WE'VE DONE
TO GET TO THIS POINT.
WHEN WE STARTED THIS, WE DID IT WITH DISCOVERY.
IN 2018, WE INTERVIEWED HHS STAKEHOLDERS AND
DEVELOPED A REPORT ON DATA SHARING AND OPPORTUNITIES
FOR IMPROVEMENT.
YOU CAN FIND IT ON OUR HHS CTO WEBSITE, AND IT
INFORMED HOW WE IMPLEMENTED OUR DATA SHARING STRATEGY.
WHERE WE ARE RIGHT NOW IS OUR PROOF
OF CONCEPT PHASE, WHERE WE'RE
BUILDING OUT AN ENTERPRISE-WIDE DATA SHARING
PLATFORM AND THE APPROPRIATE GOVERNANCE AND
OTHER SYSTEMS IN PLACE THAT ENABLE US TO DO
THAT DATA SHARING.
IF YOU GO TO THE NEXT SLIDE, I WILL HIGHLIGHT
THE KEY COMPONENTS OF OUR WORK.
IT HAS FOUR KEY PARTS.
IF YOU LOOK AT THE FAR-RIGHT, WE'RE DEVELOPING
A CLOUD-BASED COLLABORATIVE ENVIRONMENT FOR
DATA SHARING AT HHS SO HEALTHDATA.GOV IS OUR
REFERENCE CATALOG FOR ALL THE PUBLICLY AVAILABLE DATA
WE HAVE.
WE'RE CREATING SOMETHING INTERNALLY THAT ENABLES
US TO LOOK AT THE NON PUBLICLY AVAILABLE DATA
TO ANSWER IMPORTANT QUESTIONS.
IF YOU GO UP TO THE TOP, THIS IS DRIVEN BY
HEALTH AND HUMAN SERVICE ISSUES TO HELP THE
PLATFORM BUILD OUT AND DEMONSTRATE THE IMPACT
OF THE SYSTEM IMPROVING OUR DECISION-MAKING
OR OUR POLICIES AND SO WITH THE PLATFORM,
IT'S REALLY IMPORTANT THAT OUR USERS DRIVE
IT BOTH IN FUNCTIONALITY BUT ALSO TO MAKE
SURE IT'S DOING WHAT IT NEEDS TO DO TO HELP
IMPROVE OUR DECISION MAKING AND POLICY MAKING.
IF YOU GO DOWN, YOU LOOK AT OUR DATA GOVERNANCE,
WHICH IS HOW WE CREATE OUR DATA STANDARDS
AND HAVE THE RIGHT LEVEL OF OVERSIGHT AND
FIT INTO THE EXISTING DATA SYSTEMS AT THE
DEPARTMENT. TO THE LEFT YOU SEE THAT WE
HAVE OUR COLAB TO TRAIN AND ENABLE MORE STAFF
INTERESTED IN DATA SCIENCE TO GET HANDS-ON
EXPERIENCE AND EVENTUALLY DRIVE PROJECTS THROUGH
OUR PLATFORM AND THROUGH THEIR OWN DATA-DRIVEN
WORK.
SO THAT'S HOW WE FRAMED THIS OUT.
YOU WILL SEE MORE OF THAT IN THE FUTURE STATE
REPORT.
GO TO THE NEXT SLIDE.
THIS BASICALLY IS HIGHLIGHTS.
YOU WILL HEAR FROM THE PROGRAM MEMBERS WHICH
IS MUCH MORE INFORMATIVE.
BASICALLY, OUR GOAL HERE WAS TO CREATE AN
OPPORTUNITY TO UPSKILL OUR WORKFORCE AND DO
IT HANDS-ON, AND IT WAS MUCH MORE DO THAN JUST
LEARN.
SO WE HAD THE BOOTCAMP SPONSORED BY HHS AND
30 STUDENTS PER COHORT AND YOU ARE HEARING
FROM OUR LATEST COHORT.
A KEY THING THAT WE WANTED TO EMPHASIZE HERE
WITH THE ICEBERG IS THAT THIS IS A NICE REPRESENTATION
OF WHAT WE CURRENTLY ANALYZE, AND THERE'S DATA
BELOW THE SURFACE THAT IN SOME CASES YOU
MIGHT CALL DARK DATA, OR DATA THAT IS UNSTRUCTURED,
AND STUFF WE MAY NOT KNOW ABOUT THAT WE NEED
TO USE TO DO MUCH BETTER EVIDENCE-BASED DECISION-MAKING
AT THE DEPARTMENT. SO COLAB IS ONE OF
THOSE FACETS THAT HELPS US TO GET THERE.
THAT BEING SAID, NOW TYING THIS BACK TO OUR
LARGER REIMAGINE EFFORT, I WILL TURN IT BACK
OVER AND THANK YOU.
>> IT LOOKS LIKE YOU'RE MUTED.
>> THANK YOU, RACHEL.
OUR NEXT SPEAKER IS WILL BRADY,
THE CHIEF OF STAFF TO THE DEPUTY SECRETARY
OF HHS.
AS CHIEF OF STAFF, HE ADVISES ON PROGRAM POLICY
MATTERS AND ASSISTS THE DEPUTY SECRETARY IN
THE MANAGEMENT AND OPERATIONS OF THE DEPARTMENT.
IN ADDITION, HE IS RESPONSIBLE FOR STRATEGIC
POLICY INITIATIVES AND FOCUSED ON INNOVATION,
DEREGULATION AND HEALTHCARE FINANCING.
TAKE IT AWAY.
>> THANK YOU.
SANJAY AND THE TEAM, THANK YOU FOR HAVING
ME HERE TODAY.
I'M HAPPY TO SHARE SOME THOUGHTS.
I THINK YOU GUYS HEARD ABOUT MY BACKGROUND.
A LITTLE BIT MORE SPECIFICALLY, A LOT
OF THE REGULATORY AND POLICY WORK THAT I
DO IS FOCUSED ON INNOVATION AND
SPECIFICALLY THINGS LIKE DEREGULATION,
INTEROPERABILITY, TELEMEDICINE AND A COUPLE OTHERS.
ONE OF THE THEMES OF ALL OF THOSE EFFORTS
IN EVERYTHING WE DO IS THAT THEY ARE DATA
DRIVEN.
SO, I THINK IT'S SO IMPORTANT WHAT THE DATA
SCIENCE COLAB DOES OVERALL, BUT IT'S BECOMING
EVEN MORE IMPORTANT UNDER THE CURRENT, YOU
KNOW, COVID-19 PANDEMIC WHICH WE'RE ALL
WORKING THROUGH AND TRYING TO SURVIVE AND
THRIVE IN.
SO, WHEN WE THINK ABOUT WHY DATA IS IMPORTANT,
IT'S NO SECRET THAT EVERY DAY, WHETHER WE SEE
IT ON THE NEWS OR WE DEAL WITH IT IN OUR EVERYDAY
WORK, IT'S ALWAYS IMPORTANT TO USE DATA BUT
IT'S ALSO A STRUGGLE TO FIND IT AND FIGURE
OUT HOW TO MANIPULATE IT AND GAIN THE INSIGHTS WE
NEED.
AND SO, I CAN'T STRESS ENOUGH HOW IMPORTANT
IT IS FOR EVERYBODY TO LEARN HOW TO READ,
UNDERSTAND, ANALYZE AND WORK WITH DATA.
AS YOU'VE HEARD IN SOME OF THE CONVERSATIONS
SO FAR, THERE'S THE QUESTION OF, YOU KNOW, WHY
IS ENTERPRISE DATA SHARING A STRATEGIC ASSET?
SOME OF THE BASIC REASONS WHY WE'RE
DOING IT IS RECENT LEGISLATION THAT REQUIRES
US TO BE MORE STRATEGIC ON HOW WE USE DATA,
ONE OF THOSE BEING THE FOUNDATIONS FOR
EVIDENCE-BASED POLICYMAKING ACT OF 2018.
AS SOME OF YOU MIGHT BE FAMILIAR WITH, THE ACT
ACTUALLY REQUIRES US TO HAVE A SYSTEMATIC
RETHINKING OF GOVERNMENT DATA MANAGEMENT TO
BETTER FACILITATE ACCESS FOR EVIDENCE-BUILDING
ACTIVITIES. SO, YOU KNOW, THAT'S ONE OF
THE REASONS WHY DATA IS SO IMPORTANT: IT'S
NOW PART OF OUR MISSION IN STATUTE AS WE'VE
BEEN DIRECTED.
UNDERSTANDING THIS NEW GOAL TO HAVE A SYSTEMATIC
RETHINKING OF GOVERNMENT DATA, IT FOCUSES ON
USING THINGS LIKE ARTIFICIAL INTELLIGENCE
AND DIRECTS AGENCIES TO USE FEDERAL DATA AND
MAKE THE MODELS AND THE RESOURCES AVAILABLE
NOT JUST FOR US INTERNALLY BUT FOR R&D EXPERTS.
THE FIRST THING THAT'S IMPORTANT IS THAT
IT'S NOT JUST AUTHORITY;
WE'RE OBLIGATED TO USE DATA IN THIS WAY.
FOLLOWING THAT, AND PRIOR TO COVID-19, HHS
WAS AT THE FOREFRONT OF RECOGNIZING THE VALUE
OF DATA AND LOOKING AT EVERY POSSIBLE WAY
TO USE DATA TO MAKE BETTER DECISIONS, COME
UP WITH BETTER IDEAS, DRIVE BETTER SOLUTIONS
AND OPTIMIZE EVERY SINGLE ONE
OF OUR POLICY DEVELOPMENT ACTIVITIES.
WE'VE TRIED TO DO IT INTERNALLY AND SHARE
DATA EXTERNALLY TO ALLOW THE PRIVATE SECTOR
AND THOSE OUTSIDE GOVERNMENT TO THRIVE AS WELL.
IT'S AN IMPORTANT THING TO JUST RECOGNIZE
AND SHOW IT'S DEVELOPED IN THE ABILITY TO
UNDERSTAND DATA AND NOT JUST HAVE IT BUT ALSO
SHARE IT AND EDUCATE PEOPLE ON IT AND TEACH
IT AND SHOW THE INSIGHTS.
IT HAS ALREADY SHOWN SO MANY SUCCESSFUL -- PROVIDED
SO MANY SUCCESSES ACROSS HHS AND ELSEWHERE
WHETHER YOU LOOK AT THE CANCER WORK AT NIH OR SOME
OF THE THINGS THAT HAVE JUST MORE RECENTLY
HAPPENED RELATED TO COVID-19.
IT'S DATA INTERNALLY AND EXTERNALLY.
WHILE WE'VE MADE SIGNIFICANT PROGRESS COLLECTING
AND LEVERAGING DATA FOR POLICY AND OPERATIONAL
DECISIONS, WE'RE RECOGNIZING WE NEED TO CONTINUALLY
WORK TO STRENGTHEN DATA SHARING ACROSS THE
ENTERPRISE.
NOT JUST DATA SHARING BUT DATA KNOWLEDGE AND
DATA HYGIENE AND ALL THOSE THINGS THAT JUST
ARE FUNDAMENTAL TO USING DATA. TOWARDS THAT,
OVER THE PAST FEW YEARS HHS HAS HAD AN INTERNAL DATA
EFFORT WORKING ACROSS ITS AGENCIES TO DOCUMENT
A FUTURE STATE.
THEY'RE IN THE PROCESS OF DEVELOPING AN OPEN
SOURCE, CLOUD-BASED SHARING PLATFORM TO IMPROVE ACCESS
TO DATA INSIGHTS AND THE ABILITY TO HAVE
DATA DRIVING DECISION-MAKING AND IMPROVEMENT.
ONE THING THAT I CAN SHARE: JUST IN
THE PAST TWO OR THREE MONTHS SINCE COVID-19
HAS REALLY TAKEN OFF, ONE OF THE THINGS WHERE
TO ME DATA HAS BECOME CRITICAL IS THE PROVIDER RELIEF
FUND, WHICH IS SOMETHING I'M PART OF THE LEADERSHIP
OF OUT OF THE DEPUTY SECRETARY'S OFFICE, WHERE
WE WERE ASKED BY CONGRESS TO DELIVER AND DISTRIBUTE
$175 BILLION TO PROVIDERS AS A LIFELINE TO
KEEP THEM MOVING AS THEY'VE HAD TO SHUT DOWN
THEIR PRACTICES AND HOSPITALS AND ELECTIVE PROCEDURES.
IT'S SHOWN THAT EVERYTHING WE DO IN HOW WE
PROVIDE RELIEF IS DRIVEN BY THE DATA.
NOT JUST FROM A DECISION-MAKING PROCESS BUT
CAN WE ACTUALLY GET THIS TO THE RIGHT PEOPLE?
HOW DO WE DEFINE PROVIDERS? DO
WE KNOW THAT THIS PERSON IS AN ACTUAL HEALTHCARE
PROVIDER THAT EXISTS?
WHAT CAN WE DO? IT'S ALL RELIANT ON THE DATA
WE HAVE.
BEFORE YOU CAN EVEN GET INTO THE IDEAS OF
POLICY YOU'VE GOT TO THINK ABOUT WHAT YOU ARE ACTUALLY
CAPABLE OF, AND THAT REQUIRES DATA.
IT'S THE FIRST THING.
YOU HAVE TO UNDERSTAND WHAT IS OUT THERE.
DO YOU KNOW WHAT CMS HAS IN THEIR PROVIDER
NETWORK? DO YOU LOOK AT
WHAT SAMHSA HAS IN BEHAVIORAL HEALTH AND
WHAT IT MIGHT ALLOW YOU TO OPERATIONALIZE?
THAT HELPS DRIVE IDEATION BASED ON WHAT YOU CAN
ACTUALLY OPERATIONALIZE. YOU NEED TO USE THE DATA
AND READ IT AND UNDERSTAND IT ACROSS EVERY
LEVEL OF AN ORGANIZATION FROM TOP TO BOTTOM
TO NEW PERSONNEL, UNDERSTAND WHAT THE DATA
IS PRESENTING, AND MANIPULATE IT AND TEST WITH
IT SO YOU CAN HAVE MULTIPLE INSIGHTS AND TEST
DIFFERENT HYPOTHESES. EVEN IN THE WORK
THAT WE'VE DONE, I NEVER THOUGHT I WOULD HAVE
TO USE EXCEL PROGRAMMING OR OTHER CAPABILITIES
THAT I HAD USED IN THE PAST.
THAT ONLY HAPPENS BECAUSE WE'RE FORTUNATE TO
HAVE PEOPLE WHO UNDERSTAND THE DATA SOURCES
AND THE OPERATIONAL TOOLS TO MANIPULATE
THE DATA AND CHECK IT ACROSS DIFFERENT PROGRAMS
AND SYSTEMS TO WHERE YOU HAVE A LEVEL OF CONFIDENCE.
I THINK THAT CAN'T BE UNDERSCORED STRONGLY
ENOUGH. FUNDAMENTAL TECHNICAL
ABILITY IS SO IMPORTANT AT EVERY LEVEL IF YOU
WANT TO MOVE QUICKLY, SO YOU ARE NOT GOING
BACK AND FORTH AND YOU CAN PULL UP THE SHEET
AND MANIPULATE IT IN REAL TIME. I CAN SHARE
THAT I'VE HAD TO DO THAT PERSONALLY AT EVERY
LEVEL OF GOVERNMENT JUST IN THE PAST 60 DAYS.
WHICH, YOU KNOW, SHOWS IT'S NOT JUST PEOPLE WHO ARE
DATA SCIENTISTS OR DATA ANALYSTS OR PEOPLE
WITH ALL THE DATA. EVERYBODY HAS TO HAVE
THE BASIC SKILL SET TO MOVE QUICKLY AND OPERATIONALIZE
IT.
I JUST CAN'T UNDERSCORE ENOUGH HOW IMPORTANT IT IS
TO UNDERSTAND THE DATA SETS OUT THERE AND
HAVE THE ABILITY TO PUT THEM IN EXCEL AND PLAY
WITH THEM TO SEE WHAT YOU CAN FIND AND WHAT
INSIGHTS YOU CAN DRAW.
IT'S JUST SOMETHING THAT IS CONSTANTLY GOING
TO BE A GROWING NEED, AND IT'S IMPORTANT
TO HAVE THOSE SKILLS AND UNDERSTANDINGS.
IT CAN'T BE STRESSED ENOUGH.
I THINK THE LAST THING WHEN WE'RE TALKING
ABOUT DATA, AND I THINK IT'S IMPORTANT AND
ONE TO SHARE BASED OFF THE EXPERIENCES, IS THAT
WHEN YOU DO IT, EVERYBODY NEEDS TO KEEP THE
MINDSET THAT IT'S A PROCESS THAT
IS GOING TO REQUIRE PERSISTENCE AND RESILIENCY.
YOU WANT THINGS TO DEMONSTRATE THE NEED AND
SHOW THE EVIDENCE.
WHAT THAT REQUIRES IS A DEGREE OF HUMILITY
OF HAVING A HYPOTHESIS WITHOUT PRIDE OF OWNERSHIP.
SO PERSONALLY, MYSELF AND OTHERS HAVE HAD
HYPOTHESES AND PROPOSALS AND SAID, LOOK, THE
IDEA WAS OFF.
SO YOU NOT ONLY HAVE TO HAVE THE
TECHNICAL ABILITY AND UNDERSTANDING TO WORK WITH
THE DATA BUT ALSO HAVE TO SAY THE HYPOTHESIS IS WRONG
AND THIS IS WHAT THE DATA IS TELLING US, AND
INFORM THE DECISION MAKERS SO THEY UNDERSTAND WHAT
IT ACTUALLY SAYS AND WHY WE NEED TO PIVOT.
AND SO, I THINK THAT'S A REALLY IMPORTANT
PIECE TO REINFORCE IT ISN'T TO PROVE WHAT
WE THINK IS RIGHT BUT TO REALLY UNDERSTAND
WHAT THE DATA IS TELLING US TO MAKE BETTER
DECISIONS.
LIKE I SAID, HAVING THAT ABILITY
TO UNDERSTAND THE DATA SETS AND PULL THEM TOGETHER
FOR DECISION-MAKING IS CRITICAL.
I CAN'T THANK ALL OF YOU ENOUGH,
YOU KNOW, FOR YOUR INTEREST IN DEMO DAY, THE
WORK THAT YOU'VE DONE, ALL THE LEARNING WITH
THE COLAB. SUCCESS IS POSSIBLE WHEN A DIVERSE
GROUP GETS TOGETHER, LIKE YOU SEE TODAY WITH THE PRESENTERS.
I'M DELIGHTED TO BE HERE TODAY TO SHARE MY
EXPERIENCES AND WHY THIS IS SO IMPORTANT AND
IN THE SHORT TERM HOW PEOPLE UNDERSTAND DIFFERENT
DATA SETS AND HOW IT CAN BE HELPFUL.
YOUR WORK PUTS US ON A PATH THAT WILL
HELP US MAKE BETTER DECISIONS, UNDERSTAND
THE DATA BETTER AND ACHIEVE OUR MISSION BETTER
AND ALSO BECOME A BETTER ORGANIZATION AND
TEAM.
ONCE AGAIN, I WANT TO THANK YOU FOR THE WORK
THAT YOU'VE DONE.
>> THANK YOU, WILL.
I APPRECIATE YOUR INSIGHT, TALKING ABOUT YOUR
EXPERIENCES RECENTLY
DISTRIBUTING THE FUNDS AND HOW DATA HAS HELPED
YOU MAKE THOSE DECISIONS.
OUR NEXT SPEAKER IS STEVE BABICH. HE USES
DESIGN, DATA AND TECHNOLOGY TO BUILD BETTER
PRODUCTS AND POLICY.
CURRENTLY HE IS THE HEAD OF THE ARTIFICIAL INTELLIGENCE
PORTFOLIO FOR THE U.S. TECHNOLOGY TRANSFORMATION
SERVICE, TTS AI. TTS AI IS HELPING THE U.S.
FEDERAL GOVERNMENT INVEST IN AND USE AI TO
ACHIEVE AGENCIES' RESPECTIVE MISSIONS.
IN 2015, HE WAS A WHITE HOUSE PRESIDENTIAL
INNOVATION FELLOW AT THE F.B.I., WHERE HE ORCHESTRATED
A USER-CENTERED APPROACH TO BUILDING PRODUCTS
AND MITIGATING THREATS TO NATIONAL SECURITY.
STEVE WAS AWARDED FOR EXCEPTIONAL SERVICE
IN THE PUBLIC INTEREST BY F.B.I. DIRECTOR
CHRISTOPHER WRAY.
STEVE, FEEL FREE TO JUMP IN.
>> THANK YOU, YOU CAN HEAR ME, RIGHT?
>> YES.
>> HI, EVERYONE.
THANK YOU, FIRST TO SANJAY AND THE ENTIRE
HHS DATA SCIENCE COLAB FOR THE CHANCE TO SPEAK
WITH ALL OF YOU THIS MORNING.
AT LEAST A TWO DIMENSIONAL VERSION OF ME.
HOPEFULLY I CAN MEET YOU ALL IN REAL LIFE
SOONER RATHER THAN LATER.
I'M STEVE. I HEAD UP THE A.I. PORTFOLIO FOR THE
U.S. TECHNOLOGY TRANSFORMATION SERVICE, TTS. AT TTS,
WE ARE AN ORGANIZATION WITHIN THE GOVERNMENT
THAT HELPS GOVERNMENT USE TECHNOLOGY IN A
BETTER WAY TO IMPROVE THE LIVES OF THE PUBLIC
AND PUBLIC SERVANTS.
SO, IN TTS WE HAVE GROUPS LIKE THE PRESIDENTIAL
INNOVATION FELLOWS, THE CENTERS OF EXCELLENCE, AND 10X, WHICH
IS A VENTURE SEED FUND FOR GOVERNMENT IDEAS
MOVING FORWARD.
AND SO, YOU KNOW, WITHIN TTS AND FOLLOWING
THE A.I. EXECUTIVE ORDER LAST YEAR AND THE
WHITE HOUSE SUMMIT ON A.I. IN GOVERNMENT IN
SEPTEMBER, WITH SPONSORSHIP FROM
OSTP AND ANIL CHERIYAN, THE HEAD OF TTS,
WE SET UP A FOCUS ON ARTIFICIAL INTELLIGENCE.
IT'S LIKE EVERY TALK STARTS THIS WAY SO
IT'S CLICHE, BUT IT HAS POTENTIAL TO BRING
ABOUT TRANSFORMATIVE CHANGE.
EQUALLY CLICHE, IT'S ALSO PRESENTING A RANGE
OF CHALLENGES AND OPPORTUNITIES WITHIN GOVERNMENT.
SO WITHIN GOVERNMENT, WE'VE GOT TO FIGURE
OUT HOW DO WE BEST INVEST IN AND APPLY A.I. TO
HELP AGENCIES, THE PUBLIC AND THE COUNTRY AT LARGE.
WHATEVER WE DO AS FAR AS TTS IS CONCERNED,
WHATEVER WE'RE TRYING TO TACKLE HAS TO BE
ROOTED IN THE CHALLENGES THAT AGENCIES HAVE AND
HOW DO WE SUPPORT THAT MISSION?
BY AND LARGE, UNLESS YOU ARE DOE OR NASA,
OUR SENSE IS AGENCIES ARE RELATIVELY EARLY
ON IN THEIR TRUE USE OF A.I.
SO THEY'RE LAYING THE FOUNDATIONS OF IT AND
THERE ARE POCKETS OF ACTIVITY HAPPENING.
HHS HAS DONE A GREAT JOB LAYING THE FOUNDATION,
CERTAINLY WITH THE DATA SCIENCE INITIATIVES AND
THE ABILITY TO ACTUALLY SHARE MORE AND MORE
DATA, WHICH IS ESSENTIAL TO TAKING FULL ADVANTAGE OF
A.I. SO CONGRATULATIONS TO HHS ON SOME OF
THEIR GROUND-BREAKING WORK.
IN TERMS OF TTS AND OUR FOCUS, THERE ARE A FEW
AREAS THAT WE'RE FOCUSED ON AS WE THINK ABOUT
OUR MISSION OVERALL, WHICH IS TO HELP ACCELERATE
THE USE OF A.I. IN GOVERNMENT TO ACHIEVE THE
MISSION.
SO I'LL JUST GO THROUGH A FEW OF THOSE.
SANJAY, CAN YOU GO TO THE NEXT SLIDE.
THE FIRST ONE, WE HAVE A FOCUS ON IMPLEMENTATION AND
DELIVERY OF A.I. WORK, SO THAT IS THROUGH THE
CENTERS OF EXCELLENCE. WE WORK WITH AGENCIES
INCLUDING THE JOINT ARTIFICIAL INTELLIGENCE
CENTER AND THE DOL ON SOME OF THE WORK WE'RE
DOING THERE.
IT'S REALLY HOW DO WE ACTUALLY HELP AGENCIES
THROUGH THE USE OF A.I.?
THERE ARE OTHER CENTERS OF EXCELLENCE, LIKE DATA ANALYTICS
AND CLOUD, THAT SUPPORT THIS AS WELL.
THAT'S ONE AREA.
SO IN ADDITION TO IMPLEMENTATION AND DELIVERY,
WE ALSO WANT TO SUPPORT THE ACCELERATION OF
THE PRODUCT DEVELOPMENT.
WHAT I MEAN BY THAT, WHAT ARE THE THINGS WE
CAN DO THAT AGENCIES WRIT LARGE CAN TAKE ADVANTAGE
OF?
AND ONE OF THOSE THINGS IS THE GUIDE TO A.I.
WE BELIEVE THAT RIGHT NOW THERE'S A NEED FOR A SET OF
CONTENT THAT HELPS AGENCIES AND AGENCY LEADERS
PRIMARILY THINK THROUGH WHAT DOES IT TAKE
TO INVEST IN A.I.?
HOW DO I START TO APPLY IT AND WHERE DO I
GET STARTED?
THAT'S ONE THING WE'RE BUILDING.
AGENCIES ACROSS THE GOVERNMENT WANT TO KNOW
WHAT ELSE IS HAPPENING OUT THERE AND HOW DO
WE LEARN FROM THAT.
AND SO, WE WANT TO BUILD THIS REPOSITORY, A LIBRARY
OF A VARIETY OF USE CASES, AND WE'RE STARTING
TO BUILD THAT OUT OVER TIME.
SO IF THERE ARE USE CASES, WE WELCOME THOSE
AND PLEASE DO SHARE THOSE WITH US.
AND THEN, WE'RE INVESTIGATING HOW WE BECOME
A BROADER RESOURCE OF A.I. LEARNING AND SO,
WE'RE THINKING THROUGH IDEAS RIGHT NOW AND
WE'RE HAVING CONVERSATIONS WITH HHS AND OTHERS
HOW WE CAN BUILD THAT OUT.
THAT'S THE PRODUCT DEVELOPMENT PIECE.
ONE PLATFORM IS THE ARTIFICIAL INTELLIGENCE
COMMUNITY OF PRACTICE.
THIS IS A PLACE WHERE WE CAN ESSENTIALLY HAVE
TALKS, PANELS, WORKSHOPS WHERE WE SHARE LESSONS
LEARNED,
THE PITFALLS, THE CHALLENGES, EVERYTHING IN
BETWEEN. WE'VE GOT SOMEWHERE BETWEEN
800 AND 1,000 STRONG NOW IN THIS COMMUNITY
AND SO WE WELCOME YOU TO SIGN UP FOR THIS.
WE HAVE MONTHLY EVENTS RIGHT NOW AND THE NEXT
PART.
WITHIN THE COMMUNITY, WE'RE STARTING TO, AND
EVEN TODAY WE'RE GOING TO HAVE AN INITIAL
DISCUSSION ON, WORKING GROUPS. ESSENTIALLY
WE'RE HEARING FROM FEDERAL
GOVERNMENT AGENCIES, HOW DO WE GET TOGETHER
INTERAGENCY AND ENGAGE ON SPECIFIC TOPICS RELEVANT
TO A.I.?
THIS IS SOMETHING THAT WE'RE STARTING TO BUILD
AND THAT WILL LIKELY FORM SUBGROUPS AS WELL.
AND LASTLY, THERE'S A NOTION OF EXTERNAL ENGAGEMENT
AND THAT'S THE NEXT PIECE.
SO WE WANT TO ENGAGE EXTERNALLY AND GET BEYOND
THE GOVERNMENT TO MEET WITH ACADEMIA, INDUSTRY
AND OTHER CONSORTIA TO CONTINUE FOSTERING
CONTINUOUS LEARNING AND HELP FEDERAL AGENCIES
AND THE GOVERNMENT PREPARE FOR INVESTING IN AND
USING A.I.
THAT'S A QUICK DOWNLOAD OF WHAT WE'RE FOCUSING
ON.
AGAIN, I COME BACK TO THE COMMUNITY OF PRACTICE.
WITH HHS, WE'VE HAD THEM COME IN IN A COUPLE
OF FORMATS, ONE OF WHICH WAS THE DATA SCIENCE
COLAB AND THEIR DATA INSIGHTS INITIATIVE
SHARING THE GREAT WORK THEY'VE BEEN DOING,
AND WE ALSO HAD HHS AND THEIR HR DEPARTMENT,
WITH BLAIRE DUNCAN, SHARE SOME OF THE GREAT
WORK THEY'RE DOING TO MODERNIZE THE HR PRACTICES.
SO THIS IS SORT OF THE MECHANISM
BY WHICH THE FEDERAL GOVERNMENT CAN GET TOGETHER
AND FOCUS ON ISSUES RELATED TO A.I. AND
WHAT'S FOUNDATIONAL IN TERMS OF DATA SHARING
AND DATA PREPARATION AND DATA READINESS AND
THE TECH CHALLENGES.
AND THEN, JUST A COUPLE LAST THINGS I'D LIKE
TO SHARE, IF I HAD TO PICK ONE THING TO FOCUS
ON, YOU KNOW, OBVIOUSLY THERE'S A RANGE OF
ISSUES THAT ARE CHALLENGES FOR THE FEDERAL
GOVERNMENT, DATA AND TECHNOLOGY RELATED ISSUES
AS WILL ALLUDED TO, BUT PEOPLE ARE, I WOULD
ARGUE, THE MOST IMPORTANT THING.
SO SOME OF THE THINGS WE'RE STARTING TO THINK
ABOUT ARE PEOPLE DEVELOPMENT
AND UPSKILLING AND RESKILLING. AGAIN,
HHS HAS DONE A GREAT JOB WITH THIS COLAB
AND THE COHORTS THEY'RE WORKING THROUGH AND
THEN THE GREAT WORK THEY'RE SHARING OUT IN
EVENTS LIKE TODAY.
WE'RE THINKING ABOUT HOW DATA SCIENCE
FITS INTO AN AGENCY.
WHAT IS THAT CAPABILITY LOOKING LIKE AND WHAT
IS THE STRUCTURE?
SO, PART OF THAT RELATES TO WHAT IS THE CAREER
PATH OF A DATA SCIENTIST?
WE HAVE TO SORT OF SET THESE FOUNDATIONAL
MODELS AND MECHANISMS UP SO WHEN A DATA SCIENTIST
COMES IN THEY CAN SEE WHERE THEIR PROFESSIONAL
DEVELOPMENT WILL LIVE AND HOW THEY PROGRESS
FROM THERE. SO THIS IS SOMETHING WE'RE THINKING
ABOUT AND WANT TO SHARE A PERSPECTIVE ON,
INCLUDING THAT PROFESSIONAL DEVELOPMENT CAREER
PATH.
AND HOW DO WE BUILD INTO THE DNA OF AGENCIES
THIS NOTION, RIGHT, WE HEAR IT ALL THE TIME
SO IT'S ANOTHER CLICHE, BUT HOW DO WE FAIL
FAST AND, MORE IMPORTANTLY, HOW DO WE LEARN
FAST?
IT'S HARD STUFF, RIGHT.
WE'RE GOING TO FAIL AT SOME THINGS AND WE
HAVE TO MAKE SURE THAT WE PAY ATTENTION
TO THE LONG GAME HERE AND NOT ABANDON A.I. BECAUSE
YOU FAILED SOMEWHERE.
WE HAVE TO LOOK FOR WAYS TO IMPROVE ON THAT
AND AGENCY LEADERSHIP HAS TO SUPPORT THAT
AND RECOGNIZE THE IMPORTANCE OF INVESTING
IN A.I.
WHEN YOU THINK ABOUT THE TALENT ITSELF,
ONE CRITICAL AREA IS THE HUMAN RESOURCE AND
TALENT TEAMS. THEY HAVE TO HAVE ENOUGH
A.I. KNOWLEDGE TO BE ABLE TO EVALUATE THE
TALENT THAT IS APPLYING TO THE AGENCIES SO THAT
THEY CAN BRING THEM INTO THE RIGHT BUSINESS
AREAS AND REALLY GET THE WORK DONE. SO
HOW DO WE GIVE THEM THE TRAINING TO UNDERSTAND
WHAT GOOD A.I. AND DATA SCIENCE LOOKS LIKE?
THEN THERE'S THE ISSUE OF ACQUISITION. THE GOVERNMENT
IS GOING TO BUY A LOT OF A.I. TECHNOLOGY UNLESS
YOU ARE AN R&D TYPE OF AGENCY.
IF THE GOVERNMENT IS BUYING A FAIR AMOUNT
OF A.I., THE PEOPLE DOING THE ACQUISITION HAVE
TO UNDERSTAND AND WORK WITH THEIR TECHNICAL
STAKEHOLDERS AND BUSINESS STAKEHOLDERS TO
UNDERSTAND WHAT ARE THE RIGHT QUESTIONS TO
BE ASKING OF THE VENDOR AND CONTRACTING
COMMUNITY AND HAVE ENOUGH KNOWLEDGE TO UNDERSTAND
THE ANSWERS COMING BACK AND MAKE SURE THAT
WE WORK SIDE BY SIDE AND HAVE MULTIPLE STAKEHOLDERS
AT THE TABLE TO ENSURE WE HAVE GOOD TECHNOLOGY,
SOLUTIONS AND TOOLS.
AND SO, THIS SPEAKS TO THE MOST IMPORTANT
ISSUE OF PEOPLE.
HOW DO WE BUILD THAT DATA LITERACY AT TECHNICAL
AND BUSINESS LEVELS? AND THEN WE'VE GOT TO START
SOMEWHERE AND START TO DO SOME PILOTING AND
EXPERIMENTATION AND LEARN. I WOULD SAY
WE'RE BUILDING THOSE PERSPECTIVES AND LOOKING
TO SHARE THAT OUT TO THE FEDERAL GOVERNMENT
WRIT LARGE. WE DON'T HAVE ALL THE ANSWERS;
THAT IS WHY WE WANT TO ENGAGE IN THE COMMUNITY,
AND WE'RE WORKING HARD TO PROVIDE THOSE PERSPECTIVES
AND BRING ALL OF YOU TOGETHER. SO WE WELCOME
YOU TO BE A PART OF THE COMMUNITY OF PRACTICE.
WE'RE ALL IN THIS JOURNEY TOGETHER AND
I LOOK FORWARD TO SEEING THE COMMUNITY GOING
FORWARD.
THANK YOU FOR THE TIME AND THANK YOU AGAIN
TO ALL OF HHS.
IT'S A PLEASURE TO BE WITH YOU.
>> THANK YOU, STEVE.
IT'S GREAT TO BE A PART OF A LARGER COMMUNITY,
ESPECIALLY AS WE TRY TO LEARN FROM ONE ANOTHER
TO IMPROVE THE WAY THAT THE GOVERNMENT DOES
A.I. AND DOES DATA SCIENCE AND BUILD BETTER
DATA STRATEGIES.
APPRECIATE YOUR TALK.
SO, NOW WE'RE READY TO GET STARTED WITH PRESENTERS.
WHILE ALL 30 PARTICIPANTS HAVE WORKED VERY
HARD TO CREATE AMAZING TOOLS THEY'LL BRING
BACK TO THEIR HOME OFFICES TO DEVELOP NEW
INSIGHTS FROM THE DATA SETS, IN ORDER TO MAKE
BETTER BUSINESS DECISIONS, MAKE THEIR WORKFLOWS
MORE EFFICIENT, OR REDUCE COSTS, WE WOULD
LIKE TO SHARE NINE PROJECTS TODAY TO GIVE
YOU A FLAVOR OF THE POSSIBILITIES THAT THE
APPLICATIONS OF DATA SCIENCE BRING TO THE
DEPARTMENT.
FIRST UP, WE HAVE CHRISTIE WHO WILL TELL YOU
ABOUT HER PROJECTS ON THE ADMINISTRATION FOR
NATIVE AMERICANS.
>> GOOD MORNING, EVERYONE.
MY NAME IS CHRISTIE AND I'M CURRENTLY A PROGRAM
ANALYST WITH THE ADMINISTRATION FOR NATIVE
AMERICANS, WHICH IS UNDER THE ADMINISTRATION
FOR CHILDREN AND FAMILIES, OR ACF.
I'M VERY THANKFUL TO EVERYONE AT THE COLAB
AND TO Y'ALL FOR TUNING IN TODAY SO I CAN SHARE A
BIT ABOUT MY PRESENTATION, WHICH IS FOCUSED
ON ACF TRIBAL CONSULTATION TRANSCRIPTS AS WELL AS MISSING
AND MURDERED NATIVE AMERICANS.
WHAT YOU ARE SEEING IN THE ILLUSTRATION ON
THE RIGHT HERE WAS SHARED BY ANA
DURING THE MAY FIFTH AWARENESS DAY THAT JUST
PASSED.
THIS WILL PROVIDE BACKGROUND INFORMATION ON
ANA.
WE ARE AN AGENCY THAT PROVIDES COMPETITIVE
GRANT FUNDING IN THE AREAS OF SOCIAL, ECONOMIC,
LANGUAGE, AND ENVIRONMENTAL TOPICS.
AND THESE ARE AVAILABLE FOR NATIVE AMERICAN AND
PACIFIC ISLANDER ORGANIZATIONS, STATE-RECOGNIZED TRIBES AND
THE 574 FEDERALLY RECOGNIZED TRIBES IN THE
UNITED STATES.
AND AT ANA, WE SUPPORT SEVERAL DIFFERENT
HHS AND ACF EFFORTS
TO SUPPORT TRIBAL CONSULTATION.
THERE ARE DIFFERENT CONSULTATION POLICIES AND
THINGS THAT OCCUR ACROSS THE AGENCY, FROM ADVISORY
COMMITTEES TO INDIVIDUAL AGENCY CONSULTATIONS
AND, AS YOU SEE HERE, ANNUAL CONSULTATION.
CONSULTATION LEADS TO INFORMATION EXCHANGE,
MUTUAL UNDERSTANDING AND INFORMED DECISION-MAKING,
AS STATED BY THE TRIBAL CONSULTATION POLICY ON
THE SCREEN.
SO WITH CONSULTATION, TRIBAL LEADERS SHARE
THEIR VOICES AND THIS CREATES TRANSCRIPTS
AS YOU SEE ON THE LEFT HERE AS WELL AS WRITTEN
TESTIMONY THAT IS SUBMITTED.
THEY ARE PUBLICLY AVAILABLE FOR ACF
FROM 2010 TO 2019, AND I WANTED TO WORK WITH THEM AS
THEY HAVE NEVER BEEN ANALYZED WITH DATA SCIENCE
AND TEXT MINING TECHNIQUES.
THE NEXT REASON WHY IT IS IMPORTANT RIGHT NOW
IS THE CRISIS OCCURRING OF MISSING AND MURDERED
NATIVE AMERICANS, OR SOME OF YOU MAY KNOW IT
AS MISSING AND MURDERED INDIGENOUS WOMEN.
AS YOU CAN SEE, THE CDC HAS REPORTED HOMICIDE
AS THE NUMBER FOUR CAUSE OF DEATH FOR NATIVE
WOMEN AND THE NUMBER THREE CAUSE OF DEATH
FOR NATIVE MEN AGED 1 TO 19 YEARS OLD.
THIS CRISIS HAS BEEN LONGSTANDING AND HAS
OCCURRED ACROSS AREAS AND AFFECTS MANY DIFFERENT
COMMUNITIES.
THIS IS A COMPLEX ISSUE RELATED TO THE SOCIAL
DETERMINANTS OF HEALTH AND CUTS ACROSS OUR
PROGRAMS AND HHS PROGRAMS.
ON NOVEMBER 26TH, 2019, ACTUALLY AN EXECUTIVE
ORDER CREATED OPERATION LADY JUSTICE.
WHICH WAS TO HELP BRING TOGETHER DEPARTMENT
OF INTERIOR, DEPARTMENT OF JUSTICE AND HHS
TO WORK SPECIFICALLY ON MMNA.
WITH OPERATION LADY JUSTICE, CONSULTATIONS
WILL OCCUR, AND PART OF MY PROJECT
AT COLAB WAS TO TAKE THE INFORMATION THAT WE
HAVE FROM CONSULTATIONS TO CREATE A SYSTEMATIC
WAY TO ANALYZE THAT DATA, AS WELL AS HELP
SUPPORT THE WORK THAT IS BEING DONE ON MMNA
AND THINK OF WAYS WE CAN ANALYZE THE CONSULTATIONS
THAT WILL OCCUR IN THE FUTURE.
IT'S ACKNOWLEDGING THE VOICES OF OUR TRIBAL
NATIONS THAT HAVE SHARED THEIR THOUGHTS,
IDEAS AND FEEDBACK WITH US THROUGHOUT THE
LAST NINE YEARS, SO I WANTED TO UNDERSTAND
WHO HAD PARTICIPATED OVER THE LAST NINE YEARS.
THROUGH THIS I WAS ABLE TO CREATE AND
CODE THE INTERACTIVE MAPS THAT YOU ARE SEEING
ON THE LEFT HERE.
THIS SHOWS FEDERALLY RECOGNIZED TRIBES, AND AGAIN THERE ARE
SEVERAL CONSULTATIONS THAT OCCUR
ACROSS HHS; THIS IS LOOKING AT THE
ANNUAL CONSULTATION THAT HAPPENS WITH ACF.
THERE ARE MANY TRIBES AND ORGANIZATIONS THAT
HAVE PARTICIPATED, AND I REALLY WANTED TO UNDERSTAND
THOSE IN COMPARISON TO THE NUMBER OF FEDERALLY
RECOGNIZED TRIBES IN THE STATE.
IN ALASKA, TRIBES HAVE PARTICIPATED, BUT THIS ONLY
MAKES UP 3% OF THE ALASKA NATIVE VILLAGES BECAUSE
IT IS A LARGE STATE WITH MANY VILLAGES.
THE CHALLENGES TO PARTICIPATING IN CONSULTATION
CAN BE TRAVELING TO D.C. TO ATTEND AND
LIMITED RESOURCES TO TURN IN WRITTEN TESTIMONY.
GREAT.
SO NOW THAT WE HAVE AN IDEA OF WHO PARTICIPATED,
I WANTED TO LOOK AT SOME OF THE DATA THAT
EXISTS ON MISSING PERSONS.
THIS DATA OF COURSE IS FROM SEVERAL DIFFERENT
SOURCES BECAUSE SOME OF THE MMIW AND
MMNA DATA IS A CHALLENGE TO FIND.
ONE DATA SOURCE IS THE URBAN INDIAN
HEALTH INSTITUTE, WHICH PUT OUT A REPORT FOCUSED
ON CASES OF WOMEN AND GIRLS FROM 1943 TO 2018,
WHICH ARE SEEN IN THE RED MAP ON THE TOP.
ON THE BOTTOM YOU SEE THE NATIONAL MISSING
AND UNIDENTIFIED PERSONS SYSTEM, NAMUS, WHICH COVERS
ALL GENDERS AND AGES, WITH CASES IN THE NATIVE
POPULATION THAT I PULLED IN MARCH.
WHEN I LAY IT ON TOP, YOU CAN SEE THAT THE
COLORS THAT ARE COMING THROUGH ON THE BOTTOM,
THE RED AND YELLOW, THESE ARE STATES THAT
HAVE MISSING CASES BUT HAVE NOT NECESSARILY
PARTICIPATED IN CONSULTATION.
NOT ALL STATES HAVE FEDERALLY RECOGNIZED TRIBES
BUT SOME STATES THAT DO SUCH AS NEBRASKA,
WHICH YOU ARE SEEING ON BOTH THE TOP AND THE
BOTTOM, HAVE CASES OF MMIW AS REPORTED IN
UIHI AND IN NAMUS.
THIS MIGHT BE A PLACE THAT HAS SIX FEDERALLY
RECOGNIZED TRIBES THAT WE MIGHT WANT TO FOCUS
OUR OUTREACH, AWARENESS OR DISCUSSIONS WITH
AS THEY HAVE NOT PARTICIPATED IN THE NINE
YEARS.
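
The overlay described above, states that report missing-person cases but have no consultation participation, can be sketched as a simple set comparison. This is a minimal illustration, not the presenter's actual code: the function name `states_needing_outreach` and every number below are made-up placeholders.

```python
def states_needing_outreach(case_counts, participating_states):
    """Return (state, cases) pairs for states that report cases but have
    not participated in consultation, highest case count first."""
    gaps = {
        state: count
        for state, count in case_counts.items()
        if count > 0 and state not in participating_states
    }
    # Sort by descending case count, then alphabetically for stable output.
    return sorted(gaps.items(), key=lambda item: (-item[1], item[0]))


# Placeholder values for illustration only, not real case data.
case_counts = {"NE": 12, "AK": 40, "WA": 25, "OK": 0}
participating = {"AK", "WA"}

gaps = states_needing_outreach(case_counts, participating)
print(gaps)
```

With real inputs, the case counts would come from the missing-persons sources and the participating set from the consultation attendance records.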
NOW THAT WE HAVE AN IDEA OF WHO HAS PARTICIPATED,
I WANTED TO LOOK AT THE CONTENT OF THESE TRANSCRIPTS.
YOU ARE SEEING A WORD CLOUD OF ALL NINE YEARS,
OF ALL THE WORDS AND THE DIFFERENT COMMENTS
AND RESPONSES THAT WERE GIVEN THROUGHOUT THESE
NINE YEARS, AND OF COURSE, WHAT'S GOING TO COME
INTO VIEW IS TRIBES,
THE CENTER AND THE REASON FOR CONSULTATION.
AN INTERESTING WORD THAT COMES THROUGH
ON THIS IS ICWA, WHICH IS THE INDIAN CHILD
WELFARE ACT, WHICH IS A COMMON TOPIC THAT
WE SAW ACROSS EVERY YEAR OF CONSULTATION AND
IS IMPORTANT TO NATIVE COMMUNITIES.
SO LOOKING MORE INTO THESE WORDS I WAS ABLE
TO CREATE THE TOP 30 MOST FREQUENT TERMS.
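
A most-frequent-terms table like this one can be produced with a plain word count. The sketch below uses Python's standard library and is only an assumption about the approach, since the talk does not say which tools were used. The stop-word list, the `top_terms` helper and the sample sentences are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "and", "to", "of", "a", "in", "for", "we", "our",
              "is", "that", "with"}


def top_terms(transcripts, n=30):
    """Count word frequencies across all transcripts and return the n most common."""
    counts = Counter()
    for text in transcripts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)


# Hypothetical sample text standing in for consultation transcripts.
sample = [
    "Tribes need funding and tribes need support for ICWA.",
    "We support ICWA and ask for consultation with tribes.",
]

top = top_terms(sample, n=3)
print(top)
```

The same counts can feed a word cloud, since a word cloud is just a visual rendering of term frequencies.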
AS YOU SEE HERE, WHAT IS INTERESTING IN THESE
30 WORDS, IF WE HIT THE NEXT SLIDE THERE,
IS THAT THERE ARE A LOT OF ACTION WORDS.
NOT ONLY ARE TRIBES ASKING FOR ACTION FROM
THE FEDERAL AGENCIES BUT THEY ARE TAKING ACTION.
FROM MMIW TO OTHER PROGRAMS AND AREAS, THEY
ARE PROVIDING SOLUTIONS AND INNOVATIVE IDEAS
ON THEIR OWN AS WELL AND TRYING TO SHARE THESE
BEST PRACTICES THAT MAYBE CAN HELP OTHER COMMUNITIES
IN THESE TRANSCRIPTS AND THROUGH CONSULTATION.
AFTER TAKING THESE WORD FREQUENCIES, WE
CREATED A NETWORK TO UNDERSTAND
HOW THESE DIFFERENT YEARS INTERACT TOGETHER.
2010 MAINLY FOCUSED ON CREATING FUTURE
CONSULTATIONS.
IT WAS A WAY TO UNDERSTAND HOW TO DEVELOP
THIS TOGETHER.
AND TO SEE HOW THESE DIFFERENT CONSULTATION
YEARS INTERACTED.
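
The talk does not say how the network of consultation years was built. One plausible reading, offered purely as an assumption, is to link two years whenever they share frequent terms. The helper `year_term_network` and the term sets below are hypothetical.

```python
from itertools import combinations


def year_term_network(terms_by_year, min_shared=1):
    """Build edges between years that share at least min_shared frequent terms.

    Returns a list of (year_a, year_b, shared_terms) tuples.
    """
    edges = []
    for year_a, year_b in combinations(sorted(terms_by_year), 2):
        shared = terms_by_year[year_a] & terms_by_year[year_b]
        if len(shared) >= min_shared:
            edges.append((year_a, year_b, sorted(shared)))
    return edges


# Hypothetical frequent-term sets per consultation year.
terms_by_year = {
    2010: {"consultation", "policy"},
    2015: {"icwa", "trafficking", "funding"},
    2019: {"icwa", "funding", "consultation"},
}

edges = year_term_network(terms_by_year, min_shared=2)
print(edges)
```

Raising `min_shared` keeps only the strongest links, which makes it easier to see which years discussed similar topics.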
THERE'S ALSO A WAY TO DO A TERM SEARCH QUICKLY
THROUGH SOME OF THE SKILLS I GAINED IN THIS
CLASS.
AS YOU ARE SEEING HERE, AS AN EXAMPLE, A
TRIBE SAID IN 2015 THAT TRAFFICKING WAS AN ISSUE,
SO WE CAN FIND DIFFERENT IDEAS AND TOPICS
OCCURRING THROUGHOUT THE COMMUNITIES AND
WE CAN HOPEFULLY CREATE A PUBLIC SAFETY MATRIX,
WHICH YOU ARE SEEING ON THE BOTTOM, THAT ORGANIZES
TRANSCRIPTS SO TRIBES CAN SEE PUBLICLY WHAT
COMMENTS HAVE BEEN SAID IN THE PAST AND HOW
THEY CAN HELP IN THE FUTURE, AND
HOW WE CAN RELATE THIS INFORMATION AND ANALYZE
IT FURTHER.
>> I HOPE WE CAN CONTINUE TO DO THIS ANALYSIS
AND RESEARCH. THIS CLASS AND THE COLAB, WHICH
WE'RE THANKFUL FOR, HAVE GIVEN US A NEW UNDERSTANDING
OF HOW TO ANALYZE THE TRANSCRIPTS AND CONSULTATIONS
AND CREATED A TIME SAVINGS FOR ORGANIZING THESE
TRANSCRIPTS AS WELL. I HOPE THIS CONTINUES
TO SUPPORT THE WORK AND, MOST IMPORTANTLY, HONOR
THE RELATIONSHIP WE HAVE WITH TRIBES.
I WANT TO THANK THE COLAB FOR THIS OPPORTUNITY
AS WELL AS THE STAFF YOU ARE SEEING, SOME OF
WHOM HAVE BEEN DEEPLY AFFECTED BY THIS AND HAVE
LOST LOVED ONES, AND AGAIN THE TRIBAL NATIONS
THAT HAVE PARTICIPATED IN THIS.
I THINK TARA SWEENEY, THE ASSISTANT SECRETARY
FOR INDIAN AFFAIRS, SAID IT BEST: THE CONSULTATIONS
ARE IMPORTANT BECAUSE THEY'RE A VOICE FOR
THOSE WHO CANNOT SPEAK. I REALLY THANK
YOU FOR TODAY AS WE CONTINUE TO WORK ON THIS,
AND THE TRIBAL NATIONS FOR THEIR RESILIENCY
AND STRENGTH THROUGHOUT THIS CRISIS.
THANK YOU, EVERYONE.
>> THANK YOU, CHRISTIE.
OUR NEXT LIGHTNING PRESENTER WILL BE ZHONG
QI ON BUILDING OPERATIONS AND MAINTENANCE.
>> HI, EVERYONE.
I'M A DATABASE MANAGER FROM NIH RESEARCH
FACILITIES.
I WANT TO GIVE MY APPRECIATION TO HHS
FOR PROVIDING US WITH TREMENDOUS OPPORTUNITIES
AND TECHNOLOGY.
YEAH, I'M SO GRATEFUL FOR THE EIGHT WEEKS,
NOT ONLY FOR THE R LANGUAGE PROGRAMMING BUT ALSO
FOR HELPING ENTRY LEVEL DATA SCIENTISTS INCORPORATE
AND COMBINE THE VARIOUS DATASETS WE'RE DEALING
WITH EVERY DAY IN ORDER TO PROVIDE MORE INSIGHT
AND REASONABLE, INTELLIGENT MANAGEMENT.
SO TODAY I'D LIKE TO GIVE A BRIEF LAYOUT OF
MY PROJECT AND HOW TO USE EXISTING BUILDING
DATA TO PREDICT THE OVERALL OPERATION COSTS.
WE HAVE A QUICK BACKGROUND.
YOU CAN SEE THE NUMBERS AND HOW MANY ACTIVE.
THIS IS REAL TIME DATA 24/7.
DURING THE LAST 20 YEARS, WE HAVE ACCUMULATED
NUMEROUS DATA RECORDS.
YOU CAN SEE MAINTENANCE, PREVENTIVE MAINTENANCE
AND SERVICE ORDERS WITH EACH BUILDING'S OPERATIONS.
HOW YOU USE THOSE DATA IS ALWAYS A BIG CHALLENGE
FOR US.
WE ALL KNOW THAT WE CAN EASILY GENERATE A
REPORT OR DASHBOARD TO SHOW HOW WELL OR HOW
BADLY OUR TEAM IS DOING ON THEIR JOBS,
OR WE CAN EVEN SHOW YOU HOW MANY HAPPENED
IN EACH BUILDING.
THE REAL DATA ANALYSIS IS DONE ON THE REPORTING
LEVEL.
THEY CAN LINK THE DATA FROM ALL SOURCES
AND COME BACK WITH DEEP DATA ANALYSIS.
THEY AIM TO UNVEIL ANY POSSIBLE CORRELATION
AMONG THOSE DATA ELEMENTS.
THEY WANT TO SPECIFY A REASONABLE CLASSIFICATION
OR, FURTHERMORE, THEY WANT TO PREDICT DATA
TRENDS THAT WE HAVE NEVER SEEN BEFORE.
THIS IS OUR PURPOSE.
SO, DURING THE COLAB CLASS, I DECIDED TO GIVE
IT A TRY.
WE KNOW THAT EACH FISCAL YEAR, OIF, OUR OFFICE,
HAS TO REPORT THOSE OPERATIONS AND MAINTENANCE
COSTS AND THE CONDITION ASSESSMENT TO
CONGRESS. TRADITIONALLY, THEY JUST BUILD
A TEAM AND SIT DOWN TO COLLECT ALL THE
DATA MANUALLY IN ALL KINDS OF EXCEL WORKBOOKS,
INCLUDING THE LABOR COST AND CONTRACT COST
AND MANY DIFFERENT COSTS, BLAH, BLAH, BLAH.
BUT I'M PROPOSING HERE TO USE MACHINE LEARNING,
TAKING EXISTING YEARLY DATA PLUS PREVIOUS FISCAL
YEAR COSTS, TO PREDICT FY19 NUMBERS.
SO WE USE THREE TECHNICAL STEPS FROM THE CLASSROOM.
THE FIRST ONE IS THE DATA PREPARATION,
THEN WE DO THE DATA VISUALIZATION AND FINALLY
WE HAVE THE DATA MODELING.
NEXT SLIDE, PLEASE.
FIRST STEP, WE SORT OUR DATA.
SO WE ADOPT FY12 TO FY19 DATA.
WE SPECIFY BUILDING USE TYPES AS WELL AS THE
BUILDING AGE, SIZE, AND THE SIZE OF THE DATA.
SO WE CONNECT ALL THE DATA WE HAVE AND THE
BASIC DATA.
AFTER THAT WE SPEND A LOT OF TIME TO CLEAN
UP AND CREATE A FINAL DATASET, A
TOTAL OF 384 OBSERVATIONS FOR DATA MODELING.
THE TARGET VARIABLE IS O&M COST AND ALTOGETHER
WE USE 14 PREDICTOR VARIABLES.
NEXT SLIDE, PLEASE.
SO WE USE ONE LINE OF CODE TO SHOW THE CORRELATION
MATRIX AMONG ALL VARIABLES.
SO YOU CAN SEE HERE WE HAVE THE BLUE
COLORS, WHERE THE DARKER THE BETTER, AND 0.8
IS SEEN AS A HIGH COEFFICIENT.
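TO MAKE THE IDEA CONCRETE, HERE IS A HEDGED
PYTHON SKETCH OF A PEARSON CORRELATION MATRIX
USING ONLY THE STANDARD LIBRARY. IT IS NOT THE
ONE-LINE CALL THE PRESENTER USED, AND THE COLUMN
NAMES AND NUMBERS ARE INVENTED FOR ILLUSTRATION.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented building data; the column names are hypothetical.
data = {
    "size_sqft": [10.0, 20.0, 30.0, 40.0],
    "om_cost": [12.0, 19.0, 33.0, 41.0],
    "age_years": [40.0, 30.0, 20.0, 10.0],
}
matrix = correlation_matrix(data)
```

WITH A DATA FRAME LIBRARY SUCH AS PANDAS THE
SAME MATRIX COMES FROM A SINGLE `DataFrame.corr()`
CALL, WHICH IS CLOSER TO THE ONE-LINER DESCRIBED.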
YOU CAN SEE THE GROWTH AND THE CENSUS DATA
AMONG THE HIGH AND THE SERVICE ORDER LABORERS
ARE LONGER THAN THE TOPICAL DATA.
SO YOU ALSO CAN SEE SOME KIND OF LAB DUTY
SHOWS SOME COEFFICIENT SO IT'S VERY INTERESTING
DATA.
WE SPECIFY THOSE VARIABLES AND FEED THEM TO
OUR TWO DATA MODELS.
ONE IS RANDOM FOREST, FOR ITS EFFICIENCY, AND
THE OTHER ONE IS -- WE LEARNED THESE, SO WE USE
PART OF OUR DATA TO TRAIN EACH DATA MODEL AND USE
THE OTHER 30% AS TEST DATA TO ENSURE
THE RESULTS ARE ACCEPTABLE.
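THE HOLD-OUT STEP CAN BE SKETCHED IN PYTHON AS
BELOW. THIS IS ONLY AN ILLUSTRATION: THE 384 ROWS
ARE SYNTHETIC, AND A SIMPLE COST-PER-SIZE RATIO
STANDS IN FOR THE MODELS FROM THE TALK, WHICH ARE
NOT REPRODUCED HERE. ALL NAMES ARE ASSUMPTIONS.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def mean_absolute_percentage_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over all pairs."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic (size, cost) rows; 384 matches the observation count in the talk.
rows = [(float(size), 2.0 * size + 5.0) for size in range(1, 385)]
train, test = train_test_split(rows)

# Stand-in "model": one cost-per-size ratio learned from the training rows only.
ratio = sum(cost for _, cost in train) / sum(size for size, _ in train)
predictions = [ratio * size for size, _ in test]
error = mean_absolute_percentage_error([cost for _, cost in test], predictions)
```

THE POINT OF THE SPLIT IS THAT `error` IS MEASURED
ONLY ON ROWS THE MODEL NEVER SAW DURING TRAINING.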
ACTUALLY THE RESULTS ARE VERY IMPRESSIVE.
SO FINALLY WE APPLY BOTH DATA MODELS TO THE FY19
PREDICTOR VARIABLES FOR EACH.
WE SHOULD COMPARE WITH THE REAL FY19 COSTS, BUT
THOSE NUMBERS ARE NOT AVAILABLE YET,
UNFORTUNATELY. BUT WE CAN SEE BOTH HISTOGRAMS
SIDE BY SIDE AND THEY ARE VERY SIMILAR, AND
THE VARIATION BETWEEN THE TWO PREDICTIONS IS
ONLY 5%.
NEXT SLIDE.
OK.
THAT IS WHAT WE HAVE FOR OUR PROJECT, BUT THIS
PROJECT IS ONLY A START FOR OUR DATA SCIENCE.
WE HAVE MANY, MANY SOURCES DOWN THE ROAD.
WE CAN DO THE TIME SEQUENCE AND WE CAN
GO DEEPER TO ANALYZE THE MECHANICAL SYSTEMS
WE SEE IN EACH BUILDING TO SEE HOW GOOD OR
HOW BAD THE PERFORMERS ARE.
FOR THE FUTURE, WE WOULD LOVE TO INITIATE AND
FORM A DATA SCIENTIST PANEL, GATHERING PEOPLE
WITH DIFFERENT MAJORS AROUND THE DATA.
THAT'S OUR GOAL.
THAT IS, WE BUILD UP A TEAM TO LEARN MORE ABOUT
DATA SCIENCE AND HOW TO USE IT IN OUR
DAY-TO-DAY BUSINESS OPERATIONS.
THE FINAL GOAL IS TO BRING MACHINE LEARNING INTO
THE DECISION-MAKING PROCESS AS PIONEER PLAYERS.
THANK YOU THAT'S ALL MY DEMO TODAY.
>> THANK YOU.
OUR NEXT PRESENTER WILL BE CLAIRE JI AND SHE
WILL SHARE HER INSIGHTS ON HOW TO PREDICT A BROKEN
HEART.
>> HI, EVERYONE.
MY NAME IS CLAIRE JI AND I WORK AT FDA.
TODAY I'M PRESENTING PREDICTING A BROKEN HEART:
USING MACHINE LEARNING TO DETECT CARDIOVASCULAR
RISK OF PHARMACEUTICALS.
NEXT.
I WORK AT THE FDA CENTER FOR DRUG EVALUATION AND
RESEARCH.
MY TEAM IS CALLED THE QT INTERDISCIPLINARY REVIEW
TEAM.
OUR JOB IS TO REVIEW SUBMISSIONS FROM PHARMACEUTICAL
COMPANIES AND ASSESS HOW LIKELY A DRUG IS TO
CAUSE CARDIAC ARRHYTHMIA.
IF A DRUG HAS A HIGH RISK OF ARRHYTHMIA, IT
MIGHT NOT BE SAFE TO BE USED IN PATIENTS AND
WE MIGHT NOT APPROVE THE DRUG, OR WE MIGHT ADD
WARNINGS IN THE LABELS.
TORSADE DE POINTES IS A POTENTIALLY FATAL
ARRHYTHMIA AND IT CAN BE CAUSED BY MANY DRUGS.
IT IS ASSOCIATED WITH ECG SIGNALS.
WE ATTACH ELECTRODES TO A PERSON'S BODY AND RECORD
THE SIGNALS TO ASSESS HOW THE PATIENT'S HEART
IS FUNCTIONING.
NEXT.
SO TRADITIONALLY WE USE THE QT INTERVAL ON THE
ECG SIGNAL TO ASSESS THE RISK. ON THE
BOTTOM LEFT, YOU SEE A NORMAL ECG SIGNAL AND
IT'S COMPOSED OF ARRYTHMICS.
NEXT.
NEXT.
THE ORANGE SIGN IS SHOWING THE QT INTERVAL.
IF A DRUG PROLONGS THE QT INTERVAL, IT IS ASSOCIATED
WITH A HIGH RISK. HOWEVER, THIS IS NOT NECESSARILY SO.
THERE ARE A LOT OF DRUGS THAT DO PROLONG THE QT
INTERVAL BUT HAVE LOW RISK, SO IF WE ONLY USE THE
QT INTERVAL ON THE ECG SIGNAL, WE WILL THROW AWAY
A LOT OF GOOD DRUGS.
>> THIS IS A PROBLEM WE WANT TO SOLVE.
>> SO THE GOAL FOR THIS PROJECT IS TO FIND
A BETTER WAY OF PREDICTING TORSADE DE POINTES
AND WE FOUND A WAY.
>> NEXT.
THAT'S HOW THE COMPREHENSIVE IN VITRO PROARRHYTHMIA
ASSAY, CIPA, WAS DEVELOPED. IT'S A NEW REGULATORY
PARADIGM THAT USES ELECTRICAL SIGNALS TO PREDICT
RISK. THIS IS THE NORMAL ECG SIGNAL AND ON THE
BOTTOM LEFT IS THE ELECTRICAL SIGNAL FROM
THE CELL LEVEL.
AND THE SIGNAL FROM THE CELL LEVEL, NEXT,
LOOKS LIKE THIS.
NEXT.
SO TRADITIONALLY WE WERE TRYING TO USE THE
QT INTERVAL TO PREDICT THE RISK. NOW WE'RE TRYING
TO USE CHARACTERISTICS OF THE ELECTRICAL SIGNAL
ON THE CELL LEVEL TO PREDICT IT, NEXT.
AND THESE COMBINATIONS OF THE CHARACTERISTICS
OF THE SIGNAL ON THE CELL LEVEL ARE CALLED METRICS.
NEXT.
NEXT.
THIS SLIDE SHOWS THE CIPA WORKFLOW.
FIRST WE DO EXPERIMENTS WITH THE DRUGS AND
RECORD THE ELECTRICAL SIGNALS FROM THE CELL.
WE HAVE A TOTAL OF 28 DRUGS THAT ARE
KNOWN TO HAVE RISK LEVELS AND WE SEPARATED
THEM INTO TWO SETS:
12 DRUGS IN THE TRAINING SET AND 16 DRUGS
IN THE VALIDATION SET.
THIS IS USED TO DEVELOP A MODEL AND FOR PROTOTYPING.
NEXT.
NEXT.
THEN WE USE THE DATA FROM THE EXPERIMENT TO
BUILD A MATHEMATICAL MODEL FOR CHARACTERIZATION,
AND AFTER BUILDING THE MATHEMATICAL MODEL
WE RUN THE SIMULATION TO GENERATE THE METRICS.
NEXT.
AFTER OBTAINING THESE METRICS FROM THE COMPUTER
SIMULATION, WE INPUT THEM INTO A MACHINE LEARNING
MODEL, AND THE OUTPUT OF THE MACHINE LEARNING MODEL
IS THE PREDICTION OF THE RISK OF THE DRUGS.
IN THIS PRESENTATION, I'M FOCUSING ON THIS PART.
SO THE GOAL HERE IS TO SEE WHETHER THE MACHINE
LEARNING MODEL CAN PREDICT THE RISK. TO TACKLE THIS
PROBLEM FROM A DATA SCIENCE PERSPECTIVE, WE
FIRST CLEAN THE DATASET AND NEXT, WE TRAIN
THE CLASSIFICATION MODEL. WE USE THE DATA
TO TRAIN THE MODEL TO LEARN WHETHER DRUGS HAVE
LOW OR NOT LOW RISK LEVELS BASED ON THE
TRAINING DATASET. I USED FOUR DIFFERENT MACHINE
LEARNING ALGORITHMS HERE, AMONG
THEM REGRESSION AND RANDOM FOREST.
AND AFTER BUILDING THIS CLASSIFICATION MODEL,
I APPLY THE MODEL ON THE VALIDATION SET, WHICH
IS COMPOSED OF 16 DRUGS, AND COMPARE THE PREDICTED
RISK AND KNOWN RISK TO ASSESS THE PERFORMANCE OF THE
MODEL.
NEXT.
THIS IS WHAT THE TRAINING DATASET LOOKS LIKE.
IT HAS A RISK LABEL AND IT HAS 12 PREDICTORS.
IN THE TRAINING SET I HAVE INCLUDED 12 DRUGS,
FOUR OF THEM HAVE LOW RISK LEVEL AND EIGHT
HAVE NOT LOW RISK LEVEL.
AND FOR EACH DRUG, I HAVE FOUR DOSAGES AND
EACH DOSAGE HAS 2,000 SAMPLES AND YOU CAN
THINK OF THE 2,000 SAMPLES AS 2,000 DIFFERENT
STEPS.
AND IN THE VALIDATION SET, I HAVE 16 DRUGS;
FIVE HAVE LOW RISK AND 11 OF THEM HAVE
NOT LOW RISK.
THIS SHOWS THE RESULTS OF THE CLASSIFICATION
MODEL ON THE 16 VALIDATION DRUGS.
SO THIS FIGURE IS COMPOSED OF 16 PANELS.
EACH PANEL HAS ONE DRUG FROM THE VALIDATION SET.
SO, ON EACH PANEL, X AXIS IS DOSAGE RANGING
FROM 1-4.
ON THE Y AXIS IT'S THE COUNT OF NUMBER OF
SAMPLES FROM 0 TO 2,000.
NEXT, NEXT.
SO EACH PANEL REPRESENTS ONE DRUG. ON THE TOP
OF THE PANEL, IT SHOWS THE NAME OF THE DRUG
AND THE COLOR OF THAT INDICATES THE KNOWN RISK
OF THE DRUG. IF THE DRUG IS A LOW RISK
DRUG, THEN IT'S GREEN AND IF IT'S A HIGH-RISK
DRUG THE COLOR IS RED.
AND EACH BAR INDICATES THE PREDICTION RESULTS.
THE GREEN PART OF THE BAR IS HOW MANY WERE
PREDICTED AS A LOW RISK DRUG AND THE RED PART
INDICATES HOW MANY WERE PREDICTED AS NOT A LOW
RISK DRUG.
THIS LINE SEPARATES HIGH-RISK DRUGS AND LOW
RISK DRUGS. ABOVE THIS
LINE ARE THE HIGH-RISK DRUGS AND BELOW ARE
THE LOW RISK DRUGS. IF THE MODEL WORKS WELL,
YOU WILL SEE MORE RED ON THE TOP AND GREEN
ON THE BOTTOM,
WHICH IS EXACTLY WHAT YOU SEE HERE.
NEXT.
THIS SHOWS YOU THE PROPOSALS OF THE MODEL
AND THIS IS CLOSE TO ONE WHICH MEANS THIS
MODEL IS DOING PRETTY WELL, NEXT.
>> THESE ARE THE SUMMARY STATISTICS OF ALL THE
MODELS I USED.
YOU CAN SEE THAT THEY ARE ALL DOING PRETTY WELL.
IT'S NOT -- SO THE BEST ONE IS THE K-NEAREST NEIGHBOR
(KNN) MODEL WITH ONLY ONE METRIC, WHICH IS THE CHARGE
CARRIED BY A FEW SELECTED CURRENTS. THIS MODEL
PERFORMED THE BEST, WHICH INDICATES THIS METRIC IS
A GOOD PREDICTOR FOR IT.
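AS A ROUGH ILLUSTRATION OF A K-NEAREST NEIGHBOR
CLASSIFIER ON A SINGLE METRIC, HERE IS A SMALL
PYTHON SKETCH. THE METRIC VALUES, THE LABELS, THE
CHOICE OF K=3 AND THE FUNCTION NAME `knn_predict`
ARE ALL MADE UP; THIS IS NOT THE PRESENTER'S MODEL
OR DATA.

```python
from collections import Counter


def knn_predict(train, x, k=3):
    """Classify x by majority vote among the k training points closest to it.

    train is a list of (metric_value, label) pairs with a single numeric metric.
    """
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented one-metric training data: (metric value, known risk label).
train = [
    (0.02, "not_low"), (0.03, "not_low"), (0.04, "not_low"), (0.05, "not_low"),
    (0.08, "low"), (0.09, "low"), (0.10, "low"),
]

prediction_a = knn_predict(train, 0.095)
prediction_b = knn_predict(train, 0.025)
```

WITH ONLY ONE METRIC, "CLOSEST" SIMPLY MEANS THE
SMALLEST ABSOLUTE DIFFERENCE IN THAT METRIC.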
IN CONCLUSION, I BUILT SEVERAL CLASSIFICATION
MODELS WITH DIFFERENT METRICS USING FOUR ALGORITHMS
AND COMPARED THE MODEL PERFORMANCE ON
PREDICTING RISK LEVELS OF THE VALIDATION DRUGS.
THIS HAS AN IMPACT ON THE CIPA PARADIGM:
WE'RE CONTINUOUSLY DEVELOPING GUIDELINES USING
IN VITRO DATA TO ASSESS THE RISK. THIS WORK
EXTENDED THE SCOPE OF THE CLASSIFICATION METHODS
THAT CAN BE USED TO PREDICT DRUG RISK LEVELS, INDICATED
THE IMPORTANCE OF CERTAIN METRICS AND SHED LIGHT
ON THE MECHANISM.
THESE ARE ALL CRUCIAL IN DEVELOPING GUIDELINES.
NEXT.
FINALLY I WANT TO THANK THE PEOPLE IN MY TEAM
WHO SUPPLIED THE DATA AND THE INSTRUCTORS AND
ORGANIZERS OF COLAB.
>> THANK YOU, CLAIRE.
NEXT UP WE HAVE RYAN LAIRD WHO WILL TELL YOU HOW
TO USE MACHINE LEARNING TO FIND CLINICAL
PATTERNS OF AUTOINFLAMMATORY DISEASE.
>> HI, EVERYBODY, THANK YOU FOR VIRTUALLY
BEING HERE TODAY.
MY NAME IS BRIAN LAIRD.
I'M A POSTBACCALAUREATE FELLOW WITH THE NATIONAL
-- I WORK UNDER DR. DAN KASTNER.
TODAY I'LL BE GOING OVER MY ATTEMPT TO USE
UNSUPERVISED MACHINE LEARNING TO FIND CLINICAL
PATTERNS OF AUTOINFLAMMATORY DISEASE.
SO SOME QUICK BACKGROUND ON OUR RESEARCH.
WE STUDY A GROUP OF RARE DISEASES CALLED
AUTOINFLAMMATORY DISEASES. YOU CAN THINK OF THESE
ESSENTIALLY AS A DYSREGULATION OF THE INNATE
IMMUNE SYSTEM, SO IT LEADS TO CLINICAL FEATURES
SUCH AS FEVER, ARTHRITIS, STERILE OR NON-INFECTIOUS
SKIN LESIONS, ET CETERA.
SO OURS IS A NATURAL HISTORY STUDY GOING ON FOR
25 YEARS NOW AND TO DATE, WE HAVE OVER 2,000
PATIENTS THAT ARE ENROLLED IN THE STUDY OR
TESTED BY ONE OF OUR CLINICIANS AT THE NIH
CLINICAL CENTER.
SO IT'S A VERY DIVERSE GROUP, A DIVERSE COHORT
WITH A LARGE PERCENTAGE OF THE PATIENTS WE
SEE REMAINING UNDIAGNOSED OR UNDIFFERENTIATED.
WE JUST KNOW THEY HAVE SOME SORT OF AUTO INFLAMMATORY
DISEASE AND WE'RE LOOKING FOR A CAUSE.
THEY'RE VERY RARE CASES.
SOMETIMES ONLY DOZENS OF CASES IF THAT KNOWN
WORLDWIDE.
SO MY GOAL WAS TO FIND A WAY TO BETTER STRATIFY
OR CLUSTER THESE PATIENTS BY HOW THEIR
DISEASE PRESENTS.
TO SEE IF WE CAN FIND ANY NEW INSIGHTS AND
MAYBE AID DOWNSTREAM ANALYSIS OR IN THE FUTURE,
DO DATA-DRIVEN MEDICAL DECISION-MAKING BASED
ON HOW THEY PRESENT.
BUT THE CHALLENGE IS A LARGE AMOUNT OF OUR
DATA, IF NOT MOST OF OUR CLINICAL DATA,
IS UNSTRUCTURED. IT'S IN FREE-TEXT ELECTRONIC
HEALTH RECORDS THAT HAVE BEEN WRITTEN BY CLINICIANS.
WE DON'T KNOW EXACTLY WHAT WE'RE LOOKING FOR.
THERE'S NO OUTCOME VARIABLE TO TRAIN OR TARGET
FOR.
WE'RE JUST LOOKING FOR SIMILAR GROUPS OF PATIENTS
WITH SIMILAR DISEASE.
AND LASTLY WHEN IT COMES TO STRATIFYING PATIENTS,
WE HAVE A CURATED COHORT SO WE'RE NOT
LOOK -- I WANT TO GET CLINICAL DATA OUT OF
OUR EHR INTO A WORKABLE FORMAT AND USE THAT
DATA TO CLUSTER PATIENTS WITH UNSUPERVISED
MACHINE LEARNING TECHNIQUES.
SO SOME QUICK BACKGROUND ABOUT THE CLINICAL
DATA I'M USING. THERE'S A BIOMEDICAL KNOWLEDGE
RESOURCE CALLED THE HUMAN PHENOTYPE ONTOLOGY (HPO),
WHICH IS A STRUCTURED VOCABULARY FOR DESCRIBING AN
ABNORMAL PHENOTYPE OR ABNORMAL PRESENTATION
OF DISEASE OR CLINICAL FINDINGS, AND IT'S THE
STANDARD IN ANALYSIS. YOU CAN SEE IN THE
GRAPH, JUST FOR EXAMPLE, YOU CAN SAY SHORT
STATURE IS AN ABNORMALITY OF BODY HEIGHT, WHICH
IS A GROWTH ABNORMALITY.
IT'S A WAY TO REPRESENT HOW WE THINK ABOUT
DISEASE AND THAT'S WHAT I WANT TO GET FROM
OUR RECORDS, NEXT, PLEASE.
HOW CAN WE GET THESE FROM OUR RECORDS?
I COULD PLEAD WITH OUR CLINICIAN TO MANUALLY
REVIEW OUR PATIENT CHARTS AND EXTRACT THESE
TERMS.
THAT WOULD BE TEDIOUS.
WE HAVE OVER 20,000 CLINICALLY RELEVANT NOTES
AND OUR CLINICIANS ARE BUSY PROVIDING CARE
TO PATIENTS AND THEY DON'T HAVE TIME TO GO
THROUGH ALL OF THESE NOTES AND CONTRACTING
OUT WOULD ALSO BE EXPENSIVE.
TO HAVE SOMEONE ELSE GO THROUGH ALL OF THESE
FOR US.
SO I TURNED MY FOCUS TO AUTOMATED TOOLS, MANY
OF WHICH HAVE BEEN PUBLISHED IN THE PAST DECADE
AND MANY IN THE PAST YEAR.
THE ONE I WILL FOCUS ON TODAY IS A TOOL CALLED
CLINPHEN.
THE OVERALL PROCESS IS FIRST, I DOWNLOAD
ALL OF OUR NOTES FROM A DATA WAREHOUSE WE
HAVE AT THE NIH CALLED BTRIS AND IT'S A
WAY TO GET ALL NOTES FOR ALL OF OUR PATIENTS
ON OUR PROTOCOL.
THIS DATA IS MESSY AND SO YOU HAVE TO FILTER
IT DOWN, BASIC FILTERING TO ONLY KEEP CLINICALLY
RELEVANT NOTES,
GETTING RID OF DOCUMENTATION OF CONSENT,
TO REMOVE CLUTTER. THEN WE CAN TAKE THE
NOTES AND APPLY THE TOOL. NEXT PLEASE.
SO, IT WAS PUBLISHED LATE IN 2019 OUT OF STANFORD
AND IT'S A MORE TRADITIONAL NATURAL LANGUAGE
PROCESSING TOOL WHERE YOU HAVE A PARAGRAPH
OF TEXT, YOU BREAK IT INTO SENTENCES AND THEN
YOU MATCH PHRASES IN THE SENTENCES TO A DICTIONARY
LOOKING FOR TERMS.
I CHOSE THIS ONE, DESPITE IT BEING MORE BASIC
THAN OTHER TOOLS, BECAUSE IT DOES DO A GOOD
JOB AT TAKING SENTENCE CONTEXT INTO CONSIDERATION.
SO YOU SEE IN THE MIDDLE, WHERE IT SAYS RENAL
DISEASE IN BLUE, THAT'S A TRUE NEGATIVE.
IN THE OUTPUT, IT DIDN'T CAPTURE THE
PATIENT AS HAVING RENAL DISEASE BECAUSE THE NOTE
SAYS NO FURTHER OCCURRENCE OF RENAL DISEASE.
I WAS MORE CONCERNED ABOUT HAVING A LOT OF
FALSE POSITIVES IN THE DOWNSTREAM DATA
THAN MISSING OTHER TERMS, BECAUSE THAT WOULD
BE HARDER TO INTERPRET.
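THE GENERAL IDEA OF SENTENCE SPLITTING, DICTIONARY
MATCHING AND SKIPPING NEGATED MENTIONS CAN BE
SKETCHED IN PYTHON AS BELOW. THIS IS A TOY
ILLUSTRATION AND NOT HOW THE TOOL FROM THE TALK IS
IMPLEMENTED: THE DICTIONARY, THE PLACEHOLDER TERM
IDS AND THE NEGATION CUES ARE ALL ASSUMPTIONS.

```python
import re

# Hypothetical mini-dictionary; the IDs are placeholders, not real HPO codes.
TERM_DICTIONARY = {
    "renal disease": "TERM:0001",
    "fever": "TERM:0002",
    "arthritis": "TERM:0003",
}

# Crude negation cues (an assumption); real tools use richer context rules.
NEGATION = re.compile(r"\b(no|denies|without|negative for)\b")


def extract_terms(note):
    """Return IDs of dictionary phrases mentioned without a preceding negation cue."""
    found = set()
    # Naive sentence split: break on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", note.lower()):
        for phrase, term_id in TERM_DICTIONARY.items():
            position = sentence.find(phrase)
            if position == -1:
                continue
            # Skip the mention if a negation cue appears earlier in the sentence.
            if NEGATION.search(sentence[:position]):
                continue
            found.add(term_id)
    return found


note = "Patient presents with fever and arthritis. No further occurrence of renal disease."
terms = extract_terms(note)
```

DROPPING EVERY MENTION THAT FOLLOWS A NEGATION CUE
TRADES SOME MISSED TERMS FOR FEWER FALSE POSITIVES,
WHICH IS THE PREFERENCE DESCRIBED ABOVE.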
THIS WOULDN'T BE NATURAL LANGUAGE PROCESSING
WITHOUT THE OBLIGATORY WORD CLOUD, SO THIS
IS WHAT THE OUTPUT LOOKS LIKE FOR OUR COHORT
AND ALL IN ALL IT'S A GOOD REPRESENTATION
AS A WHOLE. YOU CAN SEE QUALITY CONTROL IN
THE BOTTOM LEFT AND I HAVE TO DEAL WITH THAT
LATER.
I WAS HAPPY WITH THE INITIAL OUTPUT.
NOW THAT I HAVE THIS DATA FOR OUR WHOLE COHORT,
HOW CAN I ACTUALLY USE IT TO CLUSTER PATIENTS?
WELL, RATHER THAN STARTING WITH EVERYBODY,
ALL CLOSE TO 2,000 PATIENTS, I WANTED TO TAKE
A STEP BACK AND DO A PROOF OF CONCEPT
TO SEE IF I CAN MAKE SENSE OF IT.
I STARTED WITH A COHORT OF PATIENTS WE SEE
IN OUR CLINIC WITH THE SAME DISEASE,
DEFICIENCY OF ADA2 (DADA2).
WE HAVE A 54-PATIENT COHORT AND I STARTED
WITH THIS FOR BOTH PRAGMATIC AND CLINICAL
REASONS.
PRAGMATICALLY, WE DISCOVERED THIS DISEASE
BACK IN 2013-2014 SO WE HAVE A VERY GOOD UNDERSTANDING
OF IT, AND IT'S A VERY NUANCED AND COMPLEX
DISEASE, SO OUR CLINICIANS MENTALLY CLUSTER
PATIENTS INTO THESE THREE CATEGORIES.
SO I KNEW I HAD SOMETHING TO LOOK BACK ON:
CAN I CAPTURE OUTPUT SIMILAR TO THIS
USING THESE HPO TERMS? CLINICALLY, I WAS MOTIVATED
BY THE FACT THAT, UNFORTUNATELY, A DEFINING
FEATURE OF THIS DISEASE IS EARLY ONSET STROKE.
AROUND HALF OF THE PATIENTS IN OUR COHORT
HAVE HAD AT LEAST ONE EVENT, WITH THE MEDIAN
ONSET BEING FIVE YEARS OLD, AND UNFORTUNATELY,
IT DOESN'T COME UP ON THE PATIENT'S DIFFERENTIAL
UNTIL THERE'S A TRAUMATIC STROKE. IF WE CAN
FIND A BETTER WAY TO CATCH THE DISEASE BY FINDING
THESE PATTERNS, THAT COULD BE INCREDIBLY USEFUL
IN BETTER TREATING THE PATIENTS.
I TAKE THE HPO TERMS OUTPUT AND TRANSFORM
IT INTO THIS FORMAT. EACH ROW REPRESENTS
A PATIENT AND THEN THERE'S A COLUMN FOR EVERY
SINGLE HPO TERM THAT WAS FOUND WITHIN THIS
CORPUS, AND IT'S A BINARY ONE OR ZERO FOR WHETHER
IT SHOWED UP SOMEWHERE IN THE PATIENT'S RECORD.
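A PATIENT-BY-TERM BINARY TABLE OF THAT SHAPE CAN
BE BUILT WITH A FEW LINES OF PYTHON. THIS IS A
SKETCH UNDER ASSUMPTIONS: THE PATIENT IDS, THE
TERM NAMES AND THE FUNCTION NAME
`build_binary_matrix` ARE INVENTED FOR ILLUSTRATION.

```python
def build_binary_matrix(terms_by_patient):
    """Return (patients, columns, matrix) with one row per patient.

    matrix[i][j] is 1 if patient i has term j anywhere in their record, else 0.
    """
    # One column for every term seen anywhere in the corpus, in sorted order.
    columns = sorted({term for terms in terms_by_patient.values() for term in terms})
    patients = sorted(terms_by_patient)
    matrix = [
        [1 if term in terms_by_patient[patient] else 0 for term in columns]
        for patient in patients
    ]
    return patients, columns, matrix


# Invented extraction output: patient id -> set of terms found in their notes.
sample = {
    "patient_a": {"fever", "stroke"},
    "patient_b": {"fever"},
    "patient_c": {"rash"},
}
patients, columns, matrix = build_binary_matrix(sample)
```

EACH ROW CAN THEN BE COMPARED WITH ANY OTHER ROW
BY A SIMILARITY OR CLUSTERING METHOD.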
I TOOK THIS DATA AND APPLIED TWO DIFFERENT
CLUSTERING METHODS OR PIPELINES TO THAT.
I DON'T HAVE TIME TO GO INTO TOO MUCH DETAIL
BUT FROM A HIGH LEVEL, THE ONTOLOGY SIMILARITY IS
A MORE CONVENTIONAL APPROACH, POPULAR SINCE THE
90S, WHERE YOU USE THE GRAPH STRUCTURE THAT I SHOWED
YOU EARLIER AND CALCULATE TERM SIMILARITY, AND FROM
THERE YOU CAN EXTRAPOLATE TO PATIENT SIMILARITY. FOR
THOSE FAMILIAR WITH WORD EMBEDDING MODELS,
THE SECOND METHOD IS AN ADAPTATION OF THAT WHICH WORKS
ON THE GRAPH STRUCTURE OF THE HPO, AND IT ALLOWS
YOU TO ENRICH THE HPO WITH OTHER BIOMEDICAL
ANNOTATION SOURCES
TO GIVE YOU MORE REAL WORLD UNDERSTANDING
OF WHAT THAT TERM MEANS OR WHAT IT REPRESENTS.
SO I WAS SURPRISED WITH MY INITIAL RESULT.
YOU ARE LOOKING AT A NETWORK WHERE EACH NODE
IS AN INDIVIDUAL, ONE OF THE 54, AND I DREW A
LINK BETWEEN PATIENTS
IF THEIR SIMILARITY SCORE FROM THE ONTOLOGY SIMILARITY
METRICS WAS GREATER THAN THE COHORT MEDIAN,
TO CAPTURE RELATIVE DIFFERENCE.
AS I MENTIONED EARLIER, EVERYBODY HAS A HIGH
DEGREE OF SIMILARITY ALREADY.
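THE MEDIAN CUTOFF CAN BE ILLUSTRATED WITH THE
PYTHON SKETCH BELOW. NOTE THE SUBSTITUTION: A
SIMPLE JACCARD OVERLAP OF TERM SETS STANDS IN FOR
THE ONTOLOGY SIMILARITY SCORE FROM THE TALK, AND
THE PATIENT IDS AND TERMS ARE INVENTED. COMMUNITY
DETECTION ON THE RESULTING LINKS IS NOT SHOWN.

```python
from itertools import combinations
from statistics import median


def jaccard(a, b):
    """Share of terms two patients have in common (stand-in similarity score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_network(terms_by_patient):
    """Link each pair of patients whose similarity exceeds the cohort median."""
    scores = {
        (p, q): jaccard(terms_by_patient[p], terms_by_patient[q])
        for p, q in combinations(sorted(terms_by_patient), 2)
    }
    cutoff = median(scores.values())
    edges = [pair for pair, score in scores.items() if score > cutoff]
    return edges, cutoff


# Invented cohort: patient id -> set of terms.
sample = {
    "p1": {"fever", "rash", "stroke"},
    "p2": {"fever", "rash"},
    "p3": {"fever", "arthritis"},
    "p4": {"fever", "arthritis", "anemia"},
}
edges, cutoff = build_network(sample)
```

USING THE MEDIAN RATHER THAN A FIXED THRESHOLD
KEEPS ONLY THE RELATIVELY STRONGER LINKS, WHICH
MATTERS WHEN EVERY PAIR IS ALREADY FAIRLY SIMILAR.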
I USED THE LEIDEN ALGORITHM TO DO COMMUNITY DETECTION
AND USED THAT AS THE CLUSTERS. IN ORDER TO EVALUATE
THE CLUSTERS, I INITIALLY STARTED WITH A MORE BASIC
APPROACH, LOOKING AT WHAT TERMS ARE MORE FREQUENT IN
ONE PATIENT CLUSTER THAN ANOTHER, AND
I WAS SURPRISED TO SEE THE QUALITATIVE
CATEGORIES I GOT FROM REVIEWING THE FREQUENT
TERMS IN EACH CLUSTER. THOSE LARGELY REPRESENT
THE WAY OUR CLINICIANS ALREADY THINK ABOUT
THESE PATIENTS MENTALLY, IF YOU REMEMBER THAT
DIAGRAM I SHOWED YOU EARLIER.
IT'S A PROMISING PROOF OF CONCEPT AND I'M
HAPPY WITH IT SO FAR.
THESE ARE THE RESULTS FROM USING HPO, WHICH ARE
QUITE SIMILAR ALTHOUGH A LITTLE LESS GRANULAR,
SO I NEED TO TWEAK IT IN THE FUTURE.
I AM HAPPY WITH THESE CATEGORIES.
I HAVE THESE PATIENT CLUSTERS NOW AND WHAT
CAN YOU DO WITH THEM?
IT WAS A HUGE TIME SAVER AND WE DIDN'T HAVE
TO GO THROUGH AND CLUSTER THESE PATIENTS.
GOING BACK TO WHAT I WAS TALKING ABOUT WITH
THE STROKES AND DEFICIENCY OF ADA2, I'M MOST EXCITED
WITH THIS OUTPUT BECAUSE EACH CLUSTER ACTUALLY
HAS A FEW PATIENTS THAT HAVE HAD A STROKE
IN IT.
AT FIRST I WAS DISAPPOINTED.
I THOUGHT TO MYSELF, STROKE IS SUCH A DEFINING
FEATURE.
HOW ARE THESE TECHNIQUES MISSING THAT?
BUT AS I THINK ABOUT IT, PERHAPS STROKE IS A VERY,
VERY DEFINING FEATURE CLINICALLY WHEN WE THINK
ABOUT A PATIENT'S DIFFERENTIAL,
BUT WE DON'T WANT TO EVEN HAVE TO SEE THAT RED
FLAG.
WE WANT TO CAPTURE THESE PATIENTS THAT HAVE
THIS DISEASE BEFORE THEY DO HAVE
A STROKE SO WE CAN PROVIDE MEDIATING TREATMENT
BEFORE THAT PERIOD.
THESE GROUPS ARE A PROMISING SIGN THAT THERE IS A
BETTER WAY TO CATEGORIZE THESE PATIENTS WITHOUT
RELYING SO HEAVILY ON
THE MOST OBVIOUS FEATURES, THE MOST CLINICALLY
OBVIOUS FEATURES.
I THINK THAT CAN AID A VIRTUOUS CYCLE BETWEEN
OUR CLINICAL TEAM AND OUR LAB AND THE PATIENTS
AND FAMILIES WE SEE. IF WE BETTER UNDERSTAND
THESE DISEASES WE CAN GET EARLIER DIAGNOSIS,
AND IT GETS PEOPLE ON TREATMENT SOONER AND
HELPS WITH TARGETED CARE TO THE PATIENTS WE
SEE.
AND, OF COURSE THERE ARE CAVEATS.
THIS WAS A RETROSPECTIVE STUDY AND THERE'S
A LOT OF WORK TO BE DONE AND I'M VERY HAPPY
SO FAR.
LASTLY, I'D LIKE TO
THANK OUR CLINICAL TEAM SEEING THESE PATIENTS
AND OUR LAB AND EVERYBODY ELSE AT THE NIH
WHO TAKES CARE OF OUR PATIENTS AND THEY DO
A GREAT JOB.
OF COURSE I'D LIKE TO THANK ALL OF
OUR PATIENTS AND THEIR FAMILIES. NONE OF
OUR WORK WOULD BE POSSIBLE WITHOUT THEIR COMMITMENT
AND SUPPORT. AND I'D LIKE TO THANK THE DATA
SCIENCE COLAB AND ALL OF MY COLLEAGUES THAT
I HAVE MET THROUGH IT.
A YEAR AGO I
DIDN'T HAVE ANY PROGRAMMING OR DATA SCIENCE
EXPERIENCE AND NOW I'M CONFIDENT DOING INTERESTING
ANALYSIS LIKE THIS.
THANK YOU ALL FOR BEING HERE TODAY.
>> THANK YOU, RYAN.
APPRECIATE YOUR PRESENTATION AS WELL AS PRESENTATIONS
OF OUR OTHER FOUR PRESENTERS.
