IS ROHAN. I'M FROM COSMOS DB. I'M
HERE WITH MY COLLEAGUES SRI AND
ROHAN, AND WE ARE HERE TO TALK ABOUT
THE AZURE-COSMOSDB-SPARK CONNECTOR.
THERE ARE SO MANY ANNOUNCEMENTS AND
THINGS GOING ON, BUT REST ASSURED, WE
SAVED THE BEST FOR LAST. DO NOT WORRY,
YOU ARE IN SAFE HANDS. WE ARE GOING TO
TALK ABOUT A LOT OF EXCITING STUFF, AND
THERE IS A LOT OF LEARNING IN THIS
SESSION AS WELL. SPEAKING OF LEARNING,
WE HAVE DESIGNED THIS SESSION AROUND
A FEW OBJECTIVES THAT I WOULD LIKE
TO BRIEFLY DISCUSS. NUMBER ONE, WE
WANT YOU TO WALK OUT OF HERE
COMFORTABLE WITH WHY AND WHEN TO USE
COSMOS DB IN CONJUNCTION WITH SPARK:
THE USE CASES WHERE IT MAKES SENSE TO
USE THE TWO TOGETHER. SECOND, WE ARE
GOING TO WALK YOU THROUGH EVERY SINGLE
LINE OF CODE TO IMPLEMENT A REAL-TIME
ANALYTICS SOLUTION LIKE THIS, AND I'M
SURE YOU WILL BE ABLE TO CODE IT
YOURSELF AFTER THE SESSION. THIRD, WE
WILL TALK ABOUT A REAL PRODUCTION
SCENARIO THAT COCA-COLA IMPLEMENTED
IN THEIR ANALYTICS JOURNEY USING
COSMOS DB AND SPARK. THEY IMPLEMENTED
THE SOLUTION OVER THE LAST FEW MONTHS
AND ARE SUPER HAPPY WITH IT; WE WILL
GET INTO THE DETAILS LATER. LAST BUT
NOT LEAST, I'M SURE YOU HEARD ABOUT THE
NEW FEATURES BUILT INTO COSMOS DB.
EVEN IF YOU DIDN'T ATTEND SCOTT
GUTHRIE'S KEYNOTE, THAT IS TOTALLY
FINE. WE WILL TALK ABOUT THE NEWLY
INTRODUCED FEATURES, ANALYTICS AND
NATIVE SPARK SUPPORT INSIDE COSMOS DB,
WHICH CAN ENABLE A LOT OF USE CASES,
ESPECIALLY REAL-TIME ANALYTICS
SCENARIOS. WE'LL GET INTO DEMOS, AND
WE'LL TALK ABOUT THE BENEFITS AND THE
VALUE PROPOSITION THESE FEATURES BRING
FOR CUSTOMERS AND BUSINESSES. HAVING
SAID ALL OF THAT, LET'S MOVE ON TO A
SIMPLE QUESTION: WHY SHOULD WE USE
COSMOS DB AND SPARK TOGETHER? IF YOU
THINK ABOUT IT, SPARK IS THE GO-TO
ANALYTICAL SOLUTION, WIDELY CONSIDERED
A GREAT FIT FOR REAL-TIME ANALYTICS.
SPARK CAME ABOUT IN 2014, WITH ITS
FIRST MAJOR RELEASE, I BELIEVE. BACK
THEN, I WAS WORKING AS A DEVELOPER AT
ONE OF THE OLDEST BANKS IN THE WORLD,
DEVELOPING A FRAUD DETECTION SYSTEM.
SPARK WAS A NEW THING BACK THEN. THE
FIRST TIME I TRIED IT, IT WAS AMAZING.
IT WAS BLAZING FAST. IT HELPS YOU
COMPUTE OVER YOUR BIG DATA IN A VERY
SEAMLESS MANNER, AND YOU CAN DO A
NUMBER OF THINGS WITH IT. IT MAKES
STREAMING, MACHINE LEARNING AND
WHAT NOT SO SIMPLE. COSMOS DB, ON THE
OTHER HAND, IS IN THIS PICTURE A NOSQL
DATABASE ON THE CLOUD, OFFERED ON
AZURE. THE LIST OF BENEFITS IS LONG,
BUT TO SUM IT UP: IT OFFERS GLOBAL
DISTRIBUTION, AND IT LETS YOU SCALE
IN AND OUT WHENEVER YOU NEED TO. IT
IS SUPER FLEXIBLE.
ANY CHANGE YOU MAKE IS INSTANTLY
REFLECTED. LAST BUT NOT LEAST, IT HAS
MULTI-MODEL CAPABILITIES. WHAT THAT
MEANS IS, WHETHER YOU COME FROM A
STRUCTURED OR AN UNSTRUCTURED DATA
BACKGROUND, IT DOESN'T MATTER; THERE
ARE OPTIONS FOR BOTH SCENARIOS. WE
HAVE THE SQL API AND GREMLIN API FOR
GREENFIELD WORKLOADS, AND FOR
MIGRATING EXISTING WORKLOADS WE HAVE
APIS FOR CUSTOMERS WHO ARE PRESENTLY
USING NOSQL DATABASES ON-PREM OR IN
IAAS SOLUTIONS, LIKE CASSANDRA,
MONGODB AND WHAT HAVE YOU. THERE ARE
APIS FOR EVERY SCENARIO YOU COULD
THINK OF. COSMOS DB OFFERS A LOT OF
THINGS, AND THIS PRESENTATION CATERS
TO THE HTAP SCENARIO. PEOPLE HAVE
OFTEN THOUGHT IN TERMS OF A
DISTINCTION BETWEEN TRANSACTIONAL
DATABASES AND ANALYTICAL DATABASES:
OLAP VERSUS OLTP. OVER TIME, YOU MIGHT
HAVE SEEN THIS LINE BLURRING; OLTP AND
OLAP ARE CONVERGING INTO HTAP, AND
COSMOS DB IS BUILT ON THE PRINCIPLE OF
HTAP. THIS IS WHERE YOU HAVE ONE LAYER
THAT YOUR OPERATIONAL DATA SITS ON. AT
THE SAME TIME, SINCE STORAGE IS NOW
READILY AVAILABLE, WITH COSMOS DB
STORAGE IS NOT YOUR PRIMARY CONCERN.
MOST OF THE BILL YOU PAY IS FOR
THROUGHPUT, WHICH IS WHAT MATTERS IN
LOW-LATENCY SCENARIOS.
COSMOS DB ENABLES ALL OF THE SCENARIOS
THAT NEED LOW LATENCY AND THINGS LIKE
THAT. I'M NOT GOING TO GET INTO DETAIL,
BECAUSE THIS SESSION IS MORE ABOUT THE
REAL-TIME ANALYTICS ENABLED THROUGH
COSMOS DB AND SPARK, SO WE ARE GOING
TO STICK TO THAT SCENARIO. IF YOU LOOK
ACROSS ALL THE REALMS OF INDUSTRY,
THERE ARE A LOT OF USE CASES FOR
USING COSMOS DB AND SPARK IN
CONJUNCTION. PEOPLE IN FINANCE,
RETAIL, GAMING, HEALTHCARE, IOT,
WEATHER, LOGISTICS, AVIATION AND ALL
SORTS OF INDUSTRIES ARE USING COSMOS
DB TO STORE THEIR DATA AND USE IT IN
REAL TIME. I'LL TAKE A FEW EXAMPLES OF
WHAT WE HAVE SEEN CUSTOMERS DO, THE
PRIMARY ONES OF COURSE. I DON'T WANT
TO SPEND TOO MUCH TIME ON THAT; I'D
RATHER SPEND AS MUCH TIME AS POSSIBLE
ON DEMOS, THE STUFF YOU CAN TAKE BACK
AND IMPLEMENT YOURSELF.
NOW, A 5,000-FOOT VIEW: HOW DOES
REAL-TIME ANALYTICS WORK WITH APACHE
SPARK AND COSMOS DB? LIKE I SAID, ALL
THESE INDUSTRIES ARE PUTTING DATA INTO
COSMOS DB, AND THE BASIC QUESTION
EVERYONE IS ASKING, FROM A FORTUNE 500
CEO TO A DATA SCIENTIST OR ENGINEER,
ALL THE WAY TO A BUSINESS ANALYST OR
PRODUCT MANAGER, IS: WHAT CAN WE DO
WITH THIS RICH, BIG DATASET THAT WE
HAVE IN COSMOS DB? WHAT VALUE CAN WE
DERIVE FROM THE RICH DATA WE HAVE IN A
DATABASE THAT IS HIGHLY ACCESSIBLE AND
AVAILABLE AT LOW LATENCY? AND THE
ANSWER IS, THE POSSIBILITIES ARE
ENDLESS. ONCE YOU HAVE THE RIGHT DATA
AND THE RIGHT COMPUTE FOR YOUR WORK,
THERE ARE SO MANY THINGS YOU CAN
IMPLEMENT. I'LL DISCUSS THE USE CASES
WE HAVE SEEN ACROSS ALL OF THESE
INDUSTRIES IN A FEW MINUTES. THE WAY
IT WORKS, THE ARCHITECTURE FLOW, IS
BASICALLY THAT ALL OF THE DATA IS
INGESTED AND STORED IN COSMOS DB.
COSMOS DB IS A VERY LATENCY-EFFICIENT
DATABASE: IT KEEPS TO YOUR LATENCY
NEEDS, AND YOU CAN CUSTOMIZE IT FOR
THOSE NEEDS. ONCE THE DATA IS STORED
INSIDE COSMOS DB, THERE IS A VERY
USEFUL, HANDY, NIFTY FEATURE OF COSMOS
DB CALLED THE CHANGE FEED. TO SIMPLY
DEFINE WHAT THE CHANGE FEED IS: IT IS
NOTHING BUT A BACK-END COMMIT LOG ON
TOP OF COSMOS DB THAT SHOWS YOU WHAT
DATA IS CHANGING IN REAL TIME. IF YOU
WANT TO SEE WHAT IS HAPPENING IN YOUR
COSMOS DB ACCOUNT IN REAL TIME, THIS
COMES OUT OF THE BOX: YOU CAN SEE
WHATEVER IS CHANGING AS IT CHANGES.
WITH THIS RICH STREAM OF CHANGING DATA,
YOU CAN SOURCE ALL OF THE EVENTS INTO
SPARK STREAMING, AND ONCE THE DATA
LANDS IN SPARK, THERE ARE A NUMBER OF
THINGS YOU CAN DO WITH IT: MACHINE
LEARNING, PRESCRIPTIVE ANALYTICS,
DESCRIPTIVE ANALYTICS, THE WHOLE NINE
YARDS. FROM THERE, YOU CAN DO A COUPLE
OF THINGS. IF YOU WANT TO STORE
MATERIALIZED VIEWS OR SMALLER
PRECOMPUTED BATCH VIEWS, YOU CAN
COMPUTE THOSE AND WRITE THEM BACK INTO
COSMOS DB. IF YOU STRICTLY WANT TO
SATISFY ANALYTICAL REQUIREMENTS, YOU
CAN SERVE THOSE RESULTS AND ENABLE THE
BUSINESS TO MAKE KEY DECISIONS BASED
ON THE INFORMATION COMING INTO THE
DATABASE IN REAL TIME. BEFORE I MOVE
ON, THERE ARE TWO THINGS WE ARE GOING
TO TALK ABOUT IN THIS SESSION: THE
BASIC CLASSIFICATION OF REAL-TIME
ANALYTICAL PROCESSING ARCHITECTURES
THE INDUSTRY HAS SEEN, KAPPA
ARCHITECTURE AND LAMBDA ARCHITECTURE.
LET'S TALK ABOUT WHAT THESE TWO ARE.
KAPPA IS NOTHING BUT ONE OR MORE
REAL-TIME STREAMS INGESTING DATA INTO
A COMPUTE FRAMEWORK, WHICH COULD BE
SPARK OR ANYTHING ELSE. KAPPA
ARCHITECTURE SEEMS PRETTY SIMPLE: YOU
HAVE STREAMS, YOU PROCESS THEM IN THE
COMPUTE LAYER, AND YOU OUTPUT SOMETHING
THAT MIGHT ENABLE A BUSINESS TO MAKE
DECISIONS, OR THAT MIGHT JUST BE
INFORMATION OR ANALYSIS OF YOUR DATA
AS IT COMES IN.
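TO MAKE THE CHANGE FEED AND A KAPPA-STYLE CONSUMER MORE CONCRETE, HERE IS A MINIMAL, IN-MEMORY SKETCH. IT IS A TOY MODEL, NOT HOW COSMOS DB IMPLEMENTS THE CHANGE FEED: THE HYPOTHETICAL ToyContainer CLASS APPENDS EVERY WRITE TO AN ORDERED LOG, AND A READER RESUMES FROM A CHECKPOINT (A SEQUENCE NUMBER) SO EACH MICRO-BATCH ONLY SEES NEW CHANGES. THE CONSUMER KEEPS A RUNNING TOTAL PER COUNTRY, THE KIND OF INCREMENTAL RESULT A KAPPA PIPELINE OUTPUTS.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Doc = Dict[str, Any]


@dataclass
class ToyContainer:
    """In-memory stand-in for a container plus its change feed (not the real service)."""
    docs: Dict[str, Doc] = field(default_factory=dict)
    # Ordered "commit log" of (sequence number, document snapshot) pairs.
    log: List[Tuple[int, Doc]] = field(default_factory=list)

    def upsert(self, doc: Doc) -> None:
        self.docs[doc["id"]] = dict(doc)
        self.log.append((len(self.log) + 1, dict(doc)))

    def read_change_feed(self, checkpoint: int) -> Tuple[List[Doc], int]:
        """Return the changes after `checkpoint` and the checkpoint to resume from."""
        changes = [doc for seq, doc in self.log if seq > checkpoint]
        new_checkpoint = self.log[-1][0] if self.log else checkpoint
        return changes, new_checkpoint


def process_micro_batch(changes: List[Doc], totals: Dict[str, float]) -> Dict[str, float]:
    """Kappa-style incremental result: running transaction total per country.

    Assumes every change is a new transaction; an update to an existing id
    would be counted twice by this toy.
    """
    for doc in changes:
        totals[doc["country"]] = totals.get(doc["country"], 0.0) + doc["amount"]
    return totals


container = ToyContainer()
container.upsert({"id": "t1", "country": "US", "amount": 20.0})
container.upsert({"id": "t2", "country": "AU", "amount": 35.0})

totals: Dict[str, float] = {}
changes, checkpoint = container.read_change_feed(0)  # first micro-batch: t1, t2
process_micro_batch(changes, totals)

container.upsert({"id": "t3", "country": "US", "amount": 5.0})
changes, checkpoint = container.read_change_feed(checkpoint)  # only t3
process_micro_batch(changes, totals)
print(totals)  # {'US': 25.0, 'AU': 35.0}
```

THE CHECKPOINT IS WHAT LETS A RESTARTED READER PICK UP WHERE IT LEFT OFF INSTEAD OF REPROCESSING THE WHOLE LOG.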
BUT OVER TIME, WHAT YOU WOULD SEE IS
THAT MOST KAPPA ARCHITECTURES TURN
INTO SOMETHING CALLED LAMBDA
ARCHITECTURE, WHICH IS NOTHING BUT
YOUR STREAMS IN CONJUNCTION WITH YOUR
HISTORICAL BATCH DATA. THE TWO COME
TOGETHER TO SUPPLEMENT, OR SOMETIMES
EVEN COMPLEMENT, EACH OTHER, TO HELP
BUSINESSES OR CONSUMERS MAKE MORE
INFORMED DECISIONS. I'LL TALK MORE
ABOUT THAT AS WE DELVE INTO THE
LOW-LEVEL DETAILS. NOW, THE BASIC USE
CASES WE SEE IN THE INDUSTRY AMONG
COSMOS DB CUSTOMERS. RETAIL IS A VERY
CLASSIC EXAMPLE: YOU HAVE ALL OF THE
CONSUMER BEHAVIOR DATA, THE CUSTOMER
TRANSACTIONS, THE CUSTOMER PROFILE.
THE CUSTOMER PROFILE ISN'T SOMETHING
THAT CHANGES A LOT; IT IS MOSTLY
STATIC DATA. SOMEONE'S AGE CHANGES
ONCE A YEAR, AND SOMEONE'S NAME RARELY
CHANGES. ALONGSIDE THAT, YOU HAVE THE
BATCH VIEW OF CUSTOMER TRANSACTIONS,
SHOWING WHAT THEY ARE BUYING. USING
THAT, YOU CAN PREDICT AND RECOMMEND
PRODUCTS TO THEM, BASED ON WHATEVER
RETAIL SUBSECTOR YOU ARE IN. NOW, IF
YOU HAVE REAL-TIME TRANSACTION DATA
COMING IN, SAY SOMEONE IS BUYING
SOMETHING, AND THAT TRANSACTION HAS
PARAMETERS LIKE, OH, THIS PERSON IS IN
DOWNTOWN SEATTLE, WHY NOT SEND THEM A
PRODUCT RECOMMENDATION FOR NEARBY
STORES? THEY MIGHT BE INTERESTED IN
BUYING SOMETHING, SINCE THEY ARE
ALREADY SHOPPING. USE CASES LIKE THIS
ARE A CLASSIC EXAMPLE OF RETAIL
ANALYTICAL PROCESSING SCENARIOS, AND
SPARK AND COSMOS DB WORK TOGETHER TO
ENABLE THEM.
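THE RETAIL PATTERN ABOVE, JOINING ONE REAL-TIME EVENT WITH A MOSTLY STATIC BATCH VIEW, CAN BE SKETCHED IN A FEW LINES. THE PROFILES, STORES, FIELD NAMES AND THE "FAVORITE CATEGORY" RULE ARE ALL HYPOTHETICAL; IN A REAL PIPELINE THE PROFILES WOULD COME FROM A PRECOMPUTED VIEW AND THE EVENT FROM THE CHANGE FEED.

```python
from typing import Dict, List, Tuple

# Hypothetical batch view: slowly changing customer profiles.
PROFILES: Dict[str, Dict[str, str]] = {
    "c1": {"name": "Ana", "favorite_category": "shoes"},
    "c2": {"name": "Ben", "favorite_category": "coffee"},
}

# Hypothetical reference data: nearby stores keyed by (city, category).
STORES: Dict[Tuple[str, str], List[str]] = {
    ("Seattle", "shoes"): ["Downtown Shoe Co."],
    ("Seattle", "coffee"): ["Pike Roasters"],
}


def recommend_nearby(event: Dict[str, str],
                     profiles: Dict[str, Dict[str, str]] = PROFILES,
                     stores: Dict[Tuple[str, str], List[str]] = STORES) -> List[str]:
    """Join one real-time purchase event with the customer's batch profile."""
    profile = profiles.get(event["customer_id"])
    if profile is None:
        return []  # unknown customer: nothing to personalize on
    return stores.get((event["city"], profile["favorite_category"]), [])


print(recommend_nearby({"customer_id": "c1", "city": "Seattle"}))  # ['Downtown Shoe Co.']
```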
FINANCE IS ANOTHER CLASSIC EXAMPLE.
YOU HAVE BANKING TRANSACTION DATA THAT
DEFINES A CUSTOMER'S BEHAVIOR OVER THE
LAST 10, 15, 20 YEARS, HOWEVER LONG
YOU HAVE BEEN STORING CUSTOMER DATA.
WITH THAT RICH DATASET, YOU CAN
UNDERSTAND FROM SOMEONE'S PAST
TRANSACTIONS WHAT NORMAL BEHAVIOR
LOOKS LIKE FOR THAT CUSTOMER. IF YOU
ARE TRYING TO BUILD A FRAUD DETECTION
SYSTEM, THAT MAKES IT MUCH EASIER,
BECAUSE YOU HAVE REAL-TIME
TRANSACTIONS COMING IN, AND YOU HAVE
10 OR 15 YEARS' WORTH OF DATA THAT
SHOWS WHAT NORMAL TRANSACTION BEHAVIOR
IS FOR THAT CUSTOMER. SAY A
TRANSACTION COMES IN FOR $50,000, AND
YOU SEE THAT THIS CUSTOMER HAS ONLY
MADE TRANSACTIONS UNDER $100 IN THE
LAST TEN YEARS. THAT SEEMS SUSPICIOUS.
YOU CAN COMBINE THESE TWO SOURCES OF
INFORMATION AND RUN A FRAUD DETECTION
MODEL THAT FLAGS THE TRANSACTION AS
FRAUD. THAT IS ONE USE CASE, ONE OF
THE NUANCES YOU CAN USE TO FLAG A
TRANSACTION; THEN AGAIN, THERE ARE SO
MANY THINGS YOU CAN DO WITH IT. GAMING
IS A USE CASE THAT IS VERY SIMILAR TO
RETAIL, BECAUSE YOU HAVE A LOT OF
INFORMATION ABOUT THE CUSTOMER, WHO IN
THIS CASE IS A GAMER. THE KEY USE CASE
WE HAVE SEEN WITH COSMOS DB AND SPARK
IN THIS INDUSTRY IS MONETIZATION OF
GAMES DURING THE PLAYER'S EXPERIENCE.
YOU WANT TO SHOW THEM RECOMMENDATIONS
THAT ARE AS RELEVANT AS POSSIBLE,
USING THEIR PAST EXPERIENCE AND WHAT
THEY ARE CURRENTLY DOING IN THE GAME.
THESE ARE VERY CUSTOMER-CENTERED USE
CASES. I FIND THIS NEXT SLIDE VERY
INTERESTING: EACH OF THESE INDUSTRIES
HAS A VERY NICHE USE CASE. IN
HEALTHCARE, WE ARE LITERALLY SAVING
LIVES BY GIVING CONSUMERS OR HOSPITALS
THE INFRASTRUCTURE TO TRIGGER ALERTS
BASED ON PAST HEALTH PATTERNS THAT WE
HAVE OBSERVED AMONG OTHER PATIENTS.
YOU HAVE ALL OF THIS PATIENT DATA. AS
SOON AS SOMEONE NEW COMES IN AND IS
HOOKED UP TO MONITORS FOR THEIR VITALS
AND PULSE, YOU CAN LOOK AT PAST DATA
FROM THE LAST TIME SOMEONE WAS A
CARDIAC ARREST PATIENT AND SEE THE
PATTERNS THAT LED UP TO THAT CARDIAC
ARREST. USING MACHINE LEARNING MODELS
IMPLEMENTED IN SPARK, YOU CAN DEDUCE
WHAT TRIGGERS THAT SORT OF BEHAVIOR,
AND USE IT TO TRIGGER ALERTS THAT
NOTIFY STAFF LIKE DOCTORS AND NURSES
THAT SOMETHING IS GOING ON HERE AND
THEY MIGHT WANT TO CHECK ON THAT
PERSON. IT IS LITERALLY SAVING LIVES.
ON THE MANUFACTURING SIDE, YOU HAVE
ANOTHER VERY INTERESTING USE CASE.
WITH THE ADVENT OF SENSORS AND IOT,
YOU HAVE EXPENSIVE MACHINES LIKE
DRILLING MACHINES. IF YOU ARE IN OIL
AND GAS, YOU ARE DRILLING WITH THESE
EXPENSIVE MACHINES THAT COST THOUSANDS
OF DOLLARS EVERY MINUTE TO OPERATE. IF
YOU START DRILLING FOR OIL IN THE
WRONG DIRECTION AND YOU ARE NOT AWARE
OF IT, EVERY MINUTE YOU SPEND DRILLING
IN THE WRONG DIRECTION WASTES
THOUSANDS OF DOLLARS. NOW THINK ABOUT
SENSORS ATTACHED TO THE DRILLING
MACHINES. IF YOU COULD SOMEHOW
LEVERAGE THE SENSORS TO PROVIDE
REAL-TIME FEEDBACK, AND COUPLE THAT
WITH PAST DATA TO IDENTIFY WHETHER
YOU ARE GOING IN THE WRONG DIRECTION,
THAT REALLY HELPS YOU SAVE COSTS: YOU
CAN STOP DRILLING RIGHT AWAY, BACK OFF
FROM THAT AREA, AND NOT WASTE THE COST
AND EFFORT.
LOGISTICS IS ANOTHER USE CASE, MORE OF
AN UMBRELLA THAT COVERS A LOT OF
INDUSTRIES AS WELL. THE PRIMARY USE
CASE IS A STANDARD COMPUTER SCIENCE
PROBLEM: THINK OF AN ALGORITHM FOR
OPTIMIZING ROUTES, GETTING FROM A TO B
ON THE BEST ROUTE. A LOT OF VARIABLES
PLAY INTO THAT. SAY YOU HAVE A PLANE,
A SHIP, OR A TRUCK ON THE STREET. IT
COULD BE THE WEATHER, THE WIND
CONDITIONS, OR THE TRAFFIC ON THE
STREET THAT IS REALLY AFFECTING YOUR
DELIVERY FROM A TO B, FOR EVERYTHING
THAT IS BEING SHIPPED. A LARGE PART
OF IT IS DETERMINED BY REAL-TIME DATA
COMING IN THROUGH WEATHER REPORTS OR
WHATEVER YOUR SOURCE IS. YOU CAN
AGAIN USE SPARK AND COSMOS DB IN
CONJUNCTION TO OPTIMIZE THE ROUTE AND
MAKE INFORMED DECISIONS ABOUT
LOGISTICS ROUTE OPTIMIZATION. THIS IS
A VERY GOOD USE CASE THAT YOU WILL SEE
A LOT OF LOGISTICS COMPANIES USE IN
THE FUTURE AS WELL.
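ROUTE OPTIMIZATION LIKE THIS IS USUALLY FRAMED AS A SHORTEST-PATH PROBLEM. AS AN ILLUSTRATION (THE TALK DOES NOT NAME A SPECIFIC ALGORITHM), THE SKETCH BELOW RUNS DIJKSTRA'S ALGORITHM OVER BASE TRAVEL TIMES PLUS A HYPOTHETICAL MAP OF REAL-TIME DELAYS, SUCH AS A WEATHER OR TRAFFIC FEED MIGHT SUPPLY. THE ROAD NETWORK AND NUMBERS ARE MADE UP.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]
Delays = Dict[Tuple[str, str], float]


def best_route(graph: Graph, delays: Delays, start: str,
               goal: str) -> Tuple[Optional[List[str]], float]:
    """Dijkstra's shortest path where each edge costs base time + live delay.

    All costs must be non-negative for Dijkstra's algorithm to be correct.
    """
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == goal:
            break
        for nxt, base in graph.get(node, {}).items():
            cost = d + base + delays.get((node, nxt), 0.0)
            if cost < dist.get(nxt, float("inf")):
                dist[nxt] = cost
                prev[nxt] = node
                heapq.heappush(heap, (cost, nxt))
    if goal not in dist:
        return None, float("inf")  # goal unreachable
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]


roads = {"A": {"B": 10, "C": 15}, "B": {"D": 10}, "C": {"D": 10}}
print(best_route(roads, {}, "A", "D"))               # (['A', 'B', 'D'], 20.0)
print(best_route(roads, {("B", "D"): 30}, "A", "D"))  # (['A', 'C', 'D'], 25.0)
```

WHEN A DELAY REPORT ARRIVES FOR THE B-TO-D LEG, RE-RUNNING THE SEARCH SWITCHES THE ROUTE TO GO THROUGH C.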
NOW, I MENTIONED THAT THERE ARE TWO
MAIN DATA PROCESSING ARCHITECTURES
WHEN IT COMES TO REAL-TIME DATA. KAPPA
IS A VERY SMALL BUCKET OF WHAT IS USED
IN THE INDUSTRY, BECAUSE MOST OF THE
TIME IT ENDS UP CONVERTING TO LAMBDA
ARCHITECTURE IN THE LONG RUN. KAPPA BY
ITSELF IS NOT GOOD ENOUGH TO PROVIDE A
HOLISTIC VIEW OF THE CUSTOMER'S
JOURNEY AND THE IMPACT EACH
TRANSACTION IS MAKING, WHICH IS WHY
YOU NEED TO COUPLE IT WITH LONG-TERM
DATA AND DEFINE THE CONSUMER BEHAVIOR
WITH BATCH VIEWS OF PAST HISTORICAL
TRANSACTIONS AND THE DEVIATIONS IN
THEIR BEHAVIOR AS WELL. SO WE ARE GOING
TO FOCUS ON LAMBDA ARCHITECTURE AT
THIS POINT, BECAUSE IT COVERS THE
ASPECTS OF KAPPA AS WELL AS THE BIGGER
USE CASE. LET'S LOOK AT HOW LAMBDA
ARCHITECTURE HAS BEEN IMPLEMENTED IN
THE PAST, AND WHAT IS WRONG AND WHAT IS
RIGHT WITH IT. THERE ARE THREE MAIN
COMPONENTS IN ANY LAMBDA ARCHITECTURE.
TYPICALLY, NEW DATA COMES IN AND YOU
MULTICAST THAT NEW DATA INTO TWO
LAYERS. YOU HAVE THE BATCH LAYER,
WHICH HOLDS THE MASTER DATASET AND
HELPS YOU CREATE A PRECOMPUTED VIEW
THAT CAN BE QUERIED WHENEVER YOU NEED
AN OPTIMIZED VIEW OF THE ENTIRE SET.
YOUR SERVING LAYER IS NOTHING BUT THE
LAYER THAT HOLDS THAT PRECOMPUTED VIEW,
SO IT CAN BE QUERIED WHENEVER YOU NEED
IT. LAMBDA ARCHITECTURE IS IMPORTANT
BECAUSE, LIKE I SAID, YOU NEED THE
HOLISTIC VIEW OF EVERYTHING. BUT THERE
ARE A LOT OF LIMITATIONS WITH LAMBDA
ARCHITECTURE, SOME OF WHICH SHOWED UP
WHEN PEOPLE WERE IMPLEMENTING THIS
ARCHITECTURE IN 2010, WHEN SPARK WAS
NEW. LET ME START WITH THE BATCH LAYER,
BECAUSE IT IS CLOSEST TO THE ACTUAL
INITIATION OF THE PROCESS. THE BATCH
LAYER NEEDS TECHNOLOGY THAT IS VERY
ROBUST AND STABLE, SO MOSTLY PEOPLE
ENDED UP USING HADOOP OR HDFS FOR THE
BATCH LAYER, BECAUSE A LOT OF THE
MASTER DATASET IS RAW DATA STORED IN
ITS RAW FORM.
IN YOUR SPEED LAYER, PEOPLE HAD APACHE
STORM OR APACHE SPARK, FAST COMPUTE
ENGINES THAT CAN PROCESS REAL-TIME
DATA AND SOURCE IT WHENEVER YOU NEED
IT. FOR THE SERVING LAYER IN THE
CONVENTIONAL ARCHITECTURE, BACK IN THE
EARLY 2010S, PEOPLE HAD STUFF LIKE
APACHE HBASE, OR ANY SORT OF FAST
LOOKUP-TABLE DATABASE THAT CAN HOLD
READILY AVAILABLE DATA TO BE QUERIED
AND COUPLED WITH THE REAL-TIME DATA.
THE CHALLENGE WITH HAVING SO MANY
COMPLEX TECHNOLOGIES IN THIS ONE FRAME
IS THAT THERE ARE TOO MANY MOVING
PARTS. IT BECOMES A LOT OF OVERHEAD.
FOR ME AND OTHER PEOPLE I WORKED WITH,
AND FROM WHAT THE TRENDS IN THE
INDUSTRY SAY, THIS IS A HUGE, HUGE
BLOCKER; IT WAS A HUGE DEVELOPER
CONCERN. SO HOW DID WE RE-ARCHITECT
THIS? SIMPLE. WE DON'T MULTICAST
ANYTHING TO TWO DIFFERENT LAYERS; WE
USE THE CHANGE FEED THAT I SPOKE OF
EARLIER. YOU ONLY HAVE TO INGEST THE
DATA INTO COSMOS DB. SINCE COSMOS DB
HAS A FEATURE CALLED THE CHANGE FEED,
WHICH IS NOTHING BUT THE BACK-END
COMMIT LOG OF DATA COMING IN, YOU CAN
USE THE CHANGE FEED AS THE SOURCE FOR
YOUR SPEED LAYER. YOU ARE ONLY WRITING
TO ONE LAYER, THE BATCH LAYER, AND THE
CHANGE FEED TAKES CARE OF THE REST.
YOU JUST HOOK UP THE CHANGE FEED TO
SPARK TO STREAM THIS REAL-TIME DATA AS
IT COMES IN, AND YOUR SPEED LAYER IS
READY, JUST LIKE THAT. THE BATCH LAYER
AND SERVING LAYER ARE QUITE SIMILAR TO
WHAT THEY WERE IN THE CONVENTIONAL
LAMBDA ARCHITECTURE; THIS IS PRETTY
MUCH THE SAME THING. THE MASTER
DATASET RESIDES IN THE BATCH LAYER,
AND SPARK PRECOMPUTES THE BATCH VIEW
FOR YOU AND PUSHES IT TO THE SERVING
LAYER.
THE SERVING LAYER IS A FASTER LOOKUP
TABLE, OR COLLECTION, FOR YOUR
HISTORICAL WINDOW OF DATA. SAY YOU ARE
A RETAIL REAL-TIME ANALYTICS USE CASE:
YOU DON'T WANT TEN YEARS' WORTH OF DATA
IN THE PRECOMPUTED BATCH VIEW, SO YOU
PUT, SAY, A ONE-YEAR WINDOW INTO THE
SERVING LAYER TO HAVE FASTER LOOKUPS.
ONCE THAT HAPPENS, AND THE REAL-TIME
VIEW, AS WE DISCUSSED, IS ALREADY
READY, YOU COMBINE THE REAL-TIME
COMPUTED VIEW FROM THE SPEED LAYER
WITH THE SERVING LAYER DATA, AND THAT
GIVES YOU A HOLISTIC VIEW OF WHAT
HAPPENED IN THE PAST AND WHAT IS
HAPPENING RIGHT NOW. YOU CAN MAKE
INFORMED DECISIONS BASED ON THAT;
THERE ARE SO MANY THINGS YOU COULD DO.
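THE THREE STEPS JUST DESCRIBED (A WINDOWED, PRECOMPUTED BATCH VIEW IN THE SERVING LAYER, A REAL-TIME VIEW FROM THE SPEED LAYER, AND A MERGE AT QUERY TIME) CAN BE SKETCHED IN PLAIN PYTHON. THE PER-CUSTOMER COUNT-AND-TOTAL AGGREGATE AND THE FIELD NAMES ARE HYPOTHETICAL, AND THE SKETCH ASSUMES THE SPEED LAYER ONLY HOLDS EVENTS THAT THE LAST BATCH RUN HAS NOT YET COVERED, SO NOTHING IS COUNTED TWICE.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

Agg = Dict[str, Tuple[int, float]]  # customer -> (transaction count, total amount)


def aggregate(events: Iterable[dict]) -> Agg:
    """The aggregate shared by both layers: count and total per customer."""
    view: Agg = {}
    for e in events:
        count, total = view.get(e["customer"], (0, 0.0))
        view[e["customer"]] = (count + 1, total + e["amount"])
    return view


def batch_view(master: Iterable[dict], now: datetime, window_days: int = 365) -> Agg:
    """Batch layer -> serving layer: precompute over a trailing window only."""
    cutoff = now - timedelta(days=window_days)
    return aggregate(e for e in master if e["ts"] >= cutoff)


def merge(serving: Agg, realtime: Agg) -> Agg:
    """Query time: combine the precomputed view with the speed-layer view."""
    merged = dict(serving)
    for cust, (count, total) in realtime.items():
        base_count, base_total = merged.get(cust, (0, 0.0))
        merged[cust] = (base_count + count, base_total + total)
    return merged


now = datetime(2020, 1, 1)
master = [
    {"customer": "c1", "amount": 100.0, "ts": datetime(2019, 6, 1)},
    {"customer": "c1", "amount": 50.0, "ts": datetime(2017, 1, 1)},  # outside the window
    {"customer": "c2", "amount": 10.0, "ts": datetime(2019, 12, 1)},
]
speed = aggregate([{"customer": "c1", "amount": 20.0}])  # events since the last batch run
print(merge(batch_view(master, now), speed))  # {'c1': (2, 120.0), 'c2': (1, 10.0)}
```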
NOW LET'S SUMMARIZE WHAT IS HAPPENING
HERE. THIS IS LAMBDA ARCHITECTURE,
RE-ARCHITECTED AFTER REMOVING SOME
MOVING PARTS. NEW DATA COMES INTO A
SINGLE LAYER, COSMOS DB. USING THE
CHANGE FEED, REAL-TIME DATA GOES INTO
SPARK, WHERE IT COMPUTES THE REAL-TIME
VIEW. THEN THE EXISTING DATA IS
PRECOMPUTED USING SPARK AND MADE
AVAILABLE IN THE SERVING LAYER. BY
COMBINING THE TWO VIEWS YOU CAN DO SO
MANY THINGS, BECAUSE YOU CAN TAKE THE
PAST PATTERNS OF CONSUMERS, CUSTOMERS,
WHAT HAVE YOU, AND COUPLE THEM WITH
YOUR REAL-TIME DATA, WHICH TRIGGERS A
LOT OF THE EVENTS. NOW, ONE THING I DO
WANT TO CLARIFY ABOUT COSMOS DB, GOING
BACK TO THE BASICS ONCE AGAIN: I SAID
THAT IT IS A NOSQL DATABASE ON THE
CLOUD. WHEN I SAY COSMOS DB IS A NOSQL
DATABASE, IT DOES NOT MEAN A "NO
STRUCTURED QUERY LANGUAGE" DATABASE;
IT MEANS A "NOT ONLY SQL" DATABASE.
THAT IS A MISCONCEPTION PEOPLE HAVE,
AND WE SHOULD CLARIFY IT MORE OFTEN.
WHAT THAT BASICALLY MEANS IS THAT YOU
CAN HAVE UNSTRUCTURED AND STRUCTURED
DATA IN COSMOS DB. WE DO NOT RESTRICT
YOU TO ANY ONE MODEL; THERE ARE MANY
MODEL APIS YOU CAN LEVERAGE. HAVING
SAID THAT, COSMOS DB OFFERS OTHER
BENEFITS, LIKE GLOBAL DISTRIBUTION,
AND IN A FEW MINUTES WE ARE GOING TO
TALK ABOUT HOW SPARK LEVERAGES THAT
DISTRIBUTION IN AN EFFICIENT MANNER.
WE'LL ALSO TALK ABOUT THE LOW-LATENCY
ASPECTS AND HOW THEY HELP WITH
REAL-TIME ANALYTICS USE CASES WITH
COSMOS DB AND SPARK.
NOW, I DON'T WANT THIS TO BE ALL ABOUT
ARCHITECTURE DISCUSSIONS; WE WANT TO
DELVE INTO THE DEMOS AS WELL. SO I'M
GOING TO HAND OVER TO MY COLLEAGUE
SRI. SHE IS GOING TO TALK ABOUT A
FRAUD DETECTION USE CASE THAT WE ARE
GOING TO IMPLEMENT FROM SCRATCH. YOU
WILL SEE HOW IT IS DONE; IT IS SUPER
EASY. SO SRI, OVER TO YOU.
>> HELLO. CAN EVERYONE HEAR ME BACK
THERE? AWESOME. THANK YOU, ROHAN; THAT
WAS GREAT. AS ROHAN WAS SAYING, THERE
ARE ENDLESS POSSIBILITIES WHERE YOU
CAN LEVERAGE COSMOS DB WITH SPARK;
THERE ARE TRILLIONS OF THINGS YOU CAN
DO. YOU CAN SOLVE A BUNCH OF DATA
SCIENCE PROBLEMS, YOU CAN USE SPARK TO
DO FANCY ML STUFF, YOU CAN RUN COMPLEX
AGGREGATE QUERIES, AND SO FORTH.
IN THE NEXT FEW MINUTES, WHAT I AM
GOING TO FOCUS ON IS HOW YOU CAN DO
THIS. AS YOU HEARD, WE ANNOUNCED SOME
EXCITING NEW STUFF: WE NOW HAVE SPARK
BUILT NATIVELY INTO COSMOS DB, AND YOU
WILL HEAR MORE ABOUT IT FROM MY
COLLEAGUE ROHAN. BEFORE THIS, GOING
BACK A FEW YEARS, WE SAW AN INDUSTRY
TREND WHERE THE INDUSTRY NO LONGER
WANTS TO SEPARATE OLTP AND OLAP
CAPABILITIES. THEY WANT A DATABASE
THAT IS SUPER FAST AND SUPER GOOD FOR
HTAP, BUT THAT ALSO HAS THE
CAPABILITIES TO RUN AGGREGATIONS AND
DRAW MEANINGFUL BUSINESS INSIGHTS.
THAT IS WHEN WE DECIDED TO GO AHEAD
AND BUILD OUR OWN LIBRARY, OUR OWN
CONNECTOR, WHICH WE CALL THE SPARK
COSMOS DB CONNECTOR. IT IS AN
OPEN-SOURCE CONNECTOR WRITTEN IN JAVA;
IF YOU GOOGLE IT, YOU WILL BE ABLE TO
GO AND SEE OUR SOURCE CODE. WE WROTE
THIS LIBRARY ABOUT TWO YEARS AGO.
BEFORE I JUMP IN AND SHOW YOU THE
DEMO, WHAT I AIM TO SHOW IS HOW YOU
CAN USE THE COSMOS DB SPARK CONNECTOR
FOR THE TWO BASIC FLAVORS OF
PROCESSING: BATCH READS AND BATCH
WRITES, AND STREAM READS AND STREAM
WRITES. HOW DOES THE COSMOS DB SPARK
CONNECTOR WORK? FIRST OF ALL, THE
CONNECTOR GIVES YOU THE ABILITY TO
POSITION COSMOS DB AS EITHER THE
SOURCE OR THE SINK OF YOUR SPARK JOBS.
WHEREVER YOU RUN SPARK, WHETHER IN
DATABRICKS, HDINSIGHT, OR ANYWHERE
ELSE, YOU FIRST GO AND INSTALL THE
COSMOS DB CONNECTOR LIBRARIES. YOU CAN
DO THIS THROUGH THE MAVEN COORDINATES
OR BY DOWNLOADING THE UBER JAR, AND
I'LL SHOW YOU HOW TO DO THAT AS WELL.
ONCE YOU HAVE THE CONNECTOR INSTALLED,
NEXT YOU SPECIFY THE READ OR WRITE
CONFIG; PRETTY MUCH, YOU ARE TELLING IT
WHICH COSMOS DB CONTAINER YOU ARE
CONNECTING TO. ONCE YOU HAVE THE
CONFIG IN PLACE, WE ESTABLISH A
CONNECTION FROM THE SPARK MASTER NODE
TO YOUR COSMOS DB GATEWAY NODE.
WHEN THIS CONNECTION IS ESTABLISHED,
WHAT ESSENTIALLY HAPPENS IS THAT COSMOS
DB PASSES BACK THE PARTITION MAP OF
THE DATA. ONCE WE HAVE THIS PARTITION
MAP, SPARK CAN WORK OUT WHAT DATA IT
NEEDS TO ANSWER THIS QUERY. AND SINCE
WE HAVE THE PARTITION MAP, WE PUSH
THIS INFORMATION DOWN TO OUR WORKER
NODES, SO THAT THE WORKER NODES CAN
TALK DIRECTLY TO THE COSMOS DB DATA
NODES. AS YOU KNOW, SPARK IS A HIGHLY
PARALLEL, DISTRIBUTED SYSTEM, AND
COSMOS DB IS A HORIZONTALLY
DISTRIBUTED DATABASE. WHEN YOU HAVE
THESE WORKER NODES TALKING TO THE
DIFFERENT DATA NODES IN PARALLEL, YOUR
PROCESSING IS GOING TO BE MUCH FASTER.
OKAY.
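THE READ PATH JUST DESCRIBED CAN BE MODELED IN A FEW LINES: THE DRIVER RECEIVES A PARTITION MAP, SPREADS THE PARTITIONS ACROSS WORKERS, AND EACH WORKER READS ONLY ITS OWN PARTITIONS, ALL AT THE SAME TIME. THIS IS A TOY MODEL, NOT THE CONNECTOR'S CODE: THREADS STAND IN FOR SPARK EXECUTORS, A DICTIONARY STANDS IN FOR THE COSMOS DB PARTITIONS, AND THE ROUND-ROBIN ASSIGNMENT IS AN ASSUMPTION.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def assign_partitions(partition_ids: List[str], num_workers: int) -> Dict[int, List[str]]:
    """Driver side: spread the partition map round-robin across workers."""
    assignments: Dict[int, List[str]] = {w: [] for w in range(num_workers)}
    for i, pid in enumerate(sorted(partition_ids)):
        assignments[i % num_workers].append(pid)
    return assignments


def parallel_read(store: Dict[str, List[dict]], num_workers: int) -> List[dict]:
    """Worker side: each worker reads only its assigned partitions, concurrently."""
    assignments = assign_partitions(list(store), num_workers)

    def read_assigned(pids: List[str]) -> List[dict]:
        # In the real system this would be a direct request to a data node.
        return [doc for pid in pids for doc in store[pid]]

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        chunks = list(pool.map(read_assigned, assignments.values()))
    return [doc for chunk in chunks for doc in chunk]


store = {"p0": [{"id": 1}, {"id": 2}], "p1": [{"id": 3}], "p2": [{"id": 4}]}
print(assign_partitions(list(store), 2))                  # {0: ['p0', 'p2'], 1: ['p1']}
print(sorted(d["id"] for d in parallel_read(store, 2)))   # [1, 2, 3, 4]
```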
SO FOR TODAY'S DEMO: ROHAN SPOKE ABOUT
A LOT OF INDUSTRY SCENARIOS AND
VERTICALS WHERE THIS IS BEING USED.
FOR THIS PARTICULAR SCENARIO, LET'S
TAKE FINANCE AS AN EXAMPLE. IN
FINANCE, WE HAVE A NUMBER OF
E-COMMERCE SITES, AND E-COMMERCE IS
ON THE RISE. I THINK I HAVE STOPPED
EVEN GOING TO THE GROCERY STORE; NOW
YOU HAVE FRESH AND TONS OF OTHER
STUFF. THE MAIN PROBLEM WITH ONLINE
TRANSACTIONS IS ONLINE FRAUD. THIS
COULD BE CREDIT CARD IMPOSTERS,
STOLEN CREDENTIALS, A BUNCH OF STUFF.
IN TODAY'S SCENARIO, LET'S SAY YOU
HAVE A CERTAIN BANK, WOODGROVE BANK,
THAT IS RUNNING ONLINE PAYMENT
PROCESSING SYSTEMS. ITS CUSTOMERS AND
END USERS ARE MERCHANTS. THIS COULD BE
ANY ONLINE SITE, LIKE HATCHET.COM OR
EXPRESS.COM, WHATEVER YOU WANT. THE
BANK HAS CUSTOMERS, OR MERCHANTS, ALL
AROUND THE WORLD, AND WHAT IT AIMS TO
DO IS PROVIDE A REAL-TIME FRAUD
DETECTION SYSTEM THAT BLOCKS
FRAUDULENT TRANSACTIONS AS AND WHEN
THEY COME IN. FOR EXAMPLE, YOU HAVE A
CUSTOMER WHO IS TRYING TO MAKE AN
ONLINE PURCHASE, AND THIS SYSTEM AIMS
TO DETECT WHETHER THAT TRANSACTION IS
FRAUDULENT OR NOT, USING SOME ADVANCED
ANALYTICS AND SOME MACHINE LEARNING
MODELS.
SO HOW DO YOU GO ABOUT CREATING SUCH A
SYSTEM? WE HAVE A FEW REQUIREMENTS.
JUST TO POINT OUT, I DID SAY
WOODGROVE BANK HAS CUSTOMERS ALL
AROUND THE WORLD, SO ONE OF THE
REQUIREMENTS IS THAT THE FRAUD
DETECTION SYSTEM NEEDS TO BE HIGHLY
RESPONSIVE. THAT MEANS YOU SHOULD BE
ABLE TO DISTRIBUTE YOUR DATA AS CLOSE
AS POSSIBLE TO WHEREVER YOUR USERS
ARE. I'M SURE THAT GIVES YOU AN IDEA:
COSMOS DB SUPPORTS TURNKEY GLOBAL
DISTRIBUTION. THAT MEANS ONCE YOU LAND
YOUR TRANSACTIONS, OR YOUR DATA, IN
COSMOS DB, THEY CAN BE SEAMLESSLY
REPLICATED TO REGIONS CLOSE TO USERS
THROUGHOUT THE WORLD, WHICH MAKES
COSMOS DB A GREAT FIT AS AN INGEST
STORE. BEFORE I DO THE EXAMPLE, I'M
GOING TO WALK YOU THROUGH A QUICK
ARCHITECTURE OF HOW THIS WOULD LOOK
FOR A FRAUD DETECTION SOLUTION. YOU
HAVE YOUR PAYMENT TRANSACTIONS, OR
YOUR STREAMING EVENTS, COMING IN. WE
ARE GOING TO LAND ALL OF THESE EVENTS
IN COSMOS DB, SO THIS IS GOING TO BE
OUR STREAMING AND INGEST STORE. AS THE
TRANSACTIONS KEEP COMING IN, WE USE
THE CHANGE FEED, WHICH YOU HEARD ABOUT
EARLIER: A FEATURE THAT ALLOWS YOU TO
STREAM EVENTS, OR UPDATES, AS AND WHEN
THEY HAPPEN. USING THE CHANGE FEED AND
SPARK STRUCTURED STREAMING, YOU CAN
NOW LAND ALL OF THESE TRANSACTIONS IN
SPARK.
NOW THAT YOU HAVE THESE REAL-TIME
EVENTS LANDING IN SPARK, YOU CAN DO
SEVERAL THINGS. THE FIRST REQUIREMENT
IS TO BE ABLE TO BLOCK THESE
TRANSACTIONS IN REAL TIME. THE WAY
WOODGROVE BANK DID THIS: THEY HAD SOME
HISTORICAL TRANSACTIONS, SOME CLEAN
DATA THEY HAD IN A CSV FILE, WHICH
THEY USED TO TRAIN A MODEL, AND THEN
THEY DEPLOYED THE MODEL IN SPARK. THEY
USED SPARK ML OR AZURE ML, AND YOU CAN
USE DIFFERENT ML SERVICES IN SPARK TO
TRAIN AND DEPLOY THE MODEL. AS NEW
TRANSACTIONS COME IN, WE DO REAL-TIME
SCORING OF THEM AGAINST THIS
PRE-TRAINED MODEL TO DETECT WHETHER
EACH ONE IS FRAUDULENT OR NOT. BASED
ON WHETHER IT IS FRAUDULENT, OR HOW
HIGH THE SCORE IS, YOU CAN HOOK INTO
WEB SERVICES OR A WEB API AND BLOCK
THE TRANSACTION, REPORT IT TO THE
MERCHANT AS A SUSPICIOUS TRANSACTION,
OR TAKE FURTHER ACTIONS.
THAT IS THE REAL-TIME ASPECT OF IT.
BUT AS YOU KNOW, AS THE TRANSACTIONS
KEEP COMING IN OVER THE YEAR, YOU HAVE
NEW SETS OF DATA, AND YOU CANNOT KEEP
USING THE SAME OLD MODEL; YOU HAVE A
REQUIREMENT TO RETRAIN IT. THERE ARE
ALSO CASES LIKE THIS ONE: I WAS IN
AUSTRALIA A FEW MONTHS AGO, AND THE
MERCHANT BLOCKED MY TRANSACTION. IT
WAS LIKE, HEY, WE SEE A NEW
TRANSACTION COMING FROM ACROSS THE
WORLD; IT MIGHT NOT BE YOU. IT WAS
FLAGGED AS SUSPICIOUS, BUT IT WAS NOT
NECESSARILY A SUSPICIOUS TRANSACTION.
THIS ALSO MEANS YOU WILL HAVE FEEDBACK
COMING IN FROM THE MERCHANTS AND THE
BANK ITSELF ON THE ACCURACY OF THE
MODEL'S PREDICTIONS. ONCE THIS
FEEDBACK IS IN THERE, WOODGROVE BANK
WILL SCHEDULE SOME OFFLINE BATCH
PROCESSING TO DO SEVERAL THINGS. THE
FIRST IS TO RETRAIN THE MODEL BASED ON
THE FEEDBACK AND ON THE NEW
TRANSACTIONS THAT CAME IN. THE SECOND
IS TO BUILD VIEWS. WE HAVE DIFFERENT
TYPES OF TRANSACTIONS COMING IN, AND
WOODGROVE BANK WANTS TO PROVIDE NICE
AGGREGATED VIEWS TO THE MERCHANT,
LIKE, HEY, BY THE WAY, YOU HAVE THE
MOST TRANSACTIONS COMING IN FROM
COUNTRY Z; OR, THIS PARTICULAR
CUSTOMER HAD THIS MANY TRANSACTIONS
OVER A PERIOD OF ONE YEAR, SO THIS IS
YOUR TOP CUSTOMER. TO DO FURTHER
ANALYSIS, WE CAN BUILD NICE
MATERIALIZED VIEWS WHERE THE MERCHANT
CAN GO AND BUILD DASHBOARDS AND
REPORTS ON TOP OF THEM IN A NICE, FAST,
REAL-TIME MANNER.
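THE MERCHANT-FACING AGGREGATES MENTIONED ABOVE (TRANSACTIONS PER COUNTRY, TOP CUSTOMERS BY SPEND) ARE SIMPLE GROUP-BYS. THIS PURE-PYTHON SKETCH COMPUTES BOTH FROM A LIST OF TRANSACTION DICTIONARIES WITH HYPOTHETICAL FIELD NAMES; IN THE DEMO'S SETTING, A SPARK BATCH JOB WOULD DO THE SAME GROUPING AND WRITE THE RESULT TO A COSMOS DB CONTAINER AS A MATERIALIZED VIEW.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def merchant_views(transactions: Iterable[dict], top_n: int = 1) -> Dict[str, object]:
    """Batch aggregates a merchant dashboard might read from a materialized view."""
    txs = list(transactions)
    per_country = Counter(tx["country"] for tx in txs)  # transaction count per country
    spend: Counter = Counter()
    for tx in txs:
        spend[tx["customer"]] += tx["amount"]           # total spend per customer
    top: List[Tuple[str, float]] = spend.most_common(top_n)
    return {"transactions_per_country": dict(per_country), "top_customers": top}


txs = [
    {"country": "US", "customer": "c1", "amount": 10},
    {"country": "US", "customer": "c2", "amount": 5},
    {"country": "AU", "customer": "c1", "amount": 30},
]
print(merchant_views(txs))
# {'transactions_per_country': {'US': 2, 'AU': 1}, 'top_customers': [('c1', 40)]}
```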
NOW LET'S LOOK AT A DEMO OF HOW THIS
SYSTEM IS IMPLEMENTED. TO START OFF,
THE FIRST QUESTION IS: WHERE DO I GET
MY SPARK CONNECTOR? THERE ARE TWO WAYS
TO DO THIS: YOU CAN POINT TO THE MAVEN
COORDINATES, OR YOU CAN DOWNLOAD THE
UBER JAR. WE DID HEAR ABOUT A FEW
ISSUES WHERE SPARK STREAMING WRITES
DID NOT WORK, SO UNTIL WE GET THAT
ISSUE FIXED, WE HIGHLY RECOMMEND THAT
YOU DOWNLOAD THE UBER JAR. I HAVE ALSO
INCLUDED THE LINKS TO ALL OF THIS IN MY
PRESENTATION, SO YOU WILL BE ABLE TO
FIND THEM. FOR THE PURPOSES OF MY DEMO,
I'M USING DATABRICKS. IN DATABRICKS,
YOU CREATE YOUR CLUSTER, SPECIFYING
HOW MUCH MEMORY YOU NEED AND SO ON AND
SO FORTH. ONCE YOU HAVE CREATED IT, YOU
HAVE AN OPTION TO INSTALL LIBRARIES.
HERE I HAVE THE CONNECTOR INSTALLED
BOTH WAYS, USING THE JAR AND USING THE
COORDINATES. I CAN SAY "INSTALL NEW",
THEN "UPLOAD A JAR", AND JUST DRAG AND
DROP WHATEVER JAR I HAVE DOWNLOADED.
EASY AS THAT. ONCE YOU INSTALL THE
CONNECTOR, COMES THE FUN PART; THAT WAS
JUST GOING THROUGH THE BASICS. I'M
GOING TO SHOW YOU TWO THINGS: HOW TO
DO STREAM READS AND STREAM WRITES, AND
HOW TO DO BATCH READS AND BATCH
WRITES. SO HERE I HAVE A NOTEBOOK
OPEN. ONE SECOND.
THERE YOU GO. WHAT I WANT YOU TO PAY
ATTENTION TO HERE IS THE READ CONFIG.
IN THESE FEW LINES OF CODE, THE FIRST
THING I AM SPECIFYING IS WHICH COSMOS
DB ACCOUNT I CONNECT TO: THESE ARE
YOUR ENDPOINT AND MASTER KEY. I USED
AZURE KEY VAULT TO STORE MY SECRETS. IF
NOT, YOU CAN ALSO PASTE YOUR URI AND
KEY DIRECTLY, BUT I HIGHLY RECOMMEND
AGAINST IT. THEN YOU SPECIFY YOUR
DATABASE AND COLLECTION. THE
INTERESTING THING TO PAY ATTENTION TO
HERE IS THAT READCHANGEFEED IS SET TO
TRUE. THAT MEANS YOU ARE ENABLING
STREAMING AND YOU WANT TO READ OFF THE
CHANGE FEED. THE OTHER THINGS YOU NEED
TO PAY ATTENTION TO ARE THE CHANGE FEED
CHECKPOINT LOCATION AND THE CHANGE
FEED QUERY NAME. THESE TELL THE
CONNECTOR WHERE ON YOUR DATABRICKS FILE
SYSTEM TO STORE ITS CHECKPOINT STATE,
SO THAT IF YOUR JOB FAILED, FOR
EXAMPLE, IT KNOWS WHERE TO CONTINUE
READING: WHAT YOUR CHANGE FEED
CHECKPOINTS ARE. ONCE YOU HAVE THE
READ CONFIG, YOU ARE GOING TO START A
READ STREAM, WHICH IS THE NEXT COMMAND.
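HERE IS A MINIMAL SKETCH OF SUCH A READ CONFIG, BUILT AS A PLAIN PYTHON DICTIONARY. THE KEY NAMES (Endpoint, Masterkey, Database, Collection, ReadChangeFeed, ChangeFeedQueryName, ChangeFeedCheckpointLocation) ARE THE AZURE-COSMOSDB-SPARK CONNECTOR OPTIONS AS WE RECALL THEM FROM ITS DOCUMENTATION, SO CHECK THEM AGAINST THE VERSION YOU INSTALL. THE ACCOUNT, DATABASE, COLLECTION, QUERY NAME AND DBFS PATH BELOW ARE HYPOTHETICAL PLACEHOLDERS.

```python
from typing import Dict


def change_feed_read_config(endpoint: str, master_key: str, database: str,
                            collection: str, query_name: str,
                            checkpoint_dir: str) -> Dict[str, str]:
    """Options for a change-feed stream read (connector option names assumed)."""
    return {
        "Endpoint": endpoint,         # account URI
        "Masterkey": master_key,      # load from a secret store, never hard-code
        "Database": database,
        "Collection": collection,
        "ReadChangeFeed": "true",     # stream from the change feed
        "ChangeFeedQueryName": query_name,
        "ChangeFeedCheckpointLocation": checkpoint_dir,  # where progress is persisted
    }


read_config = change_feed_read_config(
    endpoint="https://my-account.documents.azure.com:443/",  # hypothetical account
    master_key="<from-key-vault>",
    database="bankdb",                                       # hypothetical names
    collection="transactions",
    query_name="fraud-demo-stream",
    checkpoint_dir="/dbfs/tmp/fraud-demo/changefeed-checkpoint",
)
print(read_config["ReadChangeFeed"])  # true
```

IN A DATABRICKS NOTEBOOK, THE KEY WOULD TYPICALLY COME FROM A KEY VAULT-BACKED SECRET SCOPE VIA dbutils.secrets.get(scope=..., key=...) RATHER THAN A LITERAL STRING.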
ALL I AM SAYING IS: START A SPARK READ
STREAM USING THIS CONFIG. ANOTHER
INTERESTING THING I WOULD LIKE TO
POINT OUT IS THE FORMAT. YOU SEE THAT
WE ARE USING SPARK STREAMING, AND THE
FORMAT DIFFERS BETWEEN STREAM READS,
STREAM WRITES, BATCH READS AND BATCH
WRITES; YOU NEED TO SPECIFY THE RIGHT
FORMAT FOR EACH TO WORK. THEN YOU ARE
PRETTY MUCH SAYING: USE THE CONFIG I
PROVIDED ABOVE. FOR THE PURPOSES OF
THIS DEMO, I'M JUST WRITING THE OUTPUT
OF THIS STREAM TO A MEMORY SINK, BUT
YOU CAN WRITE IT ANYWHERE: DELTA
TABLES, OR BACK TO A DIFFERENT COSMOS
DB ACCOUNT IF YOU DID SOME CLEANING OR
SLIDING-WINDOW AGGREGATES, OR WHATEVER
YOU WANT.
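A HEDGED PYSPARK SKETCH OF THAT READ-STREAM-TO-MEMORY STEP FOLLOWS. THE SOURCE FORMAT STRING IS THE CONNECTOR'S STREAMING SOURCE PROVIDER CLASS AS WE UNDERSTAND IT (com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSourceProvider); THE FUNCTION NAME AND DEFAULT TABLE NAME ARE HYPOTHETICAL. IT TAKES AN EXISTING SparkSession AS A PARAMETER, SO IT ONLY DOES REAL WORK ON A CLUSTER WITH THE CONNECTOR INSTALLED.

```python
from typing import Dict

SOURCE_FORMAT = "com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSourceProvider"


def start_change_feed_to_memory(spark, read_config: Dict[str, str],
                                table_name: str = "transactions_stream"):
    """Stream the Cosmos DB change feed into Spark's in-memory sink.

    `spark` is an existing SparkSession on a cluster with the connector installed.
    """
    stream_df = (spark.readStream
                 .format(SOURCE_FORMAT)    # streaming *source* provider for reads
                 .options(**read_config)   # e.g. the change-feed read config
                 .load())
    return (stream_df.writeStream
            .format("memory")             # demo/debug sink, held on the driver
            .queryName(table_name)        # the in-memory table takes the query's name
            .outputMode("append")
            .start())
```

ONCE THE QUERY IS RUNNING, spark.sql("SELECT * FROM transactions_stream") IN THE SAME NOTEBOOK SHOWS THE ROWS RECEIVED SO FAR. THE MEMORY SINK KEEPS EVERYTHING ON THE DRIVER, WHICH IS WHY IT SUITS DEMOS RATHER THAN PRODUCTION.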
STEPPING BACK ONE SECOND, I'M GOING TO
SHOW YOU WHAT MY SAMPLE DATA IN COSMOS
DB LOOKS LIKE WHILE THIS REFRESHES. IN
THE MEANTIME, WHAT I HAVE HERE IS A
TRANSACTION GENERATOR, RUNNING OFFLINE
OR LOCALLY, THAT IS SIMULATING SOME
TRANSACTION DATA FOR ME. I HAVE IT
RUNNING RIGHT NOW; I'LL GIVE IT A FEW
MINUTES TO START UP. I'M GOING TO SHOW
YOU WHAT SOME OF MY DOCUMENTS LOOKED
LIKE WHEN I RAN THIS PREVIOUSLY. OKAY,
I'M SORRY, THAT IS TAKING TIME.
BASICALLY, IF YOU LOOK AT THE RIGHT
CORNER, THAT IS A SAMPLE DOCUMENT. I
HAVE DIFFERENT INFORMATION LIKE THE
COUNTRY CODE, THE TRANSACTION ID, THE
TRANSACTION AMOUNT, AND THE PAYMENT
BILLING POSTAL CODE, JUST TO GIVE YOU
AN EXAMPLE OF A DOCUMENT. WHAT I DID
HERE IS START MY TRANSACTION
GENERATOR, WHICH IS EMITTING STREAMING
DOCUMENTS THAT WILL BE INSERTED INTO
MY COSMOS DB ACCOUNT HERE. NOW I AM
GOING TO RUN MY READ CONFIG, TO READ
OFF OUR CHANGE FEED, AND AS WE ARE
READING OFF THE CHANGE FEED, WE CAN
SEE AT WHAT RATE WE ARE READING IN AND
PROCESSING THESE DOCUMENTS. YOU CAN
SEE THAT THE SPARK JOBS ARE RUNNING,
AND YOU CAN SEE THE TRANSACTIONS
COMING IN. THIS IS WHAT I'M WRITING TO
THE OUTPUT SINK.
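THE DEMO'S TRANSACTION GENERATOR IS NOT SHOWN, SO THIS IS ONLY A HYPOTHETICAL STAND-IN THAT PRODUCES DOCUMENTS SHAPED LIKE THE SAMPLE DESCRIBED ABOVE (COUNTRY CODE, TRANSACTION ID, AMOUNT, BILLING POSTAL CODE). THE FIELD NAMES, VALUE RANGES AND COUNTRY LIST ARE INVENTED FOR ILLUSTRATION; A SEEDED RANDOM GENERATOR KEEPS THE OUTPUT REPRODUCIBLE.

```python
import random
import uuid
from itertools import islice
from typing import Dict, Iterator

COUNTRIES = ["US", "AU", "GB", "IN"]  # hypothetical sample of country codes


def generate_transactions(seed: int = 0) -> Iterator[Dict[str, object]]:
    """Endless, reproducible stream of fake payment documents."""
    rng = random.Random(seed)
    while True:
        yield {
            # Cosmos DB documents need a string "id"; here it doubles as the transaction id.
            "id": str(uuid.UUID(int=rng.getrandbits(128))),
            "countryCode": rng.choice(COUNTRIES),
            "transactionAmount": round(rng.uniform(1.0, 500.0), 2),
            "paymentBillingPostalCode": str(rng.randint(10000, 99999)),
        }


for doc in islice(generate_transactions(seed=42), 3):
    print(doc)
```

EACH DOCUMENT COULD THEN BE UPSERTED INTO THE SOURCE CONTAINER WITH WHICHEVER COSMOS DB SDK YOU USE, WHICH IS WHAT MAKES IT APPEAR ON THE CHANGE FEED.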
SO NOW YOU HAVE YOUR TRANSACTIONS.
LOAD YOUR ML MODEL, WHICH IS STORED IN
DBFS, THE DATABRICKS FILE SYSTEM, AND
SCORE THESE TRANSACTIONS AS THEY COME
IN, SAYING WHETHER THEY ARE FRAUDULENT
OR NOT. I'M NOT GOING TO GO INTO MORE
DETAIL; FOR THIS EXAMPLE, WE TRAINED
THE MODEL WITH THE SPARK ML LIBRARIES.
AFTER YOU SCORE THE TRANSACTIONS, FOR
WHATEVER PURPOSE, IF YOU WANT TO WRITE
THEM BACK, YOU CAN WRITE THEM TO A
DIFFERENT COSMOS DB COLLECTION.
THIS IS AN EXAMPLE OF THE WRITE CONFIG
FOR WRITING YOUR STREAMING JOBS BACK
TO COSMOS. THE THING TO PAY ATTENTION
TO IS THE FORMAT: FOR THE WRITE, YOU
HAVE TO PROVIDE CosmosDBSinkProvider,
WHEREAS FOR THE READ YOU SAY
CosmosDBSourceProvider. THESE ARE TINY
THINGS TO POINT OUT, BUT I HAVE BANGED
MY HEAD SOMETIMES GETTING THIS TO WORK
WHEN I MADE SMALL MISTAKES IN THE
COMMANDS. BUT THESE ARE SIMPLE LINES
OF CODE, AND THAT IS ABOUT IT FOR THE
STREAMING STUFF. SO, MOVING ON TO
BATCH.
SO NOW YOU HAVE COVERED THE REAL-TIME
SCORING PART, AND YOU HAVE ALL OF
THESE TRANSACTIONS IN COSMOS DB. NOW
YOU WANT TO READ THESE TRANSACTIONS
FROM COSMOS DB TO RETRAIN YOUR MODEL,
OR TO DO SOME AGGREGATIONS AND WRITE
THEM BACK INTO A MATERIALIZED VIEW.
THIS IS WHAT A READ CONFIG WOULD LOOK
LIKE FOR A BATCH JOB: YOU SPECIFY THE
MASTER KEY AND ALL OF THAT STUFF. ONE
INTERESTING THING I WANT TO POINT OUT
IS THE CUSTOM QUERY. WHY WOULD YOU
SPECIFY THIS? LIKE WE WERE TALKING
ABOUT, YOU WOULD PROBABLY LIKE TO
SCHEDULE YOUR BATCH JOBS TO RUN, SAY,
ONCE EVERY 24 HOURS, AND EACH RUN
WOULD READ IN ALL OF THE NEW
TRANSACTIONS.
HERE I FILTERED ON COLLECTION TYPE
"TRANSACTION", BUT YOU CAN ALSO FILTER
BY TIMESTAMP TO READ IN ONLY THE MOST
RECENT TRANSACTIONS YOU NEED. THIS IS
REALLY HELPFUL: FOR EXAMPLE, IF YOU
HAVE TWO TERABYTES OF DATA IN COSMOS
DB, YOU DON'T NECESSARILY WANT TO PULL
ALL TWO TERABYTES INTO MEMORY IN
SPARK. THIS IS WHERE THE CONNECTOR
HELPS, BY PUSHING THE QUERY AND ITS
FILTERS ALL THE WAY DOWN TO THE COSMOS
ENGINE AND PULLING BACK ONLY THE DATA
YOU ACTUALLY NEED. ONCE I HAVE
ESTABLISHED MY READ CONFIG, WHAT I AM
DOING HERE IS CLEANING UP AND REMOVING
SOME COLUMNS, JUST TO SHOW WHAT THE
TRANSACTION DATA LOOKS LIKE.
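A sketch of a custom query that reads only the last day of transactions, assuming the connector's `query_custom` option (our understanding of its name) and a hypothetical `collectionType` field. It filters on the Cosmos DB system property `_ts`, the last-modified time in seconds since the Unix epoch, so documents updated in the window are read again too.

```python
import time


def incremental_batch_read_config(endpoint, master_key, database, collection,
                                  lookback_hours=24, now=None):
    """Hypothetical helper: batch-read options whose custom query only returns
    transactions modified within the last `lookback_hours` hours."""
    now = int(time.time()) if now is None else int(now)
    cutoff = now - lookback_hours * 3600  # _ts is in seconds since the Unix epoch
    query = (
        "SELECT * FROM c "
        "WHERE c.collectionType = 'Transaction' "  # hypothetical type field
        f"AND c._ts >= {cutoff}"
    )
    return {
        "Endpoint": endpoint,
        "Masterkey": master_key,
        "Database": database,
        "Collection": collection,
        "query_custom": query,  # evaluated by Cosmos DB rather than filtered in Spark
    }

# With PySpark (not imported here):
#   cfg = incremental_batch_read_config(...)
#   df = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**cfg).load()
```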
IF I RUN THIS, YOU CAN SEE SOME OF
THE DATA IN HERE: THE TRANSACTION ID,
THE POSTAL CODE, WHETHER THE USER IS
REGISTERED, AND SOME OTHER STUFF. NOW
THAT YOU ARE READING YOUR BATCH
TRANSACTIONS INTO SPARK IN DATABRICKS,
THE NEXT QUESTION IS: WHAT ARE SOME OF
THE AGGREGATIONS I COULD DO ON SUCH
DATA? ONE USEFUL THING YOU COULD DO IS
AGGREGATE ALL OF THE INCOMING
TRANSACTIONS BY COUNTRY CODE AND GET
THE AVERAGE TRANSACTION AMOUNT. THIS
IS JUST A SAMPLE. ONCE I HAVE THIS,
I'M GOING TO WRITE IT BACK TO COSMOS
DB. FOR WRITING BACK, AGAIN, THE WRITE
CONFIG LOOKS SIMILAR: YOU SPECIFY THE
ENDPOINT, THE MASTER KEY, AND THE
COLLECTION YOU WANT TO WRITE TO. I HAVE
ONE CALLED "MATERIALIZED VIEW", AND I
ALSO SPECIFY THE PARTITION KEY.
THAT IS, THE DEFINITION OF YOUR
PARTITION KEY. ANOTHER INTERESTING
THING TO PAY ATTENTION TO HERE IS THE
WRITE FORMAT. FOR STREAMING, YOU HAVE
TO SPECIFY THE COSMOS DB STREAMING
PROVIDER, WHEREAS HERE, FOR BATCH, THE
FORMAT IS com.microsoft.azure.cosmosdb.spark,
AND YOU ALSO SPECIFY A SAVE MODE, LIKE
OVERWRITE, APPEND, AND SO FORTH. THAT
IS THE BASIC IDEA.
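Since the format string differs between batch and streaming, and between streaming reads and writes, a tiny lookup helper can save the head-scratching described above. The three format strings reflect the azure-cosmosdb-spark connector as we understand it, and the save modes are Spark's standard ones; the helper functions themselves are hypothetical.

```python
BATCH_FORMAT = "com.microsoft.azure.cosmosdb.spark"
STREAM_SOURCE_FORMAT = "com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSourceProvider"
STREAM_SINK_FORMAT = "com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSinkProvider"

# Spark's standard batch save modes.
SAVE_MODES = {"append", "overwrite", "ignore", "error", "errorifexists"}


def cosmos_format(streaming: bool, write: bool) -> str:
    """Pick the connector format string for a batch or streaming read or write."""
    if not streaming:
        return BATCH_FORMAT  # batch reads and writes share one data source name
    return STREAM_SINK_FORMAT if write else STREAM_SOURCE_FORMAT


def check_save_mode(mode: str) -> str:
    """Catch typos in the save mode before a long-running job starts."""
    normalized = mode.lower()
    if normalized not in SAVE_MODES:
        raise ValueError(f"unknown save mode: {mode!r}")
    return normalized

# With PySpark (not imported here), usage would look roughly like:
#   df.write.format(cosmos_format(False, True)).mode(check_save_mode("append")) \
#       .options(**write_cfg).save()
#   stream.writeStream.format(cosmos_format(True, True)).options(**write_cfg) \
#       .option("checkpointLocation", "/tmp/cp").start()
```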
LET'S TRY IT. THESE ARE THE RESULTS
FROM WHAT I RAN BEFORE; I'LL QUICKLY
TRY TO RUN IT FOR YOU TO GIVE YOU AN
IDEA OF HOW THIS WORKS. BUT THAT IS
PRETTY MUCH THE EXPLANATION.
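The aggregation described above (average transaction amount per country code) and the shaping of results into materialized-view documents, sketched in plain Python. In Spark it would be roughly `df.groupBy("countryCode").agg(F.avg("transactionAmount"))`. The field names and the choice of country code as both `id` and partition key are assumptions for illustration.

```python
from collections import defaultdict


def average_amount_by_country(transactions):
    """Group transactions by country code and average their amounts."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for txn in transactions:
        code = txn["countryCode"]
        totals[code] += txn["transactionAmount"]
        counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}


def to_view_documents(averages):
    """Shape each aggregate as a document for a materialized-view container.
    Using the country code as both id and partition key is an assumption."""
    return [
        {"id": code, "countryCode": code, "avgTransactionAmount": avg}
        for code, avg in sorted(averages.items())
    ]
```

Keying the view by country code means rerunning the job and upserting overwrites each country's row rather than piling up duplicates.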
SO YOU HAVE THE READ CONFIG. WHAT I AM
DOING IS READING SOME DATA IN A BATCH
RUN AND CREATING A TEMP TABLE TO WORK
WITH THIS DATA, TO COMPUTE SOME
AGGREGATES FOR THE MATERIALIZED VIEW.
AS YOU CAN SEE, THIS PRETTY MUCH WORKS.
BECAUSE WE ARE SHORT ON TIME, WE WILL
WRAP UP THERE. THIS IS THE IDEA. I HOPE
YOU UNDERSTOOD THE DIFFERENT CASES WHERE
YOU WOULD USE STREAMING VERSUS BATCH,
AND THE DIFFERENT CONFIGS. ALL OF THIS
IS AVAILABLE: THE FEW LINES OF CODE YOU
NEED TO ESTABLISH A CONNECTION FOR
STREAM READS AND STREAM WRITES.
WHATEVER YOU DO IN DATABRICKS, YOU WORK
WITH IT PRETTY MUCH THE SAME WAY.
WITH THAT, I'LL HAND OVER TO ROHAN, WHO
WILL TALK ABOUT COCA-COLA, WHICH HAS
BEEN USING THE COSMOS DB SPARK
CONNECTOR FOR A FEW MONTHS NOW FOR ONE
OF THEIR PROJECTS. HE WILL ALSO TALK
ABOUT THE NEW SPARK API. THANK YOU.
>> THANK YOU, SRI. >> WELCOME,
EVERYONE. I'M ROHAN, FROM THE GROUP AS
WELL. I HOPE YOU ARE HAVING AN AMAZING
BUILD THIS YEAR. THIS PARTICULAR
TOPIC, REAL-TIME ANALYTICS ON TOP OF
COSMOS DB, IS SOMETHING THAT IS NEAR
AND DEAR TO ME. A COUPLE OF YEARS
BACK, I WAS ONE OF THE FOLKS WHO
STARTED THE COSMOS DB SPARK CONNECTOR
THAT SRI WAS TALKING ABOUT. TODAY, I'M
HELPING DESIGN AND BUILD THE
INTEGRATED OPERATIONAL ANALYTICS
SUPPORT WITHIN COSMOS DB. WE ARE
REALLY EXCITED TO ANNOUNCE THIS
SUPPORT AT BUILD, AND EXCITED TO SEE
HOW WE CAN WORK WITH YOU TO UNDERSTAND
YOUR SPARK SCENARIOS AND INTRODUCE
THIS NEW PARADIGM OF GLOBALLY
DISTRIBUTED OPERATIONAL ANALYTICS,
WHICH IS UNIQUE TO WHAT WE ARE OFFERING
TODAY BECAUSE OF COSMOS DB'S
CAPABILITY TO GLOBALLY DISTRIBUTE THE
DATA.
MY IMMEDIATE TEAM IN COSMOS DB
ENGINEERING IS RESPONSIBLE FOR LEADING
THE TECHNICAL CUSTOMER ENGAGEMENTS FOR
THE SERVICE. IN THAT ROLE, WE
INTERFACE WITH ENTERPRISE CUSTOMERS ON
AZURE AT DIFFERENT STAGES OF WHAT WE
CALL DATA MODERNIZATION. THIS IS A BIT
OF A BUZZWORD. REALLY, IF YOU THINK
ABOUT IT, IT IS ENTERPRISES COMING
BACK TO REDESIGN SOME OF THE CORE
PRINCIPLES OF HOW THEY STORE DATA,
ESPECIALLY OPERATIONAL DATA, WHICH IS
TODAY GROWING TO TERABYTE AND PETABYTE
SCALE: HOW THEY STORE IT IN A GLOBALLY
DISTRIBUTED FASHION, AND HOW THEY
DERIVE INSIGHTS FROM THAT DATA.
THAT IS THE REAL KEY TODAY. WITH ALL
OF THE COSMOS DB SESSIONS YOU ARE
ATTENDING, YOU WILL UNDERSTAND HOW TO
ACHIEVE THESE SCENARIOS IN A GLOBALLY
DISTRIBUTED SETTING. RIGHT NOW, LET'S
FOCUS MORE ON HOW YOU CAN DERIVE
INSIGHTS OUT OF TERABYTES OF
TRANSACTIONAL DATA. I THINK A PERFECT
EXAMPLE WOULD BE COKE. LET'S TAKE A
LOOK AT A VIDEO FROM THE CIO AND EXECS
AT COKE, WHERE THEY TALK ABOUT THEIR
WHOLE JOURNEY ON AZURE, CENTERED
AROUND COSMOS DB. >> IT IS UPLIFTING
AND OPTIMISTIC. OVER 200 COUNTRIES. AS
A SYSTEM, WE EMPLOY 770 PEOPLE. WE
HAVE BRAND PARTNERS THAT WE WORK WITH.
WE HAVE 1.9 BILLION SERVINGS OF OUR
PRODUCTS A DAY. ACROSS OUR SYSTEM, WE
HAVE SALES AND REVENUE IN MULTIPLE
CURRENCIES AND MULTIPLE COUNTRIES FROM
MULTIPLE SOURCES. OUR CHALLENGE HAS
BEEN TO GET INSIGHTS UP TO SPEED. WE
ARE FULLY IN WITH THE MOVE TO CLOUD.
THE COOL PART OF THAT CLOUD IS COSMOS
DB. BEING A GLOBALLY DISTRIBUTED
COMPANY, WE HAVE PETABYTES OF DATA
THAT CONTINUE TO GROW. WE WANTED TO
PARTNER WITH SOMEBODY WHO HAS A
BATTLE-TESTED INFRASTRUCTURE, AS WE
ARE RUNNING MISSION-CRITICAL
APPLICATIONS, AND THAT, QUITE FRANKLY,
LED US TO MICROSOFT. WE ARE COLLECTING
REVENUE AS WELL AS VOLUME DATA FROM
ALL OF OUR ECOSYSTEM AND PUTTING IT IN
ONE PLACE THAT ALLOWS US TO DRAW
INSIGHTS. BEING ABLE TO SCALE AND HAVE
INSIGHTS ACTUALLY DELIVERED WITHIN
MINUTES IS VERY, VERY IMPORTANT FOR
US. WE ARE VERY, VERY EXCITED ABOUT
COSMOS DB AND THE UPCOMING NATIVE SPARK
SUPPORT, AS WELL AS THE OPERATIONAL
ANALYTICS THAT ARE BUILT ON TOP OF
THAT. >> THE PROGRAM HAS BROUGHT OUR
SYSTEM TOGETHER.
WE BRING DATA FROM OUR FRANCHISE
PARTNERS AND THE COMPANY TOGETHER TO
BETTER SEGMENT AND BETTER SERVE THE
NEEDS OF OUR CONSUMERS, AND IT HAS
DELIVERED AGAINST THAT. IT HAS
DELIVERED THE COSMOS DB TECHNOLOGY
THAT ALLOWS US TO SCALE MASSIVELY AND
DELIVER AGAINST ALL OF THE TIME
REQUIREMENTS WE HAD ABOUT GETTING
INSIGHTS TO MARKET AT THE PACE THE
MARKET MOVES. THIS IS THE TIME WHEN WE
CAN MAKE A FUNDAMENTAL DIFFERENCE TO
OUR BUSINESS, AND I DON'T THINK IT
GETS BETTER THAN THAT.
>> THAT IS A REALLY, REALLY SATISFYING
VIDEO FOR ME PERSONALLY, BECAUSE I CAN
VERY DISTINCTLY REMEMBER THE DAY, OVER
A YEAR BACK, WHEN WE WALKED INTO THE
COCA-COLA HEADQUARTERS TO PROPOSE THE
SOLUTION WE WANTED TO BUILD ON COSMOS
DB AND AZURE FOR THIS PARTICULAR
PROJECT, WHICH IS CALLED NSR. SO WE
WATCHED THE VIDEO, RIGHT? LET'S LOOK AT
SOME OF THE KEY HIGHLIGHTS HERE. WHAT
ARE THE SCALE CHALLENGES? THERE IS
COCA-COLA AS A COMPANY, AND ALL OF THE
DATA IT GENERATES AS WELL AS CONSUMES.
IT IS SPREAD ACROSS 200 COUNTRIES, AND
THEY SERVE NEARLY 2 BILLION PRODUCTS
IN A SINGLE DAY.
YOU CAN IMAGINE THAT EVERY SINGLE
PRODUCT SALE IS GOING TO GENERATE AN
ORDER OF MAGNITUDE MORE RECORDS, SO
THIS IS THE VOLUME OF DATA THEY HAVE
TO PROCESS IN A SINGLE DAY. AS FOR THE
NSR PROJECT WITHIN COCA-COLA:
COCA-COLA IS, AT THE END OF THE DAY, A
FRANCHISE BUSINESS, WHICH MEANS THEY
INTERACT WITH BOTTLERS ACROSS THE WORLD
THAT UPLOAD DATA EVERY DAY ABOUT THE
SALES VOLUME AND REVENUE ON A SPECIFIC
DAY. THEY CAME TO US ASKING: CAN YOU
PROPOSE A GLOBALLY DISTRIBUTED COMMON
DATA STORE WHERE WE CAN STORE DATA NOT
JUST FROM THE BOTTLERS, BUT ALSO DATA
COMING IN FROM INTERNAL BUSINESS UNITS
AND OTHER THIRD-PARTY PUBLIC DATA
SOURCES, A COMMON DATA STORE THAT
HOLDS ALL OF THE OPERATIONAL DATA?
THIS IS NOT JUST A DATA LAKE WHERE YOU
HOST THE DATA. IT HAS TO SERVE
TRANSACTIONAL REQUIREMENTS AND ALSO LET
THEM DERIVE INSIGHTS OUT OF THAT DATA
IN TIME. AND THAT WAS NOT ENOUGH,
BECAUSE THEY WANTED TO FUTURE-PROOF
THIS COMMON DATA STORE. TODAY THEY HAVE
BATCH DATA COMING IN FROM ALL OF THESE
BOTTLERS, BUT GOING FORWARD, THEY NEED
TO BE ABLE TO INGEST IOT DATA FROM
POINT-OF-SALE MACHINES AS WELL AS
VENDING MACHINES.
SO THIS IS A COMMON DATA STORE THAT
HAS TO SCALE TO THIS VOLUME, MEET
GLOBAL DISTRIBUTION NEEDS, AND HANDLE
BATCH AS WELL AS STREAMING SCENARIOS.
THIS IS A VERY NICE SLIDE; YOU MIGHT
HAVE TO SQUINT A BIT. IT IS ONE OF THE
SLIDES THAT ONE OF THE EXECS PRESENTED,
AND IT CAUGHT MY ATTENTION. IT IS NOT
JUST ABOUT COKE, BUT ABOUT EVERY
ENTERPRISE CUSTOMER WHO WANTS TO
COLLECT OPERATIONAL DATA, NOT JUST AT
THE SOURCE, BUT ALL THE WAY ACROSS A
B2B OR B2C BUSINESS: FROM THE PLACE
WHERE YOU MANUFACTURE, THROUGH THE
LOGISTICS, THROUGH THE POINT OF SALE,
SO THAT YOU CAN SEGMENT THIS DATA,
DERIVE INSIGHTS ABOUT THE BUSINESS,
AND UNDERSTAND THE CUSTOMER BETTER.
IT IS A VERY NEAT SLIDE THAT CAPTURES
THE ESSENCE OF WHY WE BUILT THIS NATIVE
OPERATIONAL ANALYTICS SUPPORT AS WELL.
SO, NO PRIZES FOR GUESSING: COSMOS DB
WAS CHOSEN AS THE GLOBALLY DISTRIBUTED
COMMON DATA STORE, WHICH HAS TO SERVE
BOTH TRANSACTIONAL REQUIREMENTS AND
ANALYTICS REQUIREMENTS. I'M NOT GOING
TO SPEND TOO MUCH TIME ON THE OTHER
ASPECTS OF WHY COSMOS DB WAS CHOSEN;
IT WAS CLEAR. 200 COUNTRIES: IT HAS
GLOBAL DISTRIBUTION. YOUR BOTTLERS ARE
GOING TO START LOADING FILES AT ANY
POINT IN TIME, AND YOU DO NOT WANT A
SYSTEM WHERE YOU HAVE TO PROVISION FOR
PEAK WORKLOADS, WHICH MEANS A HIGHER
COST OF OWNERSHIP, VERSUS NOW WITH
COSMOS DB. AND YOU HAVE AVAILABILITY
AND LATENCY GUARANTEES AS WELL, IF YOU
WANT TO BUILD A WEB APP THAT INTERACTS
WITH THE DATA AND DOES LOOKUPS. WHAT I
WANT TO LOOK AT IN THE NEXT FEW
MINUTES IS THE FAST TIME-TO-INSIGHTS
ASPECT.
THIS, AGAIN, IS A SIMPLIFIED
ARCHITECTURE OF WHAT COCA-COLA DOES.
THE MAIN SCENARIO IS WHEN A BOTTLER
DROPS FILES EVERY DAY ABOUT HOW MUCH
SALES VOLUME AND REVENUE THEY HAD. THE
DATA BASICALLY LANDS IN BLOB STORAGE;
VERY SIMPLE. THIS SEEMS LIKE A SIMPLE
ETL MECHANISM, WHERE THEY HAVE TO DO
ETL FROM BLOB TO COSMOS DB. SO WHAT IS
BIG ABOUT IT? THE REASON SPARK IS IN
THIS PICTURE IS THAT IT IS NOT JUST
ETL: THEY HAVE TO DO A LOT OF
ANALYTICS, AS WELL AS COMPLEX
AGGREGATIONS AND JOINS, WHEN THEY MOVE
DATA FROM BLOB TO COSMOS DB. THIS IS
BECAUSE THE BOTTLERS DROP TRANSACTION
DATA, BUT YOU HAVE TO PERFORM VERY
COMPLEX JOINS AGAINST REFERENCE DATA
AS WELL AS MASTER DATA STORED IN OTHER
COLLECTIONS IN COSMOS DB.
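A minimal sketch of that validate-then-join step, in plain Python. In Spark the join itself would be roughly `bottler_df.join(reference_df, on="sku", how="inner")`. The record schema, field names, and the dictionary standing in for a reference-data collection are all hypothetical.

```python
# Hypothetical schema for a bottler's daily sales record.
REQUIRED_FIELDS = {
    "bottlerId": str,
    "sku": str,
    "volume": (int, float),
    "revenue": (int, float),
}


def is_valid(record):
    """Schema check: every required field is present with an accepted type."""
    return all(isinstance(record.get(name), types)
               for name, types in REQUIRED_FIELDS.items())


def enrich(records, sku_reference):
    """Validate records, then inner-join them to reference data on `sku`.

    `sku_reference` stands in for a reference/master-data collection keyed
    by SKU. Invalid or unmatched records are returned separately.
    """
    enriched, rejected = [], []
    for rec in records:
        ref = sku_reference.get(rec["sku"]) if is_valid(rec) else None
        if ref is None:
            rejected.append(rec)
        else:
            enriched.append({**rec, **ref})
    return enriched, rejected
```

Keeping rejected records, rather than silently dropping them, gives the bottler something concrete to fix and resubmit.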
YOU ALSO NEED TO BE ABLE TO DO SCHEMA
VALIDATION AND DATA WRANGLING. SPARK,
AS AN AGGREGATION FRAMEWORK, IS NEATLY
POSITIONED TO ACHIEVE THIS. ONCE THE
DATA LANDS IN COSMOS DB, THEY USE THE
NATIVE AZURE ANALYSIS SERVICES
CONNECTOR FOR COSMOS DB TO REFRESH A
CUBE. THEY BUILT THIS AZURE ANALYSIS
SERVICES CUBE, WHICH POWERS POWER BI
DASHBOARDS AS WELL AS AD HOC EXCEL
ANALYSIS. ONE THING TO NOTE HERE IS
THE ORCHESTRATOR OF THE WHOLE
PLATFORM.
THIS IS ONE SCENARIO, WHERE YOU ARE
INGESTING DATA: ESSENTIALLY THE
BOTTLER FILE APPROACH. LATER, BOTTLERS
NEED TO COME IN AND RUN ANALYTICS ON
TOP OF THE DATA THEY HAVE UPLOADED.
THEY DON'T WANT TO LOOK AT JUST THEIR
OWN DATA, BUT ALSO AT DATA THAT OTHER
BOTTLERS, IN THEIR REGION AND OUTSIDE
IT, HAVE UPLOADED. THIS IS WHERE ALL
OF THE COMPLEX AGGREGATION AND JOIN
CAPABILITIES REALLY HELP. THAT IS
MORE OF AN OPERATIONAL ANALYTICS
SCENARIO. YOU ALSO HAVE THE REGULAR
TRANSACTIONAL SCENARIO: AS WITH ANY
RELATIONAL OR TRANSACTIONAL DATABASE,
YOU WANT TO RUN SQL QUERIES ON TOP OF
THE DATA, AND YOU DON'T NEED TO ADD
THE LATENCY OF SPARK IN THERE.
SO THAT IS ANOTHER CAPABILITY, WHERE
YOU HIT COSMOS DB DIRECTLY. FINALLY,
THE CUBE POWERS THE DASHBOARDS FOR THE
BUSINESS ANALYSTS WHO VIEW THE DATA.
FOR THIS NEXT ONE, AGAIN, YOU MIGHT
WANT TO SQUINT A LITTLE BIT. YOU SEE
TWO THINGS HERE. THE CIRCLES REPRESENT
THE LOCATIONS OF AZURE REGIONS TODAY,
AND THE COCA-COLA SYMBOLS REPRESENT THE
REGIONS WHERE COCA-COLA HAS BUSINESS
UNITS. WHAT IS REALLY NEAT HERE IS
THAT WHEREVER COCA-COLA HAS A BUSINESS
UNIT, THERE IS AN AZURE REGION EITHER
IN THE SAME PROVINCE OR WITHIN A
THOUSAND-MILE RADIUS OF IT. THE COOL
THING IS THAT COSMOS DB IS A RING ZERO
SERVICE: COSMOS DB IS PRESENT WHEREVER
AN AZURE REGION IS. THIS HAS HELPED
BUILD A GLOBALLY DISTRIBUTED
DEPLOYMENT. WHAT WE SAW SO FAR WAS A
SINGLE-REGION DEPLOYMENT; THIS IS WHAT
HAPPENS WHEN YOU DEPLOY AN OPERATIONAL
PLATFORM THAT IS GLOBALLY DISTRIBUTED.
THERE ARE TWO PARTS HERE, RIGHT? IN
ANY ONE OF THOSE SQUARES, THERE IS A
DATA TIER AND A COMPUTE TIER. COSMOS
DB IS GLOBALLY DISTRIBUTED, SO ALL YOU
HAVE TO DO HERE IS CREATE ONE SINGLE
COSMOS DB ACCOUNT AND ADD REGIONS WHEN
YOU NEED TO SCALE OUT THE DATABASE TO
ANOTHER REGION. BUT THE SAME CANNOT
NATURALLY BE SAID FOR THE COMPUTE.
LET'S FOCUS ON THE ANALYTICS PIECE
HERE: IF YOU ARE RUNNING SPARK, YOU
CANNOT GUARANTEE A RECOVERY OBJECTIVE
OF ZERO. WITH COSMOS DB, IF A SINGLE
REGION GOES DOWN, THERE IS AUTOMATIC
FAILOVER TO ANOTHER REGION.
HOW DO YOU DO THAT FOR THE COMPUTE
HERE? IF EAST US GOES DOWN, HOW DO I
TAKE THE ANALYTICS STACK AND
AUTOMATICALLY REPLICATE IT TO ANOTHER
REGION? DO I PLACE IT INSIDE THE SAME
VNET, AND ALL OF THAT? THIS IS THE
CORE REASON WHY COCA-COLA AND OTHER
CUSTOMERS PUSHED US HARD TO BUILD OUT
NATIVE SUPPORT FOR SPARK. THIS IS WHAT
THE SOLUTION WILL MOVE TO IN THE
FUTURE, AS WELL AS WHAT WE ARE
PROPOSING FOR EVERYONE OUT HERE TO
BUILD A GLOBALLY DISTRIBUTED
ANALYTICS SETUP. YOU NOW HAVE SPARK:
YOU SET UP THE SPARK API ON TOP OF ANY
OTHER DATA API IN COSMOS DB, AND YOUR
SPARK JOBS CAN NOW RUN IN ANY OR ALL OF
THE AZURE REGIONS ASSOCIATED WITH YOUR
COSMOS DB DATABASE ACCOUNT. AND WITH
THE COSMOS DB MULTI-MASTER CAPABILITY,
EACH OF THE REGIONS IS A READ AND
WRITE REPLICA, WHICH MEANS YOUR SPARK
JOBS CAN PERFORM QUERIES, READS, AND
WRITES, ALL AGAINST THE NEAREST REGION.
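Conceptually, "the nearest region" means a client walks an ordered, nearest-first list of regions and uses the first one that is currently available, falling back to the next on an outage. Cosmos DB clients are typically configured with such a preferred-regions list; the sketch below is not the SDK's routing logic, just the idea, and the region names and latencies are hypothetical.

```python
def order_by_latency(latency_ms):
    """Turn {region: round-trip latency in ms} into a nearest-first list."""
    return sorted(latency_ms, key=latency_ms.get)


def pick_region(preferred_regions, available_regions):
    """Return the first region, in preference order, that is currently available.

    Returns None when none of the preferred regions is available.
    """
    for region in preferred_regions:
        if region in available_regions:
            return region
    return None
```

With multi-master, the same choice applies to writes as well as reads, which is what lets a Spark job in each region work against its local replica.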
SO NOW, IF COCA-COLA ONBOARDS ANOTHER
BUSINESS UNIT ONTO THIS PLATFORM
TOMORROW, ALL THEY HAVE TO DO IS ADD
ANOTHER COSMOS DB REGION. THE DATA IS
REPLICATED, AND THE COMPUTE STACK IS
REPLICATED NATIVELY THERE AS WELL.
THIS IS WHY WE BUILT NATIVE SUPPORT FOR
SPARK AND JUPYTER NOTEBOOKS INTO
COSMOS DB, WHICH WORKS SEAMLESSLY WITH
ALL OF THE DATA APIS: THE CASSANDRA
API, THE SQL API, ALL OF THEM. SO LET'S
LOOK AT THE STACK THAT A DATA ENGINEER
OR ML ENGINEER WOULD WORK WITH, RIGHT?
HOPEFULLY, BY THIS POINT IN THE TALK,
IT IS CLEAR TO YOU THAT COSMOS DB IS
ONE OF THE TOP CONTENDERS FOR THE
DATABASE PLATFORM IF YOU WANT TO BUILD
A TRANSACTIONAL AND OPERATIONAL
ANALYTICS SYSTEM TOGETHER, SO LET'S
LEAVE THAT PIECE ASIDE. AS OF
YESTERDAY, BEFORE WE ANNOUNCED THE
INTEGRATED SPARK SUPPORT, THE ENTIRE
TOP PART WAS SOMETHING YOU HAD TO BRING
YOURSELF: THE CHOICE OF COMPUTE
PLATFORM, AS WELL AS SPARK AND JUPYTER
NOTEBOOKS, WHICH ARE TYPICALLY THE MOST
PREFERRED ECOSYSTEM FOR VISUALIZATIONS
AND FOR A COLLABORATIVE ENVIRONMENT
WHERE YOU CAN PULL IN A RICH SET OF
PLUG-INS AND EXTENSIONS FOR
VISUALIZATION. THAT IS A GIVEN.
THAT IS THE REASON WE ARE ADOPTING
OPEN-SOURCE TECHNOLOGIES WITHOUT
REBUILDING SPARK AND WITHOUT
REBUILDING JUPYTER NOTEBOOKS. WE WANT
YOU TO BRING YOUR EXISTING JUPYTER
NOTEBOOKS AND EXISTING SPARK NOTEBOOKS
AND JOBS AND RUN THEM NATIVELY WITHIN
COSMOS DB. AS ALWAYS, WE CAN NEVER TALK
ABOUT COSMOS DB IN ISOLATION. YOU WILL
ALWAYS BE USING SPARK TO DO ETL, OR
STREAMING ETL, AND YOU WANT TO CONNECT
TO EVENT HUBS, OR TO OTHER DATA SOURCES
AND DATABASES; FROM SOMEWHERE, YOU NEED
TO CONNECT TO THOSE DATA SOURCES,
RIGHT? SO AS YOU ONBOARD WITH THE
COSMOS DB SPARK API, YOU WILL SEE THAT
ALL OF THESE CONNECTORS COME BUILT IN
WITH THE RUNTIME.
WE WANT TO ELIMINATE THE NEED FOR YOU
TO GO ADD THESE CONNECTORS AND GO
AROUND FIXING THE PATHS; WE WANT TO
SOLVE THAT PROBLEM FOR YOU. THE SAME
GOES FOR DATA SCIENCE AND ML PACKAGES:
THE AZURE MACHINE LEARNING SERVICE
LIBRARIES, PYSPARK, BASICALLY THE WHOLE
DATA SCIENCE ECOSYSTEM COMES PACKAGED
IN THE RUNTIME. IF YOU HAVE ADDITIONAL
REQUIREMENTS, YOU CAN ADD ADDITIONAL
JARS TO THE RUNTIME. OTHERWISE, ALL OF
THIS COMES AS A NATIVE STACK. WHAT WE
ARE NOW SAYING IS THAT THE RED LINE IS
REMOVED, AND THE ENTIRE STACK OF
JUPYTER NOTEBOOKS AS WELL AS SPARK
COMES BUILT INTO COSMOS DB.
I WOULD LOVE TO GO INTO DETAIL ON HOW
WE ARE BUILDING THIS OUT. WE MADE A LOT
OF SIGNIFICANT DESIGN CHOICES AROUND
NOT RUNNING SPARK AND JUPYTER
NOTEBOOKS ON VMS, WHICH IS HOW YOU
WOULD RUN A SPARK CLUSTER RIGHT NOW.
THERE IS A LOT OF RESOURCE GOVERNANCE
AND RESOURCE UTILIZATION THAT WE ARE
ABLE TO IMPROVE ON BY RUNNING THE
JUPYTER NOTEBOOKS AND THE SPARK MASTER
AND WORKER NODES AS CONTAINERS.
THAT IS THE REALLY COOL THING WE ARE
DOING RIGHT NOW: RUNNING ALL OF THESE
AS CONTAINERS WITHIN THE COMPUTE
INFRASTRUCTURE OF COSMOS DB. AS YOU
WILL SEE, FROM BOTH A TCO (TOTAL COST
OF OWNERSHIP) AND A RESOURCE
UTILIZATION POINT OF VIEW, THIS IS
SIGNIFICANTLY BETTER THAN THE EXISTING
CHOICES YOU HAVE FOR RUNNING IT, EVEN
SETTING ASIDE THE GLOBAL DISTRIBUTION
AND THE ZERO RECOVERY OBJECTIVE THAT
THE INTEGRATED SPARK EXPERIENCE GIVES
YOU. WE MIGHT BE RUNNING A LITTLE SHORT
OF TIME, BUT IF YOU'D LOVE TO STAY ON,
I WOULD LIKE TO SHOW YOU A COOL DEMO
THAT WE PUT TOGETHER WITH THE COGNITIVE
SERVICES TEAM. WITH THIS COMPUTE
INFRASTRUCTURE COMING WITHIN COSMOS DB,
WE CAN NOT ONLY RUN SPARK BUT ALSO
BUILD NATIVE INTEGRATIONS WITH AZURE
SERVICES AND COGNITIVE SERVICES, SO
THAT YOU CAN RUN COGNITIVE PIPELINES
NATIVELY WITHIN COSMOS DB. I'LL SHOW
YOU A VERY COOL DEMO.
I'M NOT SURE IF ANY OF YOU WERE THERE,
BUT WE SHOWCASED THIS DEMO AT THE
COGNITIVE SERVICES SESSION YESTERDAY AS
WELL. I WOULD LOVE TO WALK YOU THROUGH
IT; THIS IS A REAL-LIFE SCENARIO. JUST
A QUICK THING BEFORE WE GET INTO THE
JUPYTERLAB NOTEBOOK: WHEN WE ONBOARD
CUSTOMERS ONTO THE SPARK API, THIS IS
HOW IT WOULD LOOK. YOU CAN CHOOSE ANY
OF THE EXISTING DATA APIS ALONG WITH
THE SPARK API. ONCE YOU DO THAT AND
ADD ANY REGION, YOU CAN SEE THAT THE
DATA API AND THE ANALYTICS API ARE
COLOCATED IN EACH OF THE REGIONS.
WHAT WE ARE LOOKING AT TODAY IS
HENDRICK MOTORSPORTS. I DIDN'T REALIZE
-- YEAH. OKAY. SO HENDRICK MOTORSPORTS
IS ACTUALLY ONE OF OUR AZURE CUSTOMERS;
THEY ARE A RACING TEAM IN NASCAR. I'M
NOT A HUGE NASCAR FAN, BUT I'M SURE
SOME OF YOU ARE, AND YOU MIGHT RELATE
TO THIS. THE TEXAS 500 IS APPARENTLY
ONE OF THE REALLY COOL NASCAR RACES.
WHAT WE DID IS WORK WITH HENDRICK
MOTORSPORTS TO GET THE RACE TELEMETRY
FROM THE TEXAS 500. THIS IS DATA FROM
OVER 30 RACECARS, EACH OF THEM
REPORTING AT OVER 5 HERTZ. LET'S LOOK
AT WHAT THAT DATA LOOKS LIKE.
THIS IS THE ACTUAL DATA. YOU CAN SEE
IT IS VERY RICH: THE TELEMETRY COMING
FROM EACH OF THE CARS SHOWS YOU THE
LATITUDE, THE LONGITUDE, THE RPM, THE
BRAKE, THE HEADING, AND A LOT OF OTHER
COOL THINGS. COSMOS DB CAN SCALE TO
INGEST ALL OF THIS DATA. SO WHERE DOES
THE ANALYTICS PIECE COME IN? BESIDES
THE TELEMETRY, WE ALSO HAVE A COUPLE
OF OTHER SOURCES OF DATA IN COSMOS DB.
ONE IS THE DRIVER PROFILE DATA, AND THE
OTHER IS AUDIO DATA. THE AUDIO DATA
ITSELF IS STORED IN BLOB STORAGE.
BUT COSMOS DB HOLDS AN INDEX TO THE
BLOBS: YOU HAVE DOCUMENTS POINTING TO
THEM, SO YOU CAN LOOK UP THE EXACT
LOCATION WHERE YOU CAN PICK UP THE
AUDIO FILE FOR A SPECIFIC DRIVER AT A
PARTICULAR POINT IN TIME. NOW THAT WE
HAVE ALL OF THIS DATA, THIS IS AS
SIMPLE AS IT GETS, RIGHT? THIS IS THE
UI YOU WILL SEE WHEN YOU ONBOARD TO THE
SPARK API: THE JUPYTER NOTEBOOK COMES
BUILT INTO THE DATA EXPLORER OF COSMOS
DB. YOU CAN EXPLORE YOUR DATA, BUT YOU
CAN ALSO CREATE NOTEBOOKS AND EXPLORE
ALL OF THE NOTEBOOKS IN THE ACCOUNT,
AND YOU CAN COLLABORATE WITH OTHER
PEOPLE; THIS IS HOW WE WANT MORE PEOPLE
TO COLLABORATE ON TOP OF THE DATA. AS
SRI JUST SHOWED YOU, WITH A SINGLE LINE
OF CODE YOU CAN READ DATA FROM COSMOS
DB INTO SPARK. THE CORE THING IS
VISUALIZATION.
A LOT OF TIMES I HAVE HAD THIS ISSUE
WHEN WE WORK WITH CUSTOMERS WHO LOAD
THEIR DATA IN COSMOS DB: IT IS NOT
ENOUGH TO JUST RUN QUERIES ON THE
DATA. SOMETIMES YOU NEED RICH
VISUALIZATIONS TO SEE WHAT THAT DATA
MEANS. WE ARE NOT BUILDING ANYTHING
NEW HERE; WE ARE USING A RICH
EXTENSION IN JUPYTERLAB FOR
VISUALIZATIONS. HERE, I AM BASICALLY
TAKING THE FIRST COUPLE OF MINUTES OF
THE RACE.
I'M BASICALLY LOOKING AT THE DATA FOR
THE TOP FIVE RACERS. THIS IS THE FIRST
TEN MINUTES OF THE RACERS AND THEIR
RELATIVE POSITIONS. THEN COMES TRACK
STRATEGY: I WANT TO EMBED THIS DATA ON
A REAL MAP AND SEE WHERE THE RACERS
ARE ACCELERATING AND WHERE THEY ARE
BRAKING. WHAT IS A COOL WAY TO DO
THAT? THIS IS ALL OF THE CODE I HAD TO
WRITE. WITH THAT, YOU CAN NOW LOOK AT
THIS: THE GREEN REPRESENTS WHERE
PEOPLE APPLY THE THROTTLE, AND THE RED
REPRESENTS WHERE PEOPLE APPLY THE
BRAKES. NOTHING EXTREMELY NEW HERE:
YOU CAN SEE THAT PEOPLE BRAKE MORE AT
THE TURNS, COMPARED TO THE STRAIGHT
STRETCHES WHERE THEY ACCELERATE. THEN
COMES MORE NATIVE SPARK SUPPORT. I
WANT TO LOAD THE RACE TELEMETRY ABOUT
THE DRIVERS AND ACTUALLY ANALYZE HOW
THE DRIVERS ARE PERFORMING. THIS IS
WHERE SPARK'S AGGREGATION FRAMEWORK,
GROUP BY AND ALL OF THE AGGREGATION
CAPABILITIES WITH JOINS, COMES IN, AND
WE HAVE THE OPTION TO VISUALIZE THAT
NEATLY AND INTERACTIVELY. HERE I AM
LOOKING AT THE TIME THAT EVERY DRIVER
SPENDS ACCELERATING VERSUS BRAKING.
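In the notebook this is a Spark group-by over the telemetry; here is the same arithmetic in plain Python. It assumes a fixed 5 Hz rate (the talk says "over 5 hertz"), so each reading stands for 1/5 of a second, and uses hypothetical field names `driver`, `throttle`, and `brake`.

```python
from collections import defaultdict

SAMPLE_RATE_HZ = 5  # assumption: treated as exactly 5 readings per second


def seconds_by_pedal(samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Seconds each driver spent on the throttle vs the brake.

    Counts readings per driver, then divides by the sample rate. A reading
    with the brake pressed counts as braking even if the throttle is also
    non-zero; readings with neither pedal pressed are ignored.
    """
    counts = defaultdict(lambda: [0, 0])  # driver -> [throttle readings, brake readings]
    for s in samples:
        if s["brake"] > 0:
            counts[s["driver"]][1] += 1
        elif s["throttle"] > 0:
            counts[s["driver"]][0] += 1
    return {
        driver: {"throttle_s": t / sample_rate_hz, "brake_s": b / sample_rate_hz}
        for driver, (t, b) in counts.items()
    }
```

Counting readings first and dividing once avoids accumulating floating-point error from adding 0.2 repeatedly.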
IT IS VERY CLEAR THAT THE MORE
EXPERIENCED DRIVERS SPEND LESS TIME
BRAKING AND MORE TIME ON THE THROTTLE.
THIS IS AN EXAMPLE OF HOW YOU CAN
RICHLY VISUALIZE YOUR DATA HERE. NOW,
ONE OF THE COOL THINGS, AS I WAS
SAYING, THAT WE ARE WORKING ON, AND
THAT YOU WILL SEE IN COMING
ANNOUNCEMENTS, IS WORKING WITH
COGNITIVE SERVICES AND AML ON NATIVE
INTEGRATIONS WITH COSMOS DB.
YOU CAN STORE DATA IN COSMOS DB AND
ENRICH IT WITH COGNITIVE SERVICES
PIPELINES, SO THAT YOU DON'T HAVE TO
WRITE MUCH CODE TO MOVE YOUR DATA OUT
OF COSMOS DB TO UNDERSTAND AND ENRICH
IT. MMLSPARK IS A POPULAR SPARK
PACKAGE THAT BRINGS A LOT OF COGNITIVE
SERVICES NATIVELY INTO SPARK. HERE WE
SHOWCASE ANOMALY DETECTION, A
COGNITIVE SERVICE THAT WAS ANNOUNCED
RECENTLY. WHAT WE SHOW IS: CAN I APPLY
ANOMALY DETECTION ON THE RPM DATA THAT
I HAVE FROM THE RACECAR TELEMETRY? WE
APPLY ANOMALY DETECTION ON TOP OF THE
RPM DATA; IT JUST TAKES A COUPLE OF
SECONDS. LET'S LOOK AT WHAT THE
ANOMALIES LOOK LIKE; IT IS A NICE
GRAPH. LET'S LOOK AT SOME PORTIONS OF
IT. THIS IS RPM, RIGHT? THE YELLOW,
RED, AND GREEN LINES REPRESENT RACE
FLAGS. IN NASCAR, YOU HAVE A LOT OF
CRASHES, 5-6 PER RACE ON AVERAGE.
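The demo calls the Azure Anomaly Detector cognitive service, whose models the talk does not describe. As a purely local stand-in that shows the same idea on an RPM series, here is a rolling z-score detector; it is a different and much simpler technique than the service, and the window size, threshold, and sample values are arbitrary assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=10, threshold=3.0):
    """Flag indices whose value deviates from the trailing-window mean by
    more than `threshold` standard deviations.

    A simple stand-in for illustration, not the Azure Anomaly Detector
    service's algorithm.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

A sudden RPM drop under a caution flag, or a drop to zero after a crash, stands far outside the recent spread and gets flagged; the readings right after it are judged against a window that now contains the outlier, which is one reason real services use more robust models.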
THE YELLOW REPRESENTS A POINT WHEN THE
CAUTION FLAG WAS RAISED. THEN THE
DRIVERS CANNOT RACE AMONGST THEMSELVES
AND MUST MAINTAIN THEIR RELATIVE
POSITIONS, WHICH IS WHY YOU SEE A
SIGNIFICANT DROP IN THE RPM, AND THAT
IS EXACTLY WHAT ANOMALY DETECTION
FLAGGED AS WELL. THERE ARE SOME NICE
PARTS; FOR INSTANCE, LOOK AT THE LAST
PART, RIGHT?
SO WHAT IS THIS? HERE I SEE AGAIN THAT
THE YELLOW LINE REPRESENTS A CAUTION
FLAG BEING RAISED, AND AFTER THAT, THE
RPM WENT TO ZERO. WHAT DOES THIS MEAN?
WHAT WE DID IS FIND THE VIDEO OF THIS
EXACT RACE, AND IF YOU JUST -- >> HE
CHOOSES THE INSIDE LINE AGAIN. THE 12
CAR, HE HAS SOME EXPERIENCE ON THE
OUTSIDE LINE. >> THAT WAS CLOSE.
>> YES, THAT IS WHY IT HAPPENED. THE
PARTICULAR CAR WE WERE RUNNING ANOMALY
DETECTION ON WAS THE ONE THAT ACTUALLY
CRASHED INTO THE WALL THERE, SO IT IS
NOT SURPRISING THAT ITS RPM WENT TO
ZERO. THIS IS A COOL THING: YOU CAN
RICHLY ANALYZE YOUR DATA NATIVELY
WITHIN COSMOS DB, AND THEN UNDERSTAND
WHAT IT MEANS, EITHER BY BRINGING IN
YOUR DOMAIN KNOWLEDGE OR BY DOING SOME
EXTERNAL RESEARCH. ONE MORE REALLY
COOL PART: WE DIDN'T WANT TO SHOW JUST
ONE COGNITIVE SERVICE. AS I SAID, WE
ALSO GOT THE AUDIO DATA FROM THE
DRIVERS DURING THE RACE, RIGHT? WE
INDEXED THAT IN COSMOS DB. WHAT WE DO
IS PASS IT THROUGH A SPEECH-TO-TEXT
COGNITIVE SERVICE HERE, AND THEN WE
BASICALLY GET TAGS OUT OF THAT.
RIGHT? WHAT YOU CAN THEN DO IS ENRICH
YOUR TELEMETRY DATA WITH KEYWORDS THAT
YOU THINK ARE POTENTIAL FLAGS. FOR
INSTANCE, IN THIS PARTICULAR SCENARIO,
IF THE DRIVER IS EXPERIENCING SOME SORT
OF DISCOMFORT AND SAYS SOMETHING LIKE
THAT, YOU CAN BUILD ALERTING PIPELINES
ON TOP OF THAT TO ASK: WHEN DO YOU NEED
A PIT STOP? IT IS A PREDICTIVE
MAINTENANCE TYPE OF ARCHITECTURE.
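A minimal sketch of the tagging-and-alerting step, assuming transcripts have already come back from a speech-to-text service and are keyed by driver and timestamp. The keyword list, field names, and the exact-timestamp join are hypothetical simplifications; a real pipeline would join on time windows and tune the vocabulary.

```python
import re

# Hypothetical watch-list of words that might warrant a pit-stop check.
ALERT_KEYWORDS = {"vibration", "loose", "overheating", "tire"}


def tag_transcript(text, keywords=ALERT_KEYWORDS):
    """Return the sorted alert keywords found in a speech-to-text transcript."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(words & keywords)


def enrich_and_alert(readings, transcripts):
    """Attach transcript tags to telemetry readings (joined on driver and
    timestamp) and return all enriched readings plus those needing an alert."""
    enriched, alerts = [], []
    for r in readings:
        text = transcripts.get((r["driver"], r["ts"]), "")
        row = {**r, "tags": tag_transcript(text)}
        enriched.append(row)
        if row["tags"]:
            alerts.append(row)
    return enriched, alerts
```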
THAT IS PART OF WHAT IS COOL ABOUT THE
COGNITIVE SERVICES CAPABILITIES. THE
AUDIO I'M GOING TO PLAY IS MUFFLED, BUT
THE TEXT YOU SEE HERE IS WHAT THE
SPEECH-TO-TEXT SERVICE DETECTED. >> HE
WAS RIGHT ON THE BOTTOM. I'M READY TO
GO HERE. >> YOU CAN SEE THAT IT WAS
ABLE TO VERY CLEARLY CAPTURE WHAT WAS
SAID. THESE ARE THE COOL CAPABILITIES
THAT MSR AND A LOT OF THE AZURE AI
FOLKS ARE BUILDING, AND WE WANT TO BE
ABLE TO OPERATIONALIZE THEM, WHICH IS
WHY WE ARE BRINGING THEM NATIVELY INTO
THE PLACE WHERE YOU ARE STORING YOUR
DATA. THE LAST PART I WANTED TO COVER:
THIS IS GOOD, BUT YOU WANT TO
OPERATIONALIZE IT.
WHAT YOU WOULD WANT TO DO IS DEFINE A
PIPELINE THAT TAKES THIS DATA, APPLIES
ANOMALY DETECTION, RUNS IT THROUGH
SPEECH TO TEXT AS WELL, AND STORES THE
RESULTS BACK IN COSMOS DB. THIS IS HOW
YOU WOULD DO THAT: DEFINE A PIPELINE
OUT OF THESE MULTIPLE STAGES IN SPARK.
THAT IS THE COOL THING ABOUT SPARK: YOU
CAN DEFINE SUCH PIPELINES AND EASILY
TURN YOUR BATCH-LAYER CODE INTO
STREAMING-LAYER CODE. HERE I'M SETTING
UP A STREAMING JOB; IT HAS STARTED,
BECAUSE IT SAYS "PROCESSING NEW DATA."
END TO END, THIS IS BASICALLY WHAT THE
WORKFLOW LOOKS LIKE: I STORED MY
RACECAR TELEMETRY DATA IN COSMOS DB,
APPLIED A BUNCH OF AGGREGATIONS ON TOP
OF IT USING SPARK, PASSED IT THROUGH
COGNITIVE SERVICES PIPELINES, AND THEN
WROTE THE ENRICHED DATA BACK INTO
COSMOS DB, EITHER INTO ANOTHER
COLLECTION OR BY UPDATING THE EXISTING
DATA. ALL OF THIS HAPPENED WITHIN
COSMOS DB, IN A SINGLE DATA EXPLORER.
I NEVER TOOK MY DATA OUTSIDE COSMOS DB
TO DO THIS.
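The "define stages once, run them in batch or streaming" idea can be sketched without Spark. In Spark ML the composition would be something like `pyspark.ml.Pipeline(stages=[...])`; the plain-Python version below just composes functions, and the two stages are hypothetical stand-ins for the demo's enrichment steps.

```python
from functools import reduce


def make_pipeline(*stages):
    """Compose stages (each: list of records -> list of records) left to right."""
    return lambda records: reduce(lambda acc, stage: stage(acc), stages, records)


# Hypothetical stages standing in for the demo's enrichment steps.
def drop_missing_rpm(records):
    return [r for r in records if r.get("rpm") is not None]


def flag_low_rpm(records, floor=1000):
    return [{**r, "lowRpm": r["rpm"] < floor} for r in records]


pipeline = make_pipeline(drop_missing_rpm, flag_low_rpm)


def run_batch(records):
    """Batch: apply the pipeline once to the whole dataset."""
    return pipeline(records)


def run_stream(micro_batches, sink):
    """Streaming: apply the same pipeline to each micro-batch as it arrives."""
    for batch in micro_batches:
        sink.extend(pipeline(batch))
```

Reusing the pipeline unchanged works here because both stages are row-by-row; stages that aggregate across rows would need windowing or state to give the same answer in streaming mode.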
THAT IS REALLY ONE COOL THING I WANTED
TO SHOWCASE: WHAT THIS NOW OPENS UP AS
OPPORTUNITIES FOR PEOPLE TO DO WITH
REAL-TIME DATA. THAT WAS A PRE-BUILT AI
MODEL. YOU NOW ALSO HAVE THE CAPABILITY
TO BUILD CUSTOM MODELS, WHERE YOU CAN
-- I THINK IT IS A BAD OMEN IF
SOMETHING GOES WRONG IN A DEMO -- YEAH,
THIS IS ESSENTIALLY HOW YOU CAN GO OUT
AND BUILD ONE. YOU CAN USE SPARK ML TO
BUILD A CUSTOM MODEL AS WELL. THIS IS
ANOTHER SCENARIO, WHERE WE TAKE RETAIL
DATA, FETCH THAT DATA FROM COSMOS DB,
AND SPLIT IT INTO TRAIN AND TEST SETS.
HERE WE APPLY A VERY SIMPLE LOGISTIC
REGRESSION MODEL TO CLASSIFY, BASED ON
THE CONTEXT AROUND THE PURCHASE,
WHETHER A PERSON WOULD ACTUALLY
PURCHASE A SPECIFIC PRODUCT OR NOT.
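In the demo this is a Spark ML logistic model (something like `pyspark.ml.classification.LogisticRegression`). As a dependency-free sketch of the same technique, here is logistic regression trained by batch gradient descent on log-loss, with a deterministic train/test split. The learning rate, epoch count, and the synthetic two-feature data in the test are assumptions, not the retail dataset from the talk.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle deterministically, then split into train and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_logistic(rows, lr=0.5, epochs=500):
    """Batch gradient descent on log-loss; rows are (features, label) pairs."""
    n = len(rows)
    w = [0.0] * len(rows[0][0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, x, threshold=0.5):
    """Predict 1 ("will purchase") when the modeled probability reaches threshold."""
    w, b = model
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


def accuracy(model, rows):
    return sum(predict(model, x) == y for x, y in rows) / len(rows)
```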
SO THIS IS JUST ANOTHER CASE SHOWING
THAT YOU CAN NOW USE PRE-BUILT AI
MODELS AS WELL AS CUSTOM AI MODELS
BUILT WITHIN THE SPARK INFRASTRUCTURE
OF COSMOS DB. I THINK THIS IS WHAT WE
WANTED TO LEAVE YOU WITH. WE REALLY
WANT YOU TO GO AND SIGN UP FOR THE
SPARK API PREVIEW. WE WANT TO WORK
WITH YOU TO UNDERSTAND WHAT YOUR
EXISTING SPARK USAGE IS, ONBOARD YOU,
AND HELP YOU WITH THE PARADIGM SHIFT TO
GLOBALLY DISTRIBUTED ANALYTICS AND AI.
SO I THINK WITH THAT, I WILL LEAVE YOU
AND SORRY
