So you might have more people coming in but
the most important ones are here, so
So today's guest, our distinguished speaker, is Professor
Satish Ukkusuri from Purdue University.
I've known Satish from his days
at RPI, and he's not a stranger to New York City.
He's done a lot of great work for New York City and New York State, so he knows
our network and our problems very well. He's been there how many years?
Nine, nine years.
And he has a wide range of projects.
I worked with him on a couple of very interesting projects, from off-hour deliveries
to evacuation and emergency management, so we do know each other very well.
But I think today he is going to show you things that I also don't know,
so I'm looking forward to his talk.
And with that, I'll just leave it to him to tell you more about what he has been doing lately.
It's a pleasure to be here. As mentioned, I was in New York State for
four years before going to Purdue, so some of the topics that I talk about today will relate to New York City.
But I want to paint a big picture about big data and smart mobility, and some of the work that we have been doing
over the last ten years. A lot of people have been working in this space, but I want to talk about
the emergence of these networks
with smart mobile technologies and big data.
This is the work which we have been doing in my lab, with my students and postdocs, over the last 10 years.
So my talk is
divided into three parts.
The first is the motivation
and the drivers of some of the changes that we are seeing in this space.
The second is looking at convergence, talking about one idea in more detail: this convergence of networks and big data.
The third part is to show you examples of some projects that we have done in the past,
using various kinds of data sources: social media,
mobile phone data,
GPS traces of taxis and so on. Then I will give you some directions of future research and trends in this space.
If there is time,
we'll also talk about autonomous vehicles. So I have a lot of things that I want to cover here.
When we run out of time,
we'll just stop, okay, and I'll take some questions during the talk as well.
Okay, so we have been hearing a lot about shared mobility, right? So Uber, Lyft,
different kinds of ride-sharing apps, bike sharing. Then we have apps for travel time prediction and estimation,
self-driving technologies,
apps for disaster management,
the Internet of Things and so on. All of these things are driving
cities in terms of providing services to people,
but more importantly in connecting citizens, and the services that they need, with
the agencies in the cities, right?
so
if you look at this space,
various industries are also participating in this.
IBM, Cisco and Accenture are examples, with what they call smart cities.
But the key idea here, if you see the commonality of these definitions,
is this integration of
information from various kinds of sensors, right, from sensors in space and time,
which provide us information about where people are, what they are doing, and how much time they spend on these various activities.
The second piece is these communication technologies
connecting
citizens to the service providers, which are city agencies, New York State,
New York City, right, and various types of city services. So the connection is what's been missing
over the last many years.
And then the third is to provide value from this in terms of reducing congestion, improving air quality
and many, many other things, as we will see in this presentation, right?
So these are some commonalities that we see from whatever definition we take, from industry or agency, in this
wide space of
smart cities.
Now
Wise up, okay, so there's a delay
So let's look at
some of the drivers.
Before going into the presentation and the specific topics,
let's look at some of the drivers of the changes we have seen in the last ten years,
right?
I have been in this field for about 17, 18 years,
and we have seen these trends in the last 15 years, right, and in particular in
the last 10 years, when I started this research in data science.
I'm mostly a networks person, right? Most of my work is in dynamic networks and static networks, primarily for transportation.
But when we started looking at this data on networks, in the last 10 years or so,
there are some fundamental drivers that actually
relate
to the initiation of these kinds of things.
The first driver is this penetration of smartphones, right? When I used to do this presentation almost eight years back,
I used to show a smartphone to everyone. I don't do that anymore,
right, because everyone knows what a smartphone is, what an app is, and so on. So by
2019,
one third of the population of this earth is going to have a smartphone. And what are they doing with that?
They're sharing the data. They're sharing their location. They're sharing information.
They're communicating with people, and all of this data, every second, is being recorded and shared with
service providers, right? And people are using this for various kinds of services: for getting groceries, for
shopping, for
traveling, for health, for getting pharmaceuticals, and so on. For every service there's an app out there.
Right, how many of you
sleep with your cell phone next to you? I do,
right?
So we can probably live without water or food for a day,
but not without our cell phones, right?
So we are so much dependent, and this is a recent trend; this was not the case five or ten years back.
We are so much dependent on these technologies for the various things that we do, all of us.
The second driver is the ubiquity of sensors in cities, right? So if you take a typical car,
we used to say there were about
200-plus sensors; now it's more like 400-plus sensors.
Why? Sensing has become way cheaper, and sensors have become very prevalent,
right? So we have a lot of sensing companies which provide us with
high volumes of this kind of data, and communities are becoming connected villages.
The third driver
is the
rapid change in the behavior of people,
right? So if you see your parents,
right, Facebook is no longer a younger-generation thing. It's for grandparents.
All the young people are moving away from Facebook, right? So
older people are actually beginning to use these technologies
at a very rapid pace, so the adoption of these types of technologies is very high, right?
So 12 years back, very few people used Google Maps
for turn-by-turn navigation, but about 80 percent of people now use this type of navigation app.
11 percent use their phone at least occasionally to reserve a taxi or car. This is 2014 data here; now
it's probably much higher, right? So these types of behavioral changes and the adoption of these technologies
are increasing very fast, not only in the US
but in the rest of the world, probably much faster in Asia and Africa, where they're kind of leapfrogging
some of these older technologies, right? So these are the three main drivers.
What is the result of this? This provides us a
tremendous amount of capability,
and
we can do a lot of things with this type of data: spatial-temporal data of where people are,
where assets are in space and time, and these are moving on networks, right?
So what do we do with this data? What type of methods, what type of problems
can we actually solve with this kind of data, right? And there is a supply side
and there is a demand side to this.
So what I want to do today is to share with you some of the ideas that we have
explored in using this type of
heterogeneous data on heterogeneous
networks to understand various questions in
urban areas, okay? So this is the connecting
story behind this presentation. You will see various research questions,
but behind all of that is this bigger question that we are trying to address, okay?
so
What is big data, after all? A simple definition: anything which does not fit in Excel, right?
So as soon as your Excel crashes, you have a problem either with Microsoft
or with your computer, or you have a problem of size: your data is no longer
small enough for Excel to actually handle, right? But a more rigorous definition of this big data is
characterized by three things. One, the scale of the data itself, both in terms of the size of the data
and the velocity of this data. The second is the unstructured nature of this data: the data is not necessarily
perfect, it has a lot of noisiness in it, and so on. And third, this data is real-time,
streaming over various time intervals and spatial scales, right, so there's a very high velocity
to this kind of data. So when you are dealing with this kind of data, it poses new challenges
in terms of how to handle it, unlike survey data,
which most of the transportation students may have used in a planning course, right, where you have
a sample and so on, and you're trying to make
inferences with that kind of data, right? So here we have huge volumes of data, but the data is not perfect.
So the common
misconception is that, oh, with big data you should be able to solve problems with ease, right?
But for
other problems, especially in cities and
for urban transportation systems, that is not necessarily the case, right? It depends on the kind of problem
you're solving. It depends upon the kind of data set that you have.
So really it depends on a lot of factors related to
the quality of the data and to the kind of problem you have at hand.
So sometimes smaller data is actually better than big data for some problems, right? The scale of data,
and network effects and interdependencies, pose challenges
to developing these kinds of new methodologies.
But there is a real opportunity here if the problem and the data set are appropriate
for the kind of research question you're trying to address.
ok
So a little bit of history here in terms of transportation modeling. If you look at transportation modeling, there are two main
assumptions we made, both on the demand side and on the supply side. On the demand side,
utility maximization has been the governing framework which we have used on the behavioral side. On the supply side we have used
equilibrium, or
people choosing the least-cost path in the network to make their travel decisions.
But when we started this in the 1950s, we had a very limited amount of data, so
we could not really test these assumptions
that we actually made in modeling these problems.
Over time we did validate this with the limited amount of data that we had from loop detectors,
from surveys and so on, right, but still
we had many instances where these assumptions did not hold water, right?
And we were not able to see what the alternative
mechanism to these supply-side and demand-side assumptions was.
What I claim is that we now have an opportunity
to actually question some of those fundamental assumptions
and let the data speak for itself, and write these assumptions in a way that fits our modeling framework.
Right, so that's one opportunity
which we have been trying to pursue: to let the data speak and inform the assumptions that we actually make.
Okay, so that results in new kinds of
modeling approaches. I can talk about this
philosophy for some more time, but in the interest of time, if somebody's interested we can talk about that later on.
So almost
nine years back, when we started this work, the first question we asked was: well,
we have a lot of this data from taxis and social media, which we are able to collect from Twitter,
but is this data really useful? When we say usefulness, one measure of usefulness is: is it representative?
Right, and
for transportation people, we know that issue very well.
We want this to be representative because we want to scale it from a sample to the entire population.
Right, so that was one of the first questions that we asked in 2009, when we started this work.
And what we did was to
look at various sources of data. We looked at various
articles related to the usage of these kinds of social media tools, and this is one snapshot of this kind of
information: the age-group distribution of mobile phone users and check-ins on
Foursquare in New York City. And what we see,
so the conclusion from all of this analysis, is that in urban areas the data is
reasonably representative across all age groups.
The same cannot be said for
rural areas like where I come from, West Lafayette, right? But in urban areas, New York City, Chicago, San Francisco
and so on, there is a pretty good match
between the demographics and the
usage of these different types of tools. There's some under-representation above 65, but not too much, and again,
this is old data. And the volume of data is significantly high,
thereby canceling out some of the noisiness which is there in this kind of data.
Okay, so
when we develop models
using these data-driven approaches, especially on networks, and all of our models are on networks,
right, we have various ways to
develop these models. So when I work with my students,
I give them a few guidelines about the philosophy behind modeling these problems.
The first thing is to let the data speak for itself in terms of the assumptions.
Let's not make overly restrictive assumptions on the modeling framework,
right?
And similarly, on the choice of the model,
let's be pretty simple in terms of the choice of the model, right? There are various data sets that we can use, and wherever
possible you want to integrate these different data sets.
First you want to be able to clean the data that you actually have. Then
develop an appropriate algorithmic modeling framework for that, and then
do the inference, and then the estimation and testing of this.
All right, so there is a philosophy behind how we actually develop these models
for this. Okay, so that's the first part of my presentation, which is like an
introduction to the philosophy of why these models are important and how we actually go about doing it.
In the second part of my presentation,
I want to dive deeper into one specific topic where this kind of integration takes place,
okay? And
I've been a big believer in this kind of
convergent research. What we mean by convergence research I will explain in the next slide.
So I want to show you how this convergence between networks and big data,
resulting in new,
enriched models, can take place, with an example of recovery in complex networks
which are subjected to various kinds of shocks, right: shocks of various time durations, shocks at various
spatial locations and so on. This is part of my research plan that we are working on right now.
Right, so what we mean by this convergence research is a deep
integration of
knowledge, tools and ideas
from related disciplines to create a new
synthesis that creates new theoretical frameworks and modeling ideas,
right?
This will help us to create new foundations for knowledge and
explore questions that are not possible with just one single domain, right?
And most of the grand challenges that you see today, be it in
cancer research, be it in materials science, be it in health engineering,
be it in infrastructure engineering, which is my area, can only be addressed by
taking this kind of convergent approach,
because these problems are too complex for a single discipline, right? And
recently NSF has also been very focused on this kind of convergence research.
I just found out a few weeks back that NSF is going to focus on
this kind of convergence research over the next five to ten years, right? So I feel very
encouraged by looking at these kinds of directions.
Okay, so let's take a deep dive into some of these things. We have data
about the structural properties of various road networks, various pipe networks, various sewer networks in various states and
cities in the US and in Europe.
And we also have functional properties of these networks.
What's a functional property? A functional property is how much traffic flow there is on every link,
what the speed characteristics are, how much water is flowing across these pipes,
what the diameter of these pipes is, what the material of the pipe is, and so on, right? So for different
kinds of infrastructure networks we have functional properties of the network.
So the questions that we are interested in are: are there
underlying signatures across physical, natural, social and human networks which are common across all of these different networks?
Are there some unifying scientific principles that are common across our
transportation networks? So that's one level down, right, from a more general setting to one specific to transportation networks.
Do you see any co-location of these networks? Is there a correlation
between these different kinds of networks, either spatially or temporally? And
the observation across
several of these networks is that, sure, they do have similar properties, right? So here we show this for
road and sewer networks. What you see is that when you actually look at the degree distribution
of these kinds of networks,
they share a common property at whatever
scale you actually look at, in whatever city network you actually take, right? So they do share one
property, and that is this degree distribution.
Right, and we have done this with different pipe networks. We have done this with road and sewer networks. We've shown that
co-location does exist spatially, and temporally in some cases.
Right, then the question is, once we understand this, once we establish this hypothesis, the
question then is: can you build a theory to understand the
failure and recovery of these interdependent networks, right? So if you have a shock in these kinds of networks, how do the shocks
propagate or cascade within these networks and across these multi-layer networks?
And then, can we really come up with how these networks would recover over time?
And then can be really come up with how these networks would recover over time
Okay, so to do that, first we need to understand this dual mapping of these networks. Very quickly,
I just want to share with you what we mean by this dual mapping. Dual mapping is a
transformation from the original network to a different network, and then we look at the degree distribution
of that network.
So this is the original network, for example a road network, with nodes representing intersections and links representing roads,
right, and the links might have some functional properties like congestion, travel time and so on.
What we do is we go to a dual map, where
the link
becomes a node, right? So links, for example,
with the same road name, all of them will become one node.
And then the node becomes a link, so the node, being an intersection or some kind of relationship, now becomes a link.
So why do we do this? We transform this so that we can actually look at some common
attributes, to see the hierarchy, if there's a hierarchy in these kinds of networks.
The dual networks no longer have planarity, and we are able to uncover the hierarchy. And
the important links in the network would have a large node degree in the dual graph. So if you take, for example, Manhattan here,
right, that's the primal graph, and the dual graph would look something like this. So all of our analysis is on the dual graph.
This is much easier to understand and handle because now you're talking about the importance of some
central nodes and links in the network, rather than, for example,
an important arterial
that's broken into like 50 different links, which will not tell us anything about the properties of the network as a whole, okay?
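As a rough illustration of the dual mapping described here, the sketch below merges all primal links that share a road name into one dual node and connects two dual nodes whenever their roads meet at an intersection. The input format `(road_name, node_u, node_v)`, the function name `dual_graph` and the toy street names are assumptions made for the example; the talk does not show the actual implementation.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph(segments):
    """Build a dual graph from primal links.

    segments: iterable of (road_name, node_u, node_v) tuples (hypothetical format).
    Returns a dict mapping each road name to the set of roads it intersects.
    """
    roads_at = defaultdict(set)  # intersection id -> road names meeting there
    dual = {}
    for name, u, v in segments:
        roads_at[u].add(name)
        roads_at[v].add(name)
        dual.setdefault(name, set())  # every road becomes one dual node
    # Every intersection turns into dual links between the roads that meet there.
    for roads in roads_at.values():
        for a, b in combinations(sorted(roads), 2):
            dual[a].add(b)
            dual[b].add(a)
    return dual


# Toy primal network: one avenue (two links) crossed by three streets.
segments = [
    ("Broadway", 1, 2), ("Broadway", 2, 3),
    ("1st St", 1, 4), ("2nd St", 2, 5), ("3rd St", 3, 6),
]
dual = dual_graph(segments)
degree = {road: len(neighbors) for road, neighbors in dual.items()}
print(degree)
```

In the toy network the avenue ends up with the largest dual degree even though it is stored as two separate primal links, which is the hierarchy effect the dual mapping is meant to expose.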
So
the first
important result is this one, right? You have various road networks, and what you see is their
degree distribution, and this degree distribution is now all done on the dual graphs.
You see that they all share the same kind of degree distribution, right? It's truncated in some cases,
it's truncated here, but they all follow a
power-law distribution, right? That establishes that there's a hierarchy in these kinds of road networks: there are main
arterials, which are connected to the secondary arterials, to the local arterials and the local roads. So there's a strict hierarchy, and
some links actually carry a lot of traffic, right? So this is an
80/20 rule that we see in many different networks, and we've done this analysis with many different
networks.
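A minimal sketch of how a power-law check on a degree distribution might look: compute the empirical PDF of node degrees and estimate the slope on log-log axes. The fitting method used in the talk is not stated, so ordinary least squares on the log-log PDF is an assumption here and only a rough estimator; the synthetic degree list is invented so that the expected slope is known.

```python
import math
from collections import Counter


def degree_pdf(degrees):
    """Empirical probability of each degree value."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(pdf):
    """Least-squares slope of log p(k) against log k (rough power-law exponent)."""
    pts = [(math.log(k), math.log(p)) for k, p in pdf.items() if k > 0 and p > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


# Synthetic degrees whose counts fall off exactly as k^-2 (illustrative only).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
pdf = degree_pdf(degrees)
slope = loglog_slope(pdf)
print(round(slope, 6))
```

On real dual graphs the distribution is truncated, as noted in the talk, so a fit over the full range would likely need a cut-off or a more careful estimator.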
The second thing is: is this true if we reduce the scale,
right?
Is this true at different levels of scale? So we have done a scale analysis. That means if you take Pittsburgh,
you take the entire network, or one-fourth of the network, one-sixteenth of the network,
1/64 of the network, does it show a similar property? And we see that yes,
it does. It actually does have this scale-free property
across these different resolutions of the network, so we see that that's a very important property in these networks.
Then after that, the question is: how can you predict these kinds of functional failures on
flow-based networks? If you have flow-based networks, road networks being one good example,
how can you
incorporate both structural failures, like for example what happened in Atlanta, right?
There was a road on the bridge which was
damaged and caused huge congestion problems in Atlanta. That's a structural failure, for a few days, right? But you have functional failures
due to accidents, due to over-congestion and so on every day, and typically we do not really
account for those things very well in our modeling process. So the question is: how do these functional failures cause
cascading in these networks, and can you predict which links it is going to spread to, when it is going to recover, and so on, right?
So these are some of the questions which we're interested in, in terms of understanding the
network structure and functional properties.
So we use the idea of that dual mapping, okay, and a
data-driven approach to
understand this problem of congestion evolution and the cascading
behavior, and we have done this on two big networks.
One is Beijing, the complete Beijing network, and the other is the Shanghai network, okay?
And since Google Maps is not popular in China, the data is collected using Baidu Maps,
right? So with Baidu Maps,
we actually used crawlers to collect the travel times of every link in the Beijing and Shanghai
networks at a resolution of about 30 to 40 minutes, okay? For about six months
we actually collected this data.
And
so we have the state, the speed information, of every link in the Beijing road network.
And
what we do is
we take that
network every 30 minutes, right, and we transform it into the dual graph, and in the dual graph here the function is
the state of the link, whether it is highly congested, medium congested or lightly congested, right? We transform that into the dual space,
and if it is very highly congested, that particular node is going to be split.
This will come in the next slide. It is going to be split, right, and that tells us how it is going to actually
cascade in this kind of a process.
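One way to picture the node splitting is sketched below. The exact splitting rule is not spelled out in the talk, so the rule used here is an assumption for illustration only: each road is an ordered list of segments, a highly congested segment breaks the road, and each free-flowing run on either side becomes its own dual node. The names `split_congested`, `Ring2` and the segment ids are hypothetical.

```python
from itertools import groupby


def split_congested(road_segments, congested):
    """Split each road into free-flowing pieces at its congested segments.

    road_segments: dict road name -> ordered list of segment ids (hypothetical format).
    congested: set of segment ids currently classified as highly congested.
    Returns dict piece name -> list of segment ids; each piece is one dual node.
    """
    pieces = {}
    for road, segs in road_segments.items():
        idx = 0
        for is_congested, run in groupby(segs, key=lambda s: s in congested):
            if is_congested:
                continue  # assumed rule: congested segments drop out of the dual node
            pieces[f"{road}#{idx}"] = list(run)
            idx += 1
    return pieces


# Toy example: one ring road with five segments, the middle one congested.
roads = {"Ring2": ["a", "b", "c", "d", "e"]}
pieces = split_congested(roads, congested={"c"})
print(pieces)
```

Rebuilding the dual graph on the pieces instead of the whole roads makes the formerly high-degree node disappear, which is the effect tracked over time in the degree distribution.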
So the implication of this is that the disappearance of high-degree dual nodes
tells us the state of that particular link, and the dual node degree distribution reflects the breakdown of the original
hierarchy. So I'm going to play a video here using all of the data that we actually collected.
This is the Beijing road network. We have the different ring roads here,
so you can see all the ring roads here. So I'm going to play the video.
What I want you to observe is
how the color of this graph changes and how the slope of this
distribution actually changes.
So let me first explain this base, the blue line,
which is the base dual graph. The base dual graph distribution is when there is no
congestion in the network, so that's the best we can actually do, that's the best degree distribution.
And the
axes,
the x and y axes: this is the node degree, and
this is the probability of having that node degree. So this is a PDF, a probability distribution function.
So this is when there is no congestion in the network, and
this is when there is some congestion. So I want you to observe, when I play this video,
how this red line is actually going to change, and tell me if you observe anything
interesting.
This is the timestamp: 9:00, 10:00 in the morning, 9:45.
So, anyone? What did you observe?
The PM is pretty congested in the center,
right? So as it's getting congested,
what's happening? This red line is moving away from the blue line, and as the congestion is reducing, it is
kind of coming back closer to the blue line, right?
So that's a clue into what's happening, right? That means the degree distribution, the slope of the red line,
tells us something about the state of the overall network. As you have more congestion in the network, as more
functional failures happen within the network,
this degree distribution is going to be further away from our base,
and that tells you that the slope is going to be an indicator of how much
functional failure is happening in this network. So that's the important insight
we can actually get from this kind of a problem, and we have done this, as I mentioned, on the Shanghai network as well.
Right, so now we can put a theory around this. We can put some
very nice
theory around this using both networks and big data, right? Yeah?
So it is happening mostly at the tail,
that's the whole point,
and there's also a reason why it actually happens at the tail, because
this is primarily being influenced by the major roads in the network, right? So the ring roads, for example,
are going to get more and more congested, so you're going to see that. So you see it's happening
around from here to here, right?
But mainly you can see this difference in this part, because the ring roads are the ones which get severely congested
during the peak hours.
So based on these insights, based upon all of this data analysis, we actually
came up with a very nice
model to predict which roads get congested and, if congestion occurs in some part,
which roads it is going to propagate to, and how much time it will take to recover back to the normal state, right?
So this is the normalized split
probability distribution, that is, whether we're going to split these dual nodes.
This is the PDF of the degree distribution, and this is the
negative exponential distribution of how much time it would take for a split node to recover back to its original
function, right? We can get this empirically from all the data that we actually have, and what we do is we develop a
microscopic and a macroscopic model to determine
which nodes are likely to split and, if a node splits,
which of the neighboring nodes are going to split, right? What's the likelihood that the neighboring nodes split?
And we can characterize this kind of
splitting probability using a Markov chain
process.
And
so there are
four processes in this. One is self-splitting: whether a given node is going to split or not,
right, and it's a function of the traffic load
that is actually going to be there. Then self-contagion,
which is that the congestion is more likely to propagate along the same road, and neighbor contagion,
which is the congestion spreading to the neighboring roads.
We can predict that, and finally the recovery: if a node splits, how long is it going to take to recover
back to the original state that we actually had, right? So there is a very nice,
simple equation to determine the self-splitting rate, the self-contagion rate
and the neighbor contagion rate, and some of these parameters
are estimated from the data analytics that we have done, right, and the insights from that PDF that we actually saw.
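The four processes named here (self-splitting, self-contagion, neighbor contagion, recovery) can be sketched as a discrete-time update of congestion probabilities. The equation from the slide is not reproduced in the transcript, so the update below is a generic mean-field, SIS-style approximation chosen for illustration, not the model presented in the talk. All names (`step`, `recovery_prob`, `alpha`, `beta_self`, `beta_nbr`, `mu`) and all numbers are assumptions.

```python
import math


def recovery_prob(durations, dt):
    """Per-step recovery probability from observed recovery times,
    assuming they follow a negative exponential distribution."""
    mean_duration = sum(durations) / len(durations)
    return 1.0 - math.exp(-dt / mean_duration)


def step(p, load, same_road, cross_road, alpha, beta_self, beta_nbr, mu):
    """One mean-field update of the congestion probability of every segment.

    p: dict segment -> probability of being congested (split) now.
    load: dict segment -> traffic load driving self-splitting.
    same_road / cross_road: dict segment -> adjacent segments on the same road /
        on neighboring roads.
    """
    new_p = {}
    for i, p_i in p.items():
        # Probability that a free-flowing segment stays free this step.
        stay_free = 1.0 - min(1.0, alpha * load[i])      # no self-splitting
        for j in same_road.get(i, ()):
            stay_free *= 1.0 - beta_self * p[j]          # no self-contagion
        for j in cross_road.get(i, ()):
            stay_free *= 1.0 - beta_nbr * p[j]           # no neighbor contagion
        new_p[i] = p_i * (1.0 - mu) + (1.0 - p_i) * (1.0 - stay_free)
    return new_p


# Toy example: segments a and b on one road, c on a crossing road; a starts congested.
p0 = {"a": 1.0, "b": 0.0, "c": 0.0}
load = {"a": 0.0, "b": 0.0, "c": 0.0}
same_road = {"a": ["b"], "b": ["a"]}
cross_road = {"b": ["c"], "c": ["b"]}
p1 = step(p0, load, same_road, cross_road,
          alpha=0.0, beta_self=0.5, beta_nbr=0.2, mu=0.1)
mu_hat = recovery_prob([10, 20, 30], dt=20)  # illustrative recovery times in minutes
print(p1, mu_hat)
```

Iterating `step` over successive time intervals gives a predicted spread and recovery curve that could then be compared with the observed congestion states.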
Right, so once we build this model, we can use it for prediction purposes.
You can actually predict the likelihood of
the spread in this kind of a network, and we have done this for
different time intervals and different loadings. We have done this for weekdays and weekends in December 2015.
This is for the Shanghai network. We can do it with different weights given to those
four parameters that you saw in the previous slide, right, and you can see that
there's a very good agreement with the data.
Right, so you're bringing in the data, you're bringing in networks,
you're bringing in the transportation knowledge, everything into this.
So it's a physics-based model to understand the recovery of road networks, but also to understand where this spread is going to happen, right?
One thing which is interesting is that we have not really put in a lot of
assumptions about how people are going to make their decisions, right? It's not
transportation-heavy in terms of these assumptions, right?
But some things we need to consider are related to the number of lanes, the speed limits, the design standards of the roads.
So those things can be
parameterized and put into the model, and we're thinking about that. But the key takeaway
I want you to
think about is that, because the assumptions are not too heavy on the transportation side,
water researchers are actually developing similar modeling frameworks. In Germany
they're using the same model to understand the recovery of pipe networks and sewer tunnels, right, because of the modeling philosophy:
there's a flow, there is a function,
there's a structure in both pipe networks and sewer networks.
So you can extend these concepts to understand the failures in both, and so on.
Okay
So that's the end of part two of my presentation, and part three is
about some
advertisements about work that we actually have done in this
data science space on smart mobility.
What I want to do is just spend a minute or two on each project that we have done,
and if anyone is interested to know more about a specific research question, just raise your hand, okay,
and then I'll be happy to expand upon that in more detail. So we
have worked with different kinds of data sets. We have worked with the New York City taxi data set
extensively,
right? We have worked with a couple of data sets in the U.S. from taxi GPS data, data sets from
about six cities in China, again taxis, and a lot of social media data sets in urban areas
but also in disaster conditions.
We have used mobile phone data,
right, we have again used transit smart card data and so on. So I'll just show a sample of some of these data
sets. Some of these data sets have geolocation, the places where you are,
which tells us information about the activities people participate in, but they do not have
socio-economic characteristics,
due to privacy concerns. And some of this data that we have collected from social media, like Foursquare, has
missing activities. So, for example, social media check-in data
does not have all the activities, because people don't always check in.
This is the taxi data right you can do simple visualizations of this tax evader
Most of you may not play with this kind of idea before right, so these are the origins of these taxi trips
In this nations of these taxi trips so any one person an analysis with this deal that 70% of trips come from Manhattan
Right hotspots in terms of this airports and so on and so right so you can do this kind of simple visualizations
from this kind of process
One of the questions we
explored early on is estimating travel time using this kind of taxi data,
not just estimating a deterministic travel time, but estimating the entire distribution
of this travel time, right.
So, as
most of you may know, the taxi dataset in New York City does not contain trajectory information.
It only has the pickup and drop-off information; it has the travel distance, the fare,
the type of payment and so on, right, but it does not really have the trajectory. So one of the first questions,
and we did this way back in 2011, right, is to understand how to
predict the distribution for all links using
taxis as probes in this
network. So it's again a good combination of network tools and
data science tools to estimate these
distributions. Without really going into the details,
there are some nice algorithmic
approaches to boost the performance of these
methods, using EM approaches and fast path inference techniques, which we've used in this work. And
if you look at the results, they're pretty good. So this is a part of the Manhattan network, right; you can estimate the
speeds, both the expected value and the standard
deviation of the speeds, right. And we compared this with limited
loop detector data on some limited links that we actually have, and the agreement is about 85 to 90 percent.
Right there you can see that the average speed during peak hour is about eight miles per hour on most of these links.
Right, so you can do this kind of
analysis.
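To make the idea of using taxis as probes concrete, here is a toy sketch in Python. It is not the EM and path inference method from the work described; it is a much simpler iterative allocation heuristic in the same spirit, and it assumes the path of each trip (a list of link IDs) has already been inferred. The function name and the trip format are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

def estimate_link_times(trips, n_iter=200):
    """Estimate per-link travel times from trip-level observations.

    trips: list of (link_ids, total_travel_time_seconds), with positive times.
    Returns (mean_time_per_link, spread_per_link).
    """
    links = {link for path, _ in trips for link in path}
    est = {link: 1.0 for link in links}  # uniform starting guess
    alloc = {}
    for _ in range(n_iter):
        alloc = defaultdict(list)
        # Allocation step: split each observed trip time across its links
        # in proportion to the current link estimates.
        for path, total in trips:
            weight = sum(est[link] for link in path)
            for link in path:
                alloc[link].append(total * est[link] / weight)
        # Update step: the new estimate is the average allocated time.
        est = {link: mean(alloc[link]) for link in links}
    # Spread of the allocated times, a crude stand-in for a distribution.
    spread = {link: pstdev(alloc[link]) for link in links}
    return est, spread
```

For example, with one trip that uses only link A in 60 seconds and another that uses A and B in 180 seconds, the estimates settle at 60 seconds for A and 120 seconds for B. Dividing link length by these times would give speeds like the ones on the map.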
Then we have also done
one very interesting
study of these taxi markets, to understand how taxi markets
can be modeled, right, with the introduction of the app-based services, Uber, Lyft, Didi and so on, and the traditional taxis.
They're all in the same market, right?
What is the effect of these new entrants on the traditional taxi markets?
What is the effect of surge pricing in terms of the choices that people make, right?
How do you model this kind of a market and the effects on
utilization in this market? So this is the taxi market equilibrium with these app-based taxis
incorporated into this kind of a service, and we tested this
using limited data from, again, New York City and a few cities in China, Wuhan and Shanghai, in this work.
So you can model this as a game, right, where there's competition between
drivers. TTS is the traditional taxi system, ATS is the
app-based taxi system, and they are all competing for the passengers. The surge price
impacts the utility of the driver and the passenger, right, and
we can model this as a leader-follower kind of game, right.
There are three different kinds of games, so you can have this
game one with the driver as the leader,
with an ATS driver as the leader and the passenger being a selfish passenger
who is trying to minimize his or her cost in this kind of a network.
So we can formulate the equilibrium properties of this game.
You can show some properties about solution existence in these kinds of
leader-follower games, right. The hard part of this is how to solve these kinds of games, right.
How do you solve these kinds of games
efficiently is something which is a little bit challenging.
So you can compute the equilibrium for these kinds of games,
right; the equilibrium is the utilization by these different types of passengers in this kind of a game,
and
you can look at the effect of the surge price.
So you can see the surge price on the x-axis and the total passenger cost on the y-axis.
You can see the effect of competition, right. So
the red is the app-based taxi, the green is the traditional taxi, and then this is the number of passengers and the cost.
Here, as the surge multiplier on the ATS is
decreasing, right, more people are more likely to
actually move towards the ATS. You can see that kind of competition
playing out in this kind of system.
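The leader-follower structure can be illustrated with a toy example in Python. This is only a sketch under strong assumptions: the follower (passenger) chooses between the two services with a simple logit rule on cost, and the leader (the app-based side) picks the surge multiplier from a grid to maximize fare revenue. All numbers, names and the logit form are hypothetical and are not calibrated to any data or taken from the model in the talk.

```python
import math

def app_share(surge, base_fare=10.0, trad_cost=20.0, sensitivity=0.5):
    """Follower: share of passengers choosing the app-based taxi.

    The passenger compares the surged app fare with a fixed traditional
    taxi cost; a higher surge lowers the app-based share.
    """
    app_cost = surge * base_fare
    return 1.0 / (1.0 + math.exp(sensitivity * (app_cost - trad_cost)))

def best_surge(candidates, demand=100.0, base_fare=10.0):
    """Leader: pick the surge multiplier that maximizes app-based revenue,
    anticipating how passengers respond."""
    def revenue(surge):
        return demand * app_share(surge, base_fare=base_fare) * surge * base_fare
    return max(candidates, key=revenue)

# Grid of surge multipliers from 1.0 to 3.0 in steps of 0.1.
candidates = [1.0 + 0.1 * k for k in range(21)]
```

In this toy setup the two services split the market evenly when the surged fare equals the traditional cost (a multiplier of 2.0 with these made-up numbers), which is the kind of tipping point the plots show. A real model would also need driver behavior, waiting times and fleet utilization.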
One result which is important: we find that a surge price multiplier of 1.2 gives the most
competitive
market. This is the tipping point, really, where people move from the app-based to the traditional taxis.
Recently there's a paper by Uber with some economists, right, where they have used a lot of Uber data
to understand this question.
And this number is exactly what they found; 1.2 is exactly what they got from all of their
analysis of the surge pricing data that they have. And this is published, I think, in the North American Economic Review,
with some economists, that they have actually done.
So last week Uber
introduced a new system,
right, where
you actually need to walk a little bit, wait a little bit, and then you get a group ride in terms of
using Uber, right. Uber just launched that last week. The
same idea is there in this paper,
and this was done a few years back, in terms of the optimal assignment and
incentive design in taxi group rides, right.
This is exactly ride sharing,
but this is taxi group rides, actually.
New York City had this, I think, in 2010 with the traditional taxis: there were various stops where people could wait
and they would get a group ride, right, and they abandoned that, I think, in six months or a year.
So what are the locations where people can pool together,
right, wait for a taxi and go together to some common destination? Let's say they all go to the same destination, but they may have to
walk
a couple of blocks. The motivation for this research question is this evidence that we actually have from New York City:
83% of passengers can find at least one group ride
if they walk a
quarter of a mile and wait for two and a half minutes.
That means we can eliminate 83% of individual taxi rides
if people are just willing to wait, right,
two and a half minutes and walk a little bit of a distance, a couple of blocks, right. This evidence actually helps us to
model this kind of incentive
mechanism for doing this kind of group ride. So there are interesting network problems here as to how do you group these rides,
at what locations do you do that,
how do you determine the price of this group ride.
Right, so all of that is there in this kind of research work that we actually have done
two years back.
The problem is also interesting from a mathematical point of view because
these problems are very hard
on bipartite graphs, and so solving them at very large network sizes becomes very hard.
So we actually did develop some heuristics to solve this, and you can see our
exact algorithm and the greedy heuristics, and the heuristics actually give pretty good performance
for these kinds of problems.
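A greedy grouping heuristic of the general kind mentioned here can be sketched in Python. This is a simplified illustration, not the algorithm from the work: it only pairs two passengers, it treats the straight-line distance between pickups as the walk, and the request format (`x`, `y` in miles, `t` in minutes, `dest` as a zone ID) is hypothetical. The default thresholds are the quarter mile and two and a half minutes quoted above.

```python
import math

def can_share(a, b, max_walk_miles=0.25, max_wait_min=2.5):
    """Two requests are compatible if they share a destination zone and
    are close enough in space and time."""
    walk = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    return (a["dest"] == b["dest"]
            and walk <= max_walk_miles
            and abs(a["t"] - b["t"]) <= max_wait_min)

def greedy_pairs(requests):
    """Greedy heuristic: scan requests in time order and pair each one
    with the first compatible request that is still unmatched."""
    order = sorted(range(len(requests)), key=lambda i: requests[i]["t"])
    matched, pairs = set(), []
    for pos, i in enumerate(order):
        if i in matched:
            continue
        for j in order[pos + 1:]:
            if j not in matched and can_share(requests[i], requests[j]):
                pairs.append((i, j))
                matched.update((i, j))
                break
    return pairs

def share_with_group_ride(requests):
    """Fraction of requests that have at least one compatible partner."""
    found = sum(
        any(can_share(r, s) for k, s in enumerate(requests) if k != i)
        for i, r in enumerate(requests)
    )
    return found / len(requests)
```

The last function is the kind of count that sits behind a statement like "a given share of passengers can find at least one group ride"; the greedy pairing is fast but, unlike an exact matching, it is not guaranteed to find the largest possible set of pairs.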
Okay, all right, so
I'll just quickly talk about some ongoing work in the last few months.
So we have been collecting data from Uber.
I mean, I gave a talk at Uber a few years back, and at Lyft a few years back as well.
We had conversations about data sharing, but really we did not get any data
specifically useful for research and so forth, right.
So what we decided finally was that we were going to collect the data by ourselves from the Uber app, and so what we did was,
a group of students and myself, we actually did
develop a hack into the Uber app and we put in some
virtual stations. Let me see if it's there.
Yeah.
So we put in some virtual stations, data collection stations, as you can see, all over Manhattan, right. We put in
more than 470 stations. We collected this data from April 7th to May 2nd
from various Uber IDs, so we have about
100 GB of data per day from Uber.
Right, so we have a lot of data from Uber which we have been analyzing. So this is all data
which tells us, again, information about pickup and drop-off, right, the fare with the surge multiplier, and
location and so on. Right, so we will have done a lot of analysis with this data after the summer.
Right, so if you're interested, keep an eye out for this, right. I'll talk about that in a future presentation sometime.
Okay, we have also done a lot of work with social media data,
with Twitter data, right. So we have used this data for
understanding activity patterns in cities, doing urban activity pattern classification.
Right, so this is something we actually published in Part C.
We have used this for disaster analytics, to understand sentiment in disasters, to see what concerns people have at what time and how
these concerns are evolving with time.
Right, so this Oklahoma tornado is one example. We have also done this with Hurricane Sandy.
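As a rough illustration of tracking sentiment over time during a disaster, here is a minimal Python sketch. It assumes a tiny made-up word list and tweets given as (hour, text) pairs; this lexicon-counting approach is a stand-in chosen for brevity, not the method used in the research.

```python
from collections import defaultdict

# Toy lexicons, assumed purely for illustration.
NEGATIVE = {"scared", "damage", "trapped", "lost"}
POSITIVE = {"safe", "thanks", "help", "relief"}

def sentiment_by_hour(tweets):
    """Average sentiment score per hour.

    tweets: iterable of (hour, text). Each tweet scores +1 per positive
    word and -1 per negative word.
    """
    scores = defaultdict(list)
    for hour, text in tweets:
        words = text.lower().split()
        score = (sum(w in POSITIVE for w in words)
                 - sum(w in NEGATIVE for w in words))
        scores[hour].append(score)
    return {hour: sum(v) / len(v) for hour, v in sorted(scores.items())}
```

Plotting the returned values against the hour gives a simple picture of how concerns evolve as the event unfolds.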
Inferring land use: so we have used the Twitter and Foursquare
data for inferring land use. This is the actual land use pattern in New York City,
and this is the inferred land use from both the supervised and the unsupervised learning
algorithms that we have developed to predict land use.
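The supervised side of land use inference can be illustrated with a very small nearest-centroid classifier in Python. This is a sketch under assumptions: each zone is described by a hypothetical feature vector (here, the share of check-ins in food, office and home categories), and the classifier is a deliberately simple stand-in, not the learning algorithms developed in the work.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(labeled_zones):
    """labeled_zones: {land_use_label: [feature vectors of known zones]}."""
    return {label: centroid(rows) for label, rows in labeled_zones.items()}

def predict(model, features):
    """Assign the label whose centroid is closest in squared distance."""
    def sq_dist(label):
        return sum((a - b) ** 2 for a, b in zip(model[label], features))
    return min(model, key=sq_dist)

# Made-up training zones: [food share, office share, home share].
training = {
    "commercial": [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1]],
    "residential": [[0.2, 0.1, 0.7], [0.1, 0.1, 0.8]],
}
model = fit(training)
```

Comparing the predicted labels with the actual land use map, zone by zone, is what produces an accuracy figure for the inferred map.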
And of course we have done a lot of work on social networks and disasters, starting from Hurricane Sandy. More recently
I've been collecting data from Hurricane Matthew and
Harvey, right. So for Hurricane Sandy we have about 52 million tweets, almost all the data,
from 30 million users, from October 14 to November 12,
2012. Right, so we have been building the social network. The fundamental question
we are interested in here is how does influence work?
How do
the people that you are connected to in a social network
influence the decisions that you actually make, right, on a day-to-day basis: whether you evacuate or not,
what time you evacuate, what mode you choose? So how does your social network, who you are connected to, the
quality of those connections,
how does that influence the decisions that you actually make and the risk you actually take, right? And that's what really we are trying to
understand
in some of this research.
I think I'm almost out of time. I think I see it's 3:22.
Okay, so this is all the autonomous vehicle stuff which we are doing, so I'll skip that.
We are developing a roadmap for autonomous vehicles for the state of Indiana.
Okay, so this is my final plug for you guys. So this is a book which is just coming out.
This is Transportation Analytics in Big Data, right, so this is just being released.
I think this month or next month it is going to be out, so if you're interested in knowing some of the recent
emerging
trends in this area you can look at this.
But I'm also a series editor for this Elsevier book series on smart mobility,
so we are putting together a few
books in this series this year and next year. I see a few experts in this room who could be potential
editors for these books. I'm going to talk to them after this presentation, right.
And with that,
my key takeaway, and I see some students here, right, my key
takeaway if you're interested in this area is you need to be comfortable with the unknown.
Right, you need to be comfortable with
really getting your hands dirty and messy with this kind of data, right. Most traditional transportation
students do not have training in handling this kind of data, because either they are not familiar with
machine learning and data mining, right, or they're not familiar with coding skills, or they are not very good at
fundamental math in networks and
optimization. So if you are really interested in this, you need to
really learn those kinds of skills, right. So one thing which I always
say in our meetings is that our curriculum needs to be upgraded
to actually handle these challenges. We cannot just keep talking about four-step planning processes. We cannot just talk about
traditional safety analysis and so on.
Our transportation curriculum should adapt and should update to some of these recent trends.
Right, and I think that's something which we will continue to do, but students, you're much smarter than most of us,
so I really encourage you to be creative and go out of the box to
learn this kind of skill set. So thank you very much.
So obviously, I mean, this is something we saw with the presidential
elections, there's a big impact, but how do we know?
Where is the control? Say, you know, people
were not using social networks. How do you understand, I mean, what is the real effect of these social networks?
How do we design an experiment that tells us the
real impact? Because you need a control in general.
You don't have that in this kind of study because you're not doing a controlled experiment, so what do you do?
Right, that's a very valid question, sure, right, because we really
don't have a
good control group to see exactly
how
much accuracy there is in this kind of information. I think one way to do the control group,
and I have done that in a different context, but not in this social network context,
is to do experimental games.
That means in a
laboratory setting you create this social network, so you ask people to interact with other people and
see the evolution of the social networks,
see what kind of information they actually communicate in a more controlled setting, and
design incentives to understand these dynamics.
Right, and then see how the same or
similar groups with similar characteristics would perform in a virtual setting.
Right, for example, you might get their Twitter IDs and so on; in some of our surveys
we actually ask people to give them if they want to share them, right.
So you can observe their behavior in a virtual setting in a much larger
social network and see whether the inferences that you actually get from a controlled setting would match with the virtual setting.
We haven't done that.
And I don't know anyone who actually has done that. Yeah, I mean, the same question comes up, and it would be good research,
but I mean it applies to everything in big data, right, because you get the data
but it's not really controlled experiments. Sure, but I think for other kinds of big data, like the taxi data again,
I don't think we need a control.
Right, yeah, these are observations.
They don't necessarily address the factors that
produce that behavior, and that very much determines how you can
predict four years from now based on what you're observing today, if you don't know the factors that affect that behavior
and are significant. So there is a limit, which is that these
exercises or analytics are very targeted; they are targeted to the near term,
to what can I do to change something, or maybe to understand what is happening, right. But the near term, however,
limits the possible solutions that one can intervene with if you want to make changes.
What you know
That's still good, it's not good
So the way I would answer that question is that
it's true that, yeah, the big data does not have
some of the indicators that we actually look for in making long-term forecasts in transportation planning. For example,
we don't have indicators such as
income, demographics, the family structure, all that kind of information. But
the point is that we perhaps don't need all of that kind of information, because
if you go back to the 1980s and look at the ten-year forecasts,
pretty much all of them are wrong by
seventy to a hundred percent. But that's because they do not include the
political and institutional aspects.
But no one knows the political and institutional aspects, and that also makes it more difficult to predict.
My
response is that all of these long-term forecasts that people make
are not accurate at all. I would like to see this being applied, right, to see
how the financial
investment is being used, where the money is going, right. In
transportation, I don't know of anything that is actually being done, but in
banking and finance there are some people who are actually looking at that.
They're using this to see where money is flowing,
what types of
investments are
hot on a given day and so on, right. But to the earlier question,
I think, I mean, when you have this volume of data
you can correlate it with
macro-level indicators. For example, when you're doing planning, you can correlate it with the GDP growth.
You can correlate it with land use changes.
You can correlate it with the transit density that is there in a given zone, so these macro indicators, but not
individual-level indicators, because we don't have that in most of this data.
But my point is that any long-term forecast that we actually make is not precise. It's not really accurate.
Yeah, I understand that, but basically we're talking about the existing behavior, and not predicting it, I think.
Over time you accumulate data also, so I think you can learn; there's an aspect of learning in these kinds of problems.
But any prediction, right, and that is the important thing, any prediction is
going to fall flat on its face when you actually go and check, because it is very hard to predict. We really
cannot predict some of these unknowns, the unknowns being political changes and technological changes that we
cannot predict. We can learn some things from historical data, but we can't put in the unknowns that are there, so there's uncertainty and stochasticity.
So for those of you who are interested in that, there is a professor here,
Nassim Taleb,
who wrote the book called The Black Swan, okay. So the Europeans thought swans are all white.
Yes, those are events that we cannot put in there. We haven't seen them; we cannot predict the
black swan, so predicting the very long term is really difficult. That's right, but the point I was trying
to express is that the big data
enriches our knowledge of the existing processes, but beyond that it is limited in supporting
a decision for the future.
That's really limited on four step process
I'm not talking about three very interesting
There's some legal use the application is that this is limited Mounties also also new possibilities
And I mean, I think with all this data, the limit is essentially what you put in the model,
and what you cannot put in the model is some of these black swans, things like that.
So the model is only as good as what you actually put into it, and the nice thing about this big data is you have
the evolution, you have the longitudinal data, you have the
dynamics associated with how these things are changing, and
Those are very useful for short-term forecasts, but also geared up for camp
But certainly not very good if you're trying to forecast ten to forty years out.
There is also a new center coming up at NYU called AI Now, you know, so basically they're looking at the data and
the biases in the data, and what you observe comes with the biases that people have.
So, I mean, it doesn't mean that everything,
everything that you learn, is right.
So, I mean, you learn what you observe, so there are known problems with this big data,
but it's much better than not having anything. And then it's very, very
different from, for example, a cosmologist or someone
observing nature, because this has its own dynamics; here there are human beings, policy and so many things. So, I mean,
it's very complex. It's a very complex system. Yeah, I mean, just to add to this discussion,
I think one of the advantages of having that big data, even for long-term planning, is
you can be more adaptive, right, so that you can be more responsive.
You can look at all the examples on your slides; you showed how it can be
used to
model the land use patterns. And so if we follow the way that we're operating right now,
we take a very long time to adjust our land use code.
So, for example, currently the trends for parking in households in New York City are shifting very drastically,
but then the city government doesn't always respond as readily to that. But if we're making use of big data,
we could make those kinds of adjustments, and so I think there are advantages to making use of it.
And to the point about these forecasts for the future as well,
you were saying how there are these big,
big
confidence intervals in the predictions, and there have been studies that looked in hindsight into all these
studies over the years, and I think they found that
predictions for, like, traffic demand tend to be
more unbiased, and off by
around, like, twenty or forty percent or something like that, whereas for rail projects there's a lot more lobbying
going on and
politics involved, it's an institutional
kind of
system in place because
Remember in their signature course yes, so I think that's something that we we do need to work on better
To Joe's point, I mean,
today most of the decisions are also tactical, so this kind of data allows you to make those
tactical decisions, especially in a country like the United States, because
new things will not be built that often. So basically tactical and operational decisions are very important, and it allows you to make those operational decisions very
effectively. I mean, if you go to China, India or Turkey, things are changing so fast, like, you know,
We know model. You are no available were probably as much as as long as you put in the
Mass was I?
I mean, I think there's a lot of discussion,
but I think some things that I have been thinking about, and I think we have been trying to do, is
to
introduce at the undergraduate level itself, like, a one-credit course
related to smart mobility, which is taught by a team of faculty
who are working on these kinds of problems, and showing the students that
transportation and civil engineering is not just about
doing concrete design or doing
highway design or pavement design,
but it's beyond that, and you know, there are all these emerging things.
I think it's showing them the excitement that is there right now in
smart mobility
research, right, and so we are doing that at the introductory level. Another initiative:
at the graduate level we have actually developed a couple of new courses, and I'm teaching one of them, this smart mobility course,
where we actually do introduce, we tie it in, it's not an independent course
but it builds upon the network analysis course which we already have, and it ties in with
how these concepts are related to solving some of the problems.
And teaching them some of this combination during the night
So these are some of the things that will actually fight for the terminal, but yeah, not for things we have been thinking about
Any other questions? So with that, thank you again.
