My research is on the mathematical foundations of data science, particularly the design and analysis of algorithms that are provably effective at extracting information from the available data under resource constraints.
In the big data regime, surprisingly, many of the problems we are interested in are actually ill-posed in the classical sense, because much of the data we collect is very messy and contains many missing entries. The number of parameters we want to recover can sometimes be much larger than the number of observations we have.
To solve such ill-posed problems, we have to exploit prior information about the data. Fortunately, much high-dimensional data has interesting low-dimensional structure that we can exploit.
My research focuses primarily on designing efficient data representations that build such structures into algorithms, and then on establishing provable performance guarantees for those algorithms.
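To make the idea concrete, here is a minimal toy sketch (not the specific algorithms from this research) of how a low-dimensional prior makes an underdetermined problem solvable: with far fewer observations than unknowns, plain least squares is hopeless, but if the signal is known to be sparse, iterative soft-thresholding (ISTA, a standard textbook method) recovers it well. All sizes and parameters below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined problem: 40 observations, 100 unknowns,
# but the true signal has only 4 nonzero entries.
m, n, k = 40, 100, 4
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

# ISTA: a gradient step on ||Ax - y||^2, followed by soft-thresholding,
# which is where the sparsity prior enters the algorithm.
step = 1.0 / np.linalg.norm(A, 2) ** 2
lam = 0.05
x = np.zeros(n)
for _ in range(2000):
    g = x - step * A.T @ (A @ x - y)
    x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)

# Relative recovery error; small despite m < n, thanks to the prior.
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```

The key design point is that the prior is encoded directly in the iteration (the thresholding step), rather than bolted on afterward.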
Our research has led to new algorithms for several imaging modalities that are important in science and engineering applications.
One example is an imaging technology called super-resolution fluorescence microscopy. Together with our collaborators, who are the domain experts, we developed new algorithms that achieve much higher resolution, in both space and time, from the same available data, while using fewer computational resources.
The other example is phase retrieval, another interesting imaging modality with many applications, for example in crystallography and astronomy. We have designed new algorithms for the phase retrieval problem that are much more scalable and much more robust than existing solutions.
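For readers unfamiliar with the problem: phase retrieval asks us to recover a signal from intensity-only measurements, where the sign (or phase) information is lost. Below is a toy real-valued sketch, again not our actual algorithms, using two standard textbook ingredients: a spectral initialization followed by gradient descent on the intensity loss (a Wirtinger-flow-style iteration). All dimensions and step sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Recover x from phaseless measurements y_i = (a_i^T x)^2.
n, m = 20, 200
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
y = (A @ x_true) ** 2

# Spectral initialization: top eigenvector of (1/m) * sum_i y_i a_i a_i^T,
# scaled to the estimated signal norm sqrt(mean(y)).
Y = (A.T * y) @ A / m
_, V = np.linalg.eigh(Y)
x = V[:, -1] * np.sqrt(y.mean())

# Gradient descent on f(x) = (1/4m) * sum_i ((a_i^T x)^2 - y_i)^2.
step = 0.1 / y.mean()
for _ in range(3000):
    Ax = A @ x
    grad = (A.T @ ((Ax ** 2 - y) * Ax)) / m
    x = x - step * grad

# A global sign flip is unrecoverable from intensities, so compare up to sign.
err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
err /= np.linalg.norm(x_true)
print(err)
```

The spectral initialization matters: the intensity loss is nonconvex, and a good starting point is what lets the simple gradient iteration succeed.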
Many of the big data algorithms people use in practice are heuristics. They have been very successful but lack theoretical understanding. Our goal is to understand why such simple heuristics work so well in practice: can we theoretically analyze why they are so successful? Hopefully, our analysis can shed light on how to design new algorithms that improve on their practical success.
I see data science as a truly interdisciplinary field. To do this work, we have to collaborate very closely with domain experts in different engineering and science domains, who actually have the data, and with theoretically minded researchers from many fields, including statistics, mathematics, electrical engineering, and computer science.
