We propose Neutaint, a novel taint analysis tool
based on neural program embeddings.
We obtain values for taint sources and taint
sinks from multiple dynamic execution instances
and train a neural network with these value
pairs.
The learnt neural network model works as a
neural program embedding that approximates
the information flow of the dynamic executions.
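
The data collection and training step described above can be sketched as follows. This is a minimal pure-Python illustration, not Neutaint's actual implementation: `toy_program`, the 4-byte input size, and the use of a single linear layer fitted by plain gradient descent (instead of a real multi-layer neural network) are all assumptions made to keep the example short.

```python
import random

def toy_program(x):
    """Hypothetical program under test: the sink depends only on bytes 0 and 2."""
    return 3.0 * x[0] + 0.5 * x[2]

def collect_pairs(n, size=4, seed=0):
    """Run the program on random inputs and record (source, sink) value pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Source bytes are normalized to [0, 1] before being fed to the model.
        x = [rng.randrange(256) / 255.0 for _ in range(size)]
        pairs.append((x, toy_program(x)))
    return pairs

def train_linear(pairs, epochs=1500, lr=0.2):
    """Fit sink ~ w.x + b by full-batch gradient descent on mean squared error."""
    size = len(pairs[0][0])
    n = len(pairs)
    w, b = [0.0] * size, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * size, 0.0
        for x, z in pairs:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - z
            for i in range(size):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

pairs = collect_pairs(100)
weights, bias = train_linear(pairs)
print([round(wi, 3) for wi in weights], round(bias, 3))
```

In this toy setting the fitted weights recover how strongly each source byte influences the sink, which is the information the trained model is meant to capture.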
To extract taint information from the neural
program embedding, we compute the gradient
of the NN output with respect to the NN input.
The gradient indicates which parts of the taint
sources affect the taint sinks.
In this example, the first byte of X has the
largest gradient value and hence has the
strongest influence on Z.
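
The gradient step can be illustrated with a small sketch. It assumes a tiny one-hidden-layer network whose weights (`W1`, `B1`, `W2`, `B2`) are hand-picked placeholders standing in for a trained model. The helper names `input_gradient` and `hot_bytes` are hypothetical, and the gradient is derived by hand with the chain rule rather than with an autodiff library.

```python
import math

# Hypothetical, hand-picked weights standing in for a trained model.
W1 = [[2.0, 0.1, 0.0, 0.0],
      [1.5, 0.0, 0.2, 0.0]]
B1 = [0.0, 0.1]
W2 = [1.0, 0.8]
B2 = 0.0

def forward(x):
    """Return the network output (sink estimate) and the hidden activations."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    out = sum(w * h for w, h in zip(W2, hidden)) + B2
    return out, hidden

def input_gradient(x):
    """Gradient of the output with respect to each input byte (chain rule)."""
    _, hidden = forward(x)
    grad = [0.0] * len(x)
    for w2, h, row in zip(W2, hidden, W1):
        dtanh = 1.0 - h * h  # derivative of tanh at the pre-activation
        for i, w1 in enumerate(row):
            grad[i] += w2 * dtanh * w1
    return grad

def hot_bytes(x, k=1):
    """Indices of the k input bytes with the largest absolute gradient."""
    grad = input_gradient(x)
    return sorted(range(len(x)), key=lambda i: abs(grad[i]), reverse=True)[:k]

x = [0.2, 0.7, 0.1, 0.9]  # normalized source bytes
print(hot_bytes(x))  # prints [0]
```

With these placeholder weights the first input byte has the largest gradient magnitude, mirroring the example above, while the last byte has zero gradient and would be treated as not affecting the sink.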
We compare the hot byte accuracy and overall
runtime overhead of Neutaint against three state-of-the-art
dynamic taint analysis tools.
On six programs, Neutaint achieves a 10%
average improvement in hot byte accuracy while
reducing runtime overhead by a factor of 40
over the second-best tool, Libdft.
