Potemkin Villages in Deep Learning

I have received a few messages asking what exactly I meant by Potemkin village models (for everyone who missed it: Fundamental Research). The term Potemkin village originates in Russia, where Grigory Potemkin, a statesman and favorite of Catherine II, allegedly built fake portable settlements solely to impress the Empress during her travels. Today the term refers to any construction built to deceive others into believing a situation is better than it really is. This describes almost all Deep Learning models, whether we are talking about language translation systems, image labeling applications, or voice assistants.

With the discovery of Adversarial Attacks, it has become clear that current neural networks learn shallow decision boundaries rather than the actual underlying concepts. Research has also shown how easy it is to manipulate deployed, state-of-the-art image recognition and voice control systems: a Deep Learning model can be fooled by adding carefully crafted noise to its input, so subtle that humans cannot perceive any difference. Imagine a voice assistant in your home. These weaknesses can be exploited to embed commands in, for example, manipulated music files. You won't hear a difference, but your assistant will happily process the hidden messages.

Although we are aware of Adversarial Attacks, we are not yet able to reliably protect models from being exploited. That may not be a huge security issue for many current applications, but what if we are talking about road sign detection in autonomous driving? What about healthcare applications? Or Google's newly introduced voice assistant that will make calls and manage appointments for you? Would you trust a system that is vulnerable in a way we cannot prevent yet?
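To make the "carefully crafted noise" concrete, here is a minimal sketch of the fast gradient sign method, one of the simplest ways to generate such perturbations. It assumes a hypothetical PyTorch image classifier `model` and an input batch `x` with labels `label`; it is meant as an illustration of the idea, not as the exact attack used in the papers listed below.

```python
# Minimal fast-gradient-sign sketch, assuming a hypothetical PyTorch classifier.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Return an adversarial copy of x that looks unchanged to a human."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Nudge each pixel by +/- epsilon in the direction that increases the loss.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

A perturbation bounded by a small epsilon per pixel is typically invisible to us, yet it is often enough to flip the model's prediction, which is exactly the gap between the decision boundaries the network learned and the concepts we assume it learned.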

More resources:

  • Carlini, N. & Wagner, D. (2018). Audio Adversarial Examples: Targeted Attacks on Speech-to-Text. arXiv preprint arXiv:1801.01944.
  • Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z. B., & Swami, A. (2016). Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples. arXiv preprint arXiv:1602.02697.
  • Papernot, N., McDaniel, P., Sinha, A., & Wellman, M. (2016). Towards the Science of Security and Privacy in Machine Learning. arXiv preprint arXiv:1611.03814.
  • Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.