A model inversion attack is a type of adversarial machine learning attack in which an attacker attempts to reconstruct or infer sensitive information about a model's training data by analyzing the outputs of the trained model.[1][2] Rather than accessing the underlying dataset directly, the attacker queries the model (usually through an API or prediction interface) and exploits patterns in its responses to infer properties of the original inputs.[1] Such attacks rely on the fact that machine learning models encode statistical information about their training data in their parameters and outputs, which can unintentionally leak private or proprietary information.[3]
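As a toy illustration of how prediction outputs alone can leak a sensitive input attribute, the Python sketch below treats a small logistic-regression model as a black box and guesses a hidden binary feature by checking which candidate value makes an observed label most likely. The model, its weights, the feature names, and the functions `query_model` and `infer_attribute` are all hypothetical; this is a simplified sketch rather than the procedure of any cited paper, and real attacks are typically more elaborate (for example, also accounting for how common each candidate value is in the population).

```python
import math

# Hypothetical target model: a fixed logistic-regression "black box".
# The attacker can call query_model() but cannot see these weights.
_WEIGHTS = {"age": 0.03, "smoker": 2.0, "bmi": 0.1}
_BIAS = -5.0


def query_model(record: dict) -> float:
    """Return the model's confidence that the record belongs to class 1."""
    z = _BIAS + sum(_WEIGHTS[name] * record[name] for name in _WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def infer_attribute(known: dict, attr: str, candidates, label: int):
    """Return the candidate value for `attr` that best explains `label`.

    `known` holds the non-sensitive features the attacker already has;
    `label` is the model output the attacker observed for the victim.
    """
    best_value, best_score = None, -1.0
    for value in candidates:
        conf = query_model({**known, attr: value})
        # Confidence the model assigns to the label that was observed.
        score = conf if label == 1 else 1.0 - conf
        if score > best_score:
            best_value, best_score = value, score
    return best_value
```

With `known = {"age": 50, "bmi": 30}` and an observed label of 1, the sketch infers `smoker = 1`, because that value raises the model's confidence in class 1 from about 0.38 to about 0.82.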
Depending on the attacker's level of access to the target model, model inversion attacks can be carried out in both black-box settings (the attacker observes only the model's outputs) and white-box settings (the attacker also has access to the model's internals, such as its parameters).[2] In a typical attack, the adversary submits many queries to the model and uses the responses (e.g., confidence scores or predicted labels) to train a surrogate or inversion model that approximates the inverse mapping from outputs to inputs.[1][4] This can enable the reconstruction of sensitive attributes, such as facial features, medical data, or user behavior patterns, from models trained on such data. The technique has been demonstrated against a range of models, including deep neural networks and other classification systems, and poses significant privacy risks in domains such as healthcare, finance, and biometric identification. Mitigation strategies include restricting model access, reducing the granularity of model outputs, training with differential privacy, and monitoring for anomalous query patterns.[5]
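The query-then-invert procedure, and the effect of reducing output granularity, can be sketched in Python on a deliberately tiny example. The assumptions are strong and purely illustrative: the target is a hypothetical one-dimensional logistic model (`target_model`), the attacker holds auxiliary inputs drawn from a similar range, and the inversion model is a linear least-squares fit on the log-odds of the returned confidence rather than the neural networks used in published attacks. The optional `decimals` argument stands in for the countermeasure of rounding confidence scores; all function names are hypothetical.

```python
import math
import random
from typing import Optional

# Hypothetical target model: a 1-D logistic classifier whose parameters
# are hidden from the attacker, who can only call target_model().
_W, _B = 0.8, -2.0


def target_model(x: float, decimals: Optional[int] = None) -> float:
    """Return the confidence for class 1, optionally rounded (a countermeasure)."""
    conf = 1.0 / (1.0 + math.exp(-(_W * x + _B)))
    return round(conf, decimals) if decimals is not None else conf


def logit(p: float) -> float:
    """Map a confidence back to a log-odds score (clamped to avoid log(0))."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def fit_inversion_model(aux_inputs, decimals: Optional[int] = None):
    """Query the target on auxiliary inputs and fit x ~ a * logit(conf) + c."""
    feats = [logit(target_model(x, decimals)) for x in aux_inputs]
    n = len(aux_inputs)
    mean_f = sum(feats) / n
    mean_x = sum(aux_inputs) / n
    cov = sum((f - mean_f) * (x - mean_x) for f, x in zip(feats, aux_inputs))
    var = sum((f - mean_f) ** 2 for f in feats)
    a = cov / var  # ordinary least-squares slope
    return a, mean_x - a * mean_f


def invert(model, conf: float) -> float:
    """Estimate the input that produced an observed confidence score."""
    a, c = model
    return a * logit(conf) + c


if __name__ == "__main__":
    rng = random.Random(0)
    aux = [rng.uniform(0.0, 5.0) for _ in range(200)]  # attacker's auxiliary data
    inversion = fit_inversion_model(aux)
    leaked = target_model(3.7)  # confidence observed for a hidden input
    print(invert(inversion, leaked))  # prints a value very close to 3.7
```

On this toy model, the inversion fitted to full-precision confidences recovers inputs almost exactly, while rounding outputs to one decimal place collapses them to at most nine distinct values and leaves a noticeably larger reconstruction error, since many different inputs now map to the same output.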
See also
References
1. Fredrikson, Matt; Jha, Somesh; Ristenpart, Thomas (2015). "Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures". Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS). pp. 1322–1333. doi:10.1145/2810103.2813677.
2. Zhang, Jiaqi; Chen, Kai (2021). "Model Inversion Attacks: A Survey". IEEE Transactions on Knowledge and Data Engineering. doi:10.1109/TKDE.2021.3065936.
3. Shokri, Reza; Stronati, Marco; Song, Congzheng; Shmatikov, Vitaly (2017). "Membership Inference Attacks Against Machine Learning Models". IEEE Symposium on Security and Privacy. pp. 3–18. doi:10.1109/SP.2017.41.
4. Yang, Zhengxue; Zhang, Jian; Chang, Eugene; Liang, Yingyu (2019). "Adversarial Model Inversion for Deep Neural Networks". Advances in Neural Information Processing Systems (NeurIPS).
5. Dwork, Cynthia; Roth, Aaron (2014). The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science. doi:10.1561/0400000042.
