
Research News

A Smarter Way to Protect Machine Learning from Tampered Data

August 5, 2025

A new technique can strengthen machine learning models against poisoning attacks and outperforms existing defenses


 

As machine learning systems become integral to industries from healthcare to cybersecurity, their vulnerability to training-time attacks has become a growing concern. One of the most insidious of these attacks is poisoning, in which an attacker subtly modifies the training data to degrade model performance. One common trick is to flip the labels of data points — telling the system that spam is not spam, for example — so it learns the wrong patterns.

 

A team of researchers from Khalifa University and the University of Milan has developed a new defense strategy to fight this. Instead of training one big model, they split the work among several smaller models, called an ensemble. But rather than splitting the data randomly, each data point is assessed individually for its susceptibility to attack and then routed appropriately.
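Conceptually, the ensemble trains one small model per disjoint partition of the training data and combines their votes at prediction time. The sketch below illustrates that idea with a toy nearest-centroid classifier; the `CentroidModel` class and the function names are hypothetical stand-ins for illustration, not the models or API used in the paper:

```python
import numpy as np

class CentroidModel:
    """Tiny stand-in classifier (predicts the nearest class centroid).
    Used only to illustrate ensemble partitioning."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        # Distance from each sample to each class centroid; pick the closest.
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=-1)
        return self.classes_[d.argmin(axis=1)]

def train_ensemble(X, y, assignments, n_models):
    """Train one model per disjoint data partition, as given by `assignments`."""
    return [CentroidModel().fit(X[assignments == m], y[assignments == m])
            for m in range(n_models)]

def predict_majority(models, X):
    """Combine the ensemble by majority vote over per-model predictions."""
    votes = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```

The key point is that each model only ever sees its own slice of the data, so a poisoned point can corrupt at most the models it is routed to.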

 

Prof. Ernesto Damiani and Dr. Chan Yeob Yeun, from Khalifa University’s Center for Cyber-Physical Systems (C2PS), together with Nicola Bena, Claudio Ardagna and Marco Anisetti from the University of Milan, published their system in .

 


“Machine learning models can be tricked by poisoned data. Our method checks which data points might be poisoned and reroutes them to protect the system. It’s a simple idea that makes machine learning much more secure.”

Prof. Ernesto Damiani, Khalifa University.

 

The system uses three signals to spot suspicious data: how close the data is to the decision boundary, whether it looks different from its neighbors, and how far it is from typical examples of its class. If a data point looks risky, the system can either spread it thinly across models or send it all to one model to contain the damage.
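The three signals and the routing decision can be sketched roughly as follows. This is an illustrative approximation, not the authors' exact formulas: the function names `poisoning_risk` and `route`, the equal weighting of the three signals, and the 0.5 risk threshold are all assumptions made for the example, and the routing shown implements the "contain in one model" option:

```python
import numpy as np

def poisoning_risk(X, y, margins, k=3):
    """Score each point in [0, 1] with three hedged risk signals:
    boundary proximity, neighbor label disagreement, and distance
    to the point's own class centroid."""
    n = len(X)
    # Signal 1: closeness to the decision boundary.
    # `margins` holds each point's distance to the boundary; small margin -> high risk.
    s1 = 1.0 - margins / (margins.max() + 1e-12)
    # Signal 2: does the point look different from its neighbors?
    # Measured as the fraction of k nearest neighbors carrying a different label.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    s2 = (y[nbrs] != y[:, None]).mean(axis=1)
    # Signal 3: how far the point sits from typical examples of its class,
    # i.e. its distance to the class centroid, normalised per class.
    s3 = np.zeros(n)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        dist = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        s3[idx] = dist / (dist.max() + 1e-12)
    return (s1 + s2 + s3) / 3.0

def route(scores, n_models, threshold=0.5):
    """Route each point to an ensemble member: high-risk points are all
    sent to a single 'quarantine' model (index n_models - 1) to contain
    the damage; the rest are spread round-robin over the other models."""
    assign = np.empty(len(scores), dtype=int)
    spread = 0
    for i, s in enumerate(scores):
        if s >= threshold:
            assign[i] = n_models - 1
        else:
            assign[i] = spread % (n_models - 1)
            spread += 1
    return assign
```

A point flagged by all three signals — near the boundary, disagreeing with its neighbors, and far from its class centroid — ends up quarantined in one model, so even if it is poisoned it can degrade only that member of the ensemble.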

 

Tests showed this method made machine learning models more resistant to attacks, especially as the fraction of poisoned data grew. On certain datasets, it outperformed older defenses that rely on purely random data distribution. It worked best when the suspicious data was spread out evenly; in cases where poisoned points clustered together, its benefits were more limited. Even so, the technique runs quickly and doesn’t require removing any data outright, which makes it practical for real-world use.

 

The approach is also efficient and scalable: even as dataset sizes increased, processing times grew only linearly, and the method remained faster than many existing filtering techniques.

 

As adversarial machine learning threats evolve, the study demonstrates that proactive, risk-aware training processes can offer a powerful defense — shifting the paradigm from random redundancy to intelligent resilience.

 

Jade Sterling
Science Writer