Machine learning (ML) systems are a cornerstone of organizational data analysis, driving both innovation and informed decision-making. There is virtually no industry untouched by ML: it has been widely adopted in, and has greatly transformed, manufacturing, retail, healthcare and life sciences, travel and hospitality, financial services, and medicine. Like any other technology, ML is a double-edged sword: the algorithms that drive advancements are also liable to exploitation by adversaries. ML models are powerful yet prone to severe vulnerabilities because of their dependence on data and the lack of standardized security measures. Hence, the primary aim of the OWASP ML Security Top 10 project is to identify the top 10 security issues and explain how to handle them effectively in order to develop high-quality ML-enabled products.
This project provides an overview of the top 10 security issues of machine learning systems, including threats from adversarial attacks. Each vulnerability listing includes a brief description, its potential impact, and a note on ease of exploitation and prevalence in real-world applications.
In what follows, each of the ML Security Top 10 vulnerabilities is briefly described along with its impact, examples, and key mitigation strategies. An outline of SecPod's strategy to handle them is then presented.
ML01:2023 Input Manipulation Attack
Input manipulation is used here in a broad sense that also covers adversarial attacks. The attacker deliberately alters input data to mislead the model, which can result in unintended consequences.
Consider, for example, a deep learning (DL) model trained to classify images into different categories, such as dogs and cats. Suppose an attacker manipulates an original image with small, carefully crafted perturbations that cause the model to misclassify a cat as a dog. The attacker can then use the manipulated image to bypass security measures, log in to the system, and cause damage.
Or take a DL model trained to detect intrusions in a network. An attacker can carefully manipulate packets to evade detection by the intrusion detection system (IDS), allowing the attacker to alter network traffic patterns without being flagged.
Here are some preventive measures to help in mitigation:
- Adversarial training, i.e., training the model on adversarial examples, helps it become more robust to manipulative attacks and reduces the chance of it being misled (a minimal sketch follows this list).
- Input validation is another important defense mechanism that can be used to detect and prevent input manipulation attacks. This involves checking the input data for anomalies, such as unexpected values or patterns, and rejecting inputs that are likely to be malicious.
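To make the adversarial-training idea above concrete, here is a minimal sketch of a fast gradient sign method (FGSM) training step. It assumes PyTorch, a generic image classifier `model`, and an optimizer; the function names and the epsilon value are illustrative, not a prescribed implementation.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.03):
    """Craft an adversarial example by nudging x along the sign of the
    loss gradient (fast gradient sign method)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One-step perturbation, clipped back to the assumed [0, 1] pixel range.
    return (x_adv + epsilon * x_adv.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """Single training step on a mix of clean and adversarial inputs,
    which makes the classifier harder to mislead with small perturbations."""
    model.train()
    x_adv = fgsm_example(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```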
ML02:2023 Data Poisoning Attack
This occurs when an attacker manipulates the training data to cause the model to behave in unexpected ways.
An attacker poisons the training data for a deep learning model that classifies emails as spam or not spam. This can be done by compromising the data storage system and injecting maliciously labeled spam emails into the training data set. Alternatively, the attacker could manipulate the data labeling process by falsely labeling the emails.
Similarly, an attacker could poison the training data used to classify network traffic into different categories, such as email, web browsing, and video streaming. By incorrectly labeling traffic data, the attacker causes the model to make incorrect traffic classifications, leading to misallocation of network resources and/or degradation of network performance.
The following approaches are recommended for mitigation:
- Data validation and verification: ensure the training data is thoroughly validated and verified before the training process. This can be done by implementing data validation checks and employing multiple data labelers to validate the accuracy of the data labeling.
- Secure storage of training data via encryption, with regular monitoring to detect tampering and anomalies.
- Data separation between the training data and the production data.
- Strict access control over the training data to limit who can access it.
- Model validation using a separate validation set to detect any data poisoning.
- Train multiple models using different subsets of the training data and use an ensemble of these models to make predictions, reducing the impact of data poisoning attacks.
- Use anomaly detection techniques to detect abnormal behavior in the training data, such as sudden changes in the data distribution or data labeling. These techniques can catch data poisoning attacks early on (a minimal screening sketch follows this list).
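As a concrete illustration of the anomaly-detection and label-validation ideas above, here is a minimal pre-training screening sketch. It assumes scikit-learn and NumPy are available; the contamination rate, tolerance, and expected class balance are illustrative values, not recommendations.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

def flag_suspicious_rows(X, contamination=0.01, random_state=0):
    """Flag training rows whose features look anomalous so they can be
    reviewed before entering the training pipeline."""
    detector = IsolationForest(contamination=contamination,
                               random_state=random_state)
    labels = detector.fit_predict(X)      # -1 = outlier, 1 = inlier
    return np.where(labels == -1)[0]      # indices to review manually

def label_distribution_drift(y, expected, tolerance=0.05):
    """Report classes whose observed frequency deviates from the expected
    balance by more than the tolerance (a hint of large-scale label flipping)."""
    values, counts = np.unique(y, return_counts=True)
    observed = dict(zip(values.tolist(), (counts / counts.sum()).tolist()))
    return {c: observed.get(c, 0.0) for c in expected
            if abs(observed.get(c, 0.0) - expected[c]) > tolerance}

# Example usage with an expected 90/10 ham/spam balance (illustrative):
# drift = label_distribution_drift(y_train, {"ham": 0.9, "spam": 0.1})
```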
ML03:2023 Model Inversion Attack
It is a type of reverse-engineering attack in which an attacker tries to extract personal information about a data subject by exploiting the outputs of a target ML model.
An example attack scenario is using this technique to steal personal information from an organization's face recognition model. The attacker trains a separate ML model, called an inversion model, on the target model's outputs to predict the input data. This could be done by exploiting a vulnerability in the model's implementation or by accessing the model through an API. From the inversion model's predictions, the attacker can then recover personal information about individuals that the target model was never intended to reveal.
Another example is bypassing a bot detection model in an online advertising platform to promote ads. The advertiser executes this attack by training their own bot detection model and then using it to invert the predictions of the bot detection model used by the online system. The end result is that the advertiser succeeds in automating their advertising campaigns by making their bots appear as human users.
Here are preventive measures that can be taken:
- Limiting access to the model or its predictions to prevent attackers from gaining the information needed for the attack (one way to limit prediction detail is sketched after this list).
- Input validation (check format, range and consistency) to prevent malicious data that could be used to invert the model.
- Model transparency can help to detect and prevent model inversion attacks.
- Regular monitoring of the model’s predictions for anomalies can help to detect and prevent model inversion attacks.
- Regular model retraining to prevent information leakage by incorporating new data and correcting any inaccuracies in the model's predictions.
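One way to limit what the model's predictions reveal, in the spirit of the first bullet above, is to expose only coarse, top-k outputs instead of full probability vectors. The sketch below assumes a scikit-learn-style `predict_proba` callable; the rounding precision and top-k policy are illustrative choices.

```python
import numpy as np

def hardened_predictions(predict_proba, X, top_k=1, decimals=1):
    """Return only the top-k labels with coarsely rounded confidences,
    reducing the signal available to an inversion model."""
    proba = predict_proba(X)                              # shape (n, classes)
    top = np.argsort(proba, axis=1)[:, ::-1][:, :top_k]   # best classes first
    conf = np.round(np.take_along_axis(proba, top, axis=1), decimals)
    return [list(zip(t.tolist(), c.tolist())) for t, c in zip(top, conf)]

# Example: hardened_predictions(clf.predict_proba, X_query) might return
# [[(2, 0.9)], [(0, 0.6)], ...] instead of full probability vectors.
```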
ML04:2023 Membership Inference Attack
This attack occurs when an attacker is able to determine whether a specific data record was part of the model's training data by observing the model's behavior, thereby exposing sensitive information about the individuals in that data.
Consider as an example inferring sensitive financial data from a machine learning model. A malicious attacker wants to gain access to sensitive financial information about individuals. They do this by training their own machine learning model on a dataset of financial records and using it to query whether or not a particular individual's record was included in the target model's training data. The attacker can then use this information to infer the financial history and sensitive information of those individuals.
Techniques to mitigate membership inference include the following:
- Model training on randomized or shuffled data to make it more difficult for an attacker to determine whether a particular example was included in the training dataset.
- Model obfuscation of predictions by adding random noise to make it harder for an attacker to determine the model's training data (a minimal sketch follows this list).
- Regularization techniques such as L1 or L2 to help prevent overfitting of the model to the training data. This can help reduce the model’s ability to accurately determine whether a particular example was included in the training dataset.
- Reducing the training data to help reduce the information an attacker can gain from a membership inference attack.
- Regular testing and monitoring of the model's behavior for anomalies can help detect membership inference attacks by revealing when an attacker is attempting to gain access to sensitive information.
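To illustrate the prediction-obfuscation bullet above, here is a minimal sketch that adds random noise to predicted probabilities before they are returned. It assumes a scikit-learn-style `predict_proba`; the Gaussian noise scale is illustrative and is not calibrated to any formal differential-privacy guarantee.

```python
import numpy as np

def noisy_predict_proba(predict_proba, X, sigma=0.05, rng=None):
    """Perturb and re-normalize predicted probabilities so confidence
    scores leak less about whether a record was in the training set."""
    rng = rng or np.random.default_rng()
    proba = predict_proba(X)
    proba = proba + rng.normal(0.0, sigma, size=proba.shape)
    proba = np.clip(proba, 1e-6, None)        # keep probabilities positive
    return proba / proba.sum(axis=1, keepdims=True)
```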
ML05:2023 Model Theft
This attack occurs when an attacker gains unauthorized access to the model's parameters and can replicate the model for their own use. Suppose a malicious attacker works for a competitor of a company that has developed a valuable machine learning model.
The attacker reverse engineers the company's machine learning model, either by disassembling the binary code or by accessing the model's training data and algorithm. They can then recreate the model and start using it for their own purposes. This can result in significant financial loss for the original company, as well as damage to its reputation.
Here are some ways to mitigate model theft:
- Encryption: Encrypting the model's code, training data, and other sensitive information can prevent attackers from being able to access and steal the model (a minimal encryption sketch follows this list).
- Access Control: Implementing strict access control measures, such as two-factor authentication, can prevent unauthorized individuals from accessing and stealing the model.
- Regular backups: Regularly backing up the model’s code, training data, and other sensitive information can ensure that it can be recovered in the event of a theft.
- Model Obfuscation: Obfuscating the model’s code and making it difficult to reverse engineer can prevent attackers from being able to steal the model.
- Watermarking: Adding a watermark to the model’s code and training data can make it possible to trace the source of a theft and hold the attacker accountable.
- Legal protection: Securing legal protection for the model, such as patents or trade secrets, can make it more difficult for an attacker to steal the model and can provide a basis for legal action in the event of a theft.
- Monitoring and auditing: Regularly monitoring and auditing the model’s use can help detect and prevent theft by detecting when an attacker is attempting to access or steal the model.
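As one concrete form of the encryption measure above, the sketch below encrypts a serialized model artifact at rest using symmetric encryption. It assumes the `cryptography` package is installed and that the key is kept in a secrets manager rather than next to the artifact; the file paths are illustrative.

```python
from cryptography.fernet import Fernet

def encrypt_model(path_in: str, path_out: str, key: bytes) -> None:
    """Encrypt a serialized model file so a stolen copy is unusable
    without the key."""
    with open(path_in, "rb") as f:
        token = Fernet(key).encrypt(f.read())
    with open(path_out, "wb") as f:
        f.write(token)

def load_model_bytes(path_in: str, key: bytes) -> bytes:
    """Decrypt the artifact in memory at load time."""
    with open(path_in, "rb") as f:
        return Fernet(key).decrypt(f.read())

# key = Fernet.generate_key()   # generate once; store in a secrets manager
# encrypt_model("model.pkl", "model.pkl.enc", key)
```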
ML06:2023 AI Supply Chain Attacks
AI supply chain attacks occur when an attacker modifies or replaces a machine learning library or model that is used by an AI system. This can also include the data associated with the machine learning models.
Consider an attack on an important ML project in an organization. The attacker knows that the project relies on several open-source packages and libraries and wants to find a way to compromise it. The attacker executes the attack by modifying the code of one of the packages and uploading the modified version to a public repository, such as PyPI, making it available for others to download and use. When the victim organization downloads and installs the package, the attacker's malicious code is installed along with it and can be used to compromise the project.
This type of attack can go unnoticed for a long time, since the victim may not realize that the package they are using has been compromised. The attacker’s malicious code could be used to steal sensitive information, modify results, or even cause the machine learning model to fail.
To mitigate these risks:
- Verify Package Signatures: Verify package signatures to detect any tampering.
- Use Secure Package Repositories: Use repositories that enforce strict security measures, and keep packages updated to patch vulnerabilities.
- Use Virtual Environments: Use virtual environments to isolate packages and libraries from the rest of the system.
- Perform Code Reviews: Regularly perform code reviews on all packages and libraries used in a project to detect any malicious code.
- Use Package Verification Tools: Use hash checking and signature verification to confirm the authenticity and integrity of packages before installation, for example via pip's hash-checking mode (a hash-verification sketch follows this list).
- Educate Developers: Educate developers on the risks associated with AI supply chain attacks and the importance of verifying packages before installation.
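To illustrate package verification, here is a minimal sketch that checks a downloaded package archive against a pinned SHA-256 digest before it is used; pip's `--require-hashes` mode automates the same idea for requirements files. The file name and the expected digest are hypothetical placeholders.

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file in streaming fashion."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_package(path: str, expected_sha256: str) -> None:
    """Refuse to use an artifact whose digest differs from the value
    pinned when the package was reviewed."""
    actual = sha256_of(path)
    if actual != expected_sha256:
        raise RuntimeError(f"hash mismatch for {path}: got {actual}")

# verify_package("somepackage-1.2.3-py3-none-any.whl", "<pinned sha256>")
```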
ML07:2023 Transfer Learning Attacks
Transfer learning attacks occur when an attacker trains a model on one task and then fine-tunes it on another task to cause it to behave in an undesirable way.
Consider the scenario of training a model on a malicious dataset: an attacker trains an ML model on a malicious dataset that contains manipulated images of faces. The attacker then transfers the model's knowledge to the target face recognition system used for identity verification. As a result, the face recognition system starts making incorrect predictions, allowing the attacker to bypass the security controls and gain access to sensitive information.
The following are some ways to help mitigate transfer learning attacks:
- Regularly monitor and update the training datasets to prevent the transfer of malicious knowledge from an attacker's model to the target model.
- Use secure and trusted training datasets to reduce the risk that manipulated data influences the model.
- Implement model isolation: Implementing model isolation can help prevent the transfer of malicious knowledge from one model to another.
- Use differential privacy: Using differential privacy can help protect the privacy of individual records in the training dataset and prevent the transfer of malicious knowledge from the attacker’s model to the target model.
- Perform regular security audits: Regular security audits can help identify and prevent transfer learning attacks by identifying and addressing vulnerabilities in the system (a post-fine-tuning validation sketch follows this list).
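Building on the points above, one practical check is to validate a model fine-tuned from an external checkpoint against an internally curated, trusted hold-out set before it is deployed. The sketch below assumes a scikit-learn-style `predict`; the accuracy threshold is an illustrative operational choice.

```python
import numpy as np

def validate_transferred_model(model, X_trusted, y_trusted, min_accuracy=0.95):
    """Evaluate the fine-tuned model on trusted, internally labeled data
    and block deployment if accuracy falls below the bar, which can be a
    sign that undesirable behavior was transferred from the source model."""
    accuracy = float(np.mean(model.predict(X_trusted) == y_trusted))
    if accuracy < min_accuracy:
        raise RuntimeError(f"transferred model rejected: accuracy={accuracy:.3f}")
    return accuracy
```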
ML08:2023 Model Skewing
Model skewing attacks occur when an attacker manipulates the distribution of the training data to cause the model to behave in an undesirable way.
An attacker can use model skewing for financial gain. A financial institution uses an ML model to predict the creditworthiness of loan applicants, and the model's predictions are integrated into its loan approval process. The attacker provides fake feedback data to the system, indicating that high-risk applicants have been approved for loans in the past, and this feedback is used to update the model's training data. As a result, the model's predictions are skewed towards treating high-risk applicants as low-risk, and the attacker's chances of getting a loan approved are significantly increased.
This type of attack can compromise the accuracy and fairness of the model, leading to unintended consequences and potential harm to the financial institution and its customers.
To prevent this, developers need to perform the following:
- Implement robust access controls to ensure that only authorized personnel have access to the “ML-Ops” system
- Verify the authenticity and integrity of feedback data received by the system, and reject any data that does not match the expected format.
- Clean and validate the feedback data before using it to update the training data, to minimize the risk of incorrect or malicious data being used.
- Use techniques such as statistical and machine learning-based methods to detect and alert on anomalies in the feedback data, which could indicate an attack (a minimal screening sketch follows this list).
- Regularly monitor the model’s performance and compare its predictions with actual outcomes to detect any deviation or skewing.
- Regularly retrain the model using updated and verified training data, to ensure that it continues to reflect the latest information and trends.
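As one way to detect anomalies in feedback data, as suggested above, the sketch below compares the class balance of an incoming feedback batch against a trusted baseline using a chi-square goodness-of-fit test. It assumes SciPy is available; the baseline frequencies and significance level are illustrative.

```python
import numpy as np
from scipy.stats import chisquare

def feedback_distribution_ok(feedback_labels, baseline_freq, alpha=0.01):
    """Return False when the feedback batch's class balance deviates
    significantly from the trusted baseline, so the batch can be
    quarantined for review instead of updating the training data."""
    classes = sorted(baseline_freq)
    labels = np.asarray(feedback_labels)
    observed = np.array([np.sum(labels == c) for c in classes], dtype=float)
    expected = np.array([baseline_freq[c] for c in classes], dtype=float)
    expected = expected / expected.sum() * observed.sum()  # match totals
    _, p_value = chisquare(f_obs=observed, f_exp=expected)
    return p_value >= alpha

# Example (illustrative baseline): 95% of past outcomes were low risk.
# ok = feedback_distribution_ok(batch_labels, {"low_risk": 0.95, "high_risk": 0.05})
```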
ML09:2023 Output Integrity Attack
In this attack scenario, an attacker aims to modify or manipulate the output of a machine learning model in order to change its behavior or cause harm to the system it is used in.
For example, this attack can be used to tamper with patients' health records. An attacker has gained access to the output of a machine learning model that is being used to diagnose diseases in a hospital. The attacker modifies the output of the model so that it provides incorrect diagnoses for patients. As a result, patients are given incorrect treatments, leading to further harm and potentially even death.
Mitigations can be done in the following ways:
- Cryptographic Techniques: Using cryptographic methods to verify the authenticity of the results (a minimal signing sketch follows this list).
- Secure communication channels: The communication between the model and the user interface should use secure protocols such as SSL/TLS.
- Result Validation: Validation should be performed on the results to check for unexpected or manipulated values.
- Tamper-evident logs: Maintaining tamper-evident logs of all input and output interactions can help detect and respond to any output integrity attacks.
- Regular software updates: Regular software updates to fix vulnerabilities and security patches can help reduce the risk of output integrity attacks.
- Monitoring and auditing: Regular monitoring and auditing of the results and the interactions between the model and the interface can help detect any suspicious activities and respond accordingly.
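As a concrete form of the cryptographic verification mentioned above, the following sketch signs each model result with an HMAC so downstream consumers can detect tampering in transit. It assumes the signing key is shared out of band (e.g., via a secrets manager); the field names are illustrative.

```python
import hashlib
import hmac
import json

def sign_result(result: dict, key: bytes) -> dict:
    """Attach an HMAC-SHA256 signature computed over the canonical JSON
    form of the model's result."""
    payload = json.dumps(result, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"result": result, "signature": signature}

def verify_result(envelope: dict, key: bytes) -> bool:
    """Recompute the signature and compare it in constant time."""
    payload = json.dumps(envelope["result"], sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["signature"])
```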
ML10:2023 Model Poisoning
Model poisoning attacks occur when an attacker manipulates the model’s parameters to cause it to behave in an undesirable way.
Consider a scenario where a bank uses an ML model to identify handwritten characters on cheques in order to automate its clearing process. The model has been trained on a large dataset of handwritten characters, and it has been designed to accurately identify the characters based on specific parameters such as size, shape, slant, and spacing. An attacker either manipulates the parameters of the model or alters the images in the training dataset, effectively reprogramming the model to identify characters differently. The attacker can exploit this vulnerability by introducing forged cheques into the clearing process, which the model will treat as valid due to the manipulated parameters. This can result in significant financial loss to the bank.
Mitigations can be provided in the following ways:
- Regularization: Adding regularization techniques like L1 or L2 regularization to the loss function helps to prevent overfitting and reduce the chance of model poisoning attacks.
- Robust Model Design: Designing models with robust architecture and activation functions can help reduce the chances of successful model poisoning attacks.
- Cryptographic Techniques: Cryptographic techniques can be used to secure the parameters and weights of the model and prevent unauthorized access to or manipulation of these parameters (a digest-based sketch follows this list).
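To illustrate the cryptographic protection of parameters mentioned above, the sketch below computes a digest over a model's weights at approval time and rechecks it before the model is loaded for inference. It assumes PyTorch; where the reference digest is stored and where the check runs in the pipeline are illustrative choices.

```python
import hashlib
import torch

def weights_digest(model: torch.nn.Module) -> str:
    """SHA-256 digest over the model's parameters, computed in a
    deterministic (name-sorted) order."""
    digest = hashlib.sha256()
    for name, tensor in sorted(model.state_dict().items()):
        digest.update(name.encode())
        digest.update(tensor.detach().cpu().numpy().tobytes())
    return digest.hexdigest()

def assert_untampered(model: torch.nn.Module, reference_digest: str) -> None:
    """Refuse to serve a model whose weights no longer match the digest
    recorded when the model was approved for release."""
    if weights_digest(model) != reference_digest:
        raise RuntimeError("model parameters have been modified")
```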
SecPod Strategy to Handle the Top 10 ML Security Risks
Risks associated with AI/ML can be broadly classified into general system-level threats and ML-specific ones, as shown in Figure 1. The former class includes threats from inadequate access control, inadequate authentication, compromised integrity of code and its components, denial of service, and web application threats. The latter includes threats arising from model-, data-, and training-related vulnerabilities. Each of these threats may lead to one or more of the OWASP Top 10 ML vulnerabilities.

The mitigation strategies are proposed as per OWASP guidelines, with each strategy covering one or more vulnerabilities. SecPod proposes a scanning mechanism followed by monitoring to provide mitigation against the OWASP Top 10 ML vulnerabilities. Table 1 below shows 12 important mitigation mechanisms to be adopted and how together they cover the top 10 ML weaknesses.
TABLE 1: Preventive Mechanisms and coverage of Top 10 ML OWASP Vulnerabilities

| Preventive Mechanism/Scanning Rules | ML01 Input Manipulation | ML02 Data Poisoning | ML03 Model Inversion | ML04 Membership Inference | ML05 Model Theft | ML06 Supply Chain | ML07 Transfer Learning | ML08 Model Skewing | ML09 Output Integrity | ML10 Model Poisoning |
|---|---|---|---|---|---|---|---|---|---|---|
| Input Sanitization | YES | | YES | | | | | | YES | |
| Dataset Sanitization | | YES | | | | | | YES | | |
| Robust Training | YES | YES | | YES | | | YES | | | |
| Continuous Re-training | | | YES | | | | | YES | | |
| Model Isolation | | | | | | | YES | | | |
| Model Obfuscation | | | | YES | YES | | | | | |
| Robust Models | | | | | | | | | | YES |
| Cryptographic Techniques | | YES | | | YES | YES | | | YES | YES |
| Monitoring & Auditing | | | YES | YES | YES | | | YES | YES | |
| Anomaly Detection | | YES | | | | | | YES | | |
| Regularization | | | | YES | | | | | | YES |
| Robust Access Control | | YES | YES | | YES | | | YES | | |
The foremost preventive rule is to have adequate protection for validating input data: check the format, range, and consistency patterns, and detect anomalies such as unexpected values or deviations from general patterns. This helps mitigate input manipulation, model inversion, and output integrity attacks.
Along similar lines, it is essential to sanitize feedback data. The next preventive mechanism is to have data validation checks and mechanisms for accurate data labeling. Training data sets must be secured both in storage and in transit, and they need regular monitoring and updating to counter an attacker's attempts to poison them. This helps mitigate data poisoning and model skewing threats.
The next six preventive methods relate to protecting against vulnerabilities that arise from training. Ensure that best practices of data training are followed, such as separation of the training, validation, and production datasets. Further, train models on randomized data and keep a regular re-training schedule. This minimizes the risk of compromised training data and helps detect data poisoning early in an attack scenario. Training multiple models and using ensemble techniques where possible further reduces the impact of data poisoning. Another useful technique is adversarial training, which hardens models against input manipulation attacks and helps in developing more robust ML models. In addition, regularization should be applied to prevent overfitting, which can lead to unexpected consequences.
The next important preventive measure is to employ cryptographic techniques to encrypt the model's code, training datasets, and other sensitive information such as the parameters and weights of the model, to prevent attackers from stealing them. Confidentiality should be provided both at rest and in transit. Further, these techniques can be used to set up secure package repositories that vet packages and detect tampered ones. In addition, virtual environments should be used to isolate packages and libraries from the rest of the system, which helps detect malicious packages and remove them before they can cause damage.
Another advantage of cryptographic techniques is that they facilitate strict access control to the model and its predictions, following the least-privilege policy. This prevents unauthorized data access and model or data theft. Regular monitoring and auditing is yet another recommended preventive mechanism against model inversion attacks; it can also prevent theft by detecting when an attacker is attempting to steal the model so that corrective action can be taken.
The preventive measure of monitoring the model’s predictions for anomalies can help to detect and prevent model inversion attacks. This can be done by tracking the distribution of inputs and outputs, comparing the model’s predictions to ground truth data, or monitoring the model’s performance over time.
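A minimal monitoring sketch along these lines is shown below. It assumes ground-truth labels arrive with some delay and that an alert is raised when a rolling accuracy window drops below a floor; the window size and threshold are illustrative operational choices.

```python
from collections import deque

class PredictionMonitor:
    """Track agreement between recent predictions and delayed ground
    truth, and flag when accuracy drops below an acceptable floor."""

    def __init__(self, window: int = 1000, min_accuracy: float = 0.90):
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, prediction, ground_truth) -> None:
        self.outcomes.append(prediction == ground_truth)

    def healthy(self) -> bool:
        if len(self.outcomes) < self.outcomes.maxlen:
            return True   # not enough evidence to judge yet
        return sum(self.outcomes) / len(self.outcomes) >= self.min_accuracy

# monitor = PredictionMonitor()
# monitor.record(pred_label, true_label)   # call as ground truth arrives
# if not monitor.healthy(): ...            # raise an alert for investigation
```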