DeepSeek AI Faces Security and Privacy Backlash Amid OpenAI Data Theft Allegations

Image Credit: Solen Feyissa | Unsplash

Chinese artificial intelligence company DeepSeek has recently shaken up the AI industry by launching competitive AI models developed at significantly lower costs than OpenAI’s flagship offerings. However, allegations have emerged that DeepSeek may have used OpenAI’s proprietary data to build its models, prompting an investigation by OpenAI and Microsoft. At the same time, DeepSeek has faced security concerns, with reports of cyberattacks leading to the suspension of new user registrations. These developments highlight critical challenges in AI development, data security, and intellectual property protection.

[Read More: DeepSeek’s R1 Model Redefines AI Efficiency, Challenging OpenAI GPT-4o Amid US Export Controls]

OpenAI’s Allegations Against DeepSeek

According to Bloomberg, OpenAI and Microsoft are investigating whether DeepSeek improperly obtained OpenAI's model outputs through the company's API. OpenAI suspects that DeepSeek used a technique called distillation, in which a smaller AI model learns from the outputs of a larger, more advanced one. While distillation is a common machine learning technique, OpenAI asserts that using it to replicate its models and build competing products violates its terms of service.

OpenAI has stated that it has evidence linking DeepSeek to this practice but has not disclosed specific details. The company is working closely with the U.S. government to prevent unauthorized access to its models by foreign competitors.

[Read More: DeepSeek’s 10x AI Efficiency: What’s the Real Story?]

Intellectual Property Rights and Data Usage in AI Development

The debate over intellectual property (IP) rights in AI development is complex. OpenAI has itself faced criticism for training its models on vast amounts of publicly available internet data without explicit consent from content creators, a practice that has led to multiple legal challenges. In December 2023, The New York Times filed a lawsuit against OpenAI and Microsoft, accusing them of using millions of its articles without permission to train chatbots. In January 2025, Indian news firms, including those owned by billionaires Gautam Adani and Mukesh Ambani, filed lawsuits against OpenAI alleging unauthorized use of their content. Additionally, the Federation of Indian Publishers, whose members include Bloomsbury and Penguin Random House, sued OpenAI in New Delhi over similar copyright claims.

While OpenAI has faced criticism over its data usage, it argues that DeepSeek’s distillation practice is a separate concern, as it involves learning directly from OpenAI’s model outputs. OpenAI contends that this practice, if used to build competing models, breaches its terms of service.

[Read More: Harmony or Theft? Major Labels Sue AI Music Startups Over Copyright Concerns]

Understanding Knowledge Distillation in AI Development

Knowledge distillation is a machine learning technique in which a smaller, more efficient model (the "student") is trained to replicate the behaviour of a larger, more complex model (the "teacher"). This enables the student model to approach the teacher's performance while requiring far fewer computational resources, making it suitable for deployment in environments with limited capacity, such as mobile devices.

The distillation process involves training the student model using the outputs of the teacher model. Instead of solely relying on the original training data and its labels, the student model learns from the "soft targets" provided by the teacher. These soft targets are the probability distributions over the possible classes produced by the teacher model's softmax layer. By learning to match these distributions, the student model captures the nuanced knowledge embedded in the teacher model, including the relationships between different classes.
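
To make this concrete, below is a minimal sketch of a distillation loss for a classification setting, following the standard formulation from Hinton et al. The temperature and weighting values are illustrative assumptions, not details disclosed by either company.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend a soft-target loss (teacher) with a hard-label loss (ground truth).

    T (temperature) softens both probability distributions so the student
    can learn the teacher's relative class preferences; alpha balances the
    two terms. Both values here are illustrative defaults.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale to compensate for the temperature's effect on gradients

    # Hard targets: standard cross-entropy against the original labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    return alpha * soft_loss + (1 - alpha) * hard_loss
```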

[Read More: OpenAI Unveils o3: Pioneering Reasoning Models Edge Closer to AGI]

Benefits of Knowledge Distillation

  1. Model Compression: Distillation reduces the size of the model, making it more efficient for real-world applications without significant loss in accuracy (a minimal training sketch follows this list).

  2. Deployment Efficiency: Smaller models resulting from distillation are less resource-intensive, allowing for deployment on devices with limited computational power.

  3. Retention of Performance: Despite the reduction in size, distilled models often retain a high level of performance, closely approximating that of their larger counterparts.
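
As a rough illustration of these benefits, the sketch below runs one distillation training step with a deliberately small student. The models and data are hypothetical stand-ins, and it reuses the distillation_loss function defined earlier.

```python
import torch

# Hypothetical stand-ins: a larger frozen teacher and a much smaller student.
teacher = torch.nn.Sequential(
    torch.nn.Linear(784, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 10)
).eval()
student = torch.nn.Linear(784, 10)  # far fewer parameters than the teacher

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# One illustrative training step on a random batch of 32 examples.
x = torch.randn(32, 784)
labels = torch.randint(0, 10, (32,))

with torch.no_grad():
    teacher_logits = teacher(x)  # soft targets come from the frozen teacher

loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()
optimizer.step()

# The student here has roughly 1% of the teacher's parameters, illustrating
# the compression and deployment-efficiency points listed above.
```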

[Read More: Is AI Democratizing the World or Widening the Digital Divide?]

Security Concerns: DeepSeek Suspends New Account Registrations

Beyond the OpenAI dispute, DeepSeek has faced mounting security concerns. On January 27, 2025, the company temporarily suspended new user registrations, citing “large-scale malicious attacks” on its platform. While DeepSeek has not provided specifics, cybersecurity experts believe the incident could involve distributed denial-of-service (DDoS) traffic or unauthorized access attempts.

Moreover, DeepSeek’s data collection policies have raised red flags. The company stores user data—including chat history and uploaded files—on servers located in China. This has sparked concerns over potential government access to private data, echoing previous security debates around Chinese-owned tech companies like TikTok.

The U.S. government has also expressed concern over the data practices of Chinese AI firms. On January 28, 2025, the White House said it was evaluating the national security implications of DeepSeek, with officials particularly worried about U.S. user data being accessed by foreign entities.

The same day, Australian Industry and Science Minister Ed Husic raised privacy concerns about DeepSeek, urging users to think carefully before downloading the app: "I would be very careful about that". He added that there are "a lot of questions that will need to be answered in time on quality, consumer preferences, data and privacy management".

[Read More: Does AI Speech Recognition Handle Data with Care?]

Data Handling and Privacy Controls

There are notable differences between OpenAI and DeepSeek regarding user data handling and privacy controls, which contribute to concerns about data privacy.

OpenAI:

  • Data Usage and Opt-Out Options: OpenAI allows users to manage how their data is utilized. For instance, users can opt out of having their data used to improve models by adjusting settings within their accounts. Specifically, in the ChatGPT interface, users can navigate to Settings > Data Controls and disable the "Improve the model for everyone" option. This ensures that new conversations are not used for training purposes.

  • Data Storage: OpenAI primarily stores user data on servers located in the United States and utilizes global infrastructure, including Microsoft Azure, for processing. While OpenAI implements measures to comply with international data protection standards, including the General Data Protection Regulation (GDPR), data may still be transferred outside of Europe using Standard Contractual Clauses (SCCs) to meet regulatory requirements. OpenAI provides users in the European Economic Area (EEA), Switzerland, and the UK with specific privacy policies and Data Processing Agreements (DPAs) to ensure compliance. However, despite these safeguards, OpenAI has faced regulatory scrutiny, including a €15 million fine imposed by Italy’s privacy watchdog in December 2024 over alleged data protection violations related to ChatGPT.

DeepSeek:

  • Lack of Opt-Out Mechanism: Currently, DeepSeek does not provide users with an option to opt out of data collection or usage for model training purposes. This absence of user control over personal data contributes to heightened privacy concerns, especially given the data storage location and potential access by governmental entities.

  • Data Collection and Storage: DeepSeek's privacy policy indicates that it collects extensive user data, including text or audio inputs, uploaded files, feedback, and chat history, all stored on servers located in the People's Republic of China. This has raised concerns because China's Personal Information Protection Law (PIPL) and Data Security Law (DSL) grant authorities the right to access personal data for national security and law enforcement purposes, meaning data held by companies like DeepSeek could be accessible to Chinese government entities.

[Read More: Navigating Privacy: The Battle Over AI Training and User Data in the EU]

Choosing the Right Chatbot: A Privacy Perspective

When selecting an AI chatbot, it's crucial to consider data privacy and storage practices, which can vary significantly based on the chatbot's country of origin.

  • For Users in China, Hong Kong and Macau: If you're residing in these regions, opting for a domestically developed chatbot like DeepSeek may be appropriate, especially since OpenAI has restricted access to its services in China. DeepSeek is designed to comply with local regulations and cultural norms, providing a user experience tailored to these regions. Alternatively, users in China, including Hong Kong and Macau, can still access OpenAI’s models through Poe.com, which does not require a VPN and appears to bypass OpenAI’s restrictions. However, availability and performance may be subject to local internet regulations.

  • For Users in Other Countries: For those living outside these regions, particularly in countries with stronger data protection laws, chatbots developed in North America or Europe may offer better privacy safeguards. OpenAI’s ChatGPT, for example, operates under frameworks like the General Data Protection Regulation (GDPR) in the EU, ensuring stricter compliance with data privacy standards. Additionally, alternative AI chatbots from companies such as Anthropic (Claude) or Google (Gemini) provide options for those concerned about data security. Choosing a chatbot that aligns with your privacy expectations and regional regulations can help ensure a more secure user experience.

[Read More: AI Data Collection: Privacy Risks of Web Scraping, Biometrics, and IoT]

Source: Bloomberg, The Verge, Financial Times, Reuters, Technology Review, Wikipedia, OpenAI, Wired, SBS News
