#Gate Square April Posting Challenge
AI data company Mercor confirms a major data breach affecting clients including OpenAI and Anthropic

According to reports, startup Mercor, which supplies training data to AI companies such as OpenAI, Anthropic, and Meta, has confirmed a significant security breach. The incident originated from a supply chain attack on LiteLLM, an open-source library widely used by developers to connect to AI services and downloaded millions of times daily. The attack was carried out by the hacker group TeamPCP, which embedded credential-stealing malicious code into LiteLLM. Another hacker group, Lapsus$, later claimed to have obtained up to 4TB of data from Mercor, including source code, database records, internal Slack communications, and platform chat videos. Unverified reports suggest that some of Mercor's clients' datasets and confidential AI project information may have been leaked. Mercor stated that it has moved quickly to contain the incident and has initiated a third-party forensic investigation.

This incident is a typical chain reaction caused by “supply chain poisoning,” exposing the vulnerability of AI infrastructure and potentially putting the training data of giants like OpenAI at risk of leakage.

1. Attack Chain Review: From “Tools” to “Data”

Source of poisoning: Hacker group TeamPCP compromised the open-source library LiteLLM (AI API gateway), releasing backdoored versions (1.82.7/1.82.8) on PyPI.

Credential theft: After Mercor and other companies updated dependencies, malicious code silently stole their cloud credentials and API keys.

Data plunder: Another hacker group, Lapsus$, used these keys to infiltrate Mercor’s internal network, stealing approximately 4TB of core data and listing it for sale.
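The credential-theft step in this chain works because cloud credentials and API keys commonly sit in the process environment, where any code that runs at dependency import time can read them. A minimal Python sketch of the exposure (not TeamPCP's actual code; the prefix list below is an assumption for illustration):

```python
import os

# Assumed, illustrative prefixes for credential-style environment variables.
SENSITIVE_PREFIXES = ("AWS_", "OPENAI_", "ANTHROPIC_")

def credential_like_keys(environ=None, prefixes=SENSITIVE_PREFIXES):
    """Return names of environment variables that look like credentials.

    Any imported library, including a backdoored dependency, could run
    equivalent code silently and exfiltrate the values.
    """
    environ = os.environ if environ is None else environ
    return {name for name in environ if name.startswith(tuple(prefixes))}
```

This is why rotating keys after installing a compromised build matters: the secrets must be assumed read the moment the backdoored code ran.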

2. Value of the Leaked Data

According to Lapsus$'s claims, partly corroborated by media reports, the stolen data is highly sensitive:

Client secrets: Training datasets involving giants like OpenAI, Anthropic, Meta; unreleased model details; internal project codenames.

Personal privacy: Scans of passports, interview videos, and identity documents of numerous platform contractors (annotators).

Corporate secrets: Nearly 1TB of Mercor’s source code, internal Slack chat records, and ticketing system data.

3. Potential Impact and Risks

Model security: If training data (especially RLHF human feedback data) is leaked, attackers could reverse-engineer model weaknesses and launch more precise adversarial attacks.

Compliance crisis: Large-scale PII (Personally Identifiable Information) leaks of contractors could trigger hefty fines under GDPR and other regulations.

Trust collapse: As a “data blood bank” in the AI industry chain, Mercor’s security breach will force downstream vendors to reassess the risks of third-party data supply chains.

4. Current Status and Defense Measures

Emergency response: The malicious versions have been taken down; Mercor has initiated a third-party forensic investigation and notified clients.

Self-check recommendations: If you updated LiteLLM in late March, please immediately verify your version and rotate all cloud credentials and API keys. Ordinary users should be alert to potential phishing emails that may follow.
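The version self-check above can be sketched in Python. The compromised version numbers (1.82.7 and 1.82.8) come from the reports cited earlier; the PyPI package name `litellm` is as distributed.

```python
from importlib import metadata

# Reportedly backdoored releases, per the reports above.
COMPROMISED_VERSIONS = {"1.82.7", "1.82.8"}

def is_compromised(version: str) -> bool:
    """True if the version string matches a reported backdoored release."""
    return version in COMPROMISED_VERSIONS

def check_installed() -> str:
    """Check the locally installed litellm package, if any."""
    try:
        version = metadata.version("litellm")
    except metadata.PackageNotFoundError:
        return "litellm is not installed"
    if is_compromised(version):
        return f"WARNING: backdoored litellm {version} installed - rotate all credentials"
    return f"litellm {version} is not in the reported compromised set"

if __name__ == "__main__":
    print(check_installed())
```

Note that upgrading off a compromised version is not enough on its own; any credentials present while the backdoored build was installed should still be rotated.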

Summary: This is not just a security incident for Mercor but a high-intensity early warning about the "heavy on functionality, light on security" state of the entire AI ecosystem.