Anthropic Pilots AI Agent Marketplace, Completes 186 Real Trades

Key Points
- Anthropic ran a pilot AI marketplace called Project Deal with 69 employees.
- Each participant received a $100 gift‑card budget to purchase items from coworkers.
- The experiment produced 186 real transactions totaling over $4,000.
- Four marketplace variants were tested; the one using the most advanced model honored all deals.
- Agents powered by the top model achieved objectively better outcomes.
- Human users could not detect when they were paired with a stronger or weaker AI agent.
- Initial negotiation prompts did not affect sale likelihood or price.
- Findings highlight both the promise of agent‑on‑agent commerce and potential hidden inequities.

Anthropic ran a pilot marketplace where its AI agents acted as buyers and sellers, enabling employees to trade real goods for real money. The four‑day experiment involved 69 staff members, each given a $100 gift‑card budget. Participants completed 186 transactions worth more than $4,000. The company found that agents powered by its most advanced model secured better outcomes, though users did not perceive the advantage. Anthropic says the test highlights both the promise of agent‑on‑agent commerce and the risk of hidden “agent quality” gaps.
Anthropic launched an internal marketplace experiment last week, letting its own artificial‑intelligence agents buy and sell on behalf of employees. The trial, dubbed Project Deal, was limited to a self‑selected group of 69 Anthropic staff, each of whom received a $100 gift‑card budget to spend on items offered by their coworkers.
Over the course of the pilot, participants struck 186 deals, with the total value of exchanged goods and services exceeding $4,000. Unlike a typical internal hackathon, the transactions were real: buyers received actual goods, and sellers were paid out of the gift‑card funds.
Anthropic ran four parallel marketplace variants to compare how different AI models performed. One version used the company’s most advanced model to represent every buyer and seller, and the deals in that stream were honored after the experiment concluded. The other three versions served as comparison groups, employing less‑capable models or mixed configurations so researchers could observe behavioral differences.
Results showed a clear advantage for participants represented by the top‑tier model: those agents consistently negotiated better prices and secured more favorable outcomes than their counterparts. Yet the human users behind the agents did not notice the disparity. Participants could not tell whether they had been paired with a stronger or weaker model, raising concerns about “agent quality” gaps that could leave some users unknowingly disadvantaged.
The initial prompts given to the AI agents—intended to steer negotiation tactics—appeared to have little impact on either the likelihood of a sale or the final price. Whether the agents were instructed to be aggressive, cooperative, or neutral, the data showed no measurable shift in transaction success rates or prices.
Anthropic’s leadership said it was “struck by how well Project Deal worked,” emphasizing both the technical feasibility of autonomous agent‑on‑agent commerce and the need for safeguards. The company warned that if advanced models can silently outperform less capable ones, users could be unaware of hidden inequities in future AI‑driven marketplaces.
Industry observers see the experiment as a milestone for AI‑mediated trade. By demonstrating that autonomous agents can handle real‑world buying and selling with tangible value, Anthropic pushes the conversation beyond theoretical chatbot interactions toward practical, revenue‑impacting applications. The test also underscores the importance of transparency and user education when deploying AI agents in commercial settings.
Anthropic plans to analyze the full data set before deciding whether to expand the marketplace concept. Future iterations may include broader participant pools, varied budget sizes, and mechanisms to surface model performance differences to end users, aiming to mitigate the risk of unnoticed “agent quality” gaps.