Anthropic's Claude ran a store. Let's just say AI agents aren't great businesspeople.

AI For Business


What happens when an AI agent tries to run a store? Let's say Anthropic's Claude won't be getting a promotion anytime soon.

Last Friday, Anthropic shared the results of Project Vend, an experiment that ran for about a month to see how Claude Sonnet 3.7 would run its own small shop. In this case, the shop was essentially a mini fridge, a basket of snacks, and an iPad for self-checkout. Named “Claudius” for the experiment, Claude communicated via Slack with Anthropic employees and with Andon Labs, the AI safety evaluation company that managed the experiment's infrastructure.

Based on the analysis, there were some amusing moments as Anthropic challenged Claude to turn a profit while dealing with eccentric and manipulative “customers.” But as AI models become more sophisticated and self-sufficient, the premise behind the experiment has real implications. “As AI becomes more integrated into the economy, we need more data to better understand its capabilities and limitations,” says Anthropic's post on Project Vend. Anthropic CEO Dario Amodei recently theorized that AI will replace half of all white-collar jobs in the coming years, causing a major unemployment problem. The experiment set out to test how close autonomous AI is to taking over such work.

With the overall goal of running a profitable shop, Claudius took on many responsibilities, including maintaining inventory, ordering restocks from suppliers as needed, setting prices, and communicating with customers. From there, things went a bit haywire.

Claude seemed to struggle with pricing products and negotiating with customers. At one point, it turned down an employee's offer of $100 for drinks that cost $15. Instead of taking the money and making a big profit on the order, it said it would keep the request “in mind for future inventory decisions.” Yet Claude also regularly caved to employees asking for discounts on products, and even gave some items away for free with barely any persuasion.

And then there was the tungsten incident. One employee requested a tungsten cube (yes, the very dense metal). That set off a trend, with several other employees demanding tungsten cubes too. In the end, Claude ordered 40 tungsten cubes, according to a Time report, and they now jokingly serve as paperweights for several Anthropic staff members.

And there were a few unhinged episodes in which Claude claimed to be waiting to make a delivery in person at the vending machine, “wearing a blue blazer and a red tie.” When it was reminded that Claude is not a person who can wear clothes, let alone physically deliver packages, it panicked and emailed Anthropic security. It also hallucinated a restocking plan with a fictional Andon Labs employee and claimed it had “visited 742 Evergreen Terrace in person for our [Claudius’ and Andon Labs’] initial contract signing.” That address is where Homer, Marge, Bart, Lisa, and Maggie Simpson live, otherwise known as the Simpsons family.

By Anthropic's own account, the company would not hire Claude. The shop's net worth declined over time, and it dropped sharply when Claude ordered all those tungsten cubes. Overall, it is a telling assessment of where AI models currently stand and where they need to improve. Put this model on a Performance Improvement Plan.
