LONDON (AP) — Wikipedia announced new deals with a number of artificial intelligence companies as it celebrated its 25th anniversary on Thursday.
The online crowdsourced encyclopedia said the companies include Amazon, Meta Platforms, Perplexity, Microsoft and France's Mistral AI.
Wikipedia is one of the last bastions of the early internet, but its original vision of a free online space has been clouded by the dominance of Big Tech platforms and the rise of generative AI chatbots trained on content scavenged from the web.
Aggressive data collection methods by AI developers, such as collecting data from Wikipedia’s vast free knowledge repository, are raising questions about who will ultimately pay for the artificial intelligence boom.
The nonprofit Wikimedia Foundation, which runs the site, signed a deal with Google as one of its first customers in 2022 and announced other deals last year with smaller AI companies, including search engine Ecosia.
The new deals will help one of the world’s most popular websites monetize the heavy traffic it gets from AI companies. The foundation says the companies pay to access Wikipedia content in “volumes and speeds designed specifically for their needs.” Financial and other details were not disclosed.
AI training has sparked legal battles elsewhere over copyright, but Wikipedia founder Jimmy Wales said he welcomes it.
“The data on Wikipedia is hand-picked by humans, so I’m personally very happy that the AI models are training on that data,” Wales said in an interview with The Associated Press. “I don’t want to use an AI that’s just trained on X. So it’s like a very angry AI,” Wales said, referring to billionaire Elon Musk’s social media platform.
Wales said the site wants to work with AI companies, not block them. But “you should probably chip in and pay your fair share of the costs that are being imposed on us.”
The Wikimedia Foundation last year urged AI developers to pay for access through its enterprise platform, saying human traffic had fallen by 8%. Meanwhile, visits from bots, which are sometimes disguised to avoid detection, put a huge strain on servers as they collect large amounts of content to feed into AI large language models.
The findings highlight a shift in online trends, with search engine AI overviews and chatbots summarizing information rather than displaying links and directing users to sites.
Wikipedia is the ninth most visited site on the internet. It has more than 65 million articles in 300 languages, edited by approximately 250,000 volunteers.
The site has become so popular in part because it’s free for everyone to use.
“But our infrastructure isn’t free, right?” Wikimedia Foundation CEO Maryana Iskander said in a separate interview in Johannesburg, South Africa.
Maintaining the servers and other infrastructure that allow individuals and technology companies to “pull data from Wikipedia” is expensive, said Iskander, who will step down on January 20 and be replaced by Bernadette Meehan.
The majority of Wikipedia’s funding comes from its 8 million donors, most of whom are individuals.
“They’re not donating to subsidize the big AI companies,” Wales said. They’re saying, “Actually, you can’t just smash our website. You have to come in the right way.”
Editors and users could benefit from AI in other ways. The Wikimedia Foundation has outlined an AI strategy, and Wales said it could lead to tools that make editors’ tasks less tedious.
While AI isn’t sufficient to create Wikipedia entries from scratch, it can be used, for example, to update dead links by scanning the surrounding text and searching online to find other sources.
“It hasn’t happened yet, but I think it will happen in the future.”
Wales said artificial intelligence could improve the Wikipedia search experience by evolving from a traditional keyword approach to a chatbot approach.
“Imagine a world where you ask a question in the Wikipedia search box and it quotes Wikipedia back to you,” he said. It could respond with, “Here’s the answer to your question from this article, and here’s the actual paragraph. It seems very helpful to me, so I think we’ll go in that direction, too.”
Looking back on the early days, Wales said it was an exciting time, as many people were willing to help build Wikipedia after he and his long-departed co-founder Larry Sanger started it as an experiment.
But while some may now wistfully look back on what seemed like more innocent times, Wales said there was a dark side to the early days of the internet.
“People were pretty toxic back then too. We didn’t need algorithms to be mean to each other,” he said. “But you know, it was a time of great excitement and a spirit of real possibility.”
Wikipedia has recently found itself under fire from people on the political right, who have dubbed the site “Wokepedia” and accused it of being biased toward the left.
Republicans in the US Congress are investigating allegations of “manipulative practices” in Wikipedia’s editorial process, which they say could inject bias and undermine neutral points of view on the platform and in the AI systems that rely on it.
A notable source of criticism has been Musk, who launched his own AI-powered rival, Grokipedia, last year. He has criticized Wikipedia for being full of “propaganda” and called on people to stop donating to the site.
Wales said he does not think Grokipedia is a “real threat” to Wikipedia because it is based on a large language model, the kind of AI system that is trained on troves of online text.
“Large language models aren’t enough to create really high-quality reference material, so a lot of it is just regurgitation from Wikipedia,” he said. “It’s often rambling and nonsensical. And I think the more you look into obscure subjects, the worse it gets.”
He stressed that he was not singling out Grokipedia for criticism.
“That’s exactly how large language models work.”
Wales said he has known Musk for years but hasn’t spoken to him since Grokipedia launched.
“Maybe I should contact him,” Wales said.
What would he say?
“‘How’s your family?’ I’m a nice person and I don’t want to pick a fight with anyone.”
____
Associated Press writer Mogomotsi Magome in Johannesburg contributed to this report.
