Our world is in the midst of disruption caused by advances in artificial intelligence (AI). Companies selling AI tools have become the most valuable companies of modern times, with valuations reaching trillions of dollars and exceeding the GDP of most countries. They have far-reaching effects on social, commercial, and political life, and are shaking up entire industries.
The media industry is one industry facing new types of challenges due to the rise of AI. The practice and delivery of journalism, an essential component of a healthy functioning democracy, is changing in ways that are invisible to consumers.
Understanding the impact of AI on our information environment and its political implications requires a basic grasp of what generative AI (GenAI) is and how it works. We need to “raise the hood” on the technologies that increasingly shape the information we receive and consume.
Data: The engine that powers generative AI
GenAI’s development begins with the collection of vast amounts of data, including text, images, video, and audio, by crawling and scraping the Internet. Data is collected on everything from journalism and academic output to the public web and text chats. This is supplemented by compilations of literature from media repositories, some obtained through commercial license agreements and some accessed without clear legal authority.
The legality of these forms of data collection remains unclear, leading to high-profile copyright and privacy lawsuits around the world. It has also sparked policy and regulatory debate over the legal conditions for accessing data, as well as loud complaints from creators whose work forms the basis of the vast revenues of new multinational AI technology companies.
These AI technologies require more than just access to the data itself. The data must be transformed into a training dataset that involves various types of computational processes and human effort. To make data meaningful for AI training, data workers must label, clean, tag, annotate, and process images and text to create semantic links that allow GenAI models to generate meaningful responses to user “prompts.”
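To make the labeling and annotation step concrete, here is a minimal sketch of what one labeled training record might look like. The field names and values are hypothetical; real annotation pipelines vary widely in format and scale.

```python
# A hypothetical labeled training example, illustrating the kinds of
# annotations data workers add: topic tags, provenance, and cleaning
# status. Real GenAI pipelines use far richer (and larger) records.
example = {
    "text": "Street protests erupted over rising inflation.",
    "labels": ["protest", "economy"],  # topic tags added by annotators
    "source": "public web",            # provenance recorded at collection
    "cleaned": True,                   # HTML and boilerplate stripped
}

# Downstream training code consumes millions of such records; the
# semantic links between text and labels are what lets a model later
# respond meaningfully to prompts about these topics.
print(example["labels"])
```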
Much of this data work is outsourced to low-cost countries such as Kenya, India, and China, where workers are paid low wages and face poor labor standards. These datasets are used to train AI models through the process of machine learning.
Unveiling how generative AI works
Machines don’t learn like humans. What we call “machine learning” is essentially a process of statistical pattern recognition. There are various approaches to training models, but most involve continuously adjusting a large number of internal values. The process is iterative: training is repeated until the model’s predictions are close enough to the expected results.
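The loop described above, repeatedly nudging internal values until predictions are close enough, can be shown with a deliberately tiny example. This is a one-parameter toy model fit by gradient descent, not how real GenAI systems are built, but the iterative adjustment is the same idea at a vastly smaller scale.

```python
# Toy illustration of "machine learning" as iterative adjustment of an
# internal value. The model is y = w * x with a single weight w.

def train(data, lr=0.01, tolerance=1e-6):
    """Fit y = w * x by repeatedly nudging w to shrink prediction error."""
    w = 0.0  # the model's one internal value, adjusted during training
    for _ in range(10_000):  # training is iterative
        # Mean squared error between predictions and expected results
        error = sum((w * x - y) ** 2 for x, y in data) / len(data)
        if error < tolerance:  # stop when predictions are close enough
            break
        # Gradient of the error with respect to w
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad  # adjust the internal value against the gradient
    return w

# The pattern hidden in this data is y = 2x; training recovers w ≈ 2.
weight = train([(1, 2), (2, 4), (3, 6)])
print(round(weight, 2))  # prints 2.0
```

Large language models do the same kind of adjustment, but across billions of internal values and enormous training datasets.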
Once trained, a model like the one powering ChatGPT can, when prompted to, say, write a short news article about the rate of inflation, produce a series of tokens (word fragments) that are statistically similar to the articles it saw during training.
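The core of token-by-token prediction can be sketched with a crude word-level model that always picks the statistically likeliest continuation. Real systems use learned subword tokens, neural networks, and vast corpora; the tiny corpus and counting approach below are only an illustration of the principle.

```python
# Toy next-token prediction: count which word most often follows each
# word in a tiny corpus, then "generate" by repeatedly choosing the
# likeliest continuation. Purely illustrative, not a real LLM.
from collections import Counter, defaultdict

corpus = "the rate of inflation rose as the rate of unemployment fell".split()

# Count how often each word follows each other word
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def generate(prompt_word, length=4):
    """Extend a prompt by repeatedly predicting the likeliest next token."""
    out = [prompt_word]
    for _ in range(length):
        candidates = follows[out[-1]]
        if not candidates:  # no continuation seen in training
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(generate("the"))  # prints "the rate of inflation rose"
```

Note that the output is plausible purely because of patterns in the training text; at no point does the program know what “inflation” means, which is the point the next paragraph makes.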
Importantly, systems such as ChatGPT do not understand the world they depict or explain. They have no semantic knowledge: they cannot grasp facts and concepts such as what “inflation” means or what “street protests” look like. Instead, the machine is a pattern-modeling engine that predicts what content would plausibly complete or respond to a given prompt. In short, AI output is a function of scale and training data, not understanding.
What does generative AI mean for journalism?
The predictive power that makes generative AI impressive is also what makes it unreliable. Prediction is not verification. These systems fill gaps with output that sounds or looks right, but is not necessarily right.
Generative AI models can write fluent sentences, summarize long reports, and paraphrase complex text in seconds. They can generate photorealistic images of events. But those outputs are the product of statistical prediction, not verification. When AI is trained on biased or incomplete data, it is known to “hallucinate” content that looks and sounds right but is inaccurate or unreliable.
This distinction is crucial for journalism, which relies on truth and verification over plausibility.
The risk for journalists and audiences is that AI-generated content cannot easily be verified. More than ever, such content is pushed into the information ecosystem without clear labels or context, contributing to a media environment in which it becomes increasingly difficult to distinguish reporting from simulation, and fact from fabrication.
The future of journalism depends on institutions adapting to and meaningfully managing the use of AI. This means putting more effort into developing new editorial standards and verification practices, as well as ensuring that the data, workforce, and energy that maintain these systems are visible and accountable.
The question is not whether AI will reshape journalism. That’s already the case. The question is whether democratic societies can prevent AI from undermining trust in public institutions.
For those concerned about where our information and journalism come from (their provenance), the human ability to check and verify information is no match for the lightning speed of chatbots spewing out dangerous text, data, and images. If we cannot develop protocols and methods to regain control, oversight, and checks before the output of machines is shared, we will face further erosion of the foundations of society: the agreed-upon facts that enable rational thought and consequent action.
This article was first published by 360info™.
Editor’s note: The opinions expressed here by the authors are their own and not those of Impakter.com. — Cover photo credit: fardad sepandar
