
The world is gearing towards an “automated economy” where machines relying on artificial intelligence (AI) systems produce fast, efficient, error-free output. However, AI is not getting smarter on its own. It has been built on, and continues to depend on, human labor and energy resources. These systems are built by large tech companies but are fed information and trained primarily by workers in developing countries.
The realm of human involvement
Machines cannot process the meaning behind raw data. Data annotators label raw images, audio, video and text with information that trains AI and machine learning (ML) models. This labeled data becomes the training set for those models. For example, a large language model (LLM) cannot recognize the color “yellow” unless the data has been labeled as such. Similarly, self-driving cars rely on labeled video footage to distinguish traffic signs from humans on the road. The higher the quality of the dataset, the better the output, and the more human labor is required.
Data annotators play a major role in training LLMs such as ChatGPT, Gemini, and more. An LLM is trained in three steps: self-supervised learning, supervised learning, and reinforcement learning. In the first step, the machine picks up information from large datasets on the Internet. Data labelers, or annotators, enter in the second and third steps, where this information is fine-tuned so that the LLM gives the most accurate responses. Humans give feedback on the outputs the AI generates so that it produces better responses over time, and they help remove errors and jailbreaks.
Silicon Valley tech companies outsource this meticulous annotation work primarily to workers in countries such as Kenya, India, Pakistan, China and the Philippines, who work long hours for low wages.
There are two types of data labeling: one that does not require subject expertise, and one that is more niche and does. Several tech companies have been accused of employing non-experts on technical subjects that require prior knowledge. This leads to errors in the output generated by AI. A data labeler from Kenya revealed that, despite lacking relevant expertise, they were tasked with labeling medical scans for an AI system intended for use in healthcare services elsewhere.
However, because of the resulting errors, companies are beginning to have experts handle such information before it is fed into the system.
Automated features that require humans
Even features sold as “fully automated” are often supported by invisible human work. For example, our social media feeds are “automatically” filtered to censor sensitive and graphic content. This is possible only because human moderators labeled such content as harmful after examining thousands of uncensored images, text and audio. Daily exposure to such content has reportedly caused post-traumatic stress disorder, anxiety and depression among workers.
Similarly, behind the audio and video generated by AI, there are voice actors and actors. Actors may be required to film themselves dancing and singing so that these machines can learn to recognize human movements and sounds. Children are also reportedly involved in performing such tasks.
In 2024, AI tech workers from Kenya sent a letter to former US President Joe Biden describing the poor working conditions they had endured. “In Kenya, these US companies violate international labor standards, undermining local labor laws and the national judicial system. Our working conditions amount to modern slavery,” the letter reads. They said the content they must annotate ranges from pornography and beheadings to bestiality, for more than eight hours a day and for less than $2 an hour. There are also strict deadlines to complete tasks within seconds or minutes.
When workers raised concerns with businesses, they were fired and their union was dismantled.
Most AI tech workers are engaged in online gig work and are unaware of which large tech company they are working for. This is because AI companies outsource their work through intermediary digital platforms to minimize costs. These platforms subcontract workers who are paid for each “microtask” they perform. The workers are constantly monitored and are fired if they fall short of the targeted output. As a result, labor networks are fragmented and lack transparency.
The advancement of AI is driven by such “ghost workers.” The lack of recognition and the informalization of their work help tech companies perpetuate this system of labor exploitation. Stricter laws and regulations on AI companies and digital platforms are needed, covering not only content in the digital space but also the supply chains of workers that power AI.
Published – 8:30 AM IST, September 17, 2025
