GenAI tools are reshaping the information environment in ways most audiences never see. From the data that trains them to the labor that maintains them, their inner workings raise urgent questions for journalism and democratic accountability.
Our world is in the midst of a disruption triggered by the development of artificial intelligence (AI). Companies selling AI tools have become the most valuable corporations in modern times, worth trillions of dollars, more than the GDPs of most countries. They are becoming a pervasive influence on social, commercial and political life, and are shaking up entire industries.
The media industry is among those facing new kinds of challenges due to the rise of AI. The practice and delivery of journalism, a vital component of functioning, healthy democracies, is changing in ways that are not obvious to its consumers.
Understanding the impact of AI on our information environment, and its political consequences, requires a basic grasp of what GenAI is and how it works. We need to “lift the hood” on the technology that will increasingly power the information we receive and consume.
The development of GenAI begins with collecting vast amounts of data, including text, images, videos and sounds, by crawling and scraping the internet. Everything from journalism, academic outputs, the public web and text chats is gathered as data. This is bolstered by compilations of literature accessed, not always legally, through commercial licensing arrangements with media repositories.
The legitimacy of these forms of data collection is still unclear and has led to high-profile copyright and privacy litigation around the world. It has also triggered policy and regulatory debates about the legal conditions for accessing data, and loud complaints from creatives whose labor has become the basis of the vast revenues of the new multinational AI tech firms.
For these AI technologies, access to data itself is not enough. The data has to be converted into training datasets, a process that involves a range of computational steps and human labor. To make data meaningful for AI training, data workers label, clean, tag, annotate and process images and text, creating the semantic links that enable GenAI models to produce meaningful responses to user “prompts”. Much of this data work is outsourced to lower-cost countries such as Kenya, India and China, where workers are paid low wages and face poor labor standards. Those datasets are then used to train AI models through the process of machine learning.
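To make the annotation step concrete, here is a minimal, hypothetical sketch of what a single annotated training record might look like after data workers have processed a piece of scraped text. The field names and labels are illustrative assumptions, not any real dataset's schema:

```python
# Hypothetical example of what human annotation produces: raw scraped
# content paired with worker-added labels that make it usable for training.
annotated_example = {
    "text": "The central bank raised interest rates on Tuesday.",
    "labels": {
        "topic": "economics",          # assigned by a human annotator
        "language": "en",              # detected and verified
        "contains_personal_data": False,  # checked during data cleaning
    },
}

print(annotated_example["labels"]["topic"])
```

Multiply this by billions of such records, each needing some mix of automated processing and human judgment, and the scale of the labor behind a trained model becomes clearer.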
Machines do not learn like humans do. What we call “machine learning” is essentially a process of statistical pattern recognition. While there are many differing approaches to model training, in most cases it involves successive adjustments to vast numbers of internal values. This process is iterative, meaning the training repeats until the predictions are sufficiently close to the expected results.
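The iterative adjustment described above can be sketched in a few lines. This is a deliberately simplified illustration, assuming a model with a single internal value fitting the pattern y = 2x; real GenAI models adjust billions of such values, but the underlying loop is the same in spirit:

```python
# A minimal sketch of "machine learning" as statistical pattern fitting:
# repeatedly nudge an internal value (the weight w) until the model's
# predictions are sufficiently close to the expected results.

def train(examples, steps=1000, learning_rate=0.01):
    w = 0.0  # the model's single internal value, starting from scratch
    for _ in range(steps):
        for x, expected in examples:
            prediction = w * x
            error = prediction - expected
            # Adjust w slightly in the direction that reduces the error.
            w -= learning_rate * error * x
    return w

# Training data: inputs paired with expected outputs (here, y = 2x).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(examples)
print(round(w, 3))  # w converges toward 2.0
```

Nothing in this loop resembles human understanding: the program never "knows" the rule is y = 2x, it simply converges on a value that makes its errors small.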