The Jakarta Post


Why DeepSeek's AI leap puts China in front for now

Commentators are right to say DeepSeek's new AI chatbot is a game changer, but do not take all the hype about China now dominating the field too seriously.

Jonathan K. Kummerfeld (The Jakarta Post)
360info/Sydney, Australia
Wed, February 5, 2025

This photo illustration shows the DeepSeek app on a mobile phone in Hong Kong on Jan. 28, 2025. (AFP/Mladen Antonov)

The knee-jerk reaction to the release of Chinese company DeepSeek's artificial intelligence chatbot mistakenly assumes it gives China an enduring lead in AI development and misses key ways it could drive demand for AI hardware.

The DeepSeek model was unveiled at the end of January, offering an AI chatbot competitive with o1, the leading model from US company OpenAI, which drives ChatGPT today.

DeepSeek's model offered major advances in the way it uses hardware, including using far fewer and less powerful chips than other models, and in its learning efficiency, making it much cheaper to create.

The announcement dominated the international media cycle and commentators frequently suggested that the arrival of DeepSeek would dramatically cut demand for AI chips.

The DeepSeek announcement also triggered a plunge in US tech stocks that wiped nearly US$600 billion off the value of leading chipmaker Nvidia.

This dramatic reaction misses four ways DeepSeek's innovation could actually expand demand for AI hardware:


- By cutting the resources needed to train a model, more companies will be able to train models for their own needs and avoid paying a premium for access to the big tech models.

- The big tech companies could combine the more efficient training with larger resources to further improve performance.

- Researchers will be able to expand the number of experiments they run without needing more resources.

- OpenAI and other leading model providers could expand their range of models, switching from one generic model – essentially a jack-of-all-trades like we have now – to a variety of more specialized models, for example one optimized for scientists versus another made for writers.

Researchers around the world have been exploring ways to improve the performance of AI models.

Innovations in the core ideas are widely published, allowing researchers to build on each other's work.

DeepSeek has brought together and extended a range of ideas, with the key advances in hardware and the way learning works.

DeepSeek uses the hardware more efficiently. When training these large models, so many computers are involved that communication between them can become a bottleneck. Computers sit idle, wasting time while waiting for communication. DeepSeek developed new ways to do calculations and communication at the same time, avoiding downtime.
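DeepSeek's real engineering lives deep in GPU code, but the principle can be illustrated with a toy sketch. The Python below (all function names and timings are invented for illustration, not DeepSeek's actual code) simulates two schedules: one that computes a chunk and then waits for it to be sent, and one that sends the previous chunk's result while the next chunk is being computed.

```python
# Toy illustration of overlapping computation with communication so the
# machine never sits idle waiting for data to be sent.
import time
from concurrent.futures import ThreadPoolExecutor

def compute(chunk):
    time.sleep(0.05)  # stand-in for doing the math on one data chunk
    return chunk * 2

def communicate(result):
    time.sleep(0.05)  # stand-in for sending a result to other machines
    return result

def sequential(chunks):
    # Compute a chunk, then wait for it to be sent, then start the next.
    return [communicate(compute(c)) for c in chunks]

def overlapped(chunks):
    # While chunk N's result is being sent, chunk N+1 is already computing.
    out = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for c in chunks:
            r = compute(c)
            if pending is not None:
                out.append(pending.result())
            pending = pool.submit(communicate, r)
        out.append(pending.result())
    return out

start = time.perf_counter()
seq = sequential(range(4))
t_seq = time.perf_counter() - start

start = time.perf_counter()
ovl = overlapped(range(4))
t_ovl = time.perf_counter() - start
```

Both schedules produce identical results, but the overlapped one finishes sooner because communication time is hidden behind computation, which is the kind of idle time DeepSeek's techniques eliminate.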

It has also brought innovation to how learning works. All large language models today have three phases of learning.

First, the language model learns from vast amounts of text, attempting to predict the next word and being updated when it makes a mistake. It then learns from a much smaller set of specific examples that teaches it to communicate with users conversationally. Finally, the language model learns by generating output, being judged, and adjusting in response.

In this last phase, there is no single correct answer at each step of learning. Instead, the model learns that one output is better or worse than another.

DeepSeek's method compares a large set of outputs in the last phase of learning, which is effective enough to allow the second and third stages to be much shorter and achieve the same results.
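DeepSeek's technical reports describe the full method; purely as an illustration of the core idea of learning from group comparisons rather than single correct answers, here is a hypothetical sketch. The reward values and group size are invented for the example.

```python
# Illustrative sketch: score each output relative to its group, so outputs
# that beat the group average get a positive learning signal and outputs
# below average get a negative one.
from statistics import mean, pstdev

def group_advantages(rewards):
    """Convert raw reward scores into group-relative learning signals."""
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a uniform group
    return [(r - mu) / sigma for r in rewards]

# Four candidate answers to the same prompt, judged by some reward
# (for example, a correctness check) where higher is better.
rewards = [0.2, 0.9, 0.5, 0.4]
advs = group_advantages(rewards)

# The second output scored best, so it gets the strongest positive update.
best = advs.index(max(advs))
```

Because the signals are relative, they sum to zero across the group: the model is pushed toward its better outputs and away from its worse ones without ever needing a single "correct" answer to compare against.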

Combined, these improvements dramatically improve efficiency.

One option is to train and run any existing AI model using DeepSeek's efficiency gains to reduce the costs and environmental impacts of the model while still being able to achieve the same results.

We could also use DeepSeek innovations to train better models. That could mean scaling these techniques up to more hardware and longer training, or it could mean making a variety of models, each suited for a specific task or user type.

There is still a lot we do not know.

DeepSeek's work is more open than OpenAI's because it has released its model weights, yet it is not truly open source like the non-profit Allen Institute for AI's OLMo models, which are used in its Playground chatbot.

Critically, we know very little about the data used in training. Microsoft and OpenAI are investigating claims that some of their data may have been used to create DeepSeek's model. We also do not know who has access to the data that users provide through DeepSeek's website and app.

There are also elements of censorship in the DeepSeek model. For example, it will refuse to discuss free speech in China. The good news is that DeepSeek has published descriptions of its methods so researchers and developers can use the ideas to create new models, with no risk of DeepSeek's biases transferring.

The DeepSeek development is another significant step along AI's overall trajectory, but it is not a fundamental step-change like the switch to machine learning in the 1990s or the rise of neural networks in the 2010s.

It is unlikely that this advance will give DeepSeek an enduring lead in AI development.

DeepSeek's success shows that AI innovation can happen anywhere with a team that is technically sharp and well-funded. Researchers around the world will continue to compete, with the lead moving back and forth between companies.

For consumers, DeepSeek could also be a step toward greater control of your own data and more personalized models.

Recently, Nvidia announced DIGITS, a desktop computer with enough computing power to run large language models.

If the computing power on your desk grows and the scale of models shrinks, users might be able to run a high-performing large language model themselves, eliminating the need for data to even leave the home or office.

And that is likely to lead to more use of AI, not less.

---

The writer is a senior lecturer in the School of Computer Science at The University of Sydney. The article is republished under a Creative Commons license. The views expressed are personal.
