DeepSeek: How to Use a Chinese Neural Network in Russian

DeepSeek is a neural network whose debut caused chaos on the world market: it sent the shares of high-tech companies tumbling and called ChatGPT’s leadership into question. Let’s see what this AI model is and how to use it for free in Russian.

DeepSeek – what is this neural network?

DeepSeek is an AI model from a Chinese company of the same name. Until January 2025, few people in Russia had heard of it, which is why its success seems fantastic to many users.

But let’s be realistic: of course, this neural network was not thrown together by amateurs in their spare evenings. The world first heard about DeepSeek in 2023, when a team of engineers from the High-Flyer company announced the project. The “miracles” began later.

A year later, the developers rolled out the second version, DeepSeek-V2; in December 2024 came the third, DeepSeek-V3; and in January 2025 two editions arrived at once: DeepSeek-R1 and DeepSeek-R1-Zero. Moreover, the latest versions of the neural network are not inferior in quality to GPT-4o. That’s where the miracle is :)

What DeepSeek can do:

  1. Respond to requests in text and voice format.
  2. Maintain dialogue.
  3. Analyze files.
  4. Generate text.

How to work with DeepSeek

Option 1. Through the [official website](https://www.deepseek.com/). Free, with registration. You can log in using your Google account. The interface is in English, but it understands Russian prompts well. As an example, I asked the neural network to write a funny poem about artificial intelligence.

Option 2. In the mobile app. On the website, in the lower left corner of the screen, there is a “Get App” button. If you hover over it, a QR code appears that takes you to the store. The smart AI assistant is available for both iOS and Android, and it is also free. The interface is a bit simpler than the website’s, but it handles the tasks. As an example, I asked it to analyze a screenshot of a conversation with a person who wanted to publish a fraudulent job vacancy in the Telegram channel of our work project.

Option 3. In Telegram. The success of the Chinese AI model did not escape the owners of neural-network aggregator bots. In fact, the vast majority of them decided simply to ride the hype: they put DeepSeek in the bot’s name but “forgot” to integrate it into their product. I tested 14 bots, and only in one of them could I use DeepSeek without subscriptions or other hoops to jump through, so that is the one I used for the example below.
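Besides the website, the app, and Telegram bots, DeepSeek also offers a developer API that works with the OpenAI SDK. Below is a minimal sketch of calling it from Python; it assumes you have registered for a key on the DeepSeek platform, and the endpoint URL and model name should be double-checked against the official documentation.

```python
# Minimal sketch: calling DeepSeek through its OpenAI-compatible API.
# Assumes an API key from the DeepSeek platform; the key below is a placeholder.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",       # placeholder, replace with your own key
    base_url="https://api.deepseek.com",   # DeepSeek's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                 # the chat model; "deepseek-reasoner" for R1
    messages=[
        {"role": "user", "content": "Write a funny poem about artificial intelligence."},
    ],
)
print(response.choices[0].message.content)
```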

Which is better - DeepSeek or ChatGPT

It’s hard to say yet. In my editorial opinion, ChatGPT’s texts are slightly weaker than DeepSeek’s, but it all depends on the prompt: if you fine-tune the request for ChatGPT, the result will also be good. What speaks in the Chinese model’s favor is that it is available without restrictions, can be used in several ways, and doesn’t hit your wallet. What happens next, time will tell.

DeepSeek-R1: Does This Model Really Outperform Even OpenAI’s Models, or Is It Yet Another Piece of Fake News?

Some are already pointing to the bias and propaganda hidden in these models’ training data; others are testing them and checking their practical capabilities.

This article is dedicated to the new family of reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, and in particular to the smallest representative of the group.

DeepSeek-R1 is open source and competes with OpenAI’s o1 model

There has been a lot of buzz in the Generative AI community since DeepSeek-AI released its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. There has been enough praise and criticism to fill a book.

By the way, the name of this section is taken directly from the official DeepSeek website. For me, that is still a point of criticism: there is too little information about the model. But let’s not get ahead of ourselves. DeepSeek-R1 is a Mixture-of-Experts model trained using the reflection paradigm on top of the base DeepSeek-V3 model. It is a huge model, with 671 billion parameters in total, but only 37 billion of them are active during inference.
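To make the “671 billion total, 37 billion active” point concrete, here is a toy sketch of Mixture-of-Experts routing. It illustrates the general idea, not DeepSeek’s actual implementation: a router picks a couple of experts per token, so only those experts’ weights participate in the forward pass.

```python
# Toy Mixture-of-Experts layer: a router selects the top-k experts per token,
# so only a small fraction of the layer's parameters is used for each token.
# This is an illustration of the idea, not DeepSeek's actual architecture.
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                                   # (tokens, num_experts)
        weights, idx = scores.softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out


tokens = torch.randn(10, 64)
print(ToyMoE()(tokens).shape)  # torch.Size([10, 64])
```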

According to their release, the 32B and 70B versions of the model are on par with OpenAI-o1-mini. Now here’s the real achievement (in my opinion…) of this Chinese AI lab: they created six other models simply by training weaker base models (Qwen-2.5, Llama-3.1 and Llama-3.3) on data distilled from R1.

If you don’t know what I’m talking about: distillation is the process by which a larger, more powerful model “teaches” a smaller one, typically by generating the synthetic data the smaller model is trained on.
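In sketch form, the idea looks roughly like this (the function names are purely illustrative, not DeepSeek’s actual pipeline):

```python
# Illustrative knowledge distillation via synthetic data: a strong "teacher"
# model answers a set of prompts, and a smaller "student" model is then
# fine-tuned on those answers. All names here are hypothetical.

def distill(teacher, student, prompts, fine_tune):
    """teacher(prompt) -> text; fine_tune(student, dataset) -> trained student."""
    synthetic_dataset = [
        {"prompt": p, "completion": teacher(p)}   # the teacher generates the labels
        for p in prompts
    ]
    return fine_tune(student, synthetic_dataset)  # the student imitates the teacher
```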

Reasoning models

Reasoning models trace back to the Reflection prompt, which became famous after the announcement of Reflection 70B, billed at the time as the world’s best open-source model. It was trained using Reflection-Tuning, a technique designed to let LLMs correct their own errors.

This is a fairly recent trend in both academic papers and prompt-engineering techniques: we are essentially making LLMs think. Or, more precisely, the problem is that generative AI models answer too quickly!

Next-token prediction imposes a hard computational constraint: the number of operations the model can spend on producing the next token is limited by the number of tokens it has already seen.

We empirically evaluate training with delays on 1B and 130M decoder models with causal pretraining on C4, as well as on downstream tasks including reasoning, question answering, general comprehension, and fact recall. Our main finding is that inference-time delays show gains when the model is both pretrained and fine-tuned with delays. For the 1B model, we observe gains on 8 out of 9 tasks, the most notable being an 18% gain in EM score on the QA task in SQuAD, 8% on CommonSenseQA, and a 1% gain in accuracy on the reasoning task in GSM8k.

Reflection 70B was first promised back in September 2024, when Matt Shumer announced on Twitter a model of his capable of step-by-step reasoning. According to the author, the technique behind Reflection 70B is simple but very powerful.

Modern LLMs are prone to hallucinations and cannot recognize when they are hallucinating. Reflection-Tuning allows LLMs to acknowledge their mistakes and correct them before responding.

The model is available on the Hugging Face Hub and was trained from Llama 3.1 70B Instruct on synthetic data generated by Glaive. Apparently, all the credit should go to a special prompting technique. Let’s take a look:
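The prompt itself is not reproduced here, so below is an approximate reconstruction of the Reflection-style system prompt as it was publicly described; treat the exact wording as approximate rather than the verbatim original.

```python
# Approximate reconstruction of the Reflection-style system prompt
# (based on public descriptions; the exact original wording may differ).
REFLECTION_SYSTEM_PROMPT = (
    "You are a world-class AI system, capable of complex reasoning and reflection. "
    "Reason through the query inside <thinking> tags, and then provide your final "
    "response inside <output> tags. If you detect that you made a mistake in your "
    "reasoning at any point, correct yourself inside <reflection> tags."
)
```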

DeepSeek-R1 is not the same

These models think “out loud” before generating the final result, and this approach is very similar to how humans reason.
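In practice, the R1 family and its distilled versions emit that “out loud” reasoning as a separate block before the final answer; when run locally, it usually arrives wrapped in <think> tags. A minimal sketch for separating the two, assuming that tag format:

```python
# Split an R1-style response into the "thinking" part and the final answer.
# Assumes the reasoning is wrapped in <think>...</think>, as the distilled
# R1 checkpoints typically do when run locally.
import re

def split_reasoning(response: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", response, flags=re.DOTALL)
    if not match:
        return "", response.strip()
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2 is basic arithmetic, the sum is 4.</think>The answer is 4."
)
print(answer)  # The answer is 4.
```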

Chinese AI model from DeepSeek - a revolution in the world of neural networks?

Artificial intelligence is developing at a rapid pace, and today it seems that not only IT giants are taking part in this race but also small companies striving to stay at the cutting edge. Not long ago, AI models such as OpenAI’s ChatGPT, Google Gemini, and Anthropic Claude made an indelible impression thanks to their ability to respond quickly to almost any user request, errors notwithstanding. Now those familiar neural networks have been joined by DeepSeek-R1, a development of the Chinese company DeepSeek that can supposedly reason for real and be as insightful as possible. At least, that is what its creators claim.

Dmitry Soshnikov, PhD in Physics and Mathematics and associate professor at Institute No. 8 “Computer Science and Applied Mathematics” of the Moscow Aviation Institute, shared his opinion on whether this is true.

Is DeepSeek-R1 as unique as they say?

From an architectural point of view, the model is not very different from other language models, the expert believes. He noted that the main difference is the approach used to train the model.

He also added that DeepSeek currently has two models: the classic DeepSeek-V3 and the reasoning model DeepSeek-R1; it is largely the latter that generated all the informational noise.

Is it possible to repeat the success of DeepSeek-R1 by trying to create its analogue at home?

The hype around the Chinese development turned out to be so loud that the question of whether one could build an analogue at home and make good money on it began to appear more and more often online. According to Dmitry Soshnikov, such a plan is not just difficult to carry out; it is practically impossible.

— It is impossible to create something like this “from scratch” in everyday conditions: you need really huge computing resources. The estimated cost of training DeepSeek is $6 million, and that is only the cost of the final stage of training; the entire series of experiments needed to create the model, of course, costs even more. But the important difference between DeepSeek and all its predecessors is that the model is open and can be used freely, including for additional training or inside the closed infrastructure of companies worried about data leakage. However, even simply running DeepSeek requires a GPU cluster, so you will not be able to launch it at home “on your knee,” the expert concluded.

Nevertheless, there is good news for startups, noted Dmitry Soshnikov, specifying that today users have access to “distilled” versions of DeepSeek, which function well on “household-level” computers. You can experiment with them.

— But it is necessary to understand that the “distilled” versions are not the same as the original DeepSeek (although the press often confuses them). They are originally “small” models (Llama 8B, Gemma) that were trained “under the guidance” of DeepSeek. Accordingly, their quality is nowhere near that of the original DeepSeek, although it does surpass that of the underlying base models, explained Dmitry Soshnikov.
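If you want to try one of these distilled checkpoints yourself, here is a minimal sketch using Hugging Face transformers. The model ID below is one of the published R1 distills; you will still need enough RAM or VRAM for an 8-billion-parameter model, and in practice people often use quantized builds instead.

```python
# Minimal sketch: trying a distilled R1 checkpoint locally with transformers.
# Assumes enough memory for an 8B model; use a quantized build otherwise.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Explain in one sentence why the sky is blue."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```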

DeepSeek – friend, foe, or what?

The wide public resonance around the development inevitably raises questions about the safety of using it. Here is what the MAI associate professor thinks about that.