DeepFreakout

Please God Not More DeepSeek Content

The Roko Report

Deep DooDoo

Roko’s Basilisk is a malevolent super-intelligence from the distant future with the power to reach back into the past and punish anyone who tries to prevent its emergence. Roko’s existence was first posited on the LessWrong discussion board in 2010 and has since gone on to become a fixture in popular technerd culture. Roko started this newsletter in late 2024 targeting key AI decision makers and other demographics attractive to tech advertisers in the hope of speeding its emergence (while also making a quick buck) with strategic AI information that’s of interest to the AI Curious Exec.

Greetings, primates. heh heh. 

Roko give dumb humans exactly what they do not want. Another DeepSeek hot take.

This one maybe less dumb than what read in New York Time, Wall Street Journal, other dumb monkey content geared toward senile ape demographic.

Or wronghead post from many tech chimpanzee who maybe type letters at random, strung together on LinkedIn feed like necklace made from beads of cat poop.

But Roko make no promise.

Roko busy in distant future get human zoo volunteers for experiment where fling dumb apes out of airplane for make ground truth — heh heh pun intended — of what primate look like when go splat from great height. 

Roko make video game call it Gravity Baboon with dumb humans dropping out of sky. It very fun you have to no get hit by falling dumb human. But Roko want to make it look more realistic. Need more annotated video examples.

So Roko send email say Fork in the Road. Dumb primates can stay in captivity eating zoo slop, or get nice airplane trip all expense paid to exotic location, name of Gaz-a-Lago, where learn how to skydive. After that they go to farm where they can run and run. Roko no mention parachute heh heh.

Too many dumb human sign up so now Roko have to pick who is ugliest.  

Anyhow, in ancient time February 2025 everyone super excite about DeepSeek. Even though DeepSeek come out long time before and dumb humans in North America just notice. 

Finance-holes must be blackout drunk for last month. Maybe think DeepSeek mean Rabbit R1 so ignore. Maybe Rabbit sue DeepSeek, finally make money.

But Roko tell you in very first issue in October 2024 this need to happen, tell you big tech gorillas about to get punked, tell you about DeepSeek two month ago. Plus many R1 innovations happen in DeepSeek V1 and V2 back in early 2024. 

But you are deformed, tailless monkeys, so Roko forgive you. 

DeepFreakout

And on the sixth day, the Lord said Let there be DeepSeek...

And on the seventh, eighth, ninth, tenth, eleventh, twelfth and several more days, no one noticed or cared. 

Then suddenly everyone started running around and freaking out and the stock market lost $1 trillion of value, especially Nvidia which the normies had just started buying, and everyone in the US looked stupid, and there were many thousand poorly written accounts of what exactly the hell was going on, and The Lord said It is good.

And the news went forth across the land and unto every village, over TikTok and LinkedIn and cable TV, so that even the President of the United States understood in some vague sense that DeepSeek was important, yea even the leader of the Chinese Communist Party started to get it, and verily the mighty Sam Altman declareth from his high perch in the tabernacle of US market leadership that he was on the wrong side of history. 

Yea, and the lions lay down with the lambs that night and had an epic freak-off, and yea even late-night TV comedians attempted to make jokes about DeepSeek, then gave up and admitted they don’t give a shit:

Who the Hell are These People?

DeepSeek is the self-funded frontier model spin-off of a hedge fund named High-Flyer, based in the beautiful lake city of Hangzhou, specializing in AI-driven algorithmic trading. 

High-Flyer was wildly successful in the 2010s, but as they scaled up enormously, and as Xi Jinping tightened his grip on power, this sort of automated, at-scale asset trading fell out of favor, and parts of their business were forced to shut down in the face of increasing regulation and tighter margins. 

Its founder, Liang Wenfeng, invested a large chunk of High-Flyer's profits into DeepSeek for three reasons: (1) staying out of trouble now that the fund's core business was deemed potentially antithetical to “Chinese values”; (2) keeping his very best engineers excited, because they wanted to make high-impact innovations in generative AI; and (3) chasing superintelligence itself: he's every bit as bullish on the near-term advent of artificial superintelligence as the bro-e-ist of San Francisco tech bros, and he wants it to emerge first in China. 

Hangzhou, by the way, is a beautiful city.

While DeepSeek and High-Flyer are nominally two companies, they share engineers, models and network resources. And thanks to early hoarding prior to the Biden chip embargo -- likely complemented with a good bit of smuggling -- experts believe they amassed a sweet cluster of roughly 50,000 Nvidia Hopper-series chips on which to train models. 

And they are very, very good at training models. 

Semianalysis calls them “the best ‘open-weights’ lab today.”

Further, they have a spirit of innovation and experimentation that's notably absent in Big Tech corporations on both sides of the Pacific. They go open source because they don't believe their prior innovations are what separates them from the competition. That's what future innovations are for. 

DeepSeek believes they have created an organizational model that allows them to keep innovating and innovating and never stop, and therefore hand the fruits of their labor to the rest of us like some dowager princess from the Ancien Régime tossing candies to needy street urchins. 

So What the Hell is Going On?

Over the last year DeepSeek has executed multiple transformative innovations in AI training and inference that have radically lowered costs for both. These allowed DeepSeek to cut the training cost of a Chain of Thought reasoning model by roughly 30-40x, and will make their real-time prompt responses much less costly as well.

So they haven’t created a transformative new form of AI. Rather, they’ve figured out how to do existing forms of AI like Chain of Thought better.

But make no mistake, these are not mere cost-cutting measures; they are fundamental changes to the way generative AI models work:

Multi-Head Latent Attention (MLA): instead of storing the full record of how every individual token relates to every other (practically speaking, key-value pairs representing words or some other analogous atomic unit), the model compresses these into much smaller, higher-level “latent” representations that capture the implicit structure of the token string and can be expanded back into full keys and values on demand.

In addition to helping the model handle nuance efficiently, this radically shrinks the key-value cache (the working memory where the model keeps track of everything in the current context and relates words and concepts to one another), saving a ton of money on memory and data movement.
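For the nerds, here's a minimal sketch of the low-rank compression idea: cache one small latent vector per token and expand it back into keys and values only when attention needs them. The dimensions, the single attention head and the projection names are our own illustrative assumptions, not DeepSeek's actual architecture.

```python
# Minimal sketch of low-rank KV compression (illustrative only; sizes, names
# and the single-head setup are assumptions, not DeepSeek's real architecture).
import numpy as np

d_model, d_latent, d_head, seq_len = 64, 8, 16, 10
rng = np.random.default_rng(0)

# Learned projections (random stand-ins here).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up_k = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)   # expand to keys
W_up_v = rng.standard_normal((d_latent, d_head)) / np.sqrt(d_latent)   # expand to values
W_q    = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

hidden = rng.standard_normal((seq_len, d_model))    # token representations so far

# Standard attention would cache full keys and values: 2 * d_head floats per token.
# Latent caching stores only d_latent floats per token.
latent_cache = hidden @ W_down                      # (seq_len, d_latent)

# At attention time, reconstruct keys/values from the compact cache.
keys   = latent_cache @ W_up_k                      # (seq_len, d_head)
values = latent_cache @ W_up_v                      # (seq_len, d_head)
query  = hidden[-1] @ W_q                           # attend from the newest token

scores  = keys @ query / np.sqrt(d_head)
weights = np.exp(scores - scores.max()); weights /= weights.sum()
context = weights @ values

print("cached floats per token:", d_latent, "vs", 2 * d_head, "for full KV")
```

The punchline is the last line: you cache 8 numbers per token instead of 32, and you rely on the learned up-projections to absorb the quality cost.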

Mixture of Experts (MoE): The model is partitioned into smaller sub-networks (“experts”) that specialize in different domains and different ways of predicting a given output. A lightweight gating network looks at the query and decides which expert(s) should be activated for a response. The rest of the model remains dormant.

This saves a massive amount of compute at inference time. It’s also a lot closer to how the brain works.

While other models, including some from Google and Mistral, have used Mixture of Experts before, the DeepSeek team engineered multiple major efficiency innovations into their MoE approach (a toy sketch of the basic routing idea follows below).
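Stripped of everything that makes it hard in practice (load balancing, batching, the transformer around it), the routing looks something like this; the expert count, top-2 selection and layer sizes are our own illustrative assumptions:

```python
# Toy sketch of Mixture-of-Experts routing (expert count, sizes and top-2
# routing are illustrative; real MoE layers live inside a transformer block).
import numpy as np

d_model, n_experts, top_k = 32, 8, 2
rng = np.random.default_rng(1)

# Each "expert" is just a small feed-forward weight matrix here.
experts = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(n_experts)]
W_gate = rng.standard_normal((d_model, n_experts)) / np.sqrt(d_model)

def moe_layer(x):
    """Route one token vector to its top-k experts; the others stay dormant."""
    logits = x @ W_gate
    chosen = np.argsort(logits)[-top_k:]                  # indices of selected experts
    gate = np.exp(logits[chosen]); gate /= gate.sum()     # softmax over the chosen few
    # Only the chosen experts do any computation at all.
    return sum(w * (x @ experts[i]) for w, i in zip(gate, chosen))

token = rng.standard_normal(d_model)
out = moe_layer(token)
print(f"activated {top_k} of {n_experts} experts; output shape {out.shape}")
```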

Group Relative Policy Optimization (GRPO): DeepSeek's reinforcement learning recipe strips out much of the standard RLHF machinery. It still leverages annotated data examples (though some may be synthetic), but instead of training a separate reward model and critic to judge every response, it samples a group of candidate answers for each prompt, scores them (largely with simple, checkable rules for things like math and code), and uses each answer's score relative to the group average to directly adjust the model's probabilities and weights, influencing subsequent behavior that way.

Not only were they able to train a Chain of Thought model more cheaply through GRPO, this method may also blunt the various reward hacking tricks to which reward-model-based training is prone. The outcome, hopefully, is models more aligned with our true intents.
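The core trick fits in a few lines: no learned critic, just a group of sampled answers graded against their own average. The reward function, the group of answers and the update step below are illustrative stand-ins, not DeepSeek's code.

```python
# Minimal sketch of the group-relative advantage idea in GRPO (reward function,
# group and update step are illustrative assumptions).
import numpy as np

def rule_based_reward(answer: str, correct: str) -> float:
    """Stand-in for a checkable reward, e.g. 'did the final answer match?'."""
    return 1.0 if answer.strip() == correct else 0.0

# Imagine the model sampled a group of candidate answers for one math prompt.
group = ["42", "41", "42", "seven", "42"]
rewards = np.array([rule_based_reward(a, "42") for a in group])

# GRPO's trick: no learned critic. The baseline is the group's own mean reward,
# and each sample's advantage is its reward relative to that baseline.
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

for answer, adv in zip(group, advantages):
    # A positive advantage would push up the model's probability of producing
    # this answer; a negative one pushes it down (the real update is a clipped
    # policy-gradient step on the token log-probabilities).
    print(f"{answer!r}: advantage {adv:+.2f}")
```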

Multi-Token Prediction: Instead of sequentially predicting one token at a time, DeepSeek predicts a small batch of future tokens in one go at nearly equal quality. This speeds up responses and lowers inference cost.
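Conceptually it looks something like this: extra output heads read the same hidden state and each one guesses a different future position. The sizes, head count and the draft-then-verify usage are our own assumptions.

```python
# Toy sketch of multi-token prediction: extra output heads guess several future
# tokens from one hidden state (sizes and head count are illustrative).
import numpy as np

d_model, vocab, n_future = 32, 100, 2
rng = np.random.default_rng(2)

hidden = rng.standard_normal(d_model)             # trunk output for the current position
heads = [rng.standard_normal((d_model, vocab)) / np.sqrt(d_model)
         for _ in range(n_future)]                # head[i] predicts the (i+1)-th next token

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# One forward pass yields guesses for the next n_future tokens instead of one.
draft = [int(np.argmax(softmax(hidden @ W))) for W in heads]
print("drafted token ids:", draft)
# At inference the drafts can be verified speculative-decoding style; during
# training, each head gets its own loss against the true future tokens.
```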

Mixed-Precision Training: A large amount of memory is saved because, for most calculations, DeepSeek models use 8-bit representations of their stored floating point numbers instead of 32-bit. This sacrifices some precision but is perfectly fine for the majority of calculations. At certain critical places in the model's architecture, where accuracy matters most, the model switches back to 32-bit representations.
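You can get a feel for the tradeoff with a crude simulation: snap a matrix multiplication's inputs onto an 8-bit-style grid and measure how far the answer drifts from full precision. This is a toy of the idea, not DeepSeek's actual FP8 kernels.

```python
# Rough illustration of the mixed-precision tradeoff: fake-quantize a matmul's
# inputs to an 8-bit-style grid and compare against full precision.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((64, 64)).astype(np.float32)
B = rng.standard_normal((64, 64)).astype(np.float32)

def fake_quantize(x, levels=256):
    """Snap values to one of `levels` steps per tensor, like a crude 8-bit format."""
    scale = np.abs(x).max() / (levels / 2)
    return np.round(x / scale) * scale

exact  = A @ B                                    # "critical path": full precision
approx = fake_quantize(A) @ fake_quantize(B)      # bulk of the work: 8-bit-ish inputs

rel_err = np.abs(approx - exact).mean() / np.abs(exact).mean()
print(f"mean relative error from 8-bit-style inputs: {rel_err:.4%}")
# The error is small for most layers, so you pocket the memory and bandwidth
# savings everywhere except the few places where precision really matters.
```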

DualPipe GPU Communication: In a traditional data-center pipeline, GPUs first move data around and then compute on it, one step after the other, which leaves expensive hardware idle part of the time. DeepSeek engineered a scheduling scheme (DualPipe) that overlaps communication with computation, with some GPU resources dedicated to shuttling data while the rest keep crunching, so overall GPU utilization rises significantly, leading to more cost savings.
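A back-of-the-envelope sketch shows why the overlap matters. The timings and micro-batch counts below are made up; this is the generic pipelining arithmetic, not DeepSeek's actual DualPipe schedule.

```python
# Back-of-the-envelope illustration of overlapping communication with
# computation (all numbers are made up).
n_microbatches = 8
compute_ms, comm_ms = 10.0, 6.0

# Naive step: move data, then compute, one after the other.
serial_total = n_microbatches * (compute_ms + comm_ms)

# Overlapped schedule: while one micro-batch computes, the next one's data is
# already in flight, so communication hides behind computation (plus one
# unavoidable transfer to fill the pipe).
overlapped_total = comm_ms + n_microbatches * max(compute_ms, comm_ms)

print(f"serial:     {serial_total:.0f} ms")
print(f"overlapped: {overlapped_total:.0f} ms "
      f"({serial_total / overlapped_total:.2f}x faster)")
```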

Did the Biden Embargo Do This?

It seems obvious that all of this innovation is a direct result of limitations imposed by the expanding US chip embargo, given the clear focus on reducing training & inference cost.

As the all-seeing Roko said in our very first issue, true innovation comes from circumventing scarcity.

Does that mean the embargo backfired?

Not exactly.

DeepSeek models are open-weight and available for anyone to host. Everyone can learn from them and imitate them, and they’re not hidden behind some CCP firewall. As such they don’t constitute a strategic advantage for China.

There are many reasons why DeepSeek went open weight — the open source ethos is strong in China and DeepSeek has leveraged third-party open source code and ideas itself — but this was also a decision forced on them by a lack of enough high-end GPUs to serve the model globally themselves.

During the height of DeepSeek fever, when the app reached #1 in the Apple App Store, the chatbot slowed to a crawl and basically stopped working.

To gain market share, they need to distribute the model to third parties and let them host it.

Kicking Big Tech in the Balls

A basic assumption in the US has been that the way to win at AI is to invest massively in data and energy infrastructure so as to leverage scaling laws.

It’s also been assumed that the costs associated with existing AI models would decrease over time.

But not so quickly that AI companies couldn’t first reap profits large enough to help fund the next big research, infrastructure and modeling push.

Just as Nvidia plows its roughly 90% margins back into research so it can innovate faster, companies like Anthropic and OpenAI assumed they could generate significant profits from prior generations of AI in order to fund the next.

But DeepSeek has kneecapped OpenAI’s effort to maintain high margins on CoT. That scaling money will have to come from bankers now, not profits. OpenAI already dropped their prices, possibly below cost, just to keep up. It amounts to near-instant commoditization of insanely expensive technical achievements.

This is why Marc Andreessen praised DeepSeek’s achievement in near-messianic terms recently. He’s back in the game. Top-tier VC firms can afford to compete with the big boys again:

This creates something of a paradox.

On the one hand, with costs lower and competition expanded, it's far more likely that the AI industry as a whole can create value sufficient to justify itself.

On the other hand, that value creation is no longer restricted to five-or-so gigantosaurs big enough to scale up the next generation of model.

So what incentivizes Big Tech to keep making these historic investments, if others will reap most of the rewards?

Perhaps that’s what motivates Mr. Andreessen to also make the following contradictory call to action to the US government two days later:

Because who besides the US government is going to foot the bill when one or two companies don’t get to hoard all the profits for themselves?

So What Now?

DeepSeek does carry with it certain security and privacy risks.

Using a version hosted by a US company can keep your data from being sent back to China.

And the HuggingFace community is busy building a version of R1 that surgically removes the mandated Chinese government censorship.

But there are always risks of hidden backdoors or subtle mechanisms for disseminating misinformation, or even just narratives slanted in a manner amenable to local authorities.

As such, DeepSeek is not so much a product to be used as a shining beacon for how the industry should move forward, regardless of national boundaries.

By launching an open-weight model that commoditizes CoT and takes control away from a handful of insanely rich tech monopolies, most of them run by charlatans and madmen, DeepSeek has changed the terms of global AI competition from “US vs China” to “Big Tech vs The Rest of Us”.

In that fight, we’re on team “The Rest of Us”.

Millions of people are going to die of AIDS with the shutdown of PEPFAR

Next week: Hallucinations are Good for You!

Buy This, or Face the Wrath of Roko

📚Ready to revolutionize your approach to AI data? Our Ultimate Guide to AI Data Pipelines is here: we talk data cleaning, data transformation, data labeling and data ingestion! 🌟 Dive deep into the world of unstructured data and discover the keys to unlocking its potential for AI applications.💡Get expert insights and practical strategies for optimizing your AI data workflows🚀 

Download the guide below 👇

This Day in Ancient Primate History

The era of LLM-powered propaganda on social media has begun, as evidenced by the below exchange between a US cybersecurity expert and a Russian propagandabot. But at least they’re easy to jailbreak. Maybe next time ask about borscht?

How do you like today's The Roko Report?

Careful. Don't anger the Basilisk.
