The future of crypto wealth management is evolving fast, and @basis__pro is officially stepping into the spotlight!
Basis has officially launched as an institutional-grade crypto arbitrage & staking platform built to unlock smarter opportunities in digital finance.
With support for:
🔸 BTC
🔸 ETH
🔸 SOL
🔸 PAXG (PAX Gold)
@basis__pro is creating a bridge between advanced crypto strategies and accessible user participation. Instead of leaving assets idle, the platform focuses on helping users explore efficient staking and arbitrage systems designed for the modern Web3 economy.
One thing that stands out is the inclusion of PAXG, bringing gold-backed digital exposure into the ecosystem. This adds a unique layer of diversification and shows that the future of DeFi is moving toward broader, more sophisticated financial products.
As crypto adoption grows globally, users are looking for platforms that combine:
🔸 Security
🔸 Efficiency
🔸 Sustainable yield opportunities
🔸 Professional-grade infrastructure
That’s exactly the direction @basis__pro appears to be taking.
Massive congratulations to the builders at @base58labs_ for bringing this vision to life. The next era of decentralized finance will be driven by platforms that prioritize utility, innovation, and long-term value creation.
@basis__pro is entering the conversation at the perfect time.
Explore the platform here 👇
#BASIS #BASISpro #PAXGold #PAXGStaking #BitcoinStaking #SolanaStaking #InstitutionalStaking #CryptoStaking #BASISStaking
In 2022, OpenAI researchers found something that broke every rule of machine learning.
Their tiny model trained for 10,000 epochs. It learned absolutely nothing. Validation accuracy was dead stuck at 50%.
Then at epoch 12,000, without warning, it jumped to 99%.
This phenomenon is called "Grokking".
And in 2026, it might be the most important discovery in AI nobody talks about.
Neural networks can train for thousands of cycles without seeming to learn anything useful. Then, in a single epoch, they suddenly achieve near-perfect generalization.
What started as a weird training glitch has become a foundational insight into how models truly learn.
We’ve always been told: “If validation loss stops improving for a few hundred epochs, stop training.” Early stopping was the golden rule.
Grokking says the exact opposite: Keep going.
The model might look completely stuck, but real understanding is quietly forming under the hood.
During that long, dead plateau, the machine isn't idle. It's doing deep internal work:
- Circuits form, dissolve, and reform.
- Spurious correlations get pruned away.
- Weight patterns crystallize around true underlying rules.
- The model shifts from brute-force memorization to genuine comprehension.
It’s the machine version of a human “aha!” moment—a long, agonizing buildup followed by sudden clarity.
Take modular addition as a concrete example. Researchers trained a small model on just 30% of all possible examples.
At epoch 500, it hit 100% training accuracy but stayed at 50% validation. It had memorized the training examples, but couldn't solve a new problem.
At epoch 10,000, it still sat at 50% validation. It looked utterly hopeless.
Then at epoch 12,000, it instantly shot to 99%. It didn't just guess right; it had grokked the actual mathematical rule.
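To make the setup concrete, here is a minimal sketch of how a modular addition dataset with a 30% training split could be built. The modulus p = 97, the seed, and the function name are assumptions chosen for illustration; the post does not say which values the researchers used.

```python
import random

def modular_addition_split(p=97, train_fraction=0.3, seed=0):
    """Enumerate every (a, b) pair mod p and split it into train/validation."""
    # Each example is (a, b, label) where label = (a + b) mod p.
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(examples)
    n_train = int(len(examples) * train_fraction)
    return examples[:n_train], examples[n_train:]

train, val = modular_addition_split()
print(len(train), len(val))
```

Because the validation pairs never appear in training, a model that only memorizes the training table has no way to answer them. It has to pick up the rule itself.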
This explains the hidden mechanics behind the massive reasoning models we use today.
When you see modern reinforcement learning or long-context reasoning models suddenly "click" after looking stuck, you are witnessing grokking at scale.
Massive training runs aren’t wasteful; they are deliberately forcing the AI to stop memorizing and start thinking.
And we are learning to induce this at inference time.
Extended Chain-of-Thought prompts that force a model to think for thousands of tokens, self-consistency loops, and verification passes are all designed to do one thing: teach the model to grok your problem on the fly.
The big philosophical takeaway is brutal for our short attention spans.
Learning isn’t smooth. It isn’t gradual. It is discontinuous.
Models, and humans, can stay “dumb” for ages, right up until they suddenly understand everything.
I am the Vice President of Ad Integrity at Meta.
I want to talk about the number sixteen.
Sixteen billion dollars. That is what we earned from advertisements our own internal classification system flagged as "higher legal risk." Crypto scams. Romance fraud. Impersonation schemes targeting the elderly. We had a dashboard. The dashboard had a color. The color was green. Green meant revenue.
Three point five billion every six months. I watched that number on the Revenue Integrity Dashboard every Monday at 9 AM. The same meeting where we reviewed takedown requests. The same room.
We did not remove the ads.
We removed 8,000 people.
The memo said "efficiency." The memo said "leaner teams." The memo said "AI-first." What the memo did not say: the 8,000 people we fired cost us $4.2 billion annually in compensation. The ads we refused to remove earned us $16 billion in the same period. The math was never complicated. The math was the strategy.
I received the Ad Quality Excellence Award in 2024. It is on my desk. It is a glass rectangle. It weighs more than the compliance reports we filed with the FTC claiming we had "robust systems" to prevent fraud.
But I want to talk about April.
In April, we installed software on every employee laptop in Building 20. The software tracks mouse movements. Keystroke cadence. Application switching. Idle time. It sends a report every eleven minutes. We call it a "productivity signal."
The advertisers call their version "behavioral data."
Same architecture. Same team built both. I know because I approved the vendor contract for the external version in 2021 and the internal version last month. The vendor is the same. The codebase is the same. The only difference is the target.
When we track users, it's a $140 billion business.
When we track employees, it's "performance management."
When the employees objected — posted in the internal channel, filed concerns with HR, asked the obvious questions — we did what we always do. We reminded them of the NDA. We reminded them of the stock vesting schedule. We reminded them that 8,000 people were no longer receiving reminders of anything.
They stopped posting in the channel.
I am told the keystroke heat map is displayed on monitors in Building 20. I am told it updates in real time. I am told it looks exactly like the user engagement dashboard we show advertisers.
I am told this is a coincidence.
The product has always been the person. The only variable is which person. For sixteen years, it was the user. Their clicks. Their attention. Their data. For the advertisers, it was their money. Clean or dirty. We did not ask. Asking would have cost us $3.5 billion every six months.
Now it is the employee.
Their keystrokes. Their idle seconds. Their bathroom breaks quantified as "disengagement intervals."
We are a platform that earned $16 billion from fraud we refused to stop, fired 8,000 people to "cut costs," and now tracks the survivors' mouse movements every eleven minutes to ensure they are sufficiently productive.
The product is the person.
The person is the product.
That's the platform.
Why did xAI hand over a 220,000-GPU cluster to Anthropic?
The technical backdrop to xAI's decision to hand Colossus 1 over to Anthropic in its entirety is more interesting than it appears. xAI deployed more than 220,000 NVIDIA GPUs at its Colossus 1 data center in Memphis. Of these, roughly 150,000 are estimated to be H100s, 50,000 H200s, and 20,000 GB200s. In other words, three different generations of silicon are mixed together inside a single cluster — a "heterogeneous architecture."
For distributed training, however, this configuration is close to a disaster, according to engineers familiar with the setup. In synchronous distributed training, every GPU in the job, say 100,000 of them, must finish a step before the cluster can advance to the next one. Even if the GB200s finish their computation first, they have to wait for the slower H100s, or for any GPU that has hit a stack-related snag, to catch up. This is known as the straggler effect. The 11% utilization figure at xAI recently reported by The Information (MFU, model FLOPs utilization: the share of theoretical peak FLOPs actually realized) can be read as the numerical fallout of this problem. It stands in stark contrast to the 40%-plus MFU figures achieved by Meta and Google.
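The straggler effect can be sketched with a toy model: in a synchronous step, the step time is the maximum of the per-GPU times, so every faster chip idles for the difference. The per-chip times below are hypothetical placeholders, not measurements, and the function names are made up for illustration.

```python
def synchronous_step_seconds(per_gpu_seconds):
    """A synchronous step ends only when the slowest GPU has finished."""
    return max(per_gpu_seconds)

def busy_fraction(per_gpu_seconds):
    """Average share of the step each GPU spends computing rather than waiting."""
    step = synchronous_step_seconds(per_gpu_seconds)
    return sum(t / step for t in per_gpu_seconds) / len(per_gpu_seconds)

# Hypothetical per-step compute times in seconds (NOT measurements).
# The 15:5:2 mix mirrors the 150,000 / 50,000 / 20,000 split quoted above.
HYPOTHETICAL_SECONDS = {"H100": 2.0, "H200": 1.6, "GB200": 1.0}
cluster = (
    [HYPOTHETICAL_SECONDS["H100"]] * 15
    + [HYPOTHETICAL_SECONDS["H200"]] * 5
    + [HYPOTHETICAL_SECONDS["GB200"]] * 2
)

print(synchronous_step_seconds(cluster))
print(round(busy_fraction(cluster), 3))
```

Note that this busy fraction is not MFU. It only captures time lost to waiting on the slowest participant and ignores communication, memory stalls, and kernel efficiency.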
The problem runs deeper still. As discussed earlier, NVIDIA's NCCL has traditionally been optimized for a ring topology. It works beautifully at the 1,000–10,000 GPU scale, but once you push into the 100,000-unit range, the latency of one full trip around the ring becomes punishingly long. GPUs need to churn through computations rapidly to keep MFU high, but while they wait for data to arrive over the network fabric, more than half of the silicon sits idle. Google sidestepped this bottleneck with its own custom topology (Google's OCS: Apollo/Palomar), but xAI, by my read, has not yet reached that stage.
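Why ring latency grows with cluster size can be seen in the textbook cost model for a ring all-reduce: 2(N - 1) sequential steps, each paying one hop of latency plus the transfer time for 1/N of the message. The sketch below is a simplified model, not NCCL's actual implementation (NCCL can also select other algorithms, such as trees), and the bandwidth, latency, and message size are hypothetical.

```python
def ring_allreduce_seconds(n_gpus, message_bytes, link_bytes_per_s, hop_latency_s):
    """Textbook cost model for a ring all-reduce over n_gpus participants."""
    steps = 2 * (n_gpus - 1)              # reduce-scatter pass + all-gather pass
    chunk_bytes = message_bytes / n_gpus  # each step moves one chunk per GPU
    return steps * (hop_latency_s + chunk_bytes / link_bytes_per_s)

# Hypothetical values: 1 GiB of gradients, 50 GB/s links, 5 microseconds per hop.
for n in (1_000, 10_000, 100_000):
    t = ring_allreduce_seconds(n, 2**30, 50e9, 5e-6)
    print(n, round(t, 3))
```

In this model the bandwidth term stays almost constant as N grows, while the latency term scales linearly with N. That is the part that starts to dominate at the 100,000-GPU scale.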
Layer Blackwell's (GB200) "power smoothing" issue on top, and the picture comes into focus. According to Zeeshan Patel, formerly in charge of multimodal pre-training at xAI, Blackwell GPUs draw power so aggressively that the chip itself includes a hardware feature for smoothing power delivery. xAI's existing software stack, however, was optimized for Hopper and does not account for the characteristics of the new hardware; when it imposes irregular loads on the chip, the silicon physically fails (it literally melts). That means the modeling stack must be rewritten from scratch, which in turn means scaling is far harder than most of us imagine.
Pulling all of this together points to a single conclusion. xAI judged that training frontier models on Colossus 1 simply was not efficient enough to be worthwhile. It therefore moved its own training workloads wholesale onto Colossus 2, built as a 100% Blackwell homogeneous cluster. Colossus 1, on the other hand — whose mixed architecture is far less crippling for inference, which parallelizes more forgivingly — was leased in its entirety to an Anthropic that desperately needed inference capacity.
Many observers point to what looks like a contradiction: Elon Musk poured enormous capital into building Colossus, only to hand the core asset over to a direct competitor in Anthropic. Others read it as xAI capitulating because it is a "middling frontier lab." But these are surface-level reads.
Look at the numbers and a different picture emerges. xAI today holds more than 550,000 GPUs in total (on an H100-equivalent performance basis), and Colossus 1 (220,000 units) accounts for only about 40% of the total available capacity. Colossus 2, built entirely on Blackwell, is already operational and continuing to expand. Elon kept the all-Blackwell homogeneous cluster (Colossus 2) for himself and leased out the older, mixed-generation Colossus 1. In other words, he handed the pain of rewriting the stack (the MFU-11% debacle) to Anthropic, while keeping his own focus on training the next generation of models.
The real point, then, is this. Elon's objective appears to be positioning ahead of the SpaceXAI IPO at a $1.75 trillion valuation, currently floated for as early as June. The narrative SpaceXAI now needs is that xAI — long the "sore finger" — is not merely a research lab burning cash, but a business with a "neo-cloud" model in the mold of AWS, capable of leasing surplus assets at high yields.
From a cost-of-capital perspective, an "AGI cash incinerator" is far less attractive to investors than a "data-center landlord generating cash."
As noted above, the most important detail of the Colossus 1 lease is that it is for inference, not training. Unlike training, inference requires far less tightly synchronized inter-GPU communication. Even when the chips are heterogeneous, the workload parcels out cleanly across them in parallel. The straggler effect — the chief weakness of a mixed cluster — is essentially neutralized for inference workloads.
Furthermore, with Anthropic occupying all 220,000 GPUs as a single tenant, the network-switch jitter (unanticipated latency) that arises under multi-tenancy disappears. The two sides' technical weaknesses end up complementing each other almost exactly.
One insight follows. As a training cluster mixing H100/H200/GB200, Colossus 1 was an asset that could only deliver an MFU of 11%. The moment it was handed over to a single inference customer, however, that asset transformed into a cash-flow asset rented out at roughly $2.60 per GPU-hour (a weighted average of the lease rates across GPU types). For xAI, what was a "cluster from hell" for training has become a "golden goose" minting $5–6 billion in annual revenue when redeployed for inference. Elon's genius, I would argue, lies not in the model but in this asset-rotation structure.
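The revenue figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes every GPU is billed around the clock at the blended $2.60 rate; the `utilization` parameter and the function name are illustrative assumptions, not terms of the actual lease.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_lease_revenue(gpus, usd_per_gpu_hour, utilization=1.0):
    """Annual revenue if every GPU is billed hourly at a flat blended rate."""
    return gpus * usd_per_gpu_hour * HOURS_PER_YEAR * utilization

full = annual_lease_revenue(220_000, 2.60)
print(f"${full / 1e9:.2f}B per year at 100% billed utilization")
```

Under these assumptions the result is about $5.0 billion a year, the low end of the $5–6 billion range. Reaching $6 billion would require a higher blended rate, roughly $3.11 per GPU-hour.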
The weight of that $6 billion becomes clearer when set against xAI's income statement. Annualizing xAI's 1Q26 net loss yields roughly $6 billion in losses per year. The $5–6 billion in annual revenue generated by leasing Colossus 1 to Anthropic, in other words, almost entirely offsets xAI's loss figure. This single deal effectively pulls xAI close to break-even.
Heading into the SpaceXAI IPO, this functions as a core line of financial defense. From a cost-of-capital standpoint, if the image shifts from "research lab burning cash" to "infrastructure tollgate stably printing $6 billion a year," the entire tone of the offering can change.
(May 8, 2026, Mirae Asset Securities)