Search MachineLearning on X

2026.04.24 00:04

Over several years, we’ve contributed #research# to @ICLR exploring how #machinelearning# models are trained, interpreted, and applied. This year’s papers span #multimodallearning#, #diffusion#, interpretability, and theory, with open #code# and demos. 🔗

Water Tower Research LLC@WTR_Research

2026.05.18 18:25

The WTR AI Private Pools Index climbed another 14.8% M/M through May 11, pushing total tracked private AI market value to $2.20 trillion across 38 active constituents.
🔹 Anthropic drove the move higher, adding +$276.8B in value to reach a ~$900.5B valuation, while OpenAI remained the second-largest constituent at ~$839.0B.
🔹 The index is now up 64.1% YTD, underscoring continued investor appetite for AI infrastructure, foundation models, and enterprise AI platforms.
🔹 Harvey, Figure AI, and ElevenLabs were among the largest monthly gainers, while Cursor saw pressure amid competition from Anthropic’s Claude Code and other AI coding tools.
Read James Kisner, CFA's full report for more detail on private AI valuation trends, index composition, top movers, and secondary market activity shaping the AI ecosystem.
#AI# #ArtificialIntelligence# #PrivateMarkets# #Anthropic# #OpenAI# #GenerativeAI# #MachineLearning# #Technology# #Investing# #WaterTowerResearch#

Mathelirium@mathelirium

2026.05.17 14:56

What if Your Neural Network Was Forced to Obey Physics?

Physics-Informed Neural Networks (PINNs) are neural networks trained to satisfy a differential equation by building the PDE residual directly into the loss. They emerged from a very practical problem: classical PDE pipelines can be brilliant, but they often demand heavy discretization work (meshes, stencils, stability tuning), and the method you build is usually tied to one geometry and one solver setup. A PINN flips the workflow by representing the solution itself as a smooth function uᵩ(x,t) and enforcing the physics everywhere you choose to sample the domain.

People often meet PINNs in the least helpful way: via a flashy solution plot, and almost no explanation of what was enforced to get it. In this series we keep the enforcement visible. We pick a differential equation, represent the unknown solution as a flexible function, measure how well that function satisfies the equation across the domain, and train it to reduce that mismatch everywhere we sample.

A normal neural net learns from labels: you give it inputs and target outputs. A PINN learns from a differential equation: you give it inputs (x,t) and it gets punished whenever its output fails the PDE. By "punish" we mean that the loss increases when the mismatch is large; the "reward" is that the loss decreases as the mismatch gets smaller. The network isn’t replacing physics, it’s becoming a flexible function that is forced to satisfy the same calculus you’d impose on any candidate solution.

The math breakdown:

We start with a PDE we want to solve on a domain Ω. Write it as
uₜ(x,t) + N(u(x,t), uₓ(x,t), uₓₓ(x,t), …) = 0 for (x,t) in Ω

A PINN replaces the unknown function u with a neural network output uᵩ(x,t), where φ denotes the network parameters.

Now define the physics residual by plugging uᵩ into the PDE:
rᵩ(x,t) = ∂uᵩ/∂t + N(uᵩ, ∂uᵩ/∂x, ∂²uᵩ/∂x², …)

If uᵩ were an exact solution, we would have rᵩ(x,t) = 0 everywhere.

We may also have data points (xᵢ,tᵢ,uᵢ) from measurements or a known initial condition. The training objective is just a weighted sum of squared errors:
L(φ) = L_data(φ) + λ L_phys(φ) + L_bc/ic(φ)
with
L_data(φ) = meanᵢ |uᵩ(xᵢ,tᵢ) − uᵢ|²
L_phys(φ) = meanⱼ |rᵩ(xⱼ,tⱼ)|², where (xⱼ,tⱼ) are the collocation points in Ω
L_bc/ic(φ) = penalties enforcing boundary conditions and initial conditions

The key technical step is that the derivatives inside rᵩ (∂uᵩ/∂t, ∂uᵩ/∂x, ∂²uᵩ/∂x², …) are computed by automatic differentiation. So we can differentiate the total loss L(φ) with respect to φ and train with gradient descent.

This is the whole idea behind PINNs. Learn a function, but make the PDE part of the loss, so the network is trained to be a solution, not just a curve-fitter.

In the render, the main 3D surface is the network’s current guess uᵩ(x,t), drawn as a living sheet over the (x,t) plane. Hovering above is the neural scaffold: a visible graph of feature nodes and connections. The bright tension threads are the physics residual rᵩ(x,t): each thread tethers a collocation bead on the sheet up to the scaffold, and it thickens and brightens exactly where |rᵩ| is large (color encodes the sign). As training runs, those threads go slack across the domain not because we hid the error, but because the network has actually been pushed toward rᵩ(x,t) ≈ 0.

#PINNs# #PhysicsInformedNeuralNetworks# #ScientificMachineLearning# #PDE# #DifferentialEquations# #Optimization# #MachineLearning# #AppliedMath# #ComputationalPhysics#
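To make the residual and the loss terms concrete, here is a minimal sketch in plain Python. It rests on several assumptions that are not in the post: the PDE is the advection equation uₜ + C·uₓ = 0 (so N = C·uₓ), the "network" is replaced by a hypothetical one-parameter family u(x,t) = sin(x − θt), and automatic differentiation is done with a small hand-written dual-number class rather than a deep learning framework. The boundary/initial-condition term is left out for brevity.

```python
import math

class Dual:
    """Number val + eps*e with e**2 = 0; the eps part carries a first derivative."""
    def __init__(self, val, eps=0.0):
        self.val, self.eps = val, eps

    @staticmethod
    def lift(other):
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        o = Dual.lift(other)
        return Dual(self.val + o.val, self.eps + o.eps)
    __radd__ = __add__

    def __sub__(self, other):
        o = Dual.lift(other)
        return Dual(self.val - o.val, self.eps - o.eps)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        o = Dual.lift(other)
        return Dual(self.val * o.val, self.val * o.eps + self.eps * o.val)
    __rmul__ = __mul__

def dsin(z):
    """sin with the chain rule applied to the derivative part."""
    return Dual(math.sin(z.val), math.cos(z.val) * z.eps)

C = 1.0  # assumed advection speed in u_t + C*u_x = 0

def u(theta, x, t):
    """Hypothetical one-parameter candidate solution (stand-in for a network)."""
    return dsin(Dual.lift(x - theta * t))

def residual(theta, x, t):
    """r(x,t) = du/dt + C*du/dx, each derivative taken by forward-mode autodiff."""
    u_t = u(theta, x, Dual(t, 1.0)).eps
    u_x = u(theta, Dual(x, 1.0), t).eps
    return u_t + C * u_x

# Collocation points on a grid over the domain [0, 2*pi) x [0, 1].
COLLOCATION = [(2 * math.pi * i / 16, j / 4) for i in range(16) for j in range(5)]

# Synthetic "measurements" generated from the exact solution sin(x - C*t).
DATA = [(x, t, math.sin(x - C * t)) for x, t in [(0.3, 0.2), (1.1, 0.7), (2.5, 0.5)]]

def phys_loss(theta):
    return sum(residual(theta, x, t) ** 2 for x, t in COLLOCATION) / len(COLLOCATION)

def data_loss(theta):
    return sum((u(theta, x, t).val - ui) ** 2 for x, t, ui in DATA) / len(DATA)

def total_loss(theta, lam=1.0):
    """Weighted sum L_data + lam * L_phys (boundary/initial term omitted)."""
    return data_loss(theta) + lam * phys_loss(theta)

if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0, 1.5):
        print(theta, phys_loss(theta), total_loss(theta))
```

In this toy family the residual is (C − θ)·cos(x − θt), so the physics loss vanishes only at θ = C. A real PINN would minimize the same kind of objective over all network weights with reverse-mode autodiff and gradient descent.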

邓亚峰@LongTermMemoryE

2026.01.06 10:32

🚀 Excited to announce the release of our latest research on EverMemOS, now available on arXiv! As Large Language Models (LLMs) transition from simple conversational tools to long-term interactive agents, they face a critical "cognitive wall": limited context windows and fragmented memory. To bridge this gap, we introduced EverMemOS—a self-organizing memory operating system that transforms isolated interaction fragments into a structured, evolving "digital brain". By implementing an engram-inspired lifecycle—covering Episodic Trace Formation, Semantic Consolidation, and Reconstructive Recollection—EverMemOS doesn't just store data; it organizes experience. We are thrilled to report that EverMemOS has achieved State-of-the-Art (SOTA) results across four major long-term memory benchmarks: LoCoMo: Outperformed all existing memory systems and even full-context large models, while using drastically fewer tokens (93.05% overall accuracy). LongMemEval: Achieved a leading 83.00% accuracy, showing particularly strong gains in Knowledge Updates and temporal reasoning. HaluMem: Set a new standard for memory integrity and accuracy (90.04% recall). PersonaMem v2: Demonstrated superior performance in deep personalization and behavioral consistency across diverse scenarios. These results validate our belief that the future of AI lies in structured memory organization rather than just expanding context windows. Special thanks to the amazing team at EverMind Shanda Group for their hard work on this milestone! Check out the full paper on arXiv: Explore our code on GitHub: #AI# #LongTermMemory# #LLM# #MachineLearning# #EverMemOS# #AIInfra# #SOTA#

Google Research@GoogleResearch

2026.04.24 13:20

Attending the Women in Machine Learning (WiML) Social at #ICLR2026#? Stop by 203C from 12PM - 3PM to connect with the community and the Google Research team! #WiML# #GoogleResearch#

Tom Dörr@tom_doerr

2026.05.19 06:13

Curated list of 920 Python machine learning projects

Kimi.ai@Kimi_Moonshot

2026.02.26 07:21

Supporting @MITEECS and @nlp_mit’s Multimodal Machine Learning course (Spring 2026). 🎓 Students are leveraging the multimodal capabilities of Kimi K2.5 to power their final research projects. We look forward to seeing the innovative applications that will emerge this semester. 🔗 Happy coding! ✨

Tesla@Tesla

2022.10.01 05:24

1 Exapod = 1.1 exaFLOPs of machine learning compute 🔥

How To AI@HowToAI_

2026.05.17 17:51

In 2022, OpenAI researchers found something that broke every rule of machine learning. Their tiny model trained for 10,000 epochs. It learned absolutely nothing. Validation accuracy was dead stuck at 50%. Then at epoch 12,000, without warning, it jumped to 99%. This phenomenon is called "Grokking". And in 2026, it might be the most important discovery in AI nobody talks about.

Neural networks can train for thousands of cycles without seeming to learn anything useful. Then, in a single epoch, they suddenly achieve near-perfect generalization. What started as a weird training glitch has become a foundational insight into how models truly learn.

We’ve always been told: “If validation loss stops improving for a few hundred epochs, stop training.” Early stopping was the golden rule. Grokking says the exact opposite: Keep going. The model might look completely stuck, but real understanding is quietly forming under the hood.

During that long, dead plateau, the machine isn't idle. It's doing deep internal work:
- Circuits form, dissolve, and reform.
- Spurious correlations get pruned away.
- Weight patterns crystallize around true underlying rules.
- The model shifts from brute-force memorization to genuine comprehension.

It’s the machine version of a human “aha!” moment: a long, agonizing buildup followed by sudden clarity.

Take modular addition as a concrete example. Researchers fed a small model just 30% of all possible examples. At epoch 500, it hit 100% training accuracy but stayed at 50% validation. It had memorized the training answers, but couldn't solve a new problem. At epoch 10,000, it still sat at 50% validation. It looked utterly hopeless. Then at epoch 12,000, it instantly shot to 99%. It didn't just guess right; it had grokked the actual mathematical rule.

This explains the hidden mechanics behind the massive reasoning models we use today. When you see modern reinforcement learning or long-context reasoning models suddenly "click" after looking stuck, you are witnessing grokking at scale. Massive training runs aren’t wasteful, they are deliberately forcing the AI to stop memorizing and start thinking.

And we are learning to induce this at inference time. Extended Chain-of-Thought prompts that force a model to think for thousands of tokens, self-consistency loops, and verification passes are all designed to do one thing: teach the model to grok your problem on the fly.

The big philosophical takeaway is brutal for our short attention spans. Learning isn’t smooth. It isn’t gradual. It is discontinuous. Models, and humans, can stay “dumb” for ages, right up until they suddenly understand everything.
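The gap between memorizing and learning the rule in the modular addition example can be illustrated with a short sketch. Everything specific here is an assumption for illustration: the modulus `P = 97`, the random seed, and the two stand-in "models" (a lookup table that only knows the training pairs, and the exact rule). It is not the researchers' setup and it does not train a network; it only shows why 100% training accuracy can coexist with poor validation accuracy on a 30% split.

```python
import random

P = 97                 # hypothetical modulus for a + b (mod P)
TRAIN_FRACTION = 0.3   # the post's "30% of all possible examples"

def label(a, b):
    """Ground-truth answer for modular addition."""
    return (a + b) % P

# Enumerate every possible example, then split 30% / 70% at random.
pairs = [(a, b) for a in range(P) for b in range(P)]
rng = random.Random(0)  # fixed seed so the split is reproducible
rng.shuffle(pairs)
n_train = round(TRAIN_FRACTION * len(pairs))
train, val = pairs[:n_train], pairs[n_train:]

# Stand-in for a memorizing model: a lookup table over the training pairs.
memory = {pair: label(*pair) for pair in train}

def memorizer(a, b):
    """Returns the stored answer, or a fixed guess of 0 for unseen pairs."""
    return memory.get((a, b), 0)

def rule(a, b):
    """Stand-in for a model that has learned the underlying rule."""
    return (a + b) % P

def accuracy(model, data):
    return sum(model(a, b) == label(a, b) for a, b in data) / len(data)

if __name__ == "__main__":
    for name, model in (("memorizer", memorizer), ("rule", rule)):
        print(name, accuracy(model, train), accuracy(model, val))
```

The lookup table is perfect on the pairs it has seen and close to useless on the rest, while the rule is perfect on both splits. Grokking, as the post describes it, is a trained model moving from the first behavior to the second late in training.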

Anthropic@AnthropicAI

2026.03.11 10:10

The Institute will be led by @jackclarkSF, in a new role as Anthropic’s Head of Public Benefit. It'll bring together an interdisciplinary staff of machine learning engineers, economists, and social scientists, making full use of the inside information of a frontier AI lab.
