Search 46の放送です〜 on X

Appwrite@appwrite

2026.04.12 11:50

Opus 4.6 lately

0

145

10.8K

689


OpenCode@opencode

2026.04.16 15:54

Opus 4.7 now available in OpenCode - 1M context - same pricing as 4.6

0

56

3K

61


Lei Li@_TobiasLee

2026.04.21 07:35

Kimi K2.6 @Kimi_Moonshot is the new leading open-weights agent model, landing at #4 on Claw-Eval (Pass^3: 62.3%).
Key takeaways:
- 👑 Best open-source agent, period: Pass^3 of 62.3% is the highest of any open-weights model, about 8 points behind frontier Claude Opus 4.6 (70.4%). Pass@3 of 80.9% closes most of the gap to closed models.
- 💪 Frontier-tier robustness: 94.7 (±0.9), statistically tied with Claude Sonnet 4.6 (94.6) and Claude Opus 4.6 (94.2). K2.6's agent trajectories no longer collapse under perturbation.
The open-source agent frontier just moved. Full Leaderboard:

0

4

118

12

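The post above quotes both Pass^3 and Pass@3 without defining them. A minimal sketch of the usual distinction, assuming Claw-Eval follows the common convention that Pass^k counts a task only when all k trials succeed while Pass@k counts it when at least one succeeds. The task names and trial outcomes below are hypothetical and are not leaderboard data.

```python
from typing import Dict, List


def pass_hat_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks on which every recorded trial succeeded (Pass^k)."""
    return sum(all(trials) for trials in results.values()) / len(results)


def pass_at_k(results: Dict[str, List[bool]]) -> float:
    """Fraction of tasks on which at least one trial succeeded (Pass@k)."""
    return sum(any(trials) for trials in results.values()) / len(results)


# Hypothetical outcomes: three trials per task (k = 3).
trials = {
    "task_a": [True, True, True],
    "task_b": [True, False, True],
    "task_c": [False, False, False],
    "task_d": [True, True, True],
}

print(f"Pass^3 = {pass_hat_k(trials):.2f}")
print(f"Pass@3 = {pass_at_k(trials):.2f}")
```

Under these definitions Pass^k can never exceed Pass@k, which is consistent with the 62.3% and 80.9% figures quoted in the post.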

Limitless Finance@trylimitlessfin

2026.05.15 15:17

BREAKING: The US 10Y Treasury yield just broke above 4.6% for the first time since May 2025. It’s now up 18% from the March 2026 low. Higher yields = more pressure on stocks. Last April, this exact 10Y yield level triggered Trump's tariff pause. The bond market might be forcing Trump's hand again.

0

11

1


Ronin@DeRonin_

2026.05.15 10:24

How I actually route between models:
Tweet drafts: Sonnet 4.6
Long-form articles: Opus 4.6
Code work: Kimi 2.6
Agentic loops: Kimi 2.6
KOL research: Grok 4.3
Quick facts: Perplexity Pro
Image gen: GPT-Image-2
Voice consistency: Sonnet 4.6
Boilerplate: Qwen 3 local
Yes, single-model setups are why your AI bill is 10x mine

0

27

96

1

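The routing list in the post above amounts to a lookup table from task type to model. A minimal sketch of that idea follows. The task keys, the `route` helper and the fallback choice are hypothetical, and the model strings are the display names from the post, not real API identifiers.

```python
# Task-type -> model mapping, taken from the post. Values are display
# names as written there, not provider API model IDs.
ROUTES = {
    "tweet_draft": "Sonnet 4.6",
    "long_form_article": "Opus 4.6",
    "code": "Kimi 2.6",
    "agentic_loop": "Kimi 2.6",
    "kol_research": "Grok 4.3",
    "quick_fact": "Perplexity Pro",
    "image_gen": "GPT-Image-2",
    "voice_consistency": "Sonnet 4.6",
    "boilerplate": "Qwen 3 local",
}

# Assumption: the post names no fallback, so one is picked here for illustration.
DEFAULT_MODEL = "Sonnet 4.6"


def route(task_type: str) -> str:
    """Return the model name for a task type, or the fallback if unknown."""
    return ROUTES.get(task_type, DEFAULT_MODEL)


print(route("code"))
print(route("something_else"))
```

Keeping the mapping in one dictionary means a model can be swapped for a given task type without touching the calling code.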

X Freeze@XFreeze

2026.05.04 03:21

Grok 4.3 just built this entire game with just a single prompt. It has the fastest output token speed and outranks Claude Sonnet 4.6 Max on Artificial Analysis. I built this using the xAI API in Kilo Code via the VS Code extension.

0

566

4.3K

1.1K


BytePlus@BytePlusGlobal

2026.04.23 08:46

Superior Value: GLM-5.1 vs Claude Opus 4.6 Coding.

0

1.1K

120


Anthropic@AnthropicAI

2026.03.06 19:17

New on the Anthropic Engineering Blog: In evaluating Claude Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in web-enabled environments. Read more:

0

253

3.2K

355


Artificial Analysis@ArtificialAnlys

2026.04.21 03:02

Moonshot’s Kimi K2.6 is the new leading open weights model. Kimi K2.6 lands at #4 on the Artificial Analysis Intelligence Index (54), behind only Anthropic, Google, and OpenAI (all 57).
Key takeaways:
➤ Increase in performance on agentic tasks: @Kimi_Moonshot's Kimi K2.6 achieves an Elo of 1520 on our GDPval-AA evaluation, which is a marked improvement over Kimi K2.5’s Elo of 1309. GDPval-AA is our leading metric for general agentic performance, measuring the performance on knowledge work tasks such as preparing presentations and analysis. Models are given code execution and web browsing tools in an agentic loop via our open source reference agentic harness called Stirrup. This continues Kimi K2.6’s strength in tool use, maintaining a 96% score on τ²-Bench Telecom, placing it among other frontier models in this category.
➤ Low hallucination rate: Kimi K2.6 scores 6 on the AA-Omniscience Index, our knowledge evaluation measuring both accuracy and hallucination rate. This score is primarily driven by a comparatively low hallucination rate of 39% (reduced from Kimi K2.5’s 65%), indicating a greater capability to abstain rather than fabricate knowledge when the model is uncertain. Kimi K2.6’s low hallucination rate places it similarly to other models such as Claude Opus 4.7 (36%) and MiniMax-M2.7 (34%).
➤ High token usage: Kimi K2.6 demonstrates high token usage, but is in line with other frontier models in the same intelligence tier. To run the full Artificial Analysis Intelligence Index, Kimi K2.6 used ~160M reasoning tokens. This is slightly lower than Claude Sonnet 4.6 (~190M reasoning tokens) but much higher than GPT 5.4 (~110M reasoning tokens).
➤ Open weights: Kimi K2.6 is a Mixture-of-Experts (MoE) model with 1T total parameters and 32B active, same as the previous two generations of models, Kimi K2 Thinking and Kimi K2.5. Kimi K2.6 again pushes the open weights frontier in intelligence.
➤ Third Party Access: Kimi K2.6 is accessible through Moonshot’s First Party API as well as third party API providers Novita, Baseten, Fireworks, and Parasail.
➤ Multimodality: Kimi K2.6 supports Image and Video input and text output natively. The model’s max context length remains 256k.
Further analysis in the threads below.

0

30

1.3K

130


MissWarmJ meet me at Dokomi 2026 Germany@WarmJMiss

2025.12.28 22:48

See you at AVN 2026! 4-6 PM on Jan 21st at Fansly Booth! 🥳

0

3

584

26
