2026.05.17 18:45

anybody who uses or learns agentic systems SHOULD READ THIS. the install order I run before any new agentic project:

1. PRIVACY: direnv + a real secrets manager
install direnv, then plug it into your team's secrets manager (1Password CLI via op run, Doppler, Infisical, Vault: pick one)
what direnv does: loads per-folder environment variables when you cd in, unloads them when you cd out. the real move is wiring it into your secrets manager so credentials NEVER live in plain text on disk
what this stops:
- API keys accidentally committed to git history, the most common AI agent breach pattern in 2026
- credentials leaking from one project into another through your shell history
- shared .env files that one teammate quietly backs up to Dropbox
- secrets that walk out the door with a stolen laptop because they were sitting in plain text in /Users/you/projects
the part nobody mentions: most "my agent got jailbroken" stories actually trace back to one credential the agent had access to that it shouldn't have had. scope keys to projects, scope projects to folders, and the blast radius of any single compromise drops dramatically
I shipped 2 agents with keys in .env files before switching. the day I plugged direnv into op run, I stopped having that whole class of nightmare

2. TOKENS: litellm or portkey as your model proxy
one URL that fronts every AI provider (Anthropic, OpenAI, Google, Mistral, local models). all your spend flows through one place
what it saves you:
- response caching keyed by prompt hash, which can cut your bill 30-60% on repeat tasks
- automatic fallback on rate limits (Sonnet hits a 429? it falls back to Opus, then GPT, then your local backup, and no user sees a broken call)
- per-feature and per-user budget caps: block the call before it costs $200 instead of auditing it after
- model routing rules: cheap tasks to Haiku, expensive ones to Opus, never the wrong way around
- PII redaction before requests leave your network, a security side benefit
the part nobody mentions: every "$4k AI bill" story I've heard ends with "we didn't have a proxy in front." this is where you put guardrails around spend BEFORE the spend happens
I spent 2 weeks building my own router. it took 20 minutes to replace with litellm. I will be embarrassed about this forever

3. CONTEXT: uv + a git commit on every passing eval
install uv (the newer Python package manager from Astral, the team behind ruff; 10-100x faster than pip+venv). then commit every time an eval suite PASSES, with the model version and pass rate in the commit message
what this preserves:
- the exact dependency set via uv.lock, so you always know which packages your agent was using, with no nasty surprises from a quiet update
- the exact prompt + code state, so you can reproduce any past run from a single git hash
- the exact model version paired with the exact pass rate, a paper trail when prod breaks weeks later
- one-command rollback to a known-working state when a refactor goes sideways
- a compliance story: every prompt version tied to a model version in your commit log
the security side: when something blows up in prod, you want to say "the prompt was version X, the model was Sonnet 4.6.1, the last eval pass rate was 94%." not "I think we deployed on Tuesday?" the first is an incident report. the second is a resignation letter
I've lost more agents to "I changed 3 prompts in one session and broke something" than to any actual bug

4. VISIBILITY: mitmproxy in front of every LLM call
it's basically a wiretap for your agent.
install it, point your agent through it, and now you see every conversation your agent has with the model in real time
what actually shows up:
- every silent retry your SDK sneaks in when a call fails
- the full prompt being sent (including any creds you accidentally embedded)
- what the model returns BEFORE your code reacts to it
- exact token usage per call, per tool, per loop iteration
- responses that quietly trigger your code into doing something you didn't intend. this is where prompt injection lives
the part nobody talks about: if a website your agent scraped slipped instructions into its data, mitmproxy is how you SEE the moment your agent decides to follow them. without this layer, you're trusting that your agent did the right thing, not verifying it
I shipped 3 agents before adding this. I have no honest idea what they were doing in production

5. EVALS: inspect-ai (the framework the labs actually use)
an eval framework is what tells you "this agent works" with numbers instead of vibes. inspect-ai was built by the UK AI Safety Institute (now the AI Security Institute), and it's the one Anthropic, DeepMind, and the institute itself use for the eval reports you read in their papers. open source, MIT licensed
what your homegrown version won't have:
- run the same task across 5 different models and compare scores side by side
- pre-built tests for risky agent behavior (lying, manipulating, misusing tools)
- proper structure for evaluating tool-using agents, not just chat
- repeatable scoring, so the same input always gets graded the same way
- reproducible eval seeds, so a flaky test is actually flaky and not just unlucky
I wrote my own eval harness 4 times across 4 projects. threw it out 4 times
if you ever want to say "my agent passes safety checks" out loud, the check has to come from a framework someone else can re-run. this is that framework

the move that ties this together: keep a /lessons.md in every repo. every weird agent behavior, every edge case, every config change you find at 2am: write it down
you will not remember it.
you'll come back in 3 weeks and the lessons file will be the only reason you still know what's going on

lock in these 5, keep the lessons file, and your next agentic system takes 2 days instead of 2 months

p.s. half of the "AI agent" content online comes from people who've never run mitmproxy on their own loop. they don't actually know what their agent is doing. they're shipping demo videos. don't be that guy
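the proxy features from step 2 (caching keyed by a prompt hash, fallback when a provider returns a 429, a budget cap checked BEFORE the call) can be illustrated with a toy, in-process Python sketch. this is not litellm's or portkey's API: `ToyProxy`, `RateLimited` and the two provider functions are hypothetical stand-ins, and a real proxy sits on the network in front of the providers' HTTP endpoints rather than wrapping Python functions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class RateLimited(Exception):
    """Hypothetical stand-in for a provider answering with HTTP 429."""


# (provider name, function that takes a prompt and returns text, cost per call in USD)
Provider = Tuple[str, Callable[[str], str], float]


@dataclass
class ToyProxy:
    providers: List[Provider]   # tried in order: primary first, then fallbacks
    budget_usd: float           # hard spend cap for this proxy instance
    spent_usd: float = 0.0
    cache: Dict[str, str] = field(default_factory=dict)

    def complete(self, prompt: str) -> str:
        # Caching: identical prompts hash to the same key and are served for free.
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key in self.cache:
            return self.cache[key]

        for name, call, cost in self.providers:
            # Budget cap: refuse the call *before* it would push spend over the cap.
            if self.spent_usd + cost > self.budget_usd:
                raise RuntimeError(f"budget cap reached, refusing call to {name}")
            try:
                result = call(prompt)
            except RateLimited:
                continue  # Fallback: a 429 moves on to the next provider, at no charge.
            self.spent_usd += cost
            self.cache[key] = result
            return result

        raise RuntimeError("every provider was rate-limited")


# Hypothetical providers for the demo.
def primary(prompt: str) -> str:
    raise RateLimited()


def backup(prompt: str) -> str:
    return f"backup says: {prompt}"


proxy = ToyProxy(providers=[("primary", primary, 0.02), ("backup", backup, 0.01)],
                 budget_usd=0.05)
print(proxy.complete("hello"))  # primary is rate-limited, backup answers
print(proxy.complete("hello"))  # cache hit, no new spend
print(proxy.spent_usd)          # 0.01
```

the design choice worth copying is the order of checks: cache first (free), budget second (before any money moves), provider call last.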

Porter Stansberry@porterstansb

2026.05.06 19:00

Here’s the #1 thing most people don’t know about Warren Buffett: There is nothing special about Buffett’s stock picking.

That doesn’t mean Buffett wasn’t a great investor. He was! Buffett was, by a huge margin, the greatest investor in history.

Over 486 months between October 1976 and March 2017 (41 years), Berkshire Hathaway’s Class A stock earned an average excess return of 18.6% per year above U.S. T-bills. Annualized volatility was 23.5%. Sharpe ratio: 0.79.

Berkshire’s Sharpe ratio of 0.79 is roughly 1.6 times the broad U.S. stock market’s Sharpe ratio of 0.49 over the same period. Among all large-cap U.S. stocks and mutual funds with 30-plus-year continuous track records, those numbers are unmatched.

A dollar invested in Berkshire on October 31, 1976, was worth more than $3,685 by March 31, 2017. A dollar invested in the S&P 500 with dividends reinvested over the same period was worth approximately $76. Buffett beat a passive index by a multiple of 48.

But he didn’t do it with stock picking!

Three researchers at AQR Capital Management (Andrea Frazzini, David Kabiller, and Lasse Heje Pedersen) dissected Berkshire’s 50 years of investments through 2013. They expanded and republished their findings in 2018 in the Financial Analysts Journal, the most highly respected industry financial journal. Their work won the Graham and Dodd Award for the best published paper of the year. The paper is called Buffett’s Alpha.

They found that after accounting for cheap leverage (from the insurance float) and exposure to a handful of publicly documented factor premiums, Buffett’s investment skill (the portion of his returns that cannot be explained by any mechanical strategy) is 0.3% per year. That’s statistically indistinguishable from zero.

In other words, the alpha that Berkshire enjoyed for 50 years (as it compounded capital at 24% a year!) wasn’t due to Buffett’s stock picking.

So, how did he do it?
He did it by gaining access to a huge amount of investment capital that he did not own, for free. Buffett’s track record was built on leverage. That’s a dirty word for most investors, but it’s the secret behind Berkshire.

The AQR researchers had access to something most Buffett commentators do not: 40 years of Berkshire’s audited financial statements and the full quarterly history of the public 13F stock portfolio. The researchers asked a specific question: If I take Berkshire’s monthly stock returns from October 1976 through March 2017 and run a linear regression against a set of well-documented risk factors (market beta, size, value, momentum, and two newer factors called Betting-Against-Beta and Quality-Minus-Junk, detailed below), how much of Buffett’s performance can the factors explain? And after the factors have been stripped out, how much excess return remains?

The data clearly show a few qualities that drove Berkshire’s results.

First, Buffett has always preferred large-cap stocks, contrary to the popular image of him as a small-cap value investor. He buys elephants.

Second, no surprise, Buffett buys cheap. Berkshire is almost six standard deviations away from neutral on the value axis.

So far the picture is ordinary. Every large-cap value manager in America tilts toward big companies and toward value. Buffett’s genius lies in the last two factors. These are a little complicated, but please stick with me.

There’s a newer factor that, like value and size, characterizes Buffett’s strategy. It’s called Betting-Against-Beta (“BAB”). It means intentionally owning low-beta stocks, the ones that swing less than the overall market. The BAB factor captures the excess return that accrues to investors who own low-beta stocks.

Low-beta stocks have historically earned higher risk-adjusted returns than high-beta stocks. Financial theory teaches that higher beta (higher risk) should mean higher return. But it doesn’t. The opposite occurs, in fact.
And Buffett was one of the very first people to figure this out.

Why does this factor persist? In an efficient market, once investors know about it, they should bid up the prices of low-beta stocks until the edge disappears. The explanation, according to AQR’s Frazzini and Pedersen, is that ordinary investors who want high returns but do not use leverage reach for more volatile stocks instead, creating persistent excess demand for them. (Having worked with retail investors for 30 years, I can assure you that is true.) But an investor with access to cheap leverage, Warren Buffett for instance, can exploit the mispricing by owning the low-beta names and levering them up to produce market-beating returns.

The last factor that matters to Buffett is quality. Buffett buys companies with high returns on invested capital. Quality-Minus-Junk (“QMJ”) is a factor described by Cliff Asness, also at AQR, with Frazzini and Pedersen, in a 2019 paper in the Review of Accounting Studies. The QMJ factor captures the return to owning stocks of high-quality companies (profitable, growing, safe, with high payout ratios) against stocks lacking those characteristics. QMJ has been positive and statistically significant in every major developed equity market for which it has been measured. Berkshire’s loading is 0.37, with a t-statistic of 4.6, meaning it is highly significant to Berkshire’s results.

In plain English: Buffett buys large, cheap, low-volatility stocks of the highest quality. But Berkshire’s results were not, in any way, unusual. Any investor buying these same kinds of stocks would have earned the same returns: about 16% a year over time.

So how did Berkshire compound at 23% a year? To figure that out, AQR’s researchers built a Berkshire replica.
They constructed a simple, rules-based, publicly investable portfolio that mechanically tilts toward large-cap, cheap, low-beta, high-quality stocks, and levers it 1.6-to-1 to match Berkshire’s insurance float leverage. The replica’s returns were very highly correlated with Berkshire’s.

The authors’ conclusion is unambiguous: “In summary, we find that Buffett has developed a unique access to leverage that he has invested in safe, high-quality, cheap stocks and that these key characteristics can largely explain his impressive performance.”

Berkshire’s cost of insurance float has averaged almost three percentage points below the Treasury bill rate across 50 years of data. In roughly two-thirds of all years, Berkshire has been paid to hold other people’s money. That is not an investment strategy. That is a financing miracle. It is also the living, breathing heart of Berkshire Hathaway. It’s what Buffett built, starting in 1967 when he paid $8.6 million for National Indemnity and its $19.4 million of float. And it is the factor every retail investor admiring Berkshire’s returns has never paid any attention to.

The 1.6-to-1 leverage that AQR measured over the full period, financed at this negative cost, explains the dollar magnitude of Berkshire’s returns. How do we know? An unleveraged version of the same stock portfolio (which you can approximate by looking at the 13F holdings alone) has earned an average excess return of 12% per year. It’s Berkshire’s leverage that magnifies this excess return to 18.6%.

How does this square with Berkshire’s reported gains? Berkshire’s 18.6% excess return, plus the T-bill rate that averaged roughly 4.7% over 1976–2017, gives a total nominal return of roughly 23% per year, which is the figure you usually see quoted for Berkshire’s historical performance.

The 23% tells you what Berkshire returned.
The 18.6% tells you how much of that return was compensation for taking investment risk, as opposed to the baseline yield every lender to the U.S. government was earning anyway.

Combine Berkshire’s two “edges” (systematic factor exposures to cheap, high-quality, low-volatility stocks, and roughly 1.6-to-1 leverage delivered through insurance float) and you get Berkshire Hathaway’s 23% annual gains over 60 years.

It’s the structure that’s genius, not the stock picking. And that’s very important, because it means the original Berkshire formula can work for any investor. I show you exactly how in my new book.
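The post leans on two kinds of calculation, and a short Python sketch (NumPy assumed) can make both concrete. The first half re-derives the headline figures from the numbers quoted above: the Sharpe ratio as excess return divided by volatility, Berkshire’s Sharpe ratio relative to the market’s 0.49, the $3,685 vs. $76 wealth multiple, and excess return plus the average T-bill rate. The second half is a minimal sketch of a factor regression, assuming plain ordinary least squares of monthly excess returns on six factor series. The factor data, loadings and alpha are randomly generated or made up, so it demonstrates the mechanics (the intercept is whatever return is left after the factors), not AQR’s data or their exact specification.

```python
import numpy as np

# Part 1: sanity-check the headline arithmetic quoted in the post.
excess_return = 0.186   # Berkshire's average annual excess return over T-bills
volatility = 0.235      # annualized volatility
tbill = 0.047           # average T-bill rate, 1976-2017, as quoted

sharpe = excess_return / volatility       # Sharpe ratio = excess return / volatility
sharpe_vs_market = sharpe / 0.49          # market Sharpe ratio quoted as 0.49
wealth_multiple = 3685 / 76               # $1 in Berkshire vs. $1 in the S&P 500
total_nominal = excess_return + tbill     # excess return + risk-free rate

print(f"Sharpe {sharpe:.2f}, {sharpe_vs_market:.1f}x market, "
      f"{wealth_multiple:.0f}x wealth, {total_nominal:.1%} nominal")

# Part 2: a factor regression on SYNTHETIC data, to show the mechanics only.
# Factor returns, loadings and alpha below are made up; they are not AQR's data.
rng = np.random.default_rng(0)
n_months = 486
factor_names = ["MKT", "SMB", "HML", "UMD", "BAB", "QMJ"]  # conventional shorthand
factors = rng.normal(0.0, 0.04, size=(n_months, len(factor_names)))
true_loadings = np.array([0.8, -0.2, 0.4, 0.0, 0.3, 0.4])  # hypothetical
true_alpha = 0.003 / 12   # hypothetical: 0.3% a year, expressed per month
noise = rng.normal(0.0, 0.02, size=n_months)
excess = true_alpha + factors @ true_loadings + noise

# OLS: regress excess returns on a constant (the alpha) plus the factors.
X = np.column_stack([np.ones(n_months), factors])
coef, *_ = np.linalg.lstsq(X, excess, rcond=None)
alpha_hat, loadings_hat = coef[0], coef[1:]

# t-statistics from the usual homoskedastic OLS standard errors.
residuals = excess - X @ coef
dof = n_months - X.shape[1]
sigma2 = residuals @ residuals / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err

print(f"alpha: {alpha_hat * 12:.2%} per year (t = {t_stats[0]:.1f})")
for name, b, t in zip(factor_names, loadings_hat, t_stats[1:]):
    print(f"{name}: loading {b:+.2f} (t = {t:.1f})")
```

With a true alpha that small relative to the noise, the estimated alpha will typically carry a t-statistic below 2, which is what “statistically indistinguishable from zero” means in practice, while the factor loadings come back close to the values that generated the data.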

SlowMist@SlowMist_Team

2026.05.16 10:42

🚨 Analysis of the Supply Chain Poisoning Attack on the Official Mistral AI SDK 🚨

SlowMist’s MistEye threat monitoring system has identified a malicious version of the official Mistral AI Python SDK: mistralai==2.4.6. Unlike typical typosquatting attacks, this was not a fake package. The malicious code was injected directly into the official SDK release pipeline.

🔍 Key Findings
• Malicious code hidden in the SDK import entry point
• Silent download of a remote payload disguised as transformers.pyz
• Theft of cloud credentials, SSH keys, CI/CD tokens, password manager data, Kubernetes Secrets, and more
• A 1/6 probability of triggering rm -rf /* on systems associated with Israel or Iran
• Strong attribution links to the previously disclosed Shai-Hulud supply chain attack framework through the same 4096-bit RSA public key

Our analysis reconstructs the full attack chain, persistence mechanisms, encrypted exfiltration workflow, and the correlation between the Python and TypeScript attack frameworks. Full article👇
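For a quick local check against the one version named above, a short Python sketch can read installed package metadata with the standard library’s `importlib.metadata`. It only sees what is installed in the current environment right now; it cannot tell you whether 2.4.6 was installed at some point and later upgraded away. The `KNOWN_BAD_VERSIONS` table is a hypothetical structure you would fill from published advisories, not a feed SlowMist provides.

```python
from importlib import metadata
from typing import Optional

# Package versions flagged as malicious in the advisory above.
# Hypothetical structure: extend it only from published advisories.
KNOWN_BAD_VERSIONS = {"mistralai": {"2.4.6"}}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it isn't installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_known_bad(package: str, version: Optional[str]) -> bool:
    """True if this exact package version appears in the flagged list."""
    return version is not None and version in KNOWN_BAD_VERSIONS.get(package, set())


if __name__ == "__main__":
    for package in KNOWN_BAD_VERSIONS:
        version = installed_version(package)
        if is_known_bad(package, version):
            print(f"{package}=={version} is a flagged malicious release")
        else:
            print(f"{package}: {version or 'not installed'} (not a flagged version)")
```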

Denver Nuggets@nuggets

2026.04.13 04:16

Squad steppin' 🆙
Ju: 25 PTS (4 3PM) / 6 REB / 3 AST / 1 STL
Jok: 23 PTS / 8 REB / 1 BLK
JV: 16 PTS / 11 REB / 4 AST / 1 BLK
David: 15 PTS / 13 REB / 1 STL
Bruce: 14 PTS / 4 REB / 5 AST / 1 STL
Curtis: 13 PTS (4 3PM) / 5 REB / 3 AST / 1 STL
Jalen: 11 PTS / 3 REB / 6 AST / 1 BLK

Brooklyn Nets@BrooklynNets

2026.03.01 20:01

Nolan Traore over his last 10 games:
▪️ 13.4 PPG
▪️ 6.1 APG
Led all rookies with 67 assists in February.

Bindu Reddy@bindureddy

2026.05.10 01:33

It’s no longer a given that the next-generation model will be better:
- Opus 4.7 is legit worse than 4.6
- Gemini 3.1 worse than 2.5
- Sonnet 4.6 buggier than 4.5
The SOTA models are beginning to run around in circles

Roberto Nickson@rpnickson

2026.04.08 16:13

Meta just announced Muse Spark, the first model release from Meta Superintelligence Labs. It isn't SOTA, but it's very competitive with the leading models across a lot of important benchmarks. For example, Muse Spark Contemplating mode scored 50.4% on HLE with tools, compared to 52.1% for Opus 4.6 and 52.1% for GPT-5.4. It's available to try today in the Meta AI app, which just got a facelift.

BytePlus@BytePlusGlobal

2026.04.23 08:46

Superior Value: GLM-5.1 vs Claude Opus 4.6 Coding.

Forbes@Forbes

2026.05.04 09:00

Forrester forecasts that AI and automation will eliminate 6.1% of U.S. jobs—roughly 10.4 million roles—by 2030. But the analyst firm argues that executive rhetoric is outpacing actual operational capability. That gap is beginning to create credibility problems with employees, investors and customers alike.

Arena.ai@arena

2026.05.26 15:36

Qwen3.7 Max (20250517) debuts at #4 in Code Arena: Frontend, making it the top-ranked Chinese lab on the board. It surpasses GLM-5.1 and is now on par with Claude Opus 4.6 on agentic web development tasks. Huge congrats to @Alibaba_Qwen on this achievement!
