
Akshay 🚀
@akshay_pachaar
Simplifying LLMs, AI Agents, RAG, and Machine Learning for you! • Co-founder @dailydoseofds_ • BITS Pilani • 3 Patents • ex-AI Engineer @ LightningAI
485 Following    271.6K Followers
write your /goals like acceptance criteria.

/goal is now everywhere. Claude Code, Codex, Hermes, and more agents are adopting the same pattern: you set a completion condition, and the agent works autonomously until a fast evaluator model confirms the condition is met.

the feature is simple. writing good goals is not. vague goals fail in two ways: the agent loops forever trying to satisfy an unclear condition, or the evaluator hallucinates success because there's nothing concrete to check against. both burn tokens for nothing.

here's what separates goals that work from goals that break:

good goals describe an observable end state. "all tests in test/auth pass and lint is clean" works because the agent can run the tests, print the output, and the evaluator can confirm it from the transcript. "every call site of the old API migrated and build succeeds" works because there's a verifiable artifact: the build output. "CHANGELOG.md has an entry for each PR merged this week" works because it points to a concrete file with concrete content.

bad goals have no finish line. "make the codebase better" fails because better by what metric? "refactor everything" fails because there's no exit condition. "fix the bugs" fails because which bugs, verified how?

the mental model that helps: if a human couldn't tell when the ticket is done, neither can the evaluator. treat every /goal like a ticket you're assigning to a very literal junior developer who never gets tired. write the exact acceptance criteria you'd put in that ticket.

one more thing: complex multi-step objectives overwhelm the loop. "redesign auth, add OAuth, write tests, update docs" is four goals pretending to be one. break them into sequential /goal calls where each has a single verifiable finish line.

i wrote a detailed breakdown of /goal (article below) covering the full mechanics.
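to make the contrast concrete, here are both styles as hypothetical /goal invocations (exact syntax varies by agent; the test and lint commands are illustrative, not any specific project's setup):

```
# vague: no observable condition, nothing for the evaluator to check
/goal "clean up the auth module"

# acceptance criteria: the agent can run these commands, print the
# output, and the evaluator can verify the result from the transcript
/goal "pytest test/auth exits 0 and the linter reports zero errors on src/auth"
```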
RT @twid: Best setup guide for Hermes Agent I've seen. Gets you fully configured without getting too far into the weeds.
the three-tier memory of the Hermes agent.

most AI agents forget everything when your session ends. Hermes doesn't. it has three memory layers, each operating at a different speed.

tier 1: two tiny markdown files

MEMORY.md (2,200 chars) and USER.md (1,375 chars), injected into the system prompt at session start as a frozen snapshot. MEMORY.md holds project conventions, tool quirks, and lessons learned. USER.md holds your profile: name, communication style, skill level.

these files are tiny on purpose. when MEMORY.md hits ~80% capacity, the agent consolidates: merges related entries, drops redundancy, keeps only the densest facts. natural selection pressure applied to memory. the files stay small, but what's inside gets sharper over time.

tier 2: full-text session search (sqlite + fts5)

every conversation gets stored in SQLite with FTS5 indexing. the agent can search weeks of past sessions on demand. when the agent calls session_search, FTS5 ranks matches in ~10ms over 10,000+ docs, an LLM summarizes the top hits, and a concise result returns to context.

tier 1 is always present but tiny. tier 2 has unlimited capacity but requires an active search. critical facts live in memory, everything else is searchable.

tier 3: external memory providers

8 pluggable providers that run alongside tiers 1 and 2, never replacing them. three worth knowing: Honcho (dialectic user modeling, 12 identity layers), Holographic (local-first, HRR vectors, no external calls), and Supermemory (context fencing that prevents the same fact from being re-stored indefinitely). when active, Hermes auto-syncs every turn: prefetch before, sync after, extract at session end.

how the tiers compose in a single turn

this is the part most people miss. the tiers compose on every turn through a five-step cycle:

1. turn opens. tier 1 is already in the prompt; tier 3 prefetches and prepends.
2. the agent responds using all three tiers as context.
3. a periodic nudge fires (~every 300s). the agent reflects: "has anything worth persisting happened?" if yes, it writes. if no, it returns silently.
4. memory is written to MEMORY.md on disk. it stays invisible this session because the prefix cache stays warm.
5. session closes. tier 2 logs the transcript, tier 3 extracts semantics. the next session opens with the new state.

agent memory today is either always-on but shallow (stuff everything in the prompt) or deep but passive (a vector store that never fires at the right time). Hermes composes across both: tiny always-present files for critical facts, full-text search for deep recall, external providers for semantic modeling, all orchestrated by a nudge that decides autonomously what's worth saving.

the agent doesn't just store memories. it curates them under pressure.

i wrote a full deep dive (article below) covering the Hermes agent's memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents on your machine.
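tier 2 is the easiest to picture in code. a minimal sketch of an FTS5-backed session_search using python's stdlib sqlite3 (the schema and column names are illustrative, not Hermes's actual implementation; assumes your SQLite build includes FTS5, which most do):

```python
import sqlite3

db = sqlite3.connect("sessions.db")

# toy schema: one row per past session transcript
db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS sessions USING fts5(started_at, transcript)")
db.execute("INSERT INTO sessions VALUES (?, ?)",
           ("2026-01-12", "debugged the flaky OAuth refresh; root cause was clock skew"))

def session_search(query: str, k: int = 5):
    # bm25() is FTS5's built-in relevance score (lower = better match).
    # this is the fast ranking step; an LLM would summarize the hits after.
    return db.execute(
        "SELECT started_at, snippet(sessions, 1, '[', ']', '...', 12) "
        "FROM sessions WHERE sessions MATCH ? ORDER BY bm25(sessions) LIMIT ?",
        (query, k),
    ).fetchall()

print(session_search("oauth refresh"))  # ranked matches with highlighted snippets
```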
What actually is GBrain? (Y Combinator CEO's personal agent brain)

Every agent memory tool you've seen solves a simple problem: store facts, retrieve facts. GBrain solves a different one. It gives your agent a knowledge system that wires itself, enriches itself, and compounds while you're not even using it. Here's what makes it fundamentally different from Mem0, Zep, LangMem, or a CLAUDE.md file.

The standard approach to agent memory is vector-based. Your agent stores memories as embeddings, retrieves them by semantic similarity, and that's the loop. Some tools add a knowledge graph on top. GBrain flips the model entirely. The source of truth is a folder of markdown files. One page per person, one page per company, one page per concept. Every page follows the same two-part structure:

- Compiled truth on top: your current best understanding, rewritten as new evidence arrives
- Timeline on the bottom: an append-only evidence trail that never gets edited

This is not a vector store with a markdown export. The markdown IS the system of record. You can open it in VS Code, edit it by hand, and gbrain sync picks up the changes.

Now the part that makes this compound. Every time a page is written, GBrain extracts entity references and creates typed relationship links: works_at, invested_in, founded, attended, advises. All deterministic, all regex-based, zero LLM calls. The knowledge graph wires itself on every single write, without spending tokens. So when you ask "who works at Acme AI?" or "what has Bob invested in this quarter?", the agent walks the graph instead of relying on vector similarity (which struggles with relational queries like these).

Search layers ~20 deterministic techniques in concert: intent classification, multi-query expansion, vector search, keyword search, reciprocal rank fusion, cosine re-scoring, compiled-truth boosting, and backlink ranking. Each catches what the others miss.

But the real unlock is the compounding loop. GBrain has a signal detector that fires on every message and captures entities in the background. Person mentioned once? They get a stub page. Three mentions across different sources? Web enrichment kicks in. After a meeting? Full pipeline. The agent runs a dream cycle overnight: scans conversations, enriches missing entities, fixes broken citations, consolidates memory. You wake up and the brain is smarter than when you went to bed. This is fundamentally different from memory systems that only store what you explicitly tell them to store.

Garry Tan (President and CEO of Y Combinator) built this to run his actual AI agents. It ships with 34 skills, runs on embedded PGLite (no server, ready in 2 seconds), and works as an MCP server for Claude Code, Cursor, and Windsurf.

GBrain:
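The wiring step is simple enough to sketch. A toy version of deterministic, regex-based relation extraction (the [[wikilink]] convention and patterns here are illustrative, not GBrain's actual page format):

```python
import re

RELATIONS = ("works_at", "invested_in", "founded", "attended", "advises")

# matches an optional typed relation followed by a [[Page Name]] reference
LINK = re.compile(r"(?:(" + "|".join(RELATIONS) + r")\s+)?\[\[([^\]]+)\]\]")

def extract_edges(page_name: str, markdown: str):
    # returns (source, relation, target) triples -- zero LLM calls involved
    return [(page_name, rel or "mentions", target)
            for rel, target in LINK.findall(markdown)]

page = "Bob works_at [[Acme AI]] and this quarter invested_in [[VectorCo]]."
print(extract_edges("Bob Smith", page))
# [('Bob Smith', 'works_at', 'Acme AI'), ('Bob Smith', 'invested_in', 'VectorCo')]
```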
As an AI Engineer, please learn:

- Harness engineering, not just prompt engineering
- Prompt caching vs. semantic caching tradeoffs
- KV cache management at scale
- Speculative decoding vs. quantization
- Structured output failures & fallback chains
- Evals (LLM-as-judge + human evals)
- Cost attribution per feature, not just per model
- Agent guardrails & loop budgets
- LLM observability as a first-class discipline
- Model routing & graceful fallback logic
- Knowing when to fine-tune vs. use in-context learning
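Two of these compose into one pattern. A minimal sketch of a structured-output fallback chain with graceful model routing (call_model and the model names are placeholders, not a real SDK):

```python
import json

def call_model(model: str, prompt: str) -> str:
    """Placeholder for your provider's completion call."""
    raise NotImplementedError

def get_structured(prompt: str) -> dict:
    # fallback chain: cheap model first, stricter retry, then escalate.
    # each rung either returns valid JSON or hands off to the next.
    chain = [
        ("small-model", ""),
        ("small-model", "\nReturn ONLY valid JSON, no prose."),
        ("large-model", "\nReturn ONLY valid JSON, no prose."),
    ]
    for model, suffix in chain:
        try:
            return json.loads(call_model(model, prompt + suffix))
        except json.JSONDecodeError:
            continue  # structured output failed; route to the next rung
    raise ValueError("all fallbacks exhausted")
```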
Claude Code's architecture, mapped.

Claude Code is one of the most powerful agent harnesses out there, and it's a lot more than "a CLI that calls claude." the actual system has six layers, and the model is just one node inside the loop. the diagram breaks down every component:

Input Layer handles session management, permission gating, and YAML-based trust tiers before anything reaches the model.

Knowledge Layer holds the skill registry, context compressor (3-layer, 92% threshold), task graph, and cross-session memory store. this is where harness intelligence lives outside the weights.

Execution Layer runs tool dispatch through a typed registry with one handler per tool: bash, read, write, grep, glob, revert. a streaming runtime handles parallel execution. the prompt cache reuses stable prefixes at 10% of the cost.

Integration Layer connects the MCP runtime to external servers: filesystem, git, custom. tools register inward, memory writes outward to agent_memory.md.

Multi-Agent Layer is the most underappreciated piece. subagent spawner, teammate mailboxes over redis pub/sub, an FSM protocol (IDLE→REQUEST→WAIT→RESPOND), an autonomous board with atomic locks, and worktree isolation with per-task branches and conflict detection on merge.

Observability Layer wraps everything: an event bus with lifecycle hooks and a background executor running non-blocking daemon threads.

the master agent loop sits at the center. perception → action → observation. it's deliberately simple. a "dumb loop" where the model reasons and the harness mediates.

this is the architecture behind what feels like magic when you use claude code. it's not magic. it's harness engineering.

the article below is a deep dive covering how Anthropic, OpenAI, LangChain, and others build this pattern from the ground up.
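the "dumb loop" is worth seeing as code. a stripped-down sketch of the perception → action → observation cycle (names are illustrative, not Anthropic's source):

```python
def agent_loop(task: str, llm, tools: dict, max_turns: int = 50):
    # the model reasons and picks actions; the harness mediates everything else
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):                    # a loop budget, never while True
        action = llm(history)                     # perception -> action
        if action["type"] == "final_answer":
            return action["content"]
        handler = tools[action["tool"]]           # typed registry: one handler per tool
        observation = handler(**action["args"])   # action -> observation
        history.append({"role": "tool", "content": observation})
    raise RuntimeError("loop budget exhausted")
```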
The MCP vs CLI debate. For most of 2025, AI Engineers argued about it.

The skeptics had real numbers:
- Playwright MCP eats 13.7K tokens
- Chrome DevTools MCP eats 18K
- A 5-server setup burns 55K tokens before any work

The defenders pushed back:
- CLIs break on multi-tenant apps
- No typed contracts, so the agent guesses at outputs
- On unfamiliar APIs, agents waste turns parsing text

Both sides were arguing about the wrong thing. In November 2025, Anthropic published "Code execution with MCP" and reframed it from first principles. The problem was never the protocol. It was the habit of dumping every tool's full description into the model's context the moment a session starts. Add the data those tools return, passed through the model on every step, and a single workflow can balloon to 150K tokens, most of which the model never needed.

The fix is to flip the model's job. Instead of the model calling tools through its context, the model writes code that calls tools through a runtime. The runtime is where tools live. The model only sees what it imports.

In Anthropic's example, a Google Drive transcript flows into a Salesforce CRM update. The old way loaded both tool schemas and piped the entire transcript through the model twice. The new way is ten lines of TypeScript that import what they need. Same task, 2K tokens. A 98.7% drop.

Cloudflare pushed the idea to its limit. They collapsed their entire 2,500-endpoint API from 1.17M tokens of schemas down to 1K tokens, by exposing just two functions: search and execute. The agent writes code that searches the catalog, then executes only what matches.

The new pattern has a name: Code Mode. It is a runtime where the agent writes code that mixes two primitives. Bash, for anything with a binary already installed, like git or curl. Typed module imports, for proprietary APIs, where the type signatures load only when the agent actually imports the tool.

That second part is the unlock. Types travel with imports, so the agent gets a strict contract for the tools it picks, and pays nothing for the ones it skips. MCP's typed contracts plus CLI's lazy loading, in one runtime. The agent picks per task.

"MCP is dead" was the wrong takeaway. Anthropic just reported 300M MCP SDK downloads, up from 100M at the start of the year. The protocol is not dying. It is the fastest-growing piece of agent infrastructure right now. What died was loading every tool upfront. That was always a bad idea.

If you are building agents in 2026, the rule is simple. Tool definitions belong in code, not in context. The model writes a few lines that call them. The runtime does the rest. That is what the debate was actually about.
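Anthropic's published example is TypeScript; the same shape as a hedged Python sketch (the tools.gdrive and tools.salesforce modules are illustrative stand-ins, not a real SDK):

```python
# code the agent writes and the runtime executes. importing a tool module
# is what loads its typed contract; tools you never import cost zero tokens.
from tools.gdrive import get_document          # hypothetical typed wrapper
from tools.salesforce import update_record     # hypothetical typed wrapper

transcript = get_document(document_id="...")   # data stays inside the runtime,
                                               # never flowing through the model
update_record(
    object_type="SalesMeeting",
    record_id="...",
    data={"notes": transcript.body},
)
print("CRM updated")  # only this short status returns to the model's context
```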
Naive RAG vs. Blockify!

There's a new RAG approach that:
- cuts corpus size by 40x
- reduces tokens per query by 3x
- improves vector search relevance by 2.3x

Blockify GitHub:
this is the most underrated update in the agent space right now.

your AI workflow runs for 47 minutes, burns 312 LLM calls, then crashes at step 8. most frameworks make you restart from zero.

@crewAIInc just shipped checkpointing. think google docs autosave, but for your agent's work-in-progress.

every flow method becomes a recovery point. resume in one line. fork from any saved state into a new branch. edit past outputs and watch changes ripple downstream. visual TUI to inspect everything.

your pipelines stop being fragile one-shot jobs. they become resumable, inspectable, branchable processes. zero extra infra. 100% open-source.

get started with CrewAI here: (don't forget to star it ⭐️)
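crewai's real API lives in their docs; the underlying idea is simple enough to sketch generically (this is the concept, not CrewAI's interface):

```python
import json, pathlib

CHECKPOINT = pathlib.Path("flow_state.json")

def run_flow(steps):
    # resume: reload saved state and skip every step already completed
    state = json.loads(CHECKPOINT.read_text()) if CHECKPOINT.exists() else {}
    for step in steps:
        if step.__name__ in state:
            continue                              # recovered from a past run
        state[step.__name__] = step(state)        # run the step
        CHECKPOINT.write_text(json.dumps(state))  # autosave after each step
    return state
```

forking is the same trick: copy the checkpoint file and resume from the copy, and the new branch diverges from that saved state.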