write /goals like acceptance criteria.
/goal is now everywhere. Claude Code, Codex, Hermes, and other agents are adopting the same pattern: you set a completion condition, and the agent works autonomously until a fast evaluator model confirms the condition is met.
the feature is simple. writing good goals is not.
vague goals fail in two ways: the agent loops forever trying to satisfy an unclear condition, or the evaluator hallucinates success because there's nothing concrete to check against. both burn tokens for nothing.
here's what separates goals that work from goals that break:
good goals describe an observable end state.
"all tests in test/auth pass and lint is clean" works because the agent can run the tests, print the output, and the evaluator can confirm it from the transcript.
"every call site of the old API migrated and build succeeds" works because there's a verifiable artifact: the build output.
"CHANGELOG.md has an entry for each PR merged this week" works because it points to a concrete file with concrete content.
bad goals have no finish line.
"make the codebase better" fails because better by what metric? "refactor everything" fails because there's no exit condition. "fix the bugs" fails because which bugs, verified how?
the mental model that helps: if a human couldn't tell when the ticket is done, neither can the evaluator.
treat every /goal like a ticket you're assigning to a very literal junior developer who never gets tired. write the exact acceptance criteria you'd put in that ticket.
one more thing: complex multi-step objectives overwhelm a single /goal. "redesign auth, add OAuth, write tests, update docs" is four goals pretending to be one. break them into sequential /goal calls where each has a single verifiable finish line.
i wrote a detailed breakdown of /goal (article below) covering the full mechanics.
the three-tier memory of Hermes agent.
AI agents forget everything when your session ends. Hermes doesn't.
it has three memory layers, each at a different speed.
tier 1: two tiny markdown files
MEMORY.md (2,200 chars) and USER.md (1,375 chars). injected into the system prompt at session start as a frozen snapshot.
MEMORY.md holds project conventions, tool quirks, lessons learned. USER.md holds your profile: name, communication style, skill level.
these files are tiny on purpose. when MEMORY.md hits ~80% capacity, the agent consolidates: merges related entries, drops redundancy, keeps only the densest facts.
natural selection pressure applied to memory. the files stay small, but what's inside gets sharper over time.
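the trigger logic is simple enough to sketch. assuming the ~2,500-char budget and 80% threshold from the post (the real consolidation merge is done by the model, not by code like this):

```python
CAPACITY = 2_500   # hypothetical char budget for MEMORY.md
THRESHOLD = 0.8    # consolidate at ~80% capacity, per the post

def needs_consolidation(memory_text: str) -> bool:
    return len(memory_text) >= CAPACITY * THRESHOLD

def consolidate(entries: list[str]) -> list[str]:
    # toy consolidation: drop exact duplicates, keep first occurrence;
    # merging related entries is left to the model in the real agent
    seen, kept = set(), []
    for entry in entries:
        if entry not in seen:
            seen.add(entry)
            kept.append(entry)
    return kept
```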
tier 2: full-text session search (SQLite + FTS5)
every conversation gets stored in SQLite with FTS5 indexing. the agent can search weeks of past sessions on demand.
when the agent calls session_search: FTS5 ranks matches in ~10ms over 10,000+ docs, an LLM summarizes the top hits, and a concise result returns to context.
tier 1 is always present but tiny. tier 2 has unlimited capacity but requires an active search. critical facts live in memory, everything else is searchable.
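hermes' actual schema isn't public, but the tier-2 idea is reproducible with Python's built-in sqlite3 and an FTS5 virtual table (assuming your Python build ships FTS5, which most do). the table and column names here are invented:

```python
import sqlite3

# in-memory sketch of tier 2: store transcripts, search with FTS5 ranking
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE sessions USING fts5(date, transcript)")
db.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [
        ("2025-11-01", "debugged the auth token refresh loop"),
        ("2025-11-03", "set up the postgres migration scripts"),
    ],
)

def session_search(query: str, k: int = 5) -> list[tuple[str, str]]:
    # bm25() is FTS5's built-in relevance score; lower means more relevant
    return db.execute(
        "SELECT date, transcript FROM sessions "
        "WHERE sessions MATCH ? ORDER BY bm25(sessions) LIMIT ?",
        (query, k),
    ).fetchall()
```

in the real system an LLM would summarize the top hits before they return to context; here you get the raw rows.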
tier 3: external memory providers
8 pluggable providers that run alongside tiers 1 and 2, never replacing them. three worth knowing: Honcho (dialectic user modeling, 12 identity layers), Holographic (local-first, HRR vectors, no external calls), and Supermemory (context fencing that prevents the same fact from being re-stored infinitely).
when active, hermes auto-syncs every turn: prefetch before, sync after, extract at session end.
how they compose in a single turn
this is the part most people miss. the tiers compose on every turn through a five-step cycle:
1. turn opens. tier 1 is already in prompt, tier 3 prefetches and prepends.
2. agent responds using all three tiers as context.
3. periodic nudge fires (~every 300s). the agent reflects: "has anything worth persisting happened?" if yes, it writes. if no, it returns silently.
4. memory written to MEMORY.md on disk. invisible this session because the prefix cache stays warm.
5. session closes. tier 2 logs the transcript, tier 3 extracts semantics. next session opens with the new state.
agent memory today is either always-on but shallow (stuff everything in the prompt) or deep but passive (vector store that never fires at the right time).
hermes composes across both: tiny always-present files for critical facts, full-text search for deep recall, external providers for semantic modeling, all orchestrated by a nudge that decides autonomously what's worth saving.
the agent doesn't just store memories. it curates them under pressure.
i wrote a full deep dive (article below) covering hermes agent's memory system, self-evolving skills, GEPA optimization, and how to set up multiple specialized agents on your machine.
What actually is GBrain?
(Y Combinator CEO's personal agent brain)
Every agent memory tool you've seen solves a simple problem: store facts, retrieve facts.
GBrain solves a different one. It gives your agent a knowledge system that wires itself, enriches itself, and compounds while you're not even using it.
Here's what makes it fundamentally different from Mem0, Zep, LangMem, or a CLAUDE.md file.
The standard approach to agent memory is vector-based. Your agent stores memories as embeddings, retrieves them by semantic similarity, and that's the loop. Some tools add a knowledge graph on top.
GBrain flips the model entirely. The source of truth is a folder of markdown files. One page per person, one page per company, one page per concept. Every page follows the same two-part structure:
Compiled truth on top: your current best understanding, rewritten as new evidence arrives
Timeline on the bottom: an append-only evidence trail that never gets edited
This is not a vector store with a markdown export. The markdown IS the system of record. You can open it in VS Code, edit it by hand, and gbrain sync picks up the changes.
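The two-part page structure is easy to sketch. This assumes a "## Timeline" heading as the section divider, which is my invention, not GBrain's documented format:

```python
from datetime import date

def update_page(page: str, new_truth: str, evidence: str) -> str:
    """Rewrite the compiled-truth section; append to the timeline.
    The timeline is never edited, only extended."""
    _, _, timeline = page.partition("## Timeline")
    timeline = timeline.rstrip() + f"\n- {date.today().isoformat()}: {evidence}\n"
    return f"{new_truth}\n\n## Timeline{timeline}"
```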
Now the part that makes this compound.
Every time a page is written, GBrain extracts entity references and creates typed relationship links: works_at, invested_in, founded, attended, advises. All deterministic, all regex-based, zero LLM calls.
The knowledge graph wires itself on every single write, without spending tokens.
So when you ask "who works at Acme AI?" or "what has Bob invested in this quarter?", the agent walks the graph instead of relying on vector similarity (which struggles with relational queries like these).
Search layers ~20 deterministic techniques in concert: intent classification, multi-query expansion, vector search, keyword search, reciprocal rank fusion, cosine re-scoring, compiled-truth boosting, and backlink ranking. Each catches what the others miss.
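one of those techniques, reciprocal rank fusion, is worth seeing concretely: it merges rankings from different retrievers (vector, keyword, etc.) using only ranks, so incompatible score scales don't matter. k=60 is the constant from the original RRF paper; GBrain's value is unknown:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists: score(doc) = sum of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```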
But the real unlock is the compounding loop.
GBrain has a signal detector that fires on every message and captures entities in the background. Person mentioned once? They get a stub page. Three mentions across different sources? Web enrichment kicks in. After a meeting? Full pipeline.
The agent runs a dream cycle overnight: scans conversations, enriches missing entities, fixes broken citations, consolidates memory. You wake up and the brain is smarter than when you went to bed.
This is fundamentally different from memory systems that only store what you explicitly tell them to store.
Garry Tan (President and CEO of Y Combinator) built this to run his actual AI agents. It ships with 34 skills, runs on embedded PGLite (no server, ready in 2 seconds), and works as an MCP server for Claude Code, Cursor, and Windsurf.
GBrain:
As an AI Engineer, please learn:
- Harness engineering, not just prompt engineering
- Prompt caching vs. semantic caching tradeoffs
- KV cache management at scale
- Speculative decoding vs quantization
- Structured output failures & fallback chains
- Evals (LLM-as-judge + human evals)
- Cost attribution per feature, not just per model
- Agent guardrails & loop budgets
- LLM observability as a first-class discipline
- Model routing & graceful fallback logic
- Knowing when to fine-tune vs. in-context learning
Claude Code's architecture, mapped.
Claude Code is one of the most powerful agent harnesses out there. it's a lot more than "a CLI that calls claude." the actual system has six layers, and the model is just one node inside the loop.
the diagram breaks down every component:
Input Layer handles session management, permission gating, and YAML-based trust tiers before anything reaches the model.
Knowledge Layer holds the skill registry, context compressor (3-layer, 92% threshold), task graph, and cross-session memory store. this is where harness intelligence lives outside the weights.
Execution Layer runs tool dispatch through a typed registry with one handler per tool. bash, read, write, grep, glob, revert. streaming runtime handles parallel execution. prompt cache reuses stable prefixes at 10% cost.
Integration Layer connects the MCP runtime to external servers. filesystem, git, custom. tools register inward, memory writes outward to agent_memory.md.
Multi-Agent Layer is the most underappreciated piece. subagent spawner, teammate mailboxes over redis pub/sub, FSM protocol (IDLE→REQUEST→WAIT→RESPOND), autonomous board with atomic locks, and worktree isolation with per-task branches and conflict detection on merge.
Observability Layer wraps everything. event bus with lifecycle hooks, background executor running daemon threads non-blocking.
the master agent loop sits at the center. perception → action → observation. it's deliberately simple. a "dumb loop" where the model reasons and the harness mediates.
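that "dumb loop" fits in a few lines. this is a generic sketch of the pattern, not Claude Code's actual implementation — `model` and the tool handlers are stand-ins:

```python
def agent_loop(model, tools, max_steps=10):
    """The model decides; the harness dispatches the tool and feeds the
    observation back. A loop budget keeps it from running forever."""
    observation = None
    for _ in range(max_steps):
        action = model(observation)            # perception -> action
        if action["type"] == "finish":
            return action["result"]
        handler = tools[action["tool"]]        # typed dispatch, one handler per tool
        observation = handler(action["args"])  # action -> observation
    raise RuntimeError("loop budget exhausted")
```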
this is the architecture behind what feels like magic when you use claude code. it's not magic. it's harness engineering.
the article below is a deep-dive covering how Anthropic, OpenAI, LangChain, and others build this pattern from the ground up.
The MCP vs CLI debate.
For most of 2025, AI Engineers argued about it.
The skeptics had real numbers:
- Playwright MCP eats 13.7K tokens
- Chrome DevTools MCP eats 18K
- A 5-server setup burns 55K tokens before any work
The defenders pushed back:
- CLIs break on multi-tenant apps
- No typed contracts, so the agent guesses at outputs
- On unfamiliar APIs, agents waste turns parsing text
Both sides were arguing about the wrong thing.
In November 2025, Anthropic published "Code execution with MCP" and reframed it from first principles.
The problem was never the protocol. It was the habit of dumping every tool's full description into the model's context the moment a session starts. Add the data those tools return, passed through the model on every step, and a single workflow can balloon to 150K tokens. Most of which the model never needed.
The fix is to flip the model's job. Instead of the model calling tools through its context, the model writes code that calls tools through a runtime. The runtime is where tools live. The model only sees what it imports.
In Anthropic's example, a Google Drive transcript flows into a Salesforce CRM update. The old way loaded both tool schemas and piped the entire transcript through the model twice. The new way is ten lines of TypeScript that import what they need. Same task, 2K tokens. A 98.7% drop.
Cloudflare pushed the idea to its limit. They collapsed their entire 2,500-endpoint API from 1.17M tokens of schemas down to 1K tokens, by exposing just two functions: search and execute. The agent writes code that searches the catalog, then executes only what matches.
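the two-function pattern is simple enough to sketch. the catalog entries here are invented, but the shape is the point: the agent never sees the full schema set — it searches for what it needs and executes only that:

```python
# hypothetical tool catalog; in Cloudflare's case this indexes ~2,500 endpoints
CATALOG = {
    "dns.record.create": lambda **kw: {"created": kw},
    "dns.record.delete": lambda **kw: {"deleted": kw},
    "kv.namespace.list": lambda **kw: {"namespaces": []},
}

def search(query: str) -> list[str]:
    """The first of the two exposed functions: find matching tool names."""
    return [name for name in CATALOG if query in name]

def execute(name: str, **kwargs):
    """The second: run exactly one tool, loading nothing else into context."""
    return CATALOG[name](**kwargs)
```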
The new pattern has a name: Code Mode.
It is a runtime where the agent writes code that mixes two primitives. Bash, for anything with a binary already installed like git or curl. Typed module imports, for proprietary APIs where the type signatures load only when the agent actually imports the tool.
That second part is the unlock. Types travel with imports, so the agent gets a strict contract for the tools it picks, and pays nothing for the ones it skips.
MCP's typed contracts plus CLI's lazy loading, in one runtime. The agent picks per task.
"MCP is dead" was the wrong takeaway.
Anthropic just reported 300M MCP SDK downloads, up from 100M at the start of the year. The protocol is not dying. It is the fastest growing piece of agent infrastructure right now.
What died was loading every tool upfront. That was always a bad idea.
If you are building agents in 2026, the rule is simple. Tool definitions belong in code, not in context. The model writes a few lines that call them. The runtime does the rest.
That is what the debate was actually about.
Naive RAG vs. Blockify!
There's a new RAG approach that:
- cuts corpus size by 40x.
- reduces tokens per query by 3x.
- improves vector search relevance by 2.3x.
Blockify GitHub:
this is the most underrated update in the agent space right now.
your AI workflow runs for 47 minutes, burns 312 LLM calls, then crashes at step 8.
most frameworks make you restart from zero.
@crewAIInc just shipped checkpointing. think google docs autosave, but for your agent's work-in-progress. every flow method becomes a recovery point.
resume in one line. fork from any saved state into a new branch. edit past outputs and watch changes ripple downstream. visual TUI to inspect everything.
your pipelines stop being fragile one-shot jobs. they become resumable, inspectable, branchable processes.
zero extra infra. 100% open-source.
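the core idea, stripped of the framework, fits in one function. this is a generic checkpointing sketch, not CrewAI's actual API — persist state after every step so a crash resumes from the last completed one:

```python
import json
import pathlib

def run_flow(steps, state_file="flow_state.json"):
    """Run a list of step functions, writing a recovery point after each.
    Re-running after a crash skips steps that already completed."""
    path = pathlib.Path(state_file)
    state = json.loads(path.read_text()) if path.exists() else {"done": 0, "outputs": []}
    for i, step in enumerate(steps):
        if i < state["done"]:
            continue                            # completed before the crash
        state["outputs"].append(step(state["outputs"]))
        state["done"] = i + 1
        path.write_text(json.dumps(state))      # the "autosave"
    return state["outputs"]
```

forking from a saved state is then just copying the JSON file and continuing with a different step list.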
get started with CrewAI here:
(don't forget to star it ⭐️)