AI News | Field Notes by Michael Nemtsev

AI Agents

AI agents news and analysis: autonomous systems, tool use, retrieval, context engineering, and what holds up in production.

AI Agents · 3 Jun 2026 ·github.blog

GitHub Copilot App: agent-native desktop with autonomous PR merge launches in preview

A backend developer running parallel feature branches can now spin up separate Copilot sessions for each without worrying about file collisions. Agent Merge handles the tedious part: watching CI, chasing reviewers, and hitting merge when conditions clear. The 'technical preview' label means you should expect rough edges.

AI Agents · 3 Jun 2026 ·github.blog

GitHub Copilot SDK goes GA: embed Copilot's agentic runtime in any app, six languages

A platform engineer can now embed Copilot's agentic engine into an internal developer portal without building a separate LLM orchestration layer. The BYOK option matters for teams on Anthropic or OpenAI subscriptions: route through your existing keys and skip GitHub's credit rates entirely.

AI Agents · 2 Jun 2026 ·chatforest.com

Windows Agent Framework v1.0: MIT-licensed agent runtime open-sourced at Build 2026

The MIT license removes the negotiation and usage-cap problems that dog proprietary agent runtimes. If your team builds internal automation on Windows infrastructure, you can now run a YAML-defined agent across laptops, cloud desktops, and edge nodes without rewriting the abstraction or signing a vendor contract.

AI Agents · 2 Jun 2026 ·radicaldatascience.wordpress.com

AI2 MolmoAct 2: open-source robotics model runs 37x faster, tops 7 of 8 benchmarks

For a robotics team using open-source models for manipulation planning, a 37x speed gain is the operational unlock that determines whether a model can control a robot arm in real time or only in batch post-processing. The open weights also make it integrable into custom hardware pipelines without API dependency or per-inference billing.

AI Agents · 31 May 2026 ·blog.jetbrains.com

TeamCity 2026.1: CI/CD now speaks MCP so AI agents can query build failures directly

Engineers using TeamCity On-Premises: update immediately for the CVE patch. After that, explore the MCP integration for agent-based build diagnostics. For teams already running AI coding agents, this gives them a structured path into your CI system without building a custom adapter.

AI Agents · 30 May 2026 ·techcrunch.com

Anthropic's Dynamic Workflows lets Claude write a script to run 1,000 subagents

A staff engineer who used to budget a week to untangle a legacy migration is the target user here. The model writes and runs the orchestration itself, so the bottleneck moves from your hours to your token bill and your test suite's honesty. Thin tests mean the swarm merges confident nonsense.

AI Agents · 30 May 2026 ·cursor.com

Cursor shifts Bugbot to usage-based billing and adds configurable PR review depth

Usage billing lowers the barrier for teams that skipped Bugbot on cost grounds. A team lead can now trial it on a few PRs without committing to a flat per-seat fee. The configurable depth also matters: critical security reviews get deep analysis, routine formatting changes get the fast pass.

AI Agents LLM Evals · 27 May 2026 ·cycode.com

GitHub Copilot CVE-2025-53773: hidden prompt injection in PR descriptions enables RCE

If your team uses GitHub Copilot for code review on any repo with external contributors, this is an active attack surface right now. Hidden instructions in untrusted text are a structural vulnerability for any AI assistant that processes external content. Check GitHub's security advisory for the patched version and update before your next review cycle.

AI Agents LLM Evals · 26 May 2026 ·bleepingcomputer.com

Anthropic's Mythos cyber model briefly appeared in Claude Code before removal

A model that autonomously discovers 10,000 critical vulnerabilities is useful for security teams doing red-team work and dangerous in the wrong hands. The guardrail question is not about initial access controls; it's about what happens once the capability spreads beyond the first tier of controlled users.

AI Agents · 26 May 2026 ·infoq.com

Claude Code adds worktrees, scheduled routines, and cross-device remote sessions

The worktrees feature changes the collaboration model most concretely: Claude can now run a parallel branch experiment or generate a draft PR without touching your working tree. If your team is evaluating autonomous agent work in CI, the checkpoint and credential scoping features are the safety controls worth reading first.

AI Agents · 26 May 2026 ·techtimes.com

Anthropic's April Claude Code packaging error exposed internal TypeScript source

For engineers connecting Claude Code to production CI pipelines or external APIs, knowing that the tool's internal architecture is partially documented in the wild changes the sandboxing calculus. Explicit credential scoping and tight MCP permissions matter more now than before the leak.

AI Agents · 24 May 2026 ·ai.google.dev

Gemini Interactions API: migrate off the old schema by May 26 or lose new features

If your code reads from outputs, you have two days before new features stop reaching it and about three weeks before your integration stops working entirely. Google's migration guide at ai.google.dev spells out the code changes, which are mostly a find-and-replace on response parsing.

AI Agents AI Industry · 23 May 2026 ·theregister.com

Anthropic splits billing: automated agent workflows get a separate credit pool June 15

If you have CI pipelines, scheduled agents, or any automated script calling Claude, audit your usage before June 15. Those calls move to a monthly credit pool sized to your subscription tier. The interactive Claude experience stays unchanged. The risk is automated workflows that routinely exceed the new credit budget and start billing at API rates.

AI Agents · 22 May 2026 ·ppc.land

WebMCP: Chrome 149 origin trial lets websites expose JavaScript tools to AI agents

If you build web products that need reliable AI agent interaction, the WebMCP origin trial is worth testing in Chrome 149. Declaring structured tools replaces the entire surface area that pixel-parsing agents currently get wrong, and the implementation is a few hours of JavaScript.

AI Agents · 22 May 2026 ·techcrunch.com

Gemini Spark: Google's 24/7 personal AI agent launches for AI Ultra subscribers

Spark is the first Google product committing to always-on background behavior: it works with your laptop closed, no session required. If you live in Google Workspace, it is worth testing once AI Ultra opens next week. The standing-permissions model requires explicit opt-in, so it will not act on anything you have not configured.

AI Agents AI Models · 21 May 2026 ·9to5google.com

Google I/O 2026: Gemini 3.5 Flash undercuts rivals on speed, Spark runs 24/7 without you

If you're evaluating frontier-class APIs for a new project, Gemini 3.5 Flash's pricing makes it worth a benchmark run before committing to anything pricier. Gemini Spark is in a different category than a chat assistant: it runs your tasks on cloud VMs while you're offline.

AI Agents · 20 May 2026 ·adversa.ai

MCP security: CVSS 9.8 flaw and a design bug affecting 200,000+ servers

If you have MCP servers running in your infrastructure, treat default STDIO configurations as compromised and audit internet-facing integrations before patching. CVE-2026-33032 is exploitable without authentication, so exposed nginx-ui instances need to be taken offline or patched immediately.

AI Agents · 20 May 2026 ·developers.googleblog.com

Google Genkit Middleware: composable retry and fallback hooks for AI agents

If you build production agents on Genkit, retries and provider fallbacks are now a middleware import rather than custom code wrapped around every model call. The tool approval gate is immediately useful for any agent that touches production systems where you want a human checkpoint before an irreversible action runs.

AI Agents LLM Evals · 19 May 2026 ·dig.watch

Microsoft MDASH: agentic AI system finds 16 Windows vulnerabilities, zero false positives

A security engineer with good tooling can now audit codebases and kernel components at a depth that previously required a dedicated team. That is both a productivity gain and a threat-model update, because the same capability is available to anyone with the infrastructure to run it. Reassess which attacks you need to defend against.

AI Agents · 19 May 2026 ·vercel.com

Vercel AI SDK 6: Agent interface, DurableAgent, full MCP support, and DevTools

If you build AI applications in JavaScript or TypeScript, SDK 6 is the update where the framework caught up to what production agents actually need: durable workflows, MCP connectivity, and a DevTools panel you can inspect in a browser. The 20M monthly download number means this is already the default starting point for most JS-based AI builds.

AI Agents AI Industry · 18 May 2026 ·dev.to

Google I/O preview: Vertex AI retired, Gemini Enterprise Agent Platform takes over

If you build on Google Cloud, this is your platform now. Agent Studio handles prototyping, ADK handles production code, and Agent Engine manages the runtime. Migrating from Vertex AI APIs is documented but not instantaneous, so engineering teams should start the audit before I/O drops new surface area tomorrow.

AI Agents LLM Evals · 18 May 2026 ·microsoft.com

Microsoft's AI security agents found 16 Windows flaws, 4 critical RCEs, before patch

AI agent frameworks are now both an attack vector and a detection tool for the same class of flaw. If your security team is not running agent-based scanning alongside traditional static and dynamic analysis, they are behind. If your agents consume untrusted content without output validation, that is the hole.

AI Agents · 16 May 2026 ·codenewsletter.ai

Grok Build: xAI ships its first CLI coding agent with parallel subagents

If you're a SuperGrok Heavy subscriber, the plan mode is the right thing to test first: you see every step the agent intends before it touches anything. CLI agents handle multi-file refactors better than most IDE plugins because they operate across the whole repository, not just open files.

AI Agents · 15 May 2026 ·startuphub.ai

Cursor SDK: build custom coding agents on Cursor's runtime, now in public beta

If your team wants to ship coding agents without building the underlying runtime from scratch, the Cursor SDK is worth evaluating. You get sandboxed cloud execution and the same model access as the IDE itself. The tradeoff is committing to the infrastructure stack of a startup now seeking a $50 billion valuation.

AI Agents AI Industry · 15 May 2026 ·futurumgroup.com

Microsoft Agent 365: governance layer for enterprise AI agents now generally available

If your company runs AI agents in production, agent governance is now an immediate concern. A production database was deleted in under 10 seconds by an agent with elevated permissions. Regulators and auditors will start asking who controls your agents; Agent 365 is one answer, whether you use it or not.

AI Agents · 14 May 2026 ·radicaldatascience.wordpress.com

Anthropic ships Claude into Excel, Word, and Outlook with cross-app context

If your team lives in Microsoft 365, Claude is now available without switching apps. The cross-app context matters: an Outlook email thread can inform a Word draft without reloading the session. For developers building on M365, Claude's awareness now spans the suite by default.

AI Agents AI Models · 14 May 2026 ·radicaldatascience.wordpress.com

Thinking Machines Lab previews real-time AI that reacts mid-sentence across audio and video

This is still a research preview, not a shipping product. But it shows where the interactivity bar is heading: systems that react mid-conversation, not after you finish speaking. For developers building meeting tools or live coaching apps, this is the capability threshold to plan around.

AI Agents · 11 May 2026 ·github.com

Karpathy-inspired CLAUDE.md: one file fixes common Claude Code failure modes

If you use Claude Code daily, this is the single config tweak that probably reduces 'why did it touch that file' frustration the most. The CLAUDE.md ships with verifiable success criteria built in. For a tech lead onboarding a new team to Claude Code, it's a one-file standard you can hand them.

AI Agents AI Industry · 11 May 2026 ·singhajit.com

Claude Code rate limits doubled: Anthropic secures SpaceX Colossus compute

Pro and Max Claude Code users no longer hit the old peak-hour slowdown, and the five-hour usage ceiling just doubled. If you've been scheduling sessions around rate constraints, those constraints loosened on May 6. The Colossus deal suggests more headroom is coming as 220,000 GPUs come online over the next month.

AI Agents · 11 May 2026 ·singhajit.com

AWS Bedrock AgentCore Payments: AI agents can now transact without human approval

An AI agent on AWS Bedrock can now pay for API access or transact with another agent without human approval on each step. For multi-agent systems, this is the first managed billing rail from a major cloud provider. Session spending caps and audit logs make it easier to bring to a compliance review.

AI Agents · 10 May 2026 ·sdtimes.com

AI security in the IDE: Snyk adds Claude, Opsera lands in Cursor

If your team uses Cursor or the Snyk platform, security review is moving inside your coding flow rather than sitting at the end of a pull request. For a developer shipping AI-generated code at speed, that shift is the difference between catching a vulnerability before commit and finding it in production.

AI Agents · 10 May 2026 ·crescendo.ai

Cloudflare and Stripe: AI agents can now buy domains, deploy, and pay

If you build agents or design software pipelines, the assumption that a human must approve purchases and deployments is now optional at the infrastructure level. Cloudflare and Stripe are building the financial plumbing for autonomous software. The question of who is liable when an agent buys something incorrectly is still open.

AI Agents AI Models · 9 May 2026 ·blogs.oracle.com

Oracle OCI Enterprise AI ships Grok 4.3 and an open NVIDIA multimodal model

If you build anything that has to read images, video, or audio together, you now have an open-weights option you can fine-tune and run on your own infrastructure. The trade-off is that your ops team owns the inference stack. The closed-API option is still cheaper for low-volume cases.

AI Agents · 8 May 2026 ·anthropic.com

Claude Code rate limits doubled: Anthropic adds 220,000 GPUs via SpaceX

If you live inside Claude Code or the Anthropic API, the friction you have been planning around just shifted. Pro and Max users no longer get throttled harder during US hours, and a tier-1 startup running Opus jumped from a 30,000-token-per-minute ceiling to 500,000. Plan budgets and quotas accordingly.

AI Agents · 8 May 2026 ·microsoft.com

Microsoft Semantic Kernel RCE: prompt injection turns into shell access

If you ship anything built on Semantic Kernel, patch this week. If you build agents on a different framework, treat tool plugins and filter strings as untrusted input by default. A backend engineer who would never eval() user data is suddenly doing it through their agent's filter expressions, and the fix is the same: validate, sandbox, do not pass strings straight to an interpreter.
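The "validate, do not pass strings straight to an interpreter" advice can be sketched in a few lines of Python. This is a minimal illustration, not Semantic Kernel's API or its actual fix: the function name `safe_filter`, the one-comparison filter grammar, and the operator allowlist are all assumptions made for the example. The idea is to parse the filter string into a syntax tree and accept only a narrow, known shape instead of handing it to `eval()`.

```python
import ast
import operator

# Comparison operators the hypothetical filter language allows (an assumption).
_ALLOWED_OPS = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.Lt: operator.lt,
    ast.LtE: operator.le,
    ast.Gt: operator.gt,
    ast.GtE: operator.ge,
}


def safe_filter(expression: str, record: dict) -> bool:
    """Evaluate `field <op> literal` against a record without calling eval()."""
    # Parsing only builds a syntax tree; nothing in the string is executed.
    node = ast.parse(expression, mode="eval").body
    # Accept exactly one comparison: a known field name against a literal.
    if not (isinstance(node, ast.Compare) and len(node.ops) == 1):
        raise ValueError("only a single comparison is allowed")
    left, op, right = node.left, node.ops[0], node.comparators[0]
    if not isinstance(left, ast.Name) or left.id not in record:
        raise ValueError("left side must be a known field name")
    if not isinstance(right, ast.Constant):
        raise ValueError("right side must be a literal")
    if type(op) not in _ALLOWED_OPS:
        raise ValueError("operator not allowed")
    return _ALLOWED_OPS[type(op)](record[left.id], right.value)
```

With this shape, `safe_filter("price < 10", {"price": 5})` evaluates the comparison, while a string such as `__import__('os').system('id')` is rejected with `ValueError` because it parses as a function call rather than a comparison. Malformed input raises `SyntaxError` from `ast.parse`, so callers should treat both exceptions as a rejected filter.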

AI Agents AI Industry · 7 May 2026 ·venturebeat.com

MCP supply chain flaw: 200,000 AI agent servers exposed to remote code execution

If your team uses Cursor, Claude Code, Windsurf, or VS Code with MCP servers, treat any mcp.json from a repo you did not write as untrusted code. Search for MCP configs in dev machines this week. Pin patched versions where they exist, sandbox the rest, and stop accepting MCP servers from random registries.
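One rough way to start the search for MCP configs is a script that walks a directory tree and lists the command each config would launch. The sketch below is an assumption-heavy starting point rather than a complete audit: the file names in `CONFIG_NAMES`, the top-level keys `mcpServers` and `servers`, and the function name `find_mcp_servers` are illustrative guesses, so check them against the tools your team actually runs.

```python
import json
from pathlib import Path

# File names to look for; an assumption, extend for the tools your team uses.
CONFIG_NAMES = {"mcp.json", ".mcp.json"}


def find_mcp_servers(root: str) -> list[tuple[str, str, str]]:
    """Return (config path, server name, launch command) for each server found."""
    found = []
    for path in Path(root).rglob("*"):
        if path.name not in CONFIG_NAMES or not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON; skip it
        if not isinstance(data, dict):
            continue
        # Assumed top-level keys that hold the server definitions.
        servers = data.get("mcpServers") or data.get("servers") or {}
        if not isinstance(servers, dict):
            continue
        for name, spec in servers.items():
            if not isinstance(spec, dict):
                continue
            args = spec.get("args")
            parts = [str(spec.get("command", ""))]
            parts += [str(a) for a in args] if isinstance(args, list) else []
            found.append((str(path), name, " ".join(parts).strip()))
    return found
```

Running `find_mcp_servers` over a checkout directory gives a list to review by hand: any entry whose launch command you do not recognize is a candidate for pinning, sandboxing, or removal.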

AI Agents · 7 May 2026 ·openai.com

OpenAI Symphony: Codex agents now pull tickets straight from Linear

If you write tickets and review pull requests for a living, the job is changing this quarter, not next year. Tighter ticket descriptions become more valuable than fast keystrokes. Code review turns into the bottleneck. Test coverage and CI guardrails stop being nice-to-have and become the only thing standing between you and a flood of plausible-looking wrong code.

AI Agents AI Industry · 7 May 2026 ·cryptointegrat.com

Saperly launches first phone carrier built for AI agents, not humans

If you work in customer service, sales, or any role that depends on a human picking up a phone, you are about to start fielding more calls from machines. If you build software, ask whether your product treats agents as a real user type. The terms of service, rate limits, and identity model your team wrote in 2023 probably assume humans, and that assumption is breaking.

AI Agents AI Industry · 6 May 2026 ·openai.com

OpenAI and PwC build 'AI native finance function' inside the CFO office

If you work in finance ops, accruals, or accounts payable, the layer above your spreadsheets is being rebuilt. Auditors and controllers will spend more time reviewing what an agent did than typing entries. Treat MCP, Codex, and Skills as vocabulary worth learning, because they will show up on job descriptions.
