🤖 Tech Talk: Understanding Microsoft and Google's new AI agents
Plus, AI tool of the week: Grok's new PDF generation capability; Apple desperately needs the AI help it’s seeking; and more…
Dear reader,
Last July, I wrote about why enterprises should care about Agentic AI systems, or AI agents as they are better known, even though some experts make a slight distinction between the two terms. Broadly, both terms refer to artificial intelligence (AI) models capable of autonomous decision-making and action to achieve specific goals, typically without any human intervention.
AI agents typically exhibit key characteristics such as autonomy, adaptability, decision-making, and learning. Autonomy allows them to operate independently to achieve predefined goals. Adaptability enables them to adjust their actions based on changes in the environment or new data. Decision-making utilizes sophisticated algorithms and data to make decisions without human input. Learning employs machine-learning techniques to continuously improve their performance.
Examples include a driverless car, a smart home assistant that learns and adapts to your lifestyle, or a trading bot that can dabble in stocks much better than you or I can.
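For readers who like to see ideas in code, the four characteristics described above can be sketched as a toy thermostat agent in Python. This is only an illustration under stated assumptions: the `ThermostatAgent` class, its thresholds, and the "halve the step after an overshoot" learning rule are all hypothetical, and real AI agents rely on far richer models.

```python
from dataclasses import dataclass


@dataclass
class ThermostatAgent:
    """Toy agent that keeps a room near a target temperature (hypothetical)."""

    target: float      # predefined goal, pursued without human input (autonomy)
    step: float = 1.0  # how strongly the agent reacts; adjusted by learn()

    def decide(self, temperature: float) -> str:
        # Decision-making: pick an action from the current observation.
        if temperature < self.target - 0.5:
            return "heat"
        if temperature > self.target + 0.5:
            return "cool"
        return "idle"

    def act(self, temperature: float) -> float:
        # Adaptability: the action always depends on the latest reading.
        action = self.decide(temperature)
        if action == "heat":
            return temperature + self.step
        if action == "cool":
            return temperature - self.step
        return temperature

    def learn(self, before: float, after: float) -> None:
        # Learning: if the last action jumped past the target, react more gently.
        overshot = (before - self.target) * (after - self.target) < 0
        if overshot:
            self.step = max(0.25, self.step / 2)


def run(agent: ThermostatAgent, temperature: float, ticks: int) -> float:
    """Repeat the observe, decide, act, learn loop for a fixed number of ticks."""
    for _ in range(ticks):
        new_temperature = agent.act(temperature)
        agent.learn(temperature, new_temperature)
        temperature = new_temperature
    return temperature
```

Calling `run(ThermostatAgent(target=21.0), 17.0, 10)` warms the simulated room one degree per tick and then idles once it reaches 21.0, with no human telling the agent what to do at each step.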
Microsoft and Google, at their respective grand events this week (Microsoft Build 2025 on 19 May and Google I/O 2025 on 20 May), reiterated the importance of autonomous AI agents.
Project Mariner: According to Google CEO Sundar Pichai, Google thinks of agents as systems that combine the intelligence of advanced AI models with access to tools, so they can take actions on your behalf and under your control. He cited the company's early research prototype, Project Mariner, as "an early step forward in agents with computer-use capabilities to interact with the web and get stuff done for you". Developers can now access Project Mariner’s computer use capabilities via the Gemini application programming interface (API). Companies like Automation Anywhere and UiPath are already starting to build with it, Pichai added.
Building an agent ecosystem: Pichai highlighted the need for agents to access services and communicate with each other through protocols such as Google's open Agent2Agent Protocol and Anthropic's Model Context Protocol (MCP). Google is also starting to bring agentic capabilities to Chrome, Search, and the Gemini app. For instance, if you’re apartment hunting, a new Agent Mode in the Gemini app will help you find listings that match your criteria on websites like Zillow, adjust filters, and use MCP to access the listings and even schedule a tour for you.
Reimagining search: Gemini models, according to Pichai, are helping to make Google Search more intelligent, agentic and personalized. AI Overviews, for instance, has reached over 1.5 billion users and is now available in 200 countries and territories. AI Overviews appears in Google Search results when its systems determine that generative AI (GenAI) responses can be helpful. AI Overviews is driving over 10% growth in the types of queries that show them, and this growth is increasing in Google's biggest markets like the US and India.
AI Mode for search: Google has also introduced a new AI Mode for Search, which it touts as "a total reimagining of Search". "With more advanced reasoning, you can ask AI Mode longer and more complex queries. In fact, early testers have been asking queries that are two to three times the length of traditional searches, and you can go further with follow-up questions. All of this is available as a new tab right in Search," Pichai said during his keynote.
AI agent-powered AGI: During a fireside chat on the sidelines of the event, which also had Google co-founder Sergey Brin as a participant, the CEO of Google DeepMind, Demis Hassabis, said that while "we started with agent-based systems in games", the goal now is to build artificial general intelligence (AGI), a truly general intelligence that understands the physical world. The reason: AGI could supercharge AI to understand, reason, plan, and execute actions autonomously when integrated with autonomous agentic capabilities.
Hassabis added that he envisaged two key applications in this context: robotics, and a genuinely useful assistant that accompanies you through daily life rather than staying confined to a screen. He acknowledged, though, that for robotics to work at scale, software intelligence is the real bottleneck, not hardware. "That’s where advancements like Gemini 2.5 come in. I believe we’re finally at a point where the right algorithms can unlock robotics' full potential," he said.
You may read more about Google’s announcements here.
Era of AI agents: Microsoft too declared that the world has "entered the era of AI agents", thanks to "groundbreaking advancements in reasoning and memory" that have made AI models "more capable and efficient". The Redmond giant envisions a world in which agents "operate across individual, organizational, team and end-to-end business contexts", transforming the internet into an "open agentic web, where AI agents make decisions and perform tasks on behalf of users or organizations".
With the general release of Azure AI Foundry Agent Service, Microsoft is enabling developers to orchestrate multiple specialized agents for complex tasks. Through Copilot Tuning, organizations can use their own data, workflows, and processes to train models and create agents with minimal coding. These agents operate securely within the Microsoft 365 environment and can handle precise, domain-specific tasks—for instance, a law firm generating documents tailored to its expertise and style.
Copilot Studio now supports multi-agent orchestration, allowing agents to collaborate and tackle broader challenges. Microsoft will also support MCP across its agent ecosystem—including GitHub, Copilot Studio, Dynamics 365, Azure AI Foundry, Semantic Kernel, and Windows 11.
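To make "multi-agent orchestration" a little more concrete, here is a minimal Python sketch of the general pattern: an orchestrator hands each step of a plan to a specialized agent and collects the results. It is not the Copilot Studio or Azure AI Foundry API. The `Orchestrator` class and the two stand-in agents are hypothetical, and in a real system each agent would wrap an AI model rather than a simple function.

```python
from typing import Callable, Dict, List, Tuple

# In this sketch an "agent" is just a function from a task to a result.
Agent = Callable[[str], str]


def research_agent(task: str) -> str:
    # Hypothetical specialist: would normally gather and summarize sources.
    return f"[research] notes on: {task}"


def drafting_agent(task: str) -> str:
    # Hypothetical specialist: would normally write a document in house style.
    return f"[draft] document for: {task}"


class Orchestrator:
    """Routes each step of a plan to the agent registered for that skill."""

    def __init__(self) -> None:
        self.agents: Dict[str, Agent] = {}

    def register(self, skill: str, agent: Agent) -> None:
        self.agents[skill] = agent

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        results = []
        for skill, task in plan:
            if skill not in self.agents:
                raise KeyError(f"no agent registered for skill: {skill}")
            results.append(self.agents[skill](task))
        return results
```

A law firm, for example, might register `research_agent` under "research" and `drafting_agent` under "draft", then pass the orchestrator a two-step plan that first researches a topic and then drafts a client memo on it.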
Additionally, Microsoft is introducing NLWeb, envisioned as the HTML of the agentic web. It allows websites to integrate conversational interfaces using their preferred models and data, enabling users to interact with content in a natural, semantic way.
You may read more about this here.
AI Unlocked
by AI&Beyond, with Jaspreet Bindra and Anuj Magazine
The AI hack we have unlocked this week is: Grok's new PDF generation capability
What is the problem here?
AI has a last-mile problem. It can generate content brilliantly, but that content often isn't ready to use. Whether it's a resume, an invoice, or a research paper, AI-generated outputs usually need formatting, polishing, and exporting before they're usable. This typically involves switching between multiple tools—chat interfaces for drafting, LaTeX editors for formatting, and separate platforms for PDF export. It's time-consuming, breaks creative flow, and often results in messy or inconsistent files.
Grok solves this with a focused and elegant solution: instant PDF generation built right into its Grok Studio interface. One place to create, refine, and finalize your documents.
How to access: https://grok.com
Grok lets you:
- Generate documents: Create resumes, invoices, research papers and more directly from natural language prompts.
- Live edit: Refine and customize your output using LaTeX or the interactive Grok Studio editor.
- Download PDFs: Instantly export your documents as polished PDFs—no extra tools required.
Example:
You need polished PDF files in various work scenarios—like writing research papers, generating invoices, or updating your resume.
Here’s how you can generate a professional document and export it as a PDF using Grok:
- Open Grok from your browser by visiting https://grok.com
- In the chat window, enter a detailed prompt like: "Create an invoice template for my company <company name> that includes GST, company logo, and is ready to download as a PDF."
- Grok will automatically activate its Studio mode if the task involves document creation.
- Once you review, click the download button to save your polished PDF.
Pro tip: For academic work like research papers, save both the PDF and LaTeX source files for future edits or journal submissions.
What makes PDF generation by Grok special?
- Utility-first: A small feature solving a big workflow pain.
- Interactive studio: Refine your document before rendering.
- Versatile use cases: Invoices, resumes, and papers, all done fast.
- Native PDF export: No third-party tools needed.
Note: The tools and analysis featured in this section demonstrated clear value based on our internal testing. Our recommendations are entirely independent and not influenced by the tool creators.
You may also want to read
Apple Desperately Needs the AI Help It’s Seeking (🔒)
Rahul Matthan: Don’t let data privacy safeguards work against us (🔒)
Hope you folks have a great weekend, and your feedback will be much appreciated — just reply to this mail, and I’ll respond.