TLDR AI 2025-12-23

Headlines & Launches

Your Year with ChatGPT (2 minute read)

OpenAI introduced a personalized year-in-review feature called “Your Year with ChatGPT,” available to eligible users in select regions. Inspired by Spotify Wrapped, the feature highlights individual usage trends from the past year.

Z.AI launches GLM-4.7, new SOTA open-source model for coding (2 minute read)

GLM-4.7 is the latest release in Z.AI’s General Language Model line. The high-end foundation model is aimed at advanced reasoning, coding, and multimodal workloads. The latest update expands context handling and reasoning depth compared to earlier versions. It introduces upgraded reasoning pipelines and broader multimodal support.

MiniMax M2.1 is live in Kilo (3 minute read)

MiniMax M2.1 is ahead of DeepSeek and Kimi on several benchmarks. It is even catching up to state-of-the-art models in some areas. The model is super-fast and efficient. It is now available to all Kilo Code users.

Introducing Manus Design View (3 minute read)

Manus Design View is an extension of the Manus agent for seamless AI design workflows. It enables designers to generate concepts with simple prompts, make precise edits using the Mark Tool, and change text easily. The Manus mobile app allows design edits on the go with either text or voice input. It is available to all users today.

Deep Dives & Analysis

Hardening Atlas Against Prompt Injection (13 minute read)

This post details OpenAI’s ongoing efforts to secure its AI browser, Atlas, against prompt injection attacks: malicious instructions embedded in web content that manipulate agent behavior. While mitigation techniques are improving, the company acknowledged that prompt injection remains an unsolved and persistent threat, particularly as agent capabilities expand on the open web.

Async Coding Agents “From Scratch” (10 minute read)

It’s pretty easy to homebrew your own asynchronous coding agent. This means that businesses selling coding agents can no longer differentiate themselves by only running sandboxed agents in the cloud that connect to Slack. Companies working on coding agents likely realize this and are doing everything they can to train their own SWE agents and auxiliary models to improve their harnesses.

What (I think) makes Gemini 3 Flash so good and fast (8 minute read)

Gemini 3 Flash is a lightweight, efficient model optimized for speed and low latency. It is capable of delivering performance comparable to Gemini 3 Pro at a fraction of the cost. The model’s design brings unprecedented power but introduces specific tradeoffs in token efficiency and reliability. This post takes a look at the leaked architectural details of the new model.

We removed 80% of our agent’s tools (4 minute read)

Vercel spent months building a sophisticated internal text-to-SQL agent with specialized tools, heavy prompt engineering, and careful context management. It kind of worked, but it was fragile, slow, and required constant maintenance. The team then deleted most of it and stripped the agent down to a single tool that executed arbitrary bash commands. The agent got simpler and better at the same time: its success rate rose from 80% to 100%.
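
To make the "single tool" idea concrete, here is a minimal sketch of what a one-entry tool registry might look like. It is not Vercel's implementation: the names `run_bash`, `TOOLS`, and `dispatch`, the timeout value, and the shape of the tool-call dictionary are all assumptions chosen for illustration. It also assumes `bash` is on the PATH and that the process already runs inside a sandbox, since executing arbitrary commands is unsafe otherwise.

```python
import subprocess


def run_bash(command: str, timeout_s: float = 10.0) -> dict:
    """Hypothetical single tool: run one shell command and report the result."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,  # collect stdout and stderr for the model
            text=True,            # decode bytes to str
            timeout=timeout_s,    # stop runaway commands
        )
    except subprocess.TimeoutExpired:
        return {
            "exit_code": None,
            "stdout": "",
            "stderr": f"timed out after {timeout_s}s",
        }
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }


# The entire tool registry is a single entry (assumed structure).
TOOLS = {"bash": run_bash}


def dispatch(tool_call: dict) -> dict:
    """Route a model-issued tool call of the assumed form
    {"name": ..., "arguments": {...}} to the matching tool."""
    tool = TOOLS.get(tool_call.get("name"))
    if tool is None:
        return {
            "exit_code": None,
            "stdout": "",
            "stderr": f"unknown tool: {tool_call.get('name')}",
        }
    return tool(**tool_call.get("arguments", {}))
```

With a design like this, the agent loop only needs to feed `stdout`, `stderr`, and `exit_code` back to the model, which may be part of why a smaller surface is easier to maintain than many specialized tools.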

Engineering & Research

Agent Skills for Context Engineering (GitHub Repo)

This repository contains a comprehensive collection of Agent Skills for building production-grade AI agent systems. They are categorized into Foundational skills, Architectural skills, and Operational skills. Each skill is structured for efficient context use. The patterns work on any agent platform that supports skills or allows custom instructions.

OpenTinker (GitHub Repo)

OpenTinker is an RL-as-a-Service infrastructure for foundation models. It features separation of programming and execution, separation of environment and training code, and seamless transition from training to inference. The platform enables users to perform RL training and inference without requiring local GPU resources by separating client-side programming from server-side execution. It provides a high-level Python API that abstracts away the complexity of distributed systems.

Scientific Intelligence Benchmark (GitHub Repo)

SGI-Bench is a benchmark for assessing Scientific General Intelligence across the entire research cycle: Deliberation, Conception, Action, and Perception. It spans 10 disciplines with over 1,000 expert-curated tasks inspired by major open scientific questions.

Miscellaneous

Cursor Expands Agent Hooks (3 minute read)

Cursor has announced partnerships to integrate its agent hook system with security and platform vendors. These hooks allow organizations to observe, modify, or block stages of the agent loop, supporting use cases like governance, dependency scanning, secrets management, and agent safety.
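
The observe, modify, or block pattern can be sketched generically. The code below is not Cursor's hook API. `HookResult`, `run_hooks`, the three example hooks, and the credential regex are hypothetical names and values invented for illustration, under the assumption that each hook receives the command an agent is about to run.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class HookResult:
    allowed: bool                 # False means the step is blocked
    command: str                  # possibly rewritten command
    reason: Optional[str] = None  # explanation when blocked


Hook = Callable[[str], HookResult]

# Illustrative pattern only; real secret scanners use far richer rules.
SECRET_PATTERN = re.compile(r"(AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,})")


def audit_log(log: List[str]) -> Hook:
    """Observe: record every command without changing it."""
    def hook(command: str) -> HookResult:
        log.append(command)
        return HookResult(True, command)
    return hook


def block_secrets(command: str) -> HookResult:
    """Block: refuse commands that appear to embed a credential."""
    if SECRET_PATTERN.search(command):
        return HookResult(False, command, "command appears to contain a credential")
    return HookResult(True, command)


def force_dry_run(command: str) -> HookResult:
    """Modify: append --dry-run to npm installs that lack it."""
    if command.startswith("npm install") and "--dry-run" not in command:
        return HookResult(True, command + " --dry-run")
    return HookResult(True, command)


def run_hooks(command: str, hooks: List[Hook]) -> HookResult:
    """Apply hooks in order; stop at the first one that blocks."""
    current = command
    for hook in hooks:
        result = hook(current)
        if not result.allowed:
            return result
        current = result.command
    return HookResult(True, current)
```

For example, `run_hooks("npm install left-pad", [audit_log(log), block_secrets, force_dry_run])` would log the original command, pass the secrets check, and return the rewritten command. Ordering matters in this sketch: the audit hook sees the command before any later hook modifies it.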

The Shape of Artificial Intelligence (33 minute read)

AI’s utility in the coming decade will come from understanding the technology’s strengths and where it can be used to augment human ability. It won’t replace humans, at least in the short term, because we are too complex. However, the technology will eventually conquer territories we thought were exclusively ours. This will be the first time we will face true otherness, a new species of being.

Quick Links

Alphabet to Acquire Intersect for $4.75 Billion in Cash (3 minute read)

Alphabet agreed to acquire Intersect, a provider of energy and data center infrastructure, for $4.75 billion.

Announcing advanced governance capabilities for Vertex AI Agent Builder (5 minute read)

Google announced advanced governance features for Vertex AI Agent Builder, enhancing its Agent Engine to manage both short-term and long-term memory.

2026 Predictions (8 minute read)

2026 may be the first year the AI trade meaningfully splits into AI infrastructure vs. AI applications.
