Awesome Artificial Intelligence

A curated collection of must-use, actively maintained resources for building and shipping AI systems.

Focus: AI engineering (RAG, agents, evals, guardrails, deployment) plus the best books, guides, papers, and a carefully selected set of tools.


📚 Learn

Deep, durable knowledge — still valuable five years from now.

Books

Modern & Practical

  • Designing Machine Learning Systems — Scalable, maintainable ML pipelines (Chip Huyen).
  • AI Engineering — End-to-end AI product building (Chip Huyen).
  • Build a Large Language Model from Scratch — Transformers in raw PyTorch, layer by layer (Sebastian Raschka).
  • Hands-On Large Language Models — Visual + practical guide to LLM applications (Jay Alammar, Maarten Grootendorst).
  • LLM Engineer's Handbook — Production LLMOps: fine-tuning, quantization, serving (Labonne, Iusztin).
  • The 100-Page Language Models Book — Concise, math-grounded path from n-grams to transformers (Andriy Burkov).
  • Generative Deep Learning (2nd Edition) — GANs, VAEs, diffusion models (David Foster).

Foundational

  • Artificial Intelligence: A Modern Approach — The canonical AI theory text (Russell, Norvig).
  • Deep Learning — Mathematical foundations of neural networks (Goodfellow, Bengio, Courville).
  • Deep Learning: Foundations and Concepts — Bishop's 2024 update; probability-grounded modern DL (Bishop & Bishop).
  • Understanding Deep Learning — Math + intuition + Python notebooks (Simon Prince).
  • Speech and Language Processing (3rd Edition) — The NLP reference, kept current through the deep learning era (Jurafsky, Martin).
  • Reinforcement Learning: An Introduction (2nd Edition) — RL foundations (Sutton, Barto).

Courses

Beginner

  • Google Generative AI Learning Path
  • Hugging Face LLM Course
  • Fast.ai — Practical Deep Learning

Intermediate / Advanced

  • Stanford CS324: Large Language Models
  • Full Stack Deep Learning
  • MIT 6.S191: Intro to Deep Learning

Focused

  • DeepLearning.AI Short Courses
  • Google DeepMind — Introduction to Reinforcement Learning
  • Karpathy — Neural Networks: Zero to Hero

Landmark Papers

Research that shaped modern AI — worth reading to understand the "why" behind today's architectures.

  • Attention Is All You Need — Transformer architecture.
  • Scaling Laws for Neural Language Models — Model/data/compute scaling.
  • Language Models are Few-Shot Learners — GPT-3 capabilities.
  • Constitutional AI — Safer model alignment.


🛠 Build

The toolchain for building with AI. Personal note: you don't need tons of frameworks — start with simple LLM calls and work up.
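
As a rough illustration of what "simple LLM calls" can look like before reaching for a framework, here is a minimal sketch: one prompt, one reply, one validation step, and a retry. Every name in it (`Complete`, `extract_keywords`, `fake_complete`) is hypothetical, and the model is a canned stand-in so the example runs offline; in practice you would pass in a function that wraps whichever provider client you use.

```python
import json
from typing import Callable

# Hypothetical stand-in type: any function mapping a prompt string to a completion string.
Complete = Callable[[str], str]


def extract_keywords(text: str, complete: Complete, max_attempts: int = 2) -> list[str]:
    """Ask the model for a JSON array of keywords and validate the reply."""
    prompt = (
        "Return a JSON array of up to 3 keywords for the text below.\n"
        f"Text: {text}"
    )
    for _ in range(max_attempts):
        reply = complete(prompt)
        try:
            parsed = json.loads(reply)
        except json.JSONDecodeError:
            continue  # malformed output: ask again
        if isinstance(parsed, list) and all(isinstance(k, str) for k in parsed):
            return parsed
    raise ValueError("model never returned a valid JSON list of strings")


def fake_complete(prompt: str) -> str:
    """Canned reply (an assumption for this sketch); swap in a real client call."""
    return '["rag", "agents", "evals"]'


print(extract_keywords("Notes on RAG, agents, and evals.", fake_complete))
```

Keeping the model behind a plain callable means the same function can later be wired into a framework, or left as-is, without rewriting the prompt or the validation logic.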

Guides & Playbooks

Frameworks

  • PocketFlow — Extremely minimalist AI agent framework in just 100 lines of code. Fantastic way to learn.
  • Google ADK — Google's Agent Development Kit (Python, Java). Great local development experience + A2A + MCP.
  • Pydantic-AI — Typed, structured LLM orchestration framework built on Pydantic models for safe, predictable outputs.
  • LangGraph — Build multi-agent workflows with stateful graphs on top of LangChain.
  • CrewAI — Agent orchestration with structured tasks and human-in-the-loop controls.
  • AutoGen — Microsoft's framework for multi-agent conversation and collaboration.
  • LlamaIndex — Data framework for ingesting, indexing, and querying private data with LLMs.
  • Haystack — Open-source search/RAG framework with modular pipelines.
  • Docling — Great library for parsing many document formats (PDF, DOCX, PPTX, HTML, and more) into RAG-ready output ⭐

Evals

IDEs

  • Cursor — LLM-powered IDE for multi-file edits and codebase-aware chat.
  • GitHub Copilot — In-IDE code completion, chat, and refactors.

🤖 Agents

Harnesses that turn LLMs into autonomous workers. The model is swappable; the harness is the product.
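
One way to picture "the model is swappable; the harness is the product" is the toy sketch below. It is not how any of the listed tools is implemented: the `Model` protocol, `ScriptedModel`, `run_harness`, and the `name:argument` action format are all assumptions made up for illustration. The point is only that the loop, the tool dispatch, and the step limit live in the harness, while the model is anything that satisfies a small interface.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """Hypothetical interface: given the history so far, return the next action."""

    def next_action(self, history: list[str]) -> str: ...


class ScriptedModel:
    """Fake model that replays a fixed list of actions (stand-in for a real LLM)."""

    def __init__(self, script: list[str]) -> None:
        self._script = list(script)

    def next_action(self, history: list[str]) -> str:
        return self._script.pop(0) if self._script else "done:"


def run_harness(
    model: Model,
    tools: dict[str, Callable[[str], str]],
    task: str,
    max_steps: int = 5,
) -> list[str]:
    """Drive the model in a loop, executing 'tool:argument' actions until 'done'."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        name, _, arg = model.next_action(history).partition(":")
        if name == "done":
            history.append(f"done: {arg}")
            break
        tool = tools.get(name)
        result = tool(arg) if tool else f"unknown tool {name!r}"
        history.append(f"{name}({arg}) -> {result}")
    return history


tools = {"upper": str.upper, "reverse": lambda s: s[::-1]}

# Two different "models" run through the same harness without any harness changes.
model_a = ScriptedModel(["upper:hello", "done:HELLO"])
model_b = ScriptedModel(["reverse:hello", "upper:olleh", "done:OLLEH"])
for model in (model_a, model_b):
    print(run_harness(model, tools, "shout hello"))
```

Because `Model` is a structural protocol, a class wrapping a hosted or local LLM could be dropped in as long as it exposes `next_action`; the tools, history format, and stopping rules stay the same.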

Coding

For live capability comparison, see Terminal-Bench and SWE-bench.

  • Claude Code — Anthropic's CLI agent; multi-file codebase refactoring with long context.
  • Codex CLI — OpenAI's Rust-based local terminal agent; lightweight and fast.
  • Gemini CLI — Google's official open-source terminal agent; long-context repo exploration.
  • Cursor CLI — Cursor's terminal-native agent with sandboxed permissions.
  • Aider — Git-integrated pair programming with surgical edits and undo.
  • OpenCode — Provider-agnostic terminal harness with a strong TUI.
  • OpenHands — Open-source autonomous SWE platform; browser + shell + editor loop.
  • Cline — Open-source agentic IDE extension with strong multi-provider support.
  • Continue — Open-source IDE + CLI assistant with source-controlled rules.
  • Goose — Block's extensible, MCP-driven local agent.
  • Factory Droid — Benchmark-leading multi-model harness with BYOK local execution.
  • Amp — Sourcegraph's commercial agentic coding tool with strong product UX.
  • Mistral Vibe — Mistral's agentic coding CLI, powered by Devstral.
  • Qwen Code — Alibaba's terminal coding agent, optimized for Qwen models.
  • Pi — Highly customizable terminal harness; minimal base prompt, extension-driven.
  • Nanocoder — Private, local-first agent for Ollama and LM Studio.
  • Kilo CLI — Multi-mode agent with a unified gateway to 500+ models.

🧠 Models

State-of-the-art models by modality.

💬 Language

The major frontier labs.

  • ChatGPT — Best for general reasoning, tool use, and the broadest ecosystem.
  • Claude — Best for long-context analysis, coding, and structured thinking.
  • Gemini — Best for multimodal tasks and Google ecosystem integration.
  • Grok — Best for real-time information via X and very long context.
  • Llama — Best open-weight family for self-hosting and fine-tuning.
  • Mistral — Best for lightweight, high-performance open-weight models.
  • DeepSeek — Best for cost-efficient reasoning with open weights.
  • Qwen — Best for multilingual and Chinese-first applications.
  • Kimi — Best for long-context instruction following.
  • GLM — Frontier-tier Chinese model with open weights.
  • Cohere — Best for enterprise LLMs with strong retrieval-augmented generation APIs.

🖼 Image

  • GPT Image — OpenAI's integrated image generation with near-perfect text rendering.
  • Midjourney — Artistic and photorealistic images.
  • Adobe Firefly — Integrated into Creative Cloud; commercial-safe.
  • Ideogram — Precise, legible text in generated images.
  • Flux — High-res, prompt-editable, open-weight images.

🎥 Video

  • Google Veo — High-quality video with synchronized audio.
  • Runway — Video editing + generation with granular creative control.
  • Kling — Cinematic, realistic video generation.

🎙 Audio

  • ElevenLabs — High-quality text-to-speech and voice cloning.
  • Suno — AI music from text prompts.

📊 Compare

Live benchmarks, pricing, and the latest model versions.

  • OpenRouter — Unified API + live pricing across ~300 models.
  • LMArena — Human-preference Elo rankings for text, image, and video.
  • Artificial Analysis — Speed, price, and quality benchmarks across providers.


📡 Follow

Stay current without drowning in noise.

Newsletters