A Common-Sense Guide to AI Engineering
Build Production-Ready LLM Applications
by Jay Wengrow
| Published | 2026-05-10 |
|---|---|
| Internal code | jwpaieng |
| Print status | In Print |
| Pages | 300 |
| User level | |
| Keywords | |
| Related titles | |
| ISBN | 9798888651933 |
| Other ISBN | Channel epub: 9798888651940; Channel PDF: 9798888651940; Kindle: 9798888651940; Safari: 9798888651940 |
| BISACs | |
Highlight
Want to build an LLM-powered app but don’t know where to begin? Can’t get past a proof-of-concept? With this step-by-step guide, you can master the underlying principles of AI engineering by building an LLM-powered app from the ground up. Tame unpredictable models with prompt and context engineering. Use evals to keep them on track. Give chatbots the knowledge to answer anything a user wants to know. Equip agents with the tools and smarts to actually get the job done. By the end, you’ll have the intuition and the confidence to build on top of LLMs in the real world.
Description
Fragmented documentation, obsolete tutorials, and frameworks that deliver a prototype but flop in production can make AI engineering feel overwhelming. But it doesn’t have to be that way. With real-world code and step-by-step instructions as your guide, you can learn to build robust LLM-powered apps from the ground up while mastering both the how and why of the most crucial underlying concepts.
Harness context engineering and retrieval systems to create AI assistants that understand your proprietary data. Create chatbots that answer organization-specific questions and help solve users’ issues. Design agents that conduct research, make decisions, and take action in the real world. Level up your prompt engineering and get an LLM to do your bidding, not its own. Use automated evals to keep constant tabs on your app’s quality while setting up guardrails to protect your users and organization. And implement observability systems that make it easy to debug your app when things do go wrong.
With a systematic approach grounded in the core principles of building AI apps for real users, you’ll easily evolve and adapt even as the hype and tools come and go.
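To give a flavor of the patterns the chatbot chapters cover, here is a minimal, hypothetical sketch (not an extract from the book) of treating the prompt as an array: a system prompt, multi-turn history, and retrieved knowledge spliced into the latest user turn. The function name, message shape, and sample data are illustrative assumptions, modeled on the message format common to LLM chat APIs.

```python
# Hypothetical sketch of the "prompt as an array" pattern: a system prompt,
# multi-turn dialogue history, and retrieved knowledge chunks combined into
# the message list an LLM chat API typically expects. Names are illustrative.

def build_messages(system_prompt, history, user_input, retrieved_chunks):
    """Assemble the full message array for a single chatbot turn."""
    context = "\n\n".join(retrieved_chunks)
    # Augment the user's question with retrieved context, delimited clearly
    # so the model can tell instructions, context, and question apart.
    augmented = (
        "Answer using only the context below.\n"
        f"<context>\n{context}\n</context>\n\n"
        f"Question: {user_input}"
    )
    return (
        [{"role": "system", "content": system_prompt}]
        + history
        + [{"role": "user", "content": augmented}]
    )

messages = build_messages(
    system_prompt="You are a support assistant for Acme Corp.",
    history=[
        {"role": "user", "content": "Hi!"},
        {"role": "assistant", "content": "Hello! How can I help?"},
    ],
    user_input="How do I reset my password?",
    retrieved_chunks=["To reset a password, visit Settings > Security."],
)
```

The resulting list would be passed as-is to a chat completion endpoint; keeping it as plain data makes it easy to log, truncate, and test.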
Contents and Extracts
- Foundations
  - HeLLMo, World!
    - Signing Up for an LLM-as-a-Service
    - Creating Our First App
    - Tweaking the Model and Temperature
    - Checking API Usage
    - Wrapping Up
  - Understanding How LLMs Work
    - What Is a Large Language Model (LLM)?
    - Realizing LLMs Are Nondeterministic Creatures
    - Gauging the Temperature
    - Understanding the Challenges of Nondeterminism
    - Wrapping Up
  - Diving Deeper into LLMs
    - Diving into Tokens
    - Diving into Embeddings
    - Diving into Fine-Tuning Behavior
    - Wrapping Up
  - Selecting an LLM
    - Getting Your Hands on an LLM
    - Comparing Different LLMs
    - Deciding on an LLM
    - Wrapping Up
- Chatbots
  - Building a Chatbot
    - Getting User Input
    - Augmenting the Prompt
    - Adding Multi-Turn Dialogue
    - Managing State with Memory Systems
    - Adding a System Prompt
    - Treating the Prompt as an Array
    - Wrapping Up
  - Augmenting a Prompt with Knowledge
    - Building a Chatbot
    - Augmenting with Knowledge
    - Avoiding Context Window Limitations
    - Preparing the Data
    - Implementing the Knowledge Chatbot
    - Running into PACKing Problems
    - Wrapping Up
  - Efficiently Adding Knowledge with RAG
    - Augmenting with Documentation Chunks
    - Getting into Search Engines, Retrieval, and RAG
    - Searching with Meaning: Keywords Versus Semantics
    - Using Embedding-Similarity Search
    - Building a Starter Search Engine
    - Implementing a RAG Chatbot
    - Choosing the Right Top-K
    - Wrapping Up
  - Measuring Quality with Evals
    - Introducing Evals
    - Setting Up Our App
    - Conducting Error Analysis
    - Open Coding
    - Axial Coding
    - Creating an Eval Test Framework
    - Running Human Evals
    - Wrapping Up
  - Prompt Engineering
    - Eliminating Ambiguity
    - Utilizing the System Prompt
    - Rewriting History
    - Using Delimiters and Bullet Points
    - Reordering Prompt Components
    - Wrapping Up
  - Reducing Hallucinations
    - Understanding Why Our App Hallucinates
    - Instructing the LLM to Be Faithful
    - Pleading and Threatening
    - Upgrading the Model
    - Citing Sources and Few-Shot Prompting
    - Iterate, Iterate, Iterate
    - Reviewing Our Current Chatbot Implementation
    - Final Prompt Engineering Thoughts
    - Checking on Our Evals
    - Wrapping Up
  - Evaluating and Optimizing RAG
    - Discovering a RAG Failure
    - Evaluating RAG
    - Expanding the Query
    - Metadata-Based Filtering
    - Evaluating RAG Subcomponents
    - Dreaming Up an Agentic RAG Wish List
    - Wrapping Up
- Agents
  - Equipping an LLM with Tools
    - Understanding an LLM’s Limitations
    - Triggering a Function
    - Defining “Agents”
    - Feeding Tool Results Back to the LLM
    - Building a Website Reader Tool
    - Deciding to Use a Tool
    - Using the Tools API
    - Wrapping Up
  - Running the Agent Loop
    - Solving a Complex Problem
    - Constructing an Agent Loop
    - Building a News Podcast Agent
    - Exploring Agent Failure Modes and Evals
    - Giving the Agent a Plan
    - Asking the Agent to Create a Plan
    - Wrapping Up
  - Architecting Agentic Workflows
    - Designing an LLM Assembly Line
    - Implementing an LLM Assembly Line
    - Weighing Agentic Workflows Against Classic Agent Loops
    - Workflow Routing
    - Performing Tasks in Parallel
    - Wrapping Up
  - Enhancing Retrieval with Agentic RAG
    - Architecting an Agentic RAG Plan
    - Implementing a RAG Agent
    - Avoiding Unnecessary RAG
    - Generating Structured Outputs
    - Researching as an Agent
    - Conducting Multi-Hop Research
    - Wrapping Up
  - Building System-Integrated Agents
    - Integrating with Databases
    - Reading and Writing
    - Writing Safely
    - Including a Human in the Loop
    - Integrating with Web APIs
    - Integrating MCP and Other Third-Party Tools
    - Hosting Your Own Tools
    - Wrapping Up
- Production
  - Adding Guardrails
    - Introducing Guardrail Types
    - Guarding LLMs with Other Models
    - Balancing Guardrail Trade-Offs
    - Mitigating Cybersecurity Risks
    - Protecting Personally Identifiable Information
    - Using Guardrail Frameworks
    - Red-Teaming, Monitoring, and Eval Creation
    - Wrapping Up
  - Observing AI Systems
    - Logging All the Things
    - Using Observability Tools
    - Performing Qualitative and Quantitative Assessment
    - Monitoring and Alerts
    - Gathering User Feedback
    - Wrapping Up