Table of Contents


Part I: Introduction to Large Language Models

Chapter 1: Overview of Large Language Models ····· 2

1.1 What Are Large Language Models ····· 3

Definition of LLMs ····· 5

1.2 Popular Modern LLMs ····· 6

1.2.1 BERT ····· 6

1.2.2 The GPT Family and ChatGPT ····· 7

1.2.3 T5 ····· 8

1.2.4 Llama ····· 8

1.3 Key Characteristics of LLMs ····· 9

1.4 The Importance of Understanding Context ····· 12

1.5 How LLMs Work ····· 12

1.5.1 The Pre-training Stage ····· 13

1.5.2 Transfer Learning ····· 15

1.5.3 Fine-Tuning ····· 15

1.5.4 Attention ····· 16

1.5.5 Embeddings ····· 18

1.5.6 Tokenization ····· 18

1.5.7 Beyond Language Modeling: Model Alignment and Reinforcement Learning from Human Feedback ····· 21

1.5.8 Domain-Specific LLMs ····· 22

1.6 Applications of LLMs ····· 23

1.6.1 Classical NLP Tasks ····· 23

1.6.2 Free-Text Generation ····· 26

1.6.3 Information Retrieval / Neural Semantic Search ····· 27

1.6.4 Chatbots ····· 28

1.7 Summary ····· 29

Chapter 2: Semantic Search with LLMs ····· 30

2.1 Introduction ····· 30

2.2 Task Background ····· 31

Asymmetric Semantic Search ····· 31

2.3 Solution Overview ····· 33

2.4 Core Components ····· 34

2.4.1 Text Embedder ····· 34

2.4.2 How to Judge the "Similarity" of Text Passages ····· 34

2.4.3 Document Chunking ····· 38

2.4.4 Vector Databases ····· 43

2.4.5 Re-ranking the Retrieved Results ····· 44

2.4.6 API ····· 45

2.5 Putting It All Together: Making Everything Work ····· 46

Performance Evaluation ····· 47

2.6 The Cost of Closed-Source Components ····· 50

2.7 Summary ····· 50

Chapter 3: Getting Started with Prompt Engineering ····· 51

3.1 Introduction ····· 51

3.2 Prompt Engineering ····· 51

3.2.1 Alignment in Language Models ····· 52

3.2.2 Just Ask ····· 53

3.2.3 When "Just Asking" Is Not Enough ····· 55

3.2.4 Few-Shot Learning ····· 55

3.2.5 Output Formatting ····· 56

3.2.6 Persona Prompting ····· 57

3.2.7 Chain-of-Thought Prompting ····· 58

3.2.8 Example: Basic Arithmetic ····· 59

3.3 Working with Prompts Across Models ····· 59

3.3.1 Chat Models vs. Completion Models ····· 59

3.3.2 Cohere's Command Series ····· 61

3.3.3 Prompt Engineering for Open-Source Models ····· 62

3.4 Summary ····· 64

Chapter 4: The AI Ecosystem: Putting the Pieces Together ····· 65

4.1 Introduction ····· 65

4.2 Performance Drift in Closed-Source AI ····· 65

4.3 AI Reasoning vs. Thinking ····· 67

4.4 Case Study 1: Retrieval-Augmented Generation (RAG) ····· 68

4.4.1 Components Working Together: The Retriever and the Generator ····· 69

4.4.2 Evaluating a RAG System ····· 74

4.5 Case Study 2: Automated AI Agents ····· 76

4.5.1 Thought → Action → Observation → Response ····· 76

4.5.2 Evaluating AI Agents ····· 81

4.6 Summary ····· 82

Part II: Getting the Most Out of LLMs

Chapter 5: Optimizing LLMs with Customized Fine-Tuning ····· 84

5.1 Introduction ····· 84

5.2 Transfer Learning and Fine-Tuning: A Primer ····· 85

5.2.1 The Fine-Tuning Process Explained ····· 86

5.2.2 Closed-Source Pre-trained Models as a Foundation ····· 88

5.3 A Look at the OpenAI Fine-Tuning API ····· 88

5.3.1 The OpenAI Fine-Tuning API ····· 88

5.3.2 Case Study: App Review Sentiment Classification ····· 88

5.3.3 Data Guidelines and Best Practices ····· 89

5.4 Preparing Custom Examples with the OpenAI CLI ····· 90

5.5 Setting Up the OpenAI CLI ····· 92

Hyperparameter Selection and Optimization ····· 93

5.6 Our First Fine-Tuned LLM ····· 93

5.6.1 Evaluating Fine-Tuned Models with Quantitative Metrics ····· 94

5.6.2 Qualitative Evaluation Techniques ····· 97

5.6.3 Integrating Fine-Tuned OpenAI Models into Applications ····· 100

5.6.4 OpenAI vs. Open-Source Autoencoding BERT ····· 100

5.7 Summary ····· 102

Chapter 6: Advanced Prompt Engineering ····· 103

6.1 Introduction ····· 103

6.2 Prompt Injection Attacks ····· 103

6.3 Input/Output Validation ····· 105

Example: Using NLI to Build a Validation Pipeline ····· 106

6.4 Batch Prompting ····· 108

6.5 Prompt Chaining ····· 109

6.5.1 Using Prompt Chaining to Prevent Prompt Stuffing ····· 111

6.5.2 Example: Chaining for Safety Using Multimodal LLMs ····· 113

6.6 Case Study: How Good Is AI at Math ····· 115

6.6.1 Our Dataset: MathQA ····· 115

6.6.2 Show Your Work? Testing Chain of Thought ····· 117

6.6.3 Encouraging the LLM with Few-Shot Examples ····· 120

6.6.4 Do the Examples Matter? Revisiting Semantic Search ····· 121

6.6.5 Summarizing the Results on the MathQA Dataset ····· 122

6.7 Summary ····· 124

Chapter 7: Customizing Embeddings and Model Architectures ····· 125

7.1 Introduction ····· 125

7.2 Case Study: Building a Recommendation System ····· 125

7.2.1 Setting Up the Problem and the Data ····· 126

7.2.2 Defining the Recommendation Problem ····· 127

7.2.3 A High-Level View of the Recommendation System ····· 129

7.2.4 Generating a Custom Description Field to Compare Items ····· 132

7.2.5 Setting a Baseline with Foundation Embedders ····· 133

7.2.6 Preparing the Fine-Tuning Data ····· 134

7.2.7 Summary of Results ····· 138

7.3 Summary ····· 141

Chapter 8: First Principles of AI Alignment ····· 142

8.1 Introduction ····· 142

8.2 Aligned to Whom and to What End ····· 142

8.2.1 Instructional Alignment ····· 142

8.2.2 Behavioral Alignment ····· 143

8.2.3 Style Alignment ····· 145

8.2.4 Value Alignment ····· 146

8.3 Alignment as a Bias Mitigator ····· 147

8.4 The Three Pillars of Alignment ····· 151

8.4.1 Data ····· 151

8.4.2 Training/Tuning Models ····· 154

8.4.3 Evaluation ····· 156

8.4.4 The Three Pillars of Alignment: Recap ····· 165

8.5 Constitutional AI: A Step Toward Self-Alignment ····· 166

8.6 Summary ····· 168

Part III: Advanced LLM Applications

Chapter 9: Moving Beyond Foundation Models ····· 170

9.1 Introduction ····· 170

9.2 Case Study: Visual Question Answering ····· 170

9.2.1 Introducing Our Models: Vision Transformer, GPT-2, and DistilBERT ····· 171

9.2.2 Hidden State Projection and Fusion ····· 175

9.2.3 Cross-Attention: What Is It, and Why Is It Critical ····· 175

9.2.4 Our Custom Multimodal Model ····· 178

9.2.5 Our Data: Visual QA ····· 180

9.2.6 The VQA Training Loop ····· 182

9.2.7 Summary of Results ····· 183

9.3 Case Study: Reinforcement Learning from Feedback ····· 185

9.3.1 Our Model: FLAN-T5 ····· 186

9.3.2 Our Reward Model: Sentiment and Grammatical Correctness ····· 187

9.3.3 Transformer Reinforcement Learning ····· 188

9.3.4 The RLF Training Loop ····· 189

9.3.5 Summary of Results ····· 192

9.4 Summary ····· 194

Chapter 10: Advanced Open-Source LLM Fine-Tuning ····· 195

10.1 Introduction ····· 195

10.2 Example: Anime Genre Multilabel Classification with BERT ····· 195

10.2.1 Using the Jaccard Score to Measure Multilabel Genre Prediction Performance for Anime Titles ····· 196

10.2.2 A Simple Fine-Tuning Loop ····· 197

10.2.3 General Tips for Fine-Tuning Open-Source LLMs ····· 199

10.2.4 Summary of Results ····· 205

10.3 Example: Generating LaTeX with GPT-2 ····· 207

10.3.1 Prompt Engineering for Open-Source Models ····· 208

10.3.2 Summary of Results ····· 210

10.4 Building Our Own Wise Yet Engaging Conversational Assistant: SAWYER ····· 211

10.4.1 Step 1: Supervised Instruction Fine-Tuning ····· 213

10.4.2 Step 2: Reward Model Training ····· 218

10.4.3 Step 3: Reinforcement Learning from (Simulated) Human Feedback ····· 222

10.4.4 Summary of Results ····· 225

10.4.5 Updating Our LLM with Fresh Knowledge ····· 228

10.5 Summary ····· 230

Chapter 11: Moving LLMs into Production ····· 233

11.1 Introduction ····· 233

11.2 Deploying Closed-Source LLMs to Production ····· 233

11.2.1 Cost Projections ····· 233

11.2.2 API Key Management ····· 234

11.3 Deploying Open-Source LLMs to Production ····· 234

11.3.1 Preparing a Model for Inference ····· 234

11.3.2 Interoperability ····· 234

11.3.3 Quantization ····· 235

11.3.4 Knowledge Distillation ····· 240

11.3.5 Cost Projections for LLMs ····· 248

11.3.6 Publishing to Hugging Face ····· 249

11.4 Summary ····· 252

Chapter 12: Evaluating LLMs ····· 253

12.1 Introduction ····· 253

12.2 Evaluating Generative Tasks ····· 254

12.2.1 Generative Multiple Choice ····· 254

12.2.2 Free-Text Responses ····· 257

12.2.3 Benchmarking ····· 259

12.3 Evaluating Understanding Tasks ····· 267

12.3.1 Embeddings ····· 267

12.3.2 Calibrated Classification ····· 270

12.3.3 Probing LLMs for a World Model ····· 273

12.4 Summary ····· 277

12.5 Keep Going ····· 278

Part IV: Appendices

Appendix A: LLM FAQs ····· 280

Appendix B: LLM Glossary ····· 283

Appendix C: LLM Application Archetypes ····· 288

Appendix D: Guide to the Code Repository ····· 291