golang

发布日期: 2024-01-01

更新日期: 2025-12-09

文章字数: 5.7k

阅读时长: 25 分

pprof 简介

pprof 是 Go 语言提供的性能分析工具，可以用于分析程序的 CPU 使用、内存分配、goroutine 阻塞等性能问题。它是 Go 标准库的一部分，提供了强大的性能分析能力。

主要功能

CPU Profiling: 分析 CPU 使用情况，找出性能瓶颈
Memory Profiling: 分析内存分配和泄漏
Goroutine Profiling: 分析 goroutine 的使用情况
Block Profiling: 分析阻塞操作
Mutex Profiling: 分析互斥锁竞争

集成 pprof

集成方式对比

graph TB
    A[集成 pprof] --> B{选择集成方式}
    
    B -->|标准 HTTP| C1[导入 net/http/pprof
自动注册路由]
    B -->|自定义框架| C2[手动注册路由
Gin/Echo等]
    B -->|程序化| C3[使用 runtime/pprof
直接生成文件]
    
    C1 --> D[HTTP 接口访问]
    C2 --> D
    C3 --> E[本地文件分析]
    
    D --> F[go tool pprof
分析]
    E --> F
    
    style C1 fill:#ffcccc
    style C2 fill:#ccffcc
    style C3 fill:#ccccff

HTTP 服务器集成

最简单的方式是在 HTTP 服务器中导入 net/http/pprof 包：

package main

import (
    _ "net/http/pprof"  // 导入即注册路由
    "net/http"
)

func main() {
    // 启动 HTTP 服务器
    go func() {
        http.ListenAndServe("localhost:6060", nil)
    }()
    
    // 你的业务代码
    // ...
}

导入 net/http/pprof 后，会自动注册以下路由：

/debug/pprof/: pprof 主页，显示所有可用的 profile
/debug/pprof/profile: CPU profile，默认采样 30 秒
/debug/pprof/heap: 内存 profile
/debug/pprof/goroutine: goroutine profile
/debug/pprof/block: 阻塞 profile
/debug/pprof/mutex: 互斥锁 profile
/debug/pprof/trace: 执行追踪

自定义 HTTP 服务器

如果使用自定义的 HTTP 服务器（如 Gin、Echo 等），需要手动注册路由：

package main

import (
    "net/http"
    "net/http/pprof"
    "github.com/gin-gonic/gin"
)

func main() {
    router := gin.Default()
    
    // 注册 pprof 路由
    pprofGroup := router.Group("/debug/pprof")
    {
        pprofGroup.GET("/", gin.WrapH(http.HandlerFunc(pprof.Index)))
        pprofGroup.GET("/cmdline", gin.WrapH(http.HandlerFunc(pprof.Cmdline)))
        pprofGroup.GET("/profile", gin.WrapH(http.HandlerFunc(pprof.Profile)))
        pprofGroup.POST("/symbol", gin.WrapH(http.HandlerFunc(pprof.Symbol)))
        pprofGroup.GET("/symbol", gin.WrapH(http.HandlerFunc(pprof.Symbol)))
        pprofGroup.GET("/trace", gin.WrapH(http.HandlerFunc(pprof.Trace)))
        pprofGroup.GET("/allocs", gin.WrapH(http.HandlerFunc(pprof.Handler("allocs").ServeHTTP)))
        pprofGroup.GET("/block", gin.WrapH(http.HandlerFunc(pprof.Handler("block").ServeHTTP)))
        pprofGroup.GET("/goroutine", gin.WrapH(http.HandlerFunc(pprof.Handler("goroutine").ServeHTTP)))
        pprofGroup.GET("/heap", gin.WrapH(http.HandlerFunc(pprof.Handler("heap").ServeHTTP)))
        pprofGroup.GET("/mutex", gin.WrapH(http.HandlerFunc(pprof.Handler("mutex").ServeHTTP)))
    }
    
    router.Run(":8080")
}

程序化使用

如果不使用 HTTP 服务器，也可以程序化地生成 profile：

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // CPU profiling
    cpuFile, _ := os.Create("cpu.prof")
    defer cpuFile.Close()
    pprof.StartCPUProfile(cpuFile)
    defer pprof.StopCPUProfile()
    
    // 你的业务代码
    // ...
    
    // Memory profiling
    memFile, _ := os.Create("mem.prof")
    defer memFile.Close()
    pprof.WriteHeapProfile(memFile)
}

CPU Profiling

CPU profiling 用于分析程序的 CPU 使用情况，找出性能瓶颈。

获取 CPU Profile

方式 1: 通过 HTTP 接口

# 默认采样 30 秒
go tool pprof http://localhost:6060/debug/pprof/profile

# 指定采样时间（60 秒）
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=60

# 保存到文件
curl http://localhost:6060/debug/pprof/profile?seconds=30 > cpu.prof

方式 2: 程序化生成

package main

import (
    "os"
    "runtime/pprof"
    "time"
)

func main() {
    f, _ := os.Create("cpu.prof")
    defer f.Close()
    
    pprof.StartCPUProfile(f)
    defer pprof.StopCPUProfile()
    
    // 执行需要分析的代码
    for i := 0; i < 1000; i++ {
        cpuIntensiveTask()
    }
}

func cpuIntensiveTask() {
    sum := 0
    for i := 0; i < 1000000; i++ {
        sum += i
    }
}

分析 CPU Profile

交互式命令行

1	go tool pprof cpu.prof

进入交互式界面后，可以使用以下命令：

(pprof) top          # 显示占用 CPU 最多的函数
(pprof) top10        # 显示前 10 个
(pprof) list main    # 显示 main 函数的详细代码
(pprof) web          # 生成 SVG 图（需要安装 graphviz）
(pprof) png          # 生成 PNG 图
(pprof) svg          # 生成 SVG 图
(pprof) peek main    # 显示包含 main 的调用链
(pprof) disasm main  # 显示 main 的汇编代码
(pprof) help         # 显示帮助

常用命令详解

top 命令输出示例：

Showing nodes accounting for 50.20s, 100% of 50.20s total
      flat  flat%   sum%        cum   cum%
    20.10s 40.04%  40.04%     20.10s 40.04%  runtime.mallocgc
    15.05s 29.98%  70.02%     15.05s 29.98%  math.Sqrt
    10.03s 19.98%  89.99%     10.03s 19.98%  main.cpuIntensiveTask
     5.02s  9.99%  99.99%      5.02s  9.99%  runtime.memmove

flat: 函数自身消耗的 CPU 时间
flat%: 占 CPU 总时间的百分比
cum: 函数及其调用的函数消耗的 CPU 时间
cum%: 累计百分比

list 命令输出示例：

Total: 50.20s
ROUTINE ======================== main.cpuIntensiveTask in main.go
    10.03s   10.03s (flat, cum) 19.98% of Total
         .          .     10:func cpuIntensiveTask() {
         .          .     11:    sum := 0
    10.03s   10.03s     12:    for i := 0; i < 1000000; i++ {
         .          .     13:        sum += i
         .          .     14:    }
         .          .     15:}

生成可视化报告

# 生成 PNG 图片
go tool pprof -png cpu.prof > cpu.png

# 生成 SVG 图片
go tool pprof -svg cpu.prof > cpu.svg

# 生成 PDF 报告
go tool pprof -pdf cpu.prof > cpu.pdf

# 生成文本报告
go tool pprof -text cpu.prof > cpu.txt

Memory Profiling

Memory profiling 用于分析内存分配和潜在的内存泄漏。

获取 Memory Profile

通过 HTTP 接口

# 当前内存使用情况
go tool pprof http://localhost:6060/debug/pprof/heap

# 保存到文件
curl http://localhost:6060/debug/pprof/heap > heap.prof

程序化生成

package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // 执行代码
    memoryIntensiveTask()
    
    // 生成 heap profile
    f, _ := os.Create("heap.prof")
    defer f.Close()
    pprof.WriteHeapProfile(f)
}

func memoryIntensiveTask() {
    var data [][]byte
    for i := 0; i < 1000; i++ {
        data = append(data, make([]byte, 1024*1024)) // 1MB
    }
}

分析 Memory Profile

交互式分析

1	go tool pprof heap.prof

在交互式界面中：

(pprof) top          # 显示分配内存最多的函数
(pprof) top10 -cum    # 显示累计分配最多的函数
(pprof) list main     # 显示代码详情
(pprof) alloc_space   # 查看分配的内存总量
(pprof) alloc_objects # 查看分配的对象数量
(pprof) inuse_space   # 查看正在使用的内存
(pprof) inuse_objects # 查看正在使用的对象数量

内存泄漏检测

通过对比不同时间点的 heap profile 来检测内存泄漏：

sequenceDiagram
    participant 程序
    participant pprof接口
    participant 分析工具
    
    程序->>pprof接口: 第一次采样 heap
    pprof接口-->>分析工具: heap1.prof
    Note over 程序: 等待一段时间
观察内存变化
    程序->>pprof接口: 第二次采样 heap
    pprof接口-->>分析工具: heap2.prof
    分析工具->>分析工具: 对比分析
-base heap1.prof heap2.prof
    分析工具-->>分析工具: 识别内存增长点

# 第一次采样
curl http://localhost:6060/debug/pprof/heap > heap1.prof

# 等待一段时间后，第二次采样
curl http://localhost:6060/debug/pprof/heap > heap2.prof

# 对比分析
go tool pprof -base heap1.prof heap2.prof

在交互式界面中：

1 2	(pprof) top # 显示增长最多的函数 (pprof) list main # 查看具体代码

内存分析示例

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    // 定期打印内存统计
    go func() {
        for {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            fmt.Printf("Alloc = %v KB, TotalAlloc = %v KB, Sys = %v KB, NumGC = %v\n",
                m.Alloc/1024, m.TotalAlloc/1024, m.Sys/1024, m.NumGC)
            time.Sleep(5 * time.Second)
        }
    }()
    
    // 模拟内存泄漏
    var data [][]byte
    for i := 0; i < 100; i++ {
        data = append(data, make([]byte, 1024*1024)) // 1MB
        time.Sleep(100 * time.Millisecond)
    }
    
    select {} // 保持程序运行
}

Goroutine Profiling

Goroutine profiling 用于分析 goroutine 的使用情况，找出 goroutine 泄漏。

获取 Goroutine Profile

# 通过 HTTP 接口
go tool pprof http://localhost:6060/debug/pprof/goroutine

# 保存到文件
curl http://localhost:6060/debug/pprof/goroutine > goroutine.prof

分析 Goroutine Profile

1	go tool pprof goroutine.prof

在交互式界面中：

1
2
3

(pprof) top          # 显示 goroutine 数量最多的函数
(pprof) list main    # 查看代码详情
(pprof) traces       # 显示所有 goroutine 的调用栈

traces 命令示例

goroutine profile: total 100
1 @ 0x123456
  main.leakyFunction
    runtime.newproc
      runtime.main

2 @ 0x234567
  main.anotherFunction
    runtime.newproc
      runtime.main

Goroutine 泄漏检测示例

package main

import (
    "net/http"
    _ "net/http/pprof"
    "time"
)

func main() {
    go http.ListenAndServe("localhost:6060", nil)
    
    // 模拟 goroutine 泄漏
    for i := 0; i < 100; i++ {
        go func() {
            select {} // 永远阻塞，导致泄漏
        }()
    }
    
    select {}
}

Block Profiling

Block profiling 用于分析阻塞操作，找出程序中的阻塞点。

启用 Block Profiling

需要在代码中启用：

package main

import (
    "runtime"
    _ "net/http/pprof"
    "net/http"
)

func main() {
    // 启用 block profiling
    runtime.SetBlockProfileRate(1) // 采样率：1 表示每个阻塞事件都记录
    
    go http.ListenAndServe("localhost:6060", nil)
    
    // 你的业务代码
    // ...
}

获取 Block Profile

1	go tool pprof http://localhost:6060/debug/pprof/block

分析 Block Profile

1	go tool pprof block.prof

1 2	(pprof) top # 显示阻塞时间最长的函数 (pprof) list main # 查看代码详情

Mutex Profiling

Mutex profiling 用于分析互斥锁的竞争情况。

启用 Mutex Profiling

package main

import (
    "runtime"
    _ "net/http/pprof"
    "net/http"
)

func main() {
    // 启用 mutex profiling
    runtime.SetMutexProfileFraction(1) // 采样率：1 表示每个竞争事件都记录
    
    go http.ListenAndServe("localhost:6060", nil)
    
    // 你的业务代码
    // ...
}

获取 Mutex Profile

1	go tool pprof http://localhost:6060/debug/pprof/mutex

分析 Mutex Profile

1	go tool pprof mutex.prof

1 2	(pprof) top # 显示竞争最激烈的锁 (pprof) list main # 查看代码详情

Trace 分析

Trace 用于分析程序的执行追踪，可以查看 goroutine 的调度、GC 等事件。

获取 Trace

1 2	# 通过 HTTP 接口（默认追踪 1 秒） curl http://localhost:6060/debug/pprof/trace?seconds=5 > trace.out

分析 Trace

1	go tool trace trace.out

这会打开一个 Web 界面（通常是 http://127.0.0.1:随机端口），可以查看：

View trace: 时间线视图，显示所有事件
Goroutine analysis: goroutine 分析
Network blocking profile: 网络阻塞分析
Synchronization blocking profile: 同步阻塞分析
Syscall blocking profile: 系统调用阻塞分析
Scheduler latency profile: 调度延迟分析

Trace 分析示例

package main

import (
    _ "net/http/pprof"
    "net/http"
    "time"
)

func main() {
    go http.ListenAndServe("localhost:6060", nil)
    
    // 创建大量 goroutine
    for i := 0; i < 1000; i++ {
        go func(id int) {
            time.Sleep(time.Second)
            // 模拟工作
            sum := 0
            for j := 0; j < 1000000; j++ {
                sum += j
            }
        }(i)
    }
    
    time.Sleep(10 * time.Second)
}

实际应用示例

完整的性能分析示例

package main

import (
    "fmt"
    "log"
    "math"
    "net/http"
    _ "net/http/pprof"
    "runtime"
    "sync"
    "time"
)

func main() {
    // 启用各种 profiling
    runtime.SetBlockProfileRate(1)
    runtime.SetMutexProfileFraction(1)
    
    // 启动 pprof HTTP 服务器
    go func() {
        log.Println("pprof server started on :6060")
        log.Println(http.ListenAndServe("localhost:6060", nil))
    }()
    
    // 模拟 CPU 密集型任务
    go cpuIntensiveTask()
    
    // 模拟内存密集型任务
    go memoryIntensiveTask()
    
    // 模拟 goroutine 泄漏
    go goroutineLeak()
    
    // 模拟锁竞争
    go mutexContention()
    
    select {}
}

// CPU 密集型任务
func cpuIntensiveTask() {
    for {
        sum := 0.0
        for i := 0; i < 1000000; i++ {
            sum += math.Sqrt(float64(i))
        }
        time.Sleep(100 * time.Millisecond)
    }
}

// 内存密集型任务
func memoryIntensiveTask() {
    var data [][]byte
    for {
        data = append(data, make([]byte, 1024*1024)) // 1MB
        if len(data) > 100 {
            data = data[1:] // 模拟内存泄漏（只删除第一个）
        }
        time.Sleep(100 * time.Millisecond)
    }
}

// Goroutine 泄漏
func goroutineLeak() {
    for i := 0; i < 100; i++ {
        go func() {
            select {} // 永远阻塞
        }()
    }
}

// 锁竞争
var mu sync.Mutex
var counter int

func mutexContention() {
    for i := 0; i < 10; i++ {
        go func() {
            for {
                mu.Lock()
                counter++
                time.Sleep(10 * time.Millisecond) // 持有锁时间较长
                mu.Unlock()
            }
        }()
    }
}

性能分析流程

flowchart TD
    A[启动程序] --> B[访问 pprof 接口]
    B --> C{选择分析类型}
    
    C -->|CPU| D1[获取 CPU profile]
    C -->|内存| D2[获取 Heap profile]
    C -->|Goroutine| D3[获取 Goroutine profile]
    C -->|阻塞| D4[获取 Block profile]
    C -->|锁竞争| D5[获取 Mutex profile]
    
    D1 --> E[使用 go tool pprof 分析]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    
    E --> F{使用分析方式}
    F -->|命令行| G1[交互式命令
top/list/web]
    F -->|Web UI| G2[可视化界面
Graph/Flame Graph]
    
    G1 --> H[识别性能瓶颈]
    G2 --> H
    
    H --> I[优化代码]
    I --> J[重新分析验证]
    J --> K{问题解决?}
    K -->|否| H
    K -->|是| L[完成]
    
    style A fill:#ffcccc
    style H fill:#ccffcc
    style I fill:#ccccff
    style L fill:#ffffcc

启动程序并访问 pprof 接口

# 查看所有可用的 profile
curl http://localhost:6060/debug/pprof/

# 获取 CPU profile（采样 30 秒）
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# 获取内存 profile
go tool pprof http://localhost:6060/debug/pprof/heap

# 获取 goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine

分析并找出问题

# CPU 分析
go tool pprof cpu.prof
(pprof) top
(pprof) list 函数名

# 内存分析
go tool pprof heap.prof
(pprof) top
(pprof) list 函数名

# Goroutine 分析
go tool pprof goroutine.prof
(pprof) top
(pprof) traces

优化代码

根据分析结果优化代码，然后重新分析验证。

命令行工具使用

go tool pprof 常用参数

# 基本用法
go tool pprof [options] <profile>

# 常用选项
go tool pprof -http=:8080 profile.prof    # 启动 Web UI
go tool pprof -text profile.prof          # 文本输出
go tool pprof -top profile.prof            # 只显示 top
go tool pprof -png profile.prof > out.png # 生成 PNG
go tool pprof -svg profile.prof > out.svg # 生成 SVG
go tool pprof -pdf profile.prof > out.pdf # 生成 PDF

Web UI 使用

# 启动 Web UI
go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile

# 或者分析本地文件
go tool pprof -http=:8080 cpu.prof

Web UI 提供以下视图：

Top: 显示占用资源最多的函数
Graph: 调用关系图
Flame Graph: 火焰图
Peek: 显示调用者
Source: 源代码视图
Disassemble: 汇编代码视图

Graph 流线图详解

Graph 视图是 pprof Web UI 中最常用的可视化工具之一，它以有向图的形式展示函数之间的调用关系和资源占用情况。

如何查看 Graph 视图

启动 Web UI：

1	go tool pprof -http=:8080 cpu.prof

在浏览器中打开 http://localhost:8080，点击 Graph 标签

Graph 图元素说明

节点（Node）

每个节点代表一个函数，节点的信息包括：

graph TB
    subgraph "节点结构"
        A["函数名
flat% (cum%)
flat (cum)"]
    end
    
    style A fill:#ffcccc

函数名: 函数的完整名称
flat%: 函数自身占用的资源百分比（不包括调用的子函数）
cum%: 函数及其调用的所有子函数累计占用的资源百分比
flat: 函数自身占用的资源绝对值
cum: 函数及其调用的所有子函数累计占用的资源绝对值

示例：

graph TB
    A["main.cpuIntensiveTask
20.5% (30.2%)
2.05s (3.02s)"]
    
    style A fill:#ffcccc

表示：

main.cpuIntensiveTask 函数自身占用 20.5% 的 CPU 时间（2.05 秒）
包括其调用的子函数，总共占用 30.2% 的 CPU 时间（3.02 秒）

边（Edge）

节点之间的箭头表示函数调用关系：

graph LR
    A[调用者] -->|资源占比| B[被调用者]
    
    style A fill:#ccffcc
    style B fill:#ccccff

边的标签显示：

调用次数: 调用者调用被调用者的次数
资源占比: 该调用路径占用的资源百分比

示例：

graph LR
    A[main.main] -->|20.5%| B[main.cpuIntensiveTask]
    
    style A fill:#ccffcc
    style B fill:#ccccff

表示 main.main 调用 main.cpuIntensiveTask，该调用路径占用 20.5% 的资源。

Graph 图阅读技巧

1. 寻找热点函数

节点大小: 节点越大，占用的资源越多
节点颜色: 通常红色表示占用资源多，绿色表示占用资源少
关注 flat%: flat% 高的函数是真正的性能瓶颈

2. 理解调用链

graph TD
    A[main.main] --> B[runtime.main]
    B --> C[main.cpuIntensiveTask]
    C --> D[math.Sqrt]
    
    style A fill:#ffcccc
    style B fill:#ccffcc
    style C fill:#ccccff
    style D fill:#ffffcc

这个调用链表示：

main.main 调用 runtime.main
runtime.main 调用 main.cpuIntensiveTask
main.cpuIntensiveTask 调用 math.Sqrt

3. 识别性能瓶颈

示例场景：

graph TD
    A["A
10% flat, 50% cum"] --> B["B
5% flat, 20% cum"]
    A --> D["D
25% flat, 25% cum"]
    B --> C["C
15% flat, 15% cum"]
    
    style A fill:#ffcccc
    style B fill:#ccffcc
    style C fill:#ccccff
    style D fill:#ff9999

分析：

D 是最大的瓶颈（25% flat），需要优先优化
C 也是瓶颈（15% flat），但只在 B 的调用链中
A 虽然 cum% 高（50%），但 flat% 低（10%），说明问题在子函数

Graph 图交互操作

点击节点: 高亮显示该节点的调用关系
悬停节点: 显示节点的详细信息
搜索功能: 在搜索框输入函数名，快速定位
缩放: 使用鼠标滚轮或工具栏按钮缩放视图
拖拽: 可以拖拽节点调整布局

Graph 图实际应用

场景 1: CPU 性能分析

1 2	# 获取 CPU profile go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

在 Graph 视图中：

找出 flat% 最高的节点（真正的 CPU 热点）
查看调用链，理解函数调用关系
识别可以优化的函数

场景 2: 内存分析

1 2	# 获取内存 profile go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

在 Graph 视图中：

找出分配内存最多的函数
查看内存分配的调用链
识别内存泄漏的源头

Flame Graph 火焰图详解

Flame Graph（火焰图）是另一种强大的性能可视化工具，它以层次化的方式展示调用栈和资源占用。

如何查看 Flame Graph

启动 Web UI：

1	go tool pprof -http=:8080 cpu.prof

在浏览器中打开 http://localhost:8080，点击 Flame Graph 标签

火焰图结构说明

基本结构

火焰图是一个水平堆叠的条形图，类似于火焰的形状：

graph TD
    A[main.main
顶层: 调用栈底部] --> B[function A]
    A --> C[function B]
    B --> D[func A1]
    B --> E[func A2]
    C --> F[function C]
    
    style A fill:#ffcccc
    style B fill:#ccffcc
    style C fill:#ccccff
    style D fill:#ffffcc
    style E fill:#ffccff
    style F fill:#ccffff

说明：

顶层：调用栈底部（如 main.main）
底层：调用栈顶部（实际执行的函数）
宽度：表示函数占用的资源比例
颜色：用于区分不同的函数

关键元素

X 轴: 按字母顺序排列的采样点，不表示时间顺序
Y 轴: 调用栈的深度，从上到下表示调用链
宽度: 每个矩形的宽度表示该函数占用的资源比例
颜色: 通常随机着色，用于区分不同的函数

火焰图阅读技巧

1. 理解调用栈

顶部: 调用栈的底部（如 main.main）
底部: 调用栈的顶部（实际执行的函数）
从上到下: 表示函数调用链

2. 识别性能热点

宽矩形: 占用资源多的函数
窄矩形: 占用资源少的函数
最宽的部分: 通常是性能瓶颈所在

3. 查找调用链

在火焰图中，从顶部到底部的一条路径就是一个完整的调用栈：

graph TD
    A[main.main] --> B[runtime.main]
    B --> C[main.cpuIntensiveTask]
    C --> D[math.Sqrt]
    
    style A fill:#ffcccc
    style B fill:#ccffcc
    style C fill:#ccccff
    style D fill:#ffffcc

4. 对比分析

相同宽度的矩形: 表示这些函数占用相同的资源
宽矩形 + 深调用栈: 表示该函数及其调用链占用大量资源

火焰图交互操作

点击矩形:
- 放大显示该函数及其调用链
- 在底部显示详细信息
悬停矩形:
- 显示函数名、资源占用等信息
- 高亮显示相关的调用链
搜索功能:
- 在搜索框输入函数名
- 高亮显示所有匹配的函数
重置视图:
- 点击 “Reset Zoom” 恢复原始视图

火焰图 vs Graph 图

特性	Flame Graph	Graph
展示方式	水平堆叠条形图	有向图
调用关系	垂直层次	箭头连接
适用场景	快速定位热点	理解调用关系
可读性	直观，易于发现宽矩形	清晰，易于理解调用链
交互性	点击放大，搜索高亮	点击高亮，可拖拽

火焰图实际应用

场景 1: 快速定位 CPU 热点

1 2	# 获取 CPU profile go tool pprof -http=:8080 http://localhost:6060/debug/pprof/profile?seconds=30

在 Flame Graph 中：

查看最宽的矩形（占用 CPU 最多的函数）
查看该函数的调用链（从顶部到底部）
识别可以优化的函数

示例分析：

graph TD
    A["main.main
(很宽)"] --> B["main.processData
(很宽)"]
    B --> C["main.expensiveOperation
(很宽)"]
    
    style A fill:#ffcccc
    style B fill:#ff9999
    style C fill:#ff6666

说明 processData 和 expensiveOperation 是性能瓶颈

场景 2: 内存分配分析

1 2	# 获取内存 profile go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap

在 Flame Graph 中：

找出分配内存最多的函数（最宽的矩形）
查看内存分配的调用链
识别可以优化的内存分配点

场景 3: Goroutine 分析

1 2	# 获取 goroutine profile go tool pprof -http=:8080 http://localhost:6060/debug/pprof/goroutine

在 Flame Graph 中：

查看 goroutine 的分布情况
找出创建 goroutine 最多的函数
识别可能的 goroutine 泄漏

火焰图优化建议

1. 关注最宽的部分

最宽的矩形通常代表最大的性能瓶颈，应该优先优化。

2. 查看完整的调用链

从顶部到底部的完整路径可以帮助理解函数的调用上下文。

3. 对比不同时间点的火焰图

通过对比优化前后的火焰图，可以验证优化效果：

# 优化前
go tool pprof -http=:8080 cpu_before.prof

# 优化后
go tool pprof -http=:8080 cpu_after.prof

4. 结合其他视图

Flame Graph: 快速定位热点
Graph: 理解调用关系
Top: 查看具体数值
Source: 查看源代码

火焰图示例解读

假设看到如下火焰图：

graph TD
    A["main.main
100%"] --> B["handler
60%"]
    A --> C["other
40%"]
    B --> D["process
50%"]
    D --> E["compute
40%"]
    
    style A fill:#ffcccc
    style B fill:#ff9999
    style C fill:#ccffcc
    style D fill:#ff6666
    style E fill:#ff3333

解读：

main.main 占用 100% 的资源（根节点）
handler 占用 60% 的资源，是主要瓶颈
process 占用 50% 的资源（在 handler 的调用链中）
compute 占用 40% 的资源（在 process 的调用链中）
优化建议: 优先优化 compute 函数，因为它占用了 40% 的资源

对比分析

# 对比两个 profile
go tool pprof -base profile1.prof profile2.prof

# 在交互式界面中
(pprof) base profile1.prof
(pprof) top  # 显示差异

最佳实践

1. 生产环境使用

在生产环境中使用 pprof 需要注意：

package main

import (
    "net/http"
    "net/http/pprof"
    "os"
)

func main() {
    // 只在开发环境或特定条件下启用
    if os.Getenv("ENABLE_PPROF") == "true" {
        go func() {
            // 使用独立的端口，避免暴露给外部
            http.ListenAndServe("localhost:6060", nil)
        }()
    }
    
    // 或者使用认证
    mux := http.NewServeMux()
    mux.HandleFunc("/debug/pprof/", func(w http.ResponseWriter, r *http.Request) {
        // 添加认证逻辑
        if !isAuthorized(r) {
            http.Error(w, "Unauthorized", http.StatusUnauthorized)
            return
        }
        pprof.Index(w, r)
    })
    
    http.ListenAndServe(":8080", mux)
}

2. 采样时间设置

CPU Profile: 建议 30-60 秒，太短可能不够准确，太长可能影响性能
Memory Profile: 可以随时获取，建议在内存使用高峰时获取
Trace: 建议 1-5 秒，太长时间会产生大量数据

3. 性能开销

CPU Profiling: 约 1-2% 的性能开销
Memory Profiling: 几乎无开销
Block/Mutex Profiling: 根据采样率，通常很小
Trace: 开销较大，不建议在生产环境长时间开启

4. 定期分析

建议定期进行性能分析：

# 定期获取 profile 的脚本
#!/bin/bash
while true; do
    timestamp=$(date +%Y%m%d_%H%M%S)
    curl http://localhost:6060/debug/pprof/profile?seconds=30 > cpu_${timestamp}.prof
    curl http://localhost:6060/debug/pprof/heap > heap_${timestamp}.prof
    sleep 300  # 每 5 分钟采样一次
done

5. 集成到 CI/CD

可以在 CI/CD 流程中集成性能测试：

# 运行基准测试并生成 profile
go test -bench=. -cpuprofile=cpu.prof -memprofile=mem.prof

# 分析结果
go tool pprof -text cpu.prof
go tool pprof -text mem.prof

常见问题排查

1. CPU 使用率高

# 获取 CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# 分析
(pprof) top
(pprof) list 占用高的函数

2. 内存泄漏

# 获取两个时间点的 heap profile
curl http://localhost:6060/debug/pprof/heap > heap1.prof
# 等待一段时间
sleep 60
curl http://localhost:6060/debug/pprof/heap > heap2.prof

# 对比分析
go tool pprof -base heap1.prof heap2.prof
(pprof) top

3. Goroutine 泄漏

# 获取 goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine

# 分析
(pprof) top
(pprof) traces

4. 锁竞争

1 2	// 启用 mutex profiling runtime.SetMutexProfileFraction(1)

# 获取 mutex profile
go tool pprof http://localhost:6060/debug/pprof/mutex

# 分析
(pprof) top

总结

pprof 是 Go 语言强大的性能分析工具，可以帮助我们：

找出性能瓶颈: 通过 CPU profiling 找出占用 CPU 最多的函数
检测内存泄漏: 通过 Memory profiling 对比不同时间点的内存使用
分析并发问题: 通过 Goroutine profiling 找出 goroutine 泄漏
优化锁竞争: 通过 Mutex profiling 找出锁竞争热点
追踪程序执行: 通过 Trace 分析程序的详细执行过程

掌握 pprof 的使用，可以大大提高 Go 程序的性能优化效率。

djaigo

https://blog.djaigo.com/golang/pprof.html

本博客所有文章除特別声明外，均采用 CC BY 4.0 许可协议。转载请注明来源 djaigo !

golang pprof 性能分析 profiling

Go 协程常见代码问题与解决方案

2024-01-01 golang

并发 goroutine concurrency race condition 协程

Go 协程相关算法题

2024-01-01 golang

channel 并发 goroutine 协程算法题

golang-pprof性能分析

pprof 简介

主要功能

集成 pprof

集成方式对比

HTTP 服务器集成

自定义 HTTP 服务器

程序化使用

CPU Profiling

获取 CPU Profile

方式 1: 通过 HTTP 接口

方式 2: 程序化生成

分析 CPU Profile

交互式命令行

常用命令详解

生成可视化报告

Memory Profiling

获取 Memory Profile

通过 HTTP 接口

程序化生成

分析 Memory Profile

交互式分析

内存泄漏检测

内存分析示例

Goroutine Profiling

获取 Goroutine Profile

分析 Goroutine Profile

traces 命令示例

Goroutine 泄漏检测示例

Block Profiling

启用 Block Profiling

获取 Block Profile

分析 Block Profile

Mutex Profiling

启用 Mutex Profiling

获取 Mutex Profile

分析 Mutex Profile

Trace 分析

获取 Trace

分析 Trace

Trace 分析示例

实际应用示例

完整的性能分析示例

性能分析流程

命令行工具使用

go tool pprof 常用参数

Web UI 使用

Graph 流线图详解

如何查看 Graph 视图

Graph 图元素说明

节点（Node）

边（Edge）

Graph 图阅读技巧

1. 寻找热点函数

2. 理解调用链

3. 识别性能瓶颈

Graph 图交互操作

Graph 图实际应用

场景 1: CPU 性能分析

场景 2: 内存分析

Flame Graph 火焰图详解

如何查看 Flame Graph

火焰图结构说明

基本结构

关键元素

火焰图阅读技巧

1. 理解调用栈

2. 识别性能热点

3. 查找调用链

4. 对比分析

火焰图交互操作

火焰图 vs Graph 图

火焰图实际应用

场景 1: 快速定位 CPU 热点

场景 2: 内存分配分析

场景 3: Goroutine 分析

火焰图优化建议

1. 关注最宽的部分

2. 查看完整的调用链

3. 对比不同时间点的火焰图