Proxy: Enable pprof endpoints and add WHEP performance analysis guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 18:18:46 -04:00 · 2026-05-17 18:18:46 -04:00 · a217ff9a4e
commit a217ff9a4e
parent 9b08a3809a
3 changed files with 153 additions and 0 deletions
--- a/docs/perf/proxy-whep.md
+++ b/docs/perf/proxy-whep.md
@@ -0,0 +1,149 @@
+# How to Analyze WHEP Performance for the Proxy Server
+
+This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load.
+The workload of interest is **one RTMP publisher + N WHEP players**, where N is
+large enough to stress the proxy's UDP forwarding path (typically 300+).
+
+When analyzing WHEP performance for the proxy, you should:
+
+1. Build and start the proxy with Go pprof enabled
+2. Start the SRS origin on alternate ports
+3. Run the publisher and the WHEP load, then let it warm up
+4. Collect CPU, allocation, heap, goroutine, and trace profiles
+5. Read the profiles and identify hot spots
+6. Save profiles to compare before and after a change
+
+## Step 1: Build and Start the Proxy with pprof
+
+The proxy reads `GO_PPROF` from the environment and, when it is set, exposes
+the `net/http/pprof` endpoints at that address. Run the proxy on the ports SRS
+uses by default so that the publisher and player commands need no changes.
+
+```bash
+cd ~/git/srs
+make && env GO_PPROF=:6060 \
+    PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
+    PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
+    PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
+    ./bin/srs-proxy
+```
+
+> The pprof endpoints live under `http://localhost:6060/debug/pprof/`. The
+> proxy registers them only because `internal/debug/pprof.go` blank-imports
+> `net/http/pprof`. Without that import the endpoints return 404.
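+
+The effect of the blank import can be checked in isolation. The sketch below
+is not the proxy's code; it relies only on standard-library behavior, and the
+helper names (`statusFor`, `defaultMuxStatus`, `freshMuxStatus`) are
+hypothetical. It also assumes the proxy serves `http.DefaultServeMux` on the
+`GO_PPROF` address, which the 404 behavior described above suggests but which
+is not shown here.
+
+```go
+package main
+
+import (
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	_ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
+)
+
+// statusFor returns the HTTP status a handler gives for the pprof index page.
+func statusFor(h http.Handler) int {
+	req := httptest.NewRequest(http.MethodGet, "/debug/pprof/", nil)
+	rec := httptest.NewRecorder()
+	h.ServeHTTP(rec, req)
+	return rec.Code
+}
+
+// defaultMuxStatus probes the mux that the blank import registers on.
+func defaultMuxStatus() int { return statusFor(http.DefaultServeMux) }
+
+// freshMuxStatus probes a new mux that never saw the pprof handlers.
+func freshMuxStatus() int { return statusFor(http.NewServeMux()) }
+
+func main() {
+	fmt.Println("default mux:", defaultMuxStatus())
+	fmt.Println("fresh mux:  ", freshMuxStatus())
+}
+```
+
+The default mux answers 200 and the fresh mux answers 404, so a server built
+on its own `http.ServeMux` would need the handlers registered explicitly.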
+
+## Step 2: Start the SRS Origin on Alt Ports
+
+`origin1-for-proxy.conf` runs SRS on non-standard ports (RTMP 19351, HTTP 8081,
+API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS
+auto-registers with the proxy's system API on startup.
+
+Set `CANDIDATE` to a LAN-reachable IP so the SDP answer the proxy returns
+points clients at an address they can route to. The proxy only rewrites the
+candidate **port**; the IP comes from the origin's SDP.
+
+```bash
+ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
+    CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
+```
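+
+To make the port-only rewrite concrete, here is a minimal sketch of changing
+the port field of an ICE candidate line while leaving the IP untouched. It is
+an illustration under assumptions, not the proxy's implementation: the
+function name `rewriteCandidatePort` is hypothetical and the sample line only
+follows the general `a=candidate` layout.
+
+```go
+package main
+
+import (
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// rewriteCandidatePort replaces the port of one SDP candidate line.
+// Expected layout (space separated):
+//   a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
+// It returns the line unchanged and false when the layout does not match.
+func rewriteCandidatePort(line string, port int) (string, bool) {
+	fields := strings.Fields(line)
+	if len(fields) < 8 || !strings.HasPrefix(fields[0], "a=candidate:") || fields[6] != "typ" {
+		return line, false
+	}
+	fields[5] = strconv.Itoa(port) // address in fields[4] is kept as the origin sent it
+	return strings.Join(fields, " "), true
+}
+
+func main() {
+	// Illustrative candidate as an origin on RTC port 8001 might emit it.
+	in := "a=candidate:0 1 udp 2130706431 192.168.3.187 8001 typ host generation 0"
+	out, ok := rewriteCandidatePort(in, 8000)
+	fmt.Println(ok, out)
+}
+```
+
+Because the address field passes through unchanged, a wrong `CANDIDATE` on
+the origin reaches the client as-is, which is why it must be LAN-reachable.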
+
+## Step 3: Run the WHEP Workload
+
+In separate terminals, start the publisher and the WHEP load generator.
+
+**Publisher (RTMP):**
+
+```bash
+cd ~/git/srs/trunk
+ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
+    -c copy -f flv -y rtmp://localhost/live/livestream
+```
+
+**WHEP players (use the LAN IP that matches `CANDIDATE`):**
+
+```bash
+cd ~/git/srs/trunk/3rdparty/srs-bench
+./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
+```
+
+Let the workload run for at least 30 seconds before sampling. Connection
+setup churn dominates the first few seconds and will skew profiles taken
+too early.
+
+> Sanity-check with `-nn 1` first. If a single WHEP session does not play,
+> the 300-player run is testing something other than steady-state forwarding.
+
+## Step 4: Collect Profiles
+
+Profiles must be collected **while the workload is steady**, not before or
+after. The CPU profile is the single most useful starting point.
+
+```bash
+# CPU profile (30s sample) — interactive web UI on :8123
+# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
+go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# Allocation profile — GC pressure / per-packet allocations
+go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs
+
+# Heap (live memory snapshot)
+go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap
+
+# Goroutine count + stack dump — look for goroutine explosion under load
+curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50
+
+# Runtime trace (10s) — GC pauses, scheduler latency, syscall behavior
+curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
+go tool trace trace.out
+```
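+
+To track goroutine growth over time rather than eyeballing the dump, the
+total can be read from the first line of the `?debug=1` output, which has the
+form `goroutine profile: total N`. The sketch below parses that header; the
+function name `goroutineTotal` and the sample text are hypothetical, and
+fetching the dump over HTTP is left out to keep it self-contained.
+
+```go
+package main
+
+import (
+	"bufio"
+	"fmt"
+	"strconv"
+	"strings"
+)
+
+// goroutineTotal extracts N from the "goroutine profile: total N" header
+// that starts a /debug/pprof/goroutine?debug=1 dump.
+func goroutineTotal(dump string) (int, error) {
+	const prefix = "goroutine profile: total "
+	sc := bufio.NewScanner(strings.NewReader(dump))
+	if !sc.Scan() {
+		return 0, fmt.Errorf("empty goroutine dump")
+	}
+	line := strings.TrimSpace(sc.Text())
+	if !strings.HasPrefix(line, prefix) {
+		return 0, fmt.Errorf("unexpected header %q", line)
+	}
+	return strconv.Atoi(strings.TrimPrefix(line, prefix))
+}
+
+func main() {
+	// Made-up sample; real dumps list one stack group per block after the header.
+	sample := "goroutine profile: total 612\n300 @ 0x1 0x2\n"
+	n, err := goroutineTotal(sample)
+	fmt.Println(n, err)
+}
+```
+
+Sampling this value every few seconds during the run shows whether the count
+levels off once all players are connected or keeps climbing.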
+
+The web UI's **Graph** view, which is the default landing page, needs Graphviz's `dot` binary:
+
+```bash
+brew install graphviz   # macOS
+```
+
+If you cannot install Graphviz, the **Top** view in the web UI is HTML-only
+and works without it. The CLI form is also unaffected:
+
+```bash
+go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
+(pprof) top20
+(pprof) top20 -cum
+(pprof) list <FunctionName>
+```
+
+## Step 5: Read the Profiles
+
+Open the web UI and use the views in this order:
+
+1. **Flame Graph**: visual hot path. The root is at the top and callees stack
+   downward; bar width is cumulative time, so follow the widest bars down to
+   the frames where they end. For 300-player WHEP the path should be dominated
+   by `webRTCProxyServer.Run` and its UDP read/write children.
+2. **Top**: a table sortable by `flat` (time spent in the function itself) and
+   `cum` (time including its callees). The top 5–10 functions usually tell the
+   whole story.
+3. **Graph**: call graph with edge weights. Good for tracing "who calls this
+   hot function".
+4. **Source**: line-level cost inside a single function. Use it after Top has
+   pointed you at a function worth dissecting.
+
+## Step 6: Save Profiles for Before/After Comparison
+
+When you change code to fix a hot spot, comparing profiles is the only
+reliable way to confirm the fix moved the needle (and didn't just shift cost
+elsewhere).
+
+```bash
+# Save the raw profile from a baseline run
+curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# After the code change, sample again under the same workload
+curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# Diff the two
+go tool pprof -http=:8123 -base cpu-before.pb.gz cpu-after.pb.gz
+```
+
+In the diff view, red marks functions that got more expensive and green marks
+functions that got cheaper. If the change is a net win, the total shrinks
+rather than the cost merely moving to another function.
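+
+The "net win versus shifted cost" check is plain arithmetic over per-function
+costs. The sketch below works on hand-written numbers instead of parsing real
+profiles; the function names `funcA`, `funcB`, `funcC`, the millisecond
+values, and the helper `diffProfiles` are all made up for illustration.
+
+```go
+package main
+
+import (
+	"fmt"
+	"sort"
+)
+
+// delta is the change in cost for one function, in milliseconds.
+type delta struct {
+	name string
+	ms   int
+}
+
+// diffProfiles returns per-function changes (after minus before), largest
+// increase first, plus the net change across all functions.
+func diffProfiles(before, after map[string]int) (deltas []delta, net int) {
+	names := map[string]bool{}
+	for name := range before {
+		names[name] = true
+	}
+	for name := range after {
+		names[name] = true
+	}
+	for name := range names {
+		d := after[name] - before[name] // a missing key counts as 0
+		net += d
+		if d != 0 {
+			deltas = append(deltas, delta{name, d})
+		}
+	}
+	sort.Slice(deltas, func(i, j int) bool {
+		if deltas[i].ms != deltas[j].ms {
+			return deltas[i].ms > deltas[j].ms
+		}
+		return deltas[i].name < deltas[j].name
+	})
+	return deltas, net
+}
+
+func main() {
+	before := map[string]int{"funcA": 900, "funcB": 400}
+	after := map[string]int{"funcA": 300, "funcB": 400, "funcC": 250}
+	deltas, net := diffProfiles(before, after)
+	for _, d := range deltas {
+		fmt.Printf("%+d ms  %s\n", d.ms, d.name)
+	}
+	fmt.Printf("net %+d ms\n", net)
+}
+```
+
+Here `funcA` drops by 600 ms while a new `funcC` adds 250 ms, so part of the
+cost moved, yet the net change of -350 ms is still an improvement.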
--- a/internal/debug/pprof.go
+++ b/internal/debug/pprof.go
@@ -6,6 +6,7 @@ package debug
 import (
 	"context"
 	"net/http"
+	_ "net/http/pprof"

 	"srsx/internal/env"
 	"srsx/internal/logger"
--- a/memory/srs-codebase-map.md
+++ b/memory/srs-codebase-map.md
@@ -303,6 +303,9 @@ The knowledge base (`memory/srs-*.md`) captures William's knowledge about SRS
 - `proxy-load-balancer.md` — Load balancer design: memory vs Redis implementations, stream-to-server mapping, server health via heartbeats, protocol-specific state
 - `proxy-origin-cluster.md` — Origin cluster tutorial: build proxy + SRS, configure multi-origin with proxy, stream publishing and playback verification

+**Next-Generation Server Performance Docs** (`docs/perf/`) — Performance analysis guides for the Go server:
+- `proxy-whep.md` — WHEP perf analysis: enable GO_PPROF, run publisher + N WHEP players via srs_bench, collect CPU/alloc/heap/goroutine/trace profiles, read hot spots, diff before/after with `pprof -base`
+
 **Next-Generation Server API Examples** — Executable API documentation:
 - `internal/rtmp/example_test.go` — RTMP API examples: AMF0, handshake, and protocol workflow