diff --git a/docs/perf/proxy-whep.md b/docs/perf/proxy-whep.md
new file mode 100644
index 000000000..7b3000b69
--- /dev/null
+++ b/docs/perf/proxy-whep.md
@@ -0,0 +1,149 @@
+# How to Analyze WHEP Performance for the Proxy Server
+
+This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load.
+The workload of interest is **one RTMP publisher + N WHEP players**, where N is
+large enough to stress the proxy's UDP forwarding path (typically 300+).
+
+When analyzing WHEP performance for the proxy, you should:
+
+1. Set up the topology: proxy + SRS origin + publisher + WHEP load
+2. Enable Go pprof on the proxy
+3. Run the load and let it warm up
+4. Collect CPU, allocation, heap, goroutine, and trace profiles
+5. Read the profiles and identify hot spots
+6. Save profiles to compare before and after a change
+
+## Step 1: Build and Start the Proxy with pprof
+
+The proxy reads `GO_PPROF` from the environment and, when set, exposes
+`net/http/pprof` endpoints at that address. Use the same standard ports SRS
+uses by default so the publisher and player commands stay unchanged.
+
+```bash
+cd ~/git/srs
+make && env GO_PPROF=:6060 \
+    PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
+    PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
+    PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
+    ./bin/srs-proxy
+```
+
+> The pprof endpoints live under `http://localhost:6060/debug/pprof/`. The
+> proxy registers them only because `internal/debug/pprof.go` blank-imports
+> `net/http/pprof`. Without that import the endpoints return 404.
+
+## Step 2: Start the SRS Origin on Alt Ports
+
+`origin1-for-proxy.conf` runs SRS on non-standard ports (RTMP 19351, HTTP 8081,
+API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS
+auto-registers with the proxy's system API on startup.
+
+Set `CANDIDATE` to a LAN-reachable IP so the SDP answer the proxy returns
+points clients at an address they can route to. The proxy only rewrites the
+candidate **port**; the IP comes from the origin's SDP.
+
+```bash
+ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
+    CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
+```
+
+## Step 3: Run the WHEP Workload
+
+In separate terminals, start the publisher and the WHEP load generator.
+
+**Publisher (RTMP):**
+
+```bash
+cd ~/git/srs/trunk
+ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
+    -c copy -f flv -y rtmp://localhost/live/livestream
+```
+
+**WHEP players (use the LAN IP that matches `CANDIDATE`):**
+
+```bash
+cd ~/git/srs/trunk/3rdparty/srs-bench
+./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
+```
+
+Let the workload run for at least 30 seconds before sampling. Connection
+setup churn dominates the first few seconds and will skew profiles taken
+too early.
+
+> Sanity-check with `-nn 1` first. If a single WHEP session does not play,
+> the 300-player run is testing something other than steady-state forwarding.
+
+## Step 4: Collect Profiles
+
+Profiles must be collected **while the workload is steady**, not before or
+after. The CPU profile is the single most useful starting point.
+
+```bash
+# CPU profile (30s sample) — interactive web UI on :8123
+# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
+go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# Allocation profile — GC pressure / per-packet allocations
+go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs
+
+# Heap (live memory snapshot)
+go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap
+
+# Goroutine count + stack dump — look for goroutine explosion under load
+curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50
+
+# Runtime trace (10s) — GC pauses, scheduler latency, syscall behavior
+curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
+go tool trace trace.out
+```
+
+The web UI requires Graphviz for the Flame Graph and Graph views:
+
+```bash
+brew install graphviz  # macOS
+```
+
+If you cannot install Graphviz, the **Top** view in the web UI is HTML-only
+and works without it. The CLI form is also unaffected:
+
+```bash
+go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
+(pprof) top20
+(pprof) top20 -cum
+(pprof) list
+```
+
+## Step 5: Read the Profiles
+
+Open the web UI and use the views in this order:
+
+1. **Flame Graph** — visual hot path. Wide bars near the top are where time
+   is spent. For 300-player WHEP the path should be dominated by
+   `webRTCProxyServer.Run` and its UDP read/write children.
+2. **Top** — sorted list by `flat` (self time) and `cum` (cumulative). The
+   top 5–10 functions usually tell the whole story.
+3. **Graph** — call graph with edge weights. Good for tracing "who calls this
+   hot function".
+4. **Source** — line-level cost inside a single function. Use after Top has
+   pointed you at a function worth dissecting.
+
+## Step 6: Save Profiles for Before/After Comparison
+
+When you change code to fix a hot spot, comparing profiles is the only
+reliable way to confirm the fix moved the needle (and didn't just shift cost
+elsewhere).
+
+```bash
+# Save the raw profile from a baseline run
+curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# After the code change, sample again under the same workload
+curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
+
+# Diff the two
+go tool pprof -http=:8123 -base cpu-before.pb.gz cpu-after.pb.gz
+```
+
+In the diff view, red bars are functions that got more expensive, green
+bars are functions that got cheaper. The total should shrink overall if
+the change is a net win.
diff --git a/internal/debug/pprof.go b/internal/debug/pprof.go
index b89923e38..bae05e71f 100644
--- a/internal/debug/pprof.go
+++ b/internal/debug/pprof.go
@@ -6,6 +6,7 @@ package debug
 import (
 	"context"
 	"net/http"
+	_ "net/http/pprof"
 
 	"srsx/internal/env"
 	"srsx/internal/logger"
diff --git a/memory/srs-codebase-map.md b/memory/srs-codebase-map.md
index 4ef507e60..987ddda2e 100644
--- a/memory/srs-codebase-map.md
+++ b/memory/srs-codebase-map.md
@@ -303,6 +303,9 @@ The knowledge base (`memory/srs-*.md`) captures William's knowledge about SRS
 - `proxy-load-balancer.md` — Load balancer design: memory vs Redis implementations, stream-to-server mapping, server health via heartbeats, protocol-specific state
 - `proxy-origin-cluster.md` — Origin cluster tutorial: build proxy + SRS, configure multi-origin with proxy, stream publishing and playback verification
 
+**Next-Generation Server Performance Docs** (`docs/perf/`) — Performance analysis guides for the Go server:
+- `proxy-whep.md` — WHEP perf analysis: enable GO_PPROF, run publisher + N WHEP players via srs_bench, collect CPU/alloc/heap/goroutine/trace profiles, read hot spots, diff before/after with `pprof -base`
+
 **Next-Generation Server API Examples** — Executable API documentation:
 - `internal/rtmp/example_test.go` — RTMP API examples: AMF0, handshake, and protocol workflow