winlin a217ff9a4e Proxy: Enable pprof endpoints and add WHEP performance analysis guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-17 18:18:46 -04:00

5.4 KiB


How to Analyze WHEP Performance for the Proxy Server

This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load. The workload of interest is one RTMP publisher + N WHEP players, where N is large enough to stress the proxy's UDP forwarding path (typically 300+).

When analyzing WHEP performance for the proxy, you should:

Set up the topology: proxy + SRS origin + publisher + WHEP load
Enable Go pprof on the proxy
Run the load and let it warm up
Collect CPU, allocation, heap, goroutine, and trace profiles
Read the profiles and identify hot spots
Save profiles to compare before and after a change

Step 1: Build and Start the Proxy with pprof

The proxy reads GO_PPROF from the environment and, when it is set, exposes the net/http/pprof endpoints at that address. Bind the proxy's listeners to the ports SRS uses by default (RTMP 1935, HTTP 8080, HTTP API 1985, WebRTC 8000, SRT 10080) so the publisher and player commands need no port changes.

cd ~/git/srs
make && env GO_PPROF=:6060 \
    PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
    PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
    PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
    ./bin/srs-proxy

The pprof endpoints live under http://localhost:6060/debug/pprof/. The proxy registers them only because internal/debug/pprof.go blank-imports net/http/pprof. Without that import the endpoints return 404.

Step 2: Start the SRS Origin on Alt Ports

origin1-for-proxy.conf runs SRS on non-standard ports (RTMP 19351, HTTP 8081, API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS auto-registers with the proxy's system API on startup.

Set CANDIDATE to a LAN-reachable IP so the SDP answer the proxy returns points clients at an address they can route to. The proxy only rewrites the candidate port; the IP comes from the origin's SDP.

ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
    CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"

Step 3: Run the WHEP Workload

In separate terminals, start the publisher and the WHEP load generator.

Publisher (RTMP):

cd ~/git/srs/trunk
ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
    -c copy -f flv -y rtmp://localhost/live/livestream

WHEP players (use the LAN IP that matches CANDIDATE):

cd ~/git/srs/trunk/3rdparty/srs-bench
./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300

Let the workload run for at least 30 seconds before sampling. Connection setup churn dominates the first few seconds and will skew profiles taken too early.

Sanity-check with -nn 1 first. If a single WHEP session does not play, the 300-player run is testing something other than steady-state forwarding.
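A rough estimate of the forwarding load helps when reading the profiles later. The Go sketch below assumes every player receives about the 200 kbps of the source file, that every player's media is relayed through the proxy, and that media packets average about 1200 bytes; egress is a hypothetical helper, and RTP/SRTP/UDP header overhead is ignored.

```go
package main

import "fmt"

// egress (hypothetical) estimates the proxy's outbound load when every
// player's media is relayed through it. pktBytes is an assumed average
// packet size; header overhead is ignored.
func egress(players int, bitrateBps, pktBytes float64) (bps, pps float64) {
	bps = float64(players) * bitrateBps
	pps = bps / 8 / pktBytes
	return bps, pps
}

func main() {
	bps, pps := egress(300, 200_000, 1200)
	fmt.Printf("%.0f Mbit/s, %.0f packets/s\n", bps/1e6, pps)
}
```

Under those assumptions, 300 players come to about 60 Mbit/s and 6,250 packets per second out of the proxy, with a matching stream of reads from the origin, so any per-packet cost such as a syscall or an allocation is paid thousands of times per second.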

Step 4: Collect Profiles

Profiles must be collected while the workload is steady, not before or after. The CPU profile is the single most useful starting point.

# CPU profile (30s sample) — interactive web UI on :8123
# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Allocation profile — GC pressure / per-packet allocations
go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs

# Heap (live memory snapshot)
go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap

# Goroutine count + stack dump — look for goroutine explosion under load
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50

# Runtime trace (10s) — GC pauses, scheduler latency, syscall behavior
curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
go tool trace trace.out

The web UI needs Graphviz (its dot command) only for the Graph view:

brew install graphviz   # macOS

If you cannot install Graphviz, the Top, Flame Graph, and Source views still work because they render as plain HTML. The CLI form is also unaffected:

go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
(pprof) top20
(pprof) top20 -cum
(pprof) list <FunctionName>

Step 5: Read the Profiles

Open the web UI and use the views in this order:

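Flame Graph — visual hot path. Bar width is proportional to cumulative time, and pprof draws callers above their callees, so the cost sits in wide bars with little or nothing beneath them. For 300-player WHEP the path should be dominated by webRTCProxyServer.Run and its UDP read/write children.
Top — functions sorted by flat (time spent in the function's own code) or cum (flat plus time spent in everything it calls). The top 5–10 functions usually tell the whole story.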
Graph — call graph with edge weights. Good for tracing "who calls this hot function".
Source — line-level cost inside a single function. Use after Top has pointed you at a function worth dissecting.
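The flat/cum distinction is easiest to see on a handful of stack samples. This Go sketch is a simplified model, not pprof's implementation: each sample is a stack listed leaf first, flat counts the samples where a function is the leaf, and cum counts the samples where it appears anywhere, once per sample so recursion is not double-counted. The function names are placeholders, not the proxy's.

```go
package main

import "fmt"

// flatCum computes per-function flat and cum sample counts. Each stack is
// ordered leaf first, so stack[0] was executing when the sample was taken.
func flatCum(stacks [][]string) (flat, cum map[string]int) {
	flat, cum = map[string]int{}, map[string]int{}
	for _, stack := range stacks {
		if len(stack) == 0 {
			continue
		}
		flat[stack[0]]++
		seen := map[string]bool{}
		for _, fn := range stack {
			if !seen[fn] { // count recursive frames once per sample
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// Placeholder samples: run calls readPacket and writePacket, and both
	// can end up in syscall.
	stacks := [][]string{
		{"syscall", "readPacket", "run"},
		{"syscall", "writePacket", "run"},
		{"syscall", "writePacket", "run"},
		{"writePacket", "run"},
	}
	flat, cum := flatCum(stacks)
	for _, fn := range []string{"run", "readPacket", "writePacket", "syscall"} {
		fmt.Printf("%-12s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

Here run has flat 0 and cum 4: it is on every stack but never executing itself, the pattern of a caller to drill through. syscall has flat 3, which is where the samples actually landed.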

Step 6: Save Profiles for Before/After Comparison

When you change code to fix a hot spot, comparing profiles is the only reliable way to confirm the fix moved the needle (and didn't just shift cost elsewhere).

# Save the raw profile from a baseline run
curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# After the code change, sample again under the same workload
curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Diff the two
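go tool pprof -http=:8123 -diff_base cpu-before.pb.gz cpu-after.pb.gz

In the diff view, red marks functions that got more expensive and green marks functions that got cheaper. The command uses -diff_base rather than -base: both subtract the baseline, but -diff_base reports percentages relative to the baseline total, which is what a before/after comparison needs, while -base is meant for subtracting an earlier snapshot of a cumulative profile from the same process. Because both samples cover 30 seconds of the same workload, the total should shrink if the change is a net win.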
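Per function, the diff is the new sample total minus the baseline one. The Go sketch below models that on made-up numbers; the function names and values are hypothetical, and pprof diffs full call stacks rather than a flat per-function map.

```go
package main

import (
	"fmt"
	"sort"
)

// diff returns after-minus-before per function and the net change in total.
// Positive deltas correspond to red in pprof's diff view, negative to green.
func diff(before, after map[string]int) (delta map[string]int, net int) {
	delta = map[string]int{}
	for fn, v := range after {
		delta[fn] += v
		net += v
	}
	for fn, v := range before {
		delta[fn] -= v
		net -= v
	}
	return delta, net
}

func main() {
	// Hypothetical CPU time (ms) from two 30s samples under the same load.
	before := map[string]int{"readPacket": 900, "writePacket": 400, "allocBuffer": 300}
	after := map[string]int{"readPacket": 900, "writePacket": 250, "allocBuffer": 350}
	delta, net := diff(before, after)
	names := make([]string, 0, len(delta))
	for fn := range delta {
		names = append(names, fn)
	}
	sort.Strings(names)
	for _, fn := range names {
		fmt.Printf("%-12s %+d ms\n", fn, delta[fn])
	}
	fmt.Printf("net          %+d ms\n", net)
}
```

In this example writePacket got 150 ms cheaper while allocBuffer got 50 ms more expensive, the kind of shifted cost the diff is meant to expose; the net change of -100 ms says the change still paid off.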
