# How to Analyze WHEP Performance for the Proxy Server

This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load. The workload of interest is **one RTMP publisher + N WHEP players**, where N is large enough to stress the proxy's UDP forwarding path (typically 300+).

When analyzing WHEP performance for the proxy, you should:

1. Set up the topology: proxy + SRS origin + publisher + WHEP load
2. Enable Go pprof on the proxy
3. Run the load and let it warm up
4. Collect CPU, allocation, heap, goroutine, and trace profiles
5. Read the profiles and identify hot spots
6. Save profiles to compare before and after a change

## Step 1: Build and Start the Proxy with pprof

The proxy reads `GO_PPROF` from the environment and, when it is set, exposes the `net/http/pprof` endpoints at that address. Use the same standard ports SRS uses by default so the publisher and player commands stay unchanged.

```bash
cd ~/git/srs
make && env GO_PPROF=:6060 \
  PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
  PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
  PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
  ./bin/srs-proxy
```

> The pprof endpoints live under `http://localhost:6060/debug/pprof/`. The
> proxy registers them only because `internal/debug/pprof.go` blank-imports
> `net/http/pprof`. Without that import the endpoints return 404.

## Step 2: Start the SRS Origin on Alt Ports

`origin1-for-proxy.conf` runs SRS on non-standard ports (RTMP 19351, HTTP 8081, API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS auto-registers with the proxy's system API on startup.

Set `CANDIDATE` to a LAN-reachable IP so the SDP answer the proxy returns points clients at an address they can route to. The proxy only rewrites the candidate **port**; the IP comes from the origin's SDP.
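To make the port-only rewrite concrete, the sketch below applies the same idea to SDP text with `awk`. It is an illustration, not the proxy's code. The function name `rewrite_candidate_port` is hypothetical, and the sketch assumes candidates in the standard `a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ ...` form, where the port is the sixth whitespace-separated field.

```bash
#!/usr/bin/env bash
# Hypothetical helper (NOT part of the proxy): rewrite the port of every
# a=candidate: line in an SDP read from stdin, leaving the IP untouched.
# Candidate syntax (RFC 8839):
#   a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
# so the port is field 6 when splitting on whitespace.
rewrite_candidate_port() {
  local new_port="$1"
  awk -v port="$new_port" '
    /^a=candidate:/ { $6 = port }  # replace the port only; $5 (the IP) is kept
    { print }                      # emit every line, modified or not
  '
}

# Example: the origin advertises 192.168.3.187:8001, and the rewrite to the
# proxy's RTC port 8000 keeps the origin's IP.
printf '%s\n' 'a=candidate:0 1 udp 2130706431 192.168.3.187 8001 typ host generation 0' \
  | rewrite_candidate_port 8000
# prints: a=candidate:0 1 udp 2130706431 192.168.3.187 8000 typ host generation 0
```

Because the address passes through unchanged, a wrong `CANDIDATE` on the origin shows up directly in the answer every player receives.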
```bash
ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
  CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
```

## Step 3: Run the WHEP Workload

In separate terminals, start the publisher and the WHEP load generator.

**Publisher (RTMP):**

```bash
cd ~/git/srs/trunk
ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
  -c copy -f flv -y rtmp://localhost/live/livestream
```

**WHEP players (use the LAN IP that matches `CANDIDATE`):**

```bash
cd ~/git/srs/trunk/3rdparty/srs-bench
./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
```

Let the workload run for at least 30 seconds before sampling. Connection setup churn dominates the first few seconds and will skew profiles taken too early.

> Sanity-check with `-nn 1` first. If a single WHEP session does not play,
> the 300-player run is testing something other than steady-state forwarding.

## Step 4: Collect Profiles

Profiles must be collected **while the workload is steady**, not before or after. The CPU profile is the single most useful starting point.

```bash
# CPU profile (30s sample): interactive web UI on :8123
# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Allocation profile: GC pressure / per-packet allocations
go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs

# Heap (live memory snapshot)
go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap

# Goroutine count + stack dump: look for goroutine explosion under load
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50

# Runtime trace (10s): GC pauses, scheduler latency, syscall behavior
curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
go tool trace trace.out
```

The web UI needs Graphviz (the `dot` binary) to render the Graph view:

```bash
brew install graphviz  # macOS
```

If you cannot install Graphviz, the **Top**, **Flame Graph**, and **Source** views are rendered as HTML in the browser and work without it. The CLI form is also unaffected:

```bash
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
(pprof) top20
(pprof) top20 -cum
(pprof) list webRTCProxyServer
```

`list` takes a regular expression matching function names and prints the per-line cost of every matching function.

## Step 5: Read the Profiles

Open the web UI and use the views in this order:

1. **Flame Graph**: visual hot path. A frame's width is proportional to the time spent in that function plus everything it calls, so follow the widest frames down from the root. For 300-player WHEP the path should be dominated by `webRTCProxyServer.Run` and its UDP read/write children.
2. **Top**: table of functions with `flat` (self time) and `cum` (cumulative: self time plus callees) columns. The top 5–10 functions usually tell the whole story.
3. **Graph**: call graph with edge weights. Good for tracing "who calls this hot function".
4. **Source**: line-level cost inside a single function. Use it after Top has pointed you at a function worth dissecting.

## Step 6: Save Profiles for Before/After Comparison

When you change code to fix a hot spot, comparing profiles is the only reliable way to confirm the fix moved the needle (and didn't just shift cost elsewhere).
```bash
# Save the raw profile from a baseline run
curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# After the code change, sample again under the same workload
curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Diff the two: -diff_base is pprof's option for comparing two profiles of the same type
go tool pprof -http=:8123 -diff_base cpu-before.pb.gz cpu-after.pb.gz
```

In the diff view, red bars are functions that got more expensive and green bars are functions that got cheaper. CPU profiles record absolute sample time, so keep the sample duration (30s) and the player count identical between the two runs; under those conditions the total should shrink overall if the change is a net win.
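The "total should shrink" check can also be done without the web UI. The sketch below is a hypothetical helper, `cpu_total_ms`, that reads the text report printed by `go tool pprof -top` and converts the value on its `Total samples = ...` header line into milliseconds. It assumes that header line is present and that its unit is one of `ns`, `us`, `ms`, `s`, `mins`, or `hrs`; for anything else it fails rather than guess.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `go tool pprof -top` output on stdin and print the
# "Total samples" value in milliseconds. Exits non-zero if the header line is
# missing or uses a unit this sketch does not handle.
cpu_total_ms() {
  awk '
    /Total samples = / {
      found = 1
      v = $0
      sub(/.*Total samples = /, "", v)     # e.g. "12.53s (41.75%)"
      sub(/ .*/, "", v)                    # e.g. "12.53s"
      num = v;  sub(/[a-z]+$/, "", num)    # numeric part, e.g. "12.53"
      unit = v; sub(/^[0-9.]+/, "", unit)  # unit part, e.g. "s"
      if      (unit == "ns")   scale = 1e-6
      else if (unit == "us")   scale = 1e-3
      else if (unit == "ms")   scale = 1
      else if (unit == "s")    scale = 1000
      else if (unit == "mins") scale = 60000
      else if (unit == "hrs")  scale = 3600000
      else exit 1                          # unknown unit: refuse to guess
      printf "%.0f\n", num * scale
      exit 0
    }
    END { if (!found) exit 1 }             # no header line at all
  '
}

# Usage with the two files saved above:
#   before=$(go tool pprof -top cpu-before.pb.gz | cpu_total_ms)
#   after=$(go tool pprof -top cpu-after.pb.gz | cpu_total_ms)
#   echo "before=${before}ms after=${after}ms"
```

The helper only compares totals; use the diff view to see *where* the cost moved.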