How to Analyze WHEP Performance for the Proxy Server
This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load. The workload of interest is one RTMP publisher + N WHEP players, where N is large enough to stress the proxy's UDP forwarding path (typically 300+).
When analyzing WHEP performance for the proxy, you should:
- Set up the topology: proxy + SRS origin + publisher + WHEP load
- Enable Go pprof on the proxy
- Run the load and let it warm up
- Collect CPU, allocation, heap, goroutine, and trace profiles
- Read the profiles and identify hot spots
- Save profiles to compare before and after a change
Step 1: Build and Start the Proxy with pprof
The proxy reads GO_PPROF from the environment and, when set, exposes
net/http/pprof endpoints at that address. Use the same standard ports SRS
uses by default so the publisher and player commands stay unchanged.
cd ~/git/srs
make && env GO_PPROF=:6060 \
PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
./bin/srs-proxy
The pprof endpoints live under http://localhost:6060/debug/pprof/. The proxy registers them only because internal/debug/pprof.go blank-imports net/http/pprof. Without that import the endpoints return 404.
Step 2: Start the SRS Origin on Alt Ports
origin1-for-proxy.conf runs SRS on non-standard ports (RTMP 19351, HTTP 8081,
API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS
auto-registers with the proxy's system API on startup.
Set CANDIDATE to a LAN-reachable IP so the SDP answer the proxy returns
points clients at an address they can route to. The proxy only rewrites the
candidate port; the IP comes from the origin's SDP.
ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
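To make the port-only rewrite concrete, the sketch below replaces the port field of each a=candidate line in an SDP answer and leaves the IP untouched. It is an illustration under assumptions, not the proxy's actual code: rewriteCandidatePort is a hypothetical name, and the sample SDP is made up from the ports and address used in this guide.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rewriteCandidatePort (hypothetical) sets the port of every ICE candidate
// line to port. A candidate line has the form:
//   a=candidate:<foundation> <component> <transport> <priority> <ip> <port> typ <type> ...
// so the port is the sixth whitespace-separated field (index 5).
func rewriteCandidatePort(sdp string, port int) string {
	lines := strings.Split(sdp, "\r\n")
	for i, line := range lines {
		if !strings.HasPrefix(line, "a=candidate:") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 6 {
			continue // malformed candidate; leave it unchanged
		}
		fields[5] = strconv.Itoa(port)
		lines[i] = strings.Join(fields, " ")
	}
	return strings.Join(lines, "\r\n")
}

func main() {
	// Made-up answer from the origin: LAN IP from CANDIDATE, origin RTC port 8001.
	answer := "v=0\r\n" +
		"a=candidate:0 1 udp 2130706431 192.168.3.187 8001 typ host generation 0\r\n"
	// The client should reach the proxy's RTC port (8000), same IP.
	fmt.Print(rewriteCandidatePort(answer, 8000))
}
```

Because the IP passes through unchanged, a CANDIDATE value that clients cannot route to will still break playback even though the port is correct.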
Step 3: Run the WHEP Workload
In separate terminals, start the publisher and the WHEP load generator.
Publisher (RTMP):
cd ~/git/srs/trunk
ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
-c copy -f flv -y rtmp://localhost/live/livestream
WHEP players (use the LAN IP that matches CANDIDATE):
cd ~/git/srs/trunk/3rdparty/srs-bench
./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
Let the workload run for at least 30 seconds before sampling. Connection setup churn dominates the first few seconds and will skew profiles taken too early.
Sanity-check with -nn 1 first. If a single WHEP session does not play, the 300-player run is testing something other than steady-state forwarding.
Step 4: Collect Profiles
Profiles must be collected while the workload is steady, not before or after. The CPU profile is the single most useful starting point.
# CPU profile (30s sample) — interactive web UI on :8123
# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'
# Allocation profile — GC pressure / per-packet allocations
go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs
# Heap (live memory snapshot)
go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap
# Goroutine count + stack dump — look for goroutine explosion under load
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50
# Runtime trace (10s) — GC pauses, scheduler latency, syscall behavior
curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
go tool trace trace.out
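When the same set of profiles is collected repeatedly, a small program can save them all to disk in one pass. The following is a sketch under assumptions: it uses the pprof address and sample durations from this guide, the output file names are arbitrary, and profileURL and save are hypothetical helpers. Profiles are fetched one after another to keep the example simple.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// profileURL builds a pprof endpoint URL; seconds <= 0 omits the query.
func profileURL(base, name string, seconds int) string {
	u := base + "/debug/pprof/" + name
	if seconds > 0 {
		u += fmt.Sprintf("?seconds=%d", seconds)
	}
	return u
}

// save downloads url and writes the response body to path.
func save(client *http.Client, url, path string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("%s: unexpected status %s", url, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	const base = "http://localhost:6060"
	// The timeout must exceed the longest sample (30s CPU profile).
	client := &http.Client{Timeout: 60 * time.Second}
	jobs := []struct {
		name    string
		seconds int
		file    string
	}{
		{"profile", 30, "cpu.pb.gz"},
		{"allocs", 0, "allocs.pb.gz"},
		{"heap", 0, "heap.pb.gz"},
		{"trace", 10, "trace.out"},
	}
	for _, j := range jobs {
		url := profileURL(base, j.name, j.seconds)
		if err := save(client, url, j.file); err != nil {
			fmt.Fprintln(os.Stderr, "collect failed:", err)
			os.Exit(1)
		}
		fmt.Println("saved", j.file)
	}
}
```

The saved files can then be opened offline, for example with go tool pprof -http=:8123 cpu.pb.gz.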
The web UI requires Graphviz for the Flame Graph and Graph views:
brew install graphviz # macOS
If you cannot install Graphviz, the Top view in the web UI is HTML-only and works without it. The CLI form is also unaffected:
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
(pprof) top20
(pprof) top20 -cum
(pprof) list <FunctionName>
Step 5: Read the Profiles
Open the web UI and use the views in this order:
- Flame Graph: visual hot path. Bar width is proportional to cumulative time, so the widest bars mark where time is spent. For 300-player WHEP the path should be dominated by webRTCProxyServer.Run and its UDP read/write children.
- Top: sorted list by flat (self time) and cum (cumulative). The top 5–10 functions usually tell the whole story.
- Graph: call graph with edge weights. Good for tracing "who calls this hot function".
- Source: line-level cost inside a single function. Use after Top has pointed you at a function worth dissecting.
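The difference between flat and cum is easy to misread, so here is a small sketch that computes both from a handful of sampled stacks. A function's flat count is the number of samples where it is the leaf (actually on the CPU); its cum count is the number of samples where it appears anywhere in the stack. The stacks and function names are invented for illustration and are not taken from a real proxy profile.

```go
package main

import "fmt"

// cost holds sample counts for one function.
type cost struct {
	Flat int // samples where the function is the leaf of the stack
	Cum  int // samples where the function appears anywhere in the stack
}

// tally computes flat and cum counts. Each stack is ordered root first,
// leaf last. A function is counted at most once per sample for cum, so
// recursion does not inflate it.
func tally(stacks [][]string) map[string]cost {
	out := make(map[string]cost)
	for _, stack := range stacks {
		seen := make(map[string]bool)
		for i, fn := range stack {
			c := out[fn]
			if !seen[fn] {
				c.Cum++
				seen[fn] = true
			}
			if i == len(stack)-1 {
				c.Flat++
			}
			out[fn] = c
		}
	}
	return out
}

func main() {
	// Four hypothetical CPU samples.
	stacks := [][]string{
		{"Run", "readUDP"},
		{"Run", "writeUDP"},
		{"Run", "writeUDP"},
		{"Run"},
	}
	costs := tally(stacks)
	for _, fn := range []string{"Run", "readUDP", "writeUDP"} {
		fmt.Printf("%-9s flat=%d cum=%d\n", fn, costs[fn].Flat, costs[fn].Cum)
	}
}
```

In this made-up data Run has a high cum but a low flat, which means it is expensive only through its callees. That is the pattern to expect from a top-level loop, and it is why sorting by cum and by flat answers different questions.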
Step 6: Save Profiles for Before/After Comparison
When you change code to fix a hot spot, comparing profiles is the only reliable way to confirm the fix moved the needle (and didn't just shift cost elsewhere).
# Save the raw profile from a baseline run
curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
# After the code change, sample again under the same workload
curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'
# Diff the two
go tool pprof -http=:8123 -base cpu-before.pb.gz cpu-after.pb.gz
In the diff view, red bars are functions that got more expensive and green bars are functions that got cheaper. If the change is a net win, the total should shrink.
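Conceptually, the -base diff subtracts the baseline cost of each function from its cost in the new profile. The sketch below applies that arithmetic to two hand-written maps of flat time per function. The function names and millisecond values are invented, and diffFlat is a hypothetical helper, not part of pprof.

```go
package main

import (
	"fmt"
	"sort"
)

// diffFlat returns after minus before for every function seen in either
// profile, plus the net change across all functions. Positive deltas mean
// the function got more expensive.
func diffFlat(before, after map[string]int64) (map[string]int64, int64) {
	delta := make(map[string]int64)
	for fn, v := range after {
		delta[fn] = v - before[fn] // missing in before counts as 0
	}
	for fn, v := range before {
		if _, ok := after[fn]; !ok {
			delta[fn] = -v // function disappeared from the profile
		}
	}
	var net int64
	for _, d := range delta {
		net += d
	}
	return delta, net
}

func main() {
	// Hypothetical flat CPU time in milliseconds.
	before := map[string]int64{"readUDP": 900, "writeUDP": 2100, "parseRTP": 400}
	after := map[string]int64{"readUDP": 900, "writeUDP": 1200, "copyBuffer": 300}

	delta, net := diffFlat(before, after)
	names := make([]string, 0, len(delta))
	for fn := range delta {
		names = append(names, fn)
	}
	sort.Strings(names)
	for _, fn := range names {
		fmt.Printf("%-11s %+d ms\n", fn, delta[fn])
	}
	fmt.Printf("net         %+d ms\n", net)
}
```

In this example writeUDP and parseRTP got cheaper while copyBuffer is new cost. The net is still negative, which is the case where a change shifted some cost elsewhere but remained a win overall.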