# How to Analyze WHEP Performance for the Proxy Server

This guide walks through profiling the Go proxy under a WHEP (WebRTC play) load.
The workload of interest is **one RTMP publisher + N WHEP players**, where N is
large enough to stress the proxy's UDP forwarding path (typically 300+).

When analyzing WHEP performance for the proxy, you should:

1. Set up the topology: proxy + SRS origin + publisher + WHEP load
2. Enable Go pprof on the proxy
3. Run the load and let it warm up
4. Collect CPU, allocation, heap, goroutine, and trace profiles
5. Read the profiles and identify hot spots
6. Save profiles to compare before and after a change

## Step 1: Build and Start the Proxy with pprof

The proxy reads `GO_PPROF` from the environment and, when set, exposes
`net/http/pprof` endpoints at that address. Use the same standard ports SRS
uses by default so the publisher and player commands stay unchanged.

```bash
cd ~/git/srs
make && env GO_PPROF=:6060 \
    PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
    PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
    PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
    ./bin/srs-proxy
```

> The pprof endpoints live under `http://localhost:6060/debug/pprof/`. The
> proxy registers them only because `internal/debug/pprof.go` blank-imports
> `net/http/pprof`. Without that import the endpoints return 404.

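
Before starting the load, it can help to confirm that the pprof listener is actually up. The sketch below assumes `curl` is installed and that the proxy was started with `GO_PPROF=:6060` as above; `pprof_ready` is a hypothetical helper name, not something the proxy ships.

```bash
# pprof_ready [ADDR]: succeed only if the pprof index page answers HTTP 200.
# ADDR defaults to localhost:6060, matching GO_PPROF=:6060.
pprof_ready() {
    local addr="${1:-localhost:6060}"
    local code
    # -w '%{http_code}' prints only the status code; curl reports 000 when
    # it cannot connect at all.
    code=$(curl -s -o /dev/null --max-time 3 -w '%{http_code}' \
        "http://${addr}/debug/pprof/")
    [ "$code" = "200" ]
}

# Example:
#   pprof_ready && echo "pprof is up" || echo "pprof is not reachable"
```

A 404 here usually means the listener is up but the handlers were never registered, while a connection failure suggests `GO_PPROF` was not set.
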
## Step 2: Start the SRS Origin on Alt Ports

`origin1-for-proxy.conf` runs SRS on non-standard ports (RTMP 19351, HTTP 8081,
API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS
auto-registers with the proxy's system API on startup.

Set `CANDIDATE` to a LAN-reachable IP so the SDP answer the proxy returns
points clients at an address they can route to. The proxy only rewrites the
candidate **port**; the IP comes from the origin's SDP.

```bash
ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
    CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
```

## Step 3: Run the WHEP Workload

In separate terminals, start the publisher and the WHEP load generator.

**Publisher (RTMP):**

```bash
cd ~/git/srs/trunk
ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
    -c copy -f flv -y rtmp://localhost/live/livestream
```

**WHEP players (use the LAN IP that matches `CANDIDATE`):**

```bash
cd ~/git/srs/trunk/3rdparty/srs-bench
./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
```

Let the workload run for at least 30 seconds before sampling. Connection
setup churn dominates the first few seconds and will skew profiles taken
too early.

> Sanity-check with `-nn 1` first. If a single WHEP session does not play,
> the 300-player run is testing something other than steady-state forwarding.

## Step 4: Collect Profiles

Profiles must be collected **while the workload is steady**, not before or
after. The CPU profile is the single most useful starting point.

```bash
# Each -http command below keeps running to serve its web UI,
# so run them in separate terminals.

# CPU profile (30s sample): interactive web UI on :8123
# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Allocation profile: GC pressure / per-packet allocations
go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs

# Heap (live memory snapshot)
go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap

# Goroutine count + stack dump: look for goroutine explosion under load
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50

# Runtime trace (10s): GC pauses, scheduler latency, syscall behavior
curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
go tool trace trace.out
```

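
With `debug=1`, the goroutine profile begins with a summary line of the form `goroutine profile: total N`, so the count can be tracked over time without reading the whole dump. A minimal sketch, where `goroutine_total` is a hypothetical helper that reads the dump on stdin:

```bash
# goroutine_total: print N from the leading "goroutine profile: total N" line.
goroutine_total() {
    awk 'NR == 1 && /^goroutine profile: total/ { print $NF }'
}

# Example (against the running proxy):
#   curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | goroutine_total
```

Sampling this a few times during a run with a fixed number of players shows whether the count settles or keeps climbing; steady growth at constant N would point at a goroutine leak rather than load.
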
The web UI requires Graphviz for the Flame Graph and Graph views:

```bash
brew install graphviz # macOS
```

If you cannot install Graphviz, the **Top** view in the web UI is HTML-only
and works without it. The CLI form is also unaffected:

```bash
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
(pprof) top20
(pprof) top20 -cum
(pprof) list <FunctionName>
```

## Step 5: Read the Profiles

Open the web UI and use the views in this order:

1. **Flame Graph**: the visual hot path. Bar width is proportional to time
   spent. pprof draws the root at the top, so follow the widest bars downward
   to reach the functions doing the work. For 300-player WHEP the path should
   be dominated by `webRTCProxyServer.Run` and its UDP read/write children.
2. **Top**: a list sortable by `flat` (self time) and `cum` (cumulative time,
   including callees). The top 5–10 functions usually tell the whole story.
3. **Graph**: the call graph with edge weights. Good for tracing "who calls
   this hot function".
4. **Source**: line-level cost inside a single function. Use it after Top has
   pointed you at a function worth dissecting.

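
The same information is available as plain text, which is convenient over SSH or in scripts. The sketch below assumes a CPU profile was already saved to a file, for example with `curl -s -o cpu.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'`. `hot_report` is a hypothetical wrapper, and the default `webRTCProxyServer` pattern is only a guess based on the expected hot path named above.

```bash
# hot_report FILE [REGEX]: print the top 20 functions by cumulative time,
# restricted to samples whose stacks match REGEX.
hot_report() {
    local profile="$1"
    local focus="${2:-webRTCProxyServer}"
    if [ ! -f "$profile" ]; then
        echo "profile not found: $profile" >&2
        return 1
    fi
    # -top, -cum, -nodecount and -focus are standard pprof options.
    go tool pprof -top -cum -nodecount=20 -focus="$focus" "$profile"
}

# Example:
#   hot_report cpu.pb.gz
#   hot_report cpu.pb.gz 'runtime\.mallocgc'
```
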
## Step 6: Save Profiles for Before/After Comparison

When you change code to fix a hot spot, comparing profiles is the only
reliable way to confirm the fix moved the needle (and didn't just shift cost
elsewhere).

```bash
# Save the raw profile from a baseline run
curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# After the code change, sample again under the same workload
curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Diff the two
go tool pprof -http=:8123 -base cpu-before.pb.gz cpu-after.pb.gz
```

In the diff view, red bars are functions that got more expensive and green
bars are functions that got cheaper. The total should shrink overall if
the change is a net win.

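
Over several iterations, fixed file names such as `cpu-before.pb.gz` are easy to overwrite by accident. One option is to put a label and a timestamp in the name. This is a sketch only: `profile_name` and `capture_cpu` are hypothetical helpers, and the `profiles/` directory is an assumed layout.

```bash
# profile_name LABEL KIND STAMP: build a path such as
# profiles/before-cpu-20240101T120000.pb.gz
profile_name() {
    printf 'profiles/%s-%s-%s.pb.gz\n' "$1" "$2" "$3"
}

# capture_cpu LABEL [SECONDS]: save a CPU profile from the proxy's pprof
# endpoint and print the path it was written to.
capture_cpu() {
    local label="$1"
    local seconds="${2:-30}"
    local out
    out=$(profile_name "$label" cpu "$(date +%Y%m%dT%H%M%S)")
    mkdir -p profiles
    # -f makes curl fail on HTTP errors instead of saving an error page
    # as if it were a profile.
    curl -sf -o "$out" \
        "http://localhost:6060/debug/pprof/profile?seconds=${seconds}" \
        && echo "$out"
}

# Example:
#   before=$(capture_cpu before)   # baseline build
#   after=$(capture_cpu after)     # patched build, same workload
#   go tool pprof -top -base "$before" "$after"
```

The final command prints the diff as text instead of opening the web UI, which makes it easy to paste into a commit message or review.
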