Proxy: Enable pprof endpoints and add WHEP performance analysis guide.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent 9b08a3809a
commit a217ff9a4e

docs/perf/proxy-whep.md (Normal file, 149 lines)
@@ -0,0 +1,149 @@
# How to Analyze WHEP Performance for the Proxy Server

This guide walks through profiling the Go proxy under a WHEP (WebRTC playback) load.
The workload of interest is **one RTMP publisher + N WHEP players**, where N is
large enough to stress the proxy's UDP forwarding path (typically 300+).

When analyzing WHEP performance for the proxy, you should:

1. Set up the topology: proxy + SRS origin + publisher + WHEP load
2. Enable Go pprof on the proxy
3. Run the load and let it warm up
4. Collect CPU, allocation, heap, goroutine, and trace profiles
5. Read the profiles and identify hot spots
6. Save profiles to compare before and after a change
## Step 1: Build and Start the Proxy with pprof

The proxy reads `GO_PPROF` from the environment and, when it is set, exposes
`net/http/pprof` endpoints at that address. Start the proxy on the same
standard ports SRS uses by default so the publisher and player commands stay
unchanged.

```bash
cd ~/git/srs
make && env GO_PPROF=:6060 \
  PROXY_RTMP_SERVER=1935 PROXY_HTTP_SERVER=8080 \
  PROXY_HTTP_API=1985 PROXY_WEBRTC_SERVER=8000 PROXY_SRT_SERVER=10080 \
  PROXY_SYSTEM_API=12025 PROXY_LOAD_BALANCER_TYPE=memory \
  ./bin/srs-proxy
```

> The pprof endpoints live under `http://localhost:6060/debug/pprof/`. The
> proxy registers them only because `internal/debug/pprof.go` blank-imports
> `net/http/pprof`. Without that import the endpoints return 404.
## Step 2: Start the SRS Origin on Alt Ports

`origin1-for-proxy.conf` runs SRS on non-standard ports (RTMP 19351, HTTP 8081,
API 19851, RTC 8001/udp, SRT 10081) so the proxy can sit on the defaults. SRS
auto-registers with the proxy's system API on startup.

Set `CANDIDATE` to a LAN-reachable IP so the SDP answer the proxy returns
points clients at an address they can route to. The proxy only rewrites the
candidate **port**; the IP comes from the origin's SDP.

```bash
ulimit -n 10000 && bash -c "cd ~/git/srs/trunk && \
  CANDIDATE=192.168.3.187 ./objs/srs -c conf/origin1-for-proxy.conf"
```
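To illustrate the port-only rewrite, here is a minimal sketch (not the proxy's actual code) that replaces the port field of every `a=candidate:` line in an SDP while leaving the IP untouched. It assumes the standard ICE candidate attribute layout (`foundation component transport priority address port typ ...`); the function name `rewriteCandidatePort` is hypothetical, and the example ports (the origin's RTC port 8001 rewritten to the proxy's 8000) are taken from the setup above for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// rewriteCandidatePort is a hypothetical helper. For each "a=candidate:" line it
// swaps the port field for port and keeps every other field, including the
// connection address, unchanged. SDP lines are CRLF-separated.
func rewriteCandidatePort(sdp string, port int) string {
	lines := strings.Split(sdp, "\r\n")
	for i, line := range lines {
		if !strings.HasPrefix(line, "a=candidate:") {
			continue
		}
		// fields: a=candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
		fields := strings.Fields(line)
		if len(fields) < 6 {
			continue // malformed candidate; leave it alone
		}
		fields[5] = strconv.Itoa(port)
		lines[i] = strings.Join(fields, " ")
	}
	return strings.Join(lines, "\r\n")
}

func main() {
	answer := "v=0\r\n" +
		"a=candidate:0 1 udp 2130706431 192.168.3.187 8001 typ host\r\n"
	fmt.Print(rewriteCandidatePort(answer, 8000))
}
```

Because the address field is passed through unchanged, a wrong `CANDIDATE` on the origin cannot be fixed by the proxy; it has to be correct at the source.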
## Step 3: Run the WHEP Workload

In separate terminals, start the publisher and the WHEP load generator.

**Publisher (RTMP):**

```bash
cd ~/git/srs/trunk
ffmpeg -stream_loop -1 -re -i doc/source.200kbps.768x320.flv \
  -c copy -f flv -y rtmp://localhost/live/livestream
```

**WHEP players (use the LAN IP that matches `CANDIDATE`):**

```bash
cd ~/git/srs/trunk/3rdparty/srs-bench
./objs/srs_bench -sr webrtc://192.168.3.187/live/livestream -nn 300
```
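A back-of-the-envelope estimate shows why a few hundred players stress the forwarding path: the proxy has to send a copy of every media packet to every player. The sketch below assumes the source's nominal 200 kbps bitrate (from the file name) and an assumed average packet payload of 1200 bytes; it ignores RTP/SRTP/UDP/IP header overhead and RTCP, so treat the result as a lower bound rather than a measurement.

```go
package main

import "fmt"

// fanout estimates the proxy's egress for one stream played by n players.
// bitrate is the stream bitrate in bits/s; pktBytes is an assumed average
// payload size per packet. Header overhead is ignored.
func fanout(n int, bitrate, pktBytes float64) (egressMbps, pktsPerSec float64) {
	total := float64(n) * bitrate // bits/s leaving the proxy for this stream
	return total / 1e6, total / (pktBytes * 8)
}

func main() {
	mbps, pps := fanout(300, 200_000, 1200)
	fmt.Printf("egress ~ %.0f Mbit/s, ~ %.0f packets/s\n", mbps, pps)
	// prints: egress ~ 60 Mbit/s, ~ 6250 packets/s
}
```

Packets per second, not bytes, is usually what a per-packet forwarding loop pays for, which is why the allocation profile in Step 4 matters as much as the CPU profile.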
Let the workload run for at least 30 seconds before sampling. Connection
setup churn dominates the first few seconds and will skew profiles taken
too early.

> Sanity-check with `-nn 1` first. If a single WHEP session does not play,
> the 300-player run is testing something other than steady-state forwarding.
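Instead of relying only on a fixed wait, you can poll a cheap signal until it stops moving. A minimal sketch, assuming you sample some counter every few seconds (the goroutine total from `/debug/pprof/goroutine?debug=1` is one option): treat the workload as steady once the last `window` samples stay within a relative `tolerance` of each other. The helper name `isSteady`, the thresholds, and the sample values are illustrative assumptions, not something the proxy or `srs_bench` provides.

```go
package main

import "fmt"

// isSteady is a hypothetical helper: it reports whether the last window samples
// all lie within tolerance (a fraction, e.g. 0.02 for 2%) of their maximum.
func isSteady(samples []int, window int, tolerance float64) bool {
	if window <= 0 || len(samples) < window {
		return false
	}
	tail := samples[len(samples)-window:]
	lo, hi := tail[0], tail[0]
	for _, v := range tail[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	if hi == 0 {
		return true // all zero: trivially flat
	}
	return float64(hi-lo)/float64(hi) <= tolerance
}

func main() {
	// Illustrative values only, not measurements from the proxy.
	counts := []int{40, 900, 1800, 2410, 2412, 2411, 2413}
	fmt.Println(isSteady(counts[:4], 3, 0.02)) // false: still ramping
	fmt.Println(isSteady(counts, 3, 0.02))     // true: last three samples within 2%
}
```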
## Step 4: Collect Profiles

Profiles must be collected **while the workload is steady**, not before it
ramps up or after it stops. The CPU profile is the single most useful starting
point.

```bash
# CPU profile (30s sample), opened in the interactive web UI on :8123.
# Use :8123 (or any free port) because :8080 is the proxy's HTTP-FLV/HLS port.
go tool pprof -http=:8123 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Allocation profile: GC pressure and per-packet allocations
go tool pprof -http=:8124 http://localhost:6060/debug/pprof/allocs

# Heap: live memory snapshot
go tool pprof -http=:8125 http://localhost:6060/debug/pprof/heap

# Goroutine count + stack dump: look for goroutine explosion under load.
# With debug=1 the first line reads "goroutine profile: total N".
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=1' | head -50

# Runtime trace (10s): GC pauses, scheduler latency, syscall behavior
curl -s -o trace.out 'http://localhost:6060/debug/pprof/trace?seconds=10'
go tool trace trace.out
```
The web UI's **Graph** view needs Graphviz (the `dot` binary) to render:

```bash
brew install graphviz          # macOS
sudo apt-get install graphviz  # Debian/Ubuntu
```

If you cannot install Graphviz, the **Top**, **Flame Graph**, and **Source**
views are rendered as HTML in the browser and work without it. The CLI form is
also unaffected:

```bash
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
# Then, at the interactive prompt:
(pprof) top20
(pprof) top20 -cum
(pprof) list <FunctionName>
```
## Step 5: Read the Profiles

Open the web UI and use the views in this order:

1. **Flame Graph**: the visual hot path. A bar's width is proportional to the
   time spent in that function and everything it calls; pprof draws callers
   above callees, so wide bars deep in the stack mark the hot leaves. For
   300-player WHEP the path should be dominated by
   `webRTCProxyServer.Run` and its UDP read/write children.
2. **Top**: a table sortable by `flat` (self time) and `cum` (cumulative,
   including callees). The top 5–10 functions usually tell the whole story.
3. **Graph**: the call graph with edge weights. Good for tracing "who calls
   this hot function".
4. **Source**: line-level cost inside a single function. Use it after Top has
   pointed you at a function worth dissecting.
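The difference between `flat` and `cum` is easiest to see on a toy profile. Each CPU sample is a call stack; a function's flat count is the number of samples in which it is the leaf (currently executing), while its cum count is the number of samples in which it appears anywhere on the stack, counted once per sample even under recursion. The sketch below computes both from made-up stacks; the function names in the data are illustrative, not taken from a real proxy profile.

```go
package main

import (
	"fmt"
	"sort"
)

// flatCum returns per-function flat and cum sample counts. Each stack is
// ordered root first, leaf last. A function is counted at most once per
// sample for cum, so recursion does not inflate it.
func flatCum(stacks [][]string) (flat, cum map[string]int) {
	flat, cum = map[string]int{}, map[string]int{}
	for _, st := range stacks {
		if len(st) == 0 {
			continue
		}
		flat[st[len(st)-1]]++
		seen := map[string]bool{}
		for _, fn := range st {
			if !seen[fn] {
				seen[fn] = true
				cum[fn]++
			}
		}
	}
	return flat, cum
}

func main() {
	// Illustrative stacks only (root first).
	stacks := [][]string{
		{"run", "forward", "writeUDP"},
		{"run", "forward", "writeUDP"},
		{"run", "forward", "copyPacket"},
		{"run", "readUDP"},
	}
	flat, cum := flatCum(stacks)
	names := make([]string, 0, len(cum))
	for fn := range cum {
		names = append(names, fn)
	}
	sort.Strings(names)
	for _, fn := range names {
		fmt.Printf("%-10s flat=%d cum=%d\n", fn, flat[fn], cum[fn])
	}
}
```

Here `run` has zero flat samples but appears in every sample, which is why `top20` and `top20 -cum` can rank functions very differently.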
## Step 6: Save Profiles for Before/After Comparison

When you change code to fix a hot spot, comparing profiles is the only
reliable way to confirm the fix moved the needle (and didn't just shift cost
elsewhere).

```bash
# Save the raw profile from a baseline run
curl -s -o cpu-before.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# After the code change, sample again under the same workload
curl -s -o cpu-after.pb.gz 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Diff the two: samples in cpu-before.pb.gz are subtracted from cpu-after.pb.gz
go tool pprof -http=:8123 -base cpu-before.pb.gz cpu-after.pb.gz
```

In the diff view, red marks functions that got more expensive and green marks
functions that got cheaper. If the change is a net win, the total sample count
should shrink.
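Conceptually, `-base` subtracts the baseline's samples from the new profile, so a negative delta means a function got cheaper. A minimal sketch of that arithmetic over per-function flat sample counts, useful for sanity-checking numbers copied from two `top` listings by hand; the helper name `diffFlat` and the map inputs are assumptions for illustration, and real pprof diffs operate on full stacks rather than per-function totals.

```go
package main

import (
	"fmt"
	"sort"
)

// delta holds how a function's flat samples changed between two profiles.
type delta struct {
	Fn            string
	Before, After int
}

// Change is After minus Before; negative means the function got cheaper.
func (d delta) Change() int { return d.After - d.Before }

// diffFlat is a hypothetical helper: it returns every function seen in either
// profile, sorted by absolute change (largest first), ties broken by name.
func diffFlat(before, after map[string]int) []delta {
	fns := map[string]bool{}
	for fn := range before {
		fns[fn] = true
	}
	for fn := range after {
		fns[fn] = true
	}
	out := make([]delta, 0, len(fns))
	for fn := range fns {
		out = append(out, delta{Fn: fn, Before: before[fn], After: after[fn]})
	}
	sort.Slice(out, func(i, j int) bool {
		ai, aj := abs(out[i].Change()), abs(out[j].Change())
		if ai != aj {
			return ai > aj
		}
		return out[i].Fn < out[j].Fn
	})
	return out
}

func abs(x int) int {
	if x < 0 {
		return -x
	}
	return x
}

func main() {
	// Illustrative flat sample counts only, not measurements.
	before := map[string]int{"writeUDP": 500, "copyPacket": 300, "readUDP": 200}
	after := map[string]int{"writeUDP": 480, "readUDP": 210, "poolGet": 20}
	total := 0
	for _, d := range diffFlat(before, after) {
		fmt.Printf("%-10s %+d\n", d.Fn, d.Change())
		total += d.Change()
	}
	fmt.Printf("net change: %+d samples\n", total) // net change: -290 samples
}
```

A net-negative total matches the overall shrink expected from a net win; a large positive entry for a function you did not touch is the "shifted cost" case.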
@@ -6,6 +6,7 @@ package debug
import (
	"context"
	"net/http"
	_ "net/http/pprof"

	"srsx/internal/env"
	"srsx/internal/logger"
@@ -303,6 +303,9 @@ The knowledge base (`memory/srs-*.md`) captures William's knowledge about SRS
- `proxy-load-balancer.md` — Load balancer design: memory vs Redis implementations, stream-to-server mapping, server health via heartbeats, protocol-specific state
- `proxy-origin-cluster.md` — Origin cluster tutorial: build proxy + SRS, configure multi-origin with proxy, stream publishing and playback verification

**Next-Generation Server Performance Docs** (`docs/perf/`) — Performance analysis guides for the Go server:
- `proxy-whep.md` — WHEP perf analysis: enable GO_PPROF, run publisher + N WHEP players via srs_bench, collect CPU/alloc/heap/goroutine/trace profiles, read hot spots, diff before/after with `pprof -base`

**Next-Generation Server API Examples** — Executable API documentation:
- `internal/rtmp/example_test.go` — RTMP API examples: AMF0, handshake, and protocol workflow