Load Balancing Streaming Servers

Written by Winlin, azusachino, gforce07

When our business workload exceeds the capacity of a streaming server, we must balance the load. Normally this is solved by clustering, but clustering is not the only way. Load balancing is also connected to emerging terms such as Service Discovery, and a cloud LoadBalancer is often an indispensable part of the solution. In short, this problem is complicated, and many people have asked me about it on multiple occasions, so here I'm going to discuss it systematically.

If you already have the answers to the questions below and understand the mechanisms behind them, you may skip this article:

  • Does SRS need NGINX, F5 or HAProxy as a stream proxy? No, not at all. Anyone who thinks these are needed has misunderstood how streaming servers balance load. However, for HTTPS we do recommend NGINX, F5 or HAProxy, and a cloud LoadBalancer is also recommended to reduce the number of external IP addresses.

  • How do we discover SRS Edge nodes? And Origin nodes? Edge nodes can be discovered through DNS and HTTP-DNS; Origin nodes should not be exposed directly to clients.

  • What is special about WebRTC in terms of Service Discovery? Due to its high resource consumption, the threshold for load balancing should be set lower. The traffic of a single PeerConnection changes dynamically, making it even harder to balance the load; for mobile devices, UDP switching between networks and IP drift introduce further problems.

  • Comparing DNS and HTTP-DNS, which one is more suitable for streaming-media Service Discovery? The answer is of course HTTP-DNS. Because of the streaming server's special load pattern, adding 1K more clients increases its load far more than it would for a Web server.

  • Should Load Balance only consider how to lower system load? Lowering the load, or preventing the system from crashing, is the foremost target; but when loads are similar, we should also consider factors such as serving users from nearby nodes and service cost.

  • Can Load Balance be achieved by only adding one layer of servers? A large-scale CDN distribution system is normally layered, whether it follows a static tree structure or a dynamic MESH structure, and layering is what increases the streaming capability; meanwhile, REUSEPORT lets a node carry more load with multiple processes, without adding a layer.

  • Can Load Balance only be achieved through clustering? Clustering is a fundamental way to increase system capacity, but not the only one: segmenting the business and isolating it with Vhost, or spreading businesses and users with consistent hashing, can also increase system capacity.

Well, let's discuss load and load balancing in detail.

What is Load

Before addressing load balancing, let's define what load is. For a server, load is the resource consumption caused by handling more client requests. When some resource becomes seriously exhausted, the service may become unavailable; for example, all clients behave abnormally when the CPU reaches 100%.

For streaming servers, load is caused by streaming clients consuming server resources. Load is generally evaluated from the following perspectives:

  • CPU, the computational resources consumed by the server. In general, we classify cases that consume huge amounts of CPU as computation-intensive scenarios, and cases that consume large network bandwidth as I/O-intensive scenarios. Live streaming is mostly an I/O-intensive scenario, so the CPU is usually not the first bottleneck, except in RTC, which is both I/O-intensive and computation-intensive. This is the main difference.

  • Network bandwidth, consumed when transmitting live streams. As mentioned, live streaming is an I/O-intensive scenario, so bandwidth is a critical bottleneck. Scenarios such as RTC are computation-intensive at the same time.

  • Disk, which is not a critical bottleneck when there is no need to record streams or serve HLS. However, once recording or HLS is required, the disk becomes a very critical bottleneck, simply because it is the slowest device. A RAM-backed disk is usually used to work around this, since RAM is considerably more affordable nowadays.

  • RAM, which is a relatively lightly consumed resource in streaming servers. Even with a lot of caching, RAM generally does not hit its limit before the other resources do, so streaming servers use RAM heavily as a cache to trade for load on other resources. For example, when optimizing CPU for live streaming, SRS uses writev to buffer and send a lot of data at once, trading RAM for lower CPU load; see the sketch right after this list.
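
The idea of batching buffered packets into one writev call can be illustrated with a small Go sketch (SRS itself is C++ and calls writev directly; the function below is only a hypothetical illustration of the RAM-for-CPU trade, using Go's net.Buffers, which uses writev under the hood on supported platforms):

package stream

import "net"

// sendBatched keeps many small media packets in memory and flushes them
// with a single writev-style call, trading RAM for fewer syscalls and CPU.
func sendBatched(conn net.Conn, packets [][]byte) error {
	bufs := net.Buffers(packets) // each element becomes one iovec
	_, err := bufs.WriteTo(conn) // one writev instead of many write calls
	return err
}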

What are the consequences of high load? It directly leads to system issues such as increased latency, lag, or even denial of service. Overload usually plays out as a chain of events. For example:

  • CPU overload, which means the server can't support that many clients, causes a chain of adverse events. On one hand, it leads to queue buildup, which consumes a lot of memory; on the other hand, network throughput can't keep up either, because clients can't receive the data they need, so latency and lag increase. All of these add even more CPU load, until the server crashes.

  • Network bandwidth: when the system's bandwidth limit is exceeded, such as the limit of the network card or system queues, no user can receive the data they need, and playback lags. It also causes queues to build up, and more CPU is required to process them, which in turn increases CPU load.

  • Disk: if the disk load exceeds its limit, the system may hang on further write operations. On servers that write synchronously, streams fail to transfer properly and logs pile up; on servers that write asynchronously, the asynchronous queues pile up. Note that SRS currently writes synchronously, and we're working on asynchronous writing in a multi-threaded fashion.

  • RAM: the service process will be killed if it runs out of memory (OOM). Rising RAM load is mainly caused by memory leaks, especially when there are many streams; for the sake of simplicity, SRS does not clean up and delete streams, so keep an eye on memory if it rises continuously.

In light of the above, the first step to reducing system load is to measure it, that is, to focus on overload. This is another issue that needs clarification.

What is Overload

Overload happens when the load exceeds the system's capacity. Despite the simple description, this is actually a complicated issue, for instance:

  • Does CPU consumption reaching 100% mean overload? No, because a server generally has multiple CPUs; a server with 8 CPUs only reaches its CPU limit at 800%.

  • Then is the server free of overload as long as CPU consumption stays below its total capacity (e.g. an 8-CPU server below 800%)? No, because the streaming server might not be able to use multiple CPU cores. For example, SRS uses only one CPU, so its CPU capacity is 100%.

  • And if the SRS process has not reached 100% CPU, is the server not overloaded? Still no, because other processes also consume CPU; SRS is not the only consumer.

In conclusion, to know when CPU overload happens, both the CPU resources available to the streaming server and the CPU resources actually in use must be identified and measured:

  • For total system CPU, overload happens when consumption reaches 80% of capacity. For instance, on an 8-CPU server, overload happens at 640% CPU consumption; a typical system is already busy when the total CPU load is at that level.

  • For each SRS process, overload happens when its CPU usage reaches 80%. For instance, on an 8-CPU server whose total CPU usage is only 120%, if the SRS process uses 80% and other processes use 40%, the server is still overloaded. These two rules are sketched right after this list.
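
A minimal sketch of the two overload rules above, assuming the usage numbers have already been collected (for example from /proc or a monitoring agent); the function and threshold names are made up for illustration:

package loadcheck

// Thresholds as described above: 80% of total system capacity,
// and 80% of one core for the single-threaded SRS process.
const (
	systemThreshold  = 0.80
	processThreshold = 0.80
)

// IsOverloaded reports whether either overload condition holds.
// totalUsage is the sum across all cores (640 means 640%),
// processUsage is the SRS process usage, cores is the CPU count.
func IsOverloaded(totalUsage, processUsage float64, cores int) bool {
	systemOverloaded := totalUsage >= float64(cores)*100*systemThreshold
	processOverloaded := processUsage >= 100*processThreshold
	return systemOverloaded || processOverloaded
}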

Special for Media Server

Besides general resource consumption, there are some additional factors that affect load or load balancing in a streaming server:

  • Long connections: live streams and WebRTC streams are both long-lived; the longest live stream may exceed 2 days, and it's common for a meeting to last a few hours. The load of a streaming server therefore has the characteristics of long connections, which makes load rebalancing much harder. For example, a round-robin scheduling strategy may not be effective.
  • Stateful: there are many interactions between the streaming server and the client, and state is kept in the server process, so a load balancer cannot simply hand the request over to a new server when the current one has a problem; it isn't even a simple request/response model. The problem is especially evident in WebRTC, where the DTLS and SRTP encryption state makes it impossible to switch servers at will.
  • Correlation: two web requests are unrelated, and the failure of one does not affect the other. In live streaming, however, the published stream affects all playback; in WebRTC, if even one participant fails to pull a stream or suffers very bad transmission quality, the meeting may not be able to continue even though the other streams are fine.

Of course, these are not purely load or load-balancing problems. For example, WebRTC developed SVC and Simulcast to cope with weak clients. Some problems can be solved by client-side retry: for connection migration under high load, the server can forcibly close the connection and the client retries and migrates to the next server.

There is also a tendency to avoid streaming servers entirely and use sliced protocols such as HLS/DASH/CMAF instead, which turns the problem into a web-server problem and makes all of the issues above disappear. However, sliced protocols can realistically only reach about 3 seconds of latency, and more commonly more than 5 seconds, whereas live streaming at 1 to 3 seconds, low-latency live streaming at 500ms to 1 second, RTC calls at 200ms to 500ms, and control scenarios within 100ms can never be served by a slicing server. These scenarios can only be implemented with a streaming server, regardless of whether TCP streams or UDP packets are transmitted.

We need to consider these issues when designing the system. For example, WebRTC should avoid coupling streams to rooms; that is, the streams in one room must be distributable across multiple servers, not limited to one server. The more such restrictions a service has, the harder load balancing becomes.

SRS Overload

Let's talk about the load and overload conditions of SRS:

  • SRS process: it is overloaded when its CPU exceeds 100%. SRS is single-threaded by design and cannot use multiple CPUs (more on this below).
  • Network bandwidth: generally the first resource to be overloaded. For example, when live-streaming throughput reaches 1Gbps, the CPU may still be idle; RTC is computation-intensive, so it is slightly different.
  • Disk: except when recording a very small number of streams, disk problems generally need to be avoided: mount a memory disk, or reduce the number of streams handled by each SRS. Refer to srs-cloud for best practices.
  • Memory: generally lightly used, but we still need to watch the number of streams. For example, in a monitoring scenario where streams are constantly pushed and disconnected, keep an eye on SRS's memory growth. This issue can be worked around with Gracefully Quit.

In particular, let me explain SRS's single-threading. This is a deliberate choice; there is no free performance optimization. Multi-threading can certainly improve processing capacity, but at the expense of system complexity, and it makes the overall load hard to evaluate. At what CPU level is an 8-core multi-threaded streaming server overloaded? 640%? No, because the utilization of each core may be uneven; making per-core utilization even requires load-balancing the threads themselves, which is an even more complicated problem.

At present, SRS's single thread can handle most scenarios. For live streaming, Edge can listen on the same port with multiple processes via REUSEPORT to use multiple cores; RTC can use a range of ports; and in cloud-native scenarios, you can run SRS in Docker and start multiple K8s Pods. These are the easier options.

Note: A very cost-sensitive cloud service would definitely customize SRS for itself and pay the price in complexity. As far as I know, several well-known cloud vendors have implemented multi-threaded versions based on SRS. We are working together to open source the multi-threading capability and improve load capacity within an acceptable complexity range. For details, please refer to Threading #2188.

Now that we understand the loads of a streaming server, it's time to think about how to balance them.

Round Robin: Simple and Robust

Round Robin is a very simple load-balancing strategy: every time a client requests the service, the scheduler picks the next server from the server list and returns it to the client:

server = servers[pos++ % servers.length()]

This strategy works well when requests are relatively uniform, for example web requests that generally complete in a short time. It makes it very easy to add and remove servers, bring them online and offline, and upgrade or isolate services.
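
A slightly fuller version of the one-liner above, as a Go sketch (the type and method names are made up for illustration):

package balance

import "sync/atomic"

// RoundRobin hands out servers in turn, matching the pseudocode above.
type RoundRobin struct {
	servers []string
	pos     uint64
}

func NewRoundRobin(servers []string) *RoundRobin {
	return &RoundRobin{servers: servers}
}

// Next picks the next server; safe to call from concurrent requests.
func (r *RoundRobin) Next() string {
	n := atomic.AddUint64(&r.pos, 1)
	return r.servers[(n-1)%uint64(len(r.servers))]
}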

Due to the long connections in streaming media, the round-robin strategy alone is not good enough, because some requests last much longer than others, which causes load imbalance. Of course, if there are only a small number of requests, this strategy is still adequate.

In the SRS Edge Cluster, round robin is used when choosing the upstream Edge server, assuming that the number of streams and their service times are relatively balanced. Essentially, this is the load-balancing strategy for the upstream Edge servers, and it avoids always going back to the same server and overloading it. As shown below:

In the SRS Origin Cluster, the Edge also selects an Origin server when a stream is pushed for the first time, again using Round Robin. This is essentially load balancing of the Origin servers, which solves the problem of overloading a single Origin server. As shown below:

In a real business, nobody simply uses Round Robin; instead there is a scheduling service that collects data from these servers, evaluates their load, and hands out the servers with low load or high quality. As shown below:

Then how do we balance the load of the Edge servers themselves? That relies on the Frontend Load Balance strategy, the system in front of the Edge. We discuss the commonly used methods below.

Frontend Load Balancer: DNS or HTTP DNS

In the Round Robin part, we focused on load balancing inside the service. The server that provides service directly to clients is generally called the Frontend Load Balancer, and its situation is a bit different:

  • If the streaming service has few nodes and is deployed centrally, Round Robin is an acceptable choice. It is also feasible to configure multiple resolution IPs in DNS, or to pick nodes randomly (or the least-loaded node) when HTTP DNS answers.
  • If there are many nodes, especially distributed ones, plain Round Robin is not an option, because besides load, the user's geographical location also needs to be considered. Generally speaking, the "nearest" node is selected. Both DNS and HTTP DNS can do this, usually based on the user's exit IP and a geolocation IP database.

In fact, there is no fundamental difference between DNS and HTTP DNS in scheduling capability; many DNS and HTTP DNS systems even share the same decision-making system, because they face the same problem: how to use the user's IP, or other information (such as RTT or probe data), to allocate the most appropriate node (sometimes cost matters more than distance).

DNS is the basis of the Internet. It can be considered a name translator. For example, when we ping ossrs.net, it resolves to the IP address 182.92.233.108. There is no load balancing here, because there is just one server; DNS is only doing name resolution:

ping ossrs.net
PING ossrs.net (182.92.233.108): 56 data bytes
64 bytes from 182.92.233.108: icmp_seq=0 ttl=64 time=24.350 ms

The role of DNS in streaming-media load balancing is to return different server IPs depending on the client's IP, and the DNS system itself is distributed. A record can also be written into the /etc/hosts file; if there is no local entry, the domain name is queried at LocalDNS (usually configured in the system or obtained automatically).

This means DNS can withstand very large concurrency, because resolution is not served by one centralized DNS server but by a distributed system. This is also why a resolution record has a TTL (expiration time): after modifying a record, it only takes effect after that time, and in practice it depends on the policy of each DNS server. There are also problems such as DNS hijacking and tampering, which sometimes cause load imbalance.

That's why HTTP DNS appeared. DNS can be considered a basic network service provided by ISPs, while HTTP DNS can be implemented by the streaming platform itself, that is, by us developers. It is a name service too, but the name is resolved by calling an HTTP API, for example:

curl http://your-http-dns-service/resolve?domain=ossrs.net
{["182.92.233.108"]}

Since you provide this service yourself, you decide when to update what a name resolves to. You can therefore achieve more precise load balancing, and you can use HTTPS to prevent tampering and hijacking.

Note: The your-http-dns-service address of HTTP-DNS can itself be a set of IPs or a DNS domain name, because its own load is comparatively easy to balance.
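
As a rough sketch of such a self-built HTTP-DNS service (the /resolve path and the response shape simply mirror the curl example above; the node-selection logic is a placeholder you would replace with geolocation and load data):

package main

import (
	"encoding/json"
	"net"
	"net/http"
)

// nodesFor would look up candidate servers for the domain and pick the
// ones nearest to, or least loaded for, the client IP. Hard-coded here.
func nodesFor(domain, clientIP string) []string {
	return []string{"182.92.233.108"}
}

func main() {
	// A toy HTTP-DNS endpoint; the path, parameters and response format
	// are entirely up to your own service design.
	http.HandleFunc("/resolve", func(w http.ResponseWriter, r *http.Request) {
		domain := r.URL.Query().Get("domain")
		clientIP, _, _ := net.SplitHostPort(r.RemoteAddr)
		json.NewEncoder(w).Encode(map[string][]string{domain: nodesFor(domain, clientIP)})
	})
	http.ListenAndServe(":8080", nil)
}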

Load Balance by Vhost

SRS supports Vhost, which is generally used by CDN platforms to isolate multiple customers. Each customer can have its own domain name, such as:

vhost customer.a {
}

vhost customer.b {
}

If users push streams to the same IP server but with different vhosts, the streams are different streams; likewise, different addresses mean different streams during playback, for example:

  • rtmp://ip/live/livestream?vhost=customer.a
  • rtmp://ip/live/livestream?vhost=customer.b

Note: Of course, you can use the DNS system directly to map the IP to different domain names, so that the domain name can be used in the URL directly.

In fact, Vhost can also be used for load balancing across multiple origin sites: at the Edge, different customers can be routed to different origins, so origin capacity can be expanded without using an Origin Cluster:

vhost customer.a {
  cluster {
    mode remote;
    origin server.a;
  }
}

vhost customer.b {
  cluster {
    mode remote;
    origin server.b;
  }
}

Different vhosts share the same Edge nodes, but the Upstream and Origin can be isolated. Of course, the same can also be done with an Origin Cluster; in that case there are multiple origin centers, which is somewhat similar to the goal of Consistent Hash.

Consistent Hash

When Vhost is used to isolate users, the configuration file can become quite complicated. There is a simpler strategy to achieve the same job: Consistent Hash.

For example, you can hash the URL of the stream requested by the user to decide which Upstream or Origin to use, which achieves the same isolation and load reduction.

In practical applications, such schemes are already serving traffic online, so the approach is definitely feasible. SRS does not implement this capability, though; you need to implement it yourself, for example along the lines of the sketch below.
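
A minimal consistent-hash sketch in Go, assuming it runs in your own scheduler or edge-side hook (SRS itself does not provide this; all names here are illustrative):

package balance

import (
	"hash/crc32"
	"sort"
	"strconv"
)

// ConsistentHash maps a stream URL to an origin on a hash ring, so the
// same stream always lands on the same origin, and adding or removing
// an origin only remaps a small fraction of the streams.
type ConsistentHash struct {
	ring    []uint32          // sorted hashes of virtual nodes
	origins map[uint32]string // virtual-node hash -> origin address
}

func NewConsistentHash(origins []string, replicas int) *ConsistentHash {
	c := &ConsistentHash{origins: map[uint32]string{}}
	for _, o := range origins {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(o + "#" + strconv.Itoa(i)))
			c.ring = append(c.ring, h)
			c.origins[h] = o
		}
	}
	sort.Slice(c.ring, func(i, j int) bool { return c.ring[i] < c.ring[j] })
	return c
}

// Pick chooses the origin for a stream URL, e.g. "rtmp://ip/live/livestream?vhost=customer.a".
func (c *ConsistentHash) Pick(streamURL string) string {
	h := crc32.ChecksumIEEE([]byte(streamURL))
	i := sort.Search(len(c.ring), func(i int) bool { return c.ring[i] >= h })
	if i == len(c.ring) {
		i = 0 // wrap around the ring
	}
	return c.origins[c.ring[i]]
}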

In fact, Vhost or Consistent Hash can also cooperate with Redirect to complete more complex load balancing.

HTTP 302: Redirect

302 is a redirect. It can be used for load balancing: a request reaches a server through scheduling, but if that server finds its own load too high, it redirects the request to another server, as shown in the following figure:

Note: 302 is not only for HTTP; RTMP also supports 302, and the SRS Origin Cluster is implemented this way. Of course, 302 there is mainly used for stream service discovery, not for load balancing.

Since RTMP supports 302, we can use it for load rebalancing inside the service: if one Upstream's load is too high, the stream can be moved to other nodes with a few 302 jumps.

Generally, on a Frontend Server only HTTP streams can use 302, such as HTTP-FLV or HLS. RTMP 302 requires client support, which players generally lack, so it cannot be used there.
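
For example, a minimal HTTP-FLV front handler that answers 302 when this node is overloaded (the host names, path and load check below are placeholders, not SRS code):

package main

import (
	"fmt"
	"net/http"
)

// pickEdge would consult load data and return a less-loaded edge;
// hard-coded here for illustration.
func pickEdge() string { return "edge2.example.com" }

// tooBusy would check the local overload conditions discussed earlier.
func tooBusy() bool { return false }

func main() {
	http.HandleFunc("/live/", func(w http.ResponseWriter, r *http.Request) {
		if tooBusy() {
			target := fmt.Sprintf("http://%s%s", pickEdge(), r.URL.RequestURI())
			http.Redirect(w, r, target, http.StatusFound) // 302
			return
		}
		// ... otherwise serve the FLV stream locally ...
	})
	http.ListenAndServe(":8080", nil)
}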

Some UDP-based streaming protocols also support 302, such as RTMFP (a Flash P2P protocol designed by Adobe), although it is rarely used nowadays.

WebRTC currently has no 302 mechanism and generally relies on a proxy at the Frontend Server to balance the load of the servers behind it. QUIC, as the transport of the future HTTP/3 standard, will certainly support redirects, and WebRTC will gradually support WebTransport (based on QUIC), so this capability should also arrive in the future.

SRS: Edge Cluster

SRS Edge is essentially a Frontend Server and solves the following problems:

  • Expand live-streaming capacity, for example to support 100k viewers, by scaling the Edge servers horizontally.
  • Serve users nearby, playing the same role as a CDN; Edge is generally deployed in the city where the users are.
  • Edge uses Round Robin when connecting to the Upstream Edge, to balance the Upstream's load.
  • The load balancing of Edge itself relies on a scheduling system, such as DNS or HTTP-DNS.

As shown below:

Special Note:

  • Edge is an edge cluster for live streaming that supports RTMP and HTTP-FLV protocols.
  • Edge does not support slices such as HLS or DASH; slices should be distributed using Nginx or ATS.
  • WebRTC is not supported; WebRTC has its own clustering mechanism.

Since Edge itself is a Frontend Server, there is generally no need to put Nginx or an LB in front of it to increase capacity: Edge exists to solve the capacity problem, and only Edge can merge requests back to the origin.

Note: Merging back to the origin means the same stream is fetched from the Upstream only once. For example, if 1000 players connect to an Edge, the Edge fetches only 1 stream from the Upstream, not 1000. This is different from a transparent proxy.

Of course, sometimes we still need to place Nginx or LB in front, for example:

  1. To support HTTPS-FLV or HTTPS-API; Nginx is better at this and supports HTTP/2.
  2. To reduce external IPs, for example when multiple servers share one public IP; a dedicated LB service is then needed to proxy the backend Edges.
  3. When deploying on the cloud, where the service may only be exposed through an LB, limited by the design of cloud products, such as a K8s Service.

Apart from these cases, no other servers should be placed in front of Edge; services should be provided by Edge directly.

SRS: Origin Cluster

The SRS Origin Cluster, unlike the Edge cluster, mainly expands the origin's capability:

  • If there are massive numbers of streams, say 10,000, a single origin cannot handle them, and multiple origins need to form a cluster.
  • To solve the origin's single-point problem: with multi-region deployment, we can switch to another region when one region has a problem.
  • To ease the performance problem of slicing protocols: since writing to disk is expensive, multiple origins can spread the load, in addition to using a memory disk.

The SRS Origin Cluster should not be accessed directly; it relies on the Edge cluster to provide external service, and works through two simple strategies:

  • Stream discovery: the origin cluster queries an HTTP address to locate a stream; by default this points at the other origins, but it can also be a specialized service.
  • RTMP 302 redirection: if the stream is not on the current origin, the request is redirected to the origin that has it.

Note: In fact, the Edge could also query the stream-locating service before contacting an origin, and connect directly to the one that has the stream. But the stream may later be moved away, so a relocation process is still required.
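
A toy version of such a specialized stream-query service (the /register and /query endpoints and their parameters are hypothetical, not SRS's built-in API; origins would report publish/unpublish, and edges or other origins would ask where a stream lives before connecting or following an RTMP 302):

package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

var (
	mu      sync.RWMutex
	streams = map[string]string{} // stream name -> origin address
)

func main() {
	// Origins register a stream when it is published.
	http.HandleFunc("/register", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		streams[r.URL.Query().Get("stream")] = r.URL.Query().Get("origin")
		mu.Unlock()
	})
	// Edges (or other origins) ask which origin currently holds a stream.
	http.HandleFunc("/query", func(w http.ResponseWriter, r *http.Request) {
		mu.RLock()
		origin := streams[r.URL.Query().Get("stream")]
		mu.RUnlock()
		json.NewEncoder(w).Encode(map[string]string{"origin": origin})
	})
	http.ListenAndServe(":8085", nil)
}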

The whole process is quite simple, as shown below:

Since a stream lives on only one origin at a time, the HLS slices are also generated by that single origin, with no extra synchronization needed. Generally we can use shared storage, or use on_hls to upload slices to cloud storage.

Note: Another approach is dual-stream hot backup. There are then usually two different streams, and the backup has to be implemented by ourselves. This is generally very complicated for HLS, SRT and WebRTC, and SRS does not support it.

From the aspect of load balancing, the origin cluster itself acts as a scheduler:

  • Use Round Robin when the Edge fetches streams back from the origins.
  • Query a specialized service to decide which origin should be used.
  • Actively disconnect a stream when an origin's load is too high, forcing it to be re-pushed and the load rebalanced.

SRS: WebRTC Cascade

WebRTC load sits only on the origin, and there is no edge load balancing, because publishing and playing in WebRTC are roughly symmetric, unlike the 1-to-10,000 asymmetry of the live-streaming scenario. In other words, the edge exists to solve the problem of massive viewing; when publishing and viewing are similar, the edge contributes nothing to load balancing (in live streaming, the edge can be used for access and redirect).

Since WebRTC does not implement 302 redirect, there is no point deploying edges (for access). For example, in a Load Balance scenario with 10 SRS origins behind one VIP, the client always gets the same VIP, so which SRS origin it ends up on is decided entirely by the Load Balancer's strategy. Here, it is not possible to add an edge that performs an RTMP-style 302 redirect as in the live-streaming scenario.

Therefore, WebRTC load balancing cannot be solved by Edge at all; it relies on the Origin Cluster instead. In RTC this is generally called cascading: all sites are equal, but they are connected in levels, like routing, to increase load capacity. As shown below:

This is essentially different from the Origin Cluster: in an Origin Cluster there is no media transmission between origins; RTMP 302 simply makes the Edge redirect to the right origin, and the origin's load stays controllable because each origin serves only a limited number of Edges.

The Origin Cluster strategy does not suit WebRTC, because the client has to connect directly to the origin, and the origin's load then becomes uncontrollable. For example, in a conference of 100 people, each stream is subscribed by 100 people; each user has to be distributed to a different origin and establish a connection with every origin to push their own stream and pull everyone else's.

Note: This example is a rare one. A 100-person interactive conference generally uses MCU mode, where the server merges everything into one stream or selectively forwards a few streams; the server's internal logic is very complicated.

In fact, the common WebRTC scenario is the one-to-one call, which accounts for roughly 80% of usage. Here everyone publishes one stream and plays one stream, a typical many-streams situation; each user can simply connect to a nearby origin, but since the users may be in different regions or countries, the origins should cascade to improve call quality.

Under this cascading architecture of origins, the user accesses the HTTPS API via DNS or HTTP DNS, and the origin's IP is returned in the SDP. This is the opportunity for load balancing: we can return an origin that is close to the user and lightly loaded.

In addition, how should multiple origins cascade? If the users are in similar regions, they can be dispatched to one origin to avoid cascading, which saves internal transmission bandwidth (effective when there are many one-to-one calls in the same area); at the same time, this makes the load harder to schedule, especially when a call evolves into a multi-person conference.

Therefore, distinguishing one-to-one conferences from multi-person conferences, or limiting the number of conference participants, actually helps load balancing a lot. Scheduling and balancing are much easier if you know in advance that a call is one-to-one. Unfortunately, product managers are generally not interested in this.

Remark: In particular, the cascading function of SRS has not been implemented yet; only a prototype exists, and it has not been submitted to the repository.

TURN, ICE, QUIC, etc

In particular, let's talk about some WebRTC-related protocols, such as TURN, ICE, and QUIC.

ICE is not really a transport protocol; it is more of an identification protocol, generally referring to Binding Request and Response, which carry IP and priority information to identify addresses and channels so that one of several channels can be selected, e.g. which one to use when both 4G and WiFi are good enough. It is also used as the session heartbeat, so the client keeps sending ICE messages.

Therefore, ICE has no direct effect on load balancing, but it can be used to identify a session, similar to QUIC's ConnectionID, so it helps identify the session when traffic passes through a Load Balancer, especially when the client's network switches.

The TURN protocol is quite unfriendly to Cloud Native, because it allocates a range of ports and uses ports to distinguish users. This is workable in a private network where ports can be assumed unlimited, while ports on the cloud are often limited.

Note: Of course, TURN can also multiplex a single port without actually assigning ports; this forfeits direct TURN-to-TURN communication and forces traffic through the SFU, which is fine for SRS.

The real value of TURN is downgrading to TCP, because some enterprise firewalls do not allow UDP, so only TCP can be used and the client needs TURN's TCP mode. Of course, the client can also use TCP transport directly; mediasoup already supports this, but SRS doesn't yet.

What QUIC is friendlier about is its 0-RTT connection: the client caches an SSL-ticket-like credential and can skip the handshake. For load balancing, QUIC is more useful because of its ConnectionID: even if the client changes address and network, the Load Balancer can still tell which backend service handles it. Of course, this also makes server load harder to migrate.
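
Conceptually, a QUIC-aware load balancer can route by hashing the ConnectionID instead of the client's address, so a session survives network switches. A tiny sketch, assuming the ID has already been parsed from the packet (the function is purely illustrative):

package balance

import "hash/fnv"

// PickByConnectionID maps a QUIC ConnectionID (or an ICE session identifier)
// to a backend. Because the ID does not change when the client's IP or
// network changes, the same session keeps reaching the same backend.
func PickByConnectionID(connID []byte, backends []string) string {
	h := fnv.New32a()
	h.Write(connID)
	return backends[h.Sum32()%uint32(len(backends))]
}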

In fact, a protocol suite as complex as WebRTC is quite messy and annoying. Since 100ms-level delay is a hard requirement, UDP and a complicated set of congestion-control protocols must be used, and encryption is indispensable too. Some people claim Cloud Native RTC is the future, which brings yet more problems: port multiplexing, load balancing, long connections and restart upgrades, an architecture turned upside down, plus H3 and QUIC to spoil the game...

Perhaps for the load balancing of WebRTC, there is one word that is most applicable: there is no difficulty in the world, as long as you are willing to give up.

SRS: Prometheus Exporter

The premise of Load Balance is knowing how loaded the system is, which depends heavily on data collection and computation. Prometheus exists for this purpose: it continuously collects various metrics and evaluates them according to its rules. Prometheus is essentially a time-series database.

System load is essentially time-series data too: it changes over time.

For example, Prometheus has node_exporter, which exposes the host's time-series data such as CPU, disk, network and memory, and this can be used to compute the service load.

Many services have a corresponding exporter; for example, redis_exporter collects Redis load data, and nginx-exporter collects Nginx load data.

At present, SRS has not implemented its own srs-exporter, but it will be implemented in the future. For details, please refer to #2899.
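
Until then, a hypothetical srs-exporter could look roughly like this with the Prometheus Go client (the metric names and port are made up, not what #2899 will ship; a real exporter would pull these numbers from SRS rather than set them statically):

package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	activeStreams = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "srs_active_streams",
		Help: "Number of streams currently published.",
	})
	activeClients = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "srs_active_clients",
		Help: "Number of connected players and publishers.",
	})
)

func main() {
	prometheus.MustRegister(activeStreams, activeClients)
	// In a real exporter these would be scraped from SRS and refreshed
	// periodically; fixed values keep the sketch short.
	activeStreams.Set(12)
	activeClients.Set(340)

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":9972", nil)
}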
