This document describes a set of use cases motivating the development of WebRTC Next Version (WebRTC-NV), as well as the requirements derived from those use cases.
To motivate the development of WebRTC 1.0, the IETF RTCWEB WG developed [[RFC7478]]. This document describes use cases motivating the development of "WebRTC Next Version" (WebRTC-NV), and the requirements derived from those use cases. The use cases fall into one of two categories: enhancements to use cases already covered in [[RFC7478]], and new use cases that cannot currently be implemented with WebRTC 1.0.
The use cases in this section improve upon use cases described in [[RFC7478]].
[[RFC7478]] Section 2.3.12 describes a use case involving a multiparty online game with voice communications. In these scenarios, minimizing the time to join the game and receive media is important, so ICE enhancements such as the ability to control candidate gathering and pruning are desirable. “Parallel forking” also reduces conference establishment time by allowing a participant to broadcast an Offer to a “room” abstraction (maintained on a server), with other room participants responding directly to the Offerer, avoiding a separate discovery step. It is desirable to allow media to be received from responders before the initiator receives an Answer. Finally, the ability to impose a bandwidth limit across all mesh endpoints limits the build-up of queues that can affect audio quality or perceived latency in the game. Supporting this enhancement adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N01     The user agent can control candidate gathering and pruning, limiting the networks on which candidates are gathered, the types of candidates, etc.
N02     The user agent must be able to support parallel forking, including reuse of local ICE candidates and local certificates with multiple Answerers.
N03     The user agent must be able to impose a bandwidth limit across mesh endpoints.
N04     Early media must be supported.
----------------------------------------------------------------
Experience: This use case has been implemented by a gaming service utilizing [[ORTC]].
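Parts of this use case can already be approximated with WebRTC 1.0 APIs, which helps illustrate the gap the requirements above describe. The sketch below (TypeScript, using the standard RTCPeerConnection and RTCRtpSender APIs) restricts candidate gathering to relay candidates and caps each peer's video bitrate so the sum stays within a mesh-wide budget; parallel forking (N02) and early media (N04) have no 1.0 counterpart. The MESH_BUDGET_KBPS constant and the even split across peers are illustrative assumptions, not part of any specification.

```typescript
// Illustrative sketch: gather only relay candidates and cap each peer's send bitrate
// so that the sum stays within a mesh-wide budget. MESH_BUDGET_KBPS and the even split
// across peers are assumptions of this example, not part of any specification.
const MESH_BUDGET_KBPS = 2000;

function createGameConnection(): RTCPeerConnection {
  return new RTCPeerConnection({
    iceTransportPolicy: 'relay',   // limit the candidate types gathered (partial answer to N01)
    iceCandidatePoolSize: 1,       // pre-gather candidates to shorten time-to-media
  });
}

async function applyMeshBitrateCap(pc: RTCPeerConnection, peerCount: number): Promise<void> {
  const perPeerBps = Math.floor((MESH_BUDGET_KBPS * 1000) / Math.max(peerCount, 1));
  for (const sender of pc.getSenders()) {
    if (!sender.track || sender.track.kind !== 'video') continue;
    const params = sender.getParameters();
    if (!params.encodings || params.encodings.length === 0) continue; // nothing negotiated yet
    params.encodings[0].maxBitrate = perPeerBps;  // approximates N03 at the application layer
    await sender.setParameters(params);
  }
}
```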
[[RFC7478]] Section 2.3.6 describes a simple communications service where the user changes access network during the session. This use case is enhanced by being able to re-route media over an alternate path (potentially taking network cost into account) without the need for signaling. Supporting this enhancement adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N05     The ICE agent must be able to maintain multiple candidate pairs and move traffic between them.
N06     The ICE agent must be able to take the network cost into account when considering re-routing.
----------------------------------------------------------------
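WebRTC 1.0 can only approximate this today: when the access network changes, the application typically triggers an ICE restart, which requires a new offer/answer exchange rather than a switch between already-validated candidate pairs. The sketch below illustrates that workaround, optionally consulting the (not universally available) Network Information API for a coarse cost signal; signalOffer is a hypothetical application signaling function.

```typescript
// WebRTC 1.0 workaround: restart ICE when connectivity degrades after a network change.
// signalOffer() is a hypothetical function that sends the offer over the app's signaling channel.
declare function signalOffer(sdp: RTCSessionDescriptionInit): void;

function watchConnectivity(pc: RTCPeerConnection): void {
  pc.addEventListener('iceconnectionstatechange', async () => {
    if (pc.iceConnectionState === 'disconnected' || pc.iceConnectionState === 'failed') {
      // The Network Information API (where available) gives a coarse hint about link type/cost.
      const netInfo = (navigator as any).connection;
      console.log('restarting ICE; current link type:', netInfo?.type ?? 'unknown');
      pc.restartIce();                    // marks the next offer as an ICE restart
      const offer = await pc.createOffer();
      await pc.setLocalDescription(offer);
      signalOffer(offer);                 // a signaling round trip -- exactly what N05/N06 aim to avoid
    }
  });
}
```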
[[RFC7478]] Section 2.4.3.1 describes a use case involving Multiparty Video Communications with a central conferencing server. In such a use case, clients with disparate capabilities, such as differing bandwidth availability, screen size, and maximum displayable frame rate, may participate in the same conference, which makes it advantageous to support Scalable Video Coding (SVC). Encoding with temporal scalability is supported by several browsers today and is utilized by most centralized conferencing services.
It is expected that spatial scalability (supported by VP9 and AV1) will become more popular with time. In this use case, if the desired video codec is known beforehand and participants are muted by default (as in a very large meeting), it is desirable to allow new participants to start receiving immediately, without negotiation. Supporting this enhancement adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N07     The user agent must be able to encode and decode video utilizing temporal scalability and (if supported by the chosen codec) spatial scalability.
N08     A user agent can receive audio/video without a corresponding sender.
N09     It is possible to select the sending and/or receiving codec without negotiation.
N10     The user agent must be able to control robustness (RTX, RED, FEC) applied to individual simulcast and SVC layers.
----------------------------------------------------------------
Experience: This use case has been implemented by conferencing services utilizing [[ORTC]], as well as by proprietary additions to [[WEBRTC10]].
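Temporal scalability (part of N07) can already be requested through the WebRTC-SVC extension's scalabilityMode encoding parameter, as sketched below; receiving without negotiation (N08/N09) has no WebRTC 1.0 counterpart. The specific mode string 'L1T3' and the preference for VP9 are illustrative assumptions, since codec and mode support vary by browser.

```typescript
// Request temporal scalability when adding a video track. 'L1T3' (one spatial layer,
// three temporal layers) is an illustrative mode; codec and browser support vary.
async function addScalableVideo(pc: RTCPeerConnection, track: MediaStreamTrack): Promise<void> {
  // scalabilityMode comes from the WebRTC-SVC extension and may not be in older DOM typings.
  const encoding: RTCRtpEncodingParameters & { scalabilityMode?: string } = { scalabilityMode: 'L1T3' };
  const transceiver = pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [encoding],
  });
  // Prefer a codec with spatial scalability support (e.g. VP9) when the browser offers one.
  const codecs = RTCRtpReceiver.getCapabilities('video')?.codecs ?? [];
  const vp9First = [...codecs].sort(
    (a, b) => Number(b.mimeType === 'video/VP9') - Number(a.mimeType === 'video/VP9'));
  if (vp9First.length > 0) transceiver.setCodecPreferences(vp9First);
}
```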
Several new use cases relate to scenarios that cannot be supported in [[WEBRTC10]].
Participants in a mesh exchange large files without disruption to audio/video sessions. It is also possible for a participant to send a large file to a user who is not currently online. Supporting this use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N11     It must be possible for the user agent to transfer large files as a single message.
N12     The application must be able to signal backpressure (flow control) when receiving data. It must also receive a backpressure signal when sending data.
N13     It must be possible for the user agent to transfer data utilizing a congestion control algorithm that does not compete aggressively with audio/video communications.
N14     It must be possible for the file exchange to be supported by servers as well as user agents.
N15     It must be possible to support data exchange in a worker.
----------------------------------------------------------------
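WebRTC 1.0 data channels cap individual message sizes and only expose sender-side backpressure, which is part of what motivates N11-N15. The sketch below shows the current workaround: chunking a file and pausing on bufferedAmount, using the standard bufferedamountlow event. The 16 KiB chunk size and 1 MiB high-water mark are illustrative choices.

```typescript
// Chunked file send with sender-side backpressure over an RTCDataChannel.
// CHUNK_SIZE and HIGH_WATER_MARK are illustrative values, not mandated anywhere.
const CHUNK_SIZE = 16 * 1024;          // stay well under typical SCTP message-size limits
const HIGH_WATER_MARK = 1024 * 1024;   // pause when this much data is queued locally

async function sendFile(channel: RTCDataChannel, file: File): Promise<void> {
  channel.bufferedAmountLowThreshold = HIGH_WATER_MARK / 2;
  // For brevity the whole file is read into memory; a streaming read would be used for very large files.
  const buffer = await file.arrayBuffer();
  for (let offset = 0; offset < buffer.byteLength; offset += CHUNK_SIZE) {
    if (channel.bufferedAmount > HIGH_WATER_MARK) {
      // Wait until the send queue drains below the threshold before sending more.
      await new Promise<void>((resolve) =>
        channel.addEventListener('bufferedamountlow', () => resolve(), { once: true }));
    }
    channel.send(buffer.slice(offset, offset + CHUNK_SIZE));
  }
}
```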
An IoT sensor maintains a long-term connection and seeks to minimize power consumption. Some of the sensor’s data may need to be sent reliably and in order, while other sensors may provide data that can be sent unreliably and unordered, or in a partially reliable manner. This use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N16     The application must be able to minimize ICE connectivity checks.
N17     The application must be able to control aspects of the data transport (e.g. set the SCTP heartbeat interval or turn it off), RTO values, etc.
N18     It must be possible to send arbitrary data reliably, unreliably, or partially reliably with a specific maximum number of retransmissions or a specific maximum timeout.
N19     It must be possible to send arbitrary data ordered or unordered.
----------------------------------------------------------------
Reference: Mailing list discussion
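Per-channel reliability and ordering (N18/N19) can already be configured when a data channel is created, as sketched below; the heartbeat, RTO, and connectivity-check controls in N16/N17, however, have no counterpart in the current API. The channel labels and the retransmit limit of 0 are illustrative choices.

```typescript
// Configure one reliable/ordered channel and one unreliable/unordered channel for sensor data.
// Labels and the maxRetransmits value are illustrative choices for this sketch.
function createSensorChannels(pc: RTCPeerConnection) {
  const control = pc.createDataChannel('sensor-control');       // reliable, ordered (the defaults)
  const telemetry = pc.createDataChannel('sensor-telemetry', {
    ordered: false,        // out-of-order delivery is acceptable for periodic readings
    maxRetransmits: 0,     // drop rather than retransmit; maxPacketLifeTime is the time-based alternative
  });
  return { control, telemetry };
}
```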
A communications service manipulates captured media prior to encoding and after decoding in order to apply effects.
This use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N20     The application must be able to obtain raw media from the capture device.
N21     The application must be able to insert processed frames into the outgoing media path.
N22     The application must be able to obtain decoded media from the remote party.
N23     It must be possible to efficiently share media between the main thread and worker threads.
N24     It must be possible to do efficient media manipulation in worker threads.
----------------------------------------------------------------
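One emerging way to address N20-N24 is the mediacapture-transform pipeline, sketched below; at the time of writing this API shape ships only in Chromium and is not yet standardized. A MediaStreamTrackProcessor exposes raw VideoFrames, a TransformStream applies the effect, and a MediaStreamTrackGenerator produces the processed track. The applyFunnyHat function is a hypothetical placeholder for the actual per-frame effect.

```typescript
// Sketch using the mediacapture-transform API shape (Chromium-only at the time of writing).
// applyFunnyHat() is a hypothetical placeholder for the real per-frame effect.
declare const MediaStreamTrackProcessor: any;   // not yet in standard lib.dom typings
declare const MediaStreamTrackGenerator: any;
declare function applyFunnyHat(frame: VideoFrame): VideoFrame;

function processTrack(camera: MediaStreamTrack): MediaStreamTrack {
  const processor = new MediaStreamTrackProcessor({ track: camera }); // raw frames (N20)
  const generator = new MediaStreamTrackGenerator({ kind: 'video' }); // processed output (N21)
  const transform = new TransformStream<VideoFrame, VideoFrame>({
    transform(frame, controller) {
      const decorated = applyFunnyHat(frame);   // runs per frame; ideally in a worker (N23/N24)
      frame.close();                            // release the original frame's memory promptly
      controller.enqueue(decorated);
    },
  });
  processor.readable.pipeThrough(transform).pipeTo(generator.writable);
  return generator as MediaStreamTrack;          // usable with RTCPeerConnection.addTrack()
}
```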
In a web game called “NameTheBird.com”, participants use their devices to provide the service with audio and video observations of birds, along with identifications used for training, allowing the service to identify birds from the provided audio and video and to return this information to the users in real time.
The web application has a site-specific, federated-learning-based classifier for contextual object detection, user intent prediction, and media manipulation, allowing it to augment the streams it receives and to inject identifying or other supplemental information into the streams sent or received.
The shared classification models are trained on the birds found by the participants and on the participants' feedback. Each client's updates to the model are up-streamed to a shared model server, which pushes updates of the global model back to the clients.
Implementation outline:
This use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N20     The application must be able to obtain raw media from the capture device.
N21     The application must be able to insert processed frames into the outgoing media path.
N22     The application must be able to obtain decoded media from the remote party.
N23     It must be possible to efficiently share media between the main thread and worker threads.
N24     It must be possible to do efficient media manipulation in worker threads.
----------------------------------------------------------------
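A rough sketch of the worker hand-off this use case depends on (N23/N24) is shown below: raw VideoFrames are transferred to a classification worker rather than copied. The worker script name 'classifier-worker.js' and its message shape are assumptions made for illustration.

```typescript
// Transfer raw frames to a worker for classification without copying pixel data.
// 'classifier-worker.js' and the message shape are hypothetical for this sketch.
declare const MediaStreamTrackProcessor: any;   // Chromium-only API shape, see the previous sketch

async function classifyInWorker(camera: MediaStreamTrack): Promise<void> {
  const worker = new Worker('classifier-worker.js');
  const processor = new MediaStreamTrackProcessor({ track: camera });
  const reader = (processor.readable as ReadableStream<VideoFrame>).getReader();
  for (;;) {
    const { value: frame, done } = await reader.read();
    if (done || !frame) break;
    // VideoFrame is transferable: the worker takes ownership, avoiding a main-thread copy.
    worker.postMessage({ kind: 'frame', frame }, [frame]);
  }
}
```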
A virtual reality gaming service utilizing a centralized conferencing server wants to synchronize data with media, using an existing Selective Forwarding Unit (SFU) to distribute the data. This use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N25     The user agent must be able to send data synchronized with audio.
----------------------------------------------------------------
References: Mailing list discussion
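There is no standard API for this today; applications approximate it by timestamping data channel messages and buffering on the receiving side, without access to the media clock that N25 would expose. The sketch below illustrates that application-level workaround; the message format, the use of performance.now() as a sender-side clock, and the fixed receive delay are all assumptions of this example.

```typescript
// Application-level approximation of media/data synchronization (what N25 would make native).
// The message shape, the performance.now() clock, and the fixed delay are assumptions of this sketch;
// a real implementation would need to correlate against RTP timestamps, which are not exposed today.
interface GameEvent { atMs: number; payload: unknown; }

function sendSyncedEvent(channel: RTCDataChannel, payload: unknown): void {
  const event: GameEvent = { atMs: performance.now(), payload };
  channel.send(JSON.stringify(event));
}

function receiveSyncedEvents(channel: RTCDataChannel, renderAt: (e: GameEvent) => void): void {
  const jitterBufferMs = 150;   // illustrative delay to line data up with decoded media
  channel.onmessage = ({ data }) => {
    const event = JSON.parse(data) as GameEvent;
    setTimeout(() => renderAt(event), jitterBufferMs);
  };
}
```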
A service that supports end-to-end confidentiality of communications in mesh and centralized conferencing scenarios. While the service provides the JavaScript application, which is considered trusted, backend services such as the SFU are considered untrusted. This use case adds the following requirements:
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N26     The application must be able to enable end-to-end payload confidentiality and integrity protection.
N27     The browser must be able to obtain e2e keying material so as to enable content to be rendered.
N28     TBD: restrictions on the application so as to prevent unauthorized recording of the session.
----------------------------------------------------------------
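Encoded transforms (the RTCRtpScriptTransform API, with a non-standard createEncodedStreams() variant in Chromium) give the application access to encoded frames so it can apply end-to-end protection before packetization, addressing N26 at the application layer. The worker-side sketch below shows only the hook point; e2eEncrypt is a hypothetical placeholder standing in for a vetted scheme such as SFrame, and key management (N27) is out of scope of this sketch.

```typescript
// Worker-side sketch of an encoded-frame transform used for end-to-end encryption (N26).
// e2eEncrypt() is a hypothetical placeholder; a real deployment would use a vetted scheme such as SFrame.
// Main thread (not shown): sender.transform = new RTCRtpScriptTransform(worker);
declare function e2eEncrypt(payload: ArrayBuffer): ArrayBuffer;

(self as any).onrtctransform = (event: any) => {    // fired when the main thread attaches the transform
  const { readable, writable } = event.transformer;
  const transform = new TransformStream({
    transform(encodedFrame: any, controller) {
      encodedFrame.data = e2eEncrypt(encodedFrame.data);   // protect the encoded payload only
      controller.enqueue(encodedFrame);
    },
  });
  readable.pipeThrough(transform).pipeTo(writable);
};
```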
This section lists the requirements arising from the use cases catalogued in this document.
----------------------------------------------------------------
REQ-ID  DESCRIPTION
----------------------------------------------------------------
N01     The user agent can control candidate gathering and pruning, limiting the networks on which candidates are gathered, the types of candidates, etc.
N02     The user agent must be able to support parallel forking, including reuse of local ICE candidates and local certificates with multiple Answerers.
N03     The user agent must be able to impose a bandwidth limit across mesh endpoints.
N04     Early media must be supported.
N05     The ICE agent must be able to maintain multiple candidate pairs and move traffic between them.
N06     The ICE agent must be able to take the network cost into account when considering re-routing.
N07     The user agent must be able to encode and decode video utilizing temporal scalability and (if supported by the chosen codec) spatial scalability.
N08     A user agent can receive audio/video without a corresponding sender.
N09     It is possible to select the sending and/or receiving codec without negotiation.
N10     The user agent must be able to control robustness (RTX, RED, FEC) applied to individual simulcast and SVC layers.
N11     It must be possible for the user agent to transfer large files as a single message.
N12     The application must be able to signal backpressure (flow control) when receiving data. It must also receive a backpressure signal when sending data.
N13     It must be possible for the user agent to transfer data utilizing a congestion control algorithm that does not compete aggressively with audio/video communications.
N14     It must be possible for the file exchange to be supported by servers as well as user agents.
N15     It must be possible to support data exchange in a worker.
N16     The application must be able to minimize ICE connectivity checks.
N17     The application must be able to control aspects of the data transport (e.g. set the SCTP heartbeat interval or turn it off), RTO values, etc.
N18     It must be possible to send arbitrary data reliably, unreliably, or partially reliably with a specific maximum number of retransmissions or a specific maximum timeout.
N19     It must be possible to send arbitrary data ordered or unordered.
N20     The application must be able to obtain raw media from the capture device.
N21     The application must be able to insert processed frames into the outgoing media path.
N22     The application must be able to obtain decoded media from the remote party.
N23     It must be possible to efficiently share media between the main thread and worker threads.
N24     It must be possible to do efficient media manipulation in worker threads.
N25     The user agent must be able to send data synchronized with audio.
N26     The application must be able to enable end-to-end payload confidentiality and integrity protection.
N27     The browser must be able to obtain e2e keying material so as to enable content to be rendered.
N28     TBD: restrictions on the application so as to prevent unauthorized recording of the session.
----------------------------------------------------------------