U.S. Pat. No. 12,231,632
METHODS OF PARAMETER SET SELECTION IN CLOUD GAMING SYSTEM
Assignee: TENCENT AMERICA LLC
Issue Date: June 29, 2021
Abstract
Video coding and decoding methods, apparatuses, and storage medium are provided. The encoding method includes training a model for determining a parameter set to be used for encoding; obtaining a video content; determining the parameter set for encoding the video content using the trained model; encoding the video content according to the parameter set; transmitting the encoded video content to a decoding device; and transmitting the parameter set to the decoding device. The decoding method includes receiving an encoded video content from an encoding device; determining whether a parameter set is received from the encoding device; based on determining that the parameter set is received, decoding the encoded video content according to the parameter set; and transmitting the decoded video content to a display device.
Description
DETAILED DESCRIPTION
Example embodiments are described in detail herein with reference to the accompanying drawings. It should be understood that the one or more embodiments of the disclosure described herein are only examples, and should not be construed as limiting the scope of the disclosure.
Hereinafter, the term “encoding device” is used interchangeably with the term “encoder.” Also, the term “decoding device” is used interchangeably with the term “decoder.”
FIG. 1 is a sequence diagram illustrating a network-based gaming system according to an embodiment.
Referring to FIG. 1, a network-based gaming system 100 may include one or more clients 110A to 110C and a server 120. The server 120 may be a cloud server or a server cluster that includes a plurality of servers. The server 120 may be connected to a network and may communicate with the one or more clients 110A to 110C through the network. The server 120 may be a remote computer system that provides one or more software services to the one or more clients 110A to 110C. For example, the server 120 may provide a game streaming service to the one or more clients 110A to 110C by generating and rendering game content with a game engine running on the server 120. However, the one or more embodiments are not limited thereto, and the server 120 may be a server that provides other types of services, such as online video and/or audio streaming services.
Each of the one or more clients 110A to 110C may be a personal computer (PC), a laptop, a personal digital assistant (PDA), a mobile device, a console, a tablet PC, a wearable device, etc. In an embodiment, the one or more clients 110A to 110C may be software running on a computer that requests and receives information from the server 120. For example, in a cloud gaming environment, the one or more clients 110A to 110C may be a computer or a mobile device running cloud gaming client software that sends user control information to the server 120, receives a video stream from the server 120, and displays the received video stream on a display of the computer or the mobile device to be viewed by a user.
Game streaming provides a distinctive way to play video games by running the game software on a remote server rather than on a game console or other local device. The game content may be compressed by a video encoder on the server side and streamed to the one or more clients via the network. This offers several advantages for game streaming services. First, because the dedicated servers used by game streaming companies are significantly more powerful than consumer hardware, video encoding and rendering can be performed much faster than on consumer hardware. Second, local machines generally decline in performance over time due to wear and tear, whereas cloud gaming servers are continuously upgraded as technology develops, at no extra cost to the consumer. Moreover, streaming services can send similar data to multiple users who are playing the same game, thereby reducing the processing load of preparing game content for each user.
Referring to FIG. 1, a user B of the client 110B may be playing the same game as a user A of the client 110A and a user C of the client 110C. The client 110B may detect an action performed by the user B (S101). For example, an action may be shooting, picking up an item, or any other action that a user may perform in a game. The client 110B may transmit the detected action to the server 120 (S102). The server 120 may receive the detected action from the client 110B, analyze it, and determine a new state of the game (S103). For example, if the action performed by the user B is shooting another player, the server 120 may determine the result of the shooting as a new state to be displayed to the users in the same game. The server 120 may transmit the determined new state to the client 110A, the client 110B, and the client 110C (S104). Each of the clients 110A, 110B, and 110C may receive the new state and display it on a display (S105).
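The S101 to S105 exchange can be sketched as follows. This is a minimal illustration under assumptions that are not in the disclosure: the game state is reduced to a hypothetical health value per player, a shot is assumed to remove 10 points, and each client's display is modelled as an inbox list. The names `Action`, `GameServer`, and `handle_action` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Action:
    """Hypothetical action record that a client sends to the server (S102)."""
    client_id: str
    kind: str                      # e.g. "shoot" or "pick_up"
    target: Optional[str] = None   # player affected by the action, if any


@dataclass
class GameServer:
    """Hypothetical server-side game session; players are keyed by client ID."""
    health: Dict[str, int]
    inbox: Dict[str, List[dict]] = field(default_factory=dict)

    def handle_action(self, action: Action) -> dict:
        # S103: analyze the action and determine the new game state.
        # The 10-point damage rule is an illustrative assumption.
        if action.kind == "shoot" and action.target in self.health:
            self.health[action.target] = max(0, self.health[action.target] - 10)
        new_state = {"health": dict(self.health), "last_action": action.kind}
        # S104: send the same new state to every client in the session;
        # each client would then render it on its display (S105).
        for client_id in self.health:
            self.inbox.setdefault(client_id, []).append(new_state)
        return new_state


# Usage: user B (client 110B) shoots user A (client 110A).
server = GameServer(health={"110A": 100, "110B": 100, "110C": 100})
state = server.handle_action(Action(client_id="110B", kind="shoot", target="110A"))
```

Because every client receives the same state object, the server resolves each action once rather than once per user.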
FIG. 2 is a diagram illustrating an operation of a network-based gaming system according to an embodiment.
Referring to FIG. 2, a network-based gaming system 200 may include an encoder 230 on the server side and a decoder 250 on the client side. Although the network-based gaming system 200 according to this embodiment is shown with only one encoder and one decoder, the one or more embodiments are not limited thereto. There may be a plurality of servers including a plurality of encoders 230 to encode gaming video content. Also, there may be a plurality of client devices, each including one or more decoders 250 to decode the encoded gaming video content received from one or more servers.
In S202, the encoder 230 may receive gaming content 210 as raw video content. Here, the raw video content may refer to one or more images of the gaming content 210 prior to encoding. The encoder 230 may receive the raw video content of the gaming content 210 and encode it. For example, game software may run on the cloud server side, and the game content may be rendered by a graphics processing unit (GPU) of the server. The rendered game content may be fed into the encoder 230 as raw video content, for example, in YUV format. However, the one or more embodiments are not limited thereto, and the raw video content may be in any other color format, such as RGB.
In S204, the encoder 230 may generate a compressed video bitstream according to a certain video coding format or standard, such as H.264, H.265, AV1, or AVS2. The encoder 230 of the server may transmit the compressed video bitstream to the decoder 250 of the client through the network. The operations of the encoder 230 are described in more detail below with reference to FIG. 3.
In S206, the decoder 250 of the client may receive the compressed video bitstream generated by the encoder 230 and decode it into video content to be displayed. The operations of the decoder 250 are described in more detail below with reference to FIG. 4.
In S208, the decoded video content may be transmitted to a display 270 of the client to be viewed by a user.
FIG. 3 is a flowchart illustrating an operation of an encoding device of a network-based gaming system according to an embodiment.
Referring to FIG. 3, a method of learning-based video coding performed by an encoder is shown. As described above, the encoder may be disposed in the server to encode the video content of a game.
In S310, the method may include training, by one or more processors of the server, a model for determining a coding parameter set. In learning-based video coding, a model may be trained using neural networks on a large amount of diverse training data to determine optimal coding parameters, such as filter coefficients. Here, a parameter set may be a set of parameters used for encoding and/or decoding video content.
In training, the selection of training data may heavily affect the overall performance of a particular video coding scheme. That is, some training data may be more relevant to one type of video content than to another. Accordingly, to train a model efficiently for a certain type of video content, it is necessary to quickly identify the training data most relevant to that type.
To train a model more efficiently, one parameter set may be established for each type of video content. For example, there may be different types of games, e.g., action games, adventure games, role-playing games (RPGs), first-person shooter (FPS) games, sports games, etc. For each game type, one optimal parameter set for encoding and decoding the video content may be learned from training data. For example, a model for determining a parameter set for an action game may be fed with training data that is more relevant to action games. Also, each parameter set may be associated with a parameter identifier (ID), so that a parameter ID can be determined for each type of game. As such, a trained model may be specifically directed to each type of content.
More specifically, a predetermined threshold may be set for an initial model of a certain content type, and the initial model is trained until it reaches the predetermined threshold. That is, various training data may be iteratively fed into the initial model until the parameter set it outputs for encoding the video content matches the test data to within the predetermined threshold. Here, the test data may refer to the data that the model should output once it is fully trained.
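The threshold-driven training loop can be illustrated with a deliberately small stand-in. The disclosure trains neural networks; this sketch instead assumes that the parameter set is a hypothetical 3-tap filter and fits it by plain gradient descent on mean squared error, stopping once the error against the test data falls to the predetermined threshold. The function names, the identity-filter starting point, and the learning rate are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Signal = Sequence[float]


def apply_filter(coeffs: Sequence[float], signal: Signal) -> List[float]:
    """3-tap FIR filter; samples past either edge repeat the edge value."""
    n = len(signal)
    out = []
    for i in range(n):
        taps = [signal[min(max(i + k - 1, 0), n - 1)] for k in range(3)]
        out.append(sum(c * s for c, s in zip(coeffs, taps)))
    return out


def mse(a: Signal, b: Signal) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def train_until_threshold(
    pairs: List[Tuple[Signal, Signal]],   # (reconstructed, test data) pairs
    threshold: float,
    lr: float = 0.5,
    max_iters: int = 5000,
) -> Tuple[List[float], float, int]:
    coeffs = [0.0, 1.0, 0.0]  # initial model: identity filter
    for it in range(max_iters):
        loss = sum(mse(apply_filter(coeffs, x), t) for x, t in pairs) / len(pairs)
        if loss <= threshold:  # predetermined threshold reached: stop training
            return coeffs, loss, it
        # Gradient of the mean loss with respect to each filter tap.
        grad = [0.0, 0.0, 0.0]
        for x, t in pairs:
            n = len(x)
            y = apply_filter(coeffs, x)
            for i in range(n):
                err = y[i] - t[i]
                for k in range(3):
                    tap = x[min(max(i + k - 1, 0), n - 1)]
                    grad[k] += 2.0 * err * tap / (n * len(pairs))
        coeffs = [c - lr * g for c, g in zip(coeffs, grad)]
    raise RuntimeError("model did not reach the threshold within max_iters")


# Usage: the "test data" is produced by a known smoothing filter, so the
# loop should recover coefficients close to [0.25, 0.5, 0.25].
recon = [0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.5, 0.5]
target = apply_filter([0.25, 0.5, 0.25], recon)
coeffs, loss, iters = train_until_threshold([(recon, target)], threshold=1e-8)
```

Training one such model per game type, each on data drawn from that type, corresponds to the per-type parameter sets described above.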
According to an embodiment, one or more models may be trained for each type of video content using sample training data from various video contents. However, the one or more embodiments of the disclosure are not limited thereto, and a single model may instead be trained to generate a plurality of parameter sets to optimize encoding of the video content.
After training, one or more parameter sets may be agreed upon by both the encoder and the decoder for encoding video content and decoding the encoded video content. In an embodiment, when the decoder receives compressed video content, the decoder can decode it using the agreed parameter set, consistent with the learning-based video coding performed by the encoder. For example, when encoding video content, the encoder may use a set of filter coefficients to filter the reconstructed images of the video content. However, a set of filter coefficients suited to the images of one type of video content may not be suitable for the images of another. For example, a set of filter coefficients for coding images of an action game may not be suitable for a sports game. Therefore, the server may determine different parameter sets for different types of video content in order to improve the quality of each type.
In S320, the encoder may receive video content. Video content may include one or more images that are sequentially arranged to constitute a video. Although the video content has been described as game content, the one or more embodiments are not limited thereto, and the video content may include any other type of content, such as movies, dramas, short clips, etc. The video content may also include audio content associated with one or more images of the video content. According to an embodiment, the encoder may receive video content from a graphics processing unit (GPU) that processes and renders the video content. However, the one or more embodiments are not limited thereto, and the encoder may receive video content from a video source, such as a camera.
In S330, the encoder may determine a parameter set for encoding and/or decoding the video content using the trained model. When the video content is received from a GPU or a camera, the encoder may identify the video content based on its type, size, quality, etc. The video content may include header information that identifies its type, size, and quality. For example, the header information may identify the video content as a game with a size of about 10 gigabytes at a resolution of 1280×720 pixels.
Based on the header information, the encoder may determine a parameter set for encoding the video content. The encoder may use the trained model to determine the parameter set to be used for encoding the received video content into compressed video content. For example, a title identifier, a game identifier, and/or a content identifier may be associated with each video content, and the encoder may use such information to select the parameter set to be used for encoding the video content.
Alternatively, the encoder may not use any specific parameter set, or the trained model may be unable to identify a parameter set suitable for the received video content. In such a case, the encoder may use a default parameter set to encode the received video content.
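Step S330, including the default fallback, might look like the sketch below. It replaces the trained model with a plain lookup table from game genre to parameter set, purely as an assumption to keep the example self-contained; the `ContentHeader` fields, the parameter IDs, and the filter coefficient values are hypothetical placeholders, not values from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ParameterSet:
    param_id: int                      # parameter identifier (ID)
    filter_coeffs: Tuple[float, ...]   # e.g. filter taps (placeholder values)


@dataclass(frozen=True)
class ContentHeader:
    """Hypothetical header information carried with the video content."""
    content_type: str                  # e.g. "game"
    game_genre: Optional[str]          # e.g. "action", "sports"; None if unknown
    width: int
    height: int


# Stand-in for the trained model: one parameter set per game type.
DEFAULT_SET = ParameterSet(0, (0.0, 1.0, 0.0))
PARAMETER_SETS: Dict[str, ParameterSet] = {
    "action": ParameterSet(1, (0.1, 0.8, 0.1)),
    "sports": ParameterSet(2, (0.2, 0.6, 0.2)),
    "fps": ParameterSet(3, (0.15, 0.7, 0.15)),
}


def select_parameter_set(
    header: ContentHeader,
    table: Dict[str, ParameterSet] = PARAMETER_SETS,
    default: ParameterSet = DEFAULT_SET,
) -> ParameterSet:
    # S330: use the header to pick the set learned for this content type.
    if header.content_type == "game" and header.game_genre in table:
        return table[header.game_genre]
    # No suitable set identified: fall back to the default parameter set.
    return default
```

For example, a header describing a 1280×720 sports game would select parameter ID 2, while an unrecognized genre would fall through to the default set. In a real system, the lookup would be replaced by inference with the trained model.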
In S340, the encoder may encode the video content according to the selected parameter set. That is, the encoder generates a compressed video bitstream according to a certain video coding format or standard, such as H.264, H.265, AV1, or AVS2. However, the one or more embodiments are not limited thereto, and the video content may be encoded according to other video coding/compression standards, such as ITU-T H.266 (also referred to as Versatile Video Coding (VVC)).
In S350, the parameter set information may be transmitted to the client so that the decoder of the client can recognize the coding scheme of the encoder and decode the compressed video bitstream accordingly. According to an embodiment, a parameter set may be included in metadata that is transmitted to the client separately from the compressed video bitstream. For example, in cloud gaming, the server may derive a parameter set from the training data of a particular game content and transmit it to the client (that is, the end user) over the network as ancillary information, which may take the form of system metadata, a plug-in, etc. The user may then download the parameter set information before starting a game or decoding the video content. However, the one or more embodiments are not limited thereto, and the parameter set information may be added to the header information of the compressed video content and transmitted to the client together with the compressed video content.
In S360, the encoded video bitstream may be transmitted to the client. Here, the encoded video bitstream may be transmitted separately from the parameter set, either simultaneously with it or at a different time. For example, the parameter set may be transmitted before the encoded video bitstream so that the decoder can recognize the parameter set used for encoding before receiving the bitstream. Alternatively, the encoded video bitstream may be transmitted to the client before the parameter set.
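The out-of-band delivery of S350 and S360 can be sketched as two kinds of messages on one channel. The message layout, the JSON encoding of the parameter set, and the `params_first` switch are assumptions for illustration; the disclosure does not specify a wire format.

```python
import json
from typing import Dict, List, Sequence


def parameter_set_message(param_id: int, coeffs: Sequence[float]) -> Dict[str, object]:
    # S350: the parameter set travels as ancillary metadata, separate from the bitstream.
    payload = json.dumps({"param_id": param_id, "filter_coeffs": list(coeffs)})
    return {"type": "param_set", "payload": payload}


def bitstream_message(chunk: bytes) -> Dict[str, object]:
    # S360: one chunk of the compressed video bitstream.
    return {"type": "bitstream", "payload": chunk}


def schedule(
    param_msg: Dict[str, object],
    chunks: List[bytes],
    params_first: bool = True,
) -> List[Dict[str, object]]:
    """Order the messages: parameter set before the bitstream (so the decoder
    knows it in advance) or after it; both orders are permitted above."""
    video = [bitstream_message(c) for c in chunks]
    return [param_msg, *video] if params_first else [*video, param_msg]


msgs = schedule(parameter_set_message(2, (0.2, 0.6, 0.2)), [b"\x00\x01", b"\x02"])
```

The in-band alternative of S350, in which the parameter set is carried in the header of the compressed video content, would instead attach the same payload to the first bitstream message.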
FIG. 4 is a flowchart illustrating an operation of a decoding device of a network-based gaming system according to an embodiment.
Referring to FIG. 4, a method of learning-based video decoding performed by a decoder is shown. As described above, the decoder may be disposed in the client device to decode the video content received from the server.
In S410, the decoder may receive a video bitstream from the server. Here, the video bitstream may refer to the video content encoded by the encoder of the server.
In S420, the decoder may determine whether a parameter set has been received from the server. That is, when the decoder receives one or more parameter sets from the server, the decoder may determine that a parameter set has been received. According to an embodiment, the decoder may receive the one or more parameter sets separately from the video bitstream. For example, the decoder may receive the one or more parameter sets from the server before receiving the video bitstream. However, the one or more embodiments are not limited thereto, and the decoder may receive the video bitstream before receiving the one or more parameter sets.
In S430, when at least one parameter set has been received from the server (S420: Yes), the decoder may decode the video bitstream according to the received parameter set. According to an embodiment, multiple parameter sets may be received by the decoder. For example, the encoder may encode video content using multiple parameter sets, and those parameter sets may be sent to the decoder. In such a case, the decoder may select one or more of the received parameter sets and decode the video content according to the selected parameter sets. Alternatively, the encoder may determine which of the multiple parameter sets are to be used by the decoder and send them to the decoder. When either the encoder or the decoder selects one or more parameter sets, a title identifier, a game identifier, and/or a content identifier may be used, for example. If the video content to be decoded is a sports game, as indicated by the game identifier, the decoder may select the parameter set most suitable for the sports game according to a trained model. Here, the trained model may be the same trained model used by the encoder to encode the video content.
In S440, when the decoder determines that no parameter set has been received from the server (S420: No), the decoder may further determine whether to use a default parameter set to decode the received video bitstream. A default parameter set may be a predetermined parameter set that can be used to decode various types of video content. The default parameter set may be predefined by a user. As described above with respect to the encoder, the decoder may decode the compressed video bitstream according to a certain video coding format or standard, such as H.264, H.265, H.266, AV1, or AVS2.
In sequence-level syntax signaling, an indication flag may be used to indicate whether the default parameter set should be used or whether another parameter set can be obtained to replace it. When it is determined that an alternative parameter set can be used to decode the received video bitstream (S440: No), the decoder may query the server to obtain another parameter set. However, the one or more embodiments are not limited thereto, and the decoder may obtain another parameter set stored in the client device, or from any other server capable of providing a parameter set for decoding the received video bitstream.
In S450, when the decoder determines, according to the indication flag, that the default parameter set should be used (S440: Yes), the decoder uses the default parameter set to decode the received video bitstream.
In S460, the decoder transmits the decoded video content to a display of the client device.
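The decoder-side decision of S420 to S450 can be summarized in a small function. Matching on a game identifier stands in for the trained model mentioned for S430, and falling back to the default set when no alternative can be obtained is an added assumption; `choose_parameter_set`, `game_ids`, and `fetch_alternative` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ParameterSet:
    param_id: int
    game_ids: FrozenSet[str]  # game identifiers the set was trained for (assumed)


def choose_parameter_set(
    received: Sequence[ParameterSet],
    game_id: str,
    use_default_flag: bool,           # sequence-level indication flag
    default: ParameterSet,
    fetch_alternative: Callable[[str], Optional[ParameterSet]],
) -> Tuple[ParameterSet, str]:
    # S420: has at least one parameter set been received from the server?
    if received:
        # S430: among several received sets, prefer one matching the game identifier.
        for ps in received:
            if game_id in ps.game_ids:
                return ps, "received"
        return received[0], "received"
    # S440: Yes -> S450: the flag says to use the default parameter set.
    if use_default_flag:
        return default, "default"
    # S440: No -> obtain an alternative (server, local storage, or another server).
    alt = fetch_alternative(game_id)
    if alt is not None:
        return alt, "alternative"
    return default, "default"  # assumption: fall back if nothing can be obtained
```

The chosen set would then be handed to the actual video decoder, and the decoded frames sent to the display (S460).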
FIG. 5 is a block diagram illustrating an encoding device of a network-based gaming system according to an embodiment. An encoding device 500 may include at least one processor and at least one memory storing computer program code. The at least one processor may access the at least one memory and operate as instructed by the computer program code.
Referring to FIG. 5, the encoding device 500 may include training code 510, obtaining code 520, determining code 530, encoding code 540, and transmitting code 550. It should be understood that some of the features or functions of the components of the encoding device of FIG. 5 are described above with reference to FIG. 3. Therefore, repeated descriptions thereof will be omitted.
The training code 510 may be configured to train a model for determining a coding parameter set. The model may be trained using neural networks on a large amount of diverse training data to determine optimal coding parameters, such as filter coefficients. The training code 510 may perform the operations of S310 described above with reference to FIG. 3.
The obtaining code 520 may be configured to receive video content from a GPU that processes and renders video content. The obtaining code 520 may perform the operations of S320 described above with reference to FIG. 3.
The determining code 530 may be configured to determine a parameter set for encoding and/or decoding video content using the trained model. The determining code 530 may perform the operations of S330 described above with reference to FIG. 3.
The encoding code 540 may be configured to encode the video content according to the selected parameter set. The encoding code 540 may perform the operations of S340 described above with reference to FIG. 3.
The transmitting code 550 may be configured to transmit the parameter set information and the encoded video content to the client device. The transmitting code 550 may perform the operations of S350 and S360 described above with reference to FIG. 3.
FIG. 6 is a block diagram illustrating a decoding device of a network-based gaming system according to an embodiment. A decoding device 600 may include at least one processor and at least one memory storing computer program code. The at least one processor may access the at least one memory and operate as instructed by the computer program code.
Referring to FIG. 6, the decoding device 600 may include receiving code 610, first determining code 620, second determining code 630, decoding code 640, and transmitting code 650. It should be understood that some of the features or functions of the components of the decoding device of FIG. 6 are described above with reference to FIG. 4. Therefore, repeated descriptions thereof will be omitted.
The receiving code 610 may be configured to receive a video bitstream from the server. The video bitstream may refer to the video content encoded by the encoder of the server. The receiving code 610 may perform the operations of S410 described above with reference to FIG. 4.
The first determining code 620 may be configured to determine whether a parameter set has been received from the server. The first determining code 620 may perform the operations of S420 described above with reference to FIG. 4.
The second determining code 630 may be configured to determine whether the decoder should use a default parameter set to decode the received video bitstream, based on the first determining code 620 determining that no parameter set has been received from the server. The second determining code 630 may perform the operations of S440 described above with reference to FIG. 4.
The decoding code 640 may be configured to decode the video bitstream according to the determinations of the first determining code 620 and the second determining code 630. That is, when at least one parameter set has been received from the server, the decoding code 640 may decode the video bitstream according to the received parameter set. Alternatively, when the first determining code 620 determines that no parameter set has been received from the server and the second determining code 630 determines that the default parameter set should be used for decoding the video bitstream, the decoding code 640 may decode the video bitstream according to the default parameter set. The decoding code 640 may perform the operations of S430 and S450 described above with reference to FIG. 4.
The transmitting code 650 may be configured to transmit the decoded video content to a display of the client device.
FIG. 7 is a diagram illustrating an example of a device of a network-based gaming system according to an embodiment. A device 700 may be implemented in the encoding device 500 and the decoding device 600 of a network-based gaming system.
Referring to FIG. 7, the device 700 may include a processor 710, a memory 720, a storage 730, an input interface 740, an output interface 750, a communication interface 760, and a bus 770.
The processor 710 is implemented in hardware, firmware, or a combination of hardware and software. The processor 710 is a central processing unit (CPU), a graphics processing unit (GPU), an accelerated processing unit (APU), a microprocessor, a microcontroller, a digital signal processor (DSP), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or another type of processing component. In some implementations, the processor 710 includes one or more processors capable of being programmed to perform a function.
The memory 720 includes a random access memory (RAM), a read only memory (ROM), and/or another type of dynamic or static storage device (e.g., a flash memory, a magnetic memory, and/or an optical memory) that stores information and/or instructions for use by the processor 710.
The storage 730 stores information and/or software related to the operation and use of the device 700. For example, the storage 730 may include a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, a magnetic tape, and/or another type of non-transitory computer-readable medium, along with a corresponding drive.
The input interface 740 includes a component that permits the device 700 to receive information, such as via user input (e.g., a touch screen display, a keyboard, a keypad, a mouse, a button, a switch, and/or a microphone). Additionally, or alternatively, the input interface 740 may include a sensor for sensing information (e.g., a global positioning system (GPS) component, an accelerometer, a gyroscope, and/or an actuator). The output interface 750 includes a component that provides output information from the device 700 (e.g., a display, a speaker, and/or one or more light-emitting diodes (LEDs)).
The communication interface 760 includes a transceiver-like component (e.g., a transceiver and/or a separate receiver and transmitter) that enables the device 700 to communicate with other devices, such as via a wired connection, a wireless connection, or a combination of wired and wireless connections. The communication interface 760 may permit the device 700 to receive information from another device and/or provide information to another device. For example, the communication interface 760 may include an Ethernet interface, an optical interface, a coaxial interface, an infrared interface, a radio frequency (RF) interface, a universal serial bus (USB) interface, a Wi-Fi interface, a cellular network interface, or the like.
The bus 770 includes a component that permits communication among the components of the device 700.
The device 700 may perform one or more operations described herein. The device 700 may perform the operations described above in response to the processor 710 executing software instructions stored in a non-transitory computer-readable medium, such as the memory 720 and/or the storage 730. A computer-readable medium is defined herein as a non-transitory memory device. A memory device includes memory space within a single physical storage device or memory space spread across multiple physical storage devices.
Software instructions may be read into the memory 720 and/or the storage 730 from another computer-readable medium or from another device via the communication interface 760. When executed, software instructions stored in the memory 720 and/or the storage 730 may cause the processor 710 to perform one or more processes described herein. Additionally, or alternatively, hardwired circuitry may be used in place of or in combination with software instructions to perform one or more processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
The number and arrangement of components shown in FIG. 7 are provided as an example. In practice, the device 700 may include additional components, fewer components, different components, or differently arranged components than those shown in FIG. 7. Additionally, or alternatively, a set of components (e.g., one or more components) of the device 700 may perform one or more functions described as being performed by another set of components of the device 700.
Some of the embodiments of the disclosure have been shown and described above. However, the one or more embodiments of the disclosure are not limited to the aforementioned specific embodiments. It may be understood that various modifications, substitutions, improvements and equivalents thereof can be made without departing from the spirit and scope of the disclosure. It should be understood that such modifications, substitutions, improvements and equivalents thereof shall fall within the protection scope of the disclosure, and should not be construed independently of the inventive concept or prospect of the disclosure.
Claims
- A method of performing video coding in a network-based system, performed by an encoding device, the method comprising: obtaining a video content; determining an encoding parameter set for encoding the video content and a decoding parameter set for decoding the video content among multiple parameter sets, wherein the multiple parameter sets for encoding the video content and decoding the video content are generated using a trained parameter set model, wherein the encoding parameter set for encoding the video content is selected specifically for an encoding instance and is based on a training data of a content type of the video content, wherein the decoding parameter set for decoding the video content is selected specifically for a decoding instance and is based on the training data of the content type of the video content, and wherein the content type of the video content includes a plurality of game types comprising at least one of a game genre and a game title; encoding the video content according to the encoding parameter set; transmitting the encoded video content to a decoding device; transmitting an indicator flag indicating to a decoder that the decoding parameter set is to be used for decoding the video content instead of a default parameter set; and transmitting the encoding parameter set to the decoding device, wherein the transmitting of the encoding parameter set is distinct from the transmitting of the encoded video content.
- The method of claim 1, wherein a parameter set comprises a set of filter coefficients to perform filtering on one or more images of the video content.
- The method of claim 1, wherein at least one parameter set is determined for each of the plurality of game types.
- The method of claim 1, wherein the default parameter set is transmitted to the decoding device prior to the encoded video content, or wherein more than one default parameter set is transmitted to the decoding device prior to the encoded video content.
- The method of claim 1, wherein the determined parameter set is transmitted to the decoding device as metadata.
- A non-transitory computer-readable storage medium storing computer program code, the computer program code, when executed by at least one processor, being configured to perform a method of claim
- An encoding device comprising: at least one memory storing a computer program code; and at least one processor configured to access the at least one memory and operate as instructed by the computer program code, the computer program code comprising: obtaining code configured to cause the at least one processor to obtain a video content; determining code configured to cause the at least one processor to determine an encoding parameter set for encoding the video content and a decoding parameter set for decoding the video content among multiple parameter sets, wherein the multiple parameter sets for encoding the video content and decoding the video content are generated using a trained parameter set model, wherein the encoding parameter set for encoding the video content is selected specifically for an encoding instance and is based on a training data of a content type of the video content, wherein the decoding parameter set for decoding the video content is selected specifically for a decoding instance and is based on the training data of the content type of the video content, and wherein the content type of the video content includes a plurality of game types comprising at least one of a game genre and a game title; encoding code configured to cause the at least one processor to encode the video content according to the encoding parameter set; first transmitting code configured to cause the at least one processor to control a communication interface to transmit the encoded video content to a decoding device; second transmitting code configured to cause the at least one processor to control the communication interface to transmit an indicator flag indicating to a decoder that the decoding parameter set is to be used for decoding the video content instead of a default parameter set; and third transmitting code configured to cause the at least one processor to control the communication interface to transmit the encoding parameter set to the decoding device, wherein the transmitting of the encoding parameter set is distinct from the transmitting of the encoded video content.
- The encoding device of claim 7, wherein a parameter set comprises a set of filter coefficients to perform filtering on one or more images of the video content.
- The encoding device of claim 7, wherein at least one parameter set is determined for each of the plurality of game types.
- The encoding device of claim 7, wherein the determined parameter set is transmitted to the decoding device as metadata.
- The encoding device of claim 7, wherein the default parameter set is transmitted to the decoding device prior to the encoded video content, or wherein more than one default parameter set is transmitted to the decoding device prior to the encoded video content.
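The selection and signaling sequence recited in claims 1, 4, 5 and 7 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the trained parameter set model is replaced by a hypothetical lookup table `TRAINED_TABLE` keyed by content type, the actual video encoding is a placeholder, and the `Message` type and its kinds (`"default"`, `"video"`, `"flag"`, `"metadata"`) are invented for illustration. The claims do not say how the decoder obtains the decoding parameter set, so the sketch assumes it travels in the same metadata message as the encoding parameter set.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ParameterSet:
    set_id: int
    coefficients: Tuple[float, ...]  # e.g. filter taps, as in claim 2


# Hypothetical stand-in for the trained parameter set model: content type
# (game genre or title) -> (encoding parameter set, decoding parameter set).
TRAINED_TABLE: Dict[str, Tuple[ParameterSet, ParameterSet]] = {
    "racing": (ParameterSet(1, (0.25, 0.5, 0.25)), ParameterSet(2, (0.2, 0.6, 0.2))),
    "strategy": (ParameterSet(3, (0.0, 1.0, 0.0)), ParameterSet(4, (0.1, 0.8, 0.1))),
}
DEFAULT_SET = ParameterSet(0, (0.0, 1.0, 0.0))  # identity filter


@dataclass
class Message:
    kind: str  # hypothetical kinds: "default", "video", "flag", "metadata"
    payload: object


def encoder_send(frames: List[List[float]], content_type: str) -> List[Message]:
    # The default parameter set is sent before any video (cf. claim 4).
    msgs = [Message("default", DEFAULT_SET)]
    selected = TRAINED_TABLE.get(content_type)
    if selected is None:
        # Unknown content type: no flag is sent, so the decoder keeps the default.
        msgs.append(Message("video", frames))
        return msgs
    enc_set, dec_set = selected
    encoded = frames  # placeholder: a real encoder would apply enc_set here
    msgs.append(Message("video", encoded))
    # Flag: use the signaled decoding parameter set instead of the default.
    msgs.append(Message("flag", True))
    # Parameter sets go out as metadata, separate from the video (cf. claims 1 and 5).
    msgs.append(Message("metadata", {"encoding": enc_set, "decoding": dec_set}))
    return msgs


def decoder_select(msgs: List[Message]) -> ParameterSet:
    default: Optional[ParameterSet] = None
    signaled: Optional[ParameterSet] = None
    use_signaled = False
    for m in msgs:
        if m.kind == "default":
            default = m.payload
        elif m.kind == "flag":
            use_signaled = bool(m.payload)
        elif m.kind == "metadata":
            signaled = m.payload["decoding"]
    if use_signaled and signaled is not None:
        return signaled
    if default is None:
        raise ValueError("no default parameter set was received")
    return default


print(decoder_select(encoder_send([[1.0, 2.0]], "racing")).set_id)  # 2
print(decoder_select(encoder_send([[1.0]], "puzzle")).set_id)  # 0
```

Keeping the flag as a separate message mirrors the claim language, where the indicator flag, the encoded video and the parameter set are each transmitted distinctly; a content type the table does not know simply produces no flag, so the decoder falls back to the default set.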
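Claims 2 and 8 state only that a parameter set may comprise filter coefficients used to filter one or more images of the video content; they do not fix the shape of the filter. The sketch below assumes, purely for illustration, a 1-D FIR filter with an odd number of taps applied horizontally along each row, with edge pixels replicated at the borders. Practical codecs use other arrangements (for example 2-D or adaptive in-loop filters), so `filter_rows` should be read as a hypothetical helper rather than the filter the patent contemplates.

```python
from typing import List, Sequence


def filter_rows(image: List[List[float]], coeffs: Sequence[float]) -> List[List[float]]:
    """Apply a 1-D FIR filter with the given taps along each image row.

    Assumes an odd number of taps centered on the current pixel and
    replicates edge pixels at the left and right borders.
    """
    if len(coeffs) % 2 == 0:
        raise ValueError("expected an odd number of filter taps")
    half = len(coeffs) // 2
    out: List[List[float]] = []
    for row in image:
        width = len(row)
        new_row: List[float] = []
        for x in range(width):
            acc = 0.0
            for k, c in enumerate(coeffs):
                xi = min(max(x + k - half, 0), width - 1)  # clamp to the row edges
                acc += c * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


print(filter_rows([[0.0, 4.0, 0.0]], (0.25, 0.5, 0.25)))  # [[1.0, 2.0, 1.0]]
```

With the taps `(0.25, 0.5, 0.25)` the filter smooths an isolated peak; with `(0.0, 1.0, 0.0)` it leaves the image unchanged, which is one plausible choice for a default parameter set.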