U.S. Pat. No. 12,128,299

TIMING COMPENSATION AND CORRELATION OF INPUT WITH FRAMES IN A VIDEO GAME APPLICATION

Assignee: GOOGLE LLC

Issue Date: September 2, 2021

Abstract

A server executing an application generates a frame token for a frame that is rendered for the application. One or more first metric messages are provided to the application in response to at least one first operation performed by the server on the frame. The first metric messages include the frame token and information indicating timing of the at least one first operation. Encoded information representing the frame token and the frame is transmitted from the server towards a client. One or more second metric messages are provided to the application in response to one or more second operations performed by the client on the frame. The one or more second metric messages include the frame token and information indicating timing of the second operations. A state of the application is modified based on the first and second metric messages.

Description

DETAILED DESCRIPTION

Streaming applications such as games from a central server, such as a cloud server, has the potential to alleviate difficulties associated with the acquisition and storage of program code and the like. Dramatic improvements in bandwidth and reductions in network latency make it possible to implement games as streaming applications provided by a server to a client in the player's location. The game server executes game code that implements a physics engine, renders graphics and audio frames, and controls other aspects of the game state, as well as performing modifications of the game state in response to input provided by the player. The server transmits encoded video and audio frames to the client, which is implemented in a local processing device. The client decodes the frames and provides the video (via a display) and the audio (via speakers or earphones) to the player. The player provides input to the client using a game controller or other devices such as visual or infrared sensors that capture and analyze the player's motions. As used herein, the term “game controller” refers to any device that allows the player to generate input to the video game application responsive to video, audio, haptic, or other sensory output generated by the client. For example, the player uses the various buttons on the game controller to cause a character in the game to perform actions in response to the video or audio frames presented to the player.

Despite ongoing improvements in network bandwidth and latency, there is still significant (and sometimes variable) end-to-end latency from the application that generates the video/audio to the display/speakers that represent the frames. The end-to-end latency can introduce difficulties correlating game events generated by the application with input provided by the player. Similar drawbacks occur in other applications that require synchronization of user response and sensory output generated by the application.

FIGS. 1-6 disclose systems and techniques for gathering real time network information associated with a video game application from the server that hosts the video game application and the client that serves video or audio frames to the player. The video game application (hereafter referred to as “application” or “software application,” for ease of reference) uses the network information to compensate for end-to-end latency and to correlate input received from the player with frames that were presented to the player concurrently with the player generating the input. The server generates a frame token for a frame that is rendered for the application, such as a video frame or an audio frame. The frame token includes information (such as a frame identification number) that associates the frame token with the frame. In some embodiments, the server generates the frame token in response to a request from the application. The frame token is processed by the server and the client in conjunction with the frame. The server and the client generate timing information that indicates when the frame was processed at different points within the server and the client. In some embodiments, the server generates information indicating a first time that the frame was dispatched for encoding, a second time when encoding of the frame was completed, and a third time when the frame was transmitted from the server towards the client. In some embodiments, the client generates a fourth time when the encoded frame was received from the server, a fifth time when the client successfully decoded the frame, and a sixth time when the client presented the frame to the player via a display. However, more or fewer times are recorded at more or fewer processing points in other embodiments.

The server and the client return metric messages including the timing information and the frame token to the application. The server returns one or more metric messages including the recorded times and the associated frame token to the application. The client returns one or more metric messages including the recorded times and the associated frame token to the server, which provides the metric messages to the application. In some embodiments, the recorded times and associated frame tokens are batched over more than one frame and the metric message includes a set of recorded times/tokens for the frames in the batch. Inputs from the player are correlated with the frame token that is associated with the video or audio frame that was presented to the player concurrently with the input received from the player. Metric messages including information representing the inputs are returned to the server with the correlated frame tokens. The application uses the metric messages to compensate for end-to-end latency. For example, the application can determine that the user provided input while viewing a video frame (or hearing an audio frame) even if there is a long, variable, or unexpected delay between the application generating the frame and subsequently receiving the input. For another example, the application can modify the size of the buffer that stores messages received from the client to increase or decrease the time available for receiving inputs from the player. Some embodiments of the application roll back the game state in response to one or more of the metric messages. For example, if the application receives an input associated with a frame token that should have been received 50 ms prior to the current game time, the application rolls back the game state by 50 ms and processes the input based on the game state at the earlier game time.
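The 50 ms rollback example can be illustrated with a minimal Python sketch. It assumes the application keeps timestamped snapshots of its state; the `StateHistory` class, the snapshot interval, and the `tick` field are hypothetical names chosen for illustration and are not taken from the description.

```python
import bisect


class StateHistory:
    """Keeps timestamped snapshots of a (hypothetical) game state dict."""

    def __init__(self):
        self._times_ms = []  # snapshot times, strictly increasing
        self._states = []    # snapshot stored at the matching index

    def save(self, game_time_ms, state):
        # Assumption: snapshots are saved in increasing time order.
        if self._times_ms and game_time_ms <= self._times_ms[-1]:
            raise ValueError("snapshots must be saved in increasing time order")
        self._times_ms.append(game_time_ms)
        self._states.append(dict(state))  # copy so later mutation is harmless

    def roll_back(self, current_time_ms, delay_ms):
        """Return (time, state) of the latest snapshot taken at or before
        current_time_ms - delay_ms."""
        target = current_time_ms - delay_ms
        i = bisect.bisect_right(self._times_ms, target) - 1
        if i < 0:
            raise LookupError("no snapshot old enough to roll back to")
        return self._times_ms[i], dict(self._states[i])


history = StateHistory()
for t in range(0, 101, 10):  # snapshots at 0, 10, ..., 100 ms
    history.save(t, {"tick": t // 10})

# An input tagged with a frame token should have arrived 50 ms ago.
rolled_time, rolled_state = history.roll_back(current_time_ms=100, delay_ms=50)
```

After rolling back, an application following this sketch would process the late input against `rolled_state` and then re-simulate forward to the current game time; that re-simulation step is omitted here.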

FIG. 1 is a block diagram of a cloud-based processing system 100 that supports timing compensation and correlation of inputs with frames based on frame tokens according to some embodiments. The cloud-based processing system 100 includes a server 105 that is interconnected with a network 110. Although a single server 105 is shown in FIG. 1, some embodiments of the cloud-based processing system 100 include more than one server connected to the network 110. In the illustrated embodiment, the server 105 includes a transceiver 115 (or other network interface) that transmits signals towards the network 110 and receives signals from the network 110. The transceiver 115 can be implemented using one or more separate transmitters and receivers. The server 105 also includes one or more processors 120 and one or more memories 125. The processor 120 executes instructions such as program code stored in the memory 125 and the processor 120 stores information in the memory 125 such as the results of the executed instructions.

The cloud-based processing system 100 includes one or more processing devices 130 such as a computer, set-top box, gaming console, and the like that are connected to the server 105 via the network 110. In the illustrated embodiment, the processing device 130 operates as a client to the server 105. The processing device 130 includes a transceiver 135 (or other network interface) that transmits signals towards the network 110 and receives signals from the network 110. The transceiver 135 can be implemented using one or more separate transmitters and receivers. The processing device 130 also includes one or more processors 140 and one or more memories 145. The processor 140 executes instructions such as program code stored in the memory 145 and the processor 140 stores information in the memory 145 such as the results of the executed instructions. The transceiver 135 is connected to a display 150 that displays images or video on a screen 155, one or more speakers 160 that generate audio output, and a game controller 165. Some embodiments of the cloud-based processing system 100 are therefore used by cloud-based streaming applications including video game applications.

Program code representing an application 170 is stored in the memory 125 on the server 105. Some embodiments of the application 170 support a streaming application such as a video game application that is played by one or more players at the processing device 130, which operates as a client for the application service. The application 170 generates frames including frames that represent images, video, audio, haptic output, or other sensory output that is generated at the processing device 130 based on the frames received from the server 105. In the illustrated embodiment, the processor 120 in the server 105 generates frame tokens for the frames that are rendered for the application 170. As discussed herein, the server 105 generates metric messages in response to operations performed on the frames by the server 105. The metric messages include the corresponding frame token and information indicating timing of the operations, such as a timestamp that records the time that the operation was performed. The server 105 also encodes the frame and the frame token and the transceiver 115 transmits the encoded information to the client processing device 130.

The processing device 130 also generates metric messages that are returned to the server 105, which provides the received metric messages to the application 170. The metric messages are generated in response to operations performed by the processing device 130 on the frames received from the server 105. The metric messages include the frame token and information indicating timing of the operations, such as a timestamp indicating a time at which the operation was performed. Some embodiments of the processing device 130 generate additional messages in response to input provided by a player, e.g., via the controller 165. The messages that are generated in response to the input include information representing the frame token for the frame that was being presented concurrently with the input being provided by the player. The messages also include information representing the input.

A state of the application 170 or the server 105 is modified based on the metric messages generated by the server 105 or the client processing device 130. Some embodiments of the application 170 use the metric messages to compensate for end-to-end latency from generation of a frame by the application 170 to presentation of the frame by the display 150 or the speaker 160. The application 170 can also roll back the game state (e.g., the state of the transceiver 115, the processor 120, or the memory 125 in the server 105) in response to the metric messages. For example, if the application 170 receives an input associated with a frame token that should have been received 50 ms prior to the current game time, the application 170 rolls back the game state by 50 ms and processes the input based on the game state at the earlier game time. Some embodiments of the application 170 use the metric messages provided by the processing device 130 to correlate player input with the frame that was being presented (or used to generate sensory output) concurrently with the player generating the input.

FIG. 2 is a block diagram of a processing system 200 that supports timing compensation and correlation of inputs with frames based on frame tokens according to some embodiments. The processing system 200 includes or has access to a memory 205 or other storage component that is implemented using a non-transitory computer readable medium such as a dynamic random-access memory (DRAM). However, some embodiments of the memory 205 are implemented using other types of memory including static RAM (SRAM), nonvolatile RAM, and the like. The processing system 200 also includes a bus 210 to support communication between entities implemented in the processing system 200, such as the memory 205. Some embodiments of the processing system 200 include other buses, bridges, switches, routers, and the like, which are not shown in FIG. 2 in the interest of clarity.

The processing system 200 includes a central processing unit (CPU) 215. Some embodiments of the CPU 215 include multiple processing elements (not shown in FIG. 2 in the interest of clarity) that execute instructions concurrently or in parallel. The processing elements are referred to as processor cores, compute units, or using other terms. The CPU 215 is connected to the bus 210 and the CPU 215 communicates with the memory 205 via the bus 210 (or other network interface). The CPU 215 executes instructions such as program code 220 stored in the memory 205 and the CPU 215 stores information in the memory 205 such as the results of the executed instructions. The CPU 215 is also able to initiate graphics processing by issuing draw calls.

An input/output (I/O) engine 225 handles input or output operations associated with a display 230 that presents images or video on a screen 235. The I/O engine 225 also handles input or output operations associated with one or more speakers 240, which can also include or be associated with corresponding microphones (not shown in FIG. 2 in the interest of clarity). In the illustrated embodiment, the I/O engine 225 is connected to a game controller 245 which provides control signals to the I/O engine 225 in response to a user pressing one or more buttons on the game controller 245 or interacting with the game controller 245 in other ways, e.g., using motions that are detected by an accelerometer. The I/O engine 225 also provides signals (via the bus 210 or other network interface) to the game controller 245 to trigger responses in the game controller 245 such as vibrations, illuminating lights, and other haptic or sensory outputs. In the illustrated embodiment, the I/O engine 225 reads information stored on an external storage component 250, which is implemented using a non-transitory computer readable medium such as a compact disk (CD), a digital video disc (DVD), and the like. The I/O engine 225 also writes information to the external storage component 250, such as the results of processing by the CPU 215. Some embodiments of the I/O engine 225 are coupled to other elements of the processing system 200 such as keyboards, mice, printers, external disks, and the like. The I/O engine 225 is coupled to the bus 210 (or other network interface) so that the I/O engine 225 communicates with the memory 205, the CPU 215, or other entities that are connected to the bus 210.

The processing system 200 includes at least one graphics processing unit (GPU) 255 that renders images for presentation on the screen 235 of the display 230, e.g., by controlling pixels that make up the screen 235. For example, the GPU 255 renders visual content to produce values of pixels that are provided to the display 230, which uses the pixel values to display an image that represents the rendered visual content. The GPU 255 includes one or more processing elements such as an array 260 of compute units that execute instructions concurrently or in parallel. Some embodiments of the GPU 255 are used for general purpose computing. In the illustrated embodiment, the GPU 255 communicates with the memory 205 (and other entities that are connected to the bus 210) over the bus 210. However, some embodiments of the GPU 255 communicate with the memory 205 over a direct connection or via other buses, bridges, switches, routers, or other network interface. The GPU 255 executes instructions stored in the memory 205 and the GPU 255 stores information in the memory 205 such as the results of the executed instructions. For example, the memory 205 stores a copy 265 of instructions that represent a program code that is to be executed by the GPU 255.

As discussed herein, some embodiments of the CPU 215 or the GPU 255 generate frame tokens associated with frames that are used to generate sensory output received by a user or player of the processing system 200. Entities within the processing system 200 generate feedback in response to performing operations on the frames and provide the feedback with the corresponding frame tokens to applications executing on the CPU 215 or the GPU 255. The applications use the feedback to compensate for latency within the processing system 200, as discussed herein. In some embodiments, information 270 representing the feedback is stored in the memory 205.

FIG. 3 is a block diagram of a processing system 300 that includes a server 305 that supports an application 310 that provides streaming services, such as a video game application, to a client 315 via a network 320 according to some embodiments. The processing system 300 is used to implement some embodiments of the cloud-based processing system 100 shown in FIG. 1 and the processing system 200 shown in FIG. 2. The application 310 generates frames that are provided to the client 315 to generate video, audio, haptic, or other sensory output that are provided to a user at the client, such as a player of a video game application implemented by the application 310. In the illustrated embodiment, the application 310 requests that the server 305 and the client 315 generate feedback based on frame tokens created by the server 305 and transmitted with the frames generated by the application 310. For example, in response to the application 310 generating a frame 325, the server 305 (or a processor implemented on the server 305 such as the processor 120 shown in FIG. 1) generates a frame token 330 such as an identification number that identifies the frame 325. The frame token 330 and the frame 325 are subsequently transmitted together (or in conjunction with each other) along a path from the application 310 to presentation by the client 315, e.g., on a display 335.

The frame 325 and the frame token 330 are encoded for transmission using an encoder 340 implemented by (or accessible to) the server 305. The encoder 340 is implemented using a processor such as the processor 120 shown in FIG. 1, the CPU 215 or the GPU 255 shown in FIG. 2, dedicated encoding hardware, or a combination thereof. The encoder 340 generates encoded information representative of the frame 325 and the frame token 330, which is represented as an encoded frame 350 and an encoded frame token 355. In some embodiments, the server 305 includes a buffer 345 that is used to buffer the frame 325 or the frame token 330 prior to transmission. The buffer 345 is used to buffer packets prior to encoding by the encoder 340. However, the buffer 345 (or other buffers) can buffer packets at other locations such as buffering the encoded frame 350 or the encoded frame token 355 after encoding by the encoder 340 and prior to transmission from the server 305.

The encoded frame 350 and the encoded frame token 355 are transmitted from the server 305 to the client 315 via the network 320. The client 315 includes a decoder 360 that decodes the encoded frame 350 and the encoded frame token 355. The decoder 360 is implemented using a processor such as the processor 140 shown in FIG. 1, the CPU 215 or the GPU 255 shown in FIG. 2, dedicated encoding hardware, or a combination thereof. If the decoding process is successful, the decoder 360 generates the frame 325 and the frame token 330 that were generated by the application 310. In response to successful decoding, the client 315 provides the frame 325 to the display 335 for presentation to the user or player. As discussed herein, some embodiments of the frame 325 represent other sensory output such as audio output or haptic output, in which case the client 315 uses the frame 325 to provide the corresponding sensory output to the user or player.
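One way to picture a frame token travelling in conjunction with its frame is a small header prepended to the frame payload. The Python sketch below is only an illustration: the 12-byte header layout, `pack_frame`, and `unpack_frame` are assumptions made for this example, the description does not specify a wire format, and no actual video or audio encoding is performed (the payload bytes stand in for an already encoded frame).

```python
import struct

# Hypothetical header: 8-byte unsigned frame id followed by a 4-byte unsigned
# payload length, both in network (big-endian) byte order.
HEADER = struct.Struct("!QI")


def pack_frame(frame_id, payload):
    """Prepend the frame token (here just an id) to the encoded frame bytes."""
    return HEADER.pack(frame_id, len(payload)) + payload


def unpack_frame(packet):
    """Recover (frame_id, payload) on the receiving side, or raise ValueError."""
    if len(packet) < HEADER.size:
        raise ValueError("packet shorter than header")
    frame_id, length = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return frame_id, payload


packet = pack_frame(325, b"encoded-frame-bytes")
frame_id, payload = unpack_frame(packet)
```

Keeping the token in a fixed-size header means the receiver can read the identifier before, and independently of, decoding the frame itself.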

In some embodiments, the user or player generates an input 365 in response to the sensory output produced by the client 315 based on the frame 325. For example, if the frame 325 represents an image of a mole poking his head out of a hole in a game of “whack-a-mole,” the player may provide an input 365 via a controller 370 that indicates an attempt by the player to “whack” the mole. Information representative of the input 365 is included in a metric message 375. The client 315 also associates the frame token 330 (or a representation thereof) with the metric message 375 because the frame 325 was presented to the user or player concurrently with the user or player generating the input 365. In some embodiments, the representation of the frame token 330 includes a subset of the information that represents the frame token 330 that is sufficient to associate the frame token 330 with the metric message 375 at the application 310. The frame token 330 (or the representation thereof) and the metric message 375 are then returned to the server 305, which provides the frame token 330 and the metric message 375 to the application 310. In the interest of clarity, arrows indicating the path of the frame token 330 and the metric message 375 from the network 320 to the application 310 are not shown in FIG. 3.
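The client-side tagging of an input with the token of the frame being presented can be sketched as follows. This is a minimal Python illustration under stated assumptions: `ClientInputTagger`, `InputMessage`, and the method names are hypothetical, and the token is reduced to an integer frame identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InputMessage:
    frame_id: int    # representation of the frame token
    input_name: str  # information representing the player input


class ClientInputTagger:
    """Remembers which frame is on screen and tags inputs with its token."""

    def __init__(self):
        self._presented_frame_id: Optional[int] = None

    def on_frame_presented(self, frame_id):
        # Called each time a decoded frame is handed to the display.
        self._presented_frame_id = frame_id

    def on_input(self, input_name):
        # No frame has been presented yet, so there is nothing to correlate.
        if self._presented_frame_id is None:
            return None
        return InputMessage(self._presented_frame_id, input_name)


tagger = ClientInputTagger()
before_any_frame = tagger.on_input("whack")
tagger.on_frame_presented(325)
message = tagger.on_input("whack")
tagger.on_frame_presented(326)
later_message = tagger.on_input("whack")
```

In this sketch the message would then be sent back to the server, which forwards it to the application.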

The application 310 uses the received frame token 330 (or the representation thereof) and the metric message 375 to correlate the input 365 with the frame 325 that was presented to the user or player concurrently with the input 365. Thus, despite delays or latency between transmission of the frame 325 from the application 310 and presentation of the frame 325 via the display 335, the application 310 is able to determine that the user or player generated the input 365 while viewing (or sensing) the sensory output generated based on the frame 325. For example, if the frame 325 represents an image of a mole poking his head out of a hole in a game of “whack-a-mole,” the application 310 can determine that the player successfully provided the input 365 via the controller 370 to “whack” the mole while the mole was visible to the player. As discussed herein, other entities in the processing system 300 also provide metric messages that are associated with corresponding frame tokens and the application 310 uses these metric messages to modify the state of the application 310 or the server 305. For example, the application 310 or the server 305 can modify a size of the buffer 345 based on the received metric messages. The size of the buffer 345 is increased in response to the received metric messages indicating an increase in an end-to-end latency and the size of the buffer 345 is decreased in response to the received metric messages indicating a decrease in the end-to-end latency.
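The buffer sizing rule stated above (grow when end-to-end latency increases, shrink when it decreases) can be written as a short Python sketch. The function name `resize_buffer`, the fixed step, and the size limits are assumptions for illustration; the description does not say how much the buffer changes or what bounds apply.

```python
def resize_buffer(size, previous_latency_ms, latest_latency_ms,
                  step=1, min_size=1, max_size=64):
    """Return a new buffer size given two successive end-to-end latencies.

    step, min_size, and max_size are hypothetical tuning parameters.
    """
    if latest_latency_ms > previous_latency_ms:
        size += step   # latency went up: allow more time for inputs to arrive
    elif latest_latency_ms < previous_latency_ms:
        size -= step   # latency went down: tighten the buffer
    # Clamp so the buffer never vanishes or grows without bound.
    return max(min_size, min(max_size, size))


grown = resize_buffer(8, previous_latency_ms=40, latest_latency_ms=60)
shrunk = resize_buffer(8, previous_latency_ms=60, latest_latency_ms=40)
```

A real implementation would likely smooth the latency measurements before reacting to them; this sketch reacts to every change to keep the rule visible.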

FIG. 4 is a block diagram of a processing system 400 including entities that tag metric messages with corresponding frame tokens according to some embodiments. The processing system 400 is used to implement some embodiments of the cloud-based processing system 100 shown in FIG. 1, the processing system 200 shown in FIG. 2, and the processing system 300 shown in FIG. 3. The processing system 400 includes an application 405 that generates one or more frames for providing a sensory output to a user or player. The processing system 400 also includes an encoder 410 to encode the frames for transmission via a network 415, a decoder 420 to decode the frames received via the network 415, and a display 425 that is used to present images to the user or player based on the received frames. As discussed herein, other devices are used to present other sensory output based on the frames in some embodiments.

A frame 430 is generated by the application 405 and associated with a frame token 435. The frame 430 and the corresponding frame token 435 are then processed by the other entities that convey the frame 430 to the display 425. These entities (or a corresponding server or client that hosts the entities) generate metric messages that are fed back to the application 405. The metric messages are associated with the frame token 435 to allow the application 405 to compensate for end-to-end delays or correlate the frame 430 with actions or inputs performed by users or players. The metric messages also include timing information for operations performed on the frame 430, such as a timestamp indicating a time that an operation was performed.

In the illustrated embodiment, a metric message 440 and the frame token 435 are transmitted to the application 405 in response to dispatching the frame 430 (or a representation thereof) towards the encoder 410. A metric message 441 and the frame token 435 (or the representation thereof) are transmitted to the application 405 in response to the encoder 410 encoding the frame 430 and the frame token 435. A metric message 442 and the frame token 435 (or a representation thereof) are transmitted to the application 405 in response to the encoded frame 430 and frame token 435 being transmitted from the server into the network 415. A metric message 443 and the frame token 435 (or a representation thereof) are transmitted to the application 405 in response to the encoded frame 430 and frame token 435 being received at the client from the network 415. A metric message 444 and the frame token 435 (or a representation thereof) are transmitted to the application 405 in response to the encoded frame 430 and frame token 435 being successfully decoded by the decoder 420. A metric message 445 and the frame token 435 (or a representation thereof) are transmitted to the application 405 in response to the frame 430 and frame token 435 being provided to the display 425 for presentation to the player or user.
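An application receiving the six metric messages described above could group them by frame token and derive how long each stage took. The Python sketch below is a hedged illustration: the stage labels, the `(frame_id, stage, timestamp_ms)` tuple layout, and `stage_durations` are hypothetical, and the timestamps are assumed to be on a common clock (server and client clock offsets would need to be corrected first).

```python
from collections import defaultdict

# Hypothetical labels for the six operations, in pipeline order.
STAGES = ["dispatched", "encoded", "sent", "received", "decoded", "presented"]


def stage_durations(messages):
    """Group (frame_id, stage, timestamp_ms) messages by frame token and
    return, per frame, the time elapsed since the previous stage."""
    by_frame = defaultdict(dict)
    for frame_id, stage, timestamp_ms in messages:
        by_frame[frame_id][stage] = timestamp_ms

    result = {}
    for frame_id, times in by_frame.items():
        durations = {}
        for prev, cur in zip(STAGES, STAGES[1:]):
            # Only report a duration when both endpoints were reported.
            if prev in times and cur in times:
                durations[cur] = times[cur] - times[prev]
        result[frame_id] = durations
    return result


# Messages may arrive out of order; the frame id ties them together.
messages = [
    (7, "received", 30), (7, "dispatched", 0), (7, "encoded", 4),
    (7, "sent", 5), (7, "presented", 40), (7, "decoded", 33),
]
durations = stage_durations(messages)
end_to_end_ms = sum(durations[7].values())
```

With these sample values the network leg (`sent` to `received`) dominates the end-to-end latency, which is the kind of observation the application could use when compensating.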

FIG. 5 is a flow diagram of a method 500 of modifying an application state based on metric messages and associated frame tokens according to some embodiments. The method 500 is implemented in some embodiments of the cloud-based processing system 100 shown in FIG. 1, the processing system 200 shown in FIG. 2, and the processing system 300 shown in FIG. 3. Some embodiments of the method 500 are implemented in a server such as the server 105 shown in FIG. 1 and the server 305 shown in FIG. 3.

At block 505, the server generates a frame token for a frame produced by an application. In some embodiments, the frame token is generated by the server in response to receiving a request to create frame tokens from the application. At block 510, the frame and the associated frame token are provided to an encoder implemented in the server. At block 515, the encoder encodes the frame and the associated frame token. The server then transmits the encoded information via a network. As discussed herein, metric messages associated with the operations performed in blocks 510 and 515 are generated and transmitted back to the application with the associated frame token.

At block 520, the server receives one or more additional metric messages in response to providing the encoded information, e.g., to a client. The additional metric messages include or are associated with the frame token included in the encoded information. At block 525, the server provides the metric messages and the frame token to the application. At block 530, the application modifies the application state (or the state of the server) based on the information included in the metric messages and the frame token. As discussed herein, some embodiments of the application compensate for end-to-end latency, roll back the application state, correlate user/player inputs with the frame, and the like based on the information in the metric messages and the frame token.

FIG. 6 is a flow diagram of a method 600 of correlating timing of a transmitted frame with timing of user input according to some embodiments. The method 600 is implemented in some embodiments of the cloud-based processing system 100 shown in FIG. 1, the processing system 200 shown in FIG. 2, and the processing system 300 shown in FIG. 3.

At block 605, a server generates a frame token for a frame produced by an application. At block 610, the frame and the associated frame token are transmitted from the server to a client via a network. Some embodiments of the server encode the frame and the associated frame token prior to transmitting the encoded information to the client. At block 615, the client presents sensory output based on the received frame to a user or player. As discussed herein, the sensory output can include video, audio, haptic output, and the like.

At decision block 620, the client determines whether any player input has been received concurrently with presentation of the sensory output based on the frame. If not, the method 600 flows back to block 605 and another frame and corresponding frame token are generated. If the client detects player input concurrent with presentation of the sensory output, the method 600 flows to block 625.

At block 625, the client generates a metric message including information representing the player input. The metric message is then transmitted back to the server with the frame token for the frame that was presented concurrently with receiving the player input.

At block 630, the server receives the metric message and the frame token. The server provides the metric message and the frame token to the application, which uses the received information to correlate the metric message with the frame associated with the frame token, as discussed herein.

At block 635, a state of the application is modified based on the received information. For example, if the received information indicates that a player of a video game application performed a required action by providing an input concurrently with display of the frame associated with the frame token, the state of the video game application is modified to indicate successful completion of the action. If the received information indicates that the player of the video game application did not perform the required action, e.g., by failing to provide the input concurrently with display of the frame associated with the frame token, the state of the video game application is modified to indicate that the action was not successfully completed.
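The decision at block 635 can be sketched in Python as a check of whether any returned input carries the token of a frame during which the action was required. The names `resolve_action`, `required_frame_ids`, and the `last_action` state field are hypothetical and chosen only for this illustration.

```python
def resolve_action(state, required_frame_ids, input_frame_ids):
    """Return a copy of the game state marked with the action outcome.

    required_frame_ids: ids of frames during which the input had to occur.
    input_frame_ids: frame ids attached to the inputs returned by the client.
    """
    new_state = dict(state)
    # The action succeeds if at least one input was tagged with a token of a
    # frame in which the action was required.
    if set(required_frame_ids) & set(input_frame_ids):
        new_state["last_action"] = "completed"
    else:
        new_state["last_action"] = "not_completed"
    return new_state


state = {"last_action": None}
# The target was visible in frames 430 to 432; the input was tagged with 431.
hit_state = resolve_action(state, {430, 431, 432}, [431])
# Here the only input was tagged with a frame shown after the target vanished.
miss_state = resolve_action(state, {430, 431, 432}, [440])
```

Because the comparison uses frame tokens rather than arrival times, the outcome in this sketch does not depend on how long the input took to reach the server.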

In some embodiments, certain aspects of the techniques described above may be implemented by one or more processors of a processing system executing software. The software comprises one or more sets of executable instructions stored or otherwise tangibly embodied on a non-transitory computer readable storage medium. The software can include the instructions and certain data that, when executed by the one or more processors, manipulate the one or more processors to perform one or more aspects of the techniques described above. The non-transitory computer readable storage medium can include, for example, a magnetic or optical disk storage device, solid state storage devices such as Flash memory, a cache, random access memory (RAM), or other volatile or non-volatile memory device or devices, and the like. The executable instructions stored on the non-transitory computer readable storage medium may be in source code, assembly language code, object code, or other instruction format that is interpreted or otherwise executable by one or more processors.

A computer readable storage medium may include any storage medium, or combination of storage media, accessible by a computer system during use to provide instructions and/or data to the computer system. Such storage media can include, but are not limited to, optical media (e.g., compact disc (CD), digital versatile disc (DVD), Blu-ray disc), magnetic media (e.g., floppy disc, magnetic tape, or magnetic hard drive), volatile memory (e.g., random access memory (RAM) or cache), non-volatile memory (e.g., read-only memory (ROM) or Flash memory), or microelectromechanical systems (MEMS)-based storage media. The computer readable storage medium may be embedded in the computing system (e.g., system RAM or ROM), fixedly attached to the computing system (e.g., a magnetic hard drive), removably attached to the computing system (e.g., an optical disc or Universal Serial Bus (USB)-based Flash memory), or coupled to the computer system via a wired or wireless network (e.g., network accessible storage (NAS)).

Note that not all of the activities or elements described above in the general description are required, that a portion of a specific activity or device may not be required, and that one or more further activities may be performed, or elements included, in addition to those described. Still further, the order in which activities are listed is not necessarily the order in which they are performed. Also, the concepts have been described with reference to specific embodiments. However, one of ordinary skill in the art appreciates that various modifications and changes can be made without departing from the scope of the present disclosure as set forth in the claims below. Accordingly, the specification and figures are to be regarded in an illustrative rather than a restrictive sense, and all such modifications are intended to be included within the scope of the present disclosure.

Benefits, other advantages, and solutions to problems have been described above with regard to specific embodiments. However, the benefits, advantages, solutions to problems, and any feature(s) that may cause any benefit, advantage, or solution to occur or become more pronounced are not to be construed as a critical, required, or essential feature of any or all the claims. Moreover, the particular embodiments disclosed above are illustrative only, as the disclosed subject matter may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. No limitations are intended to the details of construction or design herein shown, other than as described in the claims below. It is therefore evident that the particular embodiments disclosed above may be altered or modified and all such variations are considered within the scope of the disclosed subject matter. Accordingly, the protection sought herein is as set forth in the claims below.

Claims

  1. A method comprising: generating, at a processor in a server that implements an application, a frame token for a frame that is rendered for the application; providing, to the application, at least one first metric message in response to at least one first operation performed by the server on the frame, the at least one first metric message comprising the frame token and information indicating timing of the at least one first operation; transmitting, from the server for receipt by a client, encoded information representing the frame token and the frame; providing, to the application, at least one second metric message in response to at least one second operation performed by the client on the frame, the at least one second metric message comprising a representation of the frame token and information indicating timing of the at least one second operation; modifying a state of the application based on the at least one first metric message and the at least one second metric message; and rolling back the state of the application in response to at least one input associated with the frame token.
  2. The method of claim 1, wherein the frame comprises at least one of a video frame and an audio frame generated by the application.
  3. The method of claim 1, wherein generating the frame token for the frame comprises generating the frame token in response to a request for the frame token provided by the application.
  4. The method of claim 1, wherein generating the frame token comprises generating the frame token comprising an identification number associated with the frame.
  5. The method of claim 1, wherein providing the at least one first metric message in response to the at least one first operation comprises providing the at least one first metric message in response to at least one of dispatching the frame for encoding, completing encoding of the frame, and transmitting the frame from the server for receipt by the client.
  6. The method of claim 1, wherein: the frame is a video frame; and providing the at least one second metric message in response to the at least one second operation comprises providing the at least one second metric message in response to at least one of receiving the encoded information at the client, successfully decoding the encoded information, and presenting the frame to a player via a display.
  7. The method of claim 6, further comprising: providing, to a video game application, at least one third metric message comprising information representing the frame token for the frame and at least one input provided by the player concurrently with presenting the frame to the player via the display.
  8. The method of claim 7, wherein modifying the state of the application comprises correlating, at the video game application, the at least one input with the frame based on the at least one third metric message.
  9. The method of claim 1, wherein modifying the state of the application comprises compensating for end-to-end latency based on the at least one first metric message and the at least one second metric message.
  10. A server comprising: a memory configured to store program code representing a software application; a processor configured to generate a frame token for a frame that is rendered for the software application and provide, to the software application, at least one first metric message in response to at least one first operation performed by the server on the frame, the at least one first metric message comprising the frame token and information indicating timing of the at least one first operation; and a network interface configured to transmit, for receipt by a client, encoded information representing the frame token and the frame, wherein the processor is configured to provide, to the software application, at least one second metric message in response to at least one second operation performed by the client on the frame, the at least one second metric message comprising the frame token and information indicating timing of the at least one second operation, wherein the processor is configured to modify a state of the software application based on the at least one first metric message and the at least one second metric message, and wherein the processor is configured to roll back the state of the software application in response to at least one input associated with the frame token.
  11. The server of claim 10, wherein the frame comprises at least one of a video frame and an audio frame generated by the software application.
  12. The server of claim 10, wherein the processor is configured to generate the frame token in response to a request for the frame token provided by the software application.
  13. The server of claim 10, wherein the processor is configured to generate a frame token comprising an identification number of the frame.
  14. The server of claim 10, wherein the network interface is configured to provide the at least one first metric message in response to at least one of dispatching the frame for encoding, completing encoding of the frame, and transmitting the frame from the server for receipt by the client.
  15. The server of claim 10, wherein the network interface is configured to provide the at least one second metric message in response to at least one of receiving the encoded information at the client, successfully decoding the encoded information, and presenting the frame to a player.
  16. The server of claim 15, wherein the processor is configured to provide, to the software application, at least one third metric message comprising information representing the frame token for the frame and at least one input provided by the player concurrently with presenting the frame to the player via a display.
  17. The server of claim 16, wherein the processor is configured to correlate the at least one input with the frame based on the at least one third metric message.
  18. The server of claim 10, wherein the processor is configured to compensate for end-to-end latency based on the at least one first metric message and the at least one second metric message and modify the state of the software application based on the end-to-end latency.
  19. A method comprising: receiving, at a client from a server, encoded information representing a frame that is rendered for an application and an associated frame token; decoding, at the client, the encoded information; in response to decoding the encoded information, providing the frame to a display for presentation to a player; transmitting, from the client for receipt by the server, at least one first metric message in response to the decoding and providing the frame to the display, the at least one first metric message comprising information representative of the frame token and information indicating timing of the decoding and providing the frame to the display; and rolling back a state of the application in response to at least one input associated with the frame token.
  20. The method of claim 19, wherein the frame comprises at least one of a video frame and an audio frame generated by the application.
  21. The method of claim 19, wherein the frame token comprises an identification number of the frame.
  22. The method of claim 19, further comprising: detecting at least one input provided by the player concurrently with the frame being presented to the player via the display; and transmitting, to the server, at least one second metric message comprising information representing the frame token for the frame and the at least one input.
  23. An apparatus comprising: a receiver configured to receive, from a server, encoded information representing a frame that is rendered for an application and an associated frame token; a processor configured to decode the encoded information and in response to decoding the encoded information, provide the frame to a display for presentation to a player; and a network interface configured to transmit, towards the server, at least one first metric message in response to the processor decoding and providing the frame to the display, the at least one first metric message comprising the frame token and information indicating timing of the decoding and providing the frame to the display, wherein the processor is configured to roll back a state of the application in response to at least one input associated with the frame token.
  24. The apparatus of claim 23, wherein the frame comprises at least one of a video frame and an audio frame generated by the application.
  25. The apparatus of claim 23, wherein the frame token comprises an identification number of the frame.
  26. The apparatus of claim 23, wherein: the processor is configured to detect at least one input provided by the player concurrently with the frame being presented to the player via the display; and the network interface is configured to transmit, to the server, at least one second metric message comprising information representing the frame token for the frame and the at least one input.
