U.S. Pat. No. 10,573,048

EMOTIONAL REACTION SHARING

Issue Date: July 25, 2016


U.S. Patent No. 10,573,048: Emotional reaction sharing

Issued February 25, 2020 to Oath Inc.
Filed/Priority to July 24, 2016

Overview:

U.S. Patent No. 10,573,048 (the ’048 patent) relates to identifying the facial expressions of viewers of content at different points in time and displaying them in real time. The ’048 patent details a device which, through a user reaction distribution service, uses landmark points mapped onto facial features to identify changes in a user’s facial expressions over time, as well as audio from the user to identify the user’s mood. At the user reaction distribution service, the device uses a facial expression recognition algorithm to track changes in the mapped points, identifying facial expressions over time and then verifying each expression against the user’s mood. The facial expressions of all viewers during a time interval are then ranked to find the most common expression, which is sent to viewers live. The expression could be represented with an animation, image, text, or symbol. The ’048 patent could allow viewers of game streamers to understand the general feelings and mood of an audience without having to read a fast-moving chat box.
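The ranking step described in the overview can be illustrated with a minimal Python sketch. The expression labels and the function name `most_common_expression` are hypothetical and are not taken from the patent, which does not prescribe a particular implementation; the sketch only shows how a most frequently occurring label could be picked from the labels collected during one time interval.

```python
from collections import Counter
from typing import List, Optional


def most_common_expression(expressions: List[str]) -> Optional[str]:
    """Return the most frequently occurring label, or None for an empty list."""
    if not expressions:
        return None
    # most_common(1) returns a one-element list holding (label, count).
    label, _count = Counter(expressions).most_common(1)[0]
    return label


# Hypothetical labels identified for six viewers during one time interval.
interval_expressions = ["smile", "surprise", "smile", "frown", "smile", "surprise"]
top_expression = most_common_expression(interval_expressions)
print(top_expression)
```

In this sketch, labels with equal counts are ordered by first occurrence, which is how `Counter.most_common` behaves; a real service might need a different tie-breaking rule.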

Abstract:

One or more computing devices, systems, and/or methods for emotional reaction sharing are provided. For example, a client device captures video of a user viewing content, such as a live stream video. Landmark points, corresponding to facial features of the user, are identified and provided to a user reaction distribution service that evaluates the landmark points to identify a facial expression of the user, such as a crying facial expression. The facial expression, such as landmark points that can be applied to a three-dimensional model of an avatar to recreate the facial expression, are provided to client devices of users viewing the content, such as a second client device. The second client device applies the landmark points of the facial expression to a bone structure mapping and a muscle movement mapping to create an expressive avatar having the facial expression for display to a second user.

Illustrative Claim:

A computing device comprising: a processor; and memory comprising processor-executable instructions that when executed by the processor cause performance of operations, the operations comprising: receiving, at a user reaction distribution service, a first set of landmark points, a second set of landmark points, and a mood of a first user from a client device, wherein: the first set of landmark points represents a set of facial features of the first user at a first point in time and the second set of landmark points represents the set of facial features of the first user at a second point in time while the first user is viewing content through the client device, and the mood is identified at the client device from audio of the first user while the first user is viewing the content through the client device; evaluating, at the user reaction distribution service, the first set of landmark points and the second set of landmark points, using a facial expression recognition algorithm that maps changes in location of landmark points to facial movements indicative of facial expressions, to identify a facial expression of the first user while the first user is viewing the content; verifying the facial expression of the first user based upon the mood; identifying, at the user reaction distribution service, a set of facial expressions of other users viewing the content during a time interval between the first point in time and the second point in time based upon landmark points received from client devices of the other users, wherein: the client device of the first user and the client devices of the other users define a group of client devices, and the facial expression of the first user and the set of facial expressions of other users define a group of facial expressions; ranking, at the user reaction distribution service, the group of facial expressions to determine a most frequently occurring facial expression, amongst the group of facial expressions, during the time interval; and sending, from the user reaction distribution service, the most frequently occurring facial expression to a plurality of client devices amongst the group of client devices in real-time during viewing of the content by the first user and by the other users.

DETAILED DESCRIPTION

Subject matter will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example embodiments. This description is not intended as an extensive or detailed discussion of known concepts. Details that are known generally to those of ordinary skill in the relevant art may have been omitted, or may be handled in summary fashion.

The following subject matter may be embodied in a variety of different forms, such as methods, devices, components, and/or systems. Accordingly, this subject matter is not intended to be construed as limited to any example embodiments set forth herein. Rather, example embodiments are provided merely to be illustrative. Such embodiments may, for example, take the form of hardware, software, firmware or any combination thereof.

1. Computing Scenario

The following provides a discussion of some types of computing scenarios in which the disclosed subject matter may be utilized and/or implemented.

1.1. Networking

FIG. 1 is an interaction diagram of a scenario 100 illustrating a service 102 provided by a set of servers 104 to a set of client devices 110 via various types of networks. The servers 104 and/or client devices 110 may be capable of transmitting, receiving, processing, and/or storing many types of signals, such as in memory as physical memory states.

The servers 104 of the service 102 may be internally connected via a local area network 106 (LAN), such as a wired network where network adapters on the respective servers 104 are interconnected via cables (e.g., coaxial and/or fiber optic cabling), and may be connected in various topologies (e.g., buses, token rings, meshes, and/or trees). The servers 104 may be interconnected directly, or through one or more other networking devices, such as routers, switches, and/or repeaters. The servers 104 may utilize a variety of physical networking protocols (e.g., Ethernet and/or Fibre Channel) and/or logical networking protocols (e.g., variants of an Internet Protocol (IP), a Transmission Control Protocol (TCP), and/or a User Datagram Protocol (UDP)). The local area network 106 may include, e.g., analog telephone lines, such as a twisted wire pair, a coaxial cable, full or fractional digital lines including T1, T2, T3, or T4 type lines, Integrated Services Digital Networks (ISDNs), Digital Subscriber Lines (DSLs), wireless links including satellite links, or other communication links or channels, such as may be known to those skilled in the art. The local area network 106 may be organized according to one or more network architectures, such as server/client, peer-to-peer, and/or mesh architectures, and/or a variety of roles, such as administrative servers, authentication servers, security monitor servers, data stores for objects such as files and databases, business logic servers, time synchronization servers, and/or front-end servers providing a user-facing interface for the service 102.

Likewise, the local area network 106 may comprise one or more sub-networks, such as may employ differing architectures, may be compliant or compatible with differing protocols and/or may interoperate within the local area network 106. Additionally, a variety of local area networks 106 may be interconnected; e.g., a router may provide a link between otherwise separate and independent local area networks 106.

In the scenario 100 of FIG. 1, the local area network 106 of the service 102 is connected to a wide area network 108 (WAN) that allows the service 102 to exchange data with other services 102 and/or client devices 110. The wide area network 108 may encompass various combinations of devices with varying levels of distribution and exposure, such as a public wide-area network (e.g., the Internet) and/or a private network (e.g., a virtual private network (VPN) of a distributed enterprise).

In the scenario 100 of FIG. 1, the service 102 may be accessed via the wide area network 108 by a user 112 of one or more client devices 110, such as a portable media player (e.g., an electronic text reader, an audio device, or a portable gaming, exercise, or navigation device); a portable communication device (e.g., a camera, a phone, a wearable or a text chatting device); a workstation; and/or a laptop form factor computer. The respective client devices 110 may communicate with the service 102 via various connections to the wide area network 108. As a first such example, one or more client devices 110 may comprise a cellular communicator and may communicate with the service 102 by connecting to the wide area network 108 via a wireless local area network 106 provided by a cellular provider. As a second such example, one or more client devices 110 may communicate with the service 102 by connecting to the wide area network 108 via a wireless local area network 106 provided by a location such as the user's home or workplace (e.g., a WiFi (Institute of Electrical and Electronics Engineers (IEEE) Standard 802.11) network or a Bluetooth (IEEE Standard 802.15.1) personal area network). In this manner, the servers 104 and the client devices 110 may communicate over various types of networks. Other types of networks that may be accessed by the servers 104 and/or client devices 110 include mass storage, such as network attached storage (NAS), a storage area network (SAN), or other forms of computer or machine readable media.

1.2. Server Configuration

FIG. 2 presents a schematic architecture diagram 200 of a server 104 that may utilize at least a portion of the techniques provided herein. Such a server 104 may vary widely in configuration or capabilities, alone or in conjunction with other servers, in order to provide a service such as the service 102.

The server 104 may comprise one or more processors 210 that process instructions. The one or more processors 210 may optionally include a plurality of cores; one or more coprocessors, such as a mathematics coprocessor or an integrated graphical processing unit (GPU); and/or one or more layers of local cache memory. The server 104 may comprise memory 202 storing various forms of applications, such as an operating system 204; one or more server applications 206, such as a hypertext transfer protocol (HTTP) server, a file transfer protocol (FTP) server, or a simple mail transfer protocol (SMTP) server; and/or various forms of data, such as a database 208 or a file system. The server 104 may comprise a variety of peripheral components, such as a wired and/or wireless network adapter 214 connectible to a local area network and/or wide area network; one or more storage components 216, such as a hard disk drive, a solid-state storage device (SSD), a flash memory device, and/or a magnetic and/or optical disk reader.

The server 104 may comprise a mainboard featuring one or more communication buses 212 that interconnect the processor 210, the memory 202, and various peripherals, using a variety of bus technologies, such as a variant of a serial or parallel AT Attachment (ATA) bus protocol; a Universal Serial Bus (USB) protocol; and/or a Small Computer System Interface (SCSI) bus protocol. In a multibus scenario, a communication bus 212 may interconnect the server 104 with at least one other server. Other components that may optionally be included with the server 104 (though not shown in the schematic architecture diagram 200 of FIG. 2) include a display; a display adapter, such as a graphical processing unit (GPU); input peripherals, such as a keyboard and/or mouse; and a flash memory device that may store a basic input/output system (BIOS) routine that facilitates booting the server 104 to a state of readiness.

The server 104 may operate in various physical enclosures, such as a desktop or tower, and/or may be integrated with a display as an “all-in-one” device. The server 104 may be mounted horizontally and/or in a cabinet or rack, and/or may simply comprise an interconnected set of components. The server 104 may comprise a dedicated and/or shared power supply 218 that supplies and/or regulates power for the other components. The server 104 may provide power to and/or receive power from another server and/or other devices. The server 104 may comprise a shared and/or dedicated climate control unit 220 that regulates climate properties, such as temperature, humidity, and/or airflow. Many such servers 104 may be configured and/or adapted to utilize at least a portion of the techniques presented herein.

1.3. Client Device Configuration

FIG. 3 presents a schematic architecture diagram 300 of a client device 110 whereupon at least a portion of the techniques presented herein may be implemented. Such a client device 110 may vary widely in configuration or capabilities, in order to provide a variety of functionality to a user such as the user 112. The client device 110 may be provided in a variety of form factors, such as a desktop or tower workstation; an “all-in-one” device integrated with a display 308; a laptop, tablet, convertible tablet, or palmtop device; a wearable device mountable in a headset, eyeglass, earpiece, and/or wristwatch, and/or integrated with an article of clothing; and/or a component of a piece of furniture, such as a tabletop, and/or of another device, such as a vehicle or residence. The client device 110 may serve the user in a variety of roles, such as a workstation, kiosk, media player, gaming device, and/or appliance.

The client device 110 may comprise one or more processors 310 that process instructions. The one or more processors 310 may optionally include a plurality of cores; one or more coprocessors, such as a mathematics coprocessor or an integrated graphical processing unit (GPU); and/or one or more layers of local cache memory. The client device 110 may comprise memory 301 storing various forms of applications, such as an operating system 303; one or more user applications 302, such as document applications, media applications, file and/or data access applications, communication applications such as web browsers and/or email clients, utilities, and/or games; and/or drivers for various peripherals. The client device 110 may comprise a variety of peripheral components, such as a wired and/or wireless network adapter 306 connectible to a local area network and/or wide area network; one or more output components, such as a display 308 coupled with a display adapter (optionally including a graphical processing unit (GPU)), a sound adapter coupled with a speaker, and/or a printer; input devices for receiving input from the user, such as a keyboard 311, a mouse, a microphone, a camera, and/or a touch-sensitive component of the display 308; and/or environmental sensors, such as a global positioning system (GPS) receiver 319 that detects the location, velocity, and/or acceleration of the client device 110, a compass, accelerometer, and/or gyroscope that detects a physical orientation of the client device 110. Other components that may optionally be included with the client device 110 (though not shown in the schematic architecture diagram 300 of FIG. 3) include one or more storage components, such as a hard disk drive, a solid-state storage device (SSD), a flash memory device, and/or a magnetic and/or optical disk reader; and/or a flash memory device that may store a basic input/output system (BIOS) routine that facilitates booting the client device 110 to a state of readiness; and a climate control unit that regulates climate properties, such as temperature, humidity, and airflow.

The client device 110 may comprise a mainboard featuring one or more communication buses 312 that interconnect the processor 310, the memory 301, and various peripherals, using a variety of bus technologies, such as a variant of a serial or parallel AT Attachment (ATA) bus protocol; the Universal Serial Bus (USB) protocol; and/or the Small Computer System Interface (SCSI) bus protocol. The client device 110 may comprise a dedicated and/or shared power supply 318 that supplies and/or regulates power for other components, and/or a battery 304 that stores power for use while the client device 110 is not connected to a power source via the power supply 318. The client device 110 may provide power to and/or receive power from other client devices.

In some scenarios, as a user 112 interacts with a software application on a client device 110 (e.g., an instant messenger and/or electronic mail application), descriptive content in the form of signals or stored physical states within memory (e.g., an email address, instant messenger identifier, phone number, postal address, message content, date, and/or time) may be identified. Descriptive content may be stored, typically along with contextual content. For example, the source of a phone number (e.g., a communication received from another user via an instant messenger application) may be stored as contextual content associated with the phone number. Contextual content, therefore, may identify circumstances surrounding receipt of a phone number (e.g., the date or time that the phone number was received), and may be associated with descriptive content. Contextual content may, for example, be used to subsequently search for associated descriptive content. For example, a search for phone numbers received from specific individuals, received via an instant messenger application or at a given date or time, may be initiated. The client device 110 may include one or more servers that may locally serve the client device 110 and/or other client devices of the user 112 and/or other individuals. For example, a locally installed webserver may provide web content in response to locally submitted web requests. Many such client devices 110 may be configured and/or adapted to utilize at least a portion of the techniques presented herein.

2. Presented Techniques

One or more computing devices and/or techniques for emotional reaction sharing are provided. Users, viewing content such as a live stream of a video, may desire to share emotional reactions to the video. In an example, the users may be provided with access to a chat room through which the users can share messages. However, the chat room can become overwhelming and ineffective for communication as more users join the chat room (e.g., thousands of users may be sharing messages so quickly that a user may be unable to decipher individual messages). The chat room is disruptive to the user experience of the content because the user has to stop viewing the content in order to type messages for sharing emotional reactions. In another example, the users may be able to share real-time video streams of one another, such as through a video conference. However, the amount of data needed to stream real-time videos captured from cameras of client devices is prohibitively large such that merely a few users are able to participate. Because video streaming is very resource intensive and consumes large amounts of bandwidth, video conferencing for sharing emotional reactions is impracticable and non-scalable.

Accordingly, as provided herein, landmark points, corresponding to facial features of a user, are captured at a client device, transmitted to a user reaction distribution service (e.g., one or more servers) for facial expression recognition, and provided to other client devices for reconstructing the facial expression, such as upon a three-dimensional model of an avatar, for display to users. Sharing of user expressions in a non-disruptive manner (e.g., sharing of user expressions without explicit user input otherwise needed to specify an emotional state of the user to share) can now become highly scalable (e.g., to hundreds of thousands of users) with dramatic reductions in computing resource and network bandwidth utilization.

Network bandwidth is reduced because the number of landmark points (e.g., between about 4 and about 240 landmark points, such as 180 landmark points, or any other number of landmark points) transmitted per frame from the client device to the user reaction distribution service is merely a fraction of the amount of pixel data per frame (e.g., 200,000 pixels per frame) that would otherwise be transmitted over a network such as for sharing real-time video streams of users (e.g., transferring 180 landmark points as opposed to 200,000 pixels per frame can reduce network bandwidth by about 200 times per user). In this way, network bandwidth is reduced for client devices and the user reaction distribution service. Computing resources of the user reaction distribution service are also reduced because the user reaction distribution service merely processes a small number of landmark points for facial expression determination (e.g., evaluation of 10 sets of landmark points per second, where a set of landmark points comprises about 180 landmark points from a frame) compared to evaluating the large amount of pixel data (e.g., evaluation of 6,000,000 pixels of data per second when 30 frames, each with 200,000 pixels, are sent to and processed by the user reaction distribution service per second). In this way, the user reaction distribution service may be capable of scaling up to hundreds of thousands to millions of users for emotional reaction sharing.
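The per-second arithmetic behind these examples can be checked with a short Python sketch. It uses only the example figures quoted above (180 landmark points, 200,000 pixels per frame, 10 landmark sets per second, 30 video frames per second); the constant and function names are illustrative assumptions. The sketch counts items rather than bytes, so its ratio is not a bandwidth ratio: the actual saving depends on how many bytes each landmark coordinate and each pixel occupy on the wire, which is not specified here.

```python
# Example figures quoted in the description; the names are illustrative only.
LANDMARKS_PER_SET = 180
PIXELS_PER_FRAME = 200_000
LANDMARK_SETS_PER_SECOND = 10
VIDEO_FRAMES_PER_SECOND = 30


def items_per_second(items_per_frame: int, frames_per_second: int) -> int:
    """Number of items (landmark points or pixels) processed each second."""
    return items_per_frame * frames_per_second


landmarks_per_second = items_per_second(LANDMARKS_PER_SET, LANDMARK_SETS_PER_SECOND)
pixels_per_second = items_per_second(PIXELS_PER_FRAME, VIDEO_FRAMES_PER_SECOND)

# Ratio of raw item counts per frame; this ignores per-item encoding size.
count_ratio_per_frame = PIXELS_PER_FRAME / LANDMARKS_PER_SET

print(landmarks_per_second, pixels_per_second, round(count_ratio_per_frame))
```

With these figures the service would evaluate 1,800 landmark points per second per user, against 6,000,000 pixels per second for 30 frames of 200,000 pixels each.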

Also, sharing emotional reactions to content can now be done in a non-disruptive manner because the landmark points can be identified from video automatically captured with a camera and transmitted to the user reaction distribution service for facial expression recognition and distribution to other users without users having to stop watching the content and perform manual actions to explicitly input user emotional information such as typing a message or selecting an emotion to share with others. Also, privacy of the user is improved because the facial expression may be reconstructed upon an avatar (e.g., a three-dimensional model of a generic person, a cat, a monster, a robot, etc.) so that a video stream of frames depicting the user is not captured, transmitted to the user reaction distribution service, and sent to other users and client devices. User experience is also improved because the expressive avatar may be displayed within a region of the content that reduces occlusion of visual features of the content so that the user's ability to view the content is not disrupted (e.g., visual features of a soccer video may be identified as a soccer ball, soccer player, goal, etc. using video analysis, feature recognition, and entity identification functionality so that an expressive avatar can be inserted into a region of the soccer video that reduces visual overlap or visual occlusion of the visual features).
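One way to picture the placement idea is the following Python sketch, which picks, from a few candidate regions, the one that overlaps detected visual features the least. This is an assumed approach for illustration only: the description does not say how the region is chosen, and the bounding boxes, the corner-only candidates, and the function names are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def overlap_area(a: Box, b: Box) -> int:
    """Area of the intersection of two axis-aligned boxes (0 if disjoint)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    width = min(ax + aw, bx + bw) - max(ax, bx)
    height = min(ay + ah, by + bh) - max(ay, by)
    return max(width, 0) * max(height, 0)


def corner_candidates(frame_w: int, frame_h: int, size: int) -> List[Box]:
    """Four square candidate regions, one per corner of the frame."""
    return [
        (0, 0, size, size),
        (frame_w - size, 0, size, size),
        (0, frame_h - size, size, size),
        (frame_w - size, frame_h - size, size, size),
    ]


def pick_avatar_region(candidates: List[Box], features: List[Box]) -> Box:
    """Candidate with the smallest total overlap with the feature boxes."""
    return min(candidates, key=lambda c: sum(overlap_area(c, f) for f in features))


# Hypothetical feature boxes, e.g. a ball near the top left and a player
# near the bottom right of a 1280x720 frame.
features = [(100, 50, 100, 100), (1000, 500, 200, 200)]
region = pick_avatar_region(corner_candidates(1280, 720, 200), features)
print(region)
```

Here the top-left and bottom-right corners each overlap a feature, so the sketch settles on the first corner with zero overlap, the top-right one.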

It may be appreciated that emotional reaction sharing is not limited to merely facial expressions but may pertain to body movement, speech, etc. For example, emotional reaction data may be identified from video, audio, and/or other information relating to a user (e.g., a microphone recording a user crying, a camera capturing a user crossing her arms in frustration, the camera capturing a user waving her arms while cheering, etc.). The emotional reaction data may be shared with other users in a variety of ways. In an example, the emotional reaction data may be shared through the selection of a statement, animation, or image associated with an emotional reaction (e.g., a statement “#$@%^#” illustrating frustration, a two-dimensional image depicting a cheering cat, etc.) for display to other users. In another example, the emotional reaction data may be shared by rendering a three-dimensional body model of an avatar expressing an emotional reaction (e.g., a dog jumping around cheering). In another example, the emotional reaction data may be shared through augmented reality (e.g., while watching a live speech using an augmented reality device, an avatar, image, symbol, text, animation, etc. associated with emotional reactions of users viewing the live speech may be displayed as augmented reality through the augmented reality device). In another example, the emotional reaction data may be shared through video conferencing (e.g., users of a video conference may be represented by avatars through which emotions are expressed, which may drastically reduce resource utilization and bandwidth otherwise used to display live streams of users during the video conference).

It may be appreciated that the user may take affirmative action, such as providing opt-in consent, to allow access to and/or use of an input device (e.g., a microphone, a camera, etc.) of a client device and/or any other type of information, such as for the purpose of emotional reaction recognition and sharing (e.g., where the user responds to a prompt regarding the collection and/or use of such information). The user may also opt-out from providing access to such information or portions thereof (e.g., access may be provided to the microphone but not the camera).

An embodiment of emotional reaction sharing is illustrated by an example method 400 of FIG. 4. A user of a client device may be viewing content through the client device. For example, the user may be watching a live newscast of a videogame conference. The user may desire to share emotional reactions of viewing the videogame conference with other users watching the videogame conference. Accordingly, responsive to determining that the user is viewing the videogame conference through the client device, a camera of the client device may be initialized to capture one or more frames of video of the user, at 402. It may be appreciated that any type of input capture device may be initialized for obtaining emotional reaction data of the user (e.g., a microphone may be initialized to capture audio content of the user for identifying cheering, crying, excited statements, etc.).

At 404, frames, such as a first frame, of the video may be evaluated to identify a set of facial features of the user. In an example, image recognition functionality may be used to evaluate pixels of the first frame to identify lines, shapes, clusters of similar colors, and/or other image features. Entity recognition functionality may be used to determine that the lines, shapes, clusters of similar colors, and/or other image features are indicative of an entity, such as a person, a nose, an eye, glasses, etc. (e.g., facial recognition may be used to identify a mouth from lines, shapes, and colors). In another example, audio recognition functionality may be used to evaluate the audio content of the user to identify audio features, such as pitch, frequency, voices, music, background noise, silence, etc. that may be indicative of audible user reactions and moods of the user such as crying, cheering, etc.

At 406, a set of landmark points, representing the set of facial features within the first frame, may be generated (e.g., 4 landmark points, corresponding to portions of an eye, may be identified by the image recognition functionality and the entity recognition functionality). In an example, the set of landmark points may comprise locations/coordinates for between about 4 and about 240 landmark points, such as 180 landmark points or any other number of landmark points. A landmark point may correspond to one or more pixels depicting a facial feature at a certain point in time. In an example, the set of landmark points may comprise a number of landmark points (e.g., 180 landmark points) that is less than one percent or any other percentage of a pixel count of the first frame (e.g., the first frame may comprise hundreds of thousands to millions of pixels). In an example, the set of landmark points are identified in real-time as the user watches the content, and thus sets of landmark points may be identified and tracked (e.g., locations of landmark points may be tracked over time to identify locational changes of the landmark points that are indicative of facial feature movements, such as movement of an ear) so that facial movement such as a facial expression can be identified (e.g., 180 landmark points may be identified per frame at a rate of 10 frames per second or any other framerate).
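Tracking locational changes between two sets of landmark points can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the facial expression recognition algorithm itself: it assumes both sets list the same landmarks in the same order as (x, y) coordinates, and the movement threshold and function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) location of one landmark point


def displacements(first: List[Point], second: List[Point]) -> List[float]:
    """Euclidean distance each landmark moved between two points in time."""
    if len(first) != len(second):
        raise ValueError("both sets must contain the same landmarks")
    return [
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1), (x2, y2) in zip(first, second)
    ]


def moved_indices(first: List[Point], second: List[Point], threshold: float) -> List[int]:
    """Indices of landmarks that moved farther than the threshold."""
    return [i for i, d in enumerate(displacements(first, second)) if d > threshold]


# Four hypothetical landmark points around a mouth at two points in time;
# only the last point (index 3) changes location.
first_set = [(40.0, 80.0), (60.0, 80.0), (50.0, 75.0), (50.0, 85.0)]
second_set = [(40.0, 80.0), (60.0, 80.0), (50.0, 75.0), (53.0, 89.0)]

moved = moved_indices(first_set, second_set, threshold=2.0)
print(moved)
```

A recognizer would then have to map such changes to facial movements, which this sketch does not attempt.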

At 408, the set of landmark points may be sent to a user reaction distribution service (e.g., a server) for identification and reconstruction of a facial expression of the user, based upon the set of landmark points, for display through a second client device to a second user that is viewing the content (e.g., and/or for display to other users viewing the content). In an example, the sets of landmark points, identified in real-time as the user watches the content, are sent to the user reaction distribution service for identification and reconstruction of the facial expression. The facial expression may be reconstructed (e.g., at the second client device) upon an avatar, representing the user, for display through the second client device. For example, the avatar may be selected for presentation to the second user based upon the user specifying a preference for being represented to other users using the avatar (e.g., the user may prefer to be depicted as a robot to other users). In another example, the avatar may be selected for presentation to the second user based upon the second user specifying a preference for other users to be represented as the avatar (e.g., the second user may prefer to see users as monsters).

In an example where audio content of the user is captured by the microphone of the client device, the audio content may be evaluated to identify a mood of the user (e.g., audio characteristics indicative of the user screaming angrily). The mood of the user may be sent to the user reaction distribution service for use in identifying the facial expression of the user.

In an example, the camera may be transitioned into an off state or a lower power state in response to determining that the user is not viewing the content. In this way, battery consumption of the client device and network bandwidth used to send landmark points to the user reaction distribution service may be reduced.

FIG. 5 illustrates an example of a system 500 for emotional reaction sharing. A user 501 of a client device 502 may access content, such as a live stream of an eSports championship 504. The user 501 may have provided opt-in consent, such as through a share interface 506, to share emotional reactions of the user 501 while viewing the eSports championship 504. The user 501 may select an avatar to represent the user to other users, such as through a select avatar interface 508 (e.g., the user 501 may request to be represented by a face of a frog, which improves user privacy and security by not sharing actual depictions of the user 501). The user 501 may also specify what types of avatars are to represent other users to the user 501, such as through a select avatar preference interface 510 (e.g., the user 501 may specify that other users are to be represented by robots).

Responsive to determining that the user 501 is viewing the eSports championship 504, a camera 512 and/or a microphone 514 may be initialized. The camera 512 may be used to capture one or more frames of video of the user 501. The frames may be evaluated to identify facial features of the user 501, such as a first ear 518, a second ear 516, a first eyebrow 520, a second eyebrow 522, a first eye 528, a second eye 526, a nose 524, a mouth 530, and/or other facial features. The frames may be captured and evaluated in real-time to generate sets of landmark points representing locations of facial features within the frames as the user 501 views the eSports championship 504. The microphone 514 may be utilized to capture audio content of the user 501 that is evaluated to identify a mood of the user. In this way, the sets of landmark points and/or the mood may be sent from the client device 502 to a user reaction distribution service (e.g., user reaction distribution service 710 of FIG. 7) for reconstructing a facial expression of the user, based upon the sets of landmark points, for display to other users. For example, the sets of landmark points may comprise locations or coordinates of the facial features, and thus may be indicative of locational changes of the facial features representative of facial movement corresponding to facial expressions (e.g., movements of the mouth 530 may be indicative of the user shouting with excitement).

An embodiment of emotional reaction sharing is illustrated by an example method 600 of FIG. 6. A user reaction distribution service (e.g., a server) may be configured to receive sets of landmark points from client devices (e.g., scalable to hundreds of thousands of client devices), evaluate the sets of landmark points to identify facial expressions of users viewing content, and provide a facial expression to client devices (e.g., a highest ranked or most common facial expression) for reconstruction and display to the users. Accordingly, sets of landmark points, such as a first set of landmark points and a second set of landmark points, may be received from a client device of a user viewing the content, such as a live stream of an eSports championship, at 602. The sets of landmark points may represent locations of facial features of the user at various points in time while viewing the eSports championship through the client device.

At 604, the sets of landmark points may be evaluated, using a facial expression recognition algorithm that maps changes in location of landmark points to facial movements indicative of facial expressions (e.g., coordinate changes of 5 landmark points representing different points of a mouth may be indicative of the user frowning), to identify a facial expression of the user while the user is viewing the eSports championship. In an example, audio content of the user and/or a mood of the user identified from the audio content may be used to identify and/or verify the facial expression. At 606, the facial expression (e.g., landmark points used to reconstruct the facial expression upon a model at a second client device for display; a designation of the facial expression, such as frowning, used by the second client device to query an expression repository to identify an image, animation, text, or symbol representative of the facial expression; the image, animation, text, or symbol representative of the facial expression; etc.) may be sent to the second client device, of a second user that is viewing the eSports championship, for display to the second user. The facial expression may be sent to any number of client devices of users that are viewing the eSports championship.
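As a minimal sketch of how changes in landmark point locations might be mapped to a facial expression and then verified against a mood, consider the following. The specification does not disclose a particular algorithm; the rule used here (average vertical shift of mouth-corner landmarks), the threshold value, the mood labels, and the `COMPATIBLE_MOODS` table are all hypothetical and serve only to make the two steps concrete.

```python
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y grows downward

# Hypothetical table: audio-derived moods treated as consistent with each expression.
COMPATIBLE_MOODS: Dict[str, Set[str]] = {
    "smiling": {"happy", "excited"},
    "frowning": {"sad", "angry"},
    "neutral": {"calm", "happy", "sad", "angry", "excited"},
}


def classify_mouth(prev: List[Point], curr: List[Point], threshold: float = 2.0) -> str:
    """Label an expression from the mean vertical shift of mouth-corner landmarks."""
    if not prev or len(prev) != len(curr):
        raise ValueError("expected two non-empty landmark sets of equal length")
    shifts = [c[1] - p[1] for p, c in zip(prev, curr)]
    mean_shift = sum(shifts) / len(shifts)
    if mean_shift > threshold:   # corners moved down in the image
        return "frowning"
    if mean_shift < -threshold:  # corners moved up in the image
        return "smiling"
    return "neutral"


def verify_with_mood(expression: str, mood: Optional[str]) -> bool:
    """Return True when the mood does not contradict the expression."""
    if mood is None:
        return True  # no audio was captured, so there is nothing to contradict
    return mood in COMPATIBLE_MOODS.get(expression, set())


# Hypothetical mouth-corner landmarks at a first and a second point in time.
first_set = [(100.0, 300.0), (160.0, 300.0)]
second_set = [(100.0, 304.0), (160.0, 306.0)]
expression = classify_mouth(first_set, second_set)
```

With these assumed inputs the mouth corners shift downward by an average of 5 pixels, so the sketch labels the expression as frowning; a mood of "sad" would corroborate it, while a mood of "excited" would not.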

In an example, a set of facial expressions of users viewing the eSports championship may be identified from landmark points provided by client devices of the users. The set of facial expressions may be ranked based upon various metrics such as an interestingness expression metric (e.g., how unique, interesting, exaggerated, and/or expressive is a facial expression, such as changes in landmark points of a user indicating a much larger or uniquely shaped smile than that of other users), a common expression metric (e.g., a more commonly occurring facial expression may be ranked higher than an uncommon outlier facial expression), etc. In this way, the facial expression may be selected from the set of facial expressions to send based upon the facial expression having a rank exceeding a threshold (e.g., a highest ranked facial expression).
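For illustration only, ranking by a common expression metric could be as simple as counting how often each identified facial expression occurs during an interval. The sketch below assumes that each user's facial expression has already been reduced to a text designation; the function name and the sample designations are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_expressions(expressions: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank expression designations from most to least frequently occurring."""
    return Counter(expressions).most_common()


# Hypothetical designations identified for six users during one time interval.
observed = ["cheering", "frowning", "cheering", "crying", "cheering", "frowning"]

ranking = rank_expressions(observed)
top_expression = ranking[0][0]  # the highest ranked facial expression
```

An interestingness expression metric would need a different scoring function (e.g., one based on how far a user's landmark movement departs from that of other users), which this sketch does not attempt.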

In an example, the facial expression recognition algorithm may be trained to identify various types of facial expressions. For example, a training set of landmark points may comprise sets of labeled landmark points having facial expression labels (e.g., a series of labeled landmark points may be indicative of facial movement of a particular type of facial expression such as eye movement indicative of crying). In this way, precision of the facial expression recognition algorithm may be improved.
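The training described above may be pictured with a deliberately simple stand-in. The specification does not name a model, so the nearest-centroid classifier below is an assumption, as are the two-element feature vectors (a mouth-corner vertical shift and an eyelid vertical shift) and the labels.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (landmark movement features, expression label)


def train_centroids(samples: List[Sample]) -> Dict[str, List[float]]:
    """Average the feature vectors of each label to obtain one centroid per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid lies closest to the given features."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled training set of landmark movements.
training_set: List[Sample] = [
    ([-4.0, 0.5], "smiling"),
    ([-6.0, 1.5], "smiling"),
    ([5.0, 3.0], "crying"),
    ([7.0, 5.0], "crying"),
]

model = train_centroids(training_set)
guess = predict(model, [-3.0, 0.0])
```

A production system would likely use many more landmark points and a richer model; the point of the sketch is only that labeled landmark movements are what the algorithm learns from.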

FIG. 7 illustrates an example of a system 700 for emotional reaction sharing. A user reaction distribution service 710 (e.g., one or more servers) may be configured to receive landmark points from client devices, identify facial expressions of users of the client devices while viewing content, and provide facial expressions to client devices of users viewing the content (e.g., provide landmark points used by client devices to reconstruct facial expressions upon models of avatars for display to users). For example, the user reaction distribution service 710 may receive sets of landmark points 706 from a first client device 702 and/or landmark points from other client devices such as landmark points 708 from an nth client device 704. The sets of landmark points 706 may comprise a first set of landmark points of facial features of a user of the first client device 702 at a first point in time, a second set of landmark points of the facial features of the user at a second point in time, and/or other sets of landmark points of the facial features of the user over time while viewing the content.

The user reaction distribution service 710 may execute a facial expression recognition algorithm 712 used to evaluate the sets of landmark points 706 to identify a facial expression of the user. For example, the facial expression recognition algorithm 712 may identify locations and locational changes of landmark points, which may be indicative of facial movement of the facial features of the user. In this way, the facial expression recognition algorithm 712 may create a mapping 714 of landmark point location changes to facial movements (e.g., 7 landmark points corresponding to facial features of a mouth and 10 landmark points of facial features of eyes may change positions over time in a manner indicative of the user crying). The mapping 714 may be used to determine that the user has a facial expression 716 of crying. The sets of landmark points 706 may be evaluated to determine an expressiveness of the crying (e.g., an expressiveness of 6 out of 10 indicative of a moderately expressive crying expression), a uniqueness of the crying (e.g., a uniqueness of 8 out of 10 indicative of a very unique crying expression), etc. In this way, the facial expression 716 of crying (e.g., landmark points that may be used to reconstruct a crying facial expression and/or other data such as an avatar preference of the user for representing the user to other users) may be sent to client devices, of users viewing the content, for reconstruction and display.

An embodiment of emotional reaction sharing is illustrated by an example method 800 of FIG. 8. In an example, a user, viewing content (e.g., a videogame, a video conference, a live stream video, augmented reality, a video, etc.), may be interested in how other users are emotionally reacting to the content. Accordingly, a facial expression, of a second user while the second user is viewing the content, may be received from a user reaction distribution service at a client device of the user while the user is viewing the content, at 802. In an example, the facial expression comprises an image, an animation, text, a symbol, or any other information that can be displayed to the user to convey the facial expression. In another example, the facial expression may comprise a description of the facial expression (e.g., a joyous smile) that may be used to query an expression repository for an image, an animation, text, a symbol, etc. having the facial expression for display to the user. In another example, the facial expression comprises a set of landmark points or other data that may be applied to a model of an avatar for display to the user.

At 804, the facial expression may be applied to a three-dimensional model of an avatar to generate an expressive avatar having the facial expression. For example, the facial expression may comprise landmark points that may be used to select a bone structure mapping, from a set of bone structure mappings, which comprises facial bone structures that can be used to construct the facial expression upon the three-dimensional model. The landmark points may be used to select a muscle movement mapping, from a set of muscle movement mappings, which comprises muscles and muscle movements that can be applied to the facial bone structures to construct the facial expression upon the three-dimensional model (e.g., facial muscles and movements that can be applied to facial bone structures to create the expressive avatar having a joyous facial expression). It may be appreciated that the expressive avatar is not limited to merely facial expressions but can be constructed from other emotional reaction data, such as landmark points of the second user's body jumping for joy that may be used to create the expressive avatar jumping for joy (e.g., utilizing body bone mappings and body muscle mappings). In an example, the avatar may be selected based upon a user avatar preference (e.g., the user may specify that emotions of other users are to be displayed using cat avatars; the second user may specify that the second user's facial expressions are to be displayed to other users through a robot avatar; etc.).

At 806, the expressive avatar may be displayed to the user while the user views the content. In an example, the expressive avatar may be displayed within a user interface that is separate from a content user interface through which the content is being displayed. In another example, the expressive avatar may be displayed within the content user interface (e.g., overlaid on the content). For example, the content may be evaluated to identify a visual feature of the content (e.g., the content may comprise a presidential speech in a field, where a president is identified as a first entity visual feature, a flag is identified as a second entity visual feature, and the field of grass is identified as a background). A region within the content, into which the expressive avatar may be added without occluding visual features above a threshold amount (e.g., minimized occlusion of the first entity visual feature and the second entity visual feature), may be identified. For example, the region may encompass the field of grass but not the president and not the flag. In an example, a size, transparency, color, or other characteristic of the expressive avatar and/or the content may be modified to reduce occlusion. In this way, the expressive avatar may be displayed within the region.
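One way to picture the region selection described above is to score candidate regions by how much they overlap the bounding boxes of identified visual features and to keep the least occluding candidate. The following is a sketch under stated assumptions: the `Box` type, the function names, the 1280x720 layout, and every coordinate are hypothetical, and the sketch minimizes overlap rather than comparing it against a threshold amount.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def overlap_area(a: Box, b: Box) -> int:
    """Return the area shared by two boxes, or 0 if they do not intersect."""
    width = min(a.right, b.right) - max(a.left, b.left)
    height = min(a.bottom, b.bottom) - max(a.top, b.top)
    return max(0, width) * max(0, height)


def least_occluding_region(candidates: List[Box], features: List[Box]) -> Box:
    """Pick the candidate region that covers the least visual-feature area."""
    return min(candidates, key=lambda r: sum(overlap_area(r, f) for f in features))


# Hypothetical bounding boxes for the president and the flag in a 1280x720 frame.
president = Box(500, 100, 800, 700)
flag = Box(900, 50, 1100, 400)

# Hypothetical 200x200 candidate regions for the expressive avatar.
top_right = Box(1080, 0, 1280, 200)
bottom_left = Box(0, 520, 200, 720)
bottom_center = Box(540, 520, 740, 720)

chosen = least_occluding_region([top_right, bottom_left, bottom_center], [president, flag])
```

Here the lower left candidate overlaps neither visual feature and is therefore chosen, while the top right candidate clips the flag and the bottom center candidate covers part of the president.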

In another example, the expressive avatar may be displayed within a user interface element (e.g., a bubble). An animation display property may be applied to the user interface element (e.g., a fade property, a movement property, a zoom in or out property, a transparency property, etc.). For example, the animation display property may specify that the user interface element is to expand in size until reaching a size threshold and then the user interface element is to disappear. In this way, the expressive avatar, the facial expression, and/or other emotional reaction information may be conveyed to users.
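The expand-then-disappear animation display property may be sketched as a sequence of sizes that stops once a size threshold is reached. The function name, the multiplicative growth factor, and the numeric values below are assumptions made for illustration and are not drawn from the specification.

```python
from typing import Iterator


def bubble_sizes(start: float, growth: float, size_threshold: float) -> Iterator[float]:
    """Yield the size to draw on each animation step until the threshold is reached."""
    if start <= 0 or growth <= 1.0:
        raise ValueError("start must be positive and growth must exceed 1.0")
    size = start
    while size < size_threshold:
        yield size
        size *= growth
    # Once the generator is exhausted, the caller removes the user interface element.


# Hypothetical bubble that starts at 10 pixels and doubles on each step.
sizes = list(bubble_sizes(10.0, 2.0, 100.0))
```

Other animation display properties, such as a fade property or a transparency property, could follow the same pattern with an opacity value in place of a size.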

FIG. 9 illustrates an example of a system 900 for emotional reaction sharing. A user reaction distribution service 902 (e.g., one or more servers) may have identified facial expressions of users watching an eSports championship 910 live stream. The user reaction distribution service 902 may have ranked the facial expressions such as based upon a uniqueness metric (e.g., more unique and/or interesting facial expressions may be ranked higher than less unique and/or boring facial expressions). Accordingly, facial expression data 906 of a ranked facial expression may be selected from ranked facial expressions 904 and provided to client devices of users viewing the eSports championship 910. For example, the facial expression data 906 may comprise landmark points that can be applied to models of avatars to create a crying with tears facial expression of a user watching the eSports championship 910. The facial expression data 906 may comprise other metadata and tags, such as an avatar preference of a user from which the crying with tears facial expression was identified (e.g., the avatar preference may indicate that the user prefers to be represented as a cat if other users have not specified how users are to be displayed to them).

The facial expression data 906 may be provided to a first client device 908 of a first user and/or other client devices such as an nth client device 920 of an nth user. At the first client device 908, the facial expression data 906, such as the landmark points, may be applied to a bone structure mapping and a muscle movement mapping for a cat avatar, corresponding to the avatar preference of the user, to create an expressive cat avatar 912 having the crying with tears facial expression. The eSports championship 910 may be evaluated to identify visual features, such as a first robot player 916 and a second robot player 918. A region within the eSports championship 910, such as a lower left region, may be identified as having reduced (e.g., minimized) overlap with the first robot player 916, the second robot player 918, and/or other visual features. Accordingly, the expressive cat avatar 912 may be displayed within the region to reduce occlusion of the visual features.

In an example, the nth user of the nth client device 920 may express a preference for round head avatars. Accordingly, the facial expression data 906, such as the landmark points, may be applied to a bone structure mapping and a muscle movement mapping for a round head avatar, corresponding to the preference of the nth user, to create an expressive round head avatar 914 having the crying with tears facial expression. In this way, the expressive round head avatar 914 may be displayed to the nth user while viewing the eSports championship 910.

FIG. 10 is an illustration of a scenario 1000 involving an example non-transitory machine readable medium 1002. The non-transitory machine readable medium 1002 may comprise processor-executable instructions 1012 that when executed by a processor 1016 cause performance (e.g., by the processor 1016) of at least some of the provisions herein. The non-transitory machine readable medium 1002 may comprise a memory semiconductor (e.g., a semiconductor utilizing static random access memory (SRAM), dynamic random access memory (DRAM), and/or synchronous dynamic random access memory (SDRAM) technologies), a platter of a hard disk drive, a flash memory device, or a magnetic or optical disc (such as a compact disk (CD), a digital versatile disk (DVD), or floppy disk). The example non-transitory machine readable medium 1002 stores computer-readable data 1004 that, when subjected to reading 1006 by a reader 1010 of a device 1008 (e.g., a read head of a hard disk drive, or a read operation invoked on a solid-state storage device), express the processor-executable instructions 1012. In some embodiments, the processor-executable instructions 1012, when executed, cause performance of operations, such as at least some of the example method 400 of FIG. 4, at least some of the example method 600 of FIG. 6, and/or at least some of the example method 800 of FIG. 8, for example. In some embodiments, the processor-executable instructions 1012 are configured to cause implementation of a system, such as at least some of the example system 500 of FIG. 5, at least some of the example system 700 of FIG. 7, and/or at least some of the example system 900 of FIG. 9, for example.

3. Usage of Terms

As used in this application, “component,” “module,” “system”, “interface”, and/or the like are generally intended to refer to a computer-related entity, either hardware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a controller and the controller can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers.

Unless specified otherwise, “first,” “second,” and/or the like are not intended to imply a temporal aspect, a spatial aspect, an ordering, etc. Rather, such terms are merely used as identifiers, names, etc. for features, elements, items, etc. For example, a first object and a second object generally correspond to object A and object B or two different or two identical objects or the same object.

Moreover, “example” is used herein to mean serving as an example, instance, illustration, etc., and not necessarily as advantageous. As used herein, “or” is intended to mean an inclusive “or” rather than an exclusive “or”. In addition, “a” and “an” as used in this application are generally to be construed to mean “one or more” unless specified otherwise or clear from context to be directed to a singular form. Also, at least one of A and B and/or the like generally means A or B or both A and B. Furthermore, to the extent that “includes”, “having”, “has”, “with”, and/or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term “comprising”.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing at least some of the claims.

Furthermore, the claimed subject matter may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed subject matter. The term “article of manufacture” as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, many modifications may be made to this configuration without departing from the scope or spirit of the claimed subject matter.

Various operations of embodiments are provided herein. In an embodiment, one or more of the operations described may constitute computer readable instructions stored on one or more computer readable media, which if executed by a computing device, will cause the computing device to perform the operations described. The order in which some or all of the operations are described should not be construed as to imply that these operations are necessarily order dependent. Alternative ordering will be appreciated by one skilled in the art having the benefit of this description. Further, it will be understood that not all operations are necessarily present in each embodiment provided herein. Also, it will be understood that not all operations are necessary in some embodiments.

Also, although the disclosure has been shown and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art based upon a reading and understanding of this specification and the annexed drawings. The disclosure includes all such modifications and alterations and is limited only by the scope of the following claims. In particular regard to the various functions performed by the above described components (e.g., elements, resources, etc.), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component which performs the specified function of the described component (e.g., that is functionally equivalent), even though not structurally equivalent to the disclosed structure. In addition, while a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application.

Claims

A computing device comprising: a processor; and memory comprising processor-executable instructions that when executed by the processor cause performance of operations, the operations comprising: receiving, at a user reaction distribution service, a first set of landmark points, a second set of landmark points, and a mood of a first user from a client device, wherein: the first set of landmark points represents a set of facial features of the first user at a first point in time and the second set of landmark points represents the set of facial features of the first user at a second point in time while the first user is viewing content through the client device, and the mood is identified at the client device from audio of the first user while the first user is viewing the content through the client device; evaluating, at the user reaction distribution service, the first set of landmark points and the second set of landmark points, using a facial expression recognition algorithm that maps changes in location of landmark points to facial movements indicative of facial expressions, to identify a facial expression of the first user while the first user is viewing the content; verifying the facial expression of the first user based upon the mood; identifying, at the user reaction distribution service, a set of facial expressions of other users viewing the content during a time interval between the first point in time and the second point in time based upon landmark points received from client devices of the other users, wherein: the client device of the first user and the client devices of the other users define a group of client devices, and the facial expression of the first user and the set of facial expressions of other users define a group of facial expressions; ranking, at the user reaction distribution service, the group of facial expressions to determine a most frequently occurring facial expression, amongst the group of facial expressions, during the time interval; and sending, from the user reaction distribution service, the most frequently occurring facial expression to a plurality of client devices amongst the group of client devices in real-time during viewing of the content by the first user and by the other users.

The computing device of claim 1, wherein the operations comprise: training the facial expression recognition algorithm to identify a type of facial expression based upon a training set of landmark points associated with the type of facial expression, the training set of landmark points corresponding to facial movement indicative of the type of facial expression.
The computing device of claim 1, wherein: the first user and the other users define a group of users, and the operations comprise: identifying, at the user reaction distribution service, a second set of facial expressions of the group of users viewing the content; ranking, at the user reaction distribution service, the second set of facial expressions based upon frequency; identifying, at the user reaction distribution service, a change to the most frequently occurring facial expression based upon the ranking the second set of facial expressions; and sending, from the user reaction distribution service, an update to the plurality of client devices in real-time during viewing of the content by the group of users based upon the change.
The computing device of claim 3, wherein: the second set of facial expressions comprises facial expressions of the group of users during a second time interval, and the time interval corresponds to a first interval of time during which a first portion of the content is viewed and the second time interval corresponds to a second interval of time during which a second portion of the content is viewed.
The computing device of claim 1, wherein the sending comprises sending a set of landmark points corresponding to the most frequently occurring facial expression to the plurality of client devices.
The computing device of claim 1, wherein the sending comprises sending at least one of an image, animation, text, or symbol representative of the most frequently occurring facial expression to the plurality of client devices.
A method of emotional reaction sharing, the method comprising: receiving, at a user reaction distribution service, a first set of landmark points, a second set of landmark points, and a mood of a first user from a client device, wherein: the first set of landmark points represents a set of facial features of the first user at a first point in time and the second set of landmark points represents the set of facial features of the first user at a second point in time while the first user is viewing content through the client device, and the mood is identified at the client device from audio of the first user while the first user is viewing the content through the client device; evaluating, at the user reaction distribution service, the first set of landmark points and the second set of landmark points, using a facial expression recognition algorithm that maps changes in location of landmark points to facial movements indicative of facial expressions, to identify a facial expression of the first user while the first user is viewing the content; verifying the facial expression of the first user based upon the mood; identifying, at the user reaction distribution service, a set of facial expressions of other users viewing the content during a time interval between the first point in time and the second point in time based upon landmark points received from client devices of the other users, wherein: the client device of the first user and the client devices of the other users define a group of client devices, and the facial expression of the first user and the set of facial expressions of other users define a group of facial expressions; ranking, at the user reaction distribution service, the group of facial expressions to determine a most frequently occurring facial expression, amongst the group of facial expressions, during the time interval; and sending, from the user reaction distribution service, the most frequently occurring facial expression to a plurality of client devices amongst the group of client devices in real-time during viewing of the content by the first user and by the other users.
The method of claim 7, comprising: training the facial expression recognition algorithm to identify a type of facial expression based upon a training set of landmark points associated with the type of facial expression, the training set of landmark points corresponding to facial movement indicative of the type of facial expression.
The method of claim 7, wherein: the first user and the other users define a group of users, and the method comprises: identifying, at the user reaction distribution service, a second set of facial expressions of the group of users viewing the content; ranking, at the user reaction distribution service, the second set of facial expressions based upon frequency; identifying, at the user reaction distribution service, a change to the most frequently occurring facial expression based upon the ranking the second set of facial expressions; and sending, from the user reaction distribution service, an update to the plurality of client devices in real-time during viewing of the content by the group of users based upon the change.
The method of claim 9, wherein: the second set of facial expressions comprises facial expressions of the group of users during a second time interval, and the time interval corresponds to a first interval of time during which a first portion of the content is viewed and the second time interval corresponds to a second interval of time during which a second portion of the content is viewed.
A non-transitory machine readable medium having stored thereon processor-executable instructions that when executed cause performance of operations, the operations comprising: receiving, at a user reaction distribution service, a first set of landmark points, a second set of landmark points, and a mood of a first user from a client device, wherein: the first set of landmark points represents a set of facial features of the first user at a first point in time and the second set of landmark points represents the set of facial features of the first user at a second point in time while the first user is viewing content through the client device, and the mood is identified at the client device from audio of the first user while the first user is viewing the content through the client device; evaluating, at the user reaction distribution service, the first set of landmark points and the second set of landmark points, using a facial expression recognition algorithm that maps changes in location of landmark points to facial movements indicative of facial expressions, to identify a facial expression of the first user while the first user is viewing the content; verifying the facial expression of the first user based upon the mood; identifying, at the user reaction distribution service, a set of facial expressions of other users viewing the content during a time interval between the first point in time and the second point in time based upon landmark points received from client devices of the other users, wherein: the client device of the first user and the client devices of the other users define a group of client devices, the first user and the other users define a group of users, and the facial expression of the first user and the set of facial expressions of other users define a group of facial expressions; ranking, at the user reaction distribution service, the group of facial expressions to determine a most frequently occurring facial expression, amongst the group of facial expressions, during the time interval; and sending, from the user reaction distribution service, the most frequently occurring facial expression to a plurality of client devices amongst the group of client devices in real-time during viewing of the content by the first user and by the other users.
The non-transitory machine readable medium of claim 11, comprising: training the facial expression recognition algorithm to identify a type of facial expression based upon a training set of landmark points associated with the type of facial expression, the training set of landmark points corresponding to facial movement indicative of the type of facial expression.
The non-transitory machine readable medium of claim 11, wherein the sending the most frequently occurring facial expression comprises sending a set of landmark points corresponding to the most frequently occurring facial expression to the plurality of client devices.
The non-transitory machine readable medium of claim 11, the operations comprising: identifying, at the user reaction distribution service, a second set of facial expressions of the group of users viewing the content; ranking, at the user reaction distribution service, the second set of facial expressions based upon frequency; identifying, at the user reaction distribution service, a change to the most frequently occurring facial expression based upon the ranking the second set of facial expressions; and sending, from the user reaction distribution service, an update to the plurality of client devices in real-time during viewing of the content by the group of users based upon the change.
The non-transitory machine readable medium of claim 14, wherein: the second set of facial expressions comprises facial expressions of the group of users during a second time interval, and the time interval corresponds to a first interval of time during which a first portion of the content is viewed and the second time interval corresponds to a second interval of time during which a second portion of the content is viewed.
The non-transitory machine readable medium of claim 14, wherein the sending the update comprises sending a set of landmark points corresponding to the most frequently occurring facial expression to the plurality of client devices.
The non-transitory machine readable medium of claim 11, the operations comprising: identifying, at the user reaction distribution service, a second set of facial expressions of the group of users viewing the content; ranking, at the user reaction distribution service, the second set of facial expressions based upon frequency; and identifying, at the user reaction distribution service, a change to the most frequently occurring facial expression based upon the ranking the second set of facial expressions.
The non-transitory machine readable medium of claim 17, wherein: the second set of facial expressions comprises facial expressions of the group of users during a second time interval, and the time interval corresponds to a first interval of time during which a first portion of the content is viewed and the second time interval corresponds to a second interval of time during which a second portion of the content is viewed.
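
The ranking and update steps recited in the claims above can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes each viewer's facial expression has already been reduced to a plain text label for a given time interval, and the function names `most_frequent_expression` and `detect_update`, as well as the sample labels, are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def most_frequent_expression(expressions: Iterable[str]) -> Optional[str]:
    """Rank expression labels by frequency and return the top one.

    Returns None when no expressions were reported for the interval.
    Labels are assumed (hypothetically) to be plain strings.
    """
    counts = Counter(expressions)
    if not counts:
        return None
    # most_common orders by count, highest first; ties keep first-seen order.
    return counts.most_common(1)[0][0]


def detect_update(previous_top: Optional[str],
                  interval_expressions: Iterable[str]) -> Optional[str]:
    """Return the new top expression if it changed, otherwise None."""
    current_top = most_frequent_expression(interval_expressions)
    if current_top is not None and current_top != previous_top:
        return current_top
    return None


# Hypothetical labels gathered from a group of viewers in two intervals.
first_interval = ["smile", "smile", "frown", "surprise"]
second_interval = ["frown", "frown", "smile"]

top = most_frequent_expression(first_interval)
update = detect_update(top, second_interval)
```

In this sketch, an update would be sent to client devices only when `detect_update` returns a label, which mirrors the claims' distinction between identifying a change and sending an update based upon that change.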

Disclaimer: Data is collected from the USPTO and may be malformed, incomplete, and/or otherwise inaccurate.