U.S. Pat. No. 10,297,081
METHOD FOR COMMUNICATION VIA VIRTUAL SPACE AND SYSTEM FOR EXECUTING THE METHOD ON COMPUTER
Assignee: COLOPL, INC.
Issue Date: December 25, 2017
Abstract
A method includes defining a virtual space associated with a first user. The virtual space is associated with a first head-mounted device (HMD). The virtual space includes an avatar object associated with a second user. The method includes receiving line-of-sight data on the second user, wherein the second user is associated with a second HMD. The method includes receiving sound data that is based on utterance of the second user at a timing different from that of the line-of-sight data. The method includes synchronizing a timing of controlling the avatar object in accordance with the line-of-sight data and a timing of outputting sound that is based on the sound data from the first HMD. The method includes controlling the avatar object in accordance with the line-of-sight data based on the synchronized timing. The method includes outputting the sound that is based on the synchronized timing.
Description
DETAILED DESCRIPTION
Now, with reference to the drawings, embodiments of this technical idea are described in detail. In the following description, like components are denoted by like reference symbols. The same applies to the names and functions of those components. Therefore, detailed description of those components is not repeated. In one or more embodiments described in this disclosure, components of respective embodiments can be combined with each other, and the combination also serves as a part of the embodiments described in this disclosure.
[Configuration of HMD System]
With reference to FIG. 1, a configuration of a head-mounted device (HMD) system 100 is described. FIG. 1 is a diagram of a system 100 including a head-mounted display (HMD) according to at least one embodiment of this disclosure. The system 100 is usable for household use or for professional use.
The system 100 includes a server 600, HMD sets 110A, 110B, 110C, and 110D, an external device 700, and a network 2. Each of the HMD sets 110A, 110B, 110C, and 110D is capable of independently communicating to/from the server 600 or the external device 700 via the network 2. In some instances, the HMD sets 110A, 110B, 110C, and 110D are also collectively referred to as “HMD set 110”. The number of HMD sets 110 constructing the HMD system 100 is not limited to four, but may be three or less, or five or more. The HMD set 110 includes an HMD 120, a computer 200, an HMD sensor 410, a display 430, and a controller 300. The HMD 120 includes a monitor 130, an eye gaze sensor 140, a first camera 150, a second camera 160, a microphone 170, and a speaker 180. In at least one embodiment, the controller 300 includes a motion sensor 420.
In at least one aspect, the computer 200 is connected to the network 2, for example, the Internet, and is able to communicate to/from the server 600 or other computers connected to the network 2 in a wired or wireless manner. Examples of the other computers include a computer of another HMD set 110 or the external device 700. In at least one aspect, the HMD 120 includes a sensor 190 instead of the HMD sensor 410. In at least one aspect, the HMD 120 includes both the sensor 190 and the HMD sensor 410.
The HMD 120 is wearable on a head of a user 5 to display a virtual space to the user 5 during operation. More specifically, in at least one embodiment, the HMD 120 displays each of a right-eye image and a left-eye image on the monitor 130. Each eye of the user 5 is able to visually recognize a corresponding image from the right-eye image and the left-eye image so that the user 5 may recognize a three-dimensional image based on the parallax of both of the user's eyes. In at least one embodiment, the HMD 120 includes any one of a so-called head-mounted display including a monitor or a head-mounted device capable of mounting a smartphone or other terminals including a monitor.
The monitor 130 is implemented as, for example, a non-transmissive display device. In at least one aspect, the monitor 130 is arranged on a main body of the HMD 120 so as to be positioned in front of both the eyes of the user 5. Therefore, when the user 5 is able to visually recognize the three-dimensional image displayed by the monitor 130, the user 5 is immersed in the virtual space. In at least one aspect, the virtual space includes, for example, a background, objects that are operable by the user 5, or menu images that are selectable by the user 5. In at least one aspect, the monitor 130 is implemented as a liquid crystal monitor or an organic electroluminescence (EL) monitor included in a so-called smartphone or other information display terminals.
In at least one aspect, the monitor 130 is implemented as a transmissive display device. In this case, the user 5 is able to see through the HMD 120 covering the eyes of the user 5, for example, smartglasses. In at least one embodiment, the transmissive monitor 130 is configured as a temporarily non-transmissive display device through adjustment of a transmittance thereof. In at least one embodiment, the monitor 130 is configured to display a real space and a part of an image constructing the virtual space simultaneously. For example, in at least one embodiment, the monitor 130 displays an image of the real space captured by a camera mounted on the HMD 120, or may enable recognition of the real space by setting the transmittance of a part of the monitor 130 sufficiently high to permit the user 5 to see through the HMD 120.
In at least one aspect, the monitor 130 includes a sub-monitor for displaying a right-eye image and a sub-monitor for displaying a left-eye image. In at least one aspect, the monitor 130 is configured to integrally display the right-eye image and the left-eye image. In this case, the monitor 130 includes a high-speed shutter. The high-speed shutter operates so as to alternately display the right-eye image to the right eye of the user 5 and the left-eye image to the left eye of the user 5, so that only one of the user's 5 eyes is able to recognize the image at any single point in time.
In at least one aspect, the HMD 120 includes a plurality of light sources (not shown). Each light source is implemented by, for example, a light emitting diode (LED) configured to emit an infrared ray. The HMD sensor 410 has a position tracking function for detecting the motion of the HMD 120. More specifically, the HMD sensor 410 reads a plurality of infrared rays emitted by the HMD 120 to detect the position and the inclination of the HMD 120 in the real space.
In at least one aspect, the HMD sensor 410 is implemented by a camera. In at least one aspect, the HMD sensor 410 uses image information of the HMD 120 output from the camera to execute image analysis processing, to thereby enable detection of the position and the inclination of the HMD 120.
In at least one aspect, the HMD 120 includes the sensor 190 instead of, or in addition to, the HMD sensor 410 as a position detector. In at least one aspect, the HMD 120 uses the sensor 190 to detect the position and the inclination of the HMD 120. For example, in at least one embodiment, when the sensor 190 is an angular velocity sensor, a geomagnetic sensor, or an acceleration sensor, the HMD 120 uses any or all of those sensors instead of (or in addition to) the HMD sensor 410 to detect the position and the inclination of the HMD 120. As an example, when the sensor 190 is an angular velocity sensor, the angular velocity sensor detects over time the angular velocity about each of three axes of the HMD 120 in the real space. The HMD 120 calculates a temporal change of the angle about each of the three axes of the HMD 120 based on each angular velocity, and further calculates an inclination of the HMD 120 based on the temporal change of the angles.
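As an illustration of the angular-velocity integration described above, the following is a minimal sketch in Python; the class name InclinationEstimator, the sampling interval dt, and the sample format are hypothetical and are not part of the patented configuration.

```python
# Minimal sketch (hypothetical names): estimating HMD inclination by
# integrating angular velocity about three axes over time, as described above.

class InclinationEstimator:
    def __init__(self):
        # Accumulated angles (radians) about the three axes of the HMD.
        self.angles = [0.0, 0.0, 0.0]

    def update(self, angular_velocity, dt):
        """angular_velocity: (wx, wy, wz) in rad/s from the sensor 190; dt: seconds."""
        # Temporal change of the angle about each axis = angular velocity * elapsed time.
        self.angles = [a + w * dt for a, w in zip(self.angles, angular_velocity)]
        return tuple(self.angles)  # current inclination estimate


# Usage: feed gyro samples as they arrive, for example at 100 Hz.
estimator = InclinationEstimator()
inclination = estimator.update((0.01, 0.00, -0.02), dt=0.01)
```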
The eye gaze sensor 140 detects a direction in which the lines of sight of the right eye and the left eye of the user 5 are directed. That is, the eye gaze sensor 140 detects the line of sight of the user 5. The direction of the line of sight is detected by, for example, a known eye tracking function. The eye gaze sensor 140 is implemented by a sensor having the eye tracking function. In at least one aspect, the eye gaze sensor 140 includes a right-eye sensor and a left-eye sensor. In at least one embodiment, the eye gaze sensor 140 is, for example, a sensor configured to irradiate the right eye and the left eye of the user 5 with an infrared ray, and to receive reflection light from the cornea and the iris with respect to the irradiation light, to thereby detect a rotational angle of each of the user's 5 eyeballs. In at least one embodiment, the eye gaze sensor 140 detects the line of sight of the user 5 based on each detected rotational angle.
The first camera 150 photographs a lower part of a face of the user 5. More specifically, the first camera 150 photographs, for example, the nose or mouth of the user 5. The second camera 160 photographs, for example, the eyes and eyebrows of the user 5. A side of a casing of the HMD 120 on the user 5 side is defined as an interior side of the HMD 120, and a side of the casing of the HMD 120 on a side opposite to the user 5 side is defined as an exterior side of the HMD 120. In at least one aspect, the first camera 150 is arranged on an exterior side of the HMD 120, and the second camera 160 is arranged on an interior side of the HMD 120. Images generated by the first camera 150 and the second camera 160 are input to the computer 200. In at least one aspect, the first camera 150 and the second camera 160 are implemented as a single camera, and the face of the user 5 is photographed with this single camera.
The microphone 170 converts an utterance of the user 5 into a voice signal (electric signal) for output to the computer 200. The speaker 180 converts the voice signal into a voice for output to the user 5. In at least one embodiment, the speaker 180 converts other signals into audio information provided to the user 5. In at least one aspect, the HMD 120 includes earphones in place of the speaker 180.
The controller 300 is connected to the computer 200 through wired or wireless communication. The controller 300 receives input of a command from the user 5 to the computer 200. In at least one aspect, the controller 300 is held by the user 5. In at least one aspect, the controller 300 is mountable to the body or a part of the clothes of the user 5. In at least one aspect, the controller 300 is configured to output at least any one of a vibration, a sound, or light based on the signal transmitted from the computer 200. In at least one aspect, the controller 300 receives from the user 5 an operation for controlling the position and the motion of an object arranged in the virtual space.
In at least one aspect, the controller 300 includes a plurality of light sources. Each light source is implemented by, for example, an LED configured to emit an infrared ray. The HMD sensor 410 has a position tracking function. In this case, the HMD sensor 410 reads a plurality of infrared rays emitted by the controller 300 to detect the position and the inclination of the controller 300 in the real space. In at least one aspect, the HMD sensor 410 is implemented by a camera. In this case, the HMD sensor 410 uses image information of the controller 300 output from the camera to execute image analysis processing, to thereby enable detection of the position and the inclination of the controller 300.
In at least one aspect, the motion sensor 420 is mountable on the hand of the user 5 to detect the motion of the hand of the user 5. For example, the motion sensor 420 detects a rotational speed, a rotation angle, and the number of rotations of the hand. The detected signal is transmitted to the computer 200. The motion sensor 420 is provided to, for example, the controller 300. In at least one aspect, the motion sensor 420 is provided to, for example, the controller 300 capable of being held by the user 5. In at least one aspect, to help prevent accidental release of the controller 300 in the real space, the controller 300 is mountable on an object that does not easily fly away, for example, a glove-type object worn on a hand of the user 5. In at least one aspect, a sensor that is not mountable on the user 5 detects the motion of the hand of the user 5. For example, a signal of a camera that photographs the user 5 may be input to the computer 200 as a signal representing the motion of the user 5. As at least one example, the motion sensor 420 and the computer 200 are connected to each other through wired or wireless communication. In the case of wireless communication, the communication mode is not particularly limited, and for example, Bluetooth (trademark) or other known communication methods are usable.
The display 430 displays an image similar to an image displayed on the monitor 130. With this, a user other than the user 5 wearing the HMD 120 can also view an image similar to that of the user 5. An image to be displayed on the display 430 is not required to be a three-dimensional image, but may be a right-eye image or a left-eye image. For example, a liquid crystal display or an organic EL monitor may be used as the display 430.
In at least one embodiment, the server 600 transmits a program to the computer 200. In at least one aspect, the server 600 communicates to/from another computer 200 for providing virtual reality to the HMD 120 used by another user. For example, when a plurality of users play a participatory game, for example, in an amusement facility, each computer 200 communicates to/from another computer 200 via the server 600 with a signal that is based on the motion of each user, to thereby enable the plurality of users to enjoy a common game in the same virtual space. Each computer 200 may communicate to/from another computer 200 with the signal that is based on the motion of each user without intervention of the server 600.
The external device 700 is any suitable device as long as the external device 700 is capable of communicating to/from the computer 200. The external device 700 is, for example, a device capable of communicating to/from the computer 200 via the network 2, or is a device capable of directly communicating to/from the computer 200 by near field communication or wired communication. Peripheral devices such as a smart device, a personal computer (PC), or the computer 200 are usable as the external device 700, in at least one embodiment, but the external device 700 is not limited thereto.
[Hardware Configuration of Computer]
With reference to FIG. 2, the computer 200 in at least one embodiment is described. FIG. 2 is a block diagram of a hardware configuration of the computer 200 according to at least one embodiment. The computer 200 includes a processor 210, a memory 220, a storage 230, an input/output interface 240, and a communication interface 250. Each component is connected to a bus 260. In at least one embodiment, at least one of the processor 210, the memory 220, the storage 230, the input/output interface 240, or the communication interface 250 is part of a separate structure and communicates with other components of the computer 200 through a communication path other than the bus 260.
The processor 210 executes a series of commands included in a program stored in the memory 220 or the storage 230 based on a signal transmitted to the computer 200 or in response to a condition determined in advance. In at least one aspect, the processor 210 is implemented as a central processing unit (CPU), a graphics processing unit (GPU), a micro-processor unit (MPU), a field-programmable gate array (FPGA), or other devices.
The memory 220 temporarily stores programs and data. The programs are loaded from, for example, the storage 230. The data includes data input to the computer 200 and data generated by the processor 210. In at least one aspect, the memory 220 is implemented as a random access memory (RAM) or other volatile memories.
The storage 230 permanently stores programs and data. In at least one embodiment, the storage 230 stores programs and data for a period of time longer than the memory 220, but not permanently. The storage 230 is implemented as, for example, a read-only memory (ROM), a hard disk device, a flash memory, or other non-volatile storage devices. The programs stored in the storage 230 include programs for providing a virtual space in the system 100, simulation programs, game programs, user authentication programs, and programs for implementing communication to/from other computers 200. The data stored in the storage 230 includes data and objects for defining the virtual space.
In at least one aspect, the storage 230 is implemented as a removable storage device like a memory card. In at least one aspect, a configuration that uses programs and data stored in an external storage device is used instead of the storage 230 built into the computer 200. With such a configuration, for example, in a situation in which a plurality of HMD systems 100 are used, for example in an amusement facility, the programs and the data are collectively updated.
The input/output interface 240 allows communication of signals among the HMD 120, the HMD sensor 410, the motion sensor 420, and the display 430. The monitor 130, the eye gaze sensor 140, the first camera 150, the second camera 160, the microphone 170, and the speaker 180 included in the HMD 120 may communicate to/from the computer 200 via the input/output interface 240 of the HMD 120. In at least one aspect, the input/output interface 240 is implemented with use of a universal serial bus (USB), a digital visual interface (DVI), a high-definition multimedia interface (HDMI) (trademark), or other terminals. The input/output interface 240 is not limited to the specific examples described above.
In at least one aspect, the input/output interface 240 further communicates to/from the controller 300. For example, the input/output interface 240 receives input of a signal output from the controller 300 and the motion sensor 420. In at least one aspect, the input/output interface 240 transmits a command output from the processor 210 to the controller 300. The command instructs the controller 300 to, for example, vibrate, output a sound, or emit light. When the controller 300 receives the command, the controller 300 executes any one of vibration, sound output, and light emission in accordance with the command.
The communication interface 250 is connected to the network 2 to communicate to/from other computers (e.g., server 600) connected to the network 2. In at least one aspect, the communication interface 250 is implemented as, for example, a local area network (LAN), other wired communication interfaces, wireless fidelity (Wi-Fi), Bluetooth®, near field communication (NFC), or other wireless communication interfaces. The communication interface 250 is not limited to the specific examples described above.
In at least one aspect, the processor 210 accesses the storage 230 and loads one or more programs stored in the storage 230 to the memory 220 to execute a series of commands included in the program. In at least one embodiment, the one or more programs include an operating system of the computer 200, an application program for providing a virtual space, and/or game software that is executable in the virtual space. The processor 210 transmits a signal for providing a virtual space to the HMD 120 via the input/output interface 240. The HMD 120 displays a video on the monitor 130 based on the signal.
In FIG. 2, the computer 200 is outside of the HMD 120, but in at least one aspect, the computer 200 is integral with the HMD 120. As an example, a portable information communication terminal (e.g., smartphone) including the monitor 130 functions as the computer 200 in at least one embodiment.
In at least one embodiment, the computer 200 is used in common with a plurality of HMDs 120. With such a configuration, for example, the computer 200 is able to provide the same virtual space to a plurality of users, and hence each user can enjoy the same application with other users in the same virtual space.
According to at least one embodiment of this disclosure, in the system 100, a real coordinate system is set in advance. The real coordinate system is a coordinate system in the real space. The real coordinate system has three reference directions (axes) that are respectively parallel to a vertical direction, a horizontal direction orthogonal to the vertical direction, and a front-rear direction orthogonal to both of the vertical direction and the horizontal direction in the real space. The horizontal direction, the vertical direction (up-down direction), and the front-rear direction in the real coordinate system are defined as an x axis, a y axis, and a z axis, respectively. More specifically, the x axis of the real coordinate system is parallel to the horizontal direction of the real space, the y axis thereof is parallel to the vertical direction of the real space, and the z axis thereof is parallel to the front-rear direction of the real space.
In at least one aspect, the HMD sensor 410 includes an infrared sensor. When the infrared sensor detects the infrared ray emitted from each light source of the HMD 120, the infrared sensor detects the presence of the HMD 120. The HMD sensor 410 further detects the position and the inclination (direction) of the HMD 120 in the real space, which corresponds to the motion of the user 5 wearing the HMD 120, based on the value of each point (each coordinate value in the real coordinate system). In more detail, the HMD sensor 410 is able to detect the temporal change of the position and the inclination of the HMD 120 with use of each value detected over time.
Each inclination of the HMD 120 detected by the HMD sensor 410 corresponds to an inclination about each of the three axes of the HMD 120 in the real coordinate system. The HMD sensor 410 sets a uvw visual-field coordinate system to the HMD 120 based on the inclination of the HMD 120 in the real coordinate system. The uvw visual-field coordinate system set to the HMD 120 corresponds to a point-of-view coordinate system used when the user 5 wearing the HMD 120 views an object in the virtual space.
[Uvw Visual-field Coordinate System]
With reference to FIG. 3, the uvw visual-field coordinate system is described. FIG. 3 is a diagram of a uvw visual-field coordinate system to be set for the HMD 120 according to at least one embodiment of this disclosure. The HMD sensor 410 detects the position and the inclination of the HMD 120 in the real coordinate system when the HMD 120 is activated. The processor 210 sets the uvw visual-field coordinate system to the HMD 120 based on the detected values.
In FIG. 3, the HMD 120 sets the three-dimensional uvw visual-field coordinate system defining the head of the user 5 wearing the HMD 120 as a center (origin). More specifically, the HMD 120 sets three directions newly obtained by inclining the horizontal direction, the vertical direction, and the front-rear direction (x axis, y axis, and z axis), which define the real coordinate system, about the respective axes by the inclinations about the respective axes of the HMD 120 in the real coordinate system, as a pitch axis (u axis), a yaw axis (v axis), and a roll axis (w axis) of the uvw visual-field coordinate system in the HMD 120.
In at least one aspect, when the user 5 wearing the HMD 120 is standing (or sitting) upright and is visually recognizing the front side, the processor 210 sets the uvw visual-field coordinate system that is parallel to the real coordinate system to the HMD 120. In this case, the horizontal direction (x axis), the vertical direction (y axis), and the front-rear direction (z axis) of the real coordinate system directly match the pitch axis (u axis), the yaw axis (v axis), and the roll axis (w axis) of the uvw visual-field coordinate system in the HMD 120, respectively.
After the uvw visual-field coordinate system is set to the HMD 120, the HMD sensor 410 is able to detect the inclination of the HMD 120 in the set uvw visual-field coordinate system based on the motion of the HMD 120. In this case, the HMD sensor 410 detects, as the inclination of the HMD 120, each of a pitch angle (θu), a yaw angle (θv), and a roll angle (θw) of the HMD 120 in the uvw visual-field coordinate system. The pitch angle (θu) represents an inclination angle of the HMD 120 about the pitch axis in the uvw visual-field coordinate system. The yaw angle (θv) represents an inclination angle of the HMD 120 about the yaw axis in the uvw visual-field coordinate system. The roll angle (θw) represents an inclination angle of the HMD 120 about the roll axis in the uvw visual-field coordinate system.
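To illustrate how the pitch, yaw, and roll angles can yield the inclined u, v, and w axes, the following is a minimal Python sketch; the rotation order and the helper names (rot_x, rot_y, rot_z, uvw_axes) are assumptions for illustration and are not specified in the patent.

```python
# Minimal sketch (hypothetical helper names): deriving the uvw visual-field
# axes by rotating the real-coordinate x, y, z axes by the detected pitch (θu),
# yaw (θv), and roll (θw) angles. The rotation order used here is an assumption.
import math

def rot_x(t):  # rotation about the pitch (u) axis
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def rot_y(t):  # rotation about the yaw (v) axis
    c, s = math.cos(t), math.sin(t)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]

def rot_z(t):  # rotation about the roll (w) axis
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def uvw_axes(theta_u, theta_v, theta_w):
    """Columns of the combined rotation are the u, v, w axes in real coordinates."""
    r = mat_mul(mat_mul(rot_y(theta_v), rot_x(theta_u)), rot_z(theta_w))
    u = [row[0] for row in r]
    v = [row[1] for row in r]
    w = [row[2] for row in r]
    return u, v, w

u_axis, v_axis, w_axis = uvw_axes(math.radians(5), math.radians(30), math.radians(0))
```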
The HMD sensor 410 sets, to the HMD 120, the uvw visual-field coordinate system of the HMD 120 obtained after the movement of the HMD 120 based on the detected inclination angle of the HMD 120. The relationship between the HMD 120 and the uvw visual-field coordinate system of the HMD 120 is constant regardless of the position and the inclination of the HMD 120. When the position and the inclination of the HMD 120 change, the position and the inclination of the uvw visual-field coordinate system of the HMD 120 in the real coordinate system change in synchronization with the change of the position and the inclination.
In at least one aspect, the HMD sensor 410 identifies the position of the HMD 120 in the real space as a position relative to the HMD sensor 410 based on the light intensity of the infrared ray or a relative positional relationship between a plurality of points (e.g., distance between points), which is acquired based on output from the infrared sensor. In at least one aspect, the processor 210 determines the origin of the uvw visual-field coordinate system of the HMD 120 in the real space (real coordinate system) based on the identified relative position.
[Virtual Space]
With reference to FIG. 4, the virtual space is further described. FIG. 4 is a diagram of a mode of expressing a virtual space 11 according to at least one embodiment of this disclosure. The virtual space 11 has a structure with an entire celestial sphere shape covering a center 12 in all 360-degree directions. In FIG. 4, for the sake of clarity, only the upper-half celestial sphere of the virtual space 11 is included. Each mesh section is defined in the virtual space 11. The position of each mesh section is defined in advance as coordinate values in an XYZ coordinate system, which is a global coordinate system defined in the virtual space 11. The computer 200 associates each partial image forming a panorama image 13 (e.g., still image or moving image) that is developed in the virtual space 11 with each corresponding mesh section in the virtual space 11.
In at least one aspect, in the virtual space 11, the XYZ coordinate system having the center 12 as the origin is defined. The XYZ coordinate system is, for example, parallel to the real coordinate system. The horizontal direction, the vertical direction (up-down direction), and the front-rear direction of the XYZ coordinate system are defined as an X axis, a Y axis, and a Z axis, respectively. Thus, the X axis (horizontal direction) of the XYZ coordinate system is parallel to the x axis of the real coordinate system, the Y axis (vertical direction) of the XYZ coordinate system is parallel to the y axis of the real coordinate system, and the Z axis (front-rear direction) of the XYZ coordinate system is parallel to the z axis of the real coordinate system.
When the HMD 120 is activated, that is, when the HMD 120 is in an initial state, a virtual camera 14 is arranged at the center 12 of the virtual space 11. In at least one embodiment, the virtual camera 14 is offset from the center 12 in the initial state. In at least one aspect, the processor 210 displays on the monitor 130 of the HMD 120 an image photographed by the virtual camera 14. In synchronization with the motion of the HMD 120 in the real space, the virtual camera 14 similarly moves in the virtual space 11. With this, the change in position and direction of the HMD 120 in the real space is reproduced similarly in the virtual space 11.
The uvw visual-field coordinate system is defined in the virtual camera 14 similarly to the case of the HMD 120. The uvw visual-field coordinate system of the virtual camera 14 in the virtual space 11 is defined to be synchronized with the uvw visual-field coordinate system of the HMD 120 in the real space (real coordinate system). Therefore, when the inclination of the HMD 120 changes, the inclination of the virtual camera 14 also changes in synchronization therewith. The virtual camera 14 can also move in the virtual space 11 in synchronization with the movement of the user 5 wearing the HMD 120 in the real space.
The processor 210 of the computer 200 defines a field-of-view region 15 in the virtual space 11 based on the position and inclination (reference line of sight 16) of the virtual camera 14. The field-of-view region 15 corresponds to, of the virtual space 11, the region that is visually recognized by the user 5 wearing the HMD 120. That is, the position of the virtual camera 14 determines a point of view of the user 5 in the virtual space 11.
The line of sight of the user 5 detected by the eye gaze sensor 140 is a direction in the point-of-view coordinate system obtained when the user 5 visually recognizes an object. The uvw visual-field coordinate system of the HMD 120 is equal to the point-of-view coordinate system used when the user 5 visually recognizes the monitor 130. The uvw visual-field coordinate system of the virtual camera 14 is synchronized with the uvw visual-field coordinate system of the HMD 120. Therefore, in the system 100 in at least one aspect, the line of sight of the user 5 detected by the eye gaze sensor 140 can be regarded as the line of sight of the user 5 in the uvw visual-field coordinate system of the virtual camera 14.
[User's Line of Sight]
With reference to FIG. 5, determination of the line of sight of the user 5 is described. FIG. 5 is a plan view diagram of the head of the user 5 wearing the HMD 120 according to at least one embodiment of this disclosure.
In at least one aspect, the eye gaze sensor 140 detects lines of sight of the right eye and the left eye of the user 5. In at least one aspect, when the user 5 is looking at a near place, the eye gaze sensor 140 detects lines of sight R1 and L1. In at least one aspect, when the user 5 is looking at a far place, the eye gaze sensor 140 detects lines of sight R2 and L2. In this case, the angles formed by the lines of sight R2 and L2 with respect to the roll axis w are smaller than the angles formed by the lines of sight R1 and L1 with respect to the roll axis w. The eye gaze sensor 140 transmits the detection results to the computer 200.
When the computer 200 receives the detection values of the lines of sight R1 and L1 from the eye gaze sensor 140 as the detection results of the lines of sight, the computer 200 identifies a point of gaze N1 being an intersection of both the lines of sight R1 and L1 based on the detection values. Meanwhile, when the computer 200 receives the detection values of the lines of sight R2 and L2 from the eye gaze sensor 140, the computer 200 identifies an intersection of both the lines of sight R2 and L2 as the point of gaze. The computer 200 identifies a line of sight N0 of the user 5 based on the identified point of gaze N1. The computer 200 detects, for example, an extension direction of a straight line that passes through the point of gaze N1 and a midpoint of a straight line connecting a right eye R and a left eye L of the user 5 to each other as the line of sight N0. The line of sight N0 is a direction in which the user 5 actually directs his or her lines of sight with both eyes. The line of sight N0 corresponds to a direction in which the user 5 actually directs his or her lines of sight with respect to the field-of-view region 15.
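The geometric construction of the line of sight N0 can be illustrated with a short Python sketch; the function name line_of_sight_n0 and the coordinate values in the usage example are hypothetical and are not taken from the patent.

```python
# Minimal sketch (hypothetical names): deriving the line of sight N0 as the
# direction from the midpoint between the two eyes toward the point of gaze N1,
# following the geometric construction described above. Points are 3-D tuples.
import math

def line_of_sight_n0(right_eye, left_eye, gaze_point):
    """Return the unit vector from the eye midpoint toward the point of gaze N1."""
    midpoint = tuple((r + l) / 2.0 for r, l in zip(right_eye, left_eye))
    direction = tuple(g - m for g, m in zip(gaze_point, midpoint))
    norm = math.sqrt(sum(d * d for d in direction))
    return tuple(d / norm for d in direction)

# Example: eyes 6 cm apart, gaze point about 1 m ahead and slightly to the right.
n0 = line_of_sight_n0((0.03, 0.0, 0.0), (-0.03, 0.0, 0.0), (0.1, 0.0, 1.0))
```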
In at least one aspect, the system 100 includes a television broadcast reception tuner. With such a configuration, the system 100 is able to display a television program in the virtual space 11.
In at least one aspect, the HMD system 100 includes a communication circuit for connecting to the Internet or has a verbal communication function for connecting to a telephone line or a cellular service.
[Field-of-view Region]
With reference to FIG. 6 and FIG. 7, the field-of-view region 15 is described. FIG. 6 is a diagram of a YZ cross section obtained by viewing the field-of-view region 15 from an X direction in the virtual space 11. FIG. 7 is a diagram of an XZ cross section obtained by viewing the field-of-view region 15 from a Y direction in the virtual space 11.
In FIG. 6, the field-of-view region 15 in the YZ cross section includes a region 18. The region 18 is defined by the position of the virtual camera 14, the reference line of sight 16, and the YZ cross section of the virtual space 11. The processor 210 defines a range of a polar angle α from the reference line of sight 16 serving as the center in the virtual space as the region 18.
In FIG. 7, the field-of-view region 15 in the XZ cross section includes a region 19. The region 19 is defined by the position of the virtual camera 14, the reference line of sight 16, and the XZ cross section of the virtual space 11. The processor 210 defines a range of an azimuth β from the reference line of sight 16 serving as the center in the virtual space 11 as the region 19. The polar angle α and the azimuth β are determined in accordance with the position of the virtual camera 14 and the inclination (direction) of the virtual camera 14.
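As a rough illustration of how the polar angle α and the azimuth β bound the field-of-view region 15, the following Python sketch tests whether a direction lies inside the region; the function name, the camera-aligned axis convention, and the example angles are assumptions, not values from the patent.

```python
# Minimal sketch (hypothetical names): testing whether a direction in the
# virtual space 11 falls inside the field-of-view region 15, approximated by
# the polar angle α (vertical half-angle) and the azimuth β (horizontal
# half-angle) measured from the reference line of sight 16 (the +Z axis here).
import math

def in_field_of_view(direction, alpha_deg, beta_deg):
    """direction: (x, y, z) from the virtual camera 14 in camera-aligned axes."""
    x, y, z = direction
    if z <= 0:  # behind the reference line of sight 16
        return False
    vertical = math.degrees(math.atan2(abs(y), z))    # angle in the YZ cross section
    horizontal = math.degrees(math.atan2(abs(x), z))  # angle in the XZ cross section
    return vertical <= alpha_deg and horizontal <= beta_deg

print(in_field_of_view((0.2, 0.1, 1.0), alpha_deg=45, beta_deg=55))  # True
```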
In at least one aspect, the system 100 causes the monitor 130 to display a field-of-view image 17 based on the signal from the computer 200, to thereby provide the field of view in the virtual space 11 to the user 5. The field-of-view image 17 corresponds to a part of the panorama image 13, which corresponds to the field-of-view region 15. When the user 5 moves the HMD 120 worn on his or her head, the virtual camera 14 is also moved in synchronization with the movement. As a result, the position of the field-of-view region 15 in the virtual space 11 is changed. With this, the field-of-view image 17 displayed on the monitor 130 is updated to an image of the panorama image 13, which is superimposed on the field-of-view region 15 synchronized with a direction in which the user 5 faces in the virtual space 11. The user 5 can visually recognize a desired direction in the virtual space 11.
In this way, the inclination of the virtual camera 14 corresponds to the line of sight of the user 5 (reference line of sight 16) in the virtual space 11, and the position at which the virtual camera 14 is arranged corresponds to the point of view of the user 5 in the virtual space 11. Therefore, through the change of the position or inclination of the virtual camera 14, the image to be displayed on the monitor 130 is updated, and the field of view of the user 5 is moved.
While the user 5 is wearing the HMD 120 (having a non-transmissive monitor 130), the user 5 can visually recognize only the panorama image 13 developed in the virtual space 11 without visually recognizing the real world. Therefore, the system 100 provides a high sense of immersion in the virtual space 11 to the user 5.
In at least one aspect, the processor 210 moves the virtual camera 14 in the virtual space 11 in synchronization with the movement in the real space of the user 5 wearing the HMD 120. In this case, the processor 210 identifies an image region to be projected on the monitor 130 of the HMD 120 (field-of-view region 15) based on the position and the direction of the virtual camera 14 in the virtual space 11.
In at least one aspect, the virtual camera 14 includes two virtual cameras, that is, a virtual camera for providing a right-eye image and a virtual camera for providing a left-eye image. An appropriate parallax is set for the two virtual cameras so that the user 5 is able to recognize the three-dimensional virtual space 11. In at least one aspect, the virtual camera 14 is implemented by a single virtual camera. In this case, a right-eye image and a left-eye image may be generated from an image acquired by the single virtual camera. In at least one embodiment, the virtual camera 14 is assumed to include two virtual cameras, and the roll axes of the two virtual cameras are synthesized so that the generated roll axis (w) is adapted to the roll axis (w) of the HMD 120.
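One simple way to set the parallax between the two virtual cameras is to offset them along the pitch (u) axis by half an interpupillary distance, as in the Python sketch below; the function name and the 0.064 m default are illustrative assumptions rather than parameters stated in the patent.

```python
# Minimal sketch (hypothetical names): placing the right-eye and left-eye
# virtual cameras by offsetting a shared camera position along the pitch (u)
# axis by half an interpupillary distance, so that the pair produces parallax.

def stereo_camera_positions(center, u_axis, ipd=0.064):
    """center: shared camera position; u_axis: unit pitch axis of the camera."""
    half = ipd / 2.0
    right = tuple(c + half * u for c, u in zip(center, u_axis))
    left = tuple(c - half * u for c, u in zip(center, u_axis))
    return right, left

right_cam, left_cam = stereo_camera_positions((0.0, 1.6, 0.0), (1.0, 0.0, 0.0))
```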
[Controller]
An example of the controller 300 is described with reference to FIG. 8A and FIG. 8B. FIG. 8A is a diagram of a schematic configuration of a controller according to at least one embodiment of this disclosure. FIG. 8B is a diagram of a coordinate system to be set for a hand of a user holding the controller according to at least one embodiment of this disclosure.
In at least one aspect, the controller 300 includes a right controller 300R and a left controller (not shown). In FIG. 8A, only the right controller 300R is shown for the sake of clarity. The right controller 300R is operable by the right hand of the user 5. The left controller is operable by the left hand of the user 5. In at least one aspect, the right controller 300R and the left controller are symmetrically configured as separate devices. Therefore, the user 5 can freely move his or her right hand holding the right controller 300R and his or her left hand holding the left controller. In at least one aspect, the controller 300 may be an integrated controller configured to receive an operation performed by both the right and left hands of the user 5. The right controller 300R is now described.
The right controller 300R includes a grip 310, a frame 320, and a top surface 330. The grip 310 is configured so as to be held by the right hand of the user 5. For example, the grip 310 may be held by the palm and three fingers (e.g., middle finger, ring finger, and small finger) of the right hand of the user 5.
The grip 310 includes buttons 340 and 350 and the motion sensor 420. The button 340 is arranged on a side surface of the grip 310, and receives an operation performed by, for example, the middle finger of the right hand. The button 350 is arranged on a front surface of the grip 310, and receives an operation performed by, for example, the index finger of the right hand. In at least one aspect, the buttons 340 and 350 are configured as trigger type buttons. The motion sensor 420 is built into the casing of the grip 310. When a motion of the user 5 can be detected from the surroundings of the user 5 by a camera or other device, in at least one embodiment, the grip 310 does not include the motion sensor 420.
The frame 320 includes a plurality of infrared LEDs 360 arranged in a circumferential direction of the frame 320. The infrared LEDs 360 emit, during execution of a program using the controller 300, infrared rays in accordance with progress of the program. The infrared rays emitted from the infrared LEDs 360 are usable to independently detect the position and the posture (inclination and direction) of each of the right controller 300R and the left controller. In FIG. 8A, the infrared LEDs 360 are shown as being arranged in two rows, but the number of arrangement rows is not limited to that illustrated in FIG. 8. In at least one embodiment, the infrared LEDs 360 are arranged in one row or in three or more rows. In at least one embodiment, the infrared LEDs 360 are arranged in a pattern other than rows.
The top surface 330 includes buttons 370 and 380 and an analog stick 390. The buttons 370 and 380 are configured as push type buttons. The buttons 370 and 380 receive an operation performed by the thumb of the right hand of the user 5. In at least one aspect, the analog stick 390 receives an operation performed in any direction of 360 degrees from an initial position (neutral position). The operation includes, for example, an operation for moving an object arranged in the virtual space 11.
In at least one aspect, each of the right controller 300R and the left controller includes a battery for driving the infrared LEDs 360 and other members. The battery includes, for example, a rechargeable battery, a button battery, or a dry battery, but the battery is not limited thereto. In at least one aspect, the right controller 300R and the left controller are connectable to, for example, a USB interface of the computer 200. In at least one embodiment, the right controller 300R and the left controller do not include a battery.
In FIG. 8A and FIG. 8B, for example, a yaw direction, a roll direction, and a pitch direction are defined with respect to the right hand of the user 5. A direction of an extended thumb is defined as the yaw direction, a direction of an extended index finger is defined as the roll direction, and a direction perpendicular to a plane defined by the yaw-direction axis and the roll-direction axis when the user 5 extends his or her thumb and index finger is defined as the pitch direction.
[Hardware Configuration of Server]
With reference to FIG. 9, the server 600 in at least one embodiment is described. FIG. 9 is a block diagram of a hardware configuration of the server 600 according to at least one embodiment of this disclosure. The server 600 includes a processor 610, a memory 620, a storage 630, an input/output interface 640, and a communication interface 650. Each component is connected to a bus 660. In at least one embodiment, at least one of the processor 610, the memory 620, the storage 630, the input/output interface 640, or the communication interface 650 is part of a separate structure and communicates with other components of the server 600 through a communication path other than the bus 660.
The processor 610 executes a series of commands included in a program stored in the memory 620 or the storage 630 based on a signal transmitted to the server 600 or on satisfaction of a condition determined in advance. In at least one aspect, the processor 610 is implemented as a central processing unit (CPU), a graphics processing unit (GPU), a micro processing unit (MPU), a field-programmable gate array (FPGA), or other devices.
The memory 620 temporarily stores programs and data. The programs are loaded from, for example, the storage 630. The data includes data input to the server 600 and data generated by the processor 610. In at least one aspect, the memory 620 is implemented as a random access memory (RAM) or other volatile memories.
The storage 630 permanently stores programs and data. In at least one embodiment, the storage 630 stores programs and data for a period of time longer than the memory 620, but not permanently. The storage 630 is implemented as, for example, a read-only memory (ROM), a hard disk device, a flash memory, or other non-volatile storage devices. The programs stored in the storage 630 include programs for providing a virtual space in the system 100, simulation programs, game programs, user authentication programs, and programs for implementing communication to/from other computers 200 or servers 600. The data stored in the storage 630 may include, for example, data and objects for defining the virtual space.
In at least one aspect, the storage 630 is implemented as a removable storage device like a memory card. In at least one aspect, a configuration that uses programs and data stored in an external storage device is used instead of the storage 630 built into the server 600. With such a configuration, for example, in a situation in which a plurality of HMD systems 100 are used, for example, as in an amusement facility, the programs and the data are collectively updated.
The input/output interface 640 allows communication of signals to/from an input/output device. In at least one aspect, the input/output interface 640 is implemented with use of a USB, a DVI, an HDMI, or other terminals. The input/output interface 640 is not limited to the specific examples described above.
The communication interface 650 is connected to the network 2 to communicate to/from the computer 200 connected to the network 2. In at least one aspect, the communication interface 650 is implemented as, for example, a LAN, other wired communication interfaces, Wi-Fi, Bluetooth, NFC, or other wireless communication interfaces. The communication interface 650 is not limited to the specific examples described above.
In at least one aspect, the processor 610 accesses the storage 630 and loads one or more programs stored in the storage 630 to the memory 620 to execute a series of commands included in the program. In at least one embodiment, the one or more programs include, for example, an operating system of the server 600, an application program for providing a virtual space, and game software that can be executed in the virtual space. In at least one embodiment, the processor 610 transmits a signal for providing a virtual space to the HMD device 110 to the computer 200 via the input/output interface 640.
[Control Device of HMD]
With reference to FIG. 10, the control device of the HMD 120 is described. According to at least one embodiment of this disclosure, the control device is implemented by the computer 200 having a known configuration. FIG. 10 is a block diagram of the computer 200 according to at least one embodiment of this disclosure. FIG. 10 includes a module configuration of the computer 200.
In FIG. 10, the computer 200 includes a control module 510, a rendering module 520, a memory module 530, and a communication control module 540. In at least one aspect, the control module 510 and the rendering module 520 are implemented by the processor 210. In at least one aspect, a plurality of processors 210 function as the control module 510 and the rendering module 520. The memory module 530 is implemented by the memory 220 or the storage 230. The communication control module 540 is implemented by the communication interface 250.
The control module 510 controls the virtual space 11 provided to the user 5. The control module 510 defines the virtual space 11 in the HMD system 100 using virtual space data representing the virtual space 11. The virtual space data is stored in, for example, the memory module 530. In at least one embodiment, the control module 510 generates virtual space data. In at least one embodiment, the control module 510 acquires virtual space data from, for example, the server 600.
The control module 510 arranges objects in the virtual space 11 using object data representing objects. The object data is stored in, for example, the memory module 530. In at least one embodiment, the control module 510 generates virtual space data. In at least one embodiment, the control module 510 acquires virtual space data from, for example, the server 600. In at least one embodiment, the objects include, for example, an avatar object of the user 5, character objects, operation objects, for example, a virtual hand to be operated by the controller 300, and forests, mountains, other landscapes, streetscapes, or animals to be arranged in accordance with the progression of the story of the game.
The control module 510 arranges an avatar object of the user 5 of another computer 200, which is connected via the network 2, in the virtual space 11. In at least one aspect, the control module 510 arranges an avatar object of the user 5 in the virtual space 11. In at least one aspect, the control module 510 arranges an avatar object simulating the user 5 in the virtual space 11 based on an image including the user 5. In at least one aspect, the control module 510 arranges an avatar object in the virtual space 11, which is selected by the user 5 from among a plurality of types of avatar objects (e.g., objects simulating animals or objects of deformed humans).
The control module 510 identifies an inclination of the HMD 120 based on output of the HMD sensor 410. In at least one aspect, the control module 510 identifies an inclination of the HMD 120 based on output of the sensor 190 functioning as a motion sensor. The control module 510 detects parts (e.g., mouth, eyes, and eyebrows) forming the face of the user 5 from a face image of the user 5 generated by the first camera 150 and the second camera 160. The control module 510 detects a motion (shape) of each detected part.
The control module 510 detects a line of sight of the user 5 in the virtual space 11 based on a signal from the eye gaze sensor 140. The control module 510 detects a point-of-view position (coordinate values in the XYZ coordinate system) at which the detected line of sight of the user 5 and the celestial sphere of the virtual space 11 intersect with each other. More specifically, the control module 510 detects the point-of-view position based on the line of sight of the user 5 defined in the uvw coordinate system and the position and the inclination of the virtual camera 14. The control module 510 transmits the detected point-of-view position to the server 600. In at least one aspect, the control module 510 is configured to transmit line-of-sight information representing the line of sight of the user 5 to the server 600. In such a case, the control module 510 may calculate the point-of-view position based on the line-of-sight information received by the server 600.
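Finding the point-of-view position amounts to intersecting the gaze ray with the celestial sphere of the virtual space 11, as in the minimal Python sketch below; the function name and the example coordinates are hypothetical and not taken from the patent.

```python
# Minimal sketch (hypothetical names): finding the point-of-view position as the
# intersection of the user 5's line of sight, cast from the virtual camera 14,
# with the celestial sphere of the virtual space 11 centered at the center 12.
import math

def point_of_view_position(camera_pos, sight_dir, sphere_center, sphere_radius):
    """Ray-sphere intersection; sight_dir is assumed to be a unit vector."""
    oc = [c - s for c, s in zip(camera_pos, sphere_center)]
    b = 2.0 * sum(o * d for o, d in zip(oc, sight_dir))
    c = sum(o * o for o in oc) - sphere_radius ** 2
    disc = b * b - 4.0 * c
    if disc < 0:
        return None  # no intersection (should not occur from inside the sphere)
    t = (-b + math.sqrt(disc)) / 2.0  # far intersection along the gaze direction
    return tuple(p + t * d for p, d in zip(camera_pos, sight_dir))

pov = point_of_view_position((0, 0, 0), (0, 0, 1), (0, 0, 0), 100.0)  # (0, 0, 100)
```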
The control module 510 translates a motion of the HMD 120, which is detected by the HMD sensor 410, in an avatar object. For example, the control module 510 detects inclination of the HMD 120, and arranges the avatar object in an inclined manner. The control module 510 translates the detected motion of face parts in a face of the avatar object arranged in the virtual space 11. The control module 510 receives line-of-sight information of another user 5 from the server 600, and translates the line-of-sight information in the line of sight of the avatar object of another user 5. In at least one aspect, the control module 510 translates a motion of the controller 300 in an avatar object and an operation object. In this case, the controller 300 includes, for example, a motion sensor, an acceleration sensor, or a plurality of light emitting elements (e.g., infrared LEDs) for detecting a motion of the controller 300.
The control module 510 arranges, in the virtual space 11, an operation object for receiving an operation by the user 5 in the virtual space 11. The user 5 operates the operation object to, for example, operate an object arranged in the virtual space 11. In at least one aspect, the operation object includes, for example, a hand object serving as a virtual hand corresponding to a hand of the user 5. In at least one aspect, the control module 510 moves the hand object in the virtual space 11 so that the hand object moves in association with a motion of the hand of the user 5 in the real space based on output of the motion sensor 420. In at least one aspect, the operation object may correspond to a hand part of an avatar object.
When one object arranged in the virtual space 11 collides with another object, the control module 510 detects the collision. The control module 510 is able to detect, for example, a timing at which a collision area of one object and a collision area of another object have touched each other, and performs predetermined processing in response to the detected timing. In at least one embodiment, the control module 510 detects a timing at which an object and another object, which have been in contact with each other, have moved away from each other, and performs predetermined processing in response to the detected timing. In at least one embodiment, the control module 510 detects a state in which an object and another object are in contact with each other. For example, when an operation object touches another object, the control module 510 detects the fact that the operation object has touched the other object, and performs predetermined processing.
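The timing-based collision handling above can be illustrated with spherical collision areas and a small state machine that reports touch and release transitions; the class and function names below are hypothetical, and spherical collision areas are an assumption made for brevity.

```python
# Minimal sketch (hypothetical names): detecting the moments at which two
# spherical collision areas touch or separate, mirroring the timing-based
# collision handling described above.
import math

def spheres_touch(center_a, radius_a, center_b, radius_b):
    return math.dist(center_a, center_b) <= radius_a + radius_b

class CollisionWatcher:
    def __init__(self):
        self.was_touching = False

    def update(self, touching_now):
        """Return 'touch', 'release', or None depending on the transition."""
        event = None
        if touching_now and not self.was_touching:
            event = "touch"      # timing at which the collision areas touched
        elif not touching_now and self.was_touching:
            event = "release"    # timing at which the objects moved away
        self.was_touching = touching_now
        return event

watcher = CollisionWatcher()
event = watcher.update(spheres_touch((0, 0, 0), 0.5, (0.7, 0, 0), 0.3))  # "touch"
```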
In at least one aspect, the control module 510 controls image display of the HMD 120 on the monitor 130. For example, the control module 510 arranges the virtual camera 14 in the virtual space 11. The control module 510 controls the position of the virtual camera 14 and the inclination (direction) of the virtual camera 14 in the virtual space 11. The control module 510 defines the field-of-view region 15 depending on an inclination of the head of the user 5 wearing the HMD 120 and the position of the virtual camera 14. The rendering module 520 generates the field-of-view image 17 to be displayed on the monitor 130 based on the determined field-of-view region 15. The communication control module 540 outputs the field-of-view image 17 generated by the rendering module 520 to the HMD 120.
The control module 510, which has detected an utterance of the user 5 using the microphone 170 from the HMD 120, identifies the computer 200 to which voice data corresponding to the utterance is to be transmitted. The voice data is transmitted to the computer 200 identified by the control module 510. The control module 510, which has received voice data from the computer 200 of another user via the network 2, outputs audio information (utterances) corresponding to the voice data from the speaker 180.
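This voice routing can be sketched as a small relay class; the class name VoiceRouter and the injected network and speaker objects (with send and play methods) are assumptions made for illustration, not interfaces defined by the patent.

```python
# Minimal sketch (hypothetical names): routing voice data captured by the
# microphone 170 to the computer of the intended listener, and playing back
# voice data received from another user's computer, as outlined above.

class VoiceRouter:
    def __init__(self, network, speaker):
        self.network = network   # assumed object with a send(address, payload) method
        self.speaker = speaker   # assumed object with a play(audio_bytes) method
        self.peers = {}          # user id -> network address of that user's computer

    def register_peer(self, user_id, address):
        self.peers[user_id] = address

    def on_utterance(self, target_user_id, voice_data):
        # Identify the computer 200 to which the voice data is to be transmitted.
        address = self.peers.get(target_user_id)
        if address is not None:
            self.network.send(address, voice_data)

    def on_voice_received(self, voice_data):
        # Output the received utterance from the speaker 180.
        self.speaker.play(voice_data)
```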
The memory module 530 holds data to be used to provide the virtual space 11 to the user 5 by the computer 200. In at least one aspect, the memory module 530 stores space information, object information, and user information.
The space information stores one or more templates defined to provide the virtual space 11.
The object information stores a plurality of panorama images 13 forming the virtual space 11 and object data for arranging objects in the virtual space 11. In at least one embodiment, the panorama image 13 contains a still image and/or a moving image. In at least one embodiment, the panorama image 13 contains an image in a non-real space and/or an image in the real space. An example of the image in a non-real space is an image generated by computer graphics.
The user information stores a user ID for identifying the user 5. The user ID is, for example, an internet protocol (IP) address or a media access control (MAC) address set to the computer 200 used by the user. In at least one aspect, the user ID is set by the user. The user information stores, for example, a program for causing the computer 200 to function as the control device of the HMD system 100.
The data and programs stored in the memory module 530 are input by the user 5 of the HMD 120. Alternatively, the processor 210 downloads the programs or data from a computer (e.g., server 600) that is managed by a business operator providing the content, and stores the downloaded programs or data in the memory module 530.
In at least one embodiment, the communication control module 540 communicates to/from the server 600 or other information communication devices via the network 2.
In at least one aspect, the control module 510 and the rendering module 520 are implemented with use of, for example, Unity® provided by Unity Technologies. In at least one aspect, the control module 510 and the rendering module 520 are implemented by combining the circuit elements for implementing each step of processing.
The processing performed in the computer 200 is implemented by hardware and software executed by the processor 210. In at least one embodiment, the software is stored in advance on a hard disk or other memory module 530. In at least one embodiment, the software is stored on a CD-ROM or other computer-readable non-volatile data recording media, and distributed as a program product. In at least one embodiment, the software is provided as a program product that is downloadable by an information provider connected to the Internet or other networks. Such software is read from the data recording medium by an optical disc drive device or other data reading devices, or is downloaded from the server 600 or other computers via the communication control module 540 and then temporarily stored in a storage module. The software is read from the storage module by the processor 210, and is stored in a RAM in a format of an executable program. The processor 210 executes the program.
[Control Structure of HMD System]
With reference toFIG. 11, the control structure of the HMD set110is described.FIG. 11is a sequence chart of processing to be executed by the system100according to at least one embodiment of this disclosure.
InFIG. 11, in Step S1110, the processor210of the computer200serves as the control module510to identify virtual space data and define the virtual space11.
In Step S1120, the processor210initializes the virtual camera14. For example, in a work area of the memory, the processor210arranges the virtual camera14at the center12defined in advance in the virtual space11, and matches the line of sight of the virtual camera14with the direction in which the user5faces.
In Step S1130, the processor210serves as the rendering module520to generate field-of-view image data for displaying an initial field-of-view image. The generated field-of-view image data is output to the HMD120by the communication control module540.
In Step S1132, the monitor130of the HMD120displays the field-of-view image based on the field-of-view image data received from the computer200. The user5wearing the HMD120is able to recognize the virtual space11through visual recognition of the field-of-view image.
In Step S1134, the HMD sensor410detects the position and the inclination of the HMD120based on a plurality of infrared rays emitted from the HMD120. The detection results are output to the computer200as motion detection data.
In Step S1140, the processor210identifies a field-of-view direction of the user5wearing the HMD120based on the position and inclination contained in the motion detection data of the HMD120.
In Step S1150, the processor210executes an application program, and arranges an object in the virtual space11based on a command contained in the application program.
In Step S1160, the controller300detects an operation by the user5based on a signal output from the motion sensor420, and outputs detection data representing the detected operation to the computer200. In at least one aspect, an operation of the controller300by the user5is detected based on an image from a camera arranged around the user5.
In Step S1170, the processor210detects an operation of the controller300by the user5based on the detection data acquired from the controller300.
In Step S1180, the processor210generates field-of-view image data based on the operation of the controller300by the user5. The communication control module540outputs the generated field-of-view image data to the HMD120.
In Step S1190, the HMD120updates a field-of-view image based on the received field-of-view image data, and displays the updated field-of-view image on the monitor130.
[Avatar Object]
With reference toFIG. 12AandFIG. 12B, an avatar object according to at least one embodiment is described.FIG. 12AandFIG. 12Bare diagrams of avatar objects of respective users5of the HMD sets110A and110B. In the following, the user of the HMD set110A, the user of the HMD set110B, the user of the HMD set110C, and the user of the HMD set110D are referred to as “user5A”, “user5B”, “user5C”, and “user5D”, respectively. A reference numeral of each component related to the HMD set110A, a reference numeral of each component related to the HMD set110B, a reference numeral of each component related to the HMD set110C, and a reference numeral of each component related to the HMD set110D are appended by A, B, C, and D, respectively. For example, the HMD120A is included in the HMD set110A.
FIG. 12Ais a schematic diagram of HMD systems in which several users sharing the virtual space interact via a network according to at least one embodiment of this disclosure. Each HMD120provides the user5with the virtual space11. Computers200A to200D provide the users5A to5D with virtual spaces11A to11D via HMDs120A to120D, respectively. InFIG. 12A, the virtual space11A and the virtual space11B are formed by the same data. In other words, the computer200A and the computer200B share the same virtual space. An avatar object6A of the user5A and an avatar object6B of the user5B are present in the virtual space11A and the virtual space11B. The avatar object6A in the virtual space11A and the avatar object6B in the virtual space11B each wear the HMD120. However, the inclusion of the HMD120A and HMD120B is only for the sake of simplicity of description, and the avatars do not wear the HMD120A and HMD120B in the virtual spaces11A and11B, respectively.
In at least one aspect, the processor210A arranges a virtual camera14A for photographing a field-of-view region17A of the user5A at the position of eyes of the avatar object6A.
FIG. 12Bis a diagram of a field of view of a HMD according to at least one embodiment of this disclosure.FIG. 12Bcorresponds to the field-of-view region17A of the user5A inFIG. 12A. The field-of-view region17A is an image displayed on a monitor130A of the HMD120A. This field-of-view region17A is an image generated by the virtual camera14A. The avatar object6B of the user5B is displayed in the field-of-view region17A. Although not included inFIG. 12B, the avatar object6A of the user5A is displayed in the field-of-view image of the user5B.
In the arrangement inFIG. 12B, the user5A can communicate to/from the user5B via the virtual space11A through conversation. More specifically, voices of the user5A acquired by a microphone170A are transmitted to the HMD120B of the user5B via the server600and output from a speaker180B provided on the HMD120B. Voices of the user5B are transmitted to the HMD120A of the user5A via the server600, and output from a speaker180A provided on the HMD120A.
The processor210A translates an operation by the user5B (operation of HMD120B and operation of controller300B) in the avatar object6B arranged in the virtual space11A. With this, the user5A is able to recognize the operation by the user5B through the avatar object6B.
FIG. 13is a sequence chart of processing to be executed by the system100according to at least one embodiment of this disclosure. InFIG. 13, although the HMD set110D is not included, the HMD set110D operates in a similar manner as the HMD sets110A,110B, and110C. Also in the following description, a reference numeral of each component related to the HMD set110A, a reference numeral of each component related to the HMD set110B, a reference numeral of each component related to the HMD set110C, and a reference numeral of each component related to the HMD set110D are appended by A, B, C, and D, respectively.
In Step S1310A, the processor210A of the HMD set110A acquires avatar information for determining a motion of the avatar object6A in the virtual space11A. This avatar information contains information on an avatar such as motion information, face tracking data, and sound data. The motion information contains, for example, information on a temporal change in position and inclination of the HMD120A and information on a motion of the hand of the user5A, which is detected by, for example, a motion sensor420A. An example of the face tracking data is data identifying the position and size of each part of the face of the user5A. Another example of the face tracking data is data representing motions of parts forming the face of the user5A and line-of-sight data. An example of the sound data is data representing sounds of the user5A acquired by the microphone170A of the HMD120A. In at least one embodiment, the avatar information contains information identifying the avatar object6A or the user5A associated with the avatar object6A or information identifying the virtual space11A accommodating the avatar object6A. An example of the information identifying the avatar object6A or the user5A is a user ID. An example of the information identifying the virtual space11A accommodating the avatar object6A is a room ID. The processor210A transmits the avatar information acquired as described above to the server600via the network2.
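A minimal sketch of how the avatar information described above might be organized is given below. The field names and types are assumptions for illustration only and are not the literal format used by the system100.

```python
# Illustrative sketch (assumed field names/types) of the avatar information:
# motion information, face tracking data, sound data, and identifiers such as
# a user ID and a room ID.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class MotionInfo:
    hmd_position: Tuple[float, float, float]      # temporal change in HMD position
    hmd_inclination: Tuple[float, float, float]   # temporal change in HMD inclination
    hand_position: Tuple[float, float, float]     # hand motion from a motion sensor

@dataclass
class AvatarInfo:
    user_id: str                                  # identifies the avatar object / user
    room_id: str                                  # identifies the shared virtual space
    motion: MotionInfo
    face_tracking: List[Tuple[float, float]] = field(default_factory=list)  # feature points
    sound_data: bytes = b""                       # digitized utterance, may be empty

# Example: an HMD set could build and transmit this structure each frame.
info = AvatarInfo(
    user_id="5A",
    room_id="room-1",
    motion=MotionInfo((0.0, 1.6, 0.0), (0.0, 0.0, 0.0), (0.2, 1.2, 0.3)),
)
```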
In Step S1310B, the processor210B of the HMD set110B acquires avatar information for determining a motion of the avatar object6B in the virtual space11B, and transmits the avatar information to the server600, similarly to the processing of Step S1310A. Similarly, in Step S1310C, the processor210C of the HMD set110C acquires avatar information for determining a motion of the avatar object6C in the virtual space11C, and transmits the avatar information to the server600.
In Step S1320, the server600temporarily stores pieces of avatar information received from the HMD set110A, the HMD set110B, and the HMD set110C, respectively. The server600integrates pieces of avatar information of all the users (in this example, users5A to5C) associated with the common virtual space11based on, for example, the user IDs and room IDs contained in respective pieces of avatar information. Then, the server600transmits the integrated pieces of avatar information to all the users associated with the virtual space11at a timing determined in advance. In this manner, synchronization processing is executed. Such synchronization processing enables the HMD set110A, the HMD set110B, and the HMD set110C to share mutual avatar information at substantially the same timing.
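The following sketch illustrates this server-side synchronization processing under simplifying assumptions: avatar information is grouped by room ID and the integrated list is sent back to every member at a predetermined timing. The class and method names are hypothetical.

```python
# Minimal sketch (assumed names) of the server-side synchronization processing.
from collections import defaultdict

class AvatarInfoHub:
    def __init__(self):
        self.pending = defaultdict(list)   # room_id -> list of avatar info dicts

    def receive(self, avatar_info: dict) -> None:
        """Temporarily store avatar information received from an HMD set."""
        self.pending[avatar_info["room_id"]].append(avatar_info)

    def broadcast(self, send) -> None:
        """At a timing determined in advance, send the integrated information
        to all users associated with each shared virtual space."""
        for room_id, infos in self.pending.items():
            recipients = {info["user_id"] for info in infos}
            for user_id in recipients:
                send(user_id, infos)       # every member receives all members' info
        self.pending.clear()

# Usage: call hub.receive(...) for each HMD set, then hub.broadcast(...) per tick.
```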
Next, the HMD sets110A to110C execute processing of Step S1330A to Step S1330C, respectively, based on the integrated pieces of avatar information transmitted from the server600to the HMD sets110A to110C. The processing of Step S1330A corresponds to the processing of Step S1180ofFIG. 11.
In Step S1330A, the processor210A of the HMD set110A updates information on the avatar object6B and the avatar object6C of the other users5B and5C in the virtual space11A. Specifically, the processor210A updates, for example, the position and direction of the avatar object6B in the virtual space11based on motion information contained in the avatar information transmitted from the HMD set110B. For example, the processor210A updates the information (e.g., position and direction) on the avatar object6B contained in the object information stored in the memory module530. Similarly, the processor210A updates the information (e.g., position and direction) on the avatar object6C in the virtual space11based on motion information contained in the avatar information transmitted from the HMD set110C.
In Step S1330B, similarly to the processing of Step S1330A, the processor210B of the HMD set110B updates information on the avatar object6A and the avatar object6C of the users5A and5C in the virtual space11B. Similarly, in Step S1330C, the processor210C of the HMD set110C updates information on the avatar object6A and the avatar object6B of the users5A and5B in the virtual space11C.
[Details of Module Configuration]
With reference toFIG. 14, details of a module configuration of the computer200are described.FIG. 14is a block diagram of a configuration of modules of the computer according to at least one embodiment of this disclosure.
InFIG. 14, the control module510includes a virtual camera control module1421, a field-of-view region determination module1422, a reference-line-of-sight identification module1423, a virtual space definition module1424, a virtual object generation module1425, a line-of-sight detection module1426, a synchronization module1427, a chat control module1428, and a sound control module1429. The rendering module520includes a field-of-view image generation module1439. The memory module530stores space information1431, object information1432, and user information1433.
In at least one aspect, the control module510controls display of an image on the monitor130of the HMD120. The virtual camera control module1421arranges the virtual camera14in the virtual space11, and controls, for example, the behavior and direction of the virtual camera14. The field-of-view region determination module1422defines the field-of-view region15in accordance with the direction of the head of the user5wearing the HMD120. The field-of-view image generation module1439generates a field-of-view image to be displayed on the monitor130based on the determined field-of-view region15. Further, the field-of-view image generation module1439generates a field-of-view image based on data received from the control module510. Data on the field-of-view image generated by the field-of-view image generation module1439is output to the HMD120by the communication control module540. The reference-line-of-sight identification module1423identifies the line of sight of the user5based on the signal from the eye gaze sensor140.
The sound control module1429detects, from the HMD120, input of a sound signal that is based on utterance of the user5into the computer200. The sound control module1429associates the sound signal corresponding to the utterance with an input time of the utterance to generate sound data. The sound control module1429transmits the sound data to the computer of a user whom the user5has selected as a chat partner from among the other computers200B and200C that are in the state of being capable of communicating to/from the computer200.
The control module510controls the virtual space11to be provided to the user5. First, the virtual space definition module1424generates virtual space data representing the virtual space11, to thereby define the virtual space11in the HMD set110.
The virtual object generation module1425generates data on objects to be arranged in the virtual space11. For example, the virtual object generation module1425generates data on avatar objects representing the respective other users5B and5C, who are to chat with the user5via the virtual space11. Further, the virtual object generation module1425may change the line of sight of the avatar object of the user based on the lines of sight detected in response to utterance of the other users5B and5C.
The line-of-sight detection module1426detects the line of sight of the user5based on output from the eye gaze sensor140. In at least one aspect, the line-of-sight detection module1426detects the line of sight of the user5at the time of utterance of the user5when such utterance is detected. Detection of the line of sight is implemented by a known technology, for example, non-contact eye tracking. As an example, as in the case of the limbus tracking method, the eye gaze sensor140may detect motion of the line of sight of the user5based on data obtained by radiating an infrared ray to eyes of the user5and photographing the reflected light with a camera (not shown). In at least one aspect, the line-of-sight detection module1426identifies each position that depends on motion of the line of sight of the user5as coordinate values (x, y) with a certain position on a display region of the monitor130serving as a reference point.
The synchronization module1427implements synchronization of sound and video when communication is performed via the virtual space11. For example, in at least one embodiment of this disclosure, when data (eye tracking data) representing an eye detection result and sound data that are acquired at the same timing by another computer200B reach the computer200at different timings, the synchronization module1427synchronizes the timing of outputting sound and the timing of outputting data of the avatar object so that a change (e.g., movement of line of sight and change of posture) of the avatar object and output of sound are performed at the same timing.
For example, in at least one aspect, sound data transmitted by another computer200B (namely, same chat partner) arrives at the computer200before arrival of eye tracking data transmitted by that computer200B. In this case, the synchronization module1427temporarily stores the sound data into a work area of the memory module230, and waits to output sound until receiving eye tracking data.
In contrast, the eye tracking data transmitted by the computer200B may arrive at the computer200before arrival of the sound data. In this case, the synchronization module1427generates image data for presenting an avatar object whose line of sight has been changed based on the eye tracking data, temporarily stores the image data into a work area of the memory module230, and waits to output the image data until receiving sound data. When the synchronization module1427detects reception of sound data, the synchronization module1427reads the image data from the memory module230for output to the HMD120, and also outputs the sound data to the speaker180via the sound control module1429.
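The buffer-and-wait behavior of the synchronization module1427described above may be sketched as follows. The class name, key format, and return convention are assumptions for illustration: whichever of the sound data and the eye tracking data arrives first is held until its counterpart arrives, and both are then released for output at the same timing.

```python
# Illustrative sketch of the buffer-and-wait behavior of the synchronization
# module: data arriving first is held in a work area until its counterpart
# with the same key arrives, then both are released together.
class SyncBuffer:
    def __init__(self):
        self.pending_sound = {}   # (source_id, time_key) -> sound data
        self.pending_eyes = {}    # (source_id, time_key) -> eye tracking data

    def on_sound(self, key, sound):
        if key in self.pending_eyes:
            return self.pending_eyes.pop(key), sound   # output both now
        self.pending_sound[key] = sound                 # wait for eye data
        return None

    def on_eye_tracking(self, key, eyes):
        if key in self.pending_sound:
            return eyes, self.pending_sound.pop(key)    # output both now
        self.pending_eyes[key] = eyes                   # wait for sound data
        return None

# Usage: key = (transmission source ID, acquisition time); a non-None return
# means the avatar update and the sound can be output to the HMD together.
```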
The chat control module1428controls communication via the virtual space. In at least one aspect, the chat control module1428reads a chat application from the memory module230based on operation by the user5or a request for starting a chat transmitted by another computer200B, to thereby start communication via the virtual space11. When the user5inputs a user ID and a password into the computer200to perform a login operation, the user5is associated with a session (also referred to as “room”) of a chat as one member of the chat via the virtual space11. After that, when the user5B using the computer200B logs in to the chat of the session, the user5and the user5B are associated with each other as members of the chat. When the chat control module1428identifies a user5B of the computer200B, who is to be a communication partner of the computer200, the virtual object generation module1425uses the object information1432to generate data for presenting an avatar object corresponding to the user5B, and outputs the data to the HMD120. When the HMD120displays the avatar object corresponding to the user5B on the monitor130based on the data, the user5wearing the HMD120recognizes the avatar object in the virtual space11.
In at least one embodiment of this disclosure, the chat control module1428waits for input of sound data that is based on utterance of the user5and input of data from the eye gaze sensor140. When the user5performs an operation (e.g., operation of controller, gesture, selection by voice, or gaze by line of sight) for selecting an avatar object in the virtual space11, the chat control module1428, based on the operation, detects the fact that the user (e.g., user5B) corresponding to the avatar object is selected as the chat partner. When the chat control module1428detects utterance of the user5, the chat control module1428transmits sound data that is based on a signal transmitted by the microphone170and eye tracking data that is based on a signal transmitted by the eye gaze sensor140to the computer200B via the communication control module540based on a network address of the computer200B used by the user5B. The computer200B updates the line of sight of the avatar object of the user5based on the eye tracking data, and transmits the sound data to the HMD120B. When the computer200B has a synchronization function, the line of sight of the avatar object is changed on the monitor130and sound is output from the speaker180substantially at the same timing, and thus the user5B is less likely to feel strange.
The space information1431stores one or more templates that are defined to provide the virtual space11.
The object information1432stores data for displaying an avatar object to be used for communication via the virtual space11, content to be reproduced in the virtual space11, and information for arranging an object to be used in the content. The content may include, for example, game content and content representing landscapes that resemble those of the real world. The data for displaying an avatar object may contain, for example, image data schematically representing a communication partner who is established as a chat partner in advance, and a photo of the communication partner.
The user information1433stores, for example, a program for causing the computer200to function as a control device for the HMD set110, an application program that uses each piece of content stored in the object information1432, and a user ID and a password that are required to execute the application program. The data and programs stored in the memory module530are input by the user5of the HMD120. Alternatively, the processor210downloads programs or data from a computer (e.g., server600) that is managed by a business operator providing the content, and stores the downloaded programs or data into the memory module530.
[Operation Between Computers Through Communication Between Two Users]
Now, a description is given of operation of the computers200and200B at the time when the two users5and5B communicate to/from each other via the virtual space11. In the following, a description is given of a case in which the user5B wearing the HMD120B connected to the computer200B utters sound toward the user5wearing the HMD120connected to the computer200.
(Transmission Side)
In at least one aspect, the user5B wearing the HMD120B utters sound toward the microphone170in order to chat with the user5. The sound signal of the utterance is transmitted to the computer200B connected to the HMD120B. The sound control module1429converts the sound signal into sound data, and associates a timestamp representing the time of detection of the utterance with the sound data. The timestamp is, for example, time data of an internal clock of the processor210. In at least one aspect, time data on a time when the communication control module540converts the sound signal into sound data is used as the timestamp.
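A minimal sketch of associating a timestamp representing the time of detection with the converted sound data, assuming hypothetical field names, is as follows.

```python
# Minimal sketch (assumed names) of attaching a timestamp to converted sound data.
import time

def make_sound_data(sound_signal: bytes, source_id: str) -> dict:
    return {
        "source_id": source_id,          # identifies the transmission source (e.g., HMD 120B)
        "timestamp": time.monotonic(),   # stands in for the internal clock time of detection
        "payload": sound_signal,         # digitized utterance
    }
```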
When the user5B is uttering sound, motion of the line of sight of the user5B is detected by the eye gaze sensor140. The result (eye tracking data) of detection by the eye gaze sensor140is transmitted to the computer200B. The line-of-sight detection module1426identifies each position (e.g., position of pupil) representing a change in line of sight of the user5B based on the detection result.
The computer200B transmits the sound data and the eye tracking data to the computer200. The sound data and the eye tracking data are first transmitted to the server600. The server600refers to a destination of each header of the sound data and the eye tracking data, and transmits the sound data and the eye tracking data to the computer200. At this time, the sound data and the eye tracking data may arrive at the computer200at different times.
(Reception Side)
The computer200receives the data transmitted by the computer200B from the server600. In at least one aspect, the processor210of the computer200detects reception of the sound data based on the data transmitted by the communication control module540. When the processor210identifies the transmission source (i.e., computer200B) of the sound data, the processor210serves as the chat control module1428to cause a chat screen to be displayed on the monitor130of the HMD120.
The processor210further detects reception of the eye tracking data. When the processor210identifies a transmission source (i.e., computer200B) of the eye tracking data, the processor210serves as the virtual object generation module1425to generate data for displaying the avatar object of the user5B.
The synchronization module1427synchronizes timings of outputting the sound data and eye tracking data that are received from the computer200B. For example, the synchronization module1427compares a transmission source identification number and time data contained in the sound data with a transmission source identification number and time data contained in the eye tracking data. When those pieces of data match each other, the synchronization module1427determines that the sound data and the eye tracking data are transmitted by the same computer200B, and outputs data for displaying an avatar object and the sound data at the same timing to the HMD120.
In at least one aspect, the processor210receives eye tracking data before reception of sound data. In this case, when detecting the transmission source identification number from the eye tracking data, the processor210determines that there is sound data transmitted in association with the eye tracking data. The processor210waits to output data for displaying an avatar object until the processor210receives sound data containing the same transmission source identification number and time data as the transmission source identification number and time data contained in the eye tracking data.
Further, in at least one aspect, the processor210receives sound data before reception of eye tracking data. In this case, when detecting the transmission source identification number from the sound data, the processor210determines that there is eye tracking data transmitted in association with the sound data. The processor210waits to output the sound data until the processor210receives eye tracking data containing the same transmission source identification number and time data as the transmission source identification number and time data contained in the sound data.
In at least one aspect described above, the pieces of time data to be compared do not need to indicate exactly the same time; in at least one embodiment, the detection of the utterance and the eye tracking that occur within a threshold time period of each other are considered to be at the same time.
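A sketch of this comparison, assuming hypothetical field names and an arbitrary threshold value, is as follows: two pieces of data are treated as acquired at the same time when their transmission sources match and their time data differ by no more than the threshold.

```python
# Sketch (threshold value is an assumption) of the comparison described above.
def is_same_timing(sound_meta: dict, eye_meta: dict, threshold_s: float = 0.05) -> bool:
    return (
        sound_meta["source_id"] == eye_meta["source_id"]
        and abs(sound_meta["timestamp"] - eye_meta["timestamp"]) <= threshold_s
    )
```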
When confirming reception of sound data and eye tracking data containing the same time data, the processor210outputs the sound data to the speaker180, and outputs, to the monitor130, data for displaying an avatar object in which the change that is based on the eye tracking data is translated. As a result, the user5is able to recognize the sound uttered by the user5B and the avatar at the same timing, and is thus able to enjoy a chat without feeling a time lag (e.g., deviation between change in avatar object and timing of outputting sound) due to delay of signal transmission.
In the same manner as in the processing described above, the processor210of the computer200B used by the user5B is also able to synchronize the timing of outputting sound data and the timing of outputting an avatar object in which the movement of the line of sight of the user5is translated. As a result, the user5B is also able to recognize output of the sound uttered by the user5and the change in avatar object at the same timing, and is thus able to enjoy a chat without feeling a time lag due to delay of signal transmission.
[Outline of Chat]
Next, a description is given of an outline of a chat via the virtual space, which is performed in accordance with at least one embodiment, with reference toFIG. 15.FIG. 15is a conceptual diagram of one mode of representation of the respective virtual spaces11presented by the computers200,200B, and200C according to at least one embodiment of this disclosure.
InFIG. 15, each of the computers200,200B, and200C is able to communicate to/from the server600via the network2. The computers200,200B, and200C provide panorama images13,13B, and13C via the connected HMDs120,120B, and120C, respectively. The panorama images13,13B, and13C present the avatar objects6,6B, and6C corresponding to respective users of the computers200,200B, and200C, respectively.
For example, the avatar objects6,6B, and6C correspond to the users5,5B, and5C, respectively. For example, the avatar objects6B and6C are presented as communication partners of the user5in the panorama image13visually recognized by the user5. The avatar objects6and6C are displayed as communication partners of the user5B in the panorama image13B visually recognized by the user5B. The avatar objects6and6B are displayed as communication partners of the user5C in the panorama image13C visually recognized by the user5C.
The HMDs120,120B, and120C transmit pieces of motion detection data corresponding to the positions and inclinations of the users5,5B, and5C to the server600via the computers200,200B, and200C, respectively. The motion detection data may contain eye tracking data. The server600transmits the motion detection data received from the HMD120to the HMDs120B and120C. The HMDs120B and120C change the mode (e.g., position and inclination of avatar object) of display of an avatar object, which is a chat partner presented in the virtual space11, in accordance with the motion detection data.
In at least one aspect, the HMDs120,120B, and120C transmit pieces of sound data corresponding to utterance of the users5,5B, and5C to the server600, respectively. The server600transmits, for example, the sound data and eye tracking data received from the HMD120to the computers200B and200C. The computers200B and200C change the mode (e.g., direction of eyes and head) of display of the avatar object in accordance with the eye tracking data. The HMDs120B and120C output sound that is based on the sound data from the speakers180.
In this manner, when the user5wearing the HMD120moves the eyes and utters sound, the mode of display of the avatar object corresponding to the user5is changed in the virtual space11presented by the other HMDs120B and120C in the state of being capable of communicating to/from the HMD120, and sound is output from the speaker180. The timing of changing the display mode and the timing of outputting sound are synchronized, and thus in communication via the virtual space11, each communication partner performs communication using sound and an avatar object without feeling strange.
[Synchronization]
Now, with reference toFIG. 16, a description is given of synchronization in the system100according to at least one embodiment of this disclosure.FIG. 16is a timing chart of a mode of synchronization between the sound data and the eye tracking data according to at least one embodiment of this disclosure. In at least one embodiment of this disclosure, delay of signal transmission from the HMD120B (or computer200B to which HMD120B is connected) to the computer200to which the HMD120is connected is described. In at least one aspect, the computer200connected to the HMD120and the computer200B connected to the HMD120B are executing a chat application for communicating to/from each other via the virtual space11.
When the user5B utters sound at a time t(0), the computer200B detects the line of sight of the user5B at that time, and converts content of the utterance into sound data. The result of detecting the line of sight contains, for example, eye tracking data. The eye tracking data contains a plurality of data records acquired within a predetermined period of time. Each data record contains an x coordinate value and a y coordinate value of a viewpoint, data on a time at which each coordinate value is acquired, and an identification number of a transmission source (for example, HMD120B) of the data record. The computer200B transmits the sound data and the eye tracking data to the computer200based on a destination (e.g., user5) designated by the user5B at the time of executing the chat application.
Deviation may occur between arrival of the sound data and arrival of the eye tracking data. For example, in at least one aspect, as shown in a graph1610, the eye tracking data arrives at the computer200before arrival of the sound data shown in a graph1620.
As an example, at a time t(1), the eye tracking data (graph1610) has arrived at the computer200, but the sound data (graph1620) has not arrived at the computer200. Therefore, the computer200stores data on the avatar object changed based on the eye tracking data into an internal volatile memory without outputting the data to the HMD120.
At a time t(2), the sound data arrives at the computer200. The computer200determines whether or not the time data and the identification number of the transmission source contained in the sound data match the time data and identification number of the transmission source contained in the eye tracking data received at the time t(1). When those pieces of information match each other, at a time t(3), the computer200outputs, to the HMD120, data for displaying the avatar object changed based on the eye tracking data and the sound data. The HMD120displays an avatar object on the monitor130based on the data, and outputs sound based on the sound data from the speaker180. The user5wearing the HMD120recognizes the change in line of sight translated in the avatar object, and recognizes content of the utterance by the user5B. At this time, the change in mode of display of the avatar object and the output of sound are synchronized, and thus the user5does not feel strange.
[Algorithm]
In the following, a description is given of an algorithm for implementing operation of the system100according to at least one embodiment of this disclosure.
[Control Structure]
First, a description is given of a control structure in the system100with reference toFIG. 17.FIG. 17is a flowchart of processing to be executed by a first HMD (e.g., HMD120) and a second HMD (e.g., HMD120B) according to at least one embodiment of this disclosure.
In Step S1210, the computer200, which is connected to the HMD120, connects to the server600to start a chat via the virtual space11based on operation of the user5. In Step S1215, the computer200B, which is connected to the HMD120B, connects to the server600to start a chat via the virtual space11based on operation of the user5B.
In Step S1220, the processor210of the computer200serves as the virtual space definition module1424to define the virtual space11. After that, the processor210serves as the chat control module1428to start communication to/from the computer200B. In Step S1225, the processor210of the computer200B serves as the virtual space definition module1424to define the virtual space11. After that, the processor210serves as the chat control module1428to start communication to/from the computer200.
In Step S1230, the processor210detects motion of the line of sight of the user5based on a signal output from the eye gaze sensor140. The detection result is, for example, eye tracking data. The processor210transmits the eye tracking data to a chat partner, namely, the HMD120B. In Step S1235, the processor210of the computer200B detects motion of the line of sight of the user5B based on a signal output from the eye gaze sensor140. The detection result is, for example, eye tracking data. The processor210transmits the eye tracking data to a chat partner, namely, the HMD120.
In Step S1240, the processor210receives utterance of the user5based on reception of a signal output from the microphone170, and transmits sound data based on the signal to the chat partner (HMD120B). In Step S1245, the processor210of the computer200B receives utterance of the user5B based on reception of a signal output from the microphone170, and transmits sound data based on the signal to the chat partner (HMD120).
In Step S1250, the processor210receives the result of detecting motion of the line of sight of the user5B wearing the HMD120B from the computer200B. In Step S1255, the processor210of the computer200B receives the result of detecting motion of the line of sight of the user5wearing the HMD120from the computer200.
In Step S1260, the processor210receives sound data based on utterance of the user5B from the computer200B. In Step S1265, the processor210of the computer200B receives sound data based on utterance of the user5from the computer200.
In Step S1270, the processor210translates the detection result (eye tracking data) in the avatar object to generate data on the avatar object in which the change in line of sight of the user5B is translated. In Step S1275, the processor210of the computer200B translates the detection result (eye tracking data) in the avatar object to generate data on the avatar object in which the change in line of sight of the user5is translated.
In Step S1280, the processor210performs processing of synchronizing the timing of presenting the avatar object to the HMD120and the timing of outputting the sound from the speaker180. For example, in at least one aspect, when the computer200receives eye tracking data acquired by the HMD120B at a certain timing from the computer200B before reception of sound data, the processor210waits to output data on the avatar object generated in Step S1270until reception of the sound data. When the processor210receives the sound data, the processor210determines whether or not the transmission source of the sound data and the timing of acquiring the sound data are the same as the transmission source of the eye tracking data and the timing of acquiring the eye tracking data. When those transmission sources and acquisition timings are the same, the processor210switches the processing to Step S1290.
In Step S1285, the processor210of the computer200B performs processing of synchronizing the timing of presenting the avatar object to the HMD120B and the timing of outputting the sound from the speaker180B. For example, in at least one aspect, when the computer200B receives eye tracking data acquired by the HMD120at a certain timing from the computer200before reception of sound data, the processor210waits to output data on the avatar object generated in Step S1275until reception of the sound data. When the processor210receives the sound data, the processor210determines whether or not the transmission source of the sound data and the timing of acquiring the sound data are the same as the transmission source of the eye tracking data and the timing of acquiring the eye tracking data. When those transmission sources and acquisition timings are the same, the processor210switches the processing to Step S1295.
In Step S1290, the processor210outputs the data on the avatar object and the sound data at the same timing, and the user5wearing the HMD120recognizes that the direction of the line of sight of the avatar object displayed on the monitor130has changed. At the same time, the user5is able to hear the utterance of the user5B.
In Step S1295, the processor210of the computer200B outputs the data on the avatar object and the sound data at the same timing, and the user5B wearing the HMD120B recognizes that the direction of the line of sight of the avatar object displayed on the monitor130has changed. At the same time, the user5B is able to hear the utterance of the user5.
In at least one aspect, the computer200receives sound data acquired by the HMD120B at a certain timing from the computer200B before reception of eye tracking data. In this case, the processing of Step S1260is performed before the processing of Step S1250. The processor210waits to output the sound data until reception of the eye tracking data. When the processor210receives the eye tracking data, the processor210determines whether or not the transmission source of the eye tracking data and the timing of acquiring the eye tracking data are the same as the transmission source of the sound data and the timing of acquiring the sound data. When those transmission sources are the same, the processor210outputs the data on the avatar object and the sound data at the same timing after reception of the eye tracking data. In this case too, the user5wearing the HMD120recognizes that the direction of the line of sight of the avatar object displayed on the monitor130has changed. At the same time, the user5is able to hear the utterance of the user5B.
In at least one aspect, the computer200B receives sound data acquired by the HMD120at a certain timing from the computer200before reception of eye tracking data. In this case, the processing of Step S1265is performed before the processing of Step S1255. The processor210waits to output the sound data until reception of the eye tracking data. When the processor210receives the eye tracking data, the processor210determines whether or not the transmission source of the eye tracking data and the transmission source of the sound data are the same. When those transmission sources are the same, the processor210outputs the data on the avatar object and the sound data at the same timing after reception of the eye tracking data. In this case too, the user5B wearing the HMD120B recognizes that the direction of the line of sight of the avatar object displayed on the monitor130has changed. At the same time, the user5B is able to hear the utterance of the user5.
[Data Structure]
Next, a description is given of a structure of data to be transferred in the system100according to at least one embodiment of this disclosure with reference toFIG. 18AtoFIG. 18D.FIG. 18Arepresents eye tracking data acquired by the HMD120used by the user5according to at least one embodiment of this disclosure.FIG. 18Brepresents sound data that is based on utterance of the user5.FIG. 18Crepresents eye tracking data acquired by the HMD120B used by the user5B.FIG. 18Drepresents sound data that is based on utterance of the user5B.
Referring toFIG. 18A, eye tracking data1810contains a user ID, an x coordinate value, a y coordinate value, an elapsed time, and a data ID. The user ID identifies a user whose line of sight is detected, that is, a transmission source of the eye tracking data1810. The x coordinate value represents an x coordinate value of a center coordinate (pupil center point) of a pupil of the user5at the time when the line of sight is detected. The y coordinate value represents a y coordinate value of the center coordinate of the pupil of the user5at the time when the line of sight is detected. The elapsed time represents a time that has elapsed since start of a chat. During a chat, the line of sight is detected continuously, for example, periodically, and thus the elapsed time identifies a timing at which the line of sight is detected. Instead of the elapsed time, an actual time in the real space, for example, a clock of the computer200or time information contained in a positioning signal may be used. The data ID identifies data acquired at each timing.
Referring toFIG. 18B, the sound data1820contains a user ID, sound data, an elapsed time, and a data ID. The user ID identifies a user whose utterance is detected, that is, a transmission source of the sound data1820. The sound data is digital sound data generated from utterance of the user5. The elapsed time represents a time that has elapsed since start of a chat in the same manner as in the eye tracking data1810. The data ID identifies data acquired at each timing.
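The records described with reference toFIG. 18AandFIG. 18Bmight be represented as follows; the types shown are assumptions for illustration.

```python
# Illustrative sketch (assumed types) of one eye tracking record and one sound
# record: user ID, pupil-center coordinates or digitized sound, elapsed time
# since the start of the chat, and a data ID for each acquisition timing.
from dataclasses import dataclass

@dataclass
class EyeTrackingRecord:
    user_id: str        # transmission source of the eye tracking data
    x: float            # x coordinate of the pupil center point
    y: float            # y coordinate of the pupil center point
    elapsed_ms: int     # time elapsed since the start of the chat
    data_id: int        # identifies the data acquired at this timing

@dataclass
class SoundRecord:
    user_id: str        # transmission source of the sound data
    payload: bytes      # digital sound data generated from the utterance
    elapsed_ms: int     # time elapsed since the start of the chat
    data_id: int        # identifies the data acquired at this timing
```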
Referring toFIG. 18C, the eye tracking data1830has the same structure as that of the eye tracking data1810shown inFIG. 18A. Thus, a description of the same structure is not repeated here.
Referring toFIG. 18D, the sound data1840has the same structure as that of the sound data1820shown inFIG. 18B. Thus, a description of the same structure is not repeated here.
Now, a description is given of an output mode of the HMD120according to at least one embodiment of this disclosure with reference toFIG. 19.FIG. 19is a diagram of transition of a chat screen displayed on the monitor130of the HMD120according to at least one embodiment of this disclosure.
(When Update of Avatar Object and Output of Sound are Synchronized)
In a chat screen1917-1, in at least one aspect, when the user5starts to chat with the user5B, the monitor130of the HMD120displays the avatar object6B of the user5B. When the user5B wearing the HMD120B utters sound while moving his or her eyes, the line-of-sight detection result (eye tracking data1830) and the sound data1840are transmitted from the computer200B to the computer200. The processor210of the computer200synchronizes the timing of presenting the avatar object and the timing of outputting the sound data.
In a chat screen1917-2, after that, the HMD120outputs the avatar object6B in which motion of the eyes of the user5B is translated and sound that is based on utterance at the same timing. With this, the user5is able to recognize the change in line of sight of the avatar object6B and the output of sound at the same timing, and thus does not feel strange in a chat via the virtual space11.
(When Update of Avatar Object and Output of Sound are not Synchronized)
In contrast, in a chat screen1917-3, in at least one aspect, when output of sound data and change of the avatar object6B are not synchronized in a chat via the virtual space11, for example, only the change in line of sight of the avatar object6B is displayed first on the monitor130. After that, in a chat screen1917-4, the sound is output from the speaker180of the HMD120with time delay. Therefore, in this case, timings of changing the avatar object6B and outputting sound are different, and hence the user5may feel strange.
As described above, in the system100according to at least one embodiment of this disclosure, the HMD120synchronizes the timing of presenting the avatar object6B and the timing of outputting sound in video-and-sound communication like a chat via the virtual space11. As a result, the user5wearing the HMD120recognizes the change of the avatar object6B that is based on movement of the line of sight of the user5B, who is a chat partner, and the output of sound at the same time, and is thus able to continue chatting without feeling strange.
In the example described above, the HMD120of the system100is configured to synchronize video and sound by synchronizing the timing of presenting the line of sight of the avatar object and the timing of outputting sound. However, motion of the avatar object is not limited to motion of the line of sight (motion of eye balls). In a modification example of this disclosure, a description is given of synchronization between sound and motion (video corresponding thereto) of the avatar object other than the line of sight.
[Configuration of System Including HMD]
FIG. 20is a schematic diagram of a configuration of the system100in at least one embodiment of this disclosure.FIG. 21is a block diagram of a configuration of the memory220of the computer200in at least one embodiment of this disclosure. The HMD set110′ is different from the HMD set110described with reference toFIG. 1in that the HMD set110′ includes a third camera165and does not include the controller300. The hardware configuration of the computer200is the same as the hardware configuration illustrated inFIG. 2except that the memory220′ includes ring buffers220-1and220-2. Thus, a description of the same hardware configuration is not repeated here.
The HMD120′ is different from the HMD120described with reference toFIG. 1in that the HMD120′ includes the first camera150, the second camera160, and the third camera165. The first camera150photographs a lower part of the face of the user5. As an example, the first camera150photographs the nose, the mouth, and other such face parts of the user5. The second camera160photographs the eyes, eyebrows, and other such face parts of the user. A part of a casing of the HMD120′ on the user5side is defined as the inside of the HMD120′, and another part of the casing of the HMD120′ opposite to the user5is defined as the outside of the HMD120′. In at least one aspect, the first camera150is arranged outside of the HMD120′, and the second camera160is arranged inside of the HMD120′. InFIG. 20, the first camera150is connected to a frame extending from the outside of the HMD120′, and photographs the lower part of the face of the user5. Images generated by the first camera150and the second camera160are input to the computer200.
The third camera165is capable of acquiring depth information on a target object. As an example, the third camera165acquires the depth information on a target object in accordance with a time-of-flight method. In at least one embodiment, the third camera165acquires the depth information on a target object in accordance with a pattern irradiation method. In at least one embodiment of this disclosure, the third camera165may be a stereo camera capable of photographing a target object in two or more different directions. The third camera165may be an infrared camera. The third camera165is mounted to the upper part of the outside of the HMD120′, and photographs a part of the body of the user5. In the following, as an example, the third camera165photographs the hand of the user5. The third camera165outputs the acquired depth information on the target object (hand) to the computer200.
The configurations of the HMD set110B used by the user5B and the HMD set110C used by the user5C are the same as that of the HMD set110′, and thus a description thereof is omitted here. In the following, a description is given by assigning components of the HMD set110B with a symbol “B” and components of the HMD set110C with a symbol “C”. For example, the HMD120B is included in the HMD set110B. A virtual space presented by the computer200B is defined as the virtual space11B, and a virtual space presented by the computer200C is defined as a virtual space11C.
[Module Configuration of Control Apparatus]
FIG. 22is a block diagram of a module configuration of the computer200in at least one embodiment of this disclosure. The module configuration of the computer200is different from that of the computer200described with reference toFIG. 14in the following points.
Referring toFIG. 22, the control module510′ inFIG. 22further includes a face part detection module2241, a face tracking module2242, a hand tracking module2243, and an avatar control module2244. The memory module530′ inFIG. 22further stores face information2234.
The face part detection module2241detects face parts (e.g., mouth, eyes, cheeks, and nose) forming the face of the user5from images of the face of the user5, which are generated by the first camera150and the second camera160. The face tracking module2242detects motion (shape) of each face part detected by the face part detection module2241. Processing of the face part detection module2241and the face tracking module2242is described later with reference toFIG. 23toFIG. 25.
The hand tracking module2243detects (tracks) the position of a part of the body of the user5. In this modification example, the hand tracking module2243detects a position of the hand of the user5in the uvw visual-field coordinate system set in the HMD120based on the depth information input from the third camera165. Processing of the hand tracking module2243is described later with reference toFIG. 27toFIG. 29.
The face information2234contains a template prepared in advance for the face part detection module2241to detect face parts of the user5. As an example, the face information2234contains a mouth template2235, an eye template2236, a cheek template2237, and a nose template2238. Each of the templates may be an image corresponding to each of the parts forming the face. For example, the mouth template2235may be an image of a mouth. Each template may include a plurality of images.
The virtual object generation module1425translates motion of face parts detected by the face tracking module2242in the face of the avatar object arranged in the virtual space11. The virtual object generation module1425translates motion of the hand detected by the hand tracking module2243in the hand of the avatar object arranged in the virtual space11.
[Face Tracking]
In the following, with reference toFIG. 23toFIG. 25, an example of detecting motion (shape) of the face of the user is described. InFIG. 23toFIG. 25, a specific example of detecting motion of the mouth of the user is described as an example. The detection method described with reference toFIG. 23toFIG. 25is not limited to detection of motion of the mouth of the user, but may be applied to detection of motion of other parts (e.g., eyes, eyebrows, cheeks, and nose) forming the face of the user.
FIG. 23is a face image2351of the user photographed by the first camera150according to at least one embodiment of this disclosure. The face image2351includes the nose and the mouth of the user5.
The face part detection module2241identifies a mouth region2352from the face image2351by pattern matching using the mouth template2235stored in the face information2234. In at least one aspect, the face part detection module2241sets a rectangular comparison region in the face image2351, and changes the size, position, and angle of this comparison region to calculate a similarity degree between an image of the comparison region and an image of the mouth template2235. The face part detection module2241may identify, as the mouth region2352, a comparison region for which a similarity degree larger than a threshold value determined in advance is calculated.
The face part detection module2241may further determine whether or not the comparison region corresponds to the mouth region based on a relative positional relationship between positions of other face parts (e.g., eyes and nose) and the position of the comparison region for which the calculated similarity degree is larger than the threshold value.
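A rough sketch of the pattern matching described above is given below. It is simplified to a fixed-size comparison region (no change of size or angle) and uses a normalized correlation as the similarity degree; both simplifications are assumptions for illustration.

```python
# Rough sketch of identifying the mouth region by pattern matching against a
# mouth template: a comparison region is slid over a grayscale face image, and
# the region whose similarity to the template exceeds a threshold is returned.
import numpy as np

def find_mouth_region(face_image: np.ndarray, mouth_template: np.ndarray,
                      threshold: float = 0.8):
    th, tw = mouth_template.shape
    t = (mouth_template - mouth_template.mean()) / (mouth_template.std() + 1e-9)
    best = (None, -1.0)
    for y in range(face_image.shape[0] - th + 1):
        for x in range(face_image.shape[1] - tw + 1):
            region = face_image[y:y + th, x:x + tw]
            r = (region - region.mean()) / (region.std() + 1e-9)
            similarity = float((r * t).mean())          # normalized correlation
            if similarity > best[1]:
                best = ((x, y, tw, th), similarity)
    return best[0] if best[1] > threshold else None     # (x, y, w, h) or None
```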
The face tracking module2242detects a more detailed shape of the mouth from the mouth region2352detected by the face part detection module2241.
FIG. 24is a diagram of detecting the shape of the mouth by the face tracking module2242according to at least one embodiment of this disclosure. Referring toFIG. 24, the face tracking module2242sets a contour detection line2453for detecting the shape of the mouth (contour of lips) contained in the mouth region2352. A plurality of contour detection lines2453are set at predetermined intervals in a direction (hereinafter referred to as “lateral direction”) orthogonal to a height direction (hereinafter referred to as “longitudinal direction”) of the face.
The face tracking module2242may detect change in brightness value of the mouth region2352along each of the plurality of contour detection lines2453, and identify a position at which the change in brightness value is abrupt as a contour point. More specifically, the face tracking module2242may identify, as the contour point, a pixel for which a brightness difference (namely, change in brightness value) between the pixel and an adjacent pixel is equal to or larger than a threshold value determined in advance. The brightness value of a pixel is obtained by, for example, integrating RGB values of the pixel with predetermined weighting.
The face tracking module2242identifies two types of contour points from the image corresponding to the mouth region2352. The face tracking module2242identifies a contour point2454corresponding to a contour of the outer side of the mouth (lips) and a contour point2455corresponding to a contour of the inner side of the mouth (lips). In at least one aspect, when three or more contour points are detected on one contour detection line2453, the face tracking module2242may identify contour points on both ends of the contour detection line2453as the outer contour points2454. In this case, the face tracking module2242may identify contour points other than the outer contour points2454as the inner contour points2455. When two or less contour points are detected on one contour detection line2453, the face tracking module2242may identify the detected contour points as the outer contour points2454.
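A sketch of detecting contour points along one contour detection line, assuming a grayscale brightness row and a hypothetical threshold, is as follows: pixels whose brightness differs from the adjacent pixel by at least the threshold are contour points; with three or more points, the points at both ends are treated as outer contour points and the rest as inner contour points.

```python
# Sketch of contour point detection along one lateral contour detection line.
import numpy as np

def contour_points_on_line(row: np.ndarray, threshold: float = 30.0):
    points = [x for x in range(1, len(row))
              if abs(float(row[x]) - float(row[x - 1])) >= threshold]
    if len(points) >= 3:
        outer, inner = [points[0], points[-1]], points[1:-1]
    else:
        outer, inner = points, []
    return outer, inner
```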
FIG. 25is a diagram of processing for detecting the shape of the mouth by the face tracking module2242according to at least one embodiment of this disclosure. InFIG. 25, the outer contour points2454and the inner contour points2455are indicated by white circles and hatched circles, respectively.
The face tracking module2242interpolates the space between the inner contour points2455, to thereby identify a mouth shape2556(degree of opening of mouth). In at least one aspect, the face tracking module2242identifies the mouth shape2556using a nonlinear interpolation method, for example, spline interpolation. In this case, the inner contour points2455can be said to be feature points representing the mouth shape2556. In at least one aspect, the face tracking module2242interpolates the space between the outer contour points2454, to thereby identify the mouth shape2556. In at least one aspect, the face tracking module2242identifies the mouth shape2556by removing contour points that greatly deviate from an assumed mouth shape (predetermined shape that may be formed by upper lip and lower lip of person) and using the remaining contour points. In this manner, the face tracking module2242may identify motion (shape) of the mouth of the user.
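A sketch of identifying the mouth shape by spline interpolation of the inner contour points, assuming the points are ordered with strictly increasing lateral coordinates, is as follows.

```python
# Sketch of tracing the mouth opening by nonlinear (spline) interpolation of
# the inner contour points, which act as feature points of the mouth shape.
import numpy as np
from scipy.interpolate import CubicSpline

def mouth_shape_from_inner_points(xs, ys, samples: int = 50):
    """xs, ys: lateral positions (strictly increasing) and heights of the
    inner contour points, ordered along the lateral direction."""
    spline = CubicSpline(np.asarray(xs, dtype=float), np.asarray(ys, dtype=float))
    grid = np.linspace(min(xs), max(xs), samples)
    return grid, spline(grid)     # densely sampled contour of the mouth opening
```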
The face tracking module2242may also detect the upper lip and the lower lip that form the mouth. As an example, the face tracking module2242identifies, among the outer contour points2454, a contour point2454-R and a contour point2454-L present at both ends in the lateral direction. The face tracking module2242may detect, as the lower lip, a region2557surrounded by those contour points present at both ends and the inner contour points2455and the outer contour points2454present on a lower side in the up-down direction from the contour points present at both ends. The face tracking module2242may detect, as the upper lip, a region surrounded by the outer contour points2454-R and2454-L present at both ends and the inner contour points2455and the outer contour points2454present on an upper side in the up-down direction from the contour points present at both ends.
The method of detecting the mouth shape2556is not limited to the above-mentioned method, and the face tracking module2242may detect the mouth shape2556by another method. The face tracking module2242may detect the shapes of eyes, cheeks, and nose of the user5in a similar manner. More specifically, the face tracking module2242detects feature points representing the shape of the cheek of the user5based on image information generated by the first camera150and the cheek template2237. The face tracking module2242detects feature points representing the shape of the nose of the user5based on the image information generated by the first camera150and the nose template2238. The face tracking module2242detects feature points representing the shape of eyes of the user5based on image information generated by the second camera160and the eye template2236.
In at least one aspect, the first camera150and the second camera160are capable of acquiring the depth information on the target object(face part) in the same manner as in the third camera165. The face tracking module2242acquires, based on the depth information input from the first camera150and the second camera160, position information on feature points (hereinafter also referred to as “face tracking points”) representing the shape of the face part of the user5in the uvw visual-field coordinate system set in the HMD120. The face tracking points contain, for example, the inner contour points2455representing the shape of the mouth of the user5. The computer200outputs the position information on face tracking points detected by the face tracking module2242as “face tracking data” to the other computer200B based on image information generated by the first camera150and the second camera160. The face tracking data can also be said to be data representing the facial expression of the face of the user. The data structure of the face tracking data may be formed by the position information on each face tracking point detected for each face part like the hand tracking data described later with reference toFIG. 29.
FIG. 26AandFIG. 26Bare diagrams of a facial expression of the user5in the real space and a facial expression of the avatar object6of the user5in the virtual space, respectively, according to at least one embodiment of this disclosure.FIG. 26Ais a diagram of the user5in the real space.FIG. 26Bis a diagram of a field-of-view image2617B to be visually recognized by the user5B.
Referring toFIG. 26A, the first camera150and the second camera160constructing the HMD set110photograph the user5. The user5is smiling at the time of photography. InFIG. 26A, the user is wearing the HMD120, but the HMD120is omitted for the sake of clarity.
The face tracking module2242generates face tracking data based on images generated by the first camera150and the second camera160. The face tracking data contains position information on contour points representing the shape of the mouth of the user5. The computer200outputs the generated face tracking data to the server600. The server600transfers the data to the computer200B communicating to/from the computer200by a chat application.
The processor210B of the computer200B serves as the virtual object generation module1425B to translate the facial expression of the face of the user5in the avatar object6based on the received face tracking data. As an example, a plurality of movable points are set in the avatar object6so as to correspond to a plurality of face tracking points detected by the face tracking module2242. The virtual object generation module1425B updates the position of each of the plurality of movable points so as to follow the position of the received face tracking point(face tracking data). As a result, the user5B is able to recognize the facial expression of the user5via the facial expression of the avatar object6in the virtual space11B. In the example inFIG. 26B, the user5B recognizes the fact that the user5is smiling by visually recognizing the face of the avatar object6displayed in the view field image2617B.
[Hand Tracking]
Next, with reference toFIG. 27AandFIG. 27BtoFIG. 29, a description is given of processing of tracking motion of the hand.FIG. 27AandFIG. 27Bare diagrams of processing of tracking the hand of the user5according to at least one embodiment of this disclosure.FIG. 27Ais a diagram of the user5in the real space.FIG. 27Bis a diagram of the avatar object6contained in a field-of-view image2717B of the user5B.
Referring toFIG. 27A, the user5is wearing the HMD120in the real space. The third camera165is mounted on the HMD120. The third camera165acquires depth information on objects contained in a space2720ahead of the HMD120. InFIG. 27A, the third camera165acquires depth information on a hand2710of the user5contained in the space2720.
The hand tracking module2243acquires position information on the hand2710of the user5based on the depth information acquired by the third camera165. The third camera165is mounted on the HMD120, and thus the position information on the hand2710may indicate the position in the uvw visual-field coordinate system set in the HMD120. The computer200transmits the position information to the computer200B via the server600as hand tracking data.
InFIG. 27B, the processor210B of the computer200B serves as the virtual object generation module1425B to cause a hand2730of the avatar object6arranged in the virtual space11B to follow the position derived from the received hand tracking data. As an example, the processor210B converts the position indicated by the received hand tracking data (position in uvw visual-field coordinate system set in HMD120) into a position in the XYZ coordinate system based on the position of the head of the avatar object6. The processor210B moves the hand2730of the avatar object6to the position after the conversion. In this manner, the motion of the hand2710of the user5is translated in the avatar object6visually recognized by the user5B.
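The coordinate conversion described in this paragraph can be sketched as a single rigid-body transform. The following Python snippet is illustrative only; it assumes the avatar head pose (a position and a rotation matrix) in the virtual-space XYZ coordinate system is known, which is not spelled out in the disclosure.

```python
import numpy as np

def uvw_to_xyz(hand_uvw, head_position_xyz, head_rotation_xyz):
    """Convert a hand position given in the HMD's uvw visual-field coordinate
    system into the virtual-space XYZ coordinate system, using the pose of the
    avatar object's head as the reference.

    hand_uvw:          (3,) hand position relative to the HMD (u, v, w).
    head_position_xyz: (3,) position of the avatar's head in XYZ.
    head_rotation_xyz: (3, 3) rotation matrix mapping head-local axes to XYZ.
    """
    hand_uvw = np.asarray(hand_uvw, dtype=float)
    head_position_xyz = np.asarray(head_position_xyz, dtype=float)
    head_rotation_xyz = np.asarray(head_rotation_xyz, dtype=float)
    return head_position_xyz + head_rotation_xyz @ hand_uvw
```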
FIG. 28is a diagram of processing of the hand tracking module2243according to at least one embodiment of this disclosure. The hand tracking module2243tracks motion of bones of the hand2710of the user5based on the hand depth information input from the third camera165. InFIG. 28, the hand tracking module2243detects a position of each of joints a, b, c, . . . , x of the hand2710of the user5.
The hand tracking module2243is capable of recognizing the shape (motion of fingers) of the hand2710of the user5based on the positional relationship among the joints a to x. In this sense, the joints a to x of the hand2710can be said to be feature points (hereinafter also referred to as “hand tracking points”) representing the shape of the hand2710. For example, the hand tracking module2243is able to recognize the fact that the hand2710of the user5points with a finger, that the hand2710is open, that the hand2710is closed, that the hand2710is pinching something, or that the hand2710is twisted. The hand tracking module2243is able to further determine whether or not the recognized hand is a left hand or a right hand based on the positional relationship among the joints a to d and other joints. Such third camera165and hand tracking module2243may be implemented, for example, by Leap Motion® provided by Leap Motion, Inc.
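Gesture recognition from the positional relationship among the hand tracking points can be approximated with simple heuristics. The function below is a crude, purely illustrative example; the joint labels, the distance threshold, and the "open hand" criterion are all assumptions and are not taken from the disclosure or from Leap Motion's actual algorithm.

```python
import numpy as np

def hand_is_open(joints, palm_key="a", tip_keys=("e", "i", "m", "q", "u"), threshold=0.08):
    """Crude heuristic: the hand is treated as 'open' when the fingertip joints
    are, on average, far from a palm-side joint.

    joints: dict mapping a joint label (e.g. 'a' to 'x') to an (x, y, z) position in meters.
    """
    palm = np.asarray(joints[palm_key], dtype=float)
    tips = np.asarray([joints[k] for k in tip_keys], dtype=float)
    mean_distance = np.linalg.norm(tips - palm, axis=1).mean()
    return mean_distance > threshold
```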
FIG. 29is an example of the data structure of the hand tracking data according to at least one embodiment of this disclosure. InFIG. 29, the hand tracking data represents position information on each of the joints a to x of the hand2710in the uvw visual-field coordinate system set in the HMD120. The hand tracking module2243generates hand tracking data representing the position information on hand tracking points based on the image information generated by the third camera165.
The computer200transmits the acquired hand tracking data to the server600. The server600transfers the data to the computer200B, which communicates to/from the computer200by the chat application. The processor210B of the computer200B serves as the virtual object generation module1425B to update the positions of joints forming the hand2730of the avatar object6arranged in the virtual space11B based on the received hand tracking data. As a result, the user5B is able to recognize the motion of the hand2710of the user5via the hand2730of the avatar object6in the virtual space11B.
[Synchronization Between Video and Sound]
FIG. 30A,FIG. 30B, andFIG. 30Care examples of data structures to be transmitted/received between the computer200and the computer200B according to at least one embodiment of this disclosure.FIG. 30Ais an example of a data structure of face data according to at least one embodiment of this disclosure.FIG. 30Bis an example of a data structure of eye data according to at least one embodiment of this disclosure.FIG. 30Cis an example of a data structure of a sound packet according to at least one embodiment of this disclosure.
Referring toFIG. 30A, the face data contains a user ID, face tracking data, time information, and a data ID. The user ID identifies the source of face tracking data. The face tracking data represents the position information on face tracking points. The time information may be a time at which the corresponding face tracking data was generated by the face tracking module2242. In at least one aspect, the time information is a time obtained by subtracting a delay time from the time when the face tracking data was generated by the face tracking module2242. This delay time may contain a time required for the first camera150and the second camera160to generate image information and a time required for the face tracking module2242to generate face tracking data based on the image information. The time information that takes the delay time into consideration may accurately represent a time when the user5exhibited the facial expression corresponding to the face tracking data. This data ID identifies each of a plurality of pieces of face data. This data ID is used for synchronizing face tracking data and other data.
In at least one aspect, hand data containing hand tracking data is generated. This hand data has the same data structure as that of the face data. Specifically, the hand data contains a user ID, hand tracking data, time information, and a data ID. This time information may be a time at which the hand tracking data was generated by the hand tracking module2243. In at least one aspect, the time information is a time obtained by subtracting a delay time from the time when the hand tracking data was generated by the hand tracking module2243. This delay time may contain a time required for the third camera165to generate image information and a time required for the hand tracking module2243to generate hand tracking data based on image information.
Referring toFIG. 30B, the eye data contains a user ID, eye tracking data, time information, and a data ID. The user ID identifies the source of eye tracking data. The eye tracking data represents the center coordinate values (x coordinate value and y coordinate value) of the pupil of the user5detected by the line-of-sight detection module1426. The time information may be a time at which the corresponding eye tracking data was generated by the line-of-sight detection module1426. In at least one aspect, the time information is a time obtained by subtracting a delay time from the time when the eye tracking data was generated by the line-of-sight detection module1426. This delay time may contain a time required for the line-of-sight detection module1426to perform processing of generating eye tracking data. The time information that takes the delay time into consideration may accurately represent the time when the user5performed motion corresponding to the eye tracking data. The data ID identifies each of a plurality of pieces of eye data. This data ID is used for synchronizing eye tracking data and other data.
Referring toFIG. 30C, the sound packet contains a user ID, a sound signal, time information, and a data ID. The user ID identifies the source of the sound packet. The sound signal is digital data generated from utterance of the user5. The time information represents a time of utterance corresponding to the sound signal. In at least one aspect, the time information represents a time at which the sound signal started to be acquired. In at least one aspect, the time information is a time obtained by subtracting a delay time from the time when the sound signal started to be acquired. This delay time may be a time required for converting the analog data acquired by the microphone170into digital data. The time information that takes the delay time into consideration may accurately represent a time at which the user5uttered sound corresponding to the sound signal. The data ID is identification information for synchronizing the sound signal and other data.
In at least one aspect, the processor210determines a data ID to be associated with various kinds of pieces of data described above based on the time information. Specifically, the processor210is configured to assign various kinds of pieces of data acquired substantially at the same time with the same data ID.
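A minimal sketch of the data structures of FIG. 30A to FIG. 30C and of the data-ID assignment rule follows. The field names and types are illustrative assumptions; the 10 ms grouping window mirrors the coincidence window used later on the receiving side.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Position = Union[Tuple[float, float], Tuple[float, float, float]]

@dataclass
class TrackingData:
    """Common shape of the face data, eye data, and hand data (FIG. 30A/30B).
    For eye data the positions are 2-D pupil-center coordinates; for face and
    hand data they are tracking-point positions in the uvw coordinate system."""
    user_id: str
    points: Dict[str, Position]   # tracking point label -> position
    time_ms: float                # time information (delay-corrected)
    data_id: int = -1

@dataclass
class SoundPacket:
    """Shape of the sound packet (FIG. 30C)."""
    user_id: str
    samples: bytes                # digitized sound signal
    time_ms: float                # time the utterance started (delay-corrected)
    data_id: int = -1

def assign_data_ids(items: List, tolerance_ms: float = 10.0) -> None:
    """Assign the same data ID to pieces of data acquired at substantially the
    same time; the 10 ms window mirrors the coincidence check on the receiver."""
    items.sort(key=lambda it: it.time_ms)
    next_id, group_start = 0, None
    for it in items:
        if group_start is None or it.time_ms - group_start > tolerance_ms:
            group_start = it.time_ms
            next_id += 1
        it.data_id = next_id
```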
FIG. 31is a flowchart of processing of the computer200and the computer200B communicating to/from each other by a chat application according to at least one embodiment of this disclosure. InFIG. 31, the same processing as that ofFIG. 17is assigned with the same reference numeral ofFIG. 17. Thus, a description of such processing is not repeated here.
In Step S2630, the processor210detects motion of the user5, and transmits a detection result to the computer200B, which is the chat partner. The motion of the user5includes, for example, the movement of the line of sight of the user5, the facial expression of the face, and the movement of the hand. The result of detecting the motion of the user5may include, for example, eye tracking data, face tracking data, and hand tracking data. In at least one aspect, the processor210transmits, to the computer200B, eye data, face data, and hand data each including a user ID, time information, and a data ID. In Step S2635, the processor210B detects motion of the user5B, and transmits a detection result to the chat partner, namely, the computer200, in the same manner as in the processing of Step S2630.
In Step S2640, the processor210receives utterance of the user5with the microphone170, and transmits a sound signal that is based on the utterance to the chat partner (computer200B). In at least one aspect, the processor210transmits a sound packet containing a user ID, time information, and a data ID to the computer200B. In Step S2645, the processor210B receives utterance of the user5B with the microphone170B, and transmits a sound packet to the chat partner (computer200) in the same manner as in the processing of Step S2640.
In Step S2650, the processor210receives the result of detecting the motion of the user5B wearing the HMD120B from the computer200B. In Step S2655, the processor210B receives the result of detecting the motion of the user5wearing the HMD120from the computer200.
In Step S2660, the processor210receives from the computer200B the sound packet containing the sound signal that is based on the utterance of the user5B. In Step S2665, the processor210B receives from the computer200the sound packet containing the sound signal that is based on the utterance of the user5.
In Step S2670, the processor210generates data in which the result(eye tracking data, face tracking data, and hand tracking data) of detecting the motion of the user5B is translated in the avatar object6B arranged in the virtual space11. In Step S2675, the processor210B generates data in which the result(eye tracking data, face tracking data, and hand tracking data) of detecting the motion of the user5is translated in the avatar object6arranged in the virtual space11B.
In Step S2680, the processor210performs processing of synchronizing the timing of presenting the avatar object6B to the HMD120and the timing of outputting the sound from the speaker180. For example, in at least one aspect, when the computer200receives the result of detecting the motion of the user5B from the computer200B earlier than the sound packet, the processor210waits to output data on the avatar object generated in Step S2670until reception of the sound packet. Upon receiving the sound packet, the processor210determines whether or not the user ID and time information of the sound packet are substantially the same as the user ID and time information of the motion detection result(eye data, face data, and hand data). As an example, the processor210determines that those pieces of time information are substantially the same when a time difference between those pieces of time information is within 10 msec. When the user ID and the time information are substantially the same, the processor210switches the processing to Step S1290. In Step S2685, the processor210B performs processing of synchronizing the timing of presenting the avatar object6to the HMD120B and the timing of outputting the sound from the speaker180B in the same manner as in the processing of Step S2680.
In at least one aspect, the computer200receives the sound packet containing the sound signal of the user5B from the computer200B earlier than the result of detecting the motion of the user5B. In this case, the processor210waits to output the sound signal until reception of the result of detecting the motion of the user5B. The subsequent processing is similar to the above-mentioned processing, and thus a description of such processing is not repeated here.
In at least one aspect, the processor210executes the synchronization processing described above by using the user ID and the data ID. For example, in at least one aspect, the computer200receives the result of detecting the motion of the user5B corresponding to a user ID “190B” and a data ID “001” earlier than a sound packet corresponding to the user ID “190B” and the data ID “001”. In this case, the processor210waits to output data on the avatar object generated based on the detection result corresponding to the user ID “190B” and the data ID “001” until reception of the sound packet corresponding to the user ID “190B” and the data ID “001”. When receiving the sound packet corresponding to the user ID “190B” and the data ID “001”, the processor210outputs the data on the avatar object and the sound signal at the same timing. In the processing of Step S2680and Step S2685, the receiving side performs the synchronization processing based on the time information, but the synchronization processing that is based on the data ID can be said to be performed by the transmitting side.
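The receiver-side matching of Step S2680 and the data-ID-based variant can be sketched together. This Python fragment is illustrative; it assumes the motion data and sound packets carry the user_id, time_ms, and data_id fields from the structures sketched earlier, and that `present` is a caller-supplied callback that updates the avatar and plays the sound at the same timing.

```python
def should_present_together(motion, sound, tolerance_ms=10.0):
    """Motion data (eye/face/hand) and a sound packet are presented at the same
    timing when they come from the same user and their time information
    substantially coincides (within 10 ms), or when they carry the same data ID."""
    if motion.user_id != sound.user_id:
        return False
    if motion.data_id >= 0 and motion.data_id == sound.data_id:
        return True
    return abs(motion.time_ms - sound.time_ms) <= tolerance_ms

def synchronize(pending_motion, pending_sound, present):
    """Hold whichever side arrived first until its counterpart is received,
    then hand both to 'present' so avatar motion and sound are output together."""
    for motion in list(pending_motion):
        for sound in list(pending_sound):
            if should_present_together(motion, sound):
                present(motion, sound)
                pending_motion.remove(motion)
                pending_sound.remove(sound)
                break
```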
The HMD set110in the modification example synchronizes the timing at which the avatar object6B corresponding to the user5B is presented to the HMD120and the timing at which the sound of the user5B is output in communication (chat) to/from the HMD set110B. As a result, the user5wearing the HMD120recognizes the motion of the avatar object6B that is based on the motion of the user5B, who is the chat partner, and the sound of the user5B at the same time, and is thus able to continue chatting without feeling strange.
[Processing of Transmitting Data Representing Motion]
Next, a description is given of a method of transmitting data (e.g., eye tracking data, face tracking data, and hand tracking data) representing the motion of the user.
Different numbers of pieces of eye tracking data, face tracking data, and hand tracking data are generated per unit time. This is due to, for example, the fact that data acquisition intervals of the eye gaze sensor140, the first camera150, the second camera160, and the third camera165are different from one another, and the fact that periods of time required for executing processing of generating various kinds of pieces of data described above are different from one another.
In at least one aspect, the computer200B of the chat partner may update the facial expression, line of sight, and the hand at the same frames per second (FPS) based on the various kinds of pieces of data described above received from the computer200. In such a case, when all the various kinds of pieces of data generated by the computer200are transmitted to the computer200B, data transmission may be inefficient.
When all the generated various kinds of pieces of data are transmitted to the chat partner, increase in processing load on the server600and drop in frames during motion of the avatar object on the chat partner's side may occur due to increase in traffic. When drop in frames occurs during motion of the avatar object, the user of the chat partner recognizes that the avatar object moves unnaturally, and cannot concentrate on the chat. In view of the above, a description is given of processing that may solve such a problem.
As an example, with reference toFIG. 32toFIG. 34, a description is given of processing of transmitting the eye tracking data and the face tracking data to the chat partner.
FIG. 32is a diagram of an example of a data structure of a ring buffer220-1according to at least one embodiment of this disclosure.FIG. 33is a diagram of an example of a data structure of a ring buffer220-2according to at least one embodiment of this disclosure.
In at least one aspect, the processor210serves as the face tracking module2242to receive output signals from the first camera150and the second camera160every 12.5 msec (at 80 frames per sec (FPS)). Each time the face tracking module2242receives an output signal, the face tracking module2242generates face tracking data based on the received signal. As an example, the face tracking module2242generates face tracking data at 80 FPS.
InFIG. 32, the processor210stores the generated face tracking data into the ring buffer220-1of the memory220. In the example ofFIG. 32, the ring buffer220-1is capable of storing ten pieces of face tracking data. The greater number assigned after the symbol “F” of a piece of face tracking data F represents the fact that the piece of face tracking data F is a newer piece of data. The processor210is configured to update the oldest face tracking data with newly input face tracking data.
The processor210serves as the line-of-sight detection module1426to receive an output signal from the eye gaze sensor140every 8.3 msec (at 120 FPS). Each time the line-of-sight detection module1426receives an output signal, the line-of-sight detection module1426generates eye tracking data based on the received signal. As an example, the line-of-sight detection module1426generates eye tracking data at 120 FPS.
InFIG. 33, the processor210stores the generated eye tracking data into the ring buffer220-2. In the example ofFIG. 33, the ring buffer220-2is capable of storing ten pieces of eye tracking data. The greater number assigned after the symbol “E” of a piece of eye tracking data E represents the fact that the piece of eye tracking data E is a newer piece of data. The processor210is configured to update the oldest eye tracking data with newly input eye tracking data.
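The ring buffers220-1and220-2can be modeled with a small fixed-capacity buffer in which the newest entry overwrites the oldest one. The class below is a generic illustration, not the disclosed implementation.

```python
class RingBuffer:
    """Fixed-capacity buffer in which the oldest entry is overwritten by newly
    input tracking data, in the manner of the ring buffers 220-1 and 220-2."""

    def __init__(self, capacity=10):
        self._slots = [None] * capacity
        self._next = 0          # slot that will be written next (holds the oldest entry when full)
        self._count = 0

    def push(self, item):
        self._slots[self._next] = item                      # overwrite the oldest entry
        self._next = (self._next + 1) % len(self._slots)
        self._count = min(self._count + 1, len(self._slots))

    def latest(self):
        if self._count == 0:
            return None
        return self._slots[(self._next - 1) % len(self._slots)]

    def items(self):
        """Stored entries, oldest first."""
        if self._count < len(self._slots):
            return self._slots[:self._count]
        return self._slots[self._next:] + self._slots[:self._next]
```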
(Processing of Generating Animation Data and Processing Based on Input Timing)
FIG. 34is a diagram of processing of generating animation data according to at least one embodiment of this disclosure. The animation data is data containing two or more kinds of data (e.g., eye tracking data, face tracking data, and hand tracking data) representing motion of the user. From another perspective, the animation data is data required for translating the motion of the user in the avatar object arranged in the virtual space. In the example ofFIG. 34, the animation data contains face tracking data and eye tracking data.
In at least one aspect, the processor210generates animation data at predetermined time intervals. The predetermined time interval is set to 16.7 msec (60 FPS) as one example. At a time T2900, the processor210generates animation data. More specifically, at the time T2900, the processor210associates face tracking data (F5ofFIG. 34), which is the latest among a plurality of pieces of face tracking data (F4and F5ofFIG. 34) stored in the ring buffer220-1, with eye tracking data (E8ofFIG. 34), which is the latest among a plurality of pieces of eye tracking data (E6to E8ofFIG. 34) stored in the ring buffer220-2, to thereby generate animation data.
At a time T2910after a predetermined period of time from the time T2900, the processor210generates animation data again. In the example ofFIG. 34, at the time T2910, the processor210associates the latest face tracking data (F7ofFIG. 34) and the latest eye tracking data (E10ofFIG. 34) with each other, to thereby generate animation data. The processor210transmits the generated animation data to the chat partner (e.g., computer200B) via the server600.
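A minimal sketch of the latest-data pairing of FIG. 34, using the ring-buffer class above, is given below. The fixed polling loop is a simplification made for illustration; a real implementation would be driven by timers or frame callbacks.

```python
import time

ANIMATION_INTERVAL_S = 1.0 / 60.0   # about 16.7 ms (60 FPS)

def generate_animation_data(face_buffer, eye_buffer):
    """Pair the newest face tracking data with the newest eye tracking data."""
    return {"face": face_buffer.latest(), "eye": eye_buffer.latest()}

def animation_loop(face_buffer, eye_buffer, send):
    """Generate animation data at the 60 FPS interval and hand it to the
    transmission stage ('send' collects it into the next animation packet)."""
    while True:
        send(generate_animation_data(face_buffer, eye_buffer))
        time.sleep(ANIMATION_INTERVAL_S)
```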
According to the configuration described above, the total number of pieces of eye tracking data and face tracking data forming each of the plurality of pieces of animation data to be transmitted to the chat partner via the server600is smaller than the total number of pieces of eye tracking data generated by the line-of-sight detection module1426and pieces of face tracking data generated by the face tracking module2242. That is, the system100according to at least one embodiment of this disclosure is able to suppress the amount of data to be transmitted to the chat partner via the server600. With this, the traffic in the network2is reduced, and animation data is more easily transmitted to the chat partner. As a result, the avatar object that is displayed on the chat partner's side may move smoothly without drop in frames.
The HMD set110in at least one aspect of this disclosure uses the ring buffers220-1and220-2to associate eye tracking data and face tracking data generated at substantially the same time with each other, to thereby generate animation data. As a result, the time at which the user5performed motion corresponding to the eye tracking data and the time at which the user5performed motion corresponding to the face tracking data are substantially the same. Therefore, the computer of the chat partner is able to synchronize eye tracking data and face tracking data (motion of avatar objects corresponding to those pieces of data) contained in the received animation data with each other simply by translating those pieces of data in the avatar object at the same timing.
In the above-mentioned example, the time interval (8.3 msec) at which the eye gaze sensor140outputs a signal to the computer200is shorter than the time interval (12.5 msec) at which the first camera150and the second camera160output a signal to the computer200. In at least one aspect, the processor210generates such animation data that the total number of pieces of eye tracking data contained in the plurality of generated pieces of animation data is smaller than the total number of pieces of face tracking data generated by the face tracking module2242.
In the above-mentioned example, the processor210generates animation data by associating one piece of eye tracking data with one piece of face tracking data, but the method of generating animation data is not limited thereto. For example, the processor210may generate animation data in which the eye tracking data and the face tracking data keep a one-to-one correspondence while containing more than one pair (e.g., two pieces each of eye tracking data and face tracking data).
Further, in at least one aspect, the processor210generates such animation data that eye tracking data and face tracking data have a relationship other than the one-to-one correspondence. For example, the frequency at which the computer200B of the chat partner updates motion of the line of sight of the avatar object6arranged in the virtual space11B may be twice the frequency of updating motion of the facial expression. In this case, the processor210may generate animation data containing two pieces of eye tracking data and one piece of face tracking data.
(Processing of Generating Animation Data—Processing Based on Time Information)
FIG. 35is a diagram of processing of generating animation data in at least one aspect. In the example ofFIG. 34, the processor210is configured to generate animation data by associating the latest eye tracking data and face tracking data with each other. In the example ofFIG. 35, the processor210generates animation data based on time information associated with the eye tracking data and time information associated with the face tracking data.
The time information associated with the eye tracking data may be the time information described with reference toFIG. 30B. In this case, the computer200may acquire the time information from the internal real time clock (RTC). In at least one aspect, the time information associated with the eye tracking data is a time at which corresponding data was detected by the eye gaze sensor140. In this case, the computer200may acquire the time information from the eye gaze sensor140.
The time information associated with the face tracking data may be the time information described with reference toFIG. 30A. In this case, the computer200may acquire the time information from the internal RTC. In at least one aspect, the time information associated with the face tracking data is a time at which corresponding data was detected by the first camera150or the second camera160. In this case, the computer200acquires the time information from the first camera150or the second camera160.
Referring toFIG. 35, at a time T2900, which is a timing of generating animation data, the processor210generates animation data using two kinds of pieces of data that are the latest and acquired at times closest to each other among a plurality of pieces of eye tracking data and face tracking data stored in the ring buffers220-1and220-2.
As an example, the processor210identifies the latest data (F5inFIG. 35) of the face tracking data, which has the lower FPS (number of pieces of data generated per unit time) of the eye tracking data and the face tracking data.
Next, the processor210identifies eye tracking data associated with a time that is closest to a time T3020associated with the identified face tracking data F5. In the example ofFIG. 35, the processor210identifies eye tracking data E7with which a time T3010is associated. The processor210generates animation data by associating the identified pieces of face tracking data and eye tracking data with each other.
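The time-based pairing of FIG. 35 can be sketched as follows. The snippet assumes each buffered entry is a dict carrying a 'time_ms' key in addition to its tracking payload; this representation is an assumption made for illustration and reuses the RingBuffer sketch above.

```python
def generate_animation_data_by_time(face_buffer, eye_buffer):
    """Take the latest entry of the lower-FPS stream (face tracking data) and
    pair it with the eye tracking data whose time information is closest."""
    face = face_buffer.latest()
    if face is None:
        return None
    candidates = [e for e in eye_buffer.items() if e is not None]
    if not candidates:
        return None
    eye = min(candidates, key=lambda e: abs(e["time_ms"] - face["time_ms"]))
    return {"face": face, "eye": eye}
```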
According to the configuration described above, the system100in at least one aspect is able to synchronize the face tracking data and the eye tracking data contained in animation data more accurately.
(Animation Packet)
In at least one aspect, the processor210is configured to transmit an animation packet containing a plurality of generated pieces of animation data at predetermined time intervals. As one example, the animation packet is transmitted at an interval of 100 msec (10 FPS). The time interval for transmitting an animation packet is not required to be a fixed value, but may also be a variable value.
FIG. 36is a diagram of processing of generating and transmitting animation packets according to at least one embodiment of this disclosure. Referring toFIG. 36, the processor210transmits an animation packet AP1to the chat partner via the server600at a time T3100. The processor210transmits an animation packet AP2, which is the next animation packet, at a time T3110, which is after a predetermined period of time (e.g., 100 msec) from the time T3100.
The animation packet AP2contains pieces of animation data A1to A6generated between the time T3100and the time T3110. At this time, the animation data has been generated in the above-mentioned period, but eye tracking data and face tracking data forming the animation data have not necessarily been generated within the above-mentioned period. InFIG. 36, the face tracking data F5forming the animation data A1has been generated before the time T3100.
FIG. 37is an illustration of an example of a data structure of an animation packet according to at least one embodiment of this disclosure. The animation packet contains various kinds of header information such as a Media Access Control (MAC) header, an Internet Protocol (IP) header, and a Transmission Control Protocol (TCP) header, a payload, and a frame check sequence (FCS) for error correction.
The MAC header may contain information for identifying the computer (e.g., computer200B) of the chat partner. The server600refers to the MAC header to transfer the animation packet to the computer of the chat partner.
The payload contains a plurality of pieces of animation data, FPS information, a user ID, and a data ID. The FPS information represents the number of pieces of animation data to be generated per unit time. The FPS information may be used for translating animation data in the computer of the chat partner. InFIG. 36, the FPS information indicates “60”. The user ID identifies a transmission source of an animation packet. The data ID distinguishes between a plurality of animation packets. The user ID and the data ID are used for processing of synchronizing motion and sound of an avatar object described later.
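A sketch of the payload portion of the animation packet of FIG. 37 follows; the field names and types are illustrative assumptions, and the MAC/IP/TCP headers and FCS are left to the usual network stack.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class AnimationPacketPayload:
    """Payload portion of an animation packet (FIG. 37)."""
    animation_data: List[dict]   # animation data generated since the last packet
    fps: int                     # pieces of animation data generated per unit time, e.g. 60
    user_id: str                 # identifies the transmission source
    data_id: int                 # distinguishes packets and ties the packet to a sound packet
```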
When the transmission time interval of an animation packet is a fixed value, the animation packet does not need to contain the FPS information. In such a case, the processor210may transmit information indicating the fixed value to the computer of the chat partner at the start of a chat.
(Synchronization Between Video and Sound)
Next, a description is given of the processing of synchronizing video and sound executed when animation data containing data representing a plurality of types of motion of the user is transmitted or received.
FIG. 38is a diagram of processing of synchronizing motion of an avatar object and a timing of outputting sound according to at least one embodiment of this disclosure.FIG. 39is a diagram of an example of a data structure of a sound packet according to at least one embodiment of this disclosure.
Referring toFIG. 38, the processor210is configured to transmit, to the chat partner, a sound packet containing sound signals input from the microphone170over a predetermined period of time. In at least one aspect, the predetermined period of time is set to the same period as the transmission time interval of an animation packet.
More specifically, the processor210generates such a sound packet that the sound packet contains sound signals input over the predetermined period of time from a timing indicated by time information associated with a piece of animation data generated first among a plurality of pieces of animation data forming the animation packet. In at least one aspect, the time information associated with animation data may be time information (e.g., time information indicating earlier time) associated with any one of eye tracking data and face tracking data forming the animation data. In at least one aspect, the time information associated with animation data represents a time at which the animation data was generated.
In the example ofFIG. 38, the animation packet AP1is formed of the plurality of pieces of animation data A1to A6. The animation data A1is generated first among the plurality of pieces of animation data A1to A6. The processor210generates such a sound packet that the sound packet contains sound signals input over the predetermined period of time from a time T3300indicated by time information associated with the animation data A1generated first. In the example ofFIG. 38, the time T3300indicated by the time information associated with the animation data A1is time information associated with eye tracking data E8indicating the earliest time among pieces of time information respectively associated with the face tracking data F5and the eye tracking data E8forming the animation data A1.
InFIG. 39, the sound packet in at least one aspect contains a user ID and a data ID. The user ID identifies a transmission source of the sound packet. The data ID distinguishes between a plurality of sound packets. The processor210sets the data ID contained in a sound packet and the data ID contained in an animation packet corresponding to the sound packet to the same value. More specifically, the processor210sets the data ID of an animation packet and the data ID of a sound packet containing sound signals acquired with respect to time information associated with animation data contained in the animation packet to the same value. In the example ofFIG. 38, the processor210assigns the same data ID to the animation packet AP2and the sound packet containing sound signals acquired over a period of time from the time T3300to the time T3310.
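The pairing of an animation packet with the sound packet covering the same period can be sketched as follows. The snippet assumes the payload sketch above; the sound_buffer object and its slice_ms helper are hypothetical and stand in for whatever buffering the microphone path actually uses.

```python
def build_matching_sound_packet(animation_payload, sound_buffer, interval_ms=100.0):
    """Cut out the sound signal that covers the same period as an animation
    packet and give it the same user ID and data ID (FIG. 38/FIG. 39)."""
    first = animation_payload.animation_data[0]
    # Earliest time among the tracking data forming the first piece of animation data.
    start_ms = min(first["face"]["time_ms"], first["eye"]["time_ms"])
    return {
        "user_id": animation_payload.user_id,
        "samples": sound_buffer.slice_ms(start_ms, start_ms + interval_ms),
        "time_ms": start_ms,
        "data_id": animation_payload.data_id,   # same data ID as the animation packet
    }
```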
The processor210transmits those animation packets and sound packets to the computer200B of the chat partner via the server600. The computer200B sequentially translates a plurality of pieces of animation data (formed of eye tracking data and face tracking data) contained in the received animation packets in the avatar object6arranged in the virtual space11B.
At this time, the computer200B controls a timing of translating a plurality of pieces of animation data contained in an animation packet in the avatar object6based on the FPS information contained in the animation packet. For example, inFIG. 38, the FPS information indicates “60” (FPS) and the refresh rate (number of times image is updated per unit time) of the monitor130B of the HMD120B indicates “120”. In such a case, the computer200B translates animation data in the avatar object6once for every two frames. This is because translating animation data for each frame causes the user5B to feel strange due to display of motion of the avatar object6at double speed.
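The pacing rule described here reduces to a simple ratio between the refresh rate and the FPS information, as in the following illustrative helper.

```python
def frames_per_animation_update(fps_info, refresh_rate):
    """Number of display frames to hold each piece of animation data so the
    avatar does not appear to move at the wrong speed: e.g. FPS information of
    60 on a 120 Hz monitor gives 2, i.e. one update every two frames."""
    return max(1, round(refresh_rate / fps_info))
```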
In at least one aspect, an animation packet contains information indicating the transmission time interval of an animation packet instead of the FPS information. This is because the FPS information can be derived from the number of pieces of animation data contained in the animation packet and the transmission time interval.
At a timing when the computer200B translates animation data contained in the received animation packet in the avatar object6, the computer200B outputs, from the speaker180B, sound signals contained in a sound packet assigned with the same user ID and data ID as those of the animation packet. With this, motion of the avatar object6in the virtual space11B and sound of the user5synchronized with the motion are presented to the user5B. As a result, the user5B is able to chat with the user5smoothly in the virtual space11B.
In the example given above, the sound packet contains sound signals acquired over the transmission time interval of an animation packet. In at least one embodiment, the sound packet may contain sound signals acquired over a period of an integral multiple of the transmission time interval. This is because motion of the avatar object6in the virtual space11B and sound of the user5synchronized with the motion are presented to the user5B also with this configuration. In this case, an animation packet (data ID is null) not assigned with the data ID may be generated periodically.
(Control Structure)
FIG. 40is a flowchart of processing of synchronizing video and sound using animation packets and sound packets according to at least one embodiment of this disclosure. In the processing inFIG. 40, the same processing as that ofFIG. 17is assigned with the same reference symbol. Thus, a description of such processing is not repeated here.
In Step S3510, the processor210of the computer200serves as the face tracking module2242to receive output signals from the first camera150and the second camera160at predetermined time intervals (e.g., 12.5 msec), and generate face tracking data based on the received signals. The processor210stores the generated face tracking data into the ring buffer220-1as appropriate.
In Step S3520, the processor210serves as the line-of-sight detection module1426to receive output signals of the eye gaze sensor140at predetermined time intervals (e.g., 8.3 msec), and generate eye tracking data based on the received signals. The processor210stores the generated eye tracking data into the ring buffer220-2as appropriate.
In Step S3530, at a timing of generating animation data at predetermined intervals (e.g., 60 FPS), the processor210associates the latest face tracking data stored in the ring buffer220-1with the latest eye tracking data stored in the ring buffer220-2to generate animation data.
In Step S3540, the processor210determines whether or not the transmission time interval of an animation packet has elapsed. When the processor210determines that the transmission time interval has elapsed (YES in Step S3540), the processor210advances the processing to Step S3550. On the other hand, when the processor210determines that the transmission time interval has not elapsed (NO in Step S3540), the processor210returns the processing to Step S3510to execute processing of generating animation data again.
In Step S3550, the processor210generates an animation packet containing a plurality of pieces of animation data generated within the transmission time interval, and transmits the animation packet to the computer200B of the chat partner via the server600.
In Step S3555, the processor210B of the computer200B receives the animation packet from the computer200via the server600.
In Step S3560, the processor210generates a sound packet containing sound signals acquired over the transmission time interval from a timing indicated by time information associated with a piece of animation data generated first among a plurality of pieces of animation data forming the animation packet transmitted in Step S3550. The processor210transmits the generated sound packet to the computer200B. After that, the processor210returns the processing to Step S3510.
In Step S3565, the processor210B receives the sound packet from the computer200.
In Step S3575, at a timing when the processor210B translates animation data contained in the received animation packet in the avatar object6arranged in the virtual space11B, the processor210B outputs, from the speaker180B, sound signals contained in the sound packet assigned with the same user ID and data ID as those of the animation packet. After that, the processor210B returns the processing to Step S3555.
According to the configuration described above, the HMD set110B is able to synchronize motion of the avatar object6arranged in the virtual space11B and sound of the user5output from the speaker180B. Thus, the user5B is able to chat with the user5smoothly in the virtual space11B.
Further, the computer200does not transmit all the generated pieces of eye tracking data and face tracking data to the computer200B. Thus, the system100is able to reduce the traffic in the network, the processing load on the server600, and the processing load on the computer200B of the chat partner.
(Control Structure in Consideration of Refresh Rate of Chat Partner)
FIG. 41is a flowchart of processing of synchronizing video and sound in consideration of a refresh rate of the chat partner according to at least one embodiment of this disclosure. In the processing inFIG. 41, the same processing as that ofFIG. 40is assigned with the same reference symbol. Thus, a description of such processing is not repeated here.
In Step S3615, the processor210B of the computer200B transmits a refresh rate of the monitor130B to the computer200of the chat partner. This refresh rate represents the number of times an image corresponding to the virtual space11B is updated per unit time in the monitor130B.
In Step S3620, the processor210of the computer200receives information indicating the refresh rate of the monitor130B from the computer200B.
In Step S3630, the processor210sets a time interval of generating animation data. More specifically, the processor210sets the rate of generating animation data to be equal to or lower than the refresh rate of the monitor130B. This is because, for example, when all the generated pieces of animation data are transmitted to the computer200B on the assumption that animation data is generated at 60 FPS and the refresh rate of the monitor130B is 30 FPS, half the pieces of animation data are not translated in the avatar object6and are discarded.
According to the configuration described above, the system100may further reduce the traffic in the network, the processing load on the server600, and the processing load on the computer200B of the chat partner.
The above-mentioned technical features disclosed in the above description as the aspects of at least one embodiment of this disclosure are summarized in the following manner, for example.
(Configuration 1)
According to at least one embodiment of this disclosure, there is provided a method to be executed on a computer200to communicate via a virtual space11. The method includes defining the virtual space11in an HMD120connected to the computer200. The method further includes receiving a result of detecting a line of sight of a user5B of an HMD120B connected to a computer200B. The method further includes receiving a sound signal that is based on utterance of a user5B. The method further includes synchronizing presentation of an avatar object6B corresponding to the user5B to the HMD120based on the result of detecting the line of sight and output of sound that is based on the sound signal from a speaker180of the HMD120.
(Configuration 2)
According to at least one embodiment of this disclosure, the detection result contains, for example, eye tracking data and a time at which the line of sight has been detected. The sound signal contains a time at which the utterance has been given. The synchronizing of the presentation and the output includes waiting, when the detection result is received earlier than the sound signal, to translate the detection result in the avatar object6B until the sound signal is received.
(Configuration 3)
According to at least one embodiment of this disclosure, the detection result contains a time at which the line of sight has been detected. The sound signal contains a time at which the utterance has been given. The synchronizing of the presentation and the output includes waiting, when the sound signal is received earlier than the detection result, to output the sound that is based on the sound signal until the detection result is received.
(Configuration 4)
According to at least one embodiment of this disclosure, the presenting of the avatar object6B to the HMD120includes presenting, to the HMD120, the avatar object6B whose line of sight is directed in a direction of the line of sight of the user5B.
(Configuration 5)
According to at least one embodiment of this disclosure, the synchronizing of the presentation and the output includes moving a mouth of the avatar object6B in synchronization with the output of sound.
(Configuration 6)
According to at least one embodiment of this disclosure, there is provided a method to be executed on a computer200to communicate via a virtual space11. The method includes defining the virtual space11in an HMD120connected to the computer200. The method further includes receiving a result of detecting motion of a user5B of an HMD120B connected to a computer200B. The method further includes receiving a sound signal that is based on utterance of a user5B. The method further includes synchronizing presentation of an avatar object corresponding to the user5B to the HMD120based on the result of detecting the motion and output of sound that is based on the sound signal from a speaker180of the HMD120.
(Configuration 7)
According to at least one embodiment of this disclosure, the motion of the user5includes at least any one of motion of an eye, motion of a line of sight, motion of a mouth, motion of a cheek, motion of a nose, or motion of a hand.
(Configuration 8)
According to at least one embodiment of this disclosure, the receiving (Step S2655) of the result of detecting the motion of the user5B includes receiving a time at which motion corresponding to the detection result has been detected. The receiving (Step S2660) of the sound signal includes receiving a time at which the utterance has been given. The synchronizing (Step S2680) of the presentation and the output includes waiting, when the detection result is received earlier than the sound signal, to translate the detection result in the avatar object until the sound signal is received.
(Configuration 9)
According to at least one embodiment of this disclosure, the receiving (Step S2655) of the result of detecting the motion of the user5B includes receiving a time at which motion corresponding to the detection result has been detected. The receiving (Step S2660) of the sound signal includes receiving a time at which the utterance has been given. The synchronizing (Step S2680) of the presentation and the output includes waiting, when the sound signal is received earlier than the detection result, to output the sound that is based on the sound signal until the detection result is received.
(Configuration 10)
According to at least one embodiment of this disclosure, there is provided a program for executing any one of the methods described above on a computer200.
(Configuration 11)
According to at least one embodiment of this disclosure, there is provided an information processing apparatus including: a memory configured to store the program described above; and a processor coupled to the memory and configured to execute the program.
It is to be understood that the embodiments disclosed herein are merely examples in all aspects and in no way intended to limit this disclosure. The scope of this disclosure is defined by the appended claims and not by the above description, and it is intended that this disclosure encompasses all modifications made within the scope and spirit equivalent to those of the appended claims.
In the at least one embodiment described above, the description is given by exemplifying the virtual space (VR space) in which the user is immersed using an HMD. However, a see-through HMD may be adopted as the HMD. In this case, the user may be provided with a virtual experience in an augmented reality (AR) space or a mixed reality (MR) space through output of a field-of-view image that is a combination of the real space visually recognized by the user via the see-through HMD and a part of an image forming the virtual space. In this case, action may be exerted on a target object in the virtual space based on motion of a hand of the user instead of the operation object. Specifically, the processor may identify coordinate information on the position of the hand of the user in the real space, and define the position of the target object in the virtual space in connection with the coordinate information in the real space. With this, the processor can grasp the positional relationship between the hand of the user in the real space and the target object in the virtual space, and execute processing corresponding to, for example, the above-mentioned collision control between the hand of the user and the target object. As a result, an action is exerted on the target object based on motion of the hand of the user.
Claims
- A method, comprising: defining a virtual space associated with a first user, wherein the virtual space is associated with a first head-mounted device (HMD) connected to a first computer, and wherein the virtual space comprises an avatar object associated with a second user different from the first user;receiving line-of-sight data related to the second user from a second computer at a first time, wherein the second user is associated with a second HMD, and wherein the second HMD is connected to the second computer;receiving sound data, at the first HMD, that is based on a detected utterance of the second user at a second time different from the first time;synchronizing a timing of controlling the avatar object in accordance with the received line-of-sight data and a timing of outputting sound that is based on the sound data received by the first HMD;controlling the avatar object in accordance with the line-of-sight data based on the synchronized timing;and outputting the sound that is based on the sound data received by the first HMD based on the synchronized timing.
- The method according to claim 1 , wherein the line-of-sight data contains first time data for identifying a time at which a line of sight of the second user has been detected by the second HMD, wherein the sound data contains second time data for identifying a time at which the utterance has been detected in the second HMD, and wherein the method further comprises: receiving, by the first computer, the line-of-sight data prior to the sound data;and delaying, in response to the time identified by the first time data and the time identified by the second time data substantially coinciding, controlling the avatar object based on the line-of-sight data until the sound data is received.
- The method according to claim 1 , wherein the line-of-sight data contains first time data for identifying a time at which a line of sight has been detected by the second HMD, wherein the sound data contains second time data for identifying a time at which the utterance has been detected in the second HMD, and wherein the method further comprises: receiving the sound data prior to the line-of-sight data;and delaying, in response to the time identified by the first time data and the time identified by the second time data substantially coinciding, outputting the sound that is based on the sound data from the first HMD until the line-of-sight data is received.
- The method according to claim 1 , wherein the line-of-sight data identifies a direction of a line of sight of the second user in the virtual space, and wherein the method further comprises controlling the avatar object so that a line of sight of the avatar object is directed in the direction defined by the line-of-sight data.
- The method according to claim 1 , further comprising moving a mouth of the avatar object in accordance with the sound data based on the synchronized timing.
- A method, comprising: defining a virtual space associated with a first user, wherein the virtual space is associated with a first head-mounted device (HMD) connected to a first computer, and wherein the virtual space comprises an avatar object associated with a second user different from the first user;receiving motion data at a first time, wherein the motion data identifies motion of a part of a body of the second user from a second computer, the second user being associated with a second HMD, the second HMD being connected to the second computer;receiving sound data, at the first HMD, that is based on utterance of the second user at a second time different from the first time;synchronizing a timing of controlling the avatar object in accordance with the received motion data and a timing of outputting sound that is based on the sound data received by the first HMD;controlling the avatar object in accordance with the motion data based on the synchronized timing;and outputting the sound that is based on the sound data received by the first HMD based on the synchronized timing.
- The method according to claim 6 , wherein the second HMD comprises: a first camera configured to capture an image of a first part of the second user comprising surroundings of an eye of the second user;and a second camera configured to capture an image of a second part of the second user other than the first part in a face of the second user, or a position sensor configured to detect motion of a part of a body other than the face of the second user, and wherein the motion data contains data for identifying at least one of motion of the eye detected by the first camera, motion of a line of sight detected by the first camera, motion of a mouth detected by the second camera, motion of a cheek detected by the second camera, motion of a nose detected by the second camera, or motion of a hand detected by the position sensor.
- The method according to claim 6 , wherein the motion data contains first time data for identifying a time at which the motion has been detected by the second HMD, wherein the sound data contains second time data for identifying a time at which the sound has been detected by the second HMD, and wherein the method further comprises: receiving the motion data prior to the sound data;and delaying, in response to the time identified by the first time data and the time identified by the second time data substantially coinciding, controlling of the avatar object based on the motion data until the sound data is received.
- The method according to claim 6 , wherein the motion data contains first time data for identifying a time at which the motion has been detected by the second HMD, wherein the sound data contains second time data for identifying a time at which the sound has been detected by the second HMD, and wherein the method further comprises: receiving the sound data prior to the motion data;and delaying, in response to the time identified by the first time data and the time identified by the second time data substantially coinciding, outputting the sound that is based on the sound data until the motion data is received.
- An information processing apparatus, comprising: a memory configured to store a program;and a processor coupled to the memory and configured to execute the program for: defining a virtual space associated with a first user, wherein the virtual space is associated with a first head-mounted device (HMD) connected to a first computer, and wherein the virtual space comprises an avatar object associated with a second user different from the first user;receiving line-of-sight data related to the second user from a second computer at a first time, wherein the second user is associated with a second HMD, and wherein the second HMD is connected to the second computer;receiving sound data, at the first HMD, that is based on a detected utterance of the second user at a second time different from the first time;synchronizing a timing of controlling the avatar object in accordance with the received line-of-sight data and a timing of outputting sound that is based on the sound data received by the first HMD;controlling the avatar object in accordance with the line-of-sight data based on the synchronized timing;and outputting the sound that is based on the sound data received by the first HMD based on the synchronized timing.