U.S. Pat. No. 10,991,110

Methods and Systems to Modify a Two Dimensional Facial Image to Increase Dimensional Depth and Generate a Facial Image That Appears Three Dimensional

Assignee: Activision Publishing, Inc.

Issue Date: April 27, 2021

Patent Arcade analysis

U.S. Patent No. 10,991,110: Methods and Systems to Modify a Two Dimensional Facial Image to Increase Dimensional Depth and Generate a Facial Image That Appears Three Dimensional

Issued: April 27, 2021, to Activision Publishing, Inc.
Filed: April 9, 2020 (claiming priority to December 6, 2016)

Overview:

U.S. Patent No. 10,991,110 (the ‘110 patent) is a continuation of U.S. Patent No. 10,055,880 and relates to creating an augmented reality (AR) face mask that appears three-dimensional by increasing the depth of a two-dimensional image. The ‘110 patent details a computer-implemented method of making a three-dimensional appearing AR face mask from a two-dimensional image, achieved by the computer first acquiring an image of a face from a camera, then mapping two groups of key points on the two-dimensional image to generate a texture map.

One of the groups of key points represents anatomical locations on the eyebrows, eyes, nose, and lips. The computer compares the two groups of key points to determine the distances and proportions of facial features, as well as several scaling factors, each scaling factor being a function of one or more proportions derived from the groups of key points. The key points are used by the computer to generate a texture map of the two-dimensional image, which is then projected onto a three-dimensional mesh stored in the computer. The scaling factors are then used to adjust the 3D mesh to yield a three-dimensional-appearing AR face mask.

This could be used for several features in games, such as, but not limited to, displaying faces to users of heads-up displays (HUDs), displaying faces on normally faceless characters in multiplayer shooters, or printing custom avatars.

Abstract:

The specification describes methods and systems for increasing a dimensional depth of a two-dimensional image of a face to yield a face image that appears three dimensional. The methods and systems identify key points on the 2-D image, obtain a texture map for the 2-D image, determines one or more proportions within the 2-D image, and adjusts the texture map of the 3-D model based on the determined one or more proportions within the 2-D image.

 

Illustrative Claim:

  1. A computer-implemented method for increasing a dimensional depth of a two-dimensional image to yield an augmented reality (AR) face mask, said method being implemented in a computer having a processor and a random access memory, wherein said processor is in data communication with a display and with a storage unit, the method comprising:
     acquiring from the storage unit the two-dimensional image;
     acquiring an image of a face of a person from a camera;
     using said computer and executing a plurality of programmatic instructions stored in the storage unit, identifying a first plurality of key points on the two-dimensional image;
     using said computer and executing a plurality of programmatic instructions stored in the storage unit, identifying a second plurality of key points on the two-dimensional image;
     using said computer and executing a plurality of programmatic instructions stored in the storage unit, generating a texture map of the two-dimensional image;
     using said computer and executing a plurality of programmatic instructions stored in the storage unit, projecting said texture map of the two-dimensional image onto the image of the face of the person;
     using said computer and executing a plurality of programmatic instructions stored in the storage unit, modifying the first plurality of key points based on the second plurality of key points; and
     using said computer, outputting the AR face mask image based on the modified first plurality of key points.

Illustrative Figure

Description

DETAILED DESCRIPTION

In an embodiment, a method is provided for converting a two-dimensional (2-D) image for three-dimensional (3-D) display using a computing device, such as a laptop, mobile phone, desktop, tablet computer, or gaming console, comprising a processor in data communication with a non-transient memory that stores a plurality of programmatic instructions which, when executed by the processor, perform the methods of the present invention. The 2-D image may be in any known format, including, but not limited to, ANI, ANIM, APNG, ART, BMP, BPG, BSAVE, CAL, CIN, CPC, CPT, DDS, DPX, ECW, EXR, FITS, FLIC, FLIF, FPX, GIF, HDRi, HEVC, ICER, ICNS, ICO/CUR, ICS, ILBM, JBIG, JBIG2, JNG, JPEG, JPEG 2000, JPEG-LS, JPEG XR, KRA, MNG, MIFF, NRRD, ORA, PAM, PBM/PGM/PPM/PNM, PCX, PGF, PlCtor, PNG, PSD/PSB, PSP, QTVR, RAS, RBE, SGI, TGA, TIFF, UFO/UFP, WBMP, WebP, XBM, XCF, XPM, XWD, CIFF, DNG, AI, CDR, CGM, DXF, EVA, EMF, Gerber, HVIF, IGES, PGML, SVG, VML, WMF, Xar, CDF, DjVu, EPS, PDF, PICT, PS, SWF, XAML and any other raster, raw, vector, compound, or other file format.

In embodiments, the conversion from a 2-D image to a modified image with increased dimensional depth to thereby appear to be 3-D, generally referred to as a 3-D image, is performed automatically after the 2-D image is obtained by the computing device. In an embodiment, a single 2-D image is processed to identify key points of interest. These points are used to define three-point regions that are exclusive of each other. In an embodiment, a Delaunay triangulation method is used to define the three-point regions automatically. The triangulation is used to synchronize with pre-indexed points of interest laid out on a 3-D model, thereby enabling a UV mapping of the 2-D image to yield a texturized 3-D model. In various embodiments, proportions and ratios that are unique to the 2-D image and the texturized 3-D model are used to calculate at least one scale factor. The scale factors are used to sculpt the 3-D image corresponding to the original 2-D image.

The present specification is directed towards multiple embodiments. The following disclosure is provided in order to enable a person having ordinary skill in the art to practice the invention. Language used in this specification should not be interpreted as a general disavowal of any one specific embodiment or used to limit the claims beyond the meaning of the terms used therein. The general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Also, the terminology and phraseology used is for the purpose of describing exemplary embodiments and should not be considered limiting. Thus, the present invention is to be accorded the widest scope encompassing numerous alternatives, modifications and equivalents consistent with the principles and features disclosed. For purpose of clarity, details relating to technical material that is known in the technical fields related to the invention have not been described in detail so as not to unnecessarily obscure the present invention. In the description and claims of the application, each of the words “comprise” “include” and “have”, and forms thereof, are not necessarily limited to members in a list with which the words may be associated.

It should be noted herein that any feature or component described in association with a specific embodiment may be used and implemented with any other embodiment unless clearly indicated otherwise.

FIG. 1 illustrates the sequence of processing an image with a general purpose cross-platform software library that contains machine learning algorithms. Exemplary features in the software library enable the detection of face features, including eyebrows, eyes, nose, mouth, nostrils, ears, cheekbones, chin, and/or lips. It should be appreciated that any facial feature detection software may be implemented, provided the software detects more than one location on (and assigns a distinct point to more than one location on) each of the two eyebrows, two eyes, nose, and lips of the face and assigns each of those distinct points a distinct horizontal position (e.g., X position) and vertical position (e.g., Y position) in a coordinate system.
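To make that requirement concrete, the check below sketches a validator for a detector's output. This is a hypothetical illustration, not code from the patent; the feature names and the dictionary layout are assumptions.

```python
# Hypothetical validator for the detection requirement described above:
# the detector must supply more than one distinct (x, y) point on each of
# the two eyebrows, two eyes, the nose, and the lips.
REQUIRED_FEATURES = ("left_eyebrow", "right_eyebrow",
                     "left_eye", "right_eye", "nose", "lips")

def meets_detection_requirement(landmarks):
    """landmarks: dict mapping feature name -> list of (x, y) pixel tuples.
    Returns True only if every required feature has at least two distinct points."""
    for feature in REQUIRED_FEATURES:
        points = landmarks.get(feature, [])
        if len(set(points)) < 2:  # need more than one distinct location
            return False
    return True
```

A detector that returns only a single point for, say, the nose would fail this screen and be unsuitable for the method.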

Referring back to FIG. 1, a facial image 102 is processed using a software application. In various embodiments, the image 102 is sourced from a viewing element such as a camera, a database of images, a video, a memory local to the computing device, a memory remote from the computing device, or any other source of images. In one embodiment, the image 102 is a result of a selfie shot taken by an individual through a camera of a mobile phone and stored locally within the mobile phone. Processing validates that the image is a frontal image of a face. A frontal image of a face may be best suited in a display for gaming applications and several other virtual reality, augmented reality or mixed reality applications. The remaining components (body parts) of a display created for an individual may be created and presented in various imaginative formats.

Using facial feature detection software, the frontal face portion 104 may thus be isolated from the remaining image. Optionally, the facial image may be analysed to determine if the image is sufficiently “front-facing”. More specifically, if the facial image is too skewed, whether up, down, diagonally, left, right, or otherwise, the presently disclosed methods and systems may have a difficult time generating a quality three dimensional image. As a result, in one embodiment, the presently disclosed system analyses the image to determine if the face is turned greater than a predefined angle, if the edges of the face are substantially the same distance from the center of the face, and/or if the features on one side of the face, such as lips, eyes or ears, are dimensionally different, in excess of a predefined threshold, relative to the features on the other side of the face.
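One of the screens described above, comparing mirrored features against a predefined threshold, can be sketched as follows. This is a hypothetical implementation; the 15% default threshold and the pairing of feature widths are assumptions, not values from the patent.

```python
# Hypothetical symmetry screen for the "front-facing" test: paired feature
# widths (in pixels) are compared, and the image is rejected as too skewed
# if any pair differs by more than `threshold` (a fraction of the larger
# width). The 0.15 default is an assumed value for illustration.
def is_front_facing(paired_widths, threshold=0.15):
    """paired_widths: iterable of (left_width, right_width) for mirrored
    features such as eyes, ears, or eyebrows. Returns False when any pair
    is dimensionally different in excess of the threshold."""
    for left, right in paired_widths:
        larger = max(left, right)
        if larger == 0:
            return False  # degenerate measurement; reject
        if abs(left - right) / larger > threshold:
            return False
    return True
```

In practice the head-turn angle and edge-distance checks from the text would be applied alongside this one.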

If the facial image is sufficiently “front-facing”, the system then identifies multiple key anatomical points, as seen in image 106, which indicate anatomically defined features of the face in image 102. A key anatomical point is a location on a face that is detected and provided by a software application. An exemplary software application uses a face detection function that returns a list of 67 points on the face (in pixels). In embodiments of the present specification, the system numbers key points in image 106, as seen in image 108. Image 108 illustrates key points indexed up to 67, and the numbers shown in image 108 indicate an assigned identity (ID) of each key point. It should be appreciated that the system may identify any number of anatomical points that may be less than or greater than 67.

Subsequently, the system generates a texture map for image 106. The system generates a texture map using two steps. FIG. 2 illustrates a first step, where an image 206 with multiple key points is used to define a plurality of non-overlapping regions, each of which may be defined by at least three points, as seen in image 210. In an embodiment, the regions define various anatomical regions, thereby capturing various anatomical features, of a front-face of a human being. In an embodiment, at least the anatomical regions that define brows, eyes, nose, lips, and face, are covered by the key points. In embodiments, the non-overlapping, three point, triangular regions are defined using Delaunay triangulation, automatically through the execution of programmatic instructions on the computing device. Use of Delaunay triangulation, which is a known analytical method to persons of ordinary skill in the art, ensures that none of the key points are inside the circumcircle of any triangular region, thus maximizing the minimum angle of all the angles of the triangles in the triangulation. Maximizing the minimum angles thereby improves subsequent interpolation or rasterization processes that may be applied to the image for creating a corresponding 3-D mesh.
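The empty-circumcircle rule described above can be demonstrated directly. A production pipeline would use a library routine (e.g. `scipy.spatial.Delaunay` or OpenCV's `Subdiv2D`); the brute-force O(n^4) sketch below exists only to make the rule concrete, keeping every triangle whose circumcircle contains no other key point. It assumes points in general position (no duplicates, no four cocircular points).

```python
from itertools import combinations

def _orient(a, b, c):
    # Twice the signed area of triangle abc; > 0 when abc is counter-clockwise.
    return (b[0]-a[0])*(c[1]-a[1]) - (b[1]-a[1])*(c[0]-a[0])

def _in_circumcircle(a, b, c, p):
    # Standard in-circle determinant: True when p lies strictly inside the
    # circumcircle of triangle abc (sign-corrected for either orientation).
    rows = [(v[0]-p[0], v[1]-p[1], (v[0]-p[0])**2 + (v[1]-p[1])**2)
            for v in (a, b, c)]
    det = (rows[0][0]*(rows[1][1]*rows[2][2] - rows[1][2]*rows[2][1])
         - rows[0][1]*(rows[1][0]*rows[2][2] - rows[1][2]*rows[2][0])
         + rows[0][2]*(rows[1][0]*rows[2][1] - rows[1][1]*rows[2][0]))
    return det * _orient(a, b, c) > 0

def delaunay_triangles(points):
    """Brute-force Delaunay triangulation: keep every non-degenerate triangle
    whose circumcircle contains no other input point."""
    triangles = []
    for a, b, c in combinations(points, 3):
        if _orient(a, b, c) == 0:          # collinear triple; no triangle
            continue
        if not any(_in_circumcircle(a, b, c, p)
                   for p in points if p not in (a, b, c)):
            triangles.append((a, b, c))
    return triangles
```

For an outer triangle with one interior point, this yields the three triangles that all share the interior point, exactly the tri-face partition the text describes.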

Referring to FIG. 3, the system then initiates a process of generating a texture map for a 3-D model. Triangulation from the previous step is automatically used to synchronize a plurality of pre-defined and/or pre-indexed points of interest identified on a UV layout. In an alternative embodiment, the points of interest are automatically generated by the system. In embodiments, the points of interest identified for the UV layout are termed as landmark points of UV map. A UV layout is a layout of a 2D model where U and V refer to the two axes (corresponding to X and Y axes in the 3D model). In embodiments, the system stores, in a memory, a generic 2D UV layout and its corresponding generic 3D model. The system uses the triangulation data to modify the generic UV layout and uses the modified UV layout to create a modified 3D model.

In FIG. 3, an image 302 illustrates a pre-defined or automatically generated generic 3-D mesh model. An image 304 illustrates the texture coordinates or the landmark points on the UV map of the generic 3-D mesh model. The texture coordinates or landmark points are generated for each vertex of the triangles derived through the triangulation. In embodiments, image 304 is generated automatically by a software application. Image 306 illustrates the triangles derived through the triangulation process for the UV map of the generic 3D model.

FIG. 4 illustrates processing of the images to arrive at an image 406, which is a texture map of the 3-D model. The figure shows an image 402 corresponding to the original image 104, with triangulations of the key points. Images 402 of FIG. 4 and 210 of FIG. 2 illustrate a resulting collection of tri-faces within the original image 104. The system matches the tri-faces of image 402 with corresponding triangles within the UV map 404 of the generic 3-D mesh model. In an embodiment, this is achieved by matching each key point of the original image with the texture coordinates or landmark points of the UV map. Thus, the system uses the key points to morph the generic UV layout such that the UV layout is modified into a front-face image 406, which includes the texture of the original image of the face. The texture includes colors and other visual parameters, such as hue, luminance, brilliance, contrast, brightness, exposure, highlights, shadows, black point, saturation, intensity, tone, grain, and neutrals, which are automatically sampled from various parts of the face in the original image and averaged, or subjected to some other processing formula, to fill out the backdrop within the morphed 3-D mesh image. The resultant image may be used as a texture map for the 3D model.
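The matching of a tri-face in the image to its counterpart triangle on the UV layout can be pictured as a per-triangle affine warp. The sketch below (an illustrative reconstruction, not code from the patent) maps a point through barycentric coordinates: its weights relative to the image triangle are reused against the UV triangle's vertices.

```python
def barycentric(p, a, b, c):
    # Barycentric coordinates (u, v, w) of point p with respect to
    # triangle (a, b, c); u + v + w == 1.
    d = (b[1]-c[1])*(a[0]-c[0]) + (c[0]-b[0])*(a[1]-c[1])
    u = ((b[1]-c[1])*(p[0]-c[0]) + (c[0]-b[0])*(p[1]-c[1])) / d
    v = ((c[1]-a[1])*(p[0]-c[0]) + (a[0]-c[0])*(p[1]-c[1])) / d
    return u, v, 1.0 - u - v

def warp_point(p, src_tri, dst_tri):
    """Map p from a key-point triangle in the 2-D image (src_tri) to the
    matching landmark triangle on the UV layout (dst_tri)."""
    u, v, w = barycentric(p, *src_tri)
    (ax, ay), (bx, by), (cx, cy) = dst_tri
    return (u*ax + v*bx + w*cx, u*ay + v*by + w*cy)
```

Applying this per pixel, triangle by triangle, is one standard way to realize the morph of the generic UV layout into the textured front-face image.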

Various embodiments of the present specification enable accounting for the unique proportions and ratios of the original image. In the given example, the original image 104, used for processing a frontal face of an individual, is additionally used to identify face feature lengths, distances, proportions, dimensions, or ratios, collectively referred to as positional relationships. In embodiments, the system analyses image 104 to generate values indicative of the positional relationships of an individual's facial features. For example, the values may be representative of the relative distances between width of the nose compared to the chin, distance between the two eyes, width of eyebrows, thickness of the lips, and other measurements that mark the positional relationships of various anatomical points and/or regions on the face.

In an embodiment, the system determines a plurality of distances between various anatomical facial features. In embodiments, the distances are used to adjust the generated 3D model of the original image of the face. Referring to FIG. 5, image 502 illustrates various horizontal (green lines) and vertical (red lines) measurements within the original image (104) that may be used, in accordance with some embodiments of the present specification, to identify the positional relationships. Image 504 illustrates various measurements of similar features of the generic 3-D mesh model face (image 302 of FIG. 3).

FIGS. 5A and 5B show an enlarged view of key points 1 to 67 (521), identified through a plurality of programmatic instructions configured to graphically identify a plurality of key points. The number IDs for the key points 521 shown in FIGS. 5A and 5B are recalled (in brackets) in the following examples. The system uses the corresponding similar measurements to derive positional relationships between the measured features. One of the plurality of exemplary distances is a face width, defined as a distance 508 (2 and 14) from one point located proximate the left edge of the face laterally across the face to a point located proximate the right edge of the face. Another of the plurality of exemplary distances is a jaw width, defined as a distance 510 (3 and 13) from one point located proximate the left edge of the jaw laterally across the face to a point located proximate the right edge of the jaw. Yet another of the plurality of exemplary distances is a temple width, defined as a distance 512 (0 and 16) from one point located proximate the left edge of the temple laterally across the face and through the eyes, to a point located proximate the right edge of the temple. Still another of the plurality of exemplary distances is an eyebrow width, defined as a distance 514 (22 and 26) from one point located proximate the left edge of an eyebrow, laterally across the width of the eyebrow, to a point located in line with the right edge of the same eyebrow. Another of the plurality of exemplary distances is a chin width, defined as a distance 516 (7 and 9) from one point located proximate the left edge of the chin laterally across the chin, to a point located proximate the right edge of the chin. Another of the plurality of exemplary distances is a lip width, defined as a distance 518 (48 and 64) from one point located proximate the left corner of the lips where the upper and the lower lip meet, across the mouth, to a point located proximate the right corner of the lips. Another of the plurality of exemplary distances is a nose width, defined as a distance 520 (31 and 35) from one point located proximate the left edge of the left opening of the nose, to a point located proximate the right edge of the right opening of the nose. In embodiments, additional and/or other combinations of horizontal distances between various anatomical points mapped laterally across the face are used to obtain the positional relationships.

Distances may also be measured vertically across the length of the face. One of the plurality of exemplary distances is a lip thickness, defined as a distance 522 (51 and 57) from one point located proximate the centre of a top edge of an upper lip, vertically across the mouth, to a point located proximate the centre of a bottom edge of a lower lip. Another one of the plurality of exemplary distances is a distance between nose and nose bridge, defined as a distance 524 (27 and 33) from one point located proximate the centre of the eyes where the top of a nose bridge is positioned, vertically across the nose, to a point located proximate the centre of the nose openings. Yet another of the plurality of exemplary distances is a distance between lip and nose bridge, defined as a distance 526 (27 and 66) from one point located proximate the centre of the eyes where the top of a nose bridge is positioned, vertically across the nose and the upper lip, to a point located proximate the center of the mouth. Still another of the plurality of exemplary distances is a distance between chin and nose bridge, defined as a distance 528 (27 and 8) from one point located proximate the centre of the eyes where the top of a nose bridge is positioned, vertically across the nose, the upper lip, and the mouth, to a point located proximate the centre of the chin. Another of the plurality of exemplary distances is an eye length, defined as a distance 530 (44 and 46) from one point located proximate the centre of a top of an eye, vertically across the eye, to a point located proximate the centre of a bottom of the eye. Another of the plurality of exemplary distances is an eyebrow height, defined as a distance 532 (24 and 44) from one point located proximate the centre of the eyebrow, vertically across the eye, to a point located proximate the centre of the eye under the eyebrow. Another of the plurality of exemplary distances is a jaw and nose bridge distance, defined as a distance 534 (27 and 3) from one point located proximate the centre of the nose bridge, vertically across the length of the cheek, to a point located proximate the jaw. In embodiments, additional and/or other combinations of vertical distances between various anatomical points mapped vertically across the face are used to obtain the positional relationships.

In embodiments, additional and/or other combinations of diagonal distances between various anatomical points mapped across the face are used to obtain the positional relationships. An example is a distance 536 (22 and 26) from a point located proximate the left edge of one eyebrow to the right edge of the same eyebrow, which indicates the brow angle.
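The catalogue of measurements above can be organized as a simple table of key-point ID pairs. The ID pairs below are taken directly from the text; the measurement names, the dictionary layout, and the use of straight-line (Euclidean) distance are illustrative assumptions.

```python
from math import hypot

# Key-point ID pairs for the exemplary distances described above (FIGS. 5A/5B).
MEASUREMENTS = {
    "face_width":     (2, 14),
    "jaw_width":      (3, 13),
    "temple_width":   (0, 16),
    "eyebrow_width":  (22, 26),
    "chin_width":     (7, 9),
    "lip_width":      (48, 64),
    "nose_width":     (31, 35),
    "lip_thickness":  (51, 57),
    "nose_to_bridge": (27, 33),
    "lip_to_bridge":  (27, 66),
    "chin_to_bridge": (27, 8),
    "eye_length":     (44, 46),
    "eyebrow_height": (24, 44),
    "jaw_to_bridge":  (27, 3),
}

def measure(keypoints, name):
    """keypoints: dict mapping key-point ID -> (x, y) pixel coordinates.
    Returns the straight-line distance for the named measurement."""
    i, j = MEASUREMENTS[name]
    (x1, y1), (x2, y2) = keypoints[i], keypoints[j]
    return hypot(x2 - x1, y2 - y1)
```

Running `measure` over both the detected key points and the generic model's landmark points yields the two sets of distances the proportion step needs.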

In embodiments, the system obtains positional relationships in image 502 by determining one or more proportions, based on one or more of the lateral, vertical, and diagonal distances. In an exemplary embodiment, face width 508, measured between key points with IDs 2 and 14, is used as a constant to determine proportions of other measured distances. For example, one of the plurality of proportions is derived by using distance 510 (3 and 13) as the numerator and face width 508 as the denominator. The exemplary proportion described here provides the positional relationship of the jaw with respect to the face. In an alternative embodiment, the system uses distance 512 between key points with ID 0 and ID 16, which may indicate the entire temple width of the facial image, as a whole unit in the denominator to subsequently calculate ratios on all the rest of the face. While other anatomical distances may be used as the denominator to calculate one or more proportions, temple width is the preferred distance because it tends to remain predictably static, even if people gain weight, lose weight, age, or undergo collagen or botox injections.

In embodiments, similar proportions are determined for the 3-D mesh model image 504. As described above in relation to image 502, the system obtains positional relationships in image 504 by determining one or more proportions, based on one or more of the lateral, vertical, and diagonal distances in relation to a standard anatomical distance, such as temple width.

Once both sets of proportions are obtained, the system uses proportions from both images 502 and 504 to calculate their ratio, in order to determine scale factors 506. In an embodiment, scale factor 506 is the ratio of proportions or positional relationships of image 502 to the corresponding proportions or positional relationships of image 504. Image 506 illustrates exemplary scale factors derived using corresponding proportions from the image of the face of an individual 502 and the generic 3-D mesh model 504.
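The proportion and scale-factor arithmetic just described reduces to two small steps: normalize each distance by the reference distance (temple width, per the preferred embodiment), then divide the image's proportion by the model's. The sketch below is a minimal illustration of that one exemplary function; the dictionary layout is an assumption.

```python
def proportions(distances, reference="temple_width"):
    """Normalize each measured distance by the reference distance
    (temple width in the preferred embodiment)."""
    ref = distances[reference]
    return {name: d / ref for name, d in distances.items()}

def scale_factors(image_distances, model_distances):
    """Per-feature scale factor: the 2-D image proportion divided by the
    corresponding generic 3-D model proportion (one exemplary function;
    the specification allows other functions of the two proportions)."""
    p_img = proportions(image_distances)
    p_mdl = proportions(model_distances)
    return {name: p_img[name] / p_mdl[name] for name in p_img}
```

A jaw that is proportionally wider on the photographed face than on the generic model thus yields a factor above 1, signalling that the mesh's jaw must be widened.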

In an embodiment, these measurements are communicated to a console of a computing and/or a mobile computing device, along with the newly generated texture map 406. FIG. 6 illustrates a generic 3-D mesh model 602 (504, 302) to which the system applies the scale factors 604 (506), in order to obtain a modified 3-D mesh model 606. In embodiments, the texture map 406 obtained previously is applied to modified 3-D mesh model 606, to obtain a final 3-D model 608 of the original 2-D image. On the console side, the 3-D model's texture is swapped out with the newly created model 606, and the proportion measurements are used to drive adjustments to mirror the individual's actual face structure. This, in effect, sculpts the mesh to more closely resemble the captured face.
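The sculpting step can be pictured as scaling groups of mesh vertices by their feature's scale factor. The sketch below is purely illustrative: the vertex-region indices, the choice of scaling x-coordinates, and the midline pivot are all assumptions, not details from the patent.

```python
# Hypothetical mesh adjustment: widen or narrow a vertex region (e.g. the
# jaw line) by a scale factor, scaling x-coordinates about the face midline.
def sculpt_region(vertices, indices, factor, pivot_x=0.0):
    """vertices: list of [x, y, z] positions (modified in place and returned).
    indices: vertex indices belonging to the region being adjusted.
    factor: scale factor for that region; pivot_x: x of the face midline."""
    for i in indices:
        x, y, z = vertices[i]
        vertices[i] = [pivot_x + (x - pivot_x) * factor, y, z]
    return vertices
```

Applying one such call per measured feature, with the factors computed earlier, deforms the generic mesh toward the individual's proportions.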

FIG. 7 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a modified image that has increased dimensional depth (and, therefore, appears three dimensional to a user) for display, in accordance with embodiments of the present specification. At 702, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, to identify at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. First, at 704, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points. In embodiments, a set of pre-defined landmark points of the UV map of a generic 3-D mesh 304 are brought in coherence with the key points on the 2-D image 206. Preferably, the landmark points of the UV map of the generic 3-D mesh are generated automatically. At 705, the system uses Delaunay triangulation to define the three-point regions, based on the identified key points. The three-point regions, which may also be termed pieces of tri-faces, represent an average texture of the face within each region. Each vertex of the triangles carries UV coordinates (also known as texture coordinates), which are used to recreate the texture of the 2D image on a 3D model. At 706, the system projects the triangulated 2-D image on the UV map of the generic 3-D mesh model 304.

At 708, the system determines one or more positional relationships within the 2-D image. As described above, the positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, the positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model and comprise a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 710, the system then uses proportions for the 2-D image and the generic 3-D image to determine a ratio, which may be termed the ‘scale factor’. In one embodiment, each scale factor is calculated by taking a proportion for the 2-D image and dividing it by a proportion of the same anatomical features for the 3-D face model. In another embodiment, each scale factor is calculated by any function of a proportion for the 2-D image and a proportion of the same anatomical features for the 3-D face model. It should be appreciated that the aforementioned proportions, for either the 2-D image or the 3-D face model, can be determined by taking a distance defining any of the following anatomical features and dividing it by a distance defining a temple width: a distance defining lip thickness, a distance between the nose and nose bridge, a distance between a lip and nose bridge, a distance between chin and nose bridge, a distance defining an eye length, a distance defining an eyebrow height, and a distance between a jaw and nose bridge.

The illustrations of FIG. 1 to FIG. 6 demonstrate the method of FIG. 7 implemented on the image of a face. In the exemplary embodiment, proportions may be determined from measurements between at least two anatomical positions on the face. The anatomical positions may include anatomical points and/or regions on the face such as, but not limited to, the jaw, the nose bridge, the chin, the lip, the eyes, the eyebrow, and other anatomical regions on the face, as described above. At 712, the system adjusts proportions of the 3-D mesh model that contains the texture created through step 706. The proportions are adjusted on the basis of the scale factor determined at step 710, in order to create the 3-D display for the original 2-D image.

Applications

FIG. 8 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a modified image that has increased dimensional depth (and, therefore, appears three dimensional to a user) for display as an avatar, in accordance with embodiments of the present specification. In embodiments, an avatar of an individual is recreated in a 3-D virtual reality, augmented reality, or mixed reality environment, such as but not limited to gaming environments. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the face in 3-D. The method is similar to the process described in FIG. 7.

At 800, the system obtains an image of the individual from one of the sources including, but not limited to, an independent camera, a camera integrated with a mobile or any other computing device, or an image gallery accessible through a mobile or any other computing device.

At 802, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, to identify at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. At 804, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points. The system uses Delaunay triangulation to define the three-point regions, based on the identified key points, as described above. At 806, the system projects the triangulated 2-D image on the UV map of the generic 3-D mesh model. At 808, the system determines one or more positional relationships within the 2-D image. As described above, the positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, the positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model and comprise a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 810, the system then uses the proportions for the 2-D image and the corresponding proportions from the generic 3-D model to determine the scaling factors. At 812, the system adjusts the 3-D model based on the determined scaling factors, and at 814, the system creates an avatar using the 3-D display of the face. The avatar may be used in various applications, such as gaming applications. FIG. 9 illustrates a set of images that describe the conversion of a 2-D frontal face image 902 to a 3-D avatar 906, in accordance with some embodiments of the present specification. Image 904 shows the conversion of image 902 to a corresponding 3-D image. In embodiments, image 904 is further used by computer systems to create an avatar, for example avatar 906.
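The proportion-and-scaling step at 810 can be sketched as below. The feature names and measurement values are hypothetical; the reference distance stands in for the temple width that the specification uses as the denominator for its ratios.

```python
def anatomical_proportions(distances, reference):
    """Express each anatomical distance as a ratio to a reference
    distance (the specification uses temple width as the reference)."""
    return {name: d / reference for name, d in distances.items()}

def scaling_factors(image_proportions, model_proportions):
    """One scaling factor per feature: the ratio of the 2-D image's
    proportion to the generic 3-D model's corresponding proportion."""
    return {name: image_proportions[name] / model_proportions[name]
            for name in image_proportions}

# Hypothetical measurements (arbitrary units).
image = anatomical_proportions({"jaw_width": 6.0}, reference=8.0)   # 0.75
model = anatomical_proportions({"jaw_width": 6.0}, reference=12.0)  # 0.5
factors = scaling_factors(image, model)
print(factors["jaw_width"])  # 1.5
```

Each resulting factor would then scale the corresponding region of the generic 3-D mesh so the adjusted model matches the individual's facial proportions.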

FIG. 10 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a modified image that has increased dimensional depth (and, therefore, appears three dimensional to a user) for display using AR masks, in accordance with embodiments of the present specification. In embodiments, the masks are animated masks, or masks created from alternate images. In embodiments, a mask for an individual is recreated in 3-D virtual reality, augmented reality, or mixed reality environments, such as but not limited to online chatting and gaming environments. In embodiments, the mask is for the face of an individual and can track the individual's face in real time, thereby functioning with the changing facial expressions of the individual. In embodiments, the mask could be controlled by the individual's facial features. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the face, in 3-D. At 1000, the system obtains an image from a database of images. In embodiments, the selected image is an image of a face, an animated character, or a facial expression created using animation or special effects. In embodiments, the image is obtained from a database of images, a video, a memory local to the computing device, a memory remote from the computing device, or any other source of images. The image obtained at this step is subsequently used as a mask, for example an AR mask, which can be applied to the face of the individual (user). At 1002, the system obtains an image of the individual (user) from one of the sources including, but not limited to, an independent video camera, or a video camera integrated with a mobile or any other computing device. In an embodiment, the image of the individual is obtained from a webcam. At 1004, key points on the 2-D image obtained at step 1000 are identified. At 1006, key points on the 2-D image of the individual (user) obtained at step 1002 are identified.

At 1008, the system modifies the key points on the 2-D image to be used as a mask, based on the key points identified for the 2-D image of the individual (user). The positioning of the key points of the mask image is modified to match the positioning of the key points of the individual's image. The modified image of the mask is then applied by the system to the image of the face of the individual. In another embodiment, the front-face image of the individual, comprised within the key points, is replaced by the modified mask image.
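One way to realize the key-point repositioning at 1008 is a simple similarity alignment (shift the mask landmarks onto the user's centroid and match their spread). The patent leaves the exact repositioning scheme open, so this is an illustrative sketch rather than the claimed method.

```python
import numpy as np

def align_mask_key_points(mask_points, user_points):
    """Reposition the mask image's key points so their centroid and
    overall spread match the user's key points (a uniform
    translate-and-scale alignment; a sketch, not the claimed scheme)."""
    mask = np.asarray(mask_points, dtype=float)
    user = np.asarray(user_points, dtype=float)
    mask_c, user_c = mask.mean(axis=0), user.mean(axis=0)
    # Spread measured as the Frobenius norm of deviations from the centroid.
    scale = np.linalg.norm(user - user_c) / np.linalg.norm(mask - mask_c)
    return (mask - mask_c) * scale + user_c

# Hypothetical example: a small square mask aligned onto a larger,
# shifted square of user landmarks lands exactly on the user's points.
mask = [(0, 0), (2, 0), (2, 2), (0, 2)]
user = [(10, 10), (14, 10), (14, 14), (10, 14)]
print(align_mask_key_points(mask, user))
```

A production system would more likely estimate a full similarity or affine transform per triangular region, but the centroid-and-scale version shows the matching idea in a few lines.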

The system is therefore capable of rapidly generating a masked image of a 2-D face. In this embodiment, AR masks are created for each consecutive frame, or each frame after a pre-defined number of frames, obtained from a video captured through a camera or taken from a video gallery. In an embodiment, the system uses a combination of programmatic instructions to identify frames from a video and process them according to the steps described above in the context of FIG. 10. In this embodiment, AR masks are created on a frame-by-frame basis, thereby allowing for the generation of a plurality of facial images, each corresponding to one of the frames. In various embodiments, the system can superimpose any other image, such as glasses, hats, crazy eyes, or facial hair, on each image, thereby creating a video feed with AR.
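The frame-selection logic described above — mask every consecutive frame, or every frame after a pre-defined number of frames — can be sketched as a generator. The names `frames` and `apply_mask` are placeholders: `apply_mask` stands in for the whole FIG. 10 pipeline applied to one frame.

```python
def masked_frames(frames, apply_mask, stride=1):
    """Yield an AR-masked image for every `stride`-th frame of a video.

    `frames` is any iterable of video frames; `apply_mask` is the
    per-frame masking routine (hypothetical stand-in for the FIG. 10
    steps). stride=1 masks every consecutive frame.
    """
    for index, frame in enumerate(frames):
        if index % stride == 0:
            yield apply_mask(frame)

# Hypothetical usage: mask every 3rd frame of a 10-frame clip
# (identity function standing in for the real masking pipeline).
processed = list(masked_frames(range(10), lambda f: f, stride=3))
print(processed)  # [0, 3, 6, 9]
```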

FIG. 11 illustrates multiple AR images 1102, 1104, 1106, 1108, 1110, 1112, 1114, 1116, and 1118, created with the method outlined with reference to FIG. 10. Heart-shaped figures overlay the eyes of a user, as seen in image 1102. In an embodiment, the heart-shaped figures may shrink each time the user blinks, and expand when the user's eyes are open. As shown in image 1106, an AR image of rainbow-like vomit falls out of the user's mouth each time the user opens the mouth. Images 1114 to 1118 illustrate the face of the user augmented by the faces of different individuals. In embodiments, the faces of different individuals may be sourced from one or more digital databases.

FIG. 12 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a modified image that has increased dimensional depth (and, therefore, appears three dimensional to a user) for display of faceless gaming and other interactive display characters, which are controlled by real individuals. These may include users of Heads-up Displays (HUDs) and players involved in multiplayer games. In embodiments, 3-D images of the users/players are recreated with expressions and/or reactions within an interactive virtual reality, augmented reality, or mixed reality environment, such as but not limited to chatting and gaming environments. In embodiments, the expressions/reactions of the users/individuals/players are tracked in real time and thereby reflected through their corresponding 3-D images seen on a display. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the face, in 3-D. At 1200, the system obtains an image of the individual from one of the sources including, but not limited to, an independent video camera, or a video camera integrated with a mobile or any other computing device.

At 1202, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, identifying at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. At 1204, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points, using Delaunay triangulation 1205 to define the three-point regions, as described above. At 1206, the system projects the triangulated 2-D image onto the UV map of the generic 3-D mesh model. At 1208, the system determines one or more positional relationships within the 2-D image. As described above, these positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, these positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model: a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 1210, the system then uses the proportions for the 2-D image and the corresponding proportions from the generic 3-D model to determine the scaling factors. At 1212, the system adjusts the 3-D model based on the determined scaling factors, and at 1214, the system modifies the 3-D display of the face based on expressions and/or reactions of the individual.

FIG. 13 illustrates exemplary images 1304 and 1306 captured from gaming displays. Image 1304 shows the expressions/reactions of a player created using an embodiment of the present specification, and displayed as image 1302 of an otherwise faceless First Person Shooter (FPS) character in a game using a HUD. In an alternative multiplayer gaming environment 1306, the expressions/reactions of four players in a game are seen in images 1308, 1310, 1312, and 1314.

FIG. 14 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a 3-D image for display of players in gaming environments and other interactive display characters, which are controlled by real individuals. In embodiments, 3-D images of the users/players are recreated with expressions and/or reactions within an interactive virtual reality, augmented reality, or mixed reality environment, such as but not limited to chatting and gaming environments. In embodiments, the expressions/reactions of the users/individuals/players are tracked in real time and thereby reflected through their corresponding 3-D images seen on a display. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the face, in 3-D. At 1400, the system obtains an image of the individual from one of the sources including, but not limited to, an independent camera, a camera integrated with a mobile or any other computing device, or an image gallery accessible through a mobile or any other computing device.

At 1402, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, identifying at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. At 1404, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points, using Delaunay triangulation 1405 to define the three-point regions, as described above. At 1406, the system projects the triangulated 2-D image onto the UV map of the generic 3-D mesh model. At 1408, the system determines one or more positional relationships within the 2-D image. As described above, these positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, these positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model: a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 1410, the system then uses the proportions for the 2-D image and the corresponding proportions from the generic 3-D model to determine the scaling factors. At 1412, the system adjusts the 3-D model based on the determined scaling factors, and at 1414, the 3-D display of the face is modified based on the expressions and/or reactions and movements of the individual.

In an embodiment, the system is capable of rapidly generating a 3D image of a 2D face. In this embodiment, 3D images are created for each consecutive frame, or each frame after a pre-defined number of frames, obtained from a video captured through the camera. In an embodiment, the system uses a combination of programmatic instructions to identify frames from the video and process them according to the steps described above in the context of FIG. 14. In this embodiment, the facial expressions of a user are recreated through their 3D images on a frame-by-frame basis, thereby allowing for the generation of a plurality of 3D facial expressions, each corresponding to one of the frames.

FIG. 15 illustrates a still image 1502 from a display of a gaming application, in which a 3-D image 1504 of a user (player) is seen. In embodiments, the expressions/reactions and movements of image 1504 reflect the user's expressions/reactions and movements in real time.

FIG. 16 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image of an individual into a 3-D image that may be used to print one or more personalized avatars of the individual. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the face, in 3-D. The recreated 3-D face may then be combined with an image of a body to create a personalized avatar, which can be printed using 3-D printing methods. At 1600, the system obtains an image of the individual from one of the sources including, but not limited to, an independent camera, a camera integrated with a mobile or any other computing device, or an image gallery accessible through a mobile or any other computing device.

At 1602, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, identifying at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. At 1604, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points, using Delaunay triangulation 1605 to define the three-point regions, as described above. At 1606, the system projects the triangulated 2-D image onto the UV map of the generic 3-D mesh model. At 1608, the system determines one or more positional relationships within the 2-D image. As described above, these positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, these positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model: a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 1610, the system then uses the proportions for the 2-D image and the corresponding proportions from the generic 3-D model to determine the scaling factors. At 1612, the system adjusts the 3-D model based on the determined scaling factors, and at 1614, the system prints a personalized avatar of the 3-D display of the face. In embodiments, printing is performed using 3-D printing methods. FIG. 17 illustrates examples of four avatars of different users printed using embodiments of the process described in the context of FIG. 16.

FIG. 18 is a flow chart illustrating an exemplary computer-implemented method to convert a 2-D image into a 3-D image for driving key frame animation based on the facial expressions of an individual. In embodiments, the expressions/reactions of the users/individuals/players are tracked in real time and thereby reflected through their corresponding 3-D images seen on a display, which in turn are used to drive key frame animation. In this case, a 2-D frontal face image of the individual is used to create a replica of at least the facial expressions and movements of an animated figure, in 3-D. In embodiments, the animated figure could be the 3D image of the face of the user, an object, or any other face created for the animation. At 1800, the system obtains an image of the individual from one of the sources including, but not limited to, an independent video camera, a video camera integrated with a mobile or any other computing device, or a video gallery accessible through a mobile or any other computing device.

At 1802, the system, according to various embodiments of the present specification, identifies key points on the 2-D image. In an embodiment, the system uses a plurality of programmatic instructions designed to graphically identify a plurality of key points, identifying at least 67 key points. Subsequently, the system derives a texture map for the 2-D image using the following steps. At 1804, the system identifies a plurality of non-overlapping, three-point regions based on the identified key points, using Delaunay triangulation 1805 to define the three-point regions, as described above. At 1806, the system projects the triangulated 2-D image onto the UV map of the generic 3-D mesh model. At 1808, the system determines one or more positional relationships within the 2-D image. As described above, these positional relationships comprise a plurality of distances between anatomical features in the facial image, and ratios of those distances to a specific anatomical distance such as temple width, which are necessarily unique to the 2-D image. Similarly, the system determines one or more positional relationships within the generic 3-D mesh model of a face. As described above, these positional relationships comprise a plurality of proportions that are standard for a generic 3-D face model: a plurality of distances between anatomical features in the 3-D face model, and ratios of those distances to a specific anatomical distance such as temple width, which define the generic 3-D face model.

At 1810, the system then uses the proportions for the 2-D image and the corresponding proportions from the generic 3-D model to determine the scaling factors. At 1812, the system adjusts the 3-D model based on the determined scaling factors, and at 1814, the system drives key frame animation using the adjusted 3-D display of the face.

In an embodiment, the system is capable of rapidly generating a 3D image of a 2D face. In this embodiment, key frame animations are created for each consecutive frame, or each frame after a pre-defined number of frames, obtained from the video captured through the camera or taken from the video gallery. In an embodiment, the system uses a combination of programmatic instructions to identify frames from the video and process them according to the steps described above in the context of FIG. 18. In this embodiment, key frame animations are created on a frame-by-frame basis, thereby allowing for the generation of a plurality of 3D facial images, including the expressions and movements of the user, each corresponding to one of the frames. In embodiments, additional animation effects are superimposed on the frames.
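One simple way to derive animation signals from the tracked face, per frame, is to measure each key point's displacement from a neutral pose and feed those offsets to the animated figure's control points. This is an illustrative sketch of the key-frame-driving idea, not the patented algorithm; all names are hypothetical.

```python
def key_frame_offsets(neutral_points, frame_points):
    """Per-key-point displacement of the tracked face relative to a
    neutral pose. These offsets can drive the corresponding control
    points of an animated figure, one set per video frame."""
    return [(fx - nx, fy - ny)
            for (nx, ny), (fx, fy) in zip(neutral_points, frame_points)]

# Hypothetical two-landmark face: the first landmark (e.g. a mouth
# corner) moves up by 0.5 units; the second stays still.
neutral = [(0.0, 0.0), (1.0, 1.0)]
tracked = [(0.0, 0.5), (1.0, 1.0)]
print(key_frame_offsets(neutral, tracked))  # [(0.0, 0.5), (0.0, 0.0)]
```

Running this per frame yields one offset set per frame, matching the frame-by-frame generation of key frame animations described above.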

The above examples are merely illustrative of the many applications of the system of the present invention. Although only a few embodiments of the present invention have been described herein, it should be understood that the present invention might be embodied in many other specific forms without departing from the spirit or scope of the invention. Therefore, the present examples and embodiments are to be considered as illustrative and not restrictive, and the invention may be modified within the scope of the appended claims.

Claims

  1. A computer-implemented method for increasing a dimensional depth of a two-dimensional image to yield an augmented reality (AR) face mask, said method being implemented in a computer having a processor and a random access memory, wherein said processor is in data communication with a display and with a storage unit, the method comprising: acquiring from the storage unit the two-dimensional image; acquiring an image of a face of a person from a camera; using said computer and executing a plurality of programmatic instructions stored in the storage unit, identifying a first plurality of key points on the two-dimensional image; using said computer and executing a plurality of programmatic instructions stored in the storage unit, identifying a second plurality of key points on the two-dimensional image; using said computer and executing a plurality of programmatic instructions stored in the storage unit, generating a texture map of the two-dimensional image; using said computer and executing a plurality of programmatic instructions stored in the storage unit, projecting said texture map of the two-dimensional image onto the image of the face of the person; using said computer and executing a plurality of programmatic instructions stored in the storage unit, modifying the first plurality of key points based on the second plurality of key points; and using said computer, outputting the AR face mask image based on the modified first plurality of key points.
  2. The computer-implemented method of claim 1, wherein each of the first plurality or second plurality of key points include points representative of a plurality of anatomical locations on the face, wherein said anatomical locations include points located on the eyebrows, eyes, nose, and lips.
  3. The computer-implemented method of claim 1, wherein the texture map comprises a plurality of non-overlapping, triangular regions.
  4. The computer-implemented method of claim 1, further comprising, using said computer and executing a plurality of programmatic instructions stored in the storage unit, determining a first set of one or more proportions within the two-dimensional image, determining a second set of one or more proportions within the image of the face of the person, and determining a plurality of scaling factors, wherein each of said scaling factors is a function of one of said first set of one or more proportions and a corresponding one of said second set of one or more proportions and wherein each of said plurality of scaling factors is a ratio of one of said first set of one or more proportions to the corresponding one of said second set of one or more proportions.
  5. The computer-implemented method of claim 1, further comprising, using said computer and executing a plurality of programmatic instructions stored in the storage unit, determining a first set of one or more proportions within the two-dimensional image and determining a second set of one or more proportions within the image of the face of the person, wherein the determining the first set of one or more proportions within the two-dimensional image comprises determining proportions from measurements between at least two anatomical positions on the face of the person.
  6. The computer-implemented method of claim 1, further comprising, using said computer and executing a plurality of programmatic instructions stored in the storage unit, determining a first set of one or more proportions within the two-dimensional image and determining a second set of one or more proportions within the image of the face of the person, wherein the determining a first set of one or more proportions within the two-dimensional image comprises determining a first anatomical distance and dividing said first anatomical distance by a second anatomical distance.
  7. The computer-implemented method of claim 6, wherein the first anatomical distance is at least one of a lateral face width, a lateral jaw width, a lateral temple width, a lateral eyebrow width, a lateral chin width, a lateral lip width, and a lateral nose width and wherein the second anatomical distance is a distance between two temples of the face.
  8. The computer-implemented method of claim 6, wherein the first anatomical distance is at least one of a vertically defined lip thickness, a vertical distance between a nose and a nose bridge, a vertical distance between a lip and a nose bridge, a vertical distance between a chin and a nose bridge, a vertical eye length, and a vertical distance between a jaw and a nose bridge and wherein the second anatomical distance is at least one of a distance between two anatomical positions on said face and a distance between two temples of the face.
  9. The computer-implemented method of claim 6, wherein the first anatomical distance is a distance between two anatomical positions on said face and the second anatomical distance is a distance between a point located proximate a left edge of a left eyebrow of the face and a point located proximate a right edge of a right eyebrow of the face.
  10. The computer-implemented method of claim 1, wherein the determining a second set of one or more proportions within the image of the face of the person comprises determining a first anatomical distance and dividing said first anatomical distance by a second anatomical distance.
  11. The computer-implemented method of claim 10, wherein the first anatomical distance is at least one of a lip thickness, a distance between a nose and a nose bridge, a distance between a lip and a nose bridge, a distance between a chin and a nose bridge, an eye length, and a distance between a jaw and a nose bridge of the image of the face of the person and wherein the second anatomical distance is a distance between two anatomical positions on said image of the face of the person.
  12. The computer-implemented method of claim 10, wherein the first anatomical distance is a distance between two anatomical positions on said image of the face of the person and the second anatomical distance is a distance between a point located proximate a left edge of a left eyebrow of the image of the face of the person and a point located proximate a right edge of a right eyebrow of the image of the face of the person.
  13. A computer readable non-transitory medium comprising a plurality of executable programmatic instructions wherein, when said plurality of executable programmatic instructions are executed by a processor, a process for increasing a dimensional depth of a two-dimensional image to yield an augmented reality (AR) face mask is performed, said plurality of executable programmatic instructions comprising: programmatic instructions, stored in said computer readable non-transitory medium, for acquiring from the storage unit the two-dimensional image; programmatic instructions, stored in said computer readable non-transitory medium, for identifying a first plurality of key points on the two-dimensional image; programmatic instructions, stored in said computer readable non-transitory medium, for identifying a second plurality of key points on the two-dimensional image; programmatic instructions, stored in said computer readable non-transitory medium, for generating a texture map of the two-dimensional image; programmatic instructions, stored in said computer readable non-transitory medium, for translating said texture map of the two-dimensional image onto an image of a face of a person acquired from a camera; programmatic instructions, stored in said computer readable non-transitory medium, for modifying the first plurality of key points based on the second plurality of key points; and programmatic instructions, stored in said computer readable non-transitory medium, for outputting the AR face mask image based on the modified first plurality of key points.
  14. The computer readable non-transitory medium of claim 13, wherein each of the first plurality or second plurality of key points include points representative of a plurality of anatomical locations on the face, wherein said anatomical locations include points located on the eyebrows, eyes, nose, and lips.
  15. The computer readable non-transitory medium of claim 13, wherein the texture map comprises a plurality of non-overlapping, triangular regions.
  16. The computer readable non-transitory medium of claim 13, further comprising programmatic instructions, stored in said computer readable non-transitory medium, for determining a first set of one or more proportions within the two-dimensional image and determining a second set of one or more proportions within the image of the face of the person, wherein the determining the first set of one or more proportions within the two-dimensional image comprises determining proportions from measurements between at least two anatomical positions on the face of the person.
  17. The computer readable non-transitory medium of claim 13, further comprising programmatic instructions, stored in said computer readable non-transitory medium, for determining a first set of one or more proportions within the two-dimensional image, for determining a second set of one or more proportions within the image of the face of the person, and for determining a plurality of scaling factors, wherein each of said scaling factors is a function of one of said first set of one or more proportions and a corresponding one of said second set of one or more proportions and wherein each of said plurality of scaling factors is a ratio of one of said first set of one or more proportions to the corresponding one of said second set of one or more proportions.
  18. The computer readable non-transitory medium of claim 13, further comprising programmatic instructions, stored in said computer readable non-transitory medium, for determining a first set of one or more proportions within the two-dimensional image and determining a second set of one or more proportions within the image of the face of the person, wherein the determining a first set of one or more proportions within the two-dimensional image comprises determining a first anatomical distance and dividing said first anatomical distance by a second anatomical distance.
  19. The computer readable non-transitory medium of claim 18, wherein the first anatomical distance is at least one of a lateral face width, a lateral jaw width, a lateral temple width, a lateral eyebrow width, a lateral chin width, a lateral lip width, and a lateral nose width and wherein the second anatomical distance is a distance between two temples of the face.
  20. The computer readable non-transitory medium of claim 18, wherein the first anatomical distance is at least one of a vertically defined lip thickness, a vertical distance between a nose and a nose bridge, a vertical distance between a lip and a nose bridge, a vertical distance between a chin and a nose bridge, a vertical eye length, and a vertical distance between a jaw and a nose bridge and wherein the second anatomical distance is a distance between two temples of the face.
  21. The computer readable non-transitory medium of claim 18, wherein the first anatomical distance is a distance between two anatomical positions on said face and the second anatomical distance is a distance between a point located proximate a left edge of a left eyebrow of the face and a point located proximate a right edge of a right eyebrow of the face.
  22. The computer readable non-transitory medium of claim 13, wherein the determining a second set of one or more proportions within the image of the face of the person comprises determining a first anatomical distance and dividing said first anatomical distance by a second anatomical distance.
  23. The computer readable non-transitory medium of claim 22, wherein the first anatomical distance is at least one of a lip thickness, a distance between a nose and a nose bridge, a distance between a lip and a nose bridge, a distance between a chin and a nose bridge, an eye length and a distance between a jaw and a nose bridge of the image of the face of the person and wherein the second anatomical distance is a distance between two anatomical positions on said three-dimensional mesh image of the face of the person.
  24. The computer readable non-transitory medium of claim 22, wherein the first anatomical distance is a distance between two anatomical positions on said image of the face of the person and the second anatomical distance is a distance between a point located proximate a left edge of a left eyebrow of the image of the face of the person and a point located proximate a right edge of a right eyebrow of the image of the face of the person.

Disclaimer: Data collected from the USPTO may be malformed, incomplete, and/or otherwise inaccurate.