U.S. Pat. No. 11,496,814

METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR OBTAINING AND DISPLAYING SUPPLEMENTAL DATA ABOUT A DISPLAYED MOVIE, SHOW, EVENT OR VIDEO GAME

Assignee: Flick Intelligence, LLC

Issue Date: April 10, 2020

Illustrative Figure

Abstract

Method, system and computer program product for providing additional information to a handheld device (HHD) about a displayed point of interest in video programming displayed on a multimedia display. An image of the video programming captured by an HHD camera can be used at a remote server to identify the video programming by matching it with archived programming. If identified, additional information related to the video programming can be obtained and provided. A region within a particular frame of displayed video programming can be selected at the HHD to access additional information about a point of interest associated with the region. In response to selecting the region, the additional information can be retrieved from a remote server and displayed on the HHD or a secondary display.

Description


DETAILED DESCRIPTION

The particular values and configurations discussed in these non-limiting examples can be varied and are cited merely to illustrate at least one embodiment and are not intended to limit the scope thereof.

The embodiments now will be described more fully hereinafter with reference to the accompanying drawings, in which illustrative embodiments are shown. The embodiments disclosed herein can be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosed embodiments. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which disclosed embodiments belong. It will be further understood that terms such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.

As will be appreciated by one skilled in the art, the present invention can be embodied as a method, system, and/or a processor-readable medium. Accordingly, the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects, all generally referred to herein as a “circuit” or “module.” Furthermore, the embodiments may take the form of a computer program product on a computer-usable storage medium having computer-usable program code embodied in the medium. Any suitable computer-readable medium or processor-readable medium may be utilized including, for example, but not limited to, hard disks, USB Flash Drives, DVDs, CD-ROMs, optical storage devices, magnetic storage devices, etc.

Computer program code for carrying out operations of the disclosed embodiments may be written in an object oriented programming language (e.g., Java, C++, etc.). The computer program code, however, for carrying out operations of the disclosed embodiments may also be written in conventional procedural programming languages such as the “C” programming language, HTML, XML, etc., or in a visually oriented programming environment such as, for example, Visual Basic.

The program code may execute entirely on the user's computer, partly on the user's computer (as a stand-alone software package), partly on the user's computer and partly on a remote computer, or entirely on the remote computer. In the latter scenario, the remote computer may be connected to a user's computer through a local area network (LAN), a wide area network (WAN), or a wireless data network (e.g., WiFi, WiMax, 802.xx, or a cellular network), or the connection may be made to an external computer via most third party supported networks (for example, through the Internet using an Internet Service Provider).

The disclosed embodiments are described in part below with reference to flowchart illustrations and/or block diagrams of methods, systems, computer program products, and data structures according to embodiments of the invention. It will be understood that each block of the illustrations, and combinations of blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function/act specified in the block or blocks.

The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions/acts specified in the block or blocks.

Note that the instructions described herein such as, for example, the operations/instructions and steps discussed herein, and any other processes described herein can be implemented in the context of hardware and/or software. In the context of software, such operations/instructions of the methods described herein can be implemented as, for example, computer-executable instructions such as program modules being executed by a single computer or a group of computers or other processors and processing devices. In most instances, a “module” constitutes a software application.

Generally, program modules include, but are not limited to, routines, subroutines, software applications, programs, objects, components, data structures, etc., that perform particular tasks or implement particular abstract data types and instructions. Moreover, those skilled in the art will appreciate that the disclosed method and system may be practiced with other computer system configurations such as, for example, hand-held devices, multi-processor systems, data networks, microprocessor-based or programmable consumer electronics, networked PCs, minicomputers, tablet computers (e.g., iPad and other “Pad” computing device), remote control devices, wireless hand held devices, Smartphones, mainframe computers, servers, and the like.

Note that the term module as utilized herein may refer to a collection of routines and data structures that perform a particular task or implement a particular abstract data type. Modules may be composed of two parts: an interface, which lists the constants, data types, variables, and routines that can be accessed by other modules or routines, and an implementation, which is typically private (accessible only to that module) and which includes source code that actually implements the routines in the module. The term module may also simply refer to an application, such as a computer program designed to assist in the performance of a specific task, such as word processing, accounting, inventory management, etc. Additionally, the term “module” can also refer in some instances to a hardware component such as a computer chip or other hardware.

It will be understood that the circuits and other means supported by each block and combinations of blocks can be implemented by special purpose hardware, software or firmware operating on special or general-purpose data processors, or combinations thereof. It should also be noted that, in some alternative implementations, the operations noted in the blocks can occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, or the varying embodiments described herein can be combined with one another or portions of such embodiments can be combined with portions of other embodiments in another embodiment.

FIG. 1 illustrates a high-level diagram of portions of a system 100 including a wireless Hand Held Device (HHD) 120, located visually near a multimedia display screen, which can capture images of video programming being presented on the display screen and/or be implemented as an active pointer, in accordance with the disclosed embodiments. Note that the HHD 120 may be, for example, a wireless hand held device such as, for example, a Smartphone, cellular telephone, a remote control device such as a video game control device, a television/set-top box remote control, a tablet computing device (e.g., “iPad”) and so forth. An active pointer is one that emits a signal. Remote controls emitting an ultraviolet signal, a flashlight, or a laser pointer are examples of an active pointer. Thus an active pointing application can be incorporated into an HHD such as HHD 120 and can also communicate via wireless bidirectional communications with other devices, such as, for example, the multimedia display 121 shown in FIG. 1. Such a multimedia display can be, for example, a flat panel television or monitor or other display screen (e.g., a movie theater display screen), which in turn can be, in some scenarios, associated with a controller or set-top box, etc., or can integrate such features therein. As shown in FIG. 1, a user 122 can hold the HHD 120, which in turn can emit and receive wireless signals. The multimedia display device 121 (i.e., display device) generally includes a multimedia display area or multimedia display 126. The HHD 120 can also include an integrated digital camera and wireless data communications, which are features common in, for example, smartphone devices and tablets that are in wide use today. Configured in this manner, the HHD can capture an image (e.g., a photograph) of video programming as it is being rendered on the multimedia display 126.

When connecting and engaging in bidirectional communication with the multimedia display, information can be embedded in the emitted signal, such as identification information that identifies the pointer or HHD 120 or that identifies the user 122. The user 122 can be recognized by the HHD 120 or pointer in a variety of ways, such as by requiring a passcode tapped out on one or more buttons, a unique gesture detected by accelerometers, or biometric data such as a fingerprint, retinal pattern, or one of the body's naturally occurring and easily measurable signals, such as a heartbeat, which has been shown to often contain unique traits. One of the datums often encoded in the emitted signal is the state of a selection button. One very simple case would involve the use of a laser pointer as the HHD 120 that is simply on or off. An advantage of embedding a unique identifier in the emitted signal is that multiple pointers can be used simultaneously.

A pointer detector 114 can read and track the emitted signal. The pointer detector 114 typically has multiple sensors and can determine where the pointer or HHD 120 is pointing. In many cases the pointer detector 114 and the multimedia display, such as multimedia display 126, are separate units that can be located near one another. A calibration routine can be utilized so that the pointer detector 114 can accurately determine where on the display 126 the pointer is aimed. The output of the pointer detector 114 can include the pointer's aim point and the status of the selection button or other selection actuator.

The multimedia display device 121, and hence the multimedia display 126, presents video data such as a movie, television show, or sporting event to the user 122. The video data can be annotated by associating it with annotation selection data. The annotation selection data 110 specifies the times and screen locations at which certain scene elements, each having an element identifier 116, are displayed. The annotation selection data 110 can be included with the video data as a data channel similar to the red, green, and blue channels present in some video formats; can be encoded into blank spaces in an NTSC-type signal; can be separately transmitted as a premium service; can be downloaded and stored; or another technique by which the user has access to both the video data and the annotation data can be used.

It can be an important aspect that the annotation data and the video data be time synchronized, because selectable scene elements can move around on the display. Thus, synchronization data 106 can be utilized to ensure that the video data and annotation selection data 110 are in sync. One example of synchronization data is the elapsed time from the video start until the current frame is displayed.
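As a minimal sketch of the elapsed-time synchronization datum described above, a receiver might check agreement between the two streams as follows. The one-frame tolerance at 24 fps is an illustrative assumption, not something the disclosure specifies:

```python
def in_sync(video_elapsed, annotation_elapsed, tolerance=1 / 24):
    """Return True when the video stream and the annotation selection
    data agree on elapsed time from video start to within one frame
    period (24 fps assumed purely for illustration)."""
    return abs(video_elapsed - annotation_elapsed) <= tolerance
```

A player could call this each frame and re-seek the annotation stream whenever it returns False.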

The annotation data can specify selectable zones. A simple image can have selectable areas. A video, however, is a timed sequence of images. As such, a selectable zone is a combination of selectable areas and the time periods during which the areas are selectable. A selectable zone has coordinates in both space and time and can be specified as a series of discrete coordinates, a series of temporal extents (time periods) and areal extents (bounding polygons or closed curves), as areal extents that change as a function of time, etc. Areal extents can change as a function of time when a selectable area moves on the display, changes size during a scene, or changes shape during a scene.
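The combination of areal and temporal extents described above can be sketched as a simple data structure. A rectangular bounding box and these field names are illustrative assumptions; the disclosure also contemplates polygons, closed curves, and extents that change over time:

```python
from dataclasses import dataclass


@dataclass
class SelectableZone:
    """A selectable zone with both spatial and temporal coordinates.

    Spatial extent is an axis-aligned box in normalized screen
    coordinates; temporal extent is seconds from video start.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    t_start: float
    t_end: float
    element_id: str

    def contains(self, x: float, y: float, t: float) -> bool:
        """True if the cursor coordinate (x, y, t) lies in the zone."""
        return (self.x_min <= x <= self.x_max
                and self.y_min <= y <= self.y_max
                and self.t_start <= t <= self.t_end)
```

A zone whose area moves during a scene could be represented as a sequence of such records with adjacent time periods.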

For brevity, the terms “selectable zone” and “cursor coordinate” will be used in this disclosure as having both spatial and temporal coordinates. As discussed above, a selectable zone can be specified in terms of on-screen extents and time periods. Similarly, a cursor coordinate combines the aim point and a media time stamp. Annotated scene elements can be specified as selectable zones associated with one or more element identifiers, such as, for example, element identifier 116.

U.S. Pub No.: 20080066129 A1 submitted by Katcher et al., titled: “Method and Apparatus for Interaction with Hyperlinks in a Television Broadcast” was filed Nov. 8, 2007 and is herein incorporated by reference in its entirety. U.S. Pub No.: 20080066129 A1 teaches and discloses the annotation of video, video display systems, video data systems, databases, and viewer interaction with annotated video to obtain information about on-screen items. It is for its teachings of video, video display systems, video data systems, databases, viewer interaction with annotated video, and viewer obtainment of information about on-screen items that U.S. Pub No.: 20080066129 A1 is herein incorporated by reference in its entirety.

U.S. Pub No.: 20100154007 A1 submitted by Touboul et al., titled: “Embedded Video Advertising Method and System” was filed Apr. 21, 2009 and is herein incorporated by reference in its entirety. U.S. Pub No.: 20100154007 A1 teaches and discloses the annotation of video, video display systems, video data systems, databases, and viewer interaction with annotated video to obtain information about on-screen items. It is for its teachings of video, video display systems, video data systems, databases, viewer interaction with annotated video, and viewer obtainment of information about on-screen items that U.S. Pub No.: 20100154007 A1 is herein incorporated by reference in its entirety.

As also shown in FIG. 1, a cursor control module 108 can change the appearance of a displayed cursor 124 based on the cursor coordinate. The aim point can be passed to the cursor control module 108, which also receives synchronization data. The synchronization data can indicate the displayed scene, elapsed time within a scene, elapsed time from video start, the media time stamp, or other data that helps determine the media time stamp. The cursor coordinate can be determined from the aim point and the media time stamp. The cursor control module 108 can examine the annotation selection data for the cursor coordinate to determine if the pointer is aimed at an annotated scene element. In terms of selectable zones, the cursor coordinate either lies within a selectable zone or it does not. If it does, the cursor control module 108 can cause a “selectable” cursor to be displayed at the aim point. Otherwise, a different cursor can be displayed.
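The selectable-zone test performed by the cursor control module can be sketched as a small lookup. The tuple layout and the returned cursor-style labels are illustrative assumptions, not the patent's specification:

```python
# Each zone: (x_min, y_min, x_max, y_max, t_start, t_end, element_id);
# this flat layout is invented for the sketch.
def choose_cursor(zones, x, y, t):
    """Map a cursor coordinate (aim point plus media time stamp) to a
    cursor style: 'selectable' with the element identifier when the
    coordinate lies inside a zone, else a default cursor."""
    for (x0, y0, x1, y1, t0, t1, element_id) in zones:
        if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1:
            return ("selectable", element_id)
    return ("default", None)
```

The cursor control module would run this once per frame (or per pointer event) against the currently loaded annotation selection data.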

The cursor control module 108 can communicate with a media device 104, which in turn can communicate with the display device 121. The media device 104 also receives data from the annotated video data 102 and then sends data as, for example, synchronization data 106 to the cursor control module 108. The pointer detector 114 can transmit, for example, pointer aim point and/or selection aimpoint information 112 to the cursor control module 108. Additionally, annotation selection data 110 can be provided to the cursor control module 108 and also to one or more element identifiers, such as, for example, the element identifier 116, which in turn can send information/data to a video annotation infrastructure.

Rather than, or in addition to, engaging in bi-directional communication with a multimedia display to retrieve data about a scene or determine the identity of the scene, the HHD 120 can capture an image of the scene (e.g., video programming) as it is being rendered (displayed) on the display device 121. The image can be captured with a digital camera commonly found integrated in HHDs. The captured image can then be used by the HHD to determine the identity of the video programming and obtain additional information about it by communicating with a remote server over a data communication network, using wireless communication capabilities also commonly found in HHDs. Additional teachings of video capture, programming identification, and access to related data are further discussed with respect to FIG. 4 below.

FIG. 2 illustrates two cursors 125 and 127 being displayed over selectable and non-selectable screen areas, in accordance with the disclosed embodiments. The screen area occupied by the “Bletch Cola” image 123 is within a selectable zone, and a solid ring cursor 127 is shown. Other areas are not inside selectable zones, and an empty ring (i.e., cursor 125) is shown. Any easily distinguished set of cursors can be utilized, because the reason for changing the cursor is to alert a user, such as user 122, that the screen element is selectable.

Returning to FIG. 1, aiming at a selectable scene element can result in a “selectable” cursor being displayed. The selectable element can be selected by actuating a trigger or button on the pointing device. Other possibilities include letting the cursor linger over the selectable element, awaiting a “select this” query to pop up and pointing at that, or performing a gesture with the pointing device/HHD 120.

The annotation data can include additional data associated with the selectable areas or selectable zones such as element identifiers, cursor identifiers, cursor bitmaps, and zone executable code. The zone executable code can be triggered when the cursor enters a selectable zone, leaves a selectable zone, lingers in a selectable zone, or when a selection type event occurs when the cursor coordinate is inside a selectable zone. The zone executable code can augment or replace default code that would otherwise be executed when the aforementioned events occur. For example, the default code for a selection event can cause an element identifier to be embedded in an element query that is then sent to a portion of the video annotation infrastructure. The zone executable code can augment the default code by sending tracking information to a market research firm, playing an audible sound, or some other action. In general, the additional data allows for customization based on the specific selectable zone.
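One way to sketch zone executable code that augments default behavior is an event dispatcher along the following lines. The event names, handler signatures, and action strings are all invented for illustration:

```python
# Default behavior per event; a selection event queues an element
# query, mirroring the default code described above.
DEFAULT_HANDLERS = {
    "select": lambda element_id: [f"query:{element_id}"],
}


def dispatch(event, element_id, zone_hooks):
    """Run the default action for an event, then any zone-specific
    hook, returning the combined list of actions.

    zone_hooks maps event names ('enter', 'leave', 'linger', 'select')
    to callables; a hook augments the default rather than replacing
    it in this sketch, though replacement is equally possible.
    """
    actions = list(DEFAULT_HANDLERS.get(event, lambda _: [])(element_id))
    hook = zone_hooks.get(event)
    if hook:
        actions.extend(hook(element_id))
    return actions
```

A zone carrying tracking code, for instance, would supply a "select" hook that appends a tracking action after the default query.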

FIG. 3 illustrates a system 300 that includes an HHD 120 and a multimedia display 126, in accordance with the disclosed embodiments. System 300 of FIG. 3 is similar to system 100 of FIG. 1, with some notable exceptions. System 100 of FIG. 1 includes “annotated video data” (i.e., annotated video data 102), wherein the annotation selection data 110 is distributed with the video data. In system 300 of FIG. 3, the annotation selection data 110 is obtained from some other source. As in FIG. 1, however, the media device 104 still provides the synchronization data 106 used for determining the time datum for the cursor coordinate. Another difference is that a passive pointing device, such as a hand, can be utilized. A device such as, for example, the Microsoft Kinect® can determine an aim point, selection gesture, and other actions by analyzing the user's stance, posture, body position, or movements. Of course, HHD 120 may be, as indicated earlier, a Smartphone, game control device, tablet or pad computing device, and so forth. It can be appreciated, for example, that the HHD 120 and the pointer detector 113 may be the same device, or associated with one another.

FIG. 3 illustrates an element identification module 115 examining the cursor coordinate and annotation data to determine an element identifier 116. The element identifier and some user preferences 140 can be combined to form a scene element query 142. The user preferences can include the user's privacy requests, location data, data from a previous query, or other information. Note that the privacy request can be governed by law. For example, the US has statutes governing children's privacy, and many nations have statutes governing the use and retention of anyone's private data.

The user's element query can pass to a query router 144. The query router 144 can send the query to a number of servers based on the element identifier 116 (e.g., element ID), the user's preferences, etc. Examples include: local servers that provide information tailored to the user's locale; product servers that provide information directly from manufacturers, distributors, or marketers; ad servers, wherein specific queries are routed to certain servers in return for remuneration; and social network servers that share the query or indicated interest amongst the user's friends or a select subset of those friends. The query router 144 can route data to, for example, one or more scene element databases 148, which in turn provide information to an information display module 150. The information display module 150 can assemble the query responses and display them to a user, such as user 122.
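The routing decision described above might be sketched as follows. The prefix-based route table, the preference keys, and the server names are all hypothetical, chosen only to mirror the locale/product/social examples in the text:

```python
def route_query(element_id, preferences, routes):
    """Return the list of destination servers for a scene element
    query, based on the element identifier and user preferences.

    routes maps element-id prefixes (e.g., 'ad-') to server lists;
    locale and social servers are appended only when the user's
    preferences allow sharing that data.
    """
    servers = []
    for prefix, destinations in routes.items():
        if element_id.startswith(prefix):
            servers.extend(destinations)
    if preferences.get("share_location"):
        servers.append("local-info-server")
    if preferences.get("share_social"):
        servers.append("social-network-server")
    return servers
```

Routing by prefix is just one possible keying scheme; a real router could equally consult remuneration agreements or per-user history.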

Note that FIG. 3 also illustrates a secondary multimedia display 129 on which query results can be presented. One response is an advertisement, another is an offer for a coupon, and a third is an offer for online purchase. The secondary display 129 can be, for example, a cell phone, computer, tablet, television, or other web-connected device.

FIG. 4 illustrates a system 400 that includes a user 122 accessing video information, video annotation infrastructure, and/or video programming identification with an HHD 120 having a video camera and adapted to capture a photographic image of video programming being displayed on the display screen 121, and/or alternatively accessing video annotation infrastructure through an augmented reality device 161, in accordance with the disclosed embodiments. Here, the augmented reality (AR) device 161 can include a local display that the user 122 can observe, a video camera, and other devices for additional functionality. One example is that the display is a movie screen and the user 122 is one of many people watching a movie in a theater. The multimedia display device 121 in this example can include a movie screen and projector. The movie screen can include markers that help the AR device 161 determine the screen location and distance. The user 122 can watch the movie through the AR device 161, with the AR device 161 introducing a cursor into the user's view of the movie.

In the particular embodiment or example shown in FIG. 4, the AR device 161 can be one having a forward pointing camera and a display presenting what is in front of the AR device 161 to the user 122. In essence, users can view the world through their own eyes or can view an augmented version by viewing the world through the device. As seen in FIG. 4, the user 122 is aiming the camera at the display and is, oddly, watching the display on a secondary display 129 almost as if the AR device 161 were a transparent window, with the exception that the secondary display 129 can also present annotation data, can zoom in or out, and can overlay various other visual elements within the user's view.

The AR device 161 can thus include a number of modules and features, such as, for example, a scene alignment module 160, pointer aimpoint and/or selection aimpoint 162, and a cursor control module 164. Other possible modules and/or features can include a synchronization module 166 and an element identification module 170, along with an annotation data receiver 172, annotation selection data 174, a user authentication module 178, user preferences 180, an element identifier 176, a user's scene element query 182, a query router 184, a local scene element database 186, and an information overlay module 188. The display 129 can be, in some examples, a local display. The information overlay module can also access, for example, data from a remote scene element database 190.

In the movie theater scenario discussed above, the theater can transmit synchronization data 168, for example, as a service to its customers, perhaps as a paid service or premium service. Alternatively, the AR device 161 can include a synchronization module 166 that obtains synchronization data 168 from some other source, perhaps by downloading video fingerprinting data or by processing signals embedded within the movie itself.

The scene alignment module 160 can locate the display position from one or more of the markers 153, 155, and 157, by simply observing the screen edges, or by some other means. The display position can be used to map points on the local display, such as the cursor, to points on the main display (movie screen). Combining the cursor position and the synchronization data gives the cursor coordinate. Annotation selection data can be downloaded into the AR device 161 as the movie displays, ahead of time, or on demand. Examining the annotation selection data for the cursor coordinate can yield the element identifier 176.
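Mapping a point on the AR device's local display to the main display can be sketched with an axis-aligned rectangle, assuming the scene alignment step has already located the observed screen's corners. This is a deliberate simplification; a real implementation would typically fit a homography to the markers to handle perspective:

```python
def map_to_screen(px, py, left, top, right, bottom):
    """Map a local-display point (px, py) to normalized (u, v)
    coordinates on the main display, given the main display's observed
    bounding rectangle in local-display pixels.

    Returns None when the point falls outside the observed screen.
    """
    if not (left <= px <= right and top <= py <= bottom):
        return None
    return ((px - left) / (right - left), (py - top) / (bottom - top))
```

The resulting (u, v) pair, combined with the media time stamp from the synchronization data, forms the cursor coordinate used for the annotation lookup.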

The user authentication module 178 is depicted within the AR device 161. In practice, and as discussed in relation to system 100 of FIG. 1, any of the disclosed pointing scenarios can include the use of an authentication module such as user authentication module 178. In some embodiments, the passive pointer 113 of FIG. 2, for example, can include authentication by recognizing the user's face, movements, retina, or pre-selected authentication gesture. User authentication is useful because it provides for tying a particular person to a particular purchase, query, or action.

The query router 144 can send queries to various data servers, perhaps including a local data store within the AR device 161 itself. The query results are gathered and passed to an overlay module for presentment to the user 122 via the HHD 120 and/or via the multimedia display 126 and/or the multimedia display 129. The local display 129 on the AR device 161 shows the cursor 127 and the query results overlaid on top of the displayed image of the movie screen. Here, a movie is used as an example, whereas the display the user is viewing through the AR device 161 can be any display, such as a television in a home, restaurant, or other social venue.

An HHD 120 including a camera, such as a smartphone or tablet, can be used instead of an augmented reality device (although the AR device features and functions could be carried out by a smartphone) to capture an image of the scene being displayed on the display device 121 for the purpose of identifying the scene and obtaining additional information related to the scene, such as elements of the scene described above. Rather than integrating or communicating wirelessly with the local infrastructure to accomplish identification and data retrieval, however, the HHD 120 can communicate with a remote server over a data communication network and provide the captured image of the scene (video programming displayed on the multimedia display 121), where the scene can be identified by matching the captured image to images of video programming stored in a database, e.g., remote scene element database 190, associated with the remote server. If the server can match the scene and identify it, the server can then notify the HHD 120 of the availability of data related to the scene (e.g., information regarding scene elements). Scene elements can be selected on a touch-sensitive display screen commonly included with HHDs by selection of the area or element of interest within the scene, which can then enable the server to provide additional information about the selected area/scene element. The server and database can be provided in the form of a service where registered users can determine the identity of a scene (e.g., a captured image of live or recorded video programming) being displayed on any screen, utilizing a captured image to match, identify, and provide data about or related to the scene. Databases would require continuous updating in the event programs are live. Live scenes (e.g., live football or baseball games) would not likely be tagged or would have limited scene selection capabilities, whereas recorded programs (e.g., movies) would be updated as scene tagging is updated.
In either case, additional data can be collected and provided to users. Additionally, advertising content can be provided before, during or after the provision of scene-related data, thereby enabling revenue generation should the service be free to end users. An application ("App") supporting this aspect of the embodiments can be downloaded from a server supporting the HHD (e.g., the iOS or Android app stores).

FIG. 5 illustrates an annotated group video 500 of a user's social contacts, in accordance with the disclosed embodiments. Automated facial recognition applications already exist for annotating still images; this video version combines the selectable areas of single images into the selectable zones of annotated video. In this application, selection of a scene element equates to selecting a person. The selection event can trigger bringing up the person's contact information and personal notes, and can even initiate contact through telephone, video chat, text message, or some other means. Additionally, lingering the cursor (or a sensed finger over a touch-sensitive screen) over a face can bring up that person's information.

An interesting variation is that crowd videos or images can be artificially created. For example, an image of a friend can be isolated within a picture or video to create content with only that one person. The content from a set of friends or contacts can be combined to form a virtual crowd scene. This, combined with today's pinch-to-zoom type technology for touch screens, yields an interesting interface to a contacts database. Selecting a person with the cursor, or by tapping a touch screen overlying the display, can automatically open a video chat window such as the "Vikram" window, which can show Vikram's image and an indication that Vikram hasn't responded. Examples of a "not-connected" or "pending" indicator include a dimmed or greyed window, a thin border frame, and an overlaid pattern such as the "connection pending" text shown or a connection pending icon. An established connection can be indicated by a different means, such as the thick border frame shown in FIG. 5.

FIG. 6 illustrates a system 600, which includes differences between cursor control and element identification, in accordance with the disclosed embodiments. System 600 includes, for example, a pointer detection module 197, which can send data to cursor control module 108, which in turn can provide data 192 indicative of cursor style and location to, for example, a display device such as display device 121 or even secondary displays such as multimedia display 129. Annotation selection data 110 can also be provided to the element identification module 115. As shown in FIG. 6, the pointer detection module 197 can determine a cursor coordinate. As already discussed, a cursor coordinate can include both a location on a display screen and a time stamp for synchronizing the time and position of the cursor with the time-varying video image. The annotation data 196 can include selection zones such that the cursor control module determines whether the cursor coordinate lies within a selection zone and determines the cursor style and location to present on the display device. Note that this example assumes that only default "selectable" and "not-selectable" styles are available.

Element identification requires slightly more information, and the figure indicates this by showing "annotation selection data," which can be the selection data augmented with element identifiers associated with each selectable zone. The element identification module receives data from a selection event that includes the time stamp and the cursor display location at the moment the selection was made. Recall that a selection can be made with an action as simple as clicking a button on a pointing device. The element identification module 115 can examine the annotation data 196 to determine which, if any, selectable zone was chosen and then determine the element Id associated with that selection zone. The element Id can then be employed for formulating a scene element query 194.

Note that the examples do not account for differently sized displays with different resolutions. Different display sizes can be compensated for by mapping all displays to a normalized size, such as a 1.0 by 1.0 square, and similarly maintaining the selection zones in that normalized coordinate space. The aim point can be translated into normalized space to determine if the cursor location lies within a selectable zone. For example, a screen can be 800 pixels wide and 600 pixels tall. The pixel at screen location (80, 120) would have normalized location (80/800, 120/600)=(0.1, 0.2). This normalization example is intended to be non-limiting because a wide variety of normalization mappings and schemes are functionally equivalent.
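The normalization described above can be illustrated with a minimal sketch. The function and zone names here are illustrative, not from the specification; the 800 by 600 example mirrors the one in the text.

```python
def normalize_point(x_px, y_px, width_px, height_px):
    """Map a pixel coordinate onto the normalized 1.0 by 1.0 square."""
    return (x_px / width_px, y_px / height_px)

def in_zone(point, zone):
    """zone is (x_min, y_min, x_max, y_max) in normalized coordinates."""
    x, y = point
    x0, y0, x1, y1 = zone
    return x0 <= x <= x1 and y0 <= y <= y1

# The 800x600 example from the text: pixel (80, 120) -> (0.1, 0.2)
p = normalize_point(80, 120, 800, 600)
print(p)  # (0.1, 0.2)
print(in_zone(p, (0.05, 0.1, 0.3, 0.4)))  # True
```

Because both the aim point and the selection zones live in the same normalized space, the containment test is independent of the physical display resolution.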

Also note that the examples do not account for windowed display environments wherein a graphical user interface can present a number of display windows on a display device. Windowed display environments generally include directives for mapping a cursor location on a display into a cursor location in a display window. If a video is played within a display window then the cursor location or aim point must be mapped into “window coordinates” before the cursor coordinate can be determined. Furthermore, different window sizes can be compensated for by mapping all displays to a normalized size as discussed above and similarly maintaining the selection zones in that normalized coordinate space.

FIG. 7 illustrates a system 700 that depicts additional processes and methods of cursor selection, in accordance with the disclosed embodiments. As indicated at block 202 in FIG. 7, annotation data can include, for example, cursor image, template, and/or executable code data. Such information can be provided to a pointer detection module 204, which in turn can generate cursor style data 212, including, for example, cursor style and location information 210, which in turn is transmitted to a display device such as, for example, multimedia display device 121 and/or other multimedia display devices. The pointer detection module 204 also provides cursor position data and/or selection data, as shown at block 206. Such data can be provided to, for example, a cursor style selection module 214, which can communicate with and retrieve data from, for example, a cursor style database 218. Such a database 218 may be, for example, a remote database accessed over a data network such as the Internet. The data shown at block 206 can also be provided to an element identification module 208, which in turn can generate a scene element query. Note that FIG. 7 concerns presenting different on-screen cursors based on which selectable element is under the cursor. Content developers can choose, for example, the cursor, and an updateable database allows the selection to be altered over time.

Cursor styles and actions can be dependent on the selectable or non-selectable zone containing the cursor. A cursor style is the cursor's appearance. Most users are currently familiar with cursor styles such as arrows, insertion points, hands with a pointing finger, Xes, crossed lines, dots, bullseyes, and others. An example of an insertion point is the blinking vertical bar often used within text editors. The blinking insertion point can be created in a variety of ways, including as an animated GIF or as executable computer code. An animated GIF is a series of images that are displayed sequentially, one after the other, in a continuous loop. The blinking cursor example can be made with two pictures, one with a vertical bar and the next without. Another property of cursors is that they often have transparent pixels as well as displayed pixels; that is why users can often see part of what is displayed under the cursor. The insertion point can alternatively be created with a small piece of executable code that draws the vertical bar and then erases it in a continuous loop.

The specific cursor to be displayed within a selectable zone can be stored within the annotation data, can be stored within a media device or display device, or can be obtained from a remote server. In one example, the cursor enters a selectable zone and thereby triggers a cursor style selection module to query a cursor style database to obtain the cursor to be displayed. In another example, the annotation data can contain a directive that the cursor style is to be obtained from a local database. The local database can return a directive that the cursor is to be obtained from a remote database. The chain of requests can continue until eventually an actual displayable cursor is returned or until an error occurs, such as a bad database address, a timeout, or another error. On error, a default cursor can be used. Cursor styles that are successfully fetched can be locally cached.

Note that the infrastructure for obtaining a cursor style has many components that are identical or similar to those for obtaining element data. In many systems, moving the cursor into a selectable area results in a “cursor entry” event that triggers the various actions required for changing the cursor appearance. A similar event, “cursor exited” can be triggered when the cursor leaves the selectable zone. Notice that with a simple image display these events only occur when the cursor moves into and out of a selectable area. Video data, having a time axis as well as the other axes, can have selectable zones that appear, move, and disappear as the video is displayed. As such, the cursor entered and cursor exited events can occur without cursor movement.

Another event is the "hover event," which can occur when the cursor remains within a selectable zone for a set period of time. A hover event can cause a change within or near the selectable zone. One example is that a small descriptive text box appears.
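Because selection zones in video can appear, move, and disappear frame by frame, "cursor entered" and "cursor exited" events can fire even while the cursor is stationary, and a hover fires after a dwell. A minimal sketch, with an illustrative per-frame zone layout and a hypothetical hover threshold measured in frames:

```python
def zone_at(frame, cursor):
    """Return the id of the selectable zone containing the cursor, if any."""
    for zid, (x0, y0, x1, y1) in frame.items():
        if x0 <= cursor[0] <= x1 and y0 <= cursor[1] <= y1:
            return zid
    return None

def track_events(frames, cursor, hover_frames=2):
    events, current, dwell = [], None, 0
    for t, frame in enumerate(frames):
        zid = zone_at(frame, cursor)
        if zid != current:                      # zone change, cursor may be still
            if current is not None:
                events.append((t, "exited", current))
            if zid is not None:
                events.append((t, "entered", zid))
            current, dwell = zid, 0
        elif zid is not None:
            dwell += 1
            if dwell == hover_frames:           # dwell long enough -> hover event
                events.append((t, "hover", zid))
    return events

# A zone drifts under a stationary cursor at (0.5, 0.5), then drifts away.
frames = [
    {},                                   # frame 0: nothing under the cursor
    {"cola": (0.4, 0.4, 0.6, 0.6)},       # frame 1: zone appears -> entered
    {"cola": (0.4, 0.4, 0.6, 0.6)},       # frame 2
    {"cola": (0.4, 0.4, 0.6, 0.6)},       # frame 3: dwell reached -> hover
    {"cola": (0.7, 0.7, 0.9, 0.9)},       # frame 4: zone moves away -> exited
]
print(track_events(frames, (0.5, 0.5)))
```

Note that every event in this example is produced without any cursor movement, exactly the behavior the text attributes to time-varying selectable zones.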

FIG. 8 illustrates a graphical view of a system 800 that includes a multimedia display 126 with an annotated element 220, in accordance with the disclosed embodiments. FIG. 8 thus depicts a change within the selectable area wherein a finger is detected within the selectable zone associated with a can of Bletch Cola. The video can be streamed to the user's display device, or the user can be using a device such as that shown in FIG. 4. At first, an event such as the cursor entered event is triggered, and as a result the Bletch Cola is highlighted as shown in the figure. If the finger remains within the zone, then the "Get Some" text box can be displayed in response to a "hover" type event that is triggered when the finger remains within the selectable zone for more than a preset time period. A selectable zone can be highlighted in a variety of ways, such as by brightening all the individual pixels (e.g., increasing the RGB color values), brightening pixels having certain values or properties, tinting pixels, or passing the pixels through an image filter. Image processing software such as Photoshop and the GIMP have a wide variety of image filters that can be applied to standard images. These filters can be applied to video data by applying them to each video frame. Furthermore, the selectable zone can define a filter mask for each frame so that the image filtering operations appear within the selectable zone and can even appear to move with the selectable zone.
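The highlight-by-brightening idea above can be sketched as a per-frame filter applied through the zone's mask. The frame representation (rows of RGB tuples) and the gain value are assumptions for illustration:

```python
def highlight(frame, mask, gain=1.5):
    """Brighten only the pixels inside the selectable zone's filter mask."""
    out = []
    for y, row in enumerate(frame):
        new_row = []
        for x, (r, g, b) in enumerate(row):
            if mask[y][x]:  # pixel lies inside the selectable zone
                r, g, b = (min(255, int(c * gain)) for c in (r, g, b))
            new_row.append((r, g, b))
        out.append(new_row)
    return out

frame = [[(100, 100, 100), (200, 200, 200)]]
mask = [[True, False]]     # only the first pixel is inside the zone
print(highlight(frame, mask))  # [[(150, 150, 150), (200, 200, 200)]]
```

Applying the same function to every frame, with a mask that tracks the zone, makes the highlight appear to move with the selectable zone as described.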

FIG. 9 illustrates a system 810 utilizing annotated video in a setting having multiple independent users viewing a single display, such as a movie screen, in accordance with the disclosed embodiments. System 810 generally includes, for example, a movie theater infrastructure 240 and a movie screen display 242. Movie video data 246 and/or movie annotation data 248 and/or movie permission data 250 (which can, in some embodiments, be limited to certain geographic areas and to certain time periods) can be provided to, for example, an HHD 120, which may be, for example, a Smartphone, a user's AR device, a pad computing device, etc. Coded ticket stubs (e.g., QR code) or an electronic ticket can carry additional permissions data, as indicated at block 254, which also may be provided to the HHD 120.

An important aspect of this embodiment is that users do not interfere with each other's enjoyment of the movie unless invited to do so. The venue can display video on a large display while also providing movie annotation data to people watching the video through augmented reality devices, as discussed above in reference to FIG. 3. The venue can also stream video data or annotated video data so that a user with a cell phone type device can view the large display or can view an augmented but smaller version on the cell phone. One of the issues that arises whenever users have cameras within any venue is piracy. For example, people can use augmented reality devices or cell phones to record a movie as they watch it.

Recording can be controlled by transmitting permission data to the user's device so that the user can record the movie and/or watch the recording only when located within a certain space or within a certain time period. Examples include allowing the recording to be viewed only within the theater or only on the same day as the user viewed the movie. GPS or cell tower triangulation type techniques are often used to determine location. The user's ticket to the venue can contain further permission information so that only paying customers can make recordings. Furthermore, the user can purchase or otherwise obtain additional permissions to expand the zones or times for permissibly viewing the recording.
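A sketch of such a permission check: playback is allowed only inside a permitted geographic zone and time window. The haversine radius test, the field names, and the numeric values are all assumptions for illustration, not the specification's format.

```python
import math

def within_zone(lat, lon, zone):
    """Great-circle distance (km) from the zone center, compared to its radius."""
    r = 6371.0  # mean Earth radius in km
    dlat = math.radians(zone["lat"] - lat)
    dlon = math.radians(zone["lon"] - lon)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat)) * math.cos(math.radians(zone["lat"]))
         * math.sin(dlon / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a)) <= zone["radius_km"]

def may_view(permission, lat, lon, now):
    """Allow playback only inside the permitted time window and zone."""
    return (permission["not_before"] <= now <= permission["not_after"]
            and within_zone(lat, lon, permission["zone"]))

perm = {"not_before": 1000, "not_after": 2000,
        "zone": {"lat": 35.08, "lon": -106.65, "radius_km": 1.0}}
print(may_view(perm, 35.08, -106.65, 1500))  # True: inside the theater, in time
print(may_view(perm, 35.08, -106.65, 3000))  # False: time window has expired
```

Additional permissions purchased later would simply widen the window or the radius in the permission record.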

FIG. 10 illustrates a system 820 similar to system 810 of FIG. 9, but adapted for sports venues, music venues, theatrical events, and other events (e.g., political conventions, trade shows, etc.), in accordance with the disclosed embodiments. System 820 can include, for example, an annotation infrastructure 260, a sports venue infrastructure 262, and one or more video cameras or other video recording devices and systems 264 with respect to a field of play shown at block 266. Video data 268, annotation data 270 and permissions data 272 (which can be limited to certain geographic areas and to certain time periods) can be transmitted from the sports venue infrastructure 262 to, for example, an HHD 120 (e.g., a user's Smartphone, pad computing device, AR device, etc.).

In some embodiments, the venue displays the action on a big screen and the user records that. In other embodiments, the venue streams camera data from numerous cameras and the user chooses one or more camera views. In yet other embodiments, users can point their own cameras (such as the AR device's front-facing camera) directly at the field of play. Live venues can annotate video data in near real time to thereby provide users with a full FlickIntel experience.

FIG. 11 illustrates a system 830, in accordance with the disclosed embodiments. System 830 includes a number of potential features and modules, such as, for example, fingerprinting, as indicated at block 280. Video data 120 can be provided to a fingerprinting module 280 and/or a multimedia display device such as, for example, display device 121. A data acquisition device 290 (e.g., microphone, camera, etc.) can receive data from display 121 and be subject to a synchronization data request as depicted at block 288. An identification service 282 can, for example, respond to the synchronization data request 288 and provide data 284 (e.g., media ID, frame/scene estimate, synchronization refinement data).

A synchronization module 286 can receive data 284 and data indicative of a synchronization data request and can transmit synchronization data 298, which in turn provides selection data 300 and user language selection 310. A user request 302 based on selection data 300 can be transmitted to an element identification service 304, which in turn generates element information and other data 308 (e.g., element app, site, page, profile or tag), which can be provided via messaging 318 (e.g., user email or other messaging). The element identification service 304 also provides for user language selection 310, which in turn can be provided to a language translation module 292, which in turn can provide digital voice translations 294 that are provided to a user audio device 296. The element identification service can also generate an element offer/coupon 312, which can be provided to a user electronic wallet 314 and a printing service 316. The element offer/coupon 312 can also be rendered via a user-selected display or device 320.

It is important that the annotation data be synchronized with the displayed video, and various synchronization means have been presented. System 830 of FIG. 11 includes aspects of a fingerprinting service. Video data, including the audio channel, can be fingerprinted by analyzing it for certain identifying features. Video can be fingerprinted by simplifying the image data, for example by converting color to grey scale having at least one bit per pixel, to thereby greatly reduce the amount of data per video frame. The frame data can also be reduced by transform techniques. For example, one of the current video image compression techniques is based on the discrete cosine transform (DCT). DCT coefficients have also been used in pattern recognition. As such, compressed video already contains DCT descriptors for each scene, and those descriptors can be used as fingerprints. Furthermore, video is a sequence of images and thereby presents a timed sequence of descriptors. It is therefore seen that current video compression technology provides one fingerprinting solution. In addition, the compression-type algorithms can be applied to reduced data, such as the grey-scaled video or lower resolution video, to provide fingerprints that are quicker to produce and match.
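The reduce-then-transform idea can be sketched end to end: convert a frame to grayscale, apply a DCT, and keep a few coefficients as the frame's descriptor. The tiny 4x4 frame, the luma weights, and the choice of which coefficients to keep are illustrative assumptions, not the specification's method.

```python
import math

def to_gray(frame_rgb):
    """Reduce RGB pixels to grayscale values (standard luma weights, assumed)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in frame_rgb]

def dct_1d(xs):
    """Unnormalized DCT-II of a sequence."""
    n = len(xs)
    return [sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i, x in enumerate(xs)) for k in range(n)]

def fingerprint(gray, keep=3):
    """2-D DCT (rows then columns); keep a few low-order coefficients."""
    rows = [dct_1d(row) for row in gray]
    cols = [dct_1d([rows[y][x] for y in range(len(rows))])
            for x in range(len(rows[0]))]
    return tuple(round(cols[k][0], 2) for k in range(keep))

# A small synthetic frame with a gray gradient
frame = [[(10 * x + y,) * 3 for x in range(4)] for y in range(4)]
print(fingerprint(to_gray(frame)))
```

Identical frames produce identical descriptors while visually different frames diverge, which is what makes a timed sequence of such descriptors usable as a fingerprint.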

A display device or media device can calculate fingerprints for the video data being presented to a user. Alternatively, a camera, such as that of an AR device, can record the displayed video data, and fingerprints can be calculated from the recording. The fingerprint calculations can be performed in near real time and submitted to an identification service. The identification service comprises a massive search engine containing fingerprint data for numerous known videos. The identification service accepts the fingerprints for the unknown video, matches them to a known video, and returns synchronization data. The search engine can determine a number of candidates to thereby reduce search size and can continue to receive and process fingerprints of the unknown video until only one candidate remains. Note that two of the candidates can be the same movie at different times. The fingerprint data can be calculated from a single video frame, a sequence of video frames, a portion of the soundtrack, or a combination of these or other data from the video.
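The narrowing loop described above can be sketched as follows: the service keeps consuming fingerprints of the unknown video and drops candidates whose archived sequences stop matching, until one remains. The archive layout and fingerprint strings are illustrative.

```python
def narrow(candidates, fingerprint_stream):
    """candidates: {media_id: [fp0, fp1, ...]} archived fingerprint sequences."""
    remaining = dict(candidates)
    for t, fp in enumerate(fingerprint_stream):
        # keep only candidates whose archived fingerprint at time t matches
        remaining = {mid: fps for mid, fps in remaining.items()
                     if t < len(fps) and fps[t] == fp}
        if len(remaining) <= 1:
            break
    return list(remaining)

archive = {
    "movie-a": ["f1", "f2", "f3"],
    "movie-b": ["f1", "f2", "f9"],   # diverges at the third frame
}
print(narrow(archive, iter(["f1", "f2", "f3"])))  # ['movie-a']
```

Two candidates that are the same movie at different time offsets would simply appear as two entries whose archived sequences are shifted copies of one another.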

FIG. 12 illustrates a system 840 depicting a Sync Data Request 359 that can be sent to an identification service, in accordance with an embodiment. Other embodiments can have additional or fewer elements. A Video Visual and/or Sound Data field 358 can include fingerprint data for the unknown video that is to be identified. Many of the illustrated elements can be used to reduce the size of the identification service's search. The User Id 332 identifies the user such that the user's viewing habits can be tracked. A fan of a certain weekly show is likely to be watching that show at a specific time every week. A Device ID 330 has similar utility. Location and time information can be used to limit an initial search to whatever is being transmitted at that location and time. Note that the current time is not very useful if the user is watching a recording, in which case the recording time can be included in the request. Media History 340 can be used to target the initial search at one or more shows or genres. Service type 338 can indicate satellite service, over the air, cable, or streamed, which when combined with location and time (i.e., block 336 labeled "Location and Time") greatly limits the search. Service, location, time, and channel in combination are almost certain to limit the search to a single show.

User authentication data 350 can be used to verify the user, to limit the identification service to only authorized or subscribing users, and even to tie criminal or unlawful acts, such as piracy or underage viewing, to a specific viewer. The request ID 334 is typically used to identify a matching response. Display Device Id 342 and Address 344 can be used favorably when the display device is a movie screen or similar device at a public venue, because many requests from the same venue can be combined and because the identification service might already know what is being displayed by the display device. User preferences 356 can also be designated. Note that the device id 330 and the display device id 342 may differ because the device id 330 can refer to, for example, an AR device or to a device (e.g., a Smartphone) with a display slaved to a primary display that has the display device id. Sync response history 356 can indicate recent responses from the identification service, because it is likely that a user has continued watching the same video. Sync test results 352 can also be provided.
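The request elements of FIG. 12 can be sketched as a simple record. The field names mirror the elements described above but are illustrative, not a normative wire format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SyncDataRequest:
    request_id: str                                  # matches the response later
    device_id: str                                   # e.g. an AR device
    user_id: str                                     # enables viewing-habit hints
    user_auth: Optional[str] = None                  # user authentication data
    location_and_time: Optional[tuple] = None        # ((lat, lon), epoch_secs)
    service_type: Optional[str] = None               # satellite/OTA/cable/streamed
    media_history: List[str] = field(default_factory=list)
    display_device_id: Optional[str] = None          # may differ from device_id
    sync_response_history: List[str] = field(default_factory=list)
    fingerprints: List[str] = field(default_factory=list)  # visual/sound data

req = SyncDataRequest(request_id="r-1", device_id="ar-7", user_id="u-42",
                      service_type="cable", fingerprints=["f1", "f2"])
print(req.service_type, len(req.fingerprints))  # cable 2
```

Most fields are optional, matching the text's note that other embodiments can have additional or fewer elements.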

FIG. 13 illustrates a system 850 that includes a Sync Data Response 411 that an identification service can send in response to a sync data request, in accordance with an embodiment. Other embodiments can have additional or fewer elements. A response Id 402 can be included and used later in a request's sync response history, such as the sync response history 356 shown in FIG. 12. A request Id 404 can refer to the sync data request for which a sync data response is being sent. Note that a single request can result in numerous responses if the system is configured to process numerous responses. A piracy flag 408 can indicate that the video has been identified and appears to be pirated, stolen, or used in violation of copyright, based on the User Authentication, User Id, or other data. A parental block flag 410 can indicate that the video has been identified and that it appears the user shouldn't be viewing it, based on the user's age or on permissions or policies set for the user by the user's parent, guardian, or overseer. The 'Num Candidates' field 406 indicates the number of candidate videos being indicated in the data sync response.

The identification field 360 indicates that the video has been positively matched. A Media Id 362 and a timestamp are sufficient for identifying a frame in a video. Therefore, the identification field 360 can include nothing more than a media Id 362 and a sync estimate 364. The sync estimate 364 can be a timestamp or functionally equivalent information that can specify a frame in the identified media. The time difference between sending the request and receiving the response can be indeterminate and significant enough that the sync estimate is not precise enough, and the user experience feels delayed, out of sync, jittery, or unresponsive. Test data 368 and a test algorithm 366 can be used to refine the sync estimate. The test data 368 can include the sync estimate and/or some of the fingerprinting data from the identification service. The test algorithm field 366 can specify which test algorithm to use or can include a test algorithm in the form of executable code. In any case, certain embodiments can have a synchronization module that can use a test algorithm on the test data and the video visual or sound data to produce or refine a sync estimate. A sync data field 370 is also shown.

The identification service can submit one or more candidates when the video has not been positively identified. Candidate field 380 is thus shown in FIG. 13 and refers to a first candidate, but additional candidate fields can be provided for other candidates. Candidate field 380 thus includes media id 372, sync estimate 374, test algorithm 376, test data 378, and a test sync field 380. The candidate field 380 can contain data very similar to that in an identification field 360, such that all of the candidates can be tested to determine which, if any, of the candidates is the unidentified video. In many cases a threshold test, a best match test, a likelihood test, some other test, or some combination of tests can be used to determine when the video has been positively identified.

One interesting case is when a scene has been positively identified but the scene is included in multiple videos. This case can occur when stock footage is used and when different versions exist such as a director's cut and a theatrical release. In these cases, candidates can be submitted to a synchronization module but with sync estimates that are in the future. The synchronization module can test the candidates anytime the needed fingerprint data from the unidentified video is available. In some cases, such as when a video stream is being viewed as it is downloaded, the needed fingerprint data will become available only after the time offset of the future sync estimate is reached. In other cases, the entire video is available such as when it is on a disk or already downloaded. In these cases the candidate having a sync estimate in the future can be tested immediately because all the needed data is available.

Another case that can occur is when a sync data request is submitted without any fingerprinting data (video visual and/or sound data). The other data in the request can contain sufficient information to limit the number of likely candidates. For example, the time, location, service type, and channel can almost certainly be used to fully identify a video and a sync estimate when the video starts on time and proceeds at a predictable rate. Candidates can be returned to further narrow the search, and the sync test results field contains data that helps guide the search engine. The sync test results field 352 shown in FIG. 12, for example, can also indicate when the identified video or the candidates do not match the unknown video.

Returning to FIG. 11, the synchronization module 286 can produce synchronization data 298 that can be used to keep the selection zones synchronized with the video and that can be used to synchronize alternative language tracks with the video. Videos are often streamed or distributed with only one or a select few languages. Typically, the video has a few audio tracks with different audio tracks having dialog in different languages. Dialog in additional languages can be distributed separately and can provide a good user experience if it is synchronized with the video. Multiple audio tracks can be mixed by, at, or near the user audio device, with different tracks having different content. For example, some tracks can include background or other sounds that are not dialog and some other tracks can be dialog in various languages. Mixing data can specify relative sound levels and other details necessary for mixing the tracks. Here, track refers generically to electronically encoded audio that can include numerous audio channels for surround sound or stereo sound, with some tracks distributed with the video. For example, video, which includes some audio tracks, can be distributed on a data disk, a radio frequency transmission, satellite download, computer data file, or data stream. Additional audio tracks can be distributed separately via a data disk, a radio frequency transmission, satellite download, computer data file, or data stream.
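The per-track mixing with relative sound levels described above can be sketched as follows. Representing tracks as plain sample lists and the mixing data as a gain table is an assumption for illustration.

```python
def mix(tracks, gains):
    """Sum per-track samples scaled by the mixing data's relative levels."""
    n = max(len(t) for t in tracks.values())
    out = []
    for i in range(n):
        s = sum(gains.get(name, 1.0) * (t[i] if i < len(t) else 0.0)
                for name, t in tracks.items())
        out.append(max(-1.0, min(1.0, s)))  # clip to the valid sample range
    return out

tracks = {"background": [0.2, 0.2, 0.2],    # non-dialog sounds shipped with video
          "dialog_es":  [0.5, 0.0, 0.5]}    # separately distributed Spanish dialog
gains = {"background": 0.5, "dialog_es": 1.0}
print(mix(tracks, gains))  # [0.6, 0.1, 0.6]
```

Giving each actor's dialog its own track and gain entry is how the multilingual mixing mentioned later could be realized with the same routine.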

The synchronization data 298 can be used to ensure that the additional tracks are played, perhaps after mixing with other tracks, in synchronization with the displayed video. A user can therefore select a video to watch and a preferred language. A system enabled according to one or more embodiments herein can then obtain dialog in the desired language, mix the sound tracks, and present the video to the user. The user will then hear dialog in the desired language. Certain embodiments can also include optionally displayed subtitles that can be rendered into video and overlaid on the video in synchronization with the video. A further aspect is that a user can choose to have different actors speak in different languages. The different actors can each have audio tracks in different languages that can be mixed to provide a multilingual auditory experience.

The synchronization data 298 can also be used to keep the selection data synced with the video. Selection data can be distributed separately from the video, just as additional audio tracks can be. This is particularly true when the distributed video predates an enabled system or for some other reason does not already include selection data, annotation data, or synchronization data. Recall that it is the selection data that can specify the selection zones. A selection zone can be identified from a screen coordinate and synchronization data that specifies the video or media being displayed as well as a time, frame, or likely time period. The selection data can reside in either a local data store or a remote data store.
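Resolving a selection zone from the media identity, a time, and a screen coordinate can be sketched as a lookup into separately distributed selection data. The store layout, media ids, and element ids below are illustrative assumptions.

```python
selection_data = {
    # media_id -> list of (t_start, t_end, zone_rect, element_id)
    "movie-7": [(0.0, 12.5, (0.4, 0.4, 0.6, 0.6), "bletch-cola"),
                (10.0, 30.0, (0.1, 0.1, 0.3, 0.3), "hero-watch")],
}

def zone_for(media_id, t, point):
    """Find the element id, if any, at a normalized coordinate and time."""
    x, y = point
    for t0, t1, (x0, y0, x1, y1), element_id in selection_data.get(media_id, []):
        if t0 <= t <= t1 and x0 <= x <= x1 and y0 <= y <= y1:
            return element_id
    return None

print(zone_for("movie-7", 5.0, (0.5, 0.5)))    # bletch-cola
print(zone_for("movie-7", 20.0, (0.5, 0.5)))   # None: zone no longer on screen
```

The same table could live in a local or a remote data store; only the lookup's transport would change.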

A user request can specify what the user wants. Recalling the local display 129 of FIG. 4, a first user request brought up options for "Bletch Cola" that included "fetch" and "order online". The options can be displayed to the user in a number of ways, including as graphical overlays on a display, local display, or AR display. The graphical overlays are essentially screen elements with their own selection zones that are created in response to the first user request. A second user request, for example a selection of "order online", can bring up a new set of options and selection zones for ordering some cola from an online merchant. Another possibility is that the user selects the "$1 @ Marty's Grocer" option to obtain an electronic coupon or printable coupon for use at a merchant location. It is important to note that the various selection zones can be dynamically created and dismissed (perhaps by selecting an area outside the selection zones) and that the selection data for the dynamically created selection zones can be distributed in a wide variety of ways and can be accessed locally or remotely.

A further note is that broadcasters often overlay advertising or information onto displayed video. Examples include station identifiers and graphics of varying levels of intrusiveness advertising television shows at other times. These can also be examples of dynamically created screen elements with selection zones. User action may not have caused these screen elements to appear, but those elements can be selected. The dynamically created screen elements can obscure, occlude, or otherwise overlay other selectable screen elements. The selectable elements and selection zones associated with a video overlay should usually take precedence such that a user selecting an overlaying element does not receive data or a response based on an occluded or not-currently-visible screen element.
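The precedence rule above, that an overlay's zone wins over an occluded element, can be sketched with a stacking order. The z-order field and the zone records are assumptions for illustration.

```python
def pick_zone(zones, point):
    """Return the element of the topmost selection zone containing the point."""
    x, y = point
    hits = [z for z in zones
            if z["rect"][0] <= x <= z["rect"][2]
            and z["rect"][1] <= y <= z["rect"][3]]
    return max(hits, key=lambda z: z["z"])["element"] if hits else None

zones = [
    {"element": "bletch-cola", "rect": (0.4, 0.4, 0.6, 0.6), "z": 0},
    {"element": "station-promo", "rect": (0.5, 0.5, 0.9, 0.9), "z": 10},  # overlay
]
print(pick_zone(zones, (0.55, 0.55)))  # station-promo: the overlay takes precedence
print(pick_zone(zones, (0.45, 0.45)))  # bletch-cola: outside the overlay
```

A selection inside the overlap thus never returns data for the occluded, not-currently-visible element.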

A user request can imply an element Id by specifying a selection zone or can explicitly include the element Id. In any case, element Ids can have ambiguities. An element identification service can resolve ambiguities. For example, two different videos can use the same element Id. An element identification service can use a media Id and an element Id to identify a specific thing. The element identification service can amend the user request to resolve any ambiguities and can send the user request to another service, server, or system to fulfill the user request. The illustrated examples include selecting a language, obtaining an offer for sale or coupon, obtaining information, or being directed to other data sources and services. The obtained information can be provided to the user, perhaps in accordance with the user request or a stored user profile, in email or another messaging service, or to a display device such as the user's cell phone, AR device, tablet, or television screen.
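Since the same element Id can be reused across videos, disambiguation keyed on the pair (media Id, element Id) can be sketched in a few lines. The registry contents are illustrative.

```python
registry = {
    # (media_id, element_id) -> the specific thing being identified
    ("movie-7", "el-9"): "Bletch Cola 12 oz can",
    ("movie-8", "el-9"): "hero's wristwatch",   # same element Id, different video
}

def resolve(media_id, element_id):
    """Resolve an ambiguous element Id using the media Id as context."""
    return registry.get((media_id, element_id))

print(resolve("movie-7", "el-9"))  # Bletch Cola 12 oz can
print(resolve("movie-8", "el-9"))  # hero's wristwatch
```

An element identification service could amend the user request with the resolved identity before forwarding it to another service, server, or system.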

A user can be directed to other data services by being directed to an element app, site, page, profile, or tag. An element app is an application that can be downloaded and run on the user's computer or some other computing or media device. A site can be a web site about the identified element and perhaps even a page at an online store that includes a description and sale offer. A page, profile, or tag can refer to locations, information, and data streams within a social media site or service and associated with the identified element.

Returning to the multilingual example, a user can select a menu icon, a menu button, or in some other way cause a menu of viewing options to appear on one of the screens upon which the user can select screen elements. One of the options can provide the user with available voice translations at which time the user selects a language or dialect.

FIG. 14 illustrates a system 860 for maintaining unique identifiers for elements that appear in videos and for using those unique identifiers to scatter and gather user requests and responses, in accordance with the disclosed embodiments. A universal element identifier (UEI) 420 is an identifier with an element descriptor 422 that is uniquely identified with a certain thing, such as "Bletch Cola", so that whenever that thing appears on screen with a selectable zone, it is associated with the UEI. For example, every advertisement and product placement of Bletch Cola can be associated with that one UEI, thereby greatly simplifying the routing of and responses to user requests. A UEI generator may, for example, generate data for storage in a UEI database 480.

The UEI database 480 can be kept on one or more servers (e.g., server 424), with the database storing data including, for example, one or more element records, such as, for example, element record 460. The UEI database 480 may be responsive to, for example, an element admin request 484. The element record 460 can be created in response to a request that includes an element descriptor. The request can be sent to a UEI server 424 or directly to the database server 480. In many cases, to help ensure data security, the UEI server 424 and the database 480 can be maintained on separate computer systems having extremely limited general access but with widely available read-only access for UEI servers. The element record 460 can contain data including, but not limited to, for example, the universal element Id 474, registered response servers 472, element descriptors 470, registration data 466, administrator credentials 468, a cryptographic signature 464, and an element public key 462. The UEI server 424 can generate a UEI 426.
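A minimal sketch of an element record, mirroring the fields listed above, could look like the following. The dataclass shape, field types, and the `create_record` helper are illustrative assumptions rather than the patented record format.

```python
# Sketch of an element record holding the data enumerated in the text.
# Comments note the corresponding reference numerals for orientation.
from dataclasses import dataclass, field

@dataclass
class ElementRecord:
    universal_element_id: str                  # universal element Id 474
    element_descriptors: list                  # element descriptors 470
    registered_response_servers: list = field(default_factory=list)  # 472
    registration_data: dict = field(default_factory=dict)            # 466
    admin_credentials: str = ""                # administrator credentials 468
    cryptographic_signature: bytes = b""       # cryptographic signature 464
    element_public_key: bytes = b""            # element public key 462

def create_record(uei, descriptor):
    """Create an element record in response to a request with a descriptor."""
    return ElementRecord(universal_element_id=uei,
                         element_descriptors=[descriptor])
```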

A UEI request 420 is an example of an administrative request, an example of which is also shown at block 484. Element administrators can have read, write, and modify access to element records and should be limited to writing and modifying only those records for which they are authorized. Different records can have different administrators. Element administrators have far more limited access and rights than server or database administrators. Element records can contain admin credentials specifying who can administer the record, or similar password or authentication data. The admin credential can also contain contact information, account information, and billing information (if the UEI is a paid service). In many cases the admin credentials can refer to an administrator record in a database table. The important aspects are that only authorized admins can alter the element record, that the admins can be contacted, and that the record can be paid for if it is a paid service. Paid services can have various fees and payment options such as one-time fees, subscription fees, prepayment, and automatic subscription renewal.

The element descriptor 422 is data describing the thing that is uniquely identified. It can be whatever an element record admin desires, such as simply "Bletch Cola" or something far more detailed. Registration data can include a creation date and a log of changes to the record. It can be important that responses to user queries come from valid sources, because otherwise a miscreant can inject a spurious and perhaps even dangerous response. A cryptographic signature and element public key are two parts of a public key encryption technique. Selectable zones and UEIs can include or be associated with a public key. Responses to user queries can be signed by the responder (the signer holding the corresponding private key) and verified with the element public key; queries can likewise be signed by the sender and verified by the recipient.
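The sign-then-verify flow can be sketched as below. Note a loudly stated substitution: the Python standard library has no public-key primitives, so this sketch uses an HMAC (a shared-secret scheme) purely as a stand-in to illustrate that unverifiable responses are rejected; the text describes an asymmetric scheme in which verifiers hold only the element public key.

```python
# Symmetric (HMAC) stand-in for the public-key signing described in the
# text, used only to illustrate the sign/verify flow. A real deployment
# would sign with a private key and verify with the element public key.
import hashlib
import hmac

def sign_response(key: bytes, payload: bytes) -> bytes:
    """Produce a signature over a response payload."""
    return hmac.new(key, payload, hashlib.sha256).digest()

def verify_response(key: bytes, payload: bytes, signature: bytes) -> bool:
    """Reject spurious responses whose signatures do not verify."""
    return hmac.compare_digest(sign_response(key, payload), signature)
```

A gathering module would drop any response for which `verify_response` returns `False`, defeating the injected responses discussed below.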

The element record 460 can contain a list of registered response servers. A query routing module 426 can obtain the list of registered response servers and direct user queries to one or more of the registered servers, as shown at block 448. Note that a query to one or more unregistered servers is depicted at block 450, which in turn will provide data to an unregistered data server 456. Unsigned element data 444 and/or signed element data 446 may be output from the unregistered data server 456 and in turn can be provided to an element data gathering module, which in turn can provide data to an element data resolution module, which in turn can provide element data and presentation directives, as shown at block 440.

A typical query 428 may contain, for example, a UEI 430 and a query type 432. The query routing module 426 can be a purpose-built hardware device, such as an Internet router, or can be executable code run by a computing device. The list of registered servers can be obtained directly from the UEI database server 480 or from a different but authorized server (or responder) that has obtained the list. Public key techniques can ensure that the responder is authorized or that the response contains signed and valid data.
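The routing step can be sketched as follows. The registry contents, server names, and the `send` callback are hypothetical; in practice `send` might be an HTTPS request to each registered response server.

```python
# Sketch of the query routing module: a query carries a UEI and a query
# type; the router looks up the registered response servers for that UEI
# and fans the query out to each of them.
REGISTERED_SERVERS = {
    "uei-bletch-cola": ["responder-a.example", "responder-b.example"],
}

def route_query(query, send):
    """Send `query` to each registered server; return the servers contacted."""
    servers = REGISTERED_SERVERS.get(query["uei"], [])
    for server in servers:
        send(server, query)      # e.g., an HTTPS POST in a real system
    return servers

# Usage: record where a coupon query would be sent.
sent = []
query = {"uei": "uei-bletch-cola", "query_type": "coupon"}
route_query(query, lambda server, q: sent.append(server))
```

Returning the contacted server list also tells a downstream gathering module how many responses to expect.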

A man-in-the-middle responder 434 can be authorized or can be relaying signed and authorized information. The man-in-the-middle responder 434 can communicate with a third party responder 452 and also with the database/server 480. In the case of no or weak encryption, the man-in-the-middle responder 434 can provide spurious information and thereby cause the query routing module 436 to send the query to a hostile data server. For example, the makers of Bletch Cola would prefer that their own or only authorized servers respond to user queries having the UEI for their product. A competitor might want to send a different response, such as a coupon for their own cola. A truly hostile entity, such as a hacker, might want to inject a response that compromises the user's media devices, communications devices, or related consumer electronics.

In some scenarios a network operator or communications infrastructure provider may desire to add, delete, or modify the registered response server list. Such modification is not always nefarious, because it may simply redirect the queries to a closer server or a caching server. The network operator might also wish to intercept "UEI not found" type responses and inject other information that is presumably helpful to the user. In other cases, the network operator may intentionally interfere with the disclosed systems and services by responding to user queries or registered response service requests with information about its own services, premium data charges, or reasons for not allowing the queries on its network.

Ideally the query routing module 436 will send the query to at least one registered data/response server that returns element data, perhaps signed, that can be presented to the user. When multiple servers are responding, an element gathering module can collect the responses and format them for the user. A response gathering module can cooperate with the routing module such that it knows how many responses are expected and when to time out while waiting for an expected response. An element data resolution module can inspect the gathered responses and resolve them by removing duplicate data, deleting expired data, applying corrections to data, and performing other operations. Data can be corrected when it is versioned and the responses contain an older version of the data along with 'diffs' that can be applied to older versions to obtain newer versions.
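The gather-and-resolve steps can be sketched as follows. The response fields (`version`, `expires`, `diffs`) are illustrative assumptions, and simple string concatenation stands in for a real diff/patch step.

```python
# Sketch of the element data resolution module: drop expired responses,
# bring versioned data up to date by applying 'diffs', and deduplicate
# by keeping the newest version seen for each element.
def resolve_responses(responses, now):
    resolved = {}
    for r in responses:
        if r.get("expires", float("inf")) <= now:
            continue                        # delete expired data
        data = r["data"]
        for diff in r.get("diffs", []):     # older version + diffs -> newer
            data = data + diff              # stand-in for a real patch step
        key = r["element_id"]
        version = r.get("version", 0)
        if key not in resolved or version > resolved[key][0]:
            resolved[key] = (version, data)  # dedupe: keep newest version
    return {k: v[1] for k, v in resolved.items()}

responses = [
    {"element_id": "el-7", "version": 1, "data": "v1", "diffs": ["+fix"]},
    {"element_id": "el-7", "version": 1, "data": "v1", "diffs": ["+fix"]},  # duplicate
    {"element_id": "el-9", "version": 2, "data": "old", "expires": 0},      # expired
]
result = resolve_responses(responses, now=100)
```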

The user query can also be sent to an unregistered data server, with good or bad intent, as discussed above. The unregistered server cannot reply with signed data unless the encryption technique is compromised. The data element gathering module or a similar module can discard unsigned or wrongly signed responses. Alternatively, the module can be configured to accept unsigned responses from certain sources or to accept wrongly signed responses with certain signatures. All responses can be treated the same once accepted, and systems lacking encryption or verification capabilities will most likely accept all responses to user queries.

Based on the foregoing, a number of options and features are possible. For example, a screen/display device/associated device that transmits an annotation signal can be provided that includes a media identifier as well as screen annotation data, such that local devices (remotes, phones, etc.) can create an "augmented reality" cursor and transmit the cursor icon and location to the display device. Note that the video data may be in a narrative format that does not respond to user inputs except, perhaps, for panning and zooming around a scene.

In some embodiments, a system can be implemented which includes a display device presenting a display of time varying video data; annotation data comprising time and position based annotations of display elements appearing on the display; a pointing device that transmits a pointer signal, wherein the pointing device comprises a selection mechanism (e.g., always on, or active only when a button is pushed); a pointer sensor that detects the pointer signal and determines a cursor location at which the pointing device is pointing; a cursor presented on the display at the cursor location; and an annotation identification module that determines when an on-screen element is selected. Note that annotated video data can be video data that is associated with, bundled with, or distributed with annotation data. In another embodiment, the pointer signal can encode cursor data, wherein the cursor data determines the on-screen cursor appearance. In another embodiment, the pointer signal can encode a pointer identifier. In yet other embodiments, the on-screen cursor can change appearance when overlying an annotated element. In still other embodiments, the pointing device can transmit the pointer signal when a pointer actuator is activated.

In yet other embodiments, a system can be implemented that includes annotated video data comprising annotation data and an annotation data transmitter that transmits the annotation data to a pointing device (e.g., Kinect or camera based, for example in the context of an AR browser). In a Kinect version, cameras may detect where a person or passive pointer (such as a specially painted or shaped baton or weapon-type prop) is pointing, and may detect who the person is via facial recognition, biometric measurement, gesture as password, etc. In other embodiments a contact list for video conferencing can be provided. In still other embodiments, screen navigation with a remote control phone as a pointing device can be provided, with the phone camera and display used to aim an on-phone cursor at the display. Additionally, the phone camera and display can share images and "poke" the phone, and the phone can send screen location data to the display, which then treats it as a local selection. In a movie theater setting, an HHD can sync in on frame/position and communicate with a server. Also, theater/movie "cookies" can be provided to the phone so that the user can redisplay/rewatch the movie for a xxx time period, such that queries can be made outside the theater. Additionally, it is not a requirement that the device (e.g., HHD) be aimed at the movie screen.

Additionally, tagged searches are possible. For example, YouTube can be searched by tag, so that people can tag videos and then search on the tags. An extension is to tag movie scenes so that a media identifier and an offset into the 'movie' come back, perhaps even a link for viewing that scene directly. As a further refinement, someone can obtain a tag by selecting a screen element and assemble a search string by combining tags, some of which are entered via keyboard and some via element selection.

Synchronization data can be obtained through a technique like the fingerprinting type technique described above, or more simply by knowing what is being watched (media Id or scene Id) and how far into it the person is (time stamp or frame number). A person can tag a scene or a frame with any of the above-mentioned devices (AR device, cell phone, remote control, Kinect-type device, pointing device, . . . ). Basically: click to select the frame/scene, key in or otherwise enter the tag, and accept. On accept, the tag and the sync data (which includes the media Id) are sent off to be stored in a server. People can search the tag database to check out tagged scenes. For example, Joe tags a scene "explosion" and Jill tags the same scene "gas". A search for "gas explosion" can result in a link to watch the scene. A social media tie-in is that Joe's friends might see that "Joe tagged a scene with 'explosion'" along with a single frame from the scene and embedded links to the scene. If an actor, "Kyle Durden", was blown up in the explosion, Joe's tag could have been assembled by selecting Kyle Durden in the scene and requesting a tag (this assumes annotated video) and then adding "explosion". The "Kyle Durden" tag can be the textual name or a small graphic. The small graphic can be the actor's picture or a symbol or graphic people associate with the actor. Similarly, anything having a trademark (or service mark) can use that mark (word, logo, design, or combo mark) as a tag. Searching for such a tag requires entering it in some way, such as by "clicking" on it via the pointing device or HHD.
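The tag-and-search flow above (Joe tags "explosion", Jill tags "gas", a search for "gas explosion" finds the scene) can be sketched as follows. The in-memory dictionary stands in for the server-side tag database, and the key shape (media Id plus frame number as sync data) is an illustrative assumption.

```python
# Sketch of the tagging flow: on accept, a tag plus its sync data
# (media Id and frame offset) is stored; a search matches scenes whose
# accumulated tags contain every word in the query.
TAG_DB = {}   # (media_id, frame) -> set of lowercase tags

def accept_tag(media_id, frame, tag):
    """Store a tag with its synchronization data."""
    TAG_DB.setdefault((media_id, frame), set()).add(tag.lower())

def search_tags(query):
    """Return scenes whose tags include every word in the query."""
    words = set(query.lower().split())
    return [scene for scene, tags in TAG_DB.items() if words <= tags]

# Joe tags a scene "explosion"; Jill tags the same scene "gas".
accept_tag("movie-001", 4210, "explosion")
accept_tag("movie-001", 4210, "gas")
hits = search_tags("gas explosion")   # yields the tagged scene
```

Each hit carries the media Id and frame, which is enough to build a link for viewing that scene directly.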

FIG. 15 illustrates a system 870 generally including a multimedia display 126 associated with a controller and which communicates with one or more wireless HHD's, in accordance with the disclosed embodiments. The controller 503 can be integrated with the multimedia display 126 or may be a separate device, such as, for example, a set-top box that communicates with the multimedia display 126. Communications between the multimedia display 126 and the controller 503 may be wireless (e.g., bidirectional wireless communications) or wired (e.g., USB, HDMI, etc.). One or more HHD's, such as, for example, wireless HHD 502, can communicate via wireless bidirectional communications with the controller 503 and/or the multimedia display 126 (e.g., in the case where the controller 503 is integrated with the multimedia display 126).

A user of the HHD 502 can utilize the HHD 502 to register the HHD 502 with the multimedia display 126. The HHD 502 is thus associated via this registration process with the multimedia display 126. Once registered and thus associated with that particular multimedia display 126 (and/or the controller 503), the user can utilize the HHD 502 to move an onscreen graphically displayed cursor to content/media of interest and then "click" the HHD 502 to select that area on the screen of the multimedia display 126, which then prompts a display of data supplemental to the selected content/media in a display of the HHD 502 itself and/or, for example, in a sub-screen 512 within the display 126. If within the display 126, the sub-screen would "pop up" and display the supplemental data within display 126. Alternatively, the supplemental data can be displayed for the user of the HHD 502 within a display area or display screen of the HHD 502.

FIG. 16 illustrates a system 880 that generally includes a multimedia display 126 and a secondary multimedia display 129, in accordance with the disclosed embodiments. As in FIG. 15, multimedia display 126 can be associated with a controller 503, and multimedia display 129 can be associated with controller 505. Controller 503 can communicate wirelessly or via wired means with multimedia display 126, and controller 505 can similarly communicate wirelessly or via wired means with multimedia display 129 (e.g., a secondary screen). The wireless HHD 502 can communicate via wireless bidirectional communications means with controller 503 and/or controller 505 and/or directly with multimedia displays 126 and/or 129. In the embodiment shown in FIG. 16, the supplemental data can be displayed via display 126, the HHD 502, and/or the secondary display 129. In some embodiments, the supplemental data can be displayed in a window or sub-screen 514 that "pops up" in multimedia display 129. A similar display process for the supplemental data can occur via the window or sub-screen 512 that can "pop up" in multimedia display 126.

FIG. 17 illustrates and represents a system 890 in which an HHD 502 can communicate wirelessly via a data network 520 (e.g., the Internet) for storage and/or retrieval of supplemental data in a memory or database 522, in accordance with the disclosed embodiments. In some instances, a user of HHD 502 may wish to access from or save the supplemental data to the "cloud" for later viewing, rather than displaying the data immediately via a pop-up window or sub-screen 512 or via a display associated with the HHD 502. FIG. 17 further illustrates and represents a system 890 in which an HHD 502 can communicate wirelessly via the data network 520 (e.g., the Internet) with a remote server associated with a database 522 for providing at least one of: identification, based on a captured image provided from the HHD 502, of an event or video programming rendering on a multimedia display screen 126; obtaining of supplemental data associated with the event or video programming by the HHD 502 once the event or video programming is identified; and storage of the supplemental data in a memory or database 522 for later retrieval, in accordance with the disclosed embodiments.

FIG. 18 illustrates a system 900 in which multiple HHD's can communicate with the controller 503 and/or the multimedia displays 126 and/or 129, in accordance with the disclosed embodiments. Each HHD 502, 504, 506, etc., can be registered with the same controller 503 or with other controllers, such as controller 505 discussed previously. Similarly, each HHD 502, 504, and/or 506, etc., can be registered with multimedia displays 126 and/or 129. All of the HHD's 502, 504, 506, etc., can interact with, for example, the main screen 126 and collaborate with one another (e.g., in a gaming environment). A user of HHD 504 may desire, for example, to display selected supplemental data on display 129 rather than display 126. A user of HHD 506 may desire to display the supplemental data only on his or her HHD. Each HHD 502, 504, 506, etc., can also independently interact with displayed media on either screen 126 or 129 to retrieve data of personal interest to that particular user of a respective HHD. In this sense, the "screen", via a controller such as controller 503 and/or 505, can function as a director of two or more HHD's, particularly in the context of a gaming environment.

FIG. 19 illustrates a system 910 in which multiple HHD's (or a single HHD in some cases) can communicate with a data network 520 that communicates with a server 524 and a database 522 associated with and/or in communication with server 524, in accordance with the disclosed embodiments. Supplemental data can thus be stored via HHD 502, 504, and/or 506, etc., in database 522 or another appropriate memory for later retrieval. Thus data can be retrieved from remote servers, such as server 524, via data network 520 (e.g., the Internet). Furthermore, the illustrated system 910 supports communication of HHD's 502 over a data network 520 with a server 524 for purposes of identifying video programming based on an image, captured by the HHD 502, of video programming being displayed on a multimedia display 126, by matching the image with images of video programming stored in a database 522. If a match is found by the server 524, then the HHD can be notified of the availability of additional data related to the video programming in accordance with features described herein.

FIG. 20 illustrates a graphical view of a system 920 that includes a multimedia display 126 and one or more wireless HHD's 502, 504, 506, etc., in accordance with the disclosed embodiments. It can be assumed that the configuration shown in FIG. 20 can be implemented in the context of the other arrangements and systems disclosed herein (e.g., including various controllers, primary and secondary screens, data networks, servers, databases/memory components, and so forth). Each user of an HHD can designate his or her own personal profile icon during, for example, the registration process or at a later time. Each profile icon can be displayed as a cursor via a multimedia display such as multimedia display 126. Thus, for example, a user of HHD 502 may select a profile icon 532 in the form of a dollar sign, which is displayed graphically in display 126. Similarly, a user of HHD 504 may be, for example, a hunter or outdoorsman and may select a profile icon 534 in the symbol of a deer as his profile icon. A user of HHD 506 may, for example, select a profile icon 536 in the symbol of a skeleton as his or her profile icon. It can be assumed that users may select primary and alternate profile icons so that, in the case of two or more people with the same profile icon, a secondary profile icon may be displayed instead of a primary icon so as not to cause confusion among users of the various HHD's 502, 504, 506, etc., during the same session or event.

FIG. 21 illustrates a flow chart depicting logical operational steps of a method 940 for registering and associating an HHD with a display screen and displaying supplemental data, in accordance with the disclosed embodiments. As indicated at block 550, the process can be initiated. Next, as shown at block 552, an HHD, such as, for example, the HHD's 502, 504, 506, etc., can be registered with a multimedia display and/or a controller, such as described previously. Thereafter, as illustrated at block 555, following registration and association with the display (e.g., via a controller), an icon or cursor can be displayed, which is associated with the HHD. Then, as depicted at block 558, the user of the HHD can select content/media of interest using the HHD with respect to the icon/cursor displayed on the multimedia display screen. Next, as shown at block 560, supplemental data can be displayed on the HHD display or on the multimedia display (or both).

FIG. 22 illustrates a flow chart depicting logical operational steps of a method 950 for transmitting, storing and retrieving supplemental data and registering an HHD with a secondary screen, in accordance with the disclosed embodiments. As indicated at block 570, the process begins. Then, as shown at block 572, the HHD can be registered with a secondary screen and/or associated controller. An example of a secondary screen is the multimedia display 129 discussed earlier. Thereafter, in response to a particular user input via the HHD, supplemental data can be transmitted to a database (e.g., database/memory 522) via a data network (e.g., network 520, the Internet, etc.), as illustrated at block 576. Then, as shown at block 578, the supplemental data can be retrieved for display via the secondary screen, the primary screen and/or other screens, etc. The process can then terminate, as shown at block 580.

FIG. 23 illustrates a flow chart depicting logical operational steps of a method 930 for designating and displaying profile icons for selection of content/media of interest and supplemental data thereof, in accordance with the disclosed embodiments. As shown at block 560, the process can be initiated. Next, as shown at block 552, the HHD (or multiple HHD's) can be registered (together or individually) with a multimedia display screen and/or associated controller. Thereafter, as shown at block 554, user profiles can be established via the HHD and one or more profile icons selected. Examples of profile icons include icons/cursors 532, 534, and 536 shown in FIG. 20. Next, as illustrated at block 556, one or more of the profile icons can be displayed "on screen".

Thereafter, as shown at block 558, content/media of interest can be selected via the HHD with respect to the profile icon displayed on screen. For example, a profile icon/cursor may be moved via the HHD to an area of interest (e.g., an actor on screen). The user then "clicks" an appropriate HHD user control (or touch screen button in the case of an HHD touch screen) to initiate the display of supplemental data (with respect to the content/media of interest). The supplemental data can be displayed on, for example, an HHD display screen, as shown at block 560. The process can then end, as depicted at block 562.

Referring to FIG. 24, illustrated is a flow diagram of an alternate process of using an image, captured by an HHD, of a scene (e.g., video programming) as it is being displayed on a multimedia display, determining the identity of the scene at a remote server, and providing access to additional information related to the scene from a database. As shown in block 610, an APP on an HHD can be activated to capture an image of a scene of video programming displayed on a multimedia display and enable access to data associated with the video programming from a remote server if the scene is matched and identified by the remote server. The image of the scene is captured using a digital camera integrated in the HHD. The HHD then wirelessly accesses a remote server to match the captured scene with images of video programs in a database to determine the identity of the scene and the availability of related data, as shown in block 620. Wireless communication can be supported by wireless communications hardware integrated in the HHD to support Wi-Fi or cellular communications (e.g., 802.xx, 4G, LTE, and so on). Then, as shown in block 630, the HHD can receive notification of a match and the availability of related data that can be selectively retrieved from the server by the HHD. Then, as shown in block 640, the HHD selectively accesses additional data related to the scene. The data can logically be accessed and manipulated consistent with many of the teachings provided herein.

Most views actually contain layers of content. For example, a set such as the set 700 illustrated in FIG. 25 can have a few elements, such as actors 705 and some props 715, positioned in front of a "green screen" 710. A green screen 710 is simply a known background that can be easily replaced by other imagery. Green screens were originally a single uniform color. The non-green parts of a scene were, by definition, the foreground 711, and the green parts were the background 712. Views of background scenes, such as mountain vistas, moving traffic, or bustling cities, can be captured independently of the foreground views. A final view can be produced by compositing the foreground view 711 and the background view 712. FIG. 26 illustrates an example wherein the scene of FIG. 25 is imaged to produce final view 713, in which background view 712 has been composited with foreground view 711.

Compositing with a green screen amounts to taking the foreground view 711 and replacing everything green with the background view 712. Computer technology made this process particularly convenient because the green pixels are simply replaced by background pixels. The above example of using a green screen actually produced three content layers: the background, the dynamic set elements, and the static set elements. The static set elements are foreground things that don't move, such as a table or a sphere resting on the table. The dynamic set elements are foreground things that do move, such as the actor and anything the actor manipulates.

FIG. 27 illustrates a view changing because the camera moves while imaging a scene. Note that the foreground and background are shifted as indicated by arrows, but are otherwise unchanged. FIG. 27 more clearly illustrates this concept by showing a complete view and a series of presented views. A scene can be much larger than the view imaged by a camera, and the camera can pan and tilt to move the view around and thereby capture different areas of the scene.

A movie, television show, or video is essentially a sequence of views. More conveniently, though, the sequences of views are first combined into acts, and the sequence of acts is combined to produce the movie. An act is often the sequence of views of a scene. For example, the views taken from two cameras filming a scene can be cut and spliced to produce an act in a movie. FIG. 28 illustrates two views being captured from the same scene.

Changes in technology have allowed the green screen technique to be replaced by computer algorithms that can automatically replace any element in a view, even moving elements. Similar algorithms can track elements as they move. Furthermore, in the example above a final view is composited from a foreground view and a background view. In practice, many more layers can be composited to form final views.

Changes in technology have also allowed virtual scenes to be filmed by virtual cameras. Virtual scenes are computer generated and can be static, dynamic, foreground, background, or any other layer. The virtual cameras are mathematically defined to produce views of the virtual scenes. As with any other view, a view from a virtual scene can be composited with any other view.

Annotating views and scenes is the process of determining which spots on a display correspond to which elements in a scene. Each scene or view can have a "screen map" wherein specific areas in a view are identified as being an actor, a consumer product, or another scene element. One very data-intensive embodiment would use a different screen map for every frame of every scene in a movie. The user selects a scene element by pointing at it with a pointing device or finger; the point of aim/selection is identified and then submitted to a software module that uses the screen map to determine what thing in the scene corresponds to that point. The identity of the pointed-at thing is then returned for subsequent use, such as finding an actor's biography or purchasing an on-screen item.
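A per-frame screen map and point lookup can be sketched as follows. The rectangle-per-element representation, frame numbers, and element names are illustrative assumptions; a real screen map could use arbitrary regions.

```python
# Sketch of a per-frame screen map: specific areas in a view are
# identified as scene elements, and a selected point is mapped back to
# the thing at that point.
SCREEN_MAP = {
    # frame -> list of (x0, y0, x1, y1, element)
    120: [(100, 50, 300, 400, "actor"),
          (350, 300, 450, 380, "cola can")],
}

def element_at(frame, x, y):
    """Return the scene element under the selected point, if any."""
    for x0, y0, x1, y1, element in SCREEN_MAP.get(frame, []):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return element
    return None
```

The returned identity could then be used to fetch an actor's biography or a purchase offer for the on-screen item.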

Virtual scenes can be automatically annotated as they are created because the rendering software used to place the scene elements can also record the viewed locations of those elements.

The elements in any of the views can be annotated before compositing. Returning to FIG. 25, the actor, table, and sphere can be annotated for each of the two imaged views. Similarly, the mountainous background view of FIGS. 26 and 27 can be annotated independently. Compositing annotated views together requires that the annotations also be composited. Annotation compositing can be performed by tracking which items are composited on top of other items and producing a final annotation for the final view. Another option is to preserve the annotations for each view. As shown in FIG. 28, a user can select an item by selecting a point within the final view (i.e., a spot on the screen). A software module can then determine if the point coincides with a foreground view element, such as the actor. If not, then the software module can determine if the point coincides with an element in the background. If there are additional layers composited between the foreground and background, then each of those layers is checked in turn, beginning with the foremost and progressing to the hindmost. Green screen areas are either not annotated at all or are annotated as green screen, transparency, or some other value that signals the software module to proceed to the next layer.
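The foremost-to-hindmost layer check can be sketched as below. Representing each layer as a point-to-annotation mapping is an illustrative simplification; the layer contents are hypothetical.

```python
# Sketch of per-layer annotation lookup: layers are checked from
# foremost to hindmost, and points annotated as green screen (or not
# annotated at all) signal the module to proceed to the next layer.
GREEN_SCREEN = "green screen"

def select_element(layers, x, y):
    """layers: foremost-first list of dicts mapping (x, y) -> annotation."""
    for layer in layers:
        annotation = layer.get((x, y))
        if annotation is None or annotation == GREEN_SCREEN:
            continue        # unannotated or transparent: try next layer
        return annotation
    return None

# Usage: a foreground layer over a mountainous background layer.
foreground = {(5, 5): "actor", (9, 9): GREEN_SCREEN}
background = {(9, 9): "mountain vista"}
```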

A complete view can also be annotated. The selected point of FIG. 28 does not move within the complete view but does move within the presented views. Vector addition of the view offset and the selection vector locates the selected point within the complete view. A similar mathematical operation (multiplication) provides for locating the selection point when the camera zooms into or out of the scene. As such, annotating the complete view and tracking both the view offset and the zoom factor allows the same annotation data to be used as the camera pans, tilts, and zooms through a scene.
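The offset-plus-scaled-selection mapping can be sketched as follows. The convention that a zoom factor greater than 1 means the camera has zoomed in (so on-screen distances shrink in complete-view units) is an assumption made for illustration; the actual convention would depend on how view trajectories are recorded.

```python
# Sketch of locating a selection within the complete view: add the view
# offset to the selection vector, scaling the selection by the zoom
# factor to account for the camera zooming into or out of the scene.
def to_complete_view(selection, view_offset, zoom=1.0):
    """Map a screen-space selection into complete-view coordinates."""
    sx, sy = selection
    ox, oy = view_offset
    return (ox + sx / zoom, oy + sy / zoom)
```

With the view offset and zoom factor tracked per frame, the same complete-view annotation serves every presented view as the camera pans, tilts, and zooms.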

An excellent example of a complete view is a sports arena. The views change radically as a game progresses. Many scene elements, such as scoreboards and advertisements, do not move. A complete view of a sports arena can be annotated once for each camera angle and the annotation then used as the background annotation when a game is played. Note that in this case the background is changing and is not composited into the final view, but the same background annotation can be used throughout a gaming season.

Annotation compositing and complete view annotation are advantageous because they greatly ease the burden of annotating views. An entire scene can be annotated once and then the view trajectories recorded. View trajectories are records of view positions and zoom factors at different times as a scene is imaged. Just as the annotation task is reduced, the amount of annotation data is reduced because view trajectories are much smaller than screen maps. A further advantage is that the different views in different layers (foreground, background, etc.) can be reused in different parts of a presentation or even in a completely different presentation. Movies, videos, and sporting events are examples of presentations.

An annotation database accessible via an annotation server 800, as shown in FIG. 29, can annotate every pixel in a frame, can annotate only those pixels overlying certain elements, or can annotate only certain points in select frames. Every annotated point can be termed an “annotation coordinate”. Every piece of annotated media content is identified by a media identifier and every annotation coordinate includes a frame (or subframe) identifier, a coordinate, and an element identifier. Annotation coordinates can also contain the media identifier. Similarly, the user's selected element can be specified by selection data containing a screen coordinate, media content identifier, and frame identifier. The annotation coordinate best matching the selection data has the same media identifier and is closest in both time and space. Closest in space can be a Euclidean, Mahalanobis, or Manhattan distance. Closest in time can be measured in frames or seconds. A useful distance measure for annotation purposes can be a weighted sum of spatial and temporal distance, can step forward or backward through frames seeking a nearby annotation coordinate, or can combine the spatial and temporal data in other ways. As such, an element that does not move during a contiguous series of frames can be annotated with a set of annotation coordinates arranged at either end of the frame sequence and spatially centered over the element. A further refinement to overcome irregularities in temporal distance measurement is to track scene changes and to exclude annotation coordinates from other scenes.
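The weighted spatial-plus-temporal matching could look roughly like the following sketch; the field names, Euclidean spatial distance, and weight values are assumptions for illustration, not taken from the patent:

```python
import math

def find_best_annotation(annotations, selection, w_space=1.0, w_time=0.5):
    """Return the annotation coordinate best matching `selection`.

    Each annotation is a dict with 'media_id', 'frame', 'x', 'y', and
    'element_id'; `selection` has 'media_id', 'frame', 'x', 'y'.
    Distance is a weighted sum of Euclidean spatial distance and
    temporal distance measured in frames."""
    best, best_dist = None, float("inf")
    for a in annotations:
        if a["media_id"] != selection["media_id"]:
            continue  # only consider the same media content
        d_space = math.hypot(a["x"] - selection["x"], a["y"] - selection["y"])
        d_time = abs(a["frame"] - selection["frame"])
        d = w_space * d_space + w_time * d_time
        if d < best_dist:
            best, best_dist = a, d
    return best

# Usage: a hat annotated near the selection beats a distant car.
anns = [
    {"media_id": "m1", "frame": 10, "x": 0, "y": 0, "element_id": "hat"},
    {"media_id": "m1", "frame": 10, "x": 100, "y": 100, "element_id": "car"},
]
sel = {"media_id": "m1", "frame": 11, "x": 5, "y": 5}
```

Swapping `math.hypot` for a Manhattan or Mahalanobis distance, or changing the weights, yields the other distance measures the text mentions.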

A yet further refinement is to give each scene in a movie, show, sporting event, etc. a unique media content identifier. This refinement addresses issues that may arise when clips or portions of a show are presented instead of the entire performance.

A yet further refinement is to annotate a media clip with a media content identifier and a time offset specifying at what time in the original the media clip begins. For example, a media clip could begin 35 minutes into a movie. In this manner it is far easier to annotate previews, clips, mash-ups, or different cuts (theatrical, director's, unrated, etc.) of a presentation. Mash-ups are a series of clips taken from different presentations or rearrangements of a single presentation. Note that sound tracks, songs, dialog, and other sounds can be clipped, mixed, or otherwise arranged within a mashup. An annotation coordinate can refer to a place or a sound as easily as to an on-screen element by specifying special spatial coordinates as flags for such place elements or sound elements. For example, (−1,−1) could indicate sound while (−2,−2) could indicate location.
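The clip-offset and special-coordinate conventions might be sketched as follows; frame-based offsets and a 24 fps movie are assumptions made for the example:

```python
def clip_to_original_frame(clip_frame, clip_offset_frames):
    """Map a frame index within a clip to the frame index in the
    original presentation, given the offset at which the clip begins.
    E.g., a clip starting 35 minutes into a 24 fps movie has
    clip_offset_frames = 35 * 60 * 24."""
    return clip_offset_frames + clip_frame

# Special spatial coordinates used as flags, per the text's example.
SOUND_FLAG = (-1, -1)     # annotation refers to a sound element
LOCATION_FLAG = (-2, -2)  # annotation refers to a place element
```

Annotating the clip then reduces to reusing the original's annotation data shifted by the offset, rather than re-annotating the clip from scratch.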

Compositing can be used to replace scene elements in live or recorded video content. Returning to the stadium discussed above, there are many advertisements placed around the stadium such as posters, billboards, signage, and lighted signage. The giant video screens present in many stadiums often show multiple advertisements at the same time as well as images, video, and information about an event (game, concert, race, etc.) occurring at the stadium, about athletes, about attendees, and about stadium activities. For example, the “kiss cam” shows attendees who are encouraged to kiss. Stadium activities can include singing the national anthem, throwing t-shirts and swag into the audience, mascot races, half time shows, etc. Compositing can replace the stadium advertisements with other advertisements or images.

In fact, compositing has been used to replace stadium advertisements with other advertisements for some years. Stadium attendees can see actual physical billboards and posters while remote viewers, such as a TV audience, can see other images. For example, attendees at a Brazilian soccer game might see Portuguese language billboards advertising a Brazilian beer. A viewer in Germany might instead see a German language advertisement for a German beer. The German advertisement has been composited over the Brazilian advertisements for viewers in Germany.

Similar compositing can be performed for recorded video. The Brazilian soccer game can be recorded and shown years later with completely different ads composited over the Brazilian beer advertisement. Such compositing can be particularly useful when a product or service cannot be legally advertised or becomes socially unacceptable. An example of that is cigarette ads which have become regulated in some countries and are sometimes viewed with disdain or antipathy.

Compositing can also be used to modernize or otherwise replace product placement within movies and television shows. Product placement occurs when an item is placed within a scene as a scene element but is not overtly advertised. Examples include what an actor eats, drinks, or wears while on screen. Examples also include background elements such as an advertisement on a bench that an actor walks past or sits on. Compositing can update a product placement when a can of Tab in decades old video is composited out in favor of a can of Coke, Jolt, etc. Compositing can also be used to adjust product placements based on geographic markets with an American seeing Jolt instead of Tab and an Australian seeing Solo instead of Tab. Geographic markets can be narrowed to the point that people in different parts of the same city, perhaps even in neighboring houses, see different scene elements. Such fine levels of marketing may require location data from a person's mobile or wearable devices. A person's IP (internet protocol) address can also be used to determine location data. Such location data is currently available.

Other compositing opportunities include demographic based or targeted advertisements. Returning to the beer can example, a viewer's location or profile can be queried and the composited in product placements and stadium ads targeted at the viewer. Sadly, targeted advertising via compositing can be used to harass specific individuals or groups when another person or group obtains a right to place products or advertisements into that individual's viewed video content. Compositing also provides a solution wherein the harassed individual opts out of targeted ads or obtains the right to place products or ads into his own viewed video content. Here, “obtains a right” can mean a purchase or license of advertising rights. For generality, “obtains a right” can also include stealing or hijacking wherein video content is intercepted and scene elements composited in (or out) without the consent of the video content provider and/or the person viewing the content.

The issue now is: What of the annotation tags related to the advertisements and product placements that were composited out? Composited out means replaced by something else via compositing. Composited in means replacing something that was originally a scene element via compositing. As discussed above, the composited in scene elements are in a video layer in front of the background and the original foreground. As also discussed above, annotations can be composited. The annotation selection data can therefore be composited such that the correct element identifier is obtained. The annotations can be composited when the video is composited or can be annotated downstream, such as at the user's media device (element 104 of FIG. 1).

Some embodiments can use local or downloaded annotation data, or composited annotation data, while others can assemble a query containing information sufficient to specify a cursor aim point and a frame of video. Without loss of generality, the data sufficient to specify a frame of video is herein referred to as a media tag. The query can be submitted to an annotation server 800 that queries an annotation database to determine the element identifier of the object selected by the user. This scenario is complicated when the frame of video has new scene elements composited in because the media tag of the original video may not be associated with or linked to the composited in material. One solution is to add annotation data for the composited in material to the annotation database (or another annotation database) and to cause the server 800 to find the element identifier for the composited in object, if selected. The media tag can be augmented with a “composited annotation indicator” indicating that there is annotation data corresponding to composited in video. The composited annotation indicator can contain annotation data for the composited in scene elements or can contain a tag, address, URL (uniform resource locator), pointer, or identifier that can be used to access annotation data for the composited in scene elements. This augmented media tag can contain an ordered list of composited annotation indicators if multiple layers of video have been composited over the original video. The list's order can indicate the order in which compositing occurred. Recall that the composited layers can contain large transparent regions through which lower video layers are visible.
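One way to model the augmented media tag and its ordered list of composited annotation indicators is sketched below; the class and field names are illustrative choices, and the URL is a placeholder:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class CompositedAnnotationIndicator:
    """Points at annotation data for one composited-in layer: either
    the annotation data itself or a reference (tag, URL, pointer) that
    can be used to access it."""
    annotation_data: Optional[dict] = None
    annotation_ref: Optional[str] = None

@dataclass
class MediaTag:
    """Data sufficient to specify a frame of video, augmented with an
    ordered list of indicators, one per composited layer. The list
    order records the order in which compositing occurred (earliest
    composited layer first)."""
    media_id: str
    frame_id: int
    indicators: List[CompositedAnnotationIndicator] = field(default_factory=list)

# Usage: a frame with one layer composited over the original video.
tag = MediaTag("m1", 42)
tag.indicators.append(CompositedAnnotationIndicator(annotation_ref="https://example.com/ann"))
```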

Alternatively, the media tag can be replaced with a new media tag. The new media tag can reference the annotation data and the media tag of the original (or underlying) video. The underlying video can also have composited in items. Another alternative is for the new media tag to be an ordered list of older media tags with the bottom media tag indicating the original video and the rest indicating video layers composited over one another, and the list order indicating the order in which the compositing occurred. Lists can be ordered by putting them in order or by ranking each list element such that the list can be put in order.

The annotation data for the composited in scene elements can indicate which screen coordinates in a frame correspond to the composited in items. In this manner, the determination of an element identifier is similar to that for the non-composited video. If the user selected a composited in item, the element identifier for that item is returned and the annotations for the original video need not be searched (although searching in parallel can lead to higher performance). For example, “Jolt” is composited over “Tab” and the user selects “Jolt” in a particular frame. The annotation data for the composited in scene elements provides the “Jolt” element identifier. The annotation data for the original video provides the “Tab” element identifier. The “Jolt” element identifier is selected because the composited in scene elements overlay the original ones. Note that if the user selected a hat in the original video then the annotation data for the composited in scene elements can return “no item” thereby indicating that the hat's element identifier obtained from the annotations for the original video is the correct element identifier. Another example is that a new scene element may be composited into the video in which case the annotation data for the composited in scene elements provides an element identifier.
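The topmost-layer-first lookup with a "no item" fallback to the original annotations can be sketched as follows; the lookup callables and element names (the "Jolt"/"Tab"/hat example) are illustrative:

```python
def resolve_element(composited_layers, original_lookup, point):
    """Resolve the element identifier at `point`, checking annotation
    data for composited-in layers (topmost first) before falling back
    to the original video's annotations.

    Each entry in `composited_layers` is a lookup callable returning an
    element identifier, or None ("no item"), for the point."""
    for lookup in composited_layers:  # topmost composited layer first
        element_id = lookup(point)
        if element_id is not None:
            return element_id  # composited element overlays the original
    return original_lookup(point)  # e.g., the hat from the original video

# Usage: "Jolt" composited over "Tab"; a hat exists only in the original.
jolt_layer = lambda p: "jolt" if p == (50, 50) else None
original = lambda p: "tab" if p == (50, 50) else ("hat" if p == (10, 10) else None)
```

As the text notes, the layers could also be searched in parallel for performance, with the topmost hit winning.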

Alternatively, the annotation data can indicate an element identifier that should replace one in the original video. For example, the annotation data for the original video can indicate that the user selected a can of Tab. The annotation data can indicate that the “Jolt” element identifier should be returned instead of the “Tab” element identifier. This alternative requires that lower level annotation data, such as that for the original video, be searched such that lower level element Ids can be replaced.

When multiple layers of video are composited over top of one another there can be multiple layers of annotations for composited in elements. The annotation data can be queried as discussed above with the element identifier for the topmost item at the selected coordinates being the correct element identifier.

Returning to the stadium, the people at the stadium can see advertisements and activities with their naked eyes, can see selected views on the stadium's giant screen (or screens), or can view their surroundings through mobile devices such as smart phones, tablet computers, glasses with integrated displays, and other wearable devices with display capabilities. For example, an attendee's surroundings can be recorded or streamed from the integrated camera in a mobile device and viewed on the display of the streaming device or another device that is registered with the recording/streaming device. When viewing through a device, a person can view an augmented version of reality with overlaid information related to tagged content or recognized content. Another augmentation is that the viewer can use the camera or display device to magnify or zoom in on an activity or object.

Remote viewers of an event happening at the stadium can also see advertisements and activities. The remote viewers, however, can be provided with video having composited in items. The remote viewers can select a composited in item and receive information about the composited in element.

As shown in FIG. 29, streaming services or other intermediaries can receive video from mobile device cameras (or other cameras) and record it or stream the video to other intermediaries or display devices. In this scenario, the display, camera, and intermediaries would be registered directly or indirectly with one another. One example of indirect registration is when devices or intermediaries are not directly registered with each other, may be unaware of each other, but can receive data from one another. For example, the camera and display may both register with an intermediary that receives the camera video stream and provides it to the display. The video stream can pass through multiple intermediaries in series and in parallel. The video stream can reach multiple display devices. Every one of those intermediaries has the opportunity to composite items and advertisements into the video stream before passing it along. Certain intermediaries can interpret annotation data for the composited in scene elements and composite over those scene elements. Certain other intermediaries can interpret media tags and use the data therein to attempt to register with or contact the camera streaming the video or an intermediary closer, streamwise, to the camera.
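The serial chain of compositing intermediaries can be modeled as function composition; the frame representation here (a list of element/position pairs) is a deliberately simplified stand-in for real video data:

```python
def stream_through(intermediaries, frame):
    """Pass a video frame through a chain of intermediaries in series;
    each intermediary may composite items into the frame before
    passing it along downstream."""
    for composite in intermediaries:
        frame = composite(frame)
    return frame

# Usage: one intermediary composites an advertisement into the stream.
add_ad = lambda frame: frame + [("ad_banner", (0, 0))]
```

A parallel fan-out to multiple displays would apply the same idea along each branch, with each branch free to composite different items.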

Based on the foregoing, it can be appreciated that a number of embodiments, disclosed and alternative, are disclosed herein. For example, in an embodiment, a method can be implemented for supporting bidirectional communications and data sharing. Such a method can include, for example, registering at least one wireless hand held device with a controller associated with at least one multimedia display; selecting at least one profile icon for use as a cursor during interaction of the at least one wireless hand held device with the at least one multimedia display during rendering of an event as data on the at least one multimedia display; and providing supplemental data from at least one of the at least one multimedia display and/or a remote database to the at least one wireless hand held device based on a selection of the data rendered on the at least one multimedia display marked by the cursor utilizing the at least one wireless hand held device.

In other embodiments, a step can be implemented for providing the event as at least one of a movie, television program, recorded event, a live event, or a multimedia game. In yet other embodiments, a step can be implemented for manipulating, via the controller, the data rendered on the at least one multimedia display utilizing the at least one wireless hand held device. In still other embodiments, a step can be implemented for storing the supplemental data in a memory in response to a user input via the at least one wireless hand held device. In other embodiments, a step can be implemented for pushing the supplemental data for rendering via at least one other multimedia display. In yet other embodiments, a step can be implemented for pushing the supplemental data via at least one of the at least one wireless hand held device, a remote server, or the controller associated with the at least one multimedia display. In still other embodiments, a step can be provided for filtering the supplemental data for rendering via a multimedia display. "Filtering" may include, for example, automatically censoring supplemental data based on a user profile or parameters for censoring or filtering such data. For example, a public place, such as a bar or restaurant, may wish to prevent certain types of supplemental data (e.g., pornography, violence, etc.) from being displayed on multimedia displays in its establishment. Filtering may occur via the controller (set-top box), controls or settings on the displays themselves and/or via the HHD accessing media of interest.

In another embodiment, a system can be implemented for supporting bidirectional communications and data sharing. Such a system can include a processor, and a data bus coupled to the processor. In addition, such a system can include a computer-usable medium embodying computer code, the computer-usable medium being coupled to the data bus, the computer program code comprising instructions executable by the processor and configured for: registering at least one wireless hand held device with a controller associated with at least one multimedia display; selecting at least one profile icon for use as a cursor during interaction of the at least one wireless hand held device with the at least one multimedia display during rendering of an event as data on the at least one multimedia display; and providing supplemental data from at least one of the at least one multimedia display and/or a remote database to the at least one wireless hand held device based on a selection of the data rendered on the at least one multimedia display marked by the cursor utilizing the at least one wireless hand held device.

In another embodiment, such instructions can be further configured for providing the event as at least one of a movie, television program, recorded event, a live event, or a multimedia game. In yet another embodiment, such instructions can be further configured for manipulating, via the controller, the data rendered on the at least one multimedia display utilizing the at least one wireless hand held device. In still other embodiments, such instructions can be further configured for storing the supplemental data in a memory in response to a user input via the at least one wireless hand held device. In yet another embodiment, such instructions can be configured for pushing the supplemental data for rendering via at least one other multimedia display. In other embodiments, such instructions can be further configured for pushing the supplemental data via at least one of the at least one wireless hand held device, a remote server, or the controller associated with the at least one multimedia display. In other embodiments, such instructions can be further configured for filtering the supplemental data for rendering via a multimedia display.

In another embodiment, a processor-readable medium can be implemented for storing code representing instructions to cause a processor to perform a process to support bidirectional communications and data sharing. Such code can, in some embodiments, comprise code to register at least one wireless hand held device with a controller associated with at least one multimedia display; select at least one profile icon for use as a cursor during interaction of the at least one wireless hand held device with the at least one multimedia display during rendering of an event as data on the at least one multimedia display; and provide supplemental data from at least one of the at least one multimedia display and/or a remote database to the at least one wireless hand held device based on a selection of the data rendered on the at least one multimedia display marked by the cursor utilizing the at least one wireless hand held device.

In other embodiments, such code can further comprise code to provide the event as at least one of a movie, television program, recorded event, a live event, or a multimedia game. In still other embodiments, such code can comprise code to manipulate, via the controller, the data rendered on the at least one multimedia display utilizing the at least one wireless hand held device. In yet other embodiments, such code can comprise code to store the supplemental data in a memory in response to a user input via the at least one wireless hand held device. In yet another embodiment, such code can comprise code to push the supplemental data for rendering via at least one other multimedia display. In still other embodiments, such code can comprise code to push the supplemental data via at least one of the at least one wireless hand held device, a remote server, or the controller associated with the at least one multimedia display. In other embodiments, such code can comprise code to filter the supplemental data for rendering via a multimedia display.

In other embodiments, a method can be implemented for displaying additional information about a displayed point of interest. Such a method may include selecting a region within a particular frame of a display to access additional information about a point of interest associated with the region, and displaying the additional information on a secondary display, in response to selecting the region within the particular frame of the display to access the additional information about the point of interest associated with the region.

In other embodiments, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a mobile device. In still other embodiments, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a gyro-controlled pointing device. In yet other embodiments, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a laser pointer.

In some embodiments, the mobile device can be, for example, a Smartphone. In other embodiments, the mobile device can be, for example, a pad computing device (e.g., an iPad, an Android tablet computing device, a Kindle (Amazon) device, etc.). In still other embodiments, the mobile device can be a remote gaming device. In yet other embodiments, a step or operation can be implemented for synchronizing the display with the secondary display through a network. In other embodiments, the aforementioned network can be, for example, a wireless network (e.g., WiFi), a cellular communications network, the Internet, etc.

In another embodiment, a system can be implemented for displaying additional information about a displayed point of interest. Such a system can include, for example, a memory; and a processor in communication with the memory, wherein the system is configured to perform a method, wherein such a method comprises, for example, selecting a region within a particular frame of a display to access additional information about a point of interest associated with the region; and displaying the additional information on a secondary display, in response to selecting the region within the particular frame of the display to access the additional information about the point of interest associated with the region.

In an alternative system, for example, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a mobile device. In yet another system, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a gyro-controlled pointing device. In still other system embodiments, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a laser pointer.

In some system embodiments, the mobile device may be, for example, a Smartphone. In still other system embodiments, the mobile device can be a pad computing device. In yet other system embodiments, the mobile device may be, for example, a remote gaming device. In still other system embodiments, the aforementioned method can include synchronizing the display with the secondary display through a network.

In other embodiments, a computer program product for displaying additional information about a displayed point of interest can be implemented. Such a computer program product can include, for example, a non-transitory storage medium readable by a processor and storing instructions for execution by the processor for performing a method comprising: selecting a region within a particular frame of a display to access additional information about a point of interest associated with the region; and displaying the additional information on a secondary display, in response to selecting the region within the particular frame of the display to access the additional information about the point of interest associated with the region. In some embodiments of such a computer program product, selecting the region within the particular frame to access the additional information about the point of interest associated with the region can further comprise selecting the region within the particular frame utilizing a mobile device.

It will be appreciated that variations of the above-disclosed and other features and functions, or alternatives thereof, may be desirably combined into many other different systems or applications. Also, various presently unforeseen or unanticipated alternatives, modifications, variations or improvements therein may be subsequently made by those skilled in the art, which are also intended to be encompassed by the following claims.

Claims

  1. A system comprising: a display device that displays video content to a person wherein the video content comprises a plurality of frames; a media device that provides a video stream including the video content to the display device; a pointing device that the person uses to select a screen coordinate on the display device; frame specification data that identifies the video content of the video stream and which of the frames among the plurality of frames was displayed on the display device when the person chose the screen coordinate; a query comprising the screen coordinate and the frame specification data, the query containing data sufficient to specify a cursor aim point and to specify a frame of video among the plurality of frames, wherein the data sufficient to specify the frame of video comprises a media tag augmented with a composited annotation indicator containing annotation data for composited in scene elements in the video content, wherein the augmented media tag contains an ordered list of composited annotation indicators and the annotation data for the composited in scene elements indicates screen coordinates in the frame of video among the plurality of frames that correspond to a composited in item; and an element identifier provided in response to receiving the query, wherein the element identifier includes an element descriptor that is uniquely identified with an object that is displayable on the display device, the element identifier determined based upon the query comprising the screen coordinate and the frame specification data, wherein the annotation server annotates every pixel in the frame of video among the plurality of frames.
  2. The system of claim 1 further comprising an annotation server comprising an annotation database running on a remote server and wherein the annotation database receives the query and provides the element identifier.
  3. The system of claim 2 further comprising an annotation database accessible by an annotation module that causes the frame specification data, the screen coordinate, and the element identifier to be entered into the annotation database.
  4. The system of claim 1 further comprising an additional data server that produces element data based on the element identifier and wherein the annotation module causes the element data to be provided to the person.
  5. The system of claim 1 wherein the annotation server annotates pixels overlying elements in the frame of video among the plurality of frames or annotates points in selected frames of video among the plurality of frames, wherein each point annotated in the selected frames of video comprises an annotation coordinate.
  6. The system of claim 5 wherein each piece of annotated media content in the frame of video among the plurality of frames is identified by a media identifier and wherein the annotation coordinate includes a frame identifier, a coordinate, and the element identifier.
  7. The system of claim 5 wherein the annotation coordinate also contains the media identifier.
  8. The system of claim 1 wherein: the annotation data specifies selectable zones wherein each selectable zone among the selectable zones comprises a combination of selectable areas and time periods during which the selectable areas are selectable; each selectable zone comprises coordinates in space and time and are specified as a series of discrete coordinates, a series of temporal extents and areal extents, and as areal extents that change as a function of time when a selectable area among the selectable zones moves on the display, changes size during a scene in the video content of the video stream, or changes shape during the scene in the video content of the video stream.
  9. The system of claim 8 wherein the areal extents comprise at least one of: a bounding polygon or a closed curve.
  10. The system of claim 1 wherein the annotation data is synchronized with the video content displayed in the display device.
  11. The system of claim 10 wherein the video content is subject to video fingerprinting for identifying features in the video content for synchronization of the annotation data with the video content.
  12. The system of claim 11 wherein the element descriptor is used as a fingerprint in the video fingerprinting.
  13. The system of claim 12 wherein the element descriptor comprises a discrete cosine transform (DCT) descriptor for a scene in the video content.
  14. A system comprising: a display device that displays video content to a person wherein the video content comprises a plurality of frames; a media device that provides a video stream including the video content to the display device; a pointing device that the person uses to select a screen coordinate on the display device; frame specification data that identifies the video content of the video stream and which of the frames among the plurality of frames was displayed on the display device when the person chose the screen coordinate; a query comprising the screen coordinate and the frame specification data, the query containing data sufficient to specify a cursor aim point and to specify a frame of video among the plurality of frames, wherein the data sufficient to specify the frame of video comprises a media tag augmented with a composited annotation indicator containing annotation data for composited in scene elements in the video content, wherein the augmented media tag contains an ordered list of composited annotation indicators and the annotation data for the composited in scene elements indicates screen coordinates in the frame of video among the plurality of frames that correspond to a composited in item; and an element identifier provided in response to receiving the query, wherein the element identifier includes an element descriptor that is uniquely identified with an object that is displayable on the display device, the element identifier determined based upon the query comprising the screen coordinate and the frame specification data, wherein the annotation server annotates pixels overlying elements in the frame of video among the plurality of frames or annotates points in selected frames of video among the plurality of frames, wherein each point annotated in the selected frames of video comprises an annotation coordinate.
  14. The system of claim 13 wherein each piece of annotated media content in the frame of video among the plurality of frames is identified by a media identifier.
  15. The system of claim 14 wherein the annotation coordinate includes a frame identifier, a coordinate, and the element identifier.
  16. The system of claim 14 wherein the annotation coordinate includes at least one of: a frame identifier, a coordinate, and the element identifier.
  17. The system of claim 14 wherein each piece of annotated media content in the frame of video among the plurality of frames is identified by a media identifier and wherein the annotation coordinate includes a frame identifier, a coordinate, and the element identifier.
  18. The system of claim 14 wherein the annotation coordinate further contains the media identifier.
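The annotation coordinate of claims 15 through 18 bundles a frame identifier, a coordinate, an element identifier, and (per claim 18) optionally a media identifier. A minimal data-structure sketch, not drawn from the patent itself and using assumed field names:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class AnnotationCoordinate:
    """One annotated point in a selected frame (cf. claims 15-18)."""
    frame_id: str                   # identifies the frame of video
    coordinate: Tuple[int, int]     # (x, y) position within that frame
    element_id: str                 # ties the point to an element descriptor
    media_id: Optional[str] = None  # optionally carried, per claim 18
```

Making the record immutable (`frozen=True`) reflects that an annotation, once authored against a specific frame, is a fixed lookup key rather than mutable state.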
  19. A system comprising: a display device that displays video content to a person, wherein the video content comprises a plurality of frames; a media device that provides a video stream including the video content to the display device; a pointing device that the person uses to select a screen coordinate on the display device; frame specification data that identifies the video content of the video stream and which of the frames among the plurality of frames was displayed on the display device when the person chose the screen coordinate; and a query comprising the screen coordinate and the frame specification data, the query containing data sufficient to specify a cursor aim point and to specify a frame of video among the plurality of frames, wherein the data sufficient to specify the frame of video comprises a media tag augmented with a composited annotation indicator containing annotation data for composited-in scene elements in the video content, wherein the augmented media tag contains an ordered list of composited annotation indicators and the annotation data for the composited-in scene elements indicates screen coordinates in the frame of video among the plurality of frames that correspond to a composited-in item.
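The query structure recited in claims 13 and 19 pairs a cursor aim point with an augmented media tag carrying an ordered list of composited annotation indicators. A hedged sketch of that structure and of resolving an aim point against it follows; the class and function names are assumptions, not terminology from the patent:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class CompositedAnnotationIndicator:
    """Annotation data for one composited-in scene element."""
    screen_coords: List[Tuple[int, int]]  # coordinates covered by the item
    annotation: str                       # supplemental data for the item

@dataclass
class AugmentedMediaTag:
    """Media tag augmented with an ordered list of indicators (claim 19)."""
    media_id: str
    frame_number: int
    indicators: List[CompositedAnnotationIndicator] = field(default_factory=list)

@dataclass
class Query:
    """Screen coordinate plus frame specification data."""
    screen_coordinate: Tuple[int, int]  # cursor aim point from the pointing device
    frame_spec: AugmentedMediaTag       # identifies the content and displayed frame

def element_at(query: Query) -> Optional[str]:
    """Return the annotation, if any, whose composited-in item covers the aim point."""
    for ind in query.frame_spec.indicators:
        if query.screen_coordinate in ind.screen_coords:
            return ind.annotation
    return None
```

Because the indicator list is ordered, the first matching composited-in item wins when items overlap, which is one plausible reading of why the claims recite an ordered list.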
