In this post I'm going to show a perspective on the future of mixed reality that might help clarify what you're seeing in industry news. Specifically the perspective that mixed reality experiences are the mainstream goals and companies are simply keeping pace with one another until hardware is available. Most of these observations aren't new, but I think it's important to see a possible big picture. First some terminology so we're all on the same page.

Virtual reality (VR) - A virtual world with no connection to the real world except for headset tracking and user inputs.
Augmented/Mixed reality (AR/MR) - Augmenting reality with virtual elements like user interfaces and objects. While augmented and mixed reality are largely interchangeable, mixed reality is sometimes used to explicitly refer to blending of the virtual and real world. To do this, geometry in the real world is scanned allowing virtual objects to seamlessly exist in the real world and interact with it. Users can manipulate virtual items like they would real ones. Examples are like placing furniture in your house from a store to see how it would look before buying it.
Pixels per degree (ppd) - The horizontal resolution divided by the horizontal field of view per eye.
Foveated rendering - The eye has foveal vision at the center which is high quality and peripheral vision that's less high quality. Eye tracking can be used to ensure high quality rendering where the eye is looking and lower quality outside of foveal vision. The brain is none the wiser if this is done correctly and small artifacts are ignored. This can be exploited such that even though a display is massive in pixel density (say 16K per eye) the rendering cost is primarily at the center of vision, so only a fraction of the pixels are rendered at a high quality. (See DeepFovea which renders 10% of the pixels). This also lowers the bandwidth required when streaming content - assuming a low-latency connection. It also lowers the power usage by requiring less rendering.
Inside-out tracking - The headset tracks itself using cameras on the device allowing it to track over a large or infinite area.
Geometry reconstruction - Using inside-out tracking to 3D scan the world and place a tight fitting triangle mesh or point cloud over everything.
Scene understanding - Identifying objects in the world. This can be simple from floors, ceilings, and walls to more advanced systems that attempt to identify everything.

Companies

Currently the following companies are iterating various technologies related to mixed reality. Some are a lot more public with their research, but all have various teams dedicated to the R&D.

Amazon, Apple, ByteDance, Google, HP, HTC, Lenovo, Magic Leap, Meta, Microsoft, Panasonic, Philips, Pimax, Qualcomm (reference designs), Samsung, Sony, Ultraleap, Valve, and Varjo

Pimax and Valve might be the outliers currently for focusing on just VR without evidence of further visions. (Though Deckard is a standalone VR headset with inside-out tracking so Valve is definitely still iterating toward MR features). Sony might seem similar, but they have mentioned having goals beyond PlayStation.

So what do these companies picture happening and why is mixed reality a big deal?

Monitor replacement

A headset capable of over 60 ppd can begin to render virtual monitors that appear identical to physical ones. Think about how much you, your work, or others spend on computer monitors, phone displays, and TVs. Essentially most of the market for these can be taken by an adequate MR device. Any industry using a display can be disrupted by this. Game consoles like Nintendo Switch and handhelds like Steam Deck are good examples. As MR devices become mainstream casting to them becomes the standard such that having a separate display makes little sense. Imagine a future console that was built into an Xbox Elite controller and cast to your headset. Furthermore, older features like split screen on a console would instead cast to multiple independent MR devices. So when mainstream adoption spreads all the money that went to the TV manufacturers begins going to whoever sells the MR headsets. It's no surprise then that almost all TV manufacturers have R&D in this area not even specifically for the display hardware, but to control the headsets as well.

This might beg the question, what about people that don't have a headset and can't view a shared display? As with most technology the initial cost will be large, but as MR grows and eats into the marketshare of other displays the costs will drop. Simple cheap lightweight display glasses might exist for viewing monitors at a desk. It's also probably that cellphones can be used to view MR content during the transition period. If you don't like glasses, it's likely there will be contact lenses available also. The hardware required for the glasses is so tiny that the jump to contacts isn't as large as it might seem, though it might have disadvantages.

Extracting information from the world

In order for virtual objects in a mixed reality environment to look seamless the real world must be analyzed (usually by machine learning techniques) and information extracted. The following information would be extracted:

Light sources and properties: Their location, color, temperature, intensity, and the shape of the light.
Material properties: From a rendering perspective this involves calculating metallic and smoothness values for the surfaces.
Reflections and transparency: Identifying mirrors and shiny and transparent objects can prove challenging. A number of papers have tackled this to some degree.

The computation required for this is expensive. Various companies above have solved pieces of this; however, nearly all solutions are held back by computing currently. The hardware required for high quality results will happen, but not for a while unless it's offloaded to a powerful desktop or more specialized and expensive chips. What this means is while many basic applications are feasible some advanced mainstream applications that customers might want are simply not possible right now.

Basic Applications

Wikipedia has a list of possible applications ranging from simple to advanced.

One application is overlaying information on the world and buildings. Looking at an intersection and seeing clearly defined street names. Another common example is overlaid directions on the ground when walking someplace or in a store when searching for items. Google has been able to do this for a while, but holding a phone up has never quite been elegant. Google is very aware of this.

Collaborative seamless working

Casting oneself virtually to another person's environment would offer a nearly identical experience of being there. Even a physical whiteboard or document could be instantly shared by looking at it by either party. With eye, face, and pose tracking realistic avatars can offer seamless presence. Companies (especially Microsoft and Meta) have long envisioned and prototyped these kind of seamless social interactions at work and home. Their hardware however has fallen short of mainstream expectations, but it is clear they have a direction they're going for.

Extreme data collection

In some configurations a headset can always be recording and collecting data. This means it can scan the world and store a persistent memory of everything that can be recalled. When combined with machine learning this allows for retrieval of information. "Where are my car keys?" "You placed them on the sofa yesterday. Their last known location has been marked for you." Historical queries have been possible with services like Google Timeline for a while to discover somewhere you visited a year ago but forgot the name of. MR allows such queries, but at a much finer detail depending on the data that's stored. It should be noted that video isn't required for this. A temporal knowledge graph that simply remembers descriptions, times, and locations of events can be sufficient for creating compact histories that can be searched.

Interfaces everywhere inside

In addition to replacing larger displays, the displays on various Internet of Things (IoT) devices can offload their interfaces to an MR device. Home and office automation software would handle the permissions and security as normal like when accessing from a mobile device. The difference being that instead of using a phone or tablet you'd simply look at a device like blinds or a light and activate the item.

Interfaces everywhere outside

A standardized Bluetooth Beacon can offer MR interfaces for various devices when in range. Walking up to an elevator for instance could display elevator buttons and a wayfinding system. In a secured building using the interface's elevator controls could unlock floors identically to how hotel cards do in some buildings. These Beacons can also be used at restaurants to transmit menus without QR codes.

In the further future stores could display all prices using virtual price tags.

Gaming experiences

In general the idea of overlapping a fantasy world over our own captures many of these experiences. With edge cloud computing utilizing 5G and machine learning the idea of creating a world over our own procedurally is realistic. This includes identifying objects in the real world and replacing them with themed versions. With powerful enough hardware an MR headset could perform face and pose tracking on regular people changing their appearance. A game set in a fantasy setting would then look like it was in a fantasy setting replacing fire hydrants with wooden posts and people with villagers. Others players could be rendered with their armor and items.

Pokemon Go had a number of people imagining what one of these experiences might look like later. The idea of real-time battles in 3D space among different players and NPCs. An actual MR game could interact with the physical environment and introduce procedural virtual elements.

Humans vs Zombies is a game played with Nerf guns. In MR, defining boundaries (safe zones) and rules is easier. It's also easier to create powerups, abilities, and more complex mechanics. In practice one could remove the Nerf guns entirely and use virtual weapons allowing players to scavenge ammo and identify zombies and survivors from their headset. Players could also play virtually moving around separate from their physical location, but still visible to others as an avatar.

These kind of gaming experiences can be computationally demanding. Different pieces are easier than others. Various companies like Google, Meta, and Microsoft have invested research into pose tracking. Getting it to run quickly and not drain one's battery is still an open area of research, though Google's work is fairly performant. It's not mandatory, but it would go a long ways to making an MR device feel feature complete.

Mixed reality operating systems

There's always a concern for privacy and how much will be on the cloud. As monitors are replaced at work and home, users will connect to PCs with their MR headset. This connection in theory will remain connected and is only limited by network latency and bandwidth. Operating systems on the headset and on user's PCs could move toward supporting MR devices in this way. Similarly entertainment systems that would normally control TVs and projectors would cast to one or more headset through WiFi or the Internet. It's probable that an MR device could offload compute to multiple connected PCs if one had a work and home PC based on latency and resources required.

On the actual device, this area of research is still fairly open. Meta and Magic Leap use Android which seems like it will evolve into an MR operating system over time. Apple has rumors of building a new RealityOS. The big difference between a cellphone and MR device is how multitasking can function. While a cellphone has one or maybe two foreground applications running, an MR device can have multiple pinned applications in one's FOV active at a time. For example having sticky notes on your left pinned to a wall while your work monitors are running and a weather app runs. Many of these can be in a paused state, but it's also possible to have a bunch of Bonzi Buddy/Tamagotchi apps active on one's desk. MR OSes have to handle these cases especially when one looks away or walks to another room. Some applications would also follow the headset and be in a more persistent state like if you have Google Maps directing you when walking while also playing a game. All of these applications have to take turns rendering and in many cases exist at different depths. A complex example would be an application that generates snow with snowflakes that fall in front of and behind other objects from other applications. All of this can be quite expensive on hardware and requires special rendering.

Mainstream hardware

So what hardware are these companies waiting for? The goal for a truly mainstream device is one that looks like a pair of glasses, lightweight, allows real light in, discrete cameras, eye tracking, face tracking, hand tracking, long battery life, and a powerful computing unit either on the device or connected wirelessly (think a phone without a display). The hardware for each of these is at various stages.

Displays: MicroLED allows for 16K per eye displays at 240Hz+ which are effectively perfect for single focal plane rendering where objects appear static in the world as the head/eyes moves. At 210x135 degrees field of view per eye that gives over 60 ppd not taking into account optics which might skew the pixels closer to the center resulting in a higher ppd where it matters. The brightness of MicroLED is also sufficiently high enough to work in any environment including direct sunlight. Companies (like JBD and Samsung) are well aware of hardware timelines and the cost associated with mass producing MicroLED displays. They require foundries on the scale of chip fabrication, but this is doable albeit expensive right now.

Optics: Metalenses are small nanostructure that can focus light. These structures are incredibly tiny and can be tuned for specific wavelengths for subpixels on a MicroLED display. Right now this technology is probably the most promising for producing glasses thin configurations. Similar to MicroLEDs these require specialized manufacturing similar to chips and might need to be printed directly over the top of MicroLEDs further complicating things. Alternatively they might need to be printed on the lenses. Another avenue is more advanced waveguides, but various companies have ran into limitations with them.

Opacity filter: This terminology might not be right, but this refers to an optical component between incoming light and the display light source. The opacity part refers to the ability to gradually block light that's incoming. If this is done on a per pixel basis it means the display can render darkness and thus shadows. Without an opacity filter all the light in the world is visible to the user and the display can only add light to the scene. Since the goal of mixed reality is to simply add objects and not disrupt the normal view of the world there's no way to make shadows unless an opacity filter can darken areas of the world by removing incoming light. This per pixel transparent piece of hardware has not yet been invented. The closest is dimming regions like in Magic Leap 2. As the component gets smaller it's believed there will be diffraction issues to deal with making this quite complex.

Cameras: Event cameras allow for extremely fast tracking at around 10K Hz for inside-out positioning and eye tracking. Rather than producing full pictures every frame they emit events for intensity changes at individual pixels. Their ability to detect small pixel changes makes them ideal for pretty much all tracking including face and hand tracking. It is not necessary to have a full picture of the face to extract information as muscle movements cause detectable changes elsewhere in the face. What this means is eye tracking cameras that can see a cheek might be enough to extrapolate full face movements. Also since event cameras don't necessarily deal with full images the computation required can be lower and thus more energy efficient. (There are other technical things that can further reduce power related to how they operate, but I digress). Event cameras for hand tracking are almost mandatory as tracking the hands and generating depth masks at 240Hz+ is generally not feasible using standard cameras due to motion blur. Specialized depth cameras might be doable for this, but they have to produce high quality masks. A user would expect their hand to render on top of objects and visually pass through them without appearing fuzzy on the edges. For eye tracking this high sample rate is ideal and can even be used to predict eye movements. As mentioned eye tracking is required for foveated rendering and mandatory when dealing with 16K displays as rendering that much at high quality is infeasible. Currently no company utilizes event cameras, but both Samsung and Sony manufacture them. It's unclear what plans either has for them. For reference, Apple is rumored to have 14 regular RGB cameras on their headset, which requires a lot of image processing. I'd say this indicates most companies are still a far ways off from a compact glass's friendly solution. An event camera headset I'd expect to have 4 wide angle tracking cameras and 2 eye/face tracking cameras for a total of 6 cameras.

Power: Computing in a decade will be more powerful and energy efficient. By ~2030 there will be solid-state batteries offering higher energy density, increasing the battery life for headsets. In addition wireless power phased array transmitters exist and per the FCC are allowed to operate at 15W with approval. Utilizing one of these it's possible to have a very small battery and charge continuously in an office or living room allowing unlimited use. If such wireless power is ever widespread is yet to be seen, but as the hardware becomes cheaper to produce it might be inevitable.

Computing: It's probable a standalone computing unit would connect wirelessly to a headset. This would allow the headset to be very lightweight. In the further future a 5G low-latency connection could offload work to edge servers or a person's own computer simplifying the computing requirements. Another part of all of this is offloading operations to custom ASIC chips to both free up general computing units and lower the power requirements on the device. Things like headset tracking, eye, and face tracking can be handled by a dedicated chip. Creating these chips are expensive, but the benefits of having them means a huge competitive advantage. A number of the larger companies are well-versed in producing such chips. It's also possible after a time Qualcomm will simply offer such chips. That said, it cannot be understated that some advanced scene understanding tasks using machine learning are very expensive. I foresee this being a roadblock for a while for more ambitious ideas.

Controllers: Hand tracking is one form of input, but using hands for prolonged periods of time is not ideal for many applications. Using eye tracking to select menus and a small clicker to select might be sufficient for simple applications. Both Quest Pro 3 and Magic Leap 2 have inside-out tracking cameras on the controller which paints a good direction for where controllers are going. Valve Index has a hand strap to hold the controller without gripping the controller. Combining these two setups will probably result in a future controller. An inside-out tracking camera on the back of the hand strap along with other inside-out cameras could see in all directions for each controller allowing the controllers to assist in full body pose tracking. That is the users arms and legs would be tracked by the controllers when not visible by the headset. Furthermore such camera configurations means an avatar could be generated by simply moving one's arms around for a second to get a scan. Similarly moving one's hand over something to capture data would be enough to get a detailed scan. This would use event cameras as mentioned before, so probably 3 per controller. With the headset's 6 that would be 12 total cameras that could scan the world as one walks around.

There are other hardware designs that could work for a mainstream device, but I'd argue they'd be of equal complexity. The big picture is when the hardware is ready and widespread, no company wants to be behind. Missing even one piece of the MR puzzle could be disastrous and put a company behind on market share for a long time. I will say there will be a point when Qualcomm and others offer reference designs that fulfill mainstream requirements, but companies that have a brand and widespread adoption on storefronts might be difficult to unseat. That's assuming that a company that falls behind has the cash to comeback after years.

Metaverse

One of the points raised for a metaverse type environment not taking off is the lack of the above mainstream affordable hardware. Companies like Meta weirdly ignore this point, presumably to gain brand and market share awareness by starting with VR and simpler experiences. A VR-based metaverse seems unlikely to become mainstream. Apple's Tim Cook succinctly explained why in 2017 when he mentioned AR doesn't isolate. Essentially an MR headset and applications can always be active while the user does other things in the world. There's no VR and the real world dichotomy where a user simply disconnects from one to use the other. A phone is a closer example where you still kind of interact with the world with a hand holding it. The primary difference is MR is more immediate and hands-free. In a conference call you can see your coworkers as they are and only the clients virtually. It's not isolating and is instead minimally virtual. This is even more minimal than a regular conference where you see their backgrounds. You just see the people.

Conclusion

While some companies have VR products at the enterprise or consumer level, the big picture is MR and ensuring they keep pace with competitors. It's very likely having these long-term research projects are seen as waste and thus VR products are pushed out the door to get money back. (Not that this is bad, since it's creating pancake lenses and refining controller designs). The actual mainstream MR hardware is still a ways off from being affordable to produce. Even when it's first available it'll be 5K+ USD just from the MicroLED displays I suspect. It's hard to estimate though as there are no real data-points for a lot of this hardware. I digress, but companies are definitely working toward a science fiction level device, but all the various pieces from hardware to software are 5+ years away.

As a side-note it's fascinating watching Meta burn through billions and seemingly not unlock the hardware aspect of this. (Though they do have connection to MicroLED R&D, but they are very silent on it. Their machine learning research does seem fruitful). Perhaps that shows money isn't necessarily the solution right now but time is.

Comments

Dull_Veterinarian_33 t1_itkty6y wrote on October 24, 2022 at 11:57 AM

#200,034

I still don't see how it could be sold to a large public.

Either you need theses devices for very complex task, or for very simple task.

For everything in between it s not really needed.

Also we are already living in a mixed reality... the information in the brain is always superposed and interacting with the reality before our eyes.

Once you have acquired the information from the MR device, you don't need it anymore.

Who has to learn 50x new things each and every day ? no one.

the need for information is always limited.... but basically MR/VR/ETC is selling information.

So imo the need for theses devices is also limited, and they tend to become counter productive quick. (imparing vision , distraction, giving unwanted information, or sollicitation,etc)

Sirisian OP t1_itmdmuo wrote on October 24, 2022 at 6:34 PM

#205,613

Replying to BlaineBMA (#199,794)

I was very critical of Glass back then and thought it was going to poison the well of AR by having one display. I did not expect it was the camera part that showed up in the news. I keep wondering if that will comeback with MR which will have way more cameras.

I think the tablet setup for self-driving cars will probably be what we see for a while. Especially when the steering wheels are removed. The main thing is you need some form of input and touch is intuitive.

> Meta VR appears to require more of a disconnect from reality.

That's the issue of using passthrough video signals. Can't see people's eyes or anything and for enterprise and collaborative systems with people in a room it's not ideal. I don't envy the people trying to sell such systems. Hololens for enterprise is much closer to what people expect and even then the hardware is fraught with issues.

Sirisian OP t1_itmjdp1 wrote on October 24, 2022 at 7:10 PM

#206,239

Replying to Dull_Veterinarian_33 (#200,034)

> I still don't see how it could be sold to a large public.
> Either you need theses devices for very complex task, or for very simple task.
> For everything in between it s not really needed.

This is a very real point. I kind of tried to show a similar viewpoint that a non-mainstream device can only do the "basic" stuff and thus has very little utility. Similar to Meta trying to do very basic collaborative conferences, but being unable to do much more complex scenarios. Their recent realization about people expecting legs on avatars for instance showing even their basic experiences required some more advanced features. When the hardware doesn't accommodate that it falls short.

I really don't think it can be sold to the public until it's a complete mainstream device that covers basic to complex tasks. It's one of those devices where you hand it to someone and once it's there people will be like "okay, I can't go back to 2D displays". It's something you'd have at work and home and is part of your life like a cellphone once was. Anything that you just take off and set to the side will end up similar to VR headsets.

A perspective on mixed reality and industry directions

Comments

[deleted] t1_itkotv8 wrote on October 24, 2022 at 11:01 AM

BlaineBMA t1_itkr29f wrote on October 24, 2022 at 11:26 AM

kimmeljs t1_itksa6s wrote on October 24, 2022 at 11:40 AM

Dull_Veterinarian_33 t1_itkty6y wrote on October 24, 2022 at 11:57 AM

bigboyeTim t1_itky569 wrote on October 24, 2022 at 12:36 PM

Shot-Job-8841 t1_itm6aiq wrote on October 24, 2022 at 5:47 PM

Sirisian OP t1_itm74n9 wrote on October 24, 2022 at 5:52 PM

kimmeljs t1_itm7rxf wrote on October 24, 2022 at 5:56 PM

Sirisian OP t1_itmdmuo wrote on October 24, 2022 at 6:34 PM

Sirisian OP t1_itmjdp1 wrote on October 24, 2022 at 7:10 PM

Sirisian OP t1_itmkn1o wrote on October 24, 2022 at 7:18 PM

BlaineBMA t1_itmkyzl wrote on October 24, 2022 at 7:21 PM

Rauleigh t1_itnfe54 wrote on October 24, 2022 at 10:45 PM

ILikeCutePuppies t1_iua39ru wrote on October 29, 2022 at 7:04 PM