[Originally published as a Medium article on January 21, 2018]
In January of 2015, Microsoft announced the HoloLens, a wearable augmented reality headset, to the public. This was a huge catalyst for me. I had been secretly working on the HoloLens team for over a year, but on the data analytics side, rather than… the cool stuff. It was during this time that I realized how exciting virtual and augmented reality could be, both for my career and for my personal life. Rather than stay on the sidelines, I decided to get into the game and learn how to develop HoloLens applications.
At the time, I had some experience with 3D modeling from high school, but zero 3D development experience. I decided to get involved with the Seattle Virtual Reality meetup, where I met a ton of passionate people with similar goals, participated in some hackathons, and dedicated a major portion of my free time to learning 3D development tools. I was certain that this new technology was the next big thing, so I wanted to get in early and plant my flag in the VR/AR app stores.
The rest of the year was humbling. I gradually learned that building 3D apps and games for VR and AR headsets by yourself is HARD. It requires a massive amount of time and specialized knowledge to build anything beyond a prototype. You need to learn how to develop with a game engine, something I’ve gotten good enough at to be paid a salary for, but that I’m still learning to this day. You also need to learn how to create 3D models, texture them, and animate them. This is a career on its own. Even if you have these skills (and all of the other requisite skills I didn’t mention), you need to devote a colossal amount of time to a project and not get distracted with new ideas. Despite my efforts, I did not plant my flag in any app stores.
During this time, I learned a lot, had some very interesting discussions, and spent a lot of time thinking about the future of the industry. Notably, I developed my opinion that AR had a lot more potential than VR. The way I see it, virtual reality is well suited for entertainment and not much else. Augmented reality, on the other hand, has massive potential as a tool we can use at work and in our everyday lives. Sure, right now the headsets are clunky. They hurt to wear for more than a few minutes at a time. But once the form factor improves, AR can overlay information on top of anything you see (or can’t see) in the world around you. That’s powerful. It’s also non-trivial.
The (next) big problem with AR
As I continued to work with the HoloLens, I realized that while the short-term problem is the display technology and the form factor, the next big problem will be environmental understanding. Sure, I can overlay info in the world, but what’s the point if it’s not overlaying something interesting? I’ll explain.
When you develop a HoloLens app, you have access to a 3D mesh of vertices of the surfaces the sensors detect around you. The mesh is highly imperfect, unique to every room, and constantly updating as the environment changes (i.e. things and people are physically changing positions/orientations) or the user moves to new areas. As a developer, you have to get pretty creative to successfully use the 3D mesh. One of the best solutions is to do some math to detect roughly flat horizontal and vertical surfaces and interact with them accordingly. Another solution is to pre-map the environment where you’ll be using the app and add triggers for when the user looks at a particular thing. Both of these solutions break down pretty quickly as soon as the user enters an unexpected environment.
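To make the “find flat surfaces” idea concrete, here is a minimal sketch of the kind of math involved, assuming a +Y-up coordinate system (as in Unity/HoloLens) and a mesh given as vertices and triangle indices. This is an illustrative simplification, not the actual HoloLens SDK API: it labels each triangle by how closely its normal aligns with the world up vector.

```python
import math

UP = (0.0, 1.0, 0.0)  # assume +Y is world up, as in Unity/HoloLens coordinates

def triangle_normal(a, b, c):
    """Unit normal of triangle (a, b, c), or None if the triangle is degenerate."""
    u = tuple(b[i] - a[i] for i in range(3))
    v = tuple(c[i] - a[i] for i in range(3))
    # Cross product u x v gives a vector perpendicular to the triangle.
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(x * x for x in n))
    if length == 0:
        return None
    return tuple(x / length for x in n)

def classify_triangle(a, b, c, tolerance_deg=10.0):
    """Label a mesh triangle 'horizontal', 'vertical', or 'other' by how
    closely its surface normal aligns with the world up vector."""
    n = triangle_normal(a, b, c)
    if n is None:
        return "other"
    alignment = abs(sum(n[i] * UP[i] for i in range(3)))
    if alignment >= math.cos(math.radians(tolerance_deg)):
        return "horizontal"  # normal (anti)parallel to up: floor, table, ceiling
    if alignment <= math.sin(math.radians(tolerance_deg)):
        return "vertical"    # normal perpendicular to up: wall-like surface
    return "other"           # ramps, clutter, noisy mesh triangles
```

A real implementation would also cluster neighboring triangles with similar normals into larger planes and smooth over sensor noise, but the core test above is the same.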
It’s fairly easy to see why the “find flat surfaces” method is pretty limited, but it’s also not too difficult to see why the “pre-mapped environment” method isn’t practical. As a developer, I have no idea where my users will try to use my app. Even if I do know that my users are restricted to controlled environments, I have to model these environments in my app and sync up my position in the virtual world with the actual world. The most common solution is simply to force the user to place content where they want it. Once placed, you hope that the environment doesn’t change and ruin the experience. I’m not satisfied with any of these options.
Artificial intelligence is essential
Without specific knowledge of what a user is looking at, developers cannot create compelling, large scale, flexible augmented reality applications. That means it’s imperative that we use AI to bridge the gap and actually understand environments, the objects in them, and the situations taking place.
AI can be used to detect objects and people in an environment, feed that context to the application, and allow for all sorts of interesting scenarios. The AI can decide where to put content and can move it if something changes.
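As a rough sketch of what “feeding that context to the application” might look like, here is a hypothetical glue layer between an object detector and an AR app. The `Detection` structure and `place_content` function are my own illustration, not any real SDK: the idea is that the app re-queries detections each frame and anchors (or re-anchors) content relative to a recognized object.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """Hypothetical output of an object detector, lifted into world space."""
    label: str               # e.g. "table", produced by the detector
    position: tuple          # estimated world-space center (x, y, z)
    confidence: float        # detector confidence in [0, 1]

def place_content(detections, target_label, min_confidence=0.6):
    """Return the world position where AR content for `target_label` should
    be anchored, or None if no confident detection exists. Calling this
    every frame lets the content follow the object if it moves."""
    candidates = [d for d in detections
                  if d.label == target_label and d.confidence >= min_confidence]
    if not candidates:
        return None
    best = max(candidates, key=lambda d: d.confidence)
    x, y, z = best.position
    return (x, y + 0.1, z)  # hover the content slightly above the object
```

The interesting design point is that placement becomes a pure function of the detector’s output, so when the scene changes, the content moves with it instead of floating where the user originally pinned it.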
With the power of AI, all sorts of new applications for AR are unlocked. I’m constantly imagining new ideas where AI finds a problem, then AR gives a hands-free, visual indicator to a person who can solve it. On top of that, by putting cameras on so many heads, we gain visibility that can never be achieved by cameras bolted onto architecture.
Let’s imagine an application used on a construction site. A statically mounted camera monitoring a work site uses AI to recognize that a stack of building materials has spilled into a high traffic area. A worker wearing an AR headset receives a notification with specific instructions to clear the spill with visual directions to the incident. All workers operating heavy machinery in the vicinity are then warned of the hazard if it affects them, including being given visual indicators of where the problem is, which route to take to avoid it, and finally, some indication when the issue is resolved. None of this is possible without artificial intelligence classifying and reacting to a complex situation.
My recent efforts at learning AI/Deep Learning
I recently moved to Austin, TX to work at General Motors. I was hired for my AR/VR experience to build interesting prototypes alongside an eclectic senior dev team with diverse technical skills. A few months ago we decided we should probably get around to learning AI development, but it took us a while to get moving on it. I actually had it on my daily to-do list for an embarrassing amount of time before taking substantial action. We finally decided, as a team, to dive in head first and enroll in the Deep Learning Nanodegree on Udacity. It’s a challenging course, and I’ve been spending six to eight hours per workday learning this stuff so far. It’s not free, but the cost pays for excellent instruction. I find the course absolutely fascinating, and since I’m not content going at the pace they’ve set, I’m about a month ahead of the syllabus. Even still, I can tell I’m only scratching the surface.
I’m excited to begin applying what I’m learning and just as excited to see how the technology evolves over the coming years. I see so many applications of it inside and outside of the AR space and I think the implications it will have on us as people will be staggering. The way humans do work will transform radically over the coming years. I highly recommend that if you have any interest in the field of AR or AI, you start learning now, because these skills are going to be invaluable. Fortunately, there are tons of free online resources, from open source software frameworks, to public data sets, to tutorials and courses.
Thanks for reading!