Podcast: Generational Discovery and The New Age of Computer Vision
In this episode, we explore groundbreaking research from Princeton University where scientists have developed ultracompact cameras utilizing metasurfaces—engineered materials that manipulate light in novel ways. These tiny devices not only capture images but also process visual information using light-based neural networks, enabling rapid and energy-efficient image classification. This fusion of optics and artificial intelligence heralds a new era in computing and imaging technology.
This podcast is sponsored by Mouser Electronics.
Episode Notes
(3:08) - A new way of seeing, a new way of computing
This episode was brought to you by Mouser, our favorite place to get electronics parts for any project, whether it be a hobby at home or a prototype for work. Click HERE to learn more about the role of optical sensors in the tech that makes our daily lives seamless and accessible!
Become a founding reader of our newsletter: http://read.thenextbyte.com/
Friends, let me ask you a very simple question. Have you ever looked at something like a car, a laptop, a phone, and wondered: does this really have to be this way? Is there a better way to do it? Because in today's podcast, we're talking about a professor at Princeton University who did just that with his PhD student, specifically for cameras. And in the process of doing so, they have completely changed what a camera or a lens could be. And it's going to cause a revolution in computer vision and video processing.
What's up friends, this is The Next Byte Podcast where one gentleman and one scholar explore the secret sauce behind cool tech and make it easy to understand.
Farbod: Welcome back folks, and as you heard, we are talking all about optical sensors today. But before we get into today's episode, let's talk about today's sponsor, Mouser Electronics. Now you probably know by now, but Mouser's our favorite sponsor, and that's because they like to share cool information with the average person, just like we do. They want to make it easy to understand. And we're linking a resource in our show notes today. This resource is all about the value of gesture-controlled devices. So, I think a couple months ago, BMW announced that in their flagship models, if you wanna change the song that you're listening to, you don't even have to interact with the touchscreen anymore. You just wave your hand for the next track and it just does it for you. And I know, it's amazing. And the same applies for some of these high-performance headphones you're seeing out there, where you don't even have to press a button, you just swipe on the ear cup, like pause or play or whatever, and it just does it. And maybe it's just me, but sometimes you wonder how all these things are happening. So, what Mouser has done in this resource is break down one by one the powerful technologies behind all these things, from proximity sensors to time-of-flight sensors and, super relevant to today's article, optical sensors. They talk about how they work, how they're applied to these products, and what the future looks like. Again, if anything that I just said piques your interest, even in the slightest, definitely check it out. As always, we're linking it in the show notes. And yeah, I found it to be a pretty good precursor for the topic we're talking about today.
Daniel: Absolutely. And it just goes to show, again, you said it, but this, I feel like Mouser is like our spirit animal. If our spirit animal was a spirit electronics distributor. Mouser is our spirit electronics distributor. Taking cool, interesting technology, making it easy to understand. That’s what we're all about. So, check out that link in the show notes.
Farbod: And it's funny, every time we do an article, I'm like, I wonder if they have a resource for this because it's so niche. They always do. It's insane.
Daniel: They don't miss.
Farbod: They really don't. But I guess with that said, let's segue to today's topic. And we're taking a trip up north to Princeton. This effort is actually a collaboration between Princeton and the University of Washington. And we're talking all about, again, optical sensors, the new age of cameras, the new age of processing, exciting stuff, but it's got a really great story. So, it's all about this professor who, when he first arrived at Princeton, wanted to see if it was possible to get more information out of the existing camera slash lens architecture, right? So, the way a lens works is light comes in, the lens bends it a certain way, the image sensor of the camera then picks it up, and then you have computing that interprets it to give you the actual image that you're looking at, right? So, the professor was like, I wonder if this is just underutilized and there's more information within the actual light that's being relayed back to the sensor that we could do things with. And he actually, I think, published some research about how to detect objects that are behind blind corners or behind fog and things like that using machine learning. So that was really powerful. But then he kind of hits this wall and is like, I actually think we need to start rethinking what a camera even is or what a lens could be. Like first principles thinking: does it actually have to be a curved piece of glass that bends light a certain way? And it just so happens he started asking this question around the same time that he got his first PhD student. And together they started investigating metasurfaces, which is right up our alley, Daniel, right? We love anything at the nanostructure level, let's say.
Daniel: And metasurfaces, by the way, they look flat to the naked eye. But if you zoom in to the microstructure or the nanostructure with a microscope, there's a bunch of specially designed tiny shapes on the surface. In this case, they're using those shapes to control the way the light is bent as it passes through. So as opposed to a lens, which is smooth and curved and meant to bend the light as it passes through, these metasurfaces are designed to diffract light and pass it through different channels in different directions. In my mind, the way computer vision normally works is we use lenses and cameras to collect the light, and then it's passed to the computer, which tries to sort that image data into what's relevant before it gets passed to the AI model. But in this case, the lens is doing some of that pre-sorting before a computer is even involved. So, you're sorting the light before it even gets sensed by the light sensor, and before it ever hits the computer, you're doing some of the pre-computing. You're pre-digesting the light before it even makes its way to the light sensor.
Farbod: Absolutely. And this might not be the best analogy, but what came to my mind is that existing camera lenses operate kind of like a dam, where they hold the water back and just let it out from the other side whenever they want to. So, it's just a big stream before and a big stream after. Whereas what you get with these metasurfaces is kind of like intricate plumbing, right? The water's being routed in all these different ways: big pipes, little pipes, pipes connected together with, you know, four-way connectors, things like that. And by the way, the way these metasurfaces do their routing is through a nano-architecture that, if you look down at it, looks like a little city of structures sticking up. The reason they look so flat to the naked eye is because these structures are like a thousandth of a millimeter tall, and they're even skinnier than they are tall. It's tons and tons of these spikes that are manipulating light and diffracting it in all these different ways. But that's the mechanism that's happening here for metasurfaces. And you kind of hinted at the answer right there, but the reason this is so powerful is because these metasurfaces can start filtering the image data that's coming through, almost like what a computer would do at a very low level, except they're able to do it with hardware instead of software. So again, PhD student and professor, they started investigating these metasurfaces. They see a lot of value in it because now you're extracting more information than you would if you were just passing light through with a conventional lens. So, what that means is that you can create a miniature lens, I think they said the size of a grain of rice. And this was back in 2021, I think, when they published their findings. It took the world by storm. Everyone was like, this is fascinating, you can get full color images from these tiny, tiny lenses. And they didn't stop there. They were like, I think we can push this even more. And the reason was, again, because they realized that there was actually filtering happening at that hardware level, which could be incredibly powerful for object detection algorithms. Now AI has, you know, taken the world by storm. You have video doorbell cameras in your home that are doing postman detection, dog detection. You have autonomous vehicles that are using cameras to detect, am I about to hit a car? Is there a person in front of me? Things like that. So, the value of computer vision has gone up a lot over the past, let's say, 10 to 15 years. But latency still exists, and that's kind of inherent to it because image processing can be computationally heavy. And quick primer on how image detection works, right? You have these deep neural networks which have a bunch of filters, and they start filtering at the smallest pixel level. So, imagine there's a hundred-by-hundred pixel array. They start looking at, you know, a two-by-two grid at a time. They're like, all right, am I looking at nothing or am I looking at the edge of an object? And they start scaling these patches up from two by two to four by four to five by five, whatever. So, they can go from, am I looking at an edge, to am I looking at an eye, to am I looking at a face? What's happening here, and the reason it's so powerful with this lens, is that those low-level computations of, am I looking at something or nothing, am I looking at an edge or, again, an eye? That is actually happening at the hardware level.
So, the only thing that the computational program has to do, the only compute expense you have here, is the high-level detection. Like, is this a person or is this an animal?
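(For the curious: here's a minimal sketch, in Python with NumPy, of the kind of low-level filtering described above: a small kernel sliding across a pixel grid and deciding "edge or nothing" at each patch. The toy image and the Sobel-style kernel are illustrative assumptions, not the team's actual filters.)

```python
import numpy as np

# Toy 8x8 grayscale image: a bright square on a dark background.
image = np.zeros((8, 8))
image[2:6, 2:6] = 1.0

# A simple 3x3 vertical-edge kernel (Sobel-style). Real networks learn many
# such kernels; the Princeton device implements analogous filters optically.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def convolve2d(img, k):
    """Slide the kernel over the image and record a response at each patch."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

edges = convolve2d(image, kernel)
print(np.abs(edges) > 0)  # True wherever the kernel "sees" a vertical edge
```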
Daniel: And to elaborate on the hardware level: obviously this is awesome, taking something that's very computationally intensive and pushing it into the hardware so it's not being done in software. But to be more specific about how it's being done in hardware rather than software: they call these kernels, which is a little bit confusing because you might think we're talking about software kernels or firmware kernels, but we're talking about optical filters. They can analyze the full image as light passes through, but they use zero electricity and there's essentially zero latency. It happens at the speed of light, because it is the light passing through these lenses. So, to go back to your analogy of looking across a matrix of a hundred by a hundred pixels: it's as though, by the time it reaches the software algorithm, the way the light passed through the lens had already sorted it into blocks, making it about as complex as a 10 by 10 instead of a hundred by a hundred. Or in this case, it's more like turning a thousand by a thousand into a 10 by 10, because they were able to get a hundred X improvement, a 99% reduction in the computational power required to detect images. I think they were doing horses versus dogs. And they got that 99% reduction because the optical part doesn't use any power, doesn't use any compute, isn't taxing the system at all. It's the light itself, passing through these metasurfaces that look like flat materials with tiny shapes, that sorts the image as it goes through. By the time it hits the sensor and the software has to process it, it's already been pre-sorted, and the system can complete the task 100 times faster.
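(To make that division of labor concrete, here's a rough numerical sketch, not the actual optics: a fixed "optical" front end shrinks a 1000x1000 frame down to a 10x10 feature map, and only that tiny output ever reaches the digital classifier. The averaging filter, the dummy weights, and the horse-vs-dog labels are placeholders for illustration.)

```python
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((1000, 1000))  # stand-in for a raw 1000x1000 scene

# "Optical" stage: fixed filtering plus coarse pooling, done here in NumPy
# purely to show the data reduction. In the real device this happens as
# light passes through the metasurface, with no electricity or latency.
def optical_front_end(img, block=100):
    h, w = img.shape
    pooled = img.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return pooled  # 1000x1000 -> 10x10: a ~10,000x smaller input

features = optical_front_end(frame)
print(frame.size, "->", features.size)  # 1000000 -> 100

# Digital stage: only this tiny 10x10 feature map needs software compute,
# e.g. a single linear layer deciding "horse vs. dog" (weights are dummies).
weights = rng.standard_normal(features.size)
score = features.ravel() @ weights
print("class:", "horse" if score > 0 else "dog")
```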
Farbod: Absolutely. I mean, you already hit the nail on the head on why it's so impactful, basically using 1% of the compute that conventional methods are using. But what really stands out is the comparison against AlexNet, which is the industry benchmark to beat for computer vision object detection. When compared against AlexNet, this process was able to actually beat it on the datasets commonly used by industry and academia. So not only is it substantially more power efficient, it's also just better at detecting things. And one thing that really stood out to me is what the professor said when discussing their wins. First of all, they were like, this could not have been possible in industry, because it takes two domains coming together that are drastically different. Like, no one's thinking of this. If you go to, I don't know, DeepMind, if you go to Anthropic, they're not thinking about how do we integrate optical sensors here, right? They're thinking about how do we optimize the weights of our LLM, for example, to be 1% more accurate. But if you bridge the gap, which academia tends to do, you get some fascinating things happening with interdisciplinary research. So, the professor was like, this was possible because we're in an academic setting and we get to be the curious people that we are, which enables these awesome discoveries that industry can benefit from and scale, which is better for average folks like you and me that are using this technology on a day-to-day basis.
Daniel: Absolutely. And they broke out of the box here, right? Cause even if you were a camera engineer, a camera hardware engineer, you'd be focusing on: how can I reduce the latency? How can I increase the frame rate? How can I reduce the amount of power my camera requires to get the data to pass to an AI model? But you wouldn't be doing what these folks did, which is like, hey, we're actually not going to record a perfect image. We're going to filter the light before it even hits the sensor. So, you're going to get filtered data. You're not going to be able to take a perfect full-color image like their grain-of-rice lens could before, but now they've shrunk this down to the size of a grain of salt. And what they're getting is the most important part: where are the edges, where are the borders, what's light, what's dark? That's what's being used for object detection. You actually don't need a full, perfect image for that. You can just filter for the parts of the image that are most important, and that's what gets passed through. You alluded to it earlier, but it's a very first-principles approach. If you're using an AI model to find these borders between light and dark and to find the edges, why not just only pass that light through? And that's what these metasurfaces are doing. They're only passing through the lights and the darks and the edges, which helps massively increase the speed at which you can do this image recognition with less compute power.
Farbod: A hundred percent. And what I love about this story, because it is a good story, is that there's a bit of a full circle moment here. Back in 1981, it was a Princeton professor who asked the question: could we mimic the way a human brain works by mathematically representing neurons? Which led to neural networks, and then the deep neural networks that are powering the computer vision algorithms we use today. And that professor won a Nobel Prize for it. That was a revolution, a paradigm shift, and here we are, 40 years later, and the same institution is once again trying to revolutionize the way we do image detection. I don't know, it's one of my favorite stories we've done in a minute. The impact and implications of this are just fascinating to think about. For me, I remember months ago at this point, we did the Toyota Research Institute collaboration with Stanford where they were tandem drifting two cars, and they were talking about why high-performance computing is so critical, because you're trying to figure out, am I going to hit this car or not? Like, it could be a life-or-death situation. So, you need to be as efficient as possible. And now you have cars on the road already leveraging computer vision technology for autonomous driving, like Teslas. Imagine the kind of benefit those cameras could get from leveraging metasurfaces like this for their lenses.
Daniel: Insane.
Farbod: I don't know.
Daniel: One last thing I want to mention here before we wrap things up. I was going to say I have a full circle ending too, but yours was much more interesting than mine. So, I'm not going to call it a full circle. But one of the things that's interesting about this is there's a lot of pre-work required to design the best layout of pillars to complete these image tasks.
Farbod: Right.
Daniel: And basically, design this metasurface, design this tiny matrix of skyscrapers, like you were saying, that bends light in the correct way so that the sensor only picks up the lights and the darks and the edges, filtering the light before it even hits the light sensor. What's interesting is the team figured out a way to use machine learning to help design the best layout of pillars. So, they're not designing every single part of this matrix manually by themselves. They're basically using machine learning to tell them what they need to do to the light, and that's helped them design the optimal layout for these metasurfaces. So instead of, I don't know, doing all the hard work themselves, they're like: hey, AI computer vision model, what do you need me to do to the light? Okay. Can you help me design this filter to do it? Okay, let's do it. And they did it, and it worked. And it used 99.4% less compute power than usual for comparable image recognition tasks.
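(A conceptual sketch of that design loop, under the assumption that you can treat the optical filter as a learnable convolution and train it jointly with a small digital classifier using PyTorch. The layer sizes, dummy data, and training loop here are hypothetical, not the team's actual inverse-design pipeline, which also has to model the real optics and fabrication constraints.)

```python
import torch
import torch.nn as nn

# Hypothetical end-to-end setup: the "optical" kernels are a learnable
# convolution whose weights stand in for the metasurface pillar layout,
# trained jointly with a tiny digital classifier.
optical_stage = nn.Conv2d(1, 4, kernel_size=7, stride=7, bias=False)  # learnable "pillar layout"
digital_stage = nn.Sequential(nn.Flatten(), nn.Linear(4 * 4 * 4, 2))   # tiny classifier

model = nn.Sequential(optical_stage, digital_stage)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Dummy 28x28 images and binary labels, only to make the loop runnable.
images = torch.randn(32, 1, 28, 28)
labels = torch.randint(0, 2, (32,))

for step in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()    # gradients flow through the "optical" kernels too,
    optimizer.step()   # so the optimizer effectively designs the filter layout
```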
Farbod: I think that's a good full circle moment too. I don't know what you're talking about. That was solid.
Daniel: I like yours more. I think I like the history angle a little bit more than, hey, let me use AI at every step of the process.
Farbod: Understandable, but still good. All right. So, to wrap things up, folks: basically, what we're talking about here today is a professor who started out trying to figure out whether the light captured by a conventional camera lens could be used to detect objects around corners or behind fog, something like that, and then just decided to question everything about what a camera or a lens even is. And in the process of doing that, he and his student invented a completely new lens that is 99% more efficient and pre-computes the images coming through, to the point that it can beat the AI algorithms doing object detection today. All that to say, what has been accomplished by this team will be a paradigm shift. It'll be a revolution in the computer vision world, and it'll find its way into impacting your life, whether through autonomous vehicles or even the video doorbell on your home's front door.
Daniel: Boom!
Farbod: Money. That's the pod.
As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.
To learn more about this show, please visit our shows page. By following the page, you will get automatic updates by email when a new show is published. Be sure to give us a follow and review on Apple podcasts, Spotify, and most of your favorite podcast platforms!
--
The Next Byte: We're two engineers on a mission to simplify complex science & technology, making it easy to understand. In each episode of our show, we dive into world-changing tech (such as AI, robotics, 3D printing, IoT, & much more), all while keeping it entertaining & engaging along the way.