Podcast: AI Model Tests AI Models To Tell Us How They Work

In this episode, we talk about MIT researchers making a smart tool (like a robot scientist) that uses AI to understand and explain how other AI brains (neural networks) work. It's like having a detective who can figure out what's happening inside a complex machine.


(0:50) - AI agents help explain other AI systems


Folks, we're living in a world where AI is taking over all aspects of the world as we know it. This team from MIT is gonna fight fire with fire. They've created an AI-ception: an AI model that will study other AI, fighting AI with AI, to help us understand where AI is strong, where it's weak, and what we should be worried about in the future. I think this one left me a little bit flabbergasted, but it's super interesting, so let's jump right on into it.

I'm Daniel, and I'm Farbod. And this is the NextByte Podcast. Every week, we explore interesting and impactful tech and engineering content from Wevolver.com and deliver it to you in bite-sized episodes that are easy to understand, regardless of your background.

Daniel: What's up peeps? Like we said today, we're talking all about AI inception. I think, right? We're talking about a team from MIT that's creating a kind of AI robotic scientist type deal that understands other AI brains, other AI models, and tries to interpret them for us. So, in my mind, this is like, you're building an AI creation to go evaluate other AI creations and explain to us humans how they're working and how they're doing.

Farbod: Well, I gotta say, as someone who works primarily in test automation, I have a background working with QE, QA, this sounds at a glance like a recipe for disaster because you have one model that's trained on God knows what data set, trying to understand another model that's trained on who knows what data set. And there's so much unknown in there that you kind of say, okay, what's the standardized approach? How do you even approach this testing? Is it just two black boxes trying to understand each other?

Daniel: Well, and I think the black box analogy is a really good one, right? It kind of explains why we need something like this. Neural networks, big AI models, large language models, think of things like GPT, which are now becoming common use among people. If you're trying to understand how that works and explain all the intricacies of how that works and how effective it is and what the limitations are and where it's really, really good at doing certain things and where it's really, really bad at doing certain things, I think we're still, even with millions and millions of users trying to understand what GPT is really good at and what GPT is really bad at. So, what they're saying is, if you've got AI capability to study large sets of data, to understand and interpret parts of it, and we know that GPT is good at stuff like that, can we kind of create another AI monster in this case? The analogy I'm gonna use is like, in Mary Shelley's novel Frankenstein. Right. Dr. Frankenstein creates his monster. Yeah. And then his monster starts to talk and like have discourse about like its own consciousness and its own self-discovery. That's how I feel like we're doing with AI right now. Right. We're like we're creating an AI to study AI. And obviously, right, we have a lot of challenges understanding huge complex AI models. This team from MIT thinks that they can use this new tool which they're calling an AIA, Automated Interpretability Agent, that kind of does poking and prodding and tests and explains what's happening inside other AI models for a couple different reasons. One, so we can understand them. Two, so we can understand its strengths, right? Know where to apply AI models and understand where they're really robust. And then the third aspect there is understand their limitations as well. So that kind of three main benefits, right? 
We want to understand how these AI models work, two, what they're really, really good at, and three, what they're bad at so we know not to apply them in those certain scenarios.

Farbod: Absolutely. And one thing I'll refer back to again is thinking about it from a traditional tester's perspective: you have all these different AI models coming out. A lot of them are vastly different products. Like you have the popular LLMs like ChatGPT, Grok, Bard, and then you have the Midjourneys, which are completely image-based. To gauge every one of those standalone, that would be a whole endeavor on its own. What AIA allows researchers to do is to have one AI model that understands the scientific approach of poking and prodding, ingesting the feedback it's getting, and then coming up with new tests to, again, gauge the subject model's ability. And I think that's the real value add here within the research that these folks at MIT CSAIL are doing.

Daniel: Well, and I think there's a key difference between what you do in test engineering and what AIA might be able to do is, I think a big part of what, you know, and correct me if I'm wrong here, a big part of what you're trying to do is predict and understand failures and then help debug them before they ever get out into the field. In this case, a lot of these large language models are already out in the field. Right. People are using them on a daily basis and we still don't understand every single computation on how they work and we still don't understand their strengths and limitations. So, what this AI model is intended to do is do a couple of different things. One, look at like kind of discretize a large language model into a bunch of different, they call them neurons, right? A bunch of different processing bits and understand how does this neuron work and how does it interact with this other neuron? So, it kind of breaks everything down into its constituent parts, tests all of them individually, and it truly creates test cases, passes them through the subject AI model that it's trying to learn about, and then gets out the results and then tries to draw conclusions from that. It does that at the part-by-part level, and then it also does it at the composite level interacting with entire models. So not just neurons, which are the building blocks that build up a model, it also helps try and evaluate the strengths, weaknesses, and kind of the step-by-step reasoning of how an entire model might compute as well.
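
To make the test-and-conclude loop Daniel describes a bit more concrete, here is a minimal Python sketch. It is not the MIT team's code: the `subject_neuron` function, the probe words, and the 0.5 threshold are all hypothetical stand-ins, and a real agent would use a language model to choose probes and write the summary. The sketch only shows the shape of the loop: generate test inputs, pass them through a black box, and summarize what came back.

```python
from typing import Callable, List, Tuple


def subject_neuron(word: str) -> float:
    """Hypothetical black box standing in for one neuron: fires on vehicle words."""
    vehicles = {"car", "truck", "bus"}
    return 1.0 if word in vehicles else 0.0


def run_experiments(
    black_box: Callable[[str], float], probes: List[str]
) -> List[Tuple[str, float]]:
    """Pass every probe through the black box and record its response."""
    return [(probe, black_box(probe)) for probe in probes]


def summarize(results: List[Tuple[str, float]], threshold: float = 0.5) -> str:
    """Turn raw observations into a short, human-readable description."""
    active = sorted(probe for probe, activation in results if activation > threshold)
    if not active:
        return "no strong response observed"
    return "responds to: " + ", ".join(active)


# Hypothetical probe set; an actual agent would design and refine these itself.
probes = ["tree", "car", "toaster", "truck", "bus"]
results = run_experiments(subject_neuron, probes)
description = summarize(results)
print(description)
```

In this toy setup the description only reflects the probes that were tried, which is why the agent in the episode keeps generating new tests rather than stopping after one round.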

Farbod: Absolutely. And the example that they gave in the paper, which I really appreciated, was something along the lines of: it even pushes the model to understand how good it is at a given categorization. So, for example, you say, here's a tree, a car, and a toaster. Can you tell me the difference? And if it can, you're like, oh, cool. So, you know what a car is. Well, now that we're talking about cars, can you tell me the difference between an F-150, a Ferrari Italia, and a Toyota Corolla? And if it can, you keep pushing it and pushing it and start drawing the boundaries of, like, this is how well it knows this topic or how good it is at identifying this versus that. And it can do it at all the different levels for the different categories.

Daniel: Well, and that's one thing I wanted to mention is this AIA system uses pre-trained language models, which we know are good at understanding and generating human-like text. So, there's a lot of boundary cases that we're trying to understand, like you're saying, distinguishing between different types of automobiles, as an example for a certain neuron that is trying to process special types of cars. We're not sure whether AI is really, really good at that. But one thing that we know for sure, based on the large, vast use of large language models today, is AI is actually pretty darn good at interpreting and understanding human text and then generating new human text to interact with that. So, they're relying on one of, you know, we're relying on home base. We're relying on a strength that we know AI has and then using that to try and measure the capabilities of other different types of capabilities of AI that we're trying to develop, which I thought was pretty cool, right? Because it's not really a black box whether or not large language models are good at processing language. We know they're really, really strong at interpreting written outputs and then creating new outputs or new inputs, let's say, to put back into a system. We know AI is really, really good at that. So, it's leveraging one of the strengths of AI to try and measure the different strengths and weaknesses of new functions of AI that we're trying to develop, which I thought was pretty resourceful to use the thing that we know AI is really, really good at to try and save us time and honestly complete types of studies that are impossible to complete if we were just using human force, brute force to try and understand it, to try and measure the capabilities of all different types of different functions of AI.

Farbod: For sure, and I'll add one thing about the black box. I might be wrong, but this is just my general understanding of it: where AI models differ from a lot of other products is that you can tune the behavior by adjusting the various weights, but you still can't fully characterize the system once you're ready to release it, right? That characterization will only come from thorough testing of it. And that differs a lot from the other products that we're used to seeing in the field. Like for example, the backend of the Google search engine, I'm sure it has some detailed documentation somewhere that tells you if you do this, this happens, and there are these safeties in place that prevent X, Y, or Z from failing. We don't have that level of characterization with these models, and this is what the AIA can provide to us. So, it's not just a tool that's, I guess, helpful for the public to know how good or bad is this for doing my homework, for example. I'm not saying that, I'm not promoting that. I'm just putting it out there. It's also a good tool for the developers to be like, oh, I just tweaked this. Let's run these tests again. Did it have the performance gain that I was expecting for this categorization or whatever?

Daniel: And not only understanding the strengths and limitations of a model, also trying to break down and understand how each level of the computation works. Yeah. I agree. And one tidbit that I thought was really interesting here is so they developed this AI model that's trying to help us interpret and understand how AI models work, how good they are, how bad they are at certain different tasks. Then they're like, you know, we actually don't have a measuring stick to understand how good it is. We know it's pretty good. We know it's better than us. We know it's better than humans at certain tasks. How good is it, you know?

Farbod: Generally.

Daniel: I would say like on an absolute scale, not a relative scale. And they're like, huh, a test like that doesn't really exist. So, they went and created what I'm gonna call the SAT for these AI interpretation models. They call it FIND actually, which is less scary than SAT. It's the Function Interpretation and Description benchmark. They basically created a standardized test, a benchmark that they hope to use for all different types of interpretation models, to measure how good these AI models are at interpreting a large complex neural network and getting not only the proper understanding of the capabilities, but also the proper understanding of how every neuron in the system interacts to create the computations.

Farbod: We've been throwing AI models around for a second, so just to reel it back: you have AIA, which is an AI model that gauges other AI models. And then you have FIND, which will be used as a measuring stick to gauge models that test other models, for example, like the AIA.

Daniel: Yeah, I agree. So again, to go back to the nicknames we've given them, AIA is like Sherlock Holmes, right? Poking around the different AI models, understanding what the mystery is. FIND is the one that we said was like the SAT. And both of these are related to the way that AIA, Sherlock Holmes, tries to interpret and understand the mystery that is other AI models that are out in the world. Lots of, like you mentioned, lots of different characters in this AI picture here. But I think the way that they developed FIND is really, really interesting because it's not only meant to check whether AIA works, it's also meant to measure how well AIA does its job. And one of the things that it does to really push AIA, right, Sherlock Holmes, to its limit is creating boundary cases. So FIND generates cases with bias, it generates cases with irregular inputs and outputs, to try and understand whether AIA, right, Sherlock Holmes, can catch certain limitations in new AI models that might have bias baked into them based on the training data or might have irregular inputs and outputs if they're working near their boundary cases. So, I like how they kind of planted certain defects that they expect AIA to be able to find. And so far, they've actually said it's not perfect yet. I forget the exact score that it got on the FIND test, but I think it scored less than half of what they'd expect on the benchmark before they'd call AIA perfect.
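
The "planted defect" idea can be illustrated with a small Python sketch. This is not the FIND benchmark itself: the `ground_truth` function, the two candidate descriptions, and the agreement score are all hypothetical, chosen only to show how a benchmark that knows the right answer can grade an interpretation. Here a function doubles its input except on a deliberately irregular region, and each candidate description is scored by how often it predicts the true output.

```python
from typing import Callable, Iterable


def ground_truth(x: int) -> int:
    """Hypothetical benchmark function: doubles x, except on a planted irregular region."""
    if 10 <= x < 15:
        return 0  # planted defect the interpreter is expected to notice
    return 2 * x


def naive_description(x: int) -> int:
    """Candidate explanation that misses the planted defect."""
    return 2 * x


def careful_description(x: int) -> int:
    """Candidate explanation that accounts for the planted defect."""
    return 0 if 10 <= x < 15 else 2 * x


def agreement(
    candidate: Callable[[int], int],
    reference: Callable[[int], int],
    inputs: Iterable[int],
) -> float:
    """Fraction of inputs on which the candidate matches the reference."""
    inputs = list(inputs)
    matches = sum(1 for x in inputs if candidate(x) == reference(x))
    return matches / len(inputs)


test_inputs = range(0, 20)
naive_score = agreement(naive_description, ground_truth, test_inputs)
careful_score = agreement(careful_description, ground_truth, test_inputs)
print(naive_score, careful_score)
```

Because the benchmark author wrote `ground_truth`, the grader does not have to guess whether an explanation is right, which is the property Daniel is pointing at when he compares FIND to a standardized test.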

Farbod: That's not even, it's not even a C. Yeah. It's disappointing, but you know what? We've been doing this podcast for a little over three years now. We've seen MIT CSAIL lead the charge on a lot of AI, like novel AI initiatives. So, it's not super surprising that they're the ones that are trying to tackle this problem. At least it's the first that I've heard of it. And I think for what it is right now, it's promising. And I'm sure it's one of the things that they're gonna keep investing in in the coming years. So, we've said this before about a lot of technologies, but it's one of the ones I'm excited to keep up with as the months and the years come.

Daniel: Well, I agree. Especially the part we just mentioned at the end, right? FIND is meant to make tools like AIA, our Sherlock Holmes character in this entire thing, really, really robust: to find things like bias that's baked into AI, to find things like the limitations where we should and shouldn't use AI for certain applications. It takes me back to even a couple episodes ago, our three-year recap, where we said the most impactful technology we reviewed during the year 2023 was AI, and understanding where the limitations are and where humans should be kept in the loop. It sounds kind of counterintuitive here to say we need more humans in the loop, we need to understand and monitor AI to make sure that it's safe with human oversight, and then have me say, I think we should use AI to identify those areas where humans should be involved. But I truly believe that this team from MIT is developing or aiming to develop a really robust tool. I would say, and they mention it, they're about 50% of the way there right now. They're developing a really robust tool that can take a new AI model it's never seen before, tear it down to its constituent parts, start to understand all the different neurons, right, the building blocks that build up a model, understand what the limitations are, understand where there might be bias baked in. You know, it's a lot easier actually to use AI to help police other AI. In some ways I think about this as, you know, all these dystopian end-of-the-world apocalypse movies where AI takes over the world. Maybe the only possible way to overcome bad actor AI is with AI that we know is good, that's strong enough to analyze it, understand where the weaknesses are, and let us know where humans should be involved.
And they said that's actually the outcome they're gunning for, right, is something here where we've got AI policing other AI and telling us where we need to have human oversight, telling humans where we need to focus, where we need to look to make sure that, you know, we're not using AI for things like hiring with bias baked into it, and then hiring the wrong people because of that.

Farbod: Daniel just said, if you can't beat 'em, join 'em.

Daniel: Maybe.

Farbod: He's making his pitch so that when the AI overlords take over, they can listen back and be like, hey, I've been advocating for you guys.

Daniel: I was pro AI the whole time.

Farbod: But no, I definitely see where you're coming from. And I don't know, to me, AI and humans collaborating to keep AI in line is peak collaboration.

Daniel: Yeah, and obviously a huge part of this is not just the fact that they've created this AI interpretation model, AIA. They've also created this test, FIND, which I imagine other researchers will be able to use as a benchmark to understand how well their interpretation models are working as well. So, they didn't just create the new technology, they also created the measuring stick by which you can measure how well these technologies are performing, which again, kudos to this team from MIT CSAIL. We constantly see you at the forefront of developing new AI technologies and in this case also the standards to measure how well your AI technologies are performing.

Farbod: We're fans.

Daniel: Yeah, we've long been fans and we will long continue to be fans.

Farbod: Yeah, you guys should invite us out again. Wait, we haven't shared our MIT content yet?

Daniel: No, we haven't.

Farbod: We got a backlog to work through, but before we wrap up, let's do a little summary of what's going on here.

Daniel: Yeah, I will man. So, I'll try and wrap it up here. So, just like the creature in Mary Shelley's novel Frankenstein, right? With AI, we've developed this monster that we don't understand quite how it works. But now this team from MIT is trying to tame that monster with AI that can study AI, to help us understand how different AI models work, where they might be strong, where they might be weak. And this tool they've created is called AIA, the Automated Interpretability Agent. It's basically meant to test and explain what's going on inside other AI models. Let us know where we should use them, where we shouldn't use them, where there might be strengths, where there might be weaknesses, and then break down each new AI model to understand exactly how it works. They also created this test called FIND, which is like the SAT, the measuring stick by which you can understand how well these interpretability agents like AIA are functioning. Right now, they say they're about half of the way there, but it's because they've got a test like FIND that they know how well their tool is doing and can improve it to where we might have AI helping us police AI in the future.

Farbod: That was one fine summary.

Daniel: And here's another one. You said if you can't beat them, join them, right? They're fighting fire with fire here. They're fighting AI with AI, making sure that bad AI doesn't take over the world. They're using hopefully good AI to help police it.

Farbod: Yeah, I bet. I bet you wouldn't care if it's good or bad, would you? You're aligning yourself with them anyways. Getting your vote in.

Daniel: Yeah. Trying to hide from them at least. I will say before we wrap up today's episode, we have to give a thank you to our friends in Taiwan. I think it's our first time trending in Taiwan.

Farbod: You're correct.

Daniel: Top 200, we were actually podcast number 114 in technology in Taiwan. So, appreciate our friends in Taiwan. I'm gonna attempt something here. Farbod committed us to something that might be a tall task for folks like me who suck at pronunciation.

Farbod: You're not alone, if you listened to the last episode.

Daniel: We're gonna say thank you to every new country where you help us trend in the top 200, in the predominant language of your country. So, in this case, for Taiwan, I'm gonna speak in Mandarin and say, Xièxiè nǐmen, which means thank you everyone, for listening to our podcast and helping us be a part of the top 200 technology podcasts in Taiwan.

Farbod: That was pretty good. You gotta be easy on yourself. That was pretty good. I mean, you know what? I'm not from Taiwan and I've never been, so take it with a grain of salt, but I think it was good. And with that, thank you guys for listening. As always, we'll catch you in the next one.

Daniel: Peace.

As always, you can find these and other interesting & impactful engineering articles on Wevolver.com.

To learn more about this show, please visit our shows page. By following the page, you will get automatic updates by email when a new show is published. Be sure to give us a follow and review on Apple podcasts, Spotify, and most of your favorite podcast platforms!