Over The Edge

Human Learning Versus Artificial General Intelligence with Ananta Nair, Artificial Intelligence Engineer at Dell Technologies

Episode Summary

This episode features an interview between Bill Pfeifer and Ananta Nair, Artificial Intelligence Engineer at Dell Technologies, where she leads AI and ML software initiatives for large enterprises. Ananta discusses differences between human learning and AI models, highlighting the complexities and limitations of current AI technologies.

Episode Notes

This episode features an interview between Bill Pfeifer and Ananta Nair, Artificial Intelligence Engineer at Dell Technologies, where she leads AI and ML software initiatives for large enterprises. 

Ananta discusses differences between human learning and AI models, highlighting the complexities and limitations of current AI technologies. She also touches on the potential and challenges of AI in edge computing, emphasizing the importance of building efficient, scalable, and business-focused models.


 

--------

Key Quotes:

“It's very hard to take these AI structures and say that  they can do all of these very complex things that humans can do, when they're architecturally designed very differently. I'm a big fan of biologically inspired, not biologically accurate.”

“Do you really need AGI for a lot of real world applications? No, you don't want that. Do you want some really complex system where you have no idea what you're doing, where you're pouring all this money in and you know, you're not really getting the results that you want? No, you want something very simple. Occam's razor approach: make it as simple as possible, scalable, can adapt, can measure for all of your business metrics.”

“We have reached a point where you can do the most with AI models with minimal compute than ever. And so I think that is very exciting. I think we have reached a point where you have very capable models that you can deploy at the edge and I think there's a lot of stuff happening in that realm.”

--------

Timestamps: 

(01:20) How Ananta got started in tech and neuroscience

(04:59) Human learning vs AI learning

(15:11) Explaining dynamical systems 

(26:57) Exploring AI agents and human behavior

(30:43) Edge computing and AI models

(32:58) Advancements in AI model efficiency

--------

Sponsor:

Edge solutions are unlocking data-driven insights for leading organizations. With Dell Technologies, you can capitalize on your edge by leveraging the broadest portfolio of purpose-built edge hardware, software and services. Leverage AI where you need it; simplify your edge; and protect your edge to generate competitive advantage within your industry. Capitalize on your edge today with Dell Technologies.

--------

Credits:

Over the Edge is hosted by Bill Pfeifer, and was created by Matt Trifiro and Ian Faison. Executive producers are Matt Trifiro, Ian Faison, Jon Libbey and Kyle Rusca. The show producer is Erin Stenhouse. The audio engineer is Brian Thomas. Additional production support from Elisabeth Plutko.

--------

Links:

Follow Ananta on LinkedIn

Follow Bill on LinkedIn

Episode Transcription

Producer: [00:00:00] Hello and welcome to Over the Edge. This episode features an interview between Bill Pfeifer and Ananta Nair, artificial intelligence engineer at Dell Technologies, where she leads AI and ML software initiatives for large enterprises. Ananta discusses the differences between human learning and AI models, highlighting the complexities and limitations of current AI technologies.

She also touches on the potential and challenges of AI in edge computing, emphasizing the importance of building efficient, scalable, and business focused models. But before we get into it, here's a brief word from our sponsor.

Ad: Edge solutions are unlocking data driven insights for leading organizations.

With Dell Technologies, you can capitalize on your edge by leveraging the broadest portfolio of purpose built edge hardware, software, and services. Leverage AI where you need it. Simplify your edge. And protect your edge to generate [00:01:00] competitive advantage within your industry. Capitalize on your edge today with Dell Technologies.

Producer: And now, please enjoy this interview between Bill Pfeifer and Ananta Nair, artificial intelligence engineer at Dell Technologies.

Bill: Ananta, thanks so much for joining us today. You are an AI engineer, so we're going to be talking about the AI aspects of Edge, which is kind of awesome and fun. But first, a little bit of background, if we can. How did you get started in technology? What brought you here to us?

Ananta: Well, it was a bit of a meandering path.

that was really laced with fair fortune and good mentorship and a lot of chance, right place, right time. So I'm not one of those people who knew what they wanted to do when they were growing up. I was actually very far from that. And so when I enrolled in my undergrad, I chose to do psychology and neuroscience, mostly because it seemed like a palatable choice.

Nothing profound, but I thought, you know, the brain is interesting. The mind is interesting. [00:02:00] People are somewhat interesting. And I was like, you know, let's go with it. Let's see what happens. And it doesn't really force me into a particular career path. And I'm, I kind of meandered around a little bit in my undergrad trying to figure out if this is what I wanted to do.

And it wasn't until my final semester, I took a class called psychology of perception, which also I took on a whim where we got into like the physics and the math of how do you actually like learn from the world, which is something like I had never heard of. I was like, this is a thing.

Bill: I want to take that.

Ananta: Right. And it was really cool. Like we got, uh, we, we learned a lot about like psychophysics. We learned a lot about, you know, how do you code, you know, how do you build out experiments in Python, lots of statistical analysis. And so that's what got me really excited about, hey, there's more to this field. And I think it was like that physics, that math, that computer side of it that got me super excited.

It was through the professor who taught that class, Lou Harvey, that I met another professor in the department, Randy O'Reilly, who specialized in computational cognitive neuroscience, which I also did not know was a [00:03:00] thing. And I worked with his lab and then with his company. And essentially we build models of higher order cognition.

So essentially thinking of, you know, how do you understand people, their behavior, what makes them intelligent? And then how do you model that computationally? And so I spent a lot of time building out those kinds of models. We kind of compared them to existing AI models, traditional deep learning models.

And I think over time I kind of moved from what was biologically realistic to more biologically inspired. And I worked for that company for, I think about six or seven years. And now as I've kind of transitioned into this role at Dell, been here about maybe two and a half years. I feel like I got, I get to kind of get the best of both worlds.

And I feel like I get to kind of take that interest in technology one step further, because like, I still get to carry on a lot of my research. So I do a lot of research with, you know, how do you build out these models of higher order cognition, a little bit afraid to say the word AGI because it comes a bit loaded, but we can unpack that later.

And I also get to kind of balance that [00:04:00] big pie in the sky problem of AGI, the big picture problem, with what is more realistic, which is, you know, how do we help Dell customers actually build solutions that are practical, implementable, you know, understandable, that kind of get measured on metrics like ROI and KPI that you don't even think about in the research world.

So I feel like I, I kind of have meandered a lot, but I've landed up in this spot where I feel like, you know, ideally, I feel like I get best of both worlds and it's, it's really exciting to kind of see technology from both sides.

Bill: Pretty cool. I want to look into the psychology of perception. If you hear my keyboard going later, that's what I'm doing.

It's pretty cool. So with AI, we're always talking about thinking about exploring this idea of AGI. You mentioned it, artificial general intelligence. So we'll talk more about that. But getting there, we have to know more about how people learn and how we help machines, some sort of AI, learn equivalently. What have we learned about how people learn?

Ananta: Oh, this is a [00:05:00] very loaded question, and I'm going to try to not meander too much. Otherwise, we will be here till the end of time. Um, but I, well, I would kind of start with, you know, artificial intelligence models are very loosely inspired by the brain, right? Especially when you look at, um, more of the trending models today, like transformers.

I think we saw it a lot more with some of the older architectures, like, you know, spiking neural networks, Hopfield networks, Boltzmann machines, even variational auto encoders.

Bill: I'm an acolyte, I understood all that.

Ananta: You know, there are different types of architectures, but you know, kind of the leading models that we have today, they're very loosely inspired by the brain. And I think there's a fundamental difference with how people learn and how these AI models learn. So like, you know, when you think about people, we're very complex.

We are very dynamic. We're dynamical systems where we're learning from this very complex world, right? And kind of how that works is that you have different sorts of like expert systems. So kind of think, you know, Your visual [00:06:00] system, your auditory system, you have specializations. And then in the part of the brain that makes us really smart called the neocortex, you have, you know, even further specializations like error monitoring or reward integration or selection of your next action.

So, you know, getting very granular. And generally, these experts are put together in some sort of hierarchical structure, right? So they're going to be learning together, so they're going to be communicating laterally. It's not like your eyes are learning and your ears are not hearing things. They're all learning together.

All of these systems learn together. And then there's also bidirectionality. So what that means is basically, you know, you can have your eyes communicate up to the more complex regions, or you can have the more complex regions come all the way down and tell you what to do. And I think, you know, you have these really complex things that, that work.

And so, you know, in, in our mind, we're trying to recreate the real world, right? The real world, we're trying to have some sort of representation of that. And we do that by taking little pieces of knowledge, little pieces of things from these expert systems in what are called generally in the field, they're called abstractions.[00:07:00]

And the brain has what is called a global workspace. So that's where essentially you have a whiteboard and you can just throw things at it. You can have all these things kind of interacting with each other. It's very much targeted because, you know, unlike AI models, you can't just throw lots of data, lots of compute at people.

Uh, we do learn from a lot of examples, but we have, you know, restraints like energy, you're restrained by the limitations of the real world. And so a lot of these kind of drive people. We develop needs, we develop goals, we develop some sort of motivated behavior, and we kind of move in some sort of path, even though we may meander.

We do move in, in some sort of directed path, right? How our behavior unfolds. And I would say just kind of at that point, like it's just taking this up to the surface level. That's very different from how, you know, say LLMs, because a lot of people associate LLMs with AGI today, or AI models in general.

That's very different from how they learn, right? They have curated data sets. Now they use a lot of human annotation, but they essentially throw a bunch of data at them. And they have to go and figure out what is the [00:08:00] meaning in it and then figure out how do I solve for things. And this is not to kind of take away from the fact that, you know, how the AI models do it today is really cool and it's really interesting.

And I, you know, love to see kind of how they use the sine cosine waves and how they, you know, take out information and how they build these like really complex multidimensional embeddings. It's all really cool. But I think it's just very different. And, you know, you can't just isolate language as the key for intelligence, right?

Language is definitely important, but it's not like the end all be all, right? You know, we're taking in things from a lot of different sensory modalities, and we're learning and we're building on that. Language, you know, plays a role. Like, I think it's very good at, you know, how do you take things like a wheel?

or a door or a window, you know, and construct it into something like a car, right? And then how do you build upon that? Where you can say, okay, I, this is what a car is made of. This is how a car drives. You know, I, this, these are the motor movements that I need to know to drive a car. This is how I could drive a boat or a go kart based on what I know about cars.

You know, this, there's a lot of like this [00:09:00] unfolding complex dynamics. It's like a graph that's, you know, continuously unfolding lots of related things. We don't really have that with AI models, right? And, and language is, is important for, you know, how we think about things, both for like our internal, you know, how we're reasoning through things.

Should I be doing this? Or is this a really bad idea? As much as it is for like, you know, how do we communicate? You know, social communication is huge for people. And the AI models, you know, it's not to say that they don't have any of it. I think they have little pockets of this because it's very hard, very expensive to replicate.

Even if you look at like the V-JEPA models, uh, OpenAI had some really interesting work done in reinforcement learning, which is like a branch of AI, where they built out like these, um, agents where they did cooperation competition dynamics. So how can they either cooperate or compete for playing hide and seek?

You know, teams of two, you see a lot of really interesting behaviors unfold. You see a lot of like interesting, very intelligent things or seemingly intelligent things unfold there. And I think we can replicate these kinds of pockets, [00:10:00] but when you actually look at, you know, when you actually sit down with your, your magnifying glass and you really look at the details of these AI models, how are they learning?

It's very different than how people learn. And so, you know, you could change things in that game. And the AI model won't be able to fully compute, you know, compute what has happened. Now it's doing the wrong thing, even though it was trained. And there was some really interesting work that was done several years ago where, you know, they took, you know, some very simple games like collect keys to open chests or, you know, collect the coin and they, the model mastered it, obviously.

But once they move the, the coin to a different location or once they add too many keys, you know, they try to see what the model is doing and it's not learning the goal of the game. It's not learning a purpose. Whereas I, and I did an experiment with a kid in grad school, just, you know, trying different games that it's never seen and asking it to tell me what is happening here.

And the kid could reason through it, and I promise I had parental supervision before someone comes after me. So I, I did have, it was my professor's child. But, you know, I feel like, you know, even very young children, I think the child was about two or [00:11:00] three, could reason through that. These AI models don't have these very flexible architectures where they can get to that point.

And so I think there's a huge difference. And even just, you know, not to go on a tangent, because I can talk about this till the end of time, but to even think about the learning mechanism. You know, learning in the brain is so complicated. In AI models, we have just what is called backpropagation. It's the key mechanism.

So all you're doing is loss minimization. So how do I minimize this loss? And how do I get the most reward? I mean, that, that works well in, in specific scenarios, right? And you know, it kind of builds on the essence of calculus. So, you know, calculus is really good at these little tiny changes that lead to, you know, very accurate results.
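For readers who want to see the mechanism she's describing, here is a minimal, purely illustrative Python sketch of loss minimization by gradient descent; the toy loss function and learning rate are made up for the example, not anything from the episode.

# Toy gradient descent: minimize loss(w) = (w - 3)^2 with tiny, calculus-driven steps.
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0      # starting guess for the parameter
lr = 0.1     # learning rate: how small each step is
for _ in range(50):
    w -= lr * grad(w)   # step against the gradient to shrink the loss

print(round(w, 4), round(loss(w), 6))   # w ends up near 3, loss near 0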

Um, and I think it, it works really well. And it's one of those things that's very controversial because people worry it can't scale. It has all these problems, but no one can find anything better, like most good things. So far, so far. And, you know, there are a lot of very smart people looking for alternatives, but I think that's very different than how people learn, you know, we're, we're [00:12:00] very different.

We're very dynamic in the way we learn. It's kind of comparing apples and oranges. It's very hard to take these AI structures and say that, you know, they can do all of these very complex things that humans can do when they're architecturally designed very different. And, you know, I'm a big fan of biologically inspired, not biologically accurate, but I think there are fundamental differences in that architecture to get to that point.

But I think this is also probably where I need to put my science hat away and then kind of transition more into like the business side of things where it's like, do you really need AGI for a lot of real world applications? No. You don't want that, right? Do you want some really complex system where you have no idea what you're doing?

You have, you're pouring all this money in and, you know, you're not really getting the results that you want. No, you want something very simple. You know, Occam's razor approach, make it as simple as possible. Scalable, you know, can adapt, can measure for all of your business metrics. You know, how do you get to that business value as soon as possible?

And so I [00:13:00] think for most everyday, most business applications, AGI is not what you want. It's like the furthest thing from what you want. You just want to build these smaller, more capable systems, and that's going to help you actually be able to utilize the AI.

Bill: So I hadn't actually thought about the connection to language.

It would be interesting to see what that does. There was a book a while ago, The Geography of Time, that was talking about, you know, tropical people who live in tropical climates. You walk to the ocean, you get some fish. You get tired, you just lie down and sleep. And the weather's not bad, and the growing season is long, and food is plentiful, and so time is just kind of like, uh, whatever.

And so their sense of time and their language about time changes, and the way they think about things related to time is just very different than people who live up in the mountains, where there's a very short growing season and you have to get wood and you have to band together in a community or you will die. And you know, they are [00:14:00] very precise about their time and their community planning and working together, and their language reflects it. And it changes the way they think about so many things.

I wonder, it would be interesting to see more about how what language we're using and what scenarios we put the AI into as we're training it changes that, because I'm sure there are, I'm sure there are nuances to how we're training it that we just haven't thought about before, and you'll probably get different results.

You did say two things in there that I wanted to follow up on the first, you said something about dynamical systems, which I have never heard before. Which is kind of fascinating, and I want to know more about that. And the other one, I'll say now before I forget about it, rightly, is expert systems. And I know in the past, that was how they were trying to build AI.

So in your experience, mind, what is an expert system? Instead of AI or as relates to AI or as part of AI. Uh, what does that mean? But first dynamical systems, because what?

Ananta: What is [00:15:00] that? So it's, it's a really common thing used in like engineering or physics or math and also actually neuroscience. And if you're familiar with John, uh, Hopfield, who just won the Nobel Prize.

He worked with a lot of like more neuromorphic dynamical systems. And so essentially what it is, you know, a system that can model data or data points and kind of see how their state changes over a period of time. So, you know, think about the weather, think of your points as being like either, you know, temperature or humidity or wind speed or something like that.

Or you could go even higher. You could do something like a particular weather condition, right? It has to be focused at a specific time because the whole point of this is to kind of look at how things change over time. So what dynamical systems do is they're governed by certain rules, you know, law, the, in this case, it would be like, say, laws of physics.

And what you're looking at is, you know, how do these things based on those rules change over time? And generally it's used for very complex scenarios. Like, you know, weather is one example and you're trying to get at the most accurate [00:16:00] prediction. And weather is another great example, because I feel like we're so terrible at predicting weather.

It can be raining outside and my phone says, Oh, it is sunny right now. Might rain soon. Exactly. You can never predict it. And so usually they use very complex scenarios and getting to that point of accuracy is pretty hard. Right. And you know, they're not just linear. They can, a lot of times they're nonlinear.

They're very complicated systems, but they can apply really well to a lot of different things, and kind of two things pop up in dynamical systems, kind of building on that complexity thing. First of all, you have what is called chaos. So that, you know, chaotic behavior. So kind of think that some sort of input that you have leads to kind of weirdly different outputs.

And I think this is commonly the butterfly effect, and I'm so bad at quoting things. So I'm going to remember this example very badly, but the butterfly that flaps its wings, I think, in like Brazil, and it leads to, like, a hurricane somewhere in America. You know, some sort of crazy series of events [00:17:00] that are completely unrelated that you would never think a butterfly could cause a hurricane.

So that's kind of what happens with a lot of these more complex dynamical systems is this chaotic behavior. And then you also have what are called attractors. And so what an attractor is basically kind of like a state that the model kind of almost like default falls into, and it'll stay there. It'll linger there for a fairly long time in comparison, and then it'll move on to the next thing.

So it's some sort of like pocket of stability. And so, you know, you use them a lot for different sorts of, you know, engineering applications. You could use them for different sorts of physics. They're used a lot in math, but they're also used a lot in like neuroscience, where you could just fully use them to model human behavior. Like you could take things like, you know, your experts, like your, your visual, uh, system, your auditory system, some sort of prior knowledge. And you could see based on that, what is the next behavior going to be, right?

fully use them to model human behavior. Like you could take things like, you know, you're experts, like your, your visual, uh, system, your auditory system, some sort of prior knowledge. And you could see based on that, what is the next behavior going to be, right? What is my motor action? What am I going to do with this information?

Or you could even go more granular. You can go further down to the neural level. You can [00:18:00] look at. you know, neural activations for specific things, for these same things, for that vision, for that auditory. And you can kind of look at how the brain is going to transition across time and how it lands up in like, you know, an attractor system.

Maybe that could be, you know, how do I take my visual neurons firing, my auditory neurons firing, and I land up in this attractor of like a memory that I'm pulling out. And then I'm going to move on to the next thing. And so you have a lot of these, like, you know, it's a way to kind of represent really complex things.

And that's actually kind of the basis of a lot of the Hopfield work. So I'm a big fan, because I think more than something like backpropagation, which is, you know, just derivative, it's just minimizing that loss. What dynamical systems kind of give you is how and why is a system changing. Because I feel like as humans, we learn, why is something changing?

You know, if I push something, I know exactly what is going to happen. That may not, maybe not with precision of how far I may throw something, but I still know what will happen based on certain, you know, [00:19:00] rules of physics. And when you get to it, you know, more complex behavior, I feel like that's what we're doing.

Where these really complex dynamical systems that are unfolding over time, really complex graphs that are pulling from different things that are related, memories, you know, learned behavior, that we can kind of iterate over. And those are things that we end up using for our cognition. And I think some of the arguments people make, which I think are very valid, is if you took a model, you trained it on chess.

Now you want to have it learn some completely different game. I don't know, lacrosse or go or something, you know, something that is different depending on your scale of what is different, you know, it can't really readily apply a lot of those same rules and people do that. And so I think that's where that importance kind of comes in.

And this is not, again, to say that AI doesn't have it. AI does have pockets of it, there are models called neural ODEs that kind of cover aspects of that, where they're a little bit more, kind of, aligned with that philosophy, but I think we're still kind of far from that in most traditional popular AI networks.
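As a rough illustration of the dynamical systems idea she walks through, here is a tiny Python sketch of a state evolving under a fixed rule and settling into an attractor; the equation and constants are invented for the example.

# Toy dynamical system: dx/dt = -k * (x - a).
# Wherever the state starts, the rule pulls it toward the attractor at x = a.
a, k, dt = 2.0, 1.5, 0.01          # attractor location, pull strength, time step (illustrative)

for x0 in (-5.0, 0.0, 8.0):        # three different starting states
    x = x0
    for _ in range(1000):          # step the state forward through time (Euler integration)
        x += dt * (-k * (x - a))
    print(f"start at {x0:5.1f} -> settles near {x:.3f}")   # every run lands near 2.0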

Bill: Okay, so that takes dynamical systems [00:20:00] over to expert systems then. Experts of

Ananta: dynamical systems.

Bill: Um, and I, it was kind of funny because you referenced expert systems a time or two in your explanation of dynamical systems.

Ananta: I meant to connect them, but I forgot. My attractor state did not settle. It kind of went on.

Bill: There you go.

Ananta: Oh, that was a really bad nerd joke. I'll stop.

Bill: It's okay. It's okay. I'm sure everyone was laughing, at least on the end. Expert systems, 70s, 80s into the 90s. That was how people were trying to build AI. And then we moved on toward more neurological models. But you mentioned expert systems a couple of times as relates to people.

in particular. And so what is an expert system either as differentiated from or as supporting part of AI or what is it? Where does it fit?

Ananta: There are lots of different types of expert systems. There are a lot of people, depending on who you ask or what pocket of the field you're in, they may use it somewhat differently.

And I think a lot of the complexity also comes now [00:21:00] because a lot of these get used colloquially, with how mass reaching AI is. So in that vein, I guess starting broadly and then I can dive into the different types of expert systems in the brain versus AI. So broadly, an expert system is a system that's specialized for a specific task or a specific data type, and it handles problems specific to that

task or data type. With the brain, you have different modalities. You have a visual expert and then you have an auditory expert. And both of these are like dedicated structures. They process different types of data. They have different types of architectures. They train together and they also talk to each other.

So they form these overlapping combined representations. With AI, at least in the multimodal space, as far as I know, it's a lot of focusing on different modalities. So you have different ones. You could have text, you could have audio, and a lot of these models are trained separately. So especially in more [00:22:00] complex domains, it's really hard to train these models together.

And what generally happens is in multimodal domains, they train the models separately and then they have what is called fusing or fused states. And so that's where they get to talk to each other. So it's, it's a fairly complicated, uh, feat to endeavor, but generally compared to the human brain, these AI models do it fairly differently.

Now that term expert can go further. A lot of people would call mixture of experts, uh, as an expert model. Some people would not. It just depends on who you ask. Uh, but that's what most people think of when they think of expert systems in AI. Uh, it's different from the multimodal side of things. It's more about what is called ensemble learning.

So generally, these are systems where you have a specific specialization, and then you have this, like, mechanism, it's called a gating mechanism that sits on top, and it basically picks who should be involved and what. And these types of systems are generally trained together. It's not to say that this is, like, easy to [00:23:00] do.

It isn't. This is hard to do and you have a lot of these loss spikes, but these kinds of experts in AI are trained together. It's just very different from what we would think of in the brain and some of the multimodal side of things. And it can be an ever evolving kind of train of thought. Some people would call reinforcement learning an expert system.

I don't think I would agree with that, but reinforcement learning is an interesting one because I feel like it was, it's used a lot for foundational models and I think a lot of people are excited by that and a lot of our customers wanted to use it for their LLMs. But I think most people have stopped using it now just because it's hard to do.

You also have agent based approaches that some people call the experts. Again, don't think I would agree with that. But you know, that's where you have a master model and then kind of worker models. So you have either models that specialize in a particular task, like one could be like a code assist and one could be like a data reader.

You can also devolve into these student teacher expert systems. And then if you jump fields completely, and you land up in robotics, you have a whole different view of the word [00:24:00] expert. Because now you have these more autonomous systems, they deal with the real world, they're more complex, they're more noisy, and they're more hard coded.

Generally in robotics you have a lot of hard coded systems that, at least I would say, I personally am not a fan of. You know, when you land up with these hard coded systems, they have a lot of hard coded rules. They're not emergent. They're very good at what they're trained on. They're very rigid. But outside of that, they don't do so good.

So depending on what part of AI you're focusing on, or what part of the general, I guess, computer science field you focus on, different people think of experts in different ways. So it's like devolving into oblivion, but to make a short answer long, I guess, it depends who you talk to. And I guess my view of it is far more neuromorphic.

It's more about systems that have specialized structures, specialized encoding of information, and then they formulate explicit rules, dynamics, and reasoning to operate within those constraints into those bounds.
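A rough sketch of the mixture-of-experts gating she describes, with two toy "experts" and a stand-in gate that decides how much each one contributes; none of these functions correspond to a real trained model.

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two toy experts, each a stand-in for a specialized sub-model.
def expert_a(x): return 0.5 * x + 1.0
def expert_b(x): return -0.2 * x + 3.0
experts = [expert_a, expert_b]

def gate(x):
    # The gating mechanism that sits on top: scores who should be involved for this input.
    return softmax([0.8 * x, -0.3 * x])

def mixture(x):
    weights = gate(x)
    return sum(w * f(x) for w, f in zip(weights, experts))

print(mixture(2.0))   # a blend of the experts, weighted by the gate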

Bill: Which is why I often ask people, what is it to you, because I don't [00:25:00] know that there is necessarily like this is the answer.

It's still evolving and a lot of it is very personalized. Some of that took me down a different direction as humans. One of the things, maybe one of the biggest things that differentiates us from an AI is curiosity. We see something, we don't know it, we go, what's that? And then we figure it out. And that's how we fill in a lot of the gaps of our knowledge.

AI models aren't curious, so they don't try to go out and keep learning. They don't look for new things. You train them, you stop training them. That's that. They, they are what they are. Have any, any sense of, are we even looking for what the root of curiosity is and how we would teach an AI to do that? Do we want AI to be curious or is it just gonna go off and search, you know, surf Facebook all day and waste a whole bunch of computation power and not ever do anything?

Is that a path that we're trying to follow? Is it a path that we can follow today?

Ananta: Um, [00:26:00] I think you have aspects of curiosity. So like, I think some of the most interesting work that kind of aims to replicate human like behavior happens in reinforcement learning. I think this is probably just very poetic, but reinforcement learning is the most interesting part of AI, and yet it really sucks at being scaled to the real world because, I don't know, I guess that's just how the world works.

But it's, reinforcement learning basically kind of builds on how, you know, that, that same mechanism of how humans learn, right? That same of how do you, if you're put in a particular state, what is an action you take to get to the next state? You know, if you're hungry, what is an action you take to get to the point where you're no longer hungry?

And so there are a lot of different algorithms that do this. There's a lot of, you know, or at least for a period, it felt like that was something that was being incorporated into LLMs. I think most people have kind of moved away from that because it's just very complicated, but reinforcement learning as a field is really interesting.

And it's actually probably one of the areas that Most people would find really fascinating because they'll apply it to, and I'm not a video game [00:27:00] person, so I'm probably going to name these games badly because it's been a while since I've read these papers, but like, you know, I think there was a Doom 2, Starcraft, and different sorts of games that they, they most commonly started with the Atari games.

And so they'll basically build out these agents and they'll put them in this world and say, okay. Go figure it out. Go solve the problem. And so they, they do have a hard coded mechanism of, you know, explore and exploit. So, you know, when do you want to go explore and try to figure things out? And when do you want to exploit what you already know to get at that reward?
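The explore/exploit mechanism she mentions is often written as an epsilon-greedy rule; here is a minimal illustrative Python sketch, with made-up reward estimates.

import random

q_values = {"left": 0.2, "right": 0.9}   # the agent's current reward estimates (toy numbers)
epsilon = 0.1                            # fraction of the time the agent explores

def choose_action():
    if random.random() < epsilon:
        return random.choice(list(q_values))   # explore: try something at random
    return max(q_values, key=q_values.get)     # exploit: take the best-known action

print([choose_action() for _ in range(10)])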

But at the end of the day, kind of because they are structured the way they are, and I think maybe the better way to describe these systems and AI in general is that they can simulate certain aspects of human behavior and certain aspects of the complexity of what humans can show. They're just a very, and I say this with air quotes, but a very simplified version of what that complex behavior is.

So that's [00:28:00] why they can't you know, fully represent that, that gamut of higher order cognition of complexity that we see with humans. But, you know, these agents do go and they do try to figure out these really complex scenarios. And the hide and seek example that we talked about previously, that was a kind of a case of go explore.

They added levels of complexity. They threw things in, they had agents, you know, have to figure out. How do I block this? Or how do I not let this person in? And it was really interesting to kind of see that. And so you can replicate aspects of that. And so I think you do get aspects of that, you know, the models are curious.

But with how computationally demanding these models are, you can't have them be, you know, just curious endlessly, right? Because that's just burning money. And I think that the difference with people is that when we're curious, it's generally goal directed. Yeah, we may meander, we may go do things that don't really make sense, but for the most part, you're going to be driven by goals. Like, think Maslow's Hierarchy of Needs.

I [00:29:00] need food, I need shelter, I need these different things. I can't just go wander off. Also, the repercussions of bad behavior. You know, if I go off and walk off a cliff, I'm not coming back for, you know, life two, like you would in a video game. So you just don't have that level of, you know, negative reinforcement.

And like, in the brain, most of our brain is negative reinforcement. I think you can, again, represent very small aspects of it, but I think for humans, there's more of a, like a guiding hand from these goals, needs, whatever it is, you know, that drives us. Um, you know, some people are more money motivated. Some people are more passion for their work motivated.

So I think people have different sorts of things that kind of drive them. I don't think these AI models fully have that. I mean, you could hard code it. And I might be wrong on this number, but Doug Lenat was a researcher and he actually, he did an experiment to try to see, you know, how many things can you hard code to build a generalizable system?

And they, they initially thought, Oh, we could probably get at it with 1 million [00:30:00] essential rules that you would put in. Um, towards the end of his work, he was like, actually, it's closer to 10 million. So, you know, and who knows if that number is right or it just goes up from there, right? You know, I think it's that level of what we don't know as people that I think what is called generally metacognition of, I don't know this, I'm going to go find this out because it's relevant to, to me and, you know, whatever goal I'm trying to pursue.

That helps us be a little bit more intentional with our curiosity. I just don't think you can quite get at that because you need an, um, quote unquote, agentic system to do that. So. I mean, I think you can probably modulate aspects of it. You probably see aspects of that with LLMs and how they, you know, search, how they answer.

But I still think it's a world apart.

Bill: Okay. So this is actually a podcast about Edge, which you couldn't tell by listening to us so far. So the Edge has power constraints, compute constraints, size constraints, all sorts of constraints, depending on where you are on the edge. [00:31:00] Transformer architectures, attention based models, really compute heavy, really power heavy.

Meanwhile, I've seen estimates that the human brain uses like 40 watts or 60 watts of power, something like that. And we're walking around with our brains all the time. It's not a problem. We just carry them with us wherever we go. And there's, you know, no, no power problems or anything like that. When will we get to?

How will we get to a less compute intensive model that's more edge friendly, that has some of that breadth of an LLM? Not, you know, it doesn't have to be full-on ChatGPT. There are edge friendly models that do like one thing. Is this good? Yes. No. Okay. Very compute efficient. Great. But how do we get a more flexible model?

Are we, I assume lots of folks are working on that. I assume that's somewhere in your brain with that 40 to 60 watts of power. So,

Ananta: and I promise I will talk about this as enthusiastically as I've talked about everything else. Uh, well, I [00:32:00] actually think like in comparison, like just historically looking at AI, like.

Now is the best time. Like we have the most number of things for the edge. And so like, you know, yes, the transformer architecture, like again, very poetic, you know, the thing that makes it so powerful, so good at what it does, the attention mechanism, you know, is the reason why it's also so compute heavy.

And, you know, normally that would be a barrier, and it is a barrier, right? Especially with some of the larger models, even at the data center, with your clusters, with your servers, you know, it's a problem. But I think where we have reached, which is, you know, honestly, like if you asked someone 10 years ago, I don't think they would have said this. And I personally, I don't think would have said this even several years ago, when even with very tiny models, I had to run them on GPUs. Now with

And I personally, I don't think would have said this even several years ago, even where very tiny models, I had to run that on GPUs. Now with You know, very large LMs. I can run that on CPU. I can run that on my phone. And I feel like, you know, we have reached a point where you can do the most with AI models with minimal compute than ever.

And so I think that is very exciting. So like, you know, I think [00:33:00] we have reached a point where you have very capable models that you can deploy at the edge. And I think there's a lot of stuff happening in that realm. So like, even if you look at new releases of some of the pre trained models that come out.

I'll just use Llama, I've worked a lot with it. But the Llama 2 7 billion parameter model, which is, you know, a fairly small model, even though it has 7 billion parameters. Only 7 billion. Look at that. Yeah, I feel like most people are just like, I don't know if I'd call that small. But in our weird formulation of mathematics, that is very small.

Bill: Let me get out my calculator and double check its work.

Ananta: So like the Llama 2 7 billion, like, you know, substantially smaller model, versus say the Llama 3 8 billion parameter model. So that was their, their newest update that came out. The Llama 3 8 billion model is, you know, just slightly larger, it's 15 percent larger, but it is so much smarter than what [00:34:00] came out just a few months prior.

And it's also the way that they kind of optimize it. Dell published a paper on this with Meta, or a blog post with Meta on this, but you can actually use the same compute for the 7 billion, 8 billion. You can fit it on one GPU, right? So, you know, I think that makes that reality of using these full precision models, you know, 32 or 16 bit floating point models, at the edge very plausible.

But you also, like, I feel like there's so much more that's happening, right? So we are building more, you know, increasingly capable, smaller models. But also like with techniques like quantization. So basically, if anyone's unfamiliar, what that means is you're taking these big models and you're just kind of shrinking them.

And so what you're doing is you're kind of changing their input output mappings in a way that you make them less precise, but more optimized so they can run with less resources. And so generally you see performance drops with those types of models, but there's been a lot that's been happening that actually you can kind of get around that, because you can fine tune those models [00:35:00] just as you would the full model.
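As a rough illustration of the shrinking she describes, here is a toy 8-bit quantization of a few weights in Python; real quantization schemes are more involved, and the numbers are invented.

import numpy as np

weights = np.array([0.013, -0.402, 0.250, 0.991, -0.733], dtype=np.float32)

# Symmetric int8 quantization: one scale factor maps the floats into [-127, 127].
scale = np.abs(weights).max() / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)   # what gets stored
restored = q.astype(np.float32) * scale                             # what gets used at inference

print(q)                                   # int8 values: a quarter of the float32 memory
print(np.abs(weights - restored).max())    # the small precision you traded away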

You can do inferencing, you can do a lot of different tweaks to both of those where you can get them to the point where they're optimized. So you can get around a lot of those performance issues. And then one of the things that I feel like people don't talk about enough, because everyone's just so, so in love with LLMs, is that, you know, that same transformer architecture is also used for computer vision, right?

The vision transformer. And there's so much happening at the edge with computer vision. And so with these quantized models, you know, generally you break them down into smaller models. So eight or four bit. With vision models, you can actually go even further down to make them even smaller. So you can go with one or two bit models.

So, you know, really small model. And they still perform fairly well. And that's because the, Vision models, you know, think about how a vision model will take in information. It's going to be pixel by pixel versus how a language model is going to take in information, right? It's going to be some sort of segmentation of a word generally.

And with the vision models, you have a lot more room for [00:36:00] error, right? Because you're taking pixel by pixel. And so you can get at these really smaller models and still use them at the edge for You know, with a lot of precision and there's a lot of really exciting stuff. And for people who want to go, you know, try this out, there's a tool called Ollama.

I use it a lot. You just kind of download it and you can, you know, you can set it up such that you can even, you can do RAG with it. So throw a few documents at it and you can question it. You can run that on your laptop. Hugging Face, actually, and I don't know if this is for all the models on Hugging Face, but most of the models on Hugging Face, actually, you can now import into Ollama.

So you can bring down a lot of those quantized models and now you can try out different things. And I think with AI, it's, and people love to say this, it's more of an art than a science, because really no one knows what's happening with AI and these models. But, you know, that way you can try a lot of different things.
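For anyone who wants to try the local setup she mentions, here is a small sketch against Ollama's local HTTP endpoint; it assumes Ollama is installed and serving on its default port and that a model such as llama3 has already been pulled, and the prompt is just a placeholder.

# Assumes Ollama is running locally and a model has been pulled first, e.g.: ollama pull llama3
import json, urllib.request

payload = {"model": "llama3", "prompt": "Summarize these meeting notes: ...", "stream": False}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])   # the model's answer, generated on your own machine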

And I think with the Edge, more than with a lot of these, you know, bigger use cases, it gives you a chance to try a lot of things at not that much cost. Uh, and, and see what works. And you use a lot of these same techniques and, you [00:37:00] know, there's a range of different techniques, right? It's not just, you don't just get fine tuning and inferencing.

You can prune models. So now we've reached a point where we can cut out layers from a model. So you can, and I'm just throwing out random numbers, but you know, you can go from a 50 billion parameter model to, say, a 30 billion parameter model now. And you can, you know, cut out layers because there's a lot of neurons that aren't really doing anything.

And so you just take those out. You can do a little bit of fine tuning. You can get similar performance. Even with actually the quantized models, I saw recently someone published an article where they, I think, compared half a million quantized models. You know, they quantized them with all sorts of different techniques.
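And a rough sketch of the pruning idea from a moment earlier, dropping the least active units of a toy weight matrix; the sizes and scoring rule are illustrative, not a production recipe.

import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))                 # one toy layer with 16 units (columns)

importance = np.abs(W).sum(axis=0)           # a crude "is this unit doing anything" score
keep = np.sort(importance.argsort()[-12:])   # keep the 12 most important units, drop the rest
W_pruned = W[:, keep]

print(W.shape, "->", W_pruned.shape)         # (8, 16) -> (8, 12): same layer, fewer parameters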

And they saw fairly minimal performance drops compared to, and I think that was done with the Llama 3.1 models. And so, I mean, I think just in that, you know, mainstream transformer space, there's so much, like we've never had this much ever. And so I think it's a really exciting place for the edge. I mean, the data center part is cool.

And but I, you know, I think the edge is cool, especially for me, because I feel like, you know, a few years ago, [00:38:00] just even building a really simple convolutional net, I had to use, you know, GPUs. And now we're saying, hey, I'm running billions of parameters on my phone or my computer.

Like, I, I, at least I would have never guessed this would have happened. So I think it's probably the best time for edge. Um, but the other thing, which is really interesting, this is a little bit more on the academic side is that, you know, I think people are very well aware that attention mechanism is a problem.

and it is very computationally demanding. And so over the last few years, there's been a lot of really interesting work that's been happening. Either they're taking that attention mechanism and they're changing it in a way that makes it more, you know, efficient. A lot of, and there are a lot of different papers doing it, but kind of what they're doing is taking out non linearities, just more complexity, and they're taking out a lot of these dot products.

So basically, having pre computation. So you don't have to keep running some sort of, you know, basically math in the background for every little thing. And so a lot of them are doing that. A lot of them share [00:39:00] their code. So that's pretty exciting. I played around with some of them. You also have some, which are actually going back to LSTMs and RNNs.

So that was the, the kind of the leading architecture before transformers, but they're applying a lot of those similar attention mechanisms to older architectures. And, you know, their, their thought behind it is like, you know, attention is great. It makes transformers really capable, but, you know, it's not unique to transformers, right?

You can apply it to other models, you know, lots of people apply it to different types of models. And so there's architectures like XLSTM and there's a new one called mini LSTM that kind of do similar things. And then you have the Mamba architecture that a lot of people are really excited about. So that does both software manipulations as well as hardware manipulations.

And it's, you know, again, it does a lot of these pre computations. And so, there's so much happening, and a lot of them, like, I think every single one that I've talked about has shared code. And I've played with a lot of them, and they're really exciting. And it's really exciting to see, you know, how [00:40:00] much you can do with these models.
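A rough sketch of the reordering trick behind many of those efficient-attention papers: standard attention builds a full token-by-token score matrix, while the linearized variants apply a simple feature map and regroup the multiplications so that quadratic step goes away. The feature map and sizes below are illustrative, and the two outputs are not numerically identical; the point is the difference in how the math is grouped.

import numpy as np

n, d = 6, 4                                  # toy sequence length and head dimension
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

# Standard attention: an n x n score matrix, cost grows with the square of the sequence length.
scores = np.exp(Q @ K.T / np.sqrt(d))
standard = (scores / scores.sum(axis=1, keepdims=True)) @ V

# Linearized attention: feature-map Q and K, then compute (K' V) once instead of n x n scores.
def phi(x):                                  # a positive feature map (elu(x) + 1 style)
    return np.where(x > 0, x + 1.0, np.exp(x))

Qp, Kp = phi(Q), phi(K)
kv = Kp.T @ V                                # a d x d summary, no n x n matrix anywhere
linearized = (Qp @ kv) / (Qp @ Kp.sum(axis=0, keepdims=True).T)

print(standard.shape, linearized.shape)      # both are n x d outputs; only the grouping differs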

But I will kind of, I guess, caution with, they are academic, right? They are at a point where, you know, easy, really easy things, they do really well. They probably compete with Transformers. They're not at a point where you can kind of test them on more complicated things. But you never know. I feel like when the Attention Is All You Need paper came out, no one, or at least in my little social circle, no one really cared that much about it.

But it was like, you know, a year or two later, I was like, Oh, wow, this is really picking up speed. This is really cool. And then, you know, Transformers just blew up.

Bill: Also in my social circle. People just weren't really talking about it. I don't know why.

Ananta: I'm glad I'm glad I'm not the only one. I see some people listening are just like, you are clearly not in the right social circle.

Bill: Well, maybe not the standard one. Probably true. So lots of companies are moving quickly toward deploying AI in their businesses, especially now with, you know, OpenAI and ChatGPT and [00:41:00] all of the stuff that's coming out, that's really putting it into people's heads. What should those customers, enterprises, businesses be thinking about, be doing now to get ready?

Is it time that they should, I mean, everybody wants to already have AI in their business. And if you don't have it now, you're looking at how to get it so you don't fall behind, but how should they start? What are the foundational pieces that you see that are more critical to determining success?

Ananta: I'm a big fan of the Occam's razor approach, the easiest approach.

I think part of what happened with ChatGPT coming out was, for at least a little while, people were like, I want to use the biggest model, the biggest maps to the best results. And so I, you know, I feel like I had a lot of conversations where it was like, I want to use a 70 billion parameter model or so many hundreds of billion parameter models just to analyze like meeting notes or, you know, really simple, simplistic things.

Bill: And then they get really upset that it hallucinates in weird [00:42:00] ways. Like, yes, because it knows about everything.

Ananta: And I mean, but like also think of it this way, like transformers are really computationally demanding for, for like one use. Imagine if you had 10,000 or 20,000 concurrent users with, let's just use the 70 billion parameter model.

I mean that, just the compute, the money, the power costs.

Bill: Well, yeah, all the hyperscalers are looking at nuclear power now.

Ananta: Exactly. Maybe this was, you know, part of this little bit of a push that kind of came from a lot of different voices kind of coming into the AI field, but I feel like it was like bigger is always better.

And I feel like there is time and place where that is true. But for businesses, you don't always want something big, because the whole point of I feel like having an AI system is that it should be something where you can easily update it. So, you know, if you see a new architecture, like, you know, just like LSTMs used to be really popular, Transformers suddenly blew up.

So, you know, you should be at a [00:43:00] stage where your system is simple enough where you can, and I say the word simple with air quotes again, but you know, it's simple enough where you can try out different things and you can say, hey, I can pivot now. I can update to what is most useful. And I think we're starting to see that come back.

And I think it's a lot of this push of like these smaller, more capable models. And I think that's where you want to start, right? You don't want to start too big. And you also want to start with use cases that are realistic. Like, you know, I would love to use AI for something, you know, absolutely ridiculously complicated and think that it's going to help me.

But you know, you have, like you said, all of these issues of hallucinations. You can't really control for that. You know, start with something simple. And actually, some of the work that we've done at Dell where, you know, we've taken some of these smaller models and we will compare a lot of different models, right?

Because a lot of them are trained in different ways. But, you know, kind of what we saw when we were trying to increase adoption within Dell was people were very untrusting. And I've heard this from a lot of customers that I work with, that we spent all this money, we built this [00:44:00] AI solution, no one wants to use it, no one trusts it.

Which is a problem, and I think most of us will agree. Like, I don't know if that's real or not. And I think you want to have things where you can build a system that is at least somewhat explainable. It can maybe give you, you know, where is it pulling that information? This happened at Dell, and I think it's very amusing that people actually started using these AI models as a search engine.

So, they use that explainability feature that we put in to actually figure out, hey, you know, how do I get at the answer that I want? I'm not even going to care about what the AI model is saying, but I'm actually just going to use it as my search tool. And I'm going to go find the answer that way. And that was kind of unexpected, but it's actually a really cool way that people end up using these models.

And I think that subjective piece of AI, I think it's huge because, you know, every customer, every person is going to use AI differently. You're going to ask the same question very differently than I am. And I think that subjective piece is, it's very important to how you as a customer have to build it.

And it really comes back to you understanding your business. And I, I always relate this back, and this is again a bit of a nerdy thought, but you know, you had, uh, the science of the math, like I think it was the 19th century, you know, where you have these derivatives, differential equations, very deterministic things, the, the equation, you put an input, you get an output, it is just fixed.

And then you have things that kind of came up in the 20th century, like, you know, quantum mechanics and things like that, which are more subjective, like think general relativity, you know, it's very subjective of where you are. And I think we're kind of at that point with AI, right? Where, we were used to software development, which was very algorithmic.

You put it in, you get a specific response. AI is going to be more collaborative. So you as a business have to kind of think about how is that going to be collaborating with you. And that could be very different even within a business, right? The HR team could see that very differently than the IT team could see that very different than the developers.

And so I think there's some amount of customization that you have to put in. And a lot of that is going to come from that human feedback. And, you know, you don't want to just blindly be tuning these models for [00:46:00] your use case in the dark. And it's not really, it's, you know, finding a needle in a haystack, right?

It's going to be very complicated. I think having, and I say this very carefully, but having a very supervised approach because you don't want everyone feeding into it. That's how you get garbage. But having like, you know, a very supervised approach of experts kind of refining your system. And, you know, they do a lot of that with the pre trained models in different ways.

But I think having that really helps increase adoption with these models because it's just, you know, you get good answers, and we, we saw this with some of the stuff that we've done within Dell. And so it's like, you know, how do you build something that is adaptable? You can kind of change things out, and keep it small.

You don't want to go too large. Like I know the 70 billion parameter model makes sense, and it does, it probably will with some use cases, but not a lot. So, you know, start small and then think about how do you want to scale this? Because, you know, that compute cost and trying to scale is going to be huge.

And then you have to kind of think about how do you want to measure for things? You want to measure for, and there's not really a guideline, like I hate to say [00:47:00] it, we're kind of just going with the flow and, and trying things. Yeah, you know, seeing what works and kind of winging it. But you also want to see like how good is the tool working for you?

You know, what is the value? And not just comparing how do people work with and without the tool. Maybe people need more of a training. Maybe they, you want to look at how, you know, if they're doing code, what is the model spitting out versus what are they actually using that they're submitting to GitHub?

You know, that's simple to compare. You know, different sorts of things like that. And then you also want to look at, you know, how often do people share? How often do they talk about it? You know, how often do they ask for features? All sorts of these things kind of help you with adoption. So I think it kind of comes back to how can you most simply apply this technology and find some sort of actionable, easy path to ROI, and start internally.

Like that is my, my one piece of advice, and maybe it's just because I'm a paranoid person, but you don't want to, you know, especially with something like generative AI, you don't want to deploy it out in the real world and then have some sort of PR disaster. That's, you know, something to worry about. So I would say, you know, [00:48:00] start internal.

And then I think for most customers, this is important. You know, I think you have low code, no code tools. You have a lot of pre trained models. One of the things, maybe thinking about going into the future, is also thinking about what IP to build for AI. Because, and, and I, I've heard this a lot where, you know, a lot of times, and this has happened in the past in software development, but, you know, a lot of companies will outsource.

Um, they'll just use tools. And, you know, that's, I think that's fine, but I think you also have to think about how you as a company are building your own IP, and you may not be a technology company, but you want to make sure you're adapting AI to be maximized for you, for your organization, because you're going to work a little bit different. But you don't also want to have the same tools as your competition, right?

You want to think about how do I stand out? And so, you know, I think it's a lot of factors, but I think starting small and scaling kind of helps at least navigate some of that chaos that exists today.

Bill: And I like that idea of AI as kind of nondeterministic software coding. [00:49:00] Gives it a little more flexibility and a little less rigidity. Interesting.

Ananta: And explain some of the bad behavior and the complexity.

Bill: That was a fun conversation. We are definitely out of time. And thank you so much for the perspective. This was a neat conversation. How can people find you online and keep up with the latest trouble you're causing? I'm sorry, you mean your work?

Ananta: Too much trouble, actually. I don't know if they want to find me online. So I would say kind of easiest place to find me would be LinkedIn. I'm a little bit of a slow responder, but I will respond. I promise. So I would say, yeah, LinkedIn, or email at firstname.lastname@dell.com. Super simple. I would love to hear from people if people want to, you know, ask questions, collaborate.

I I'm always looking for research partners to cause more chaos and honestly, just debate ideas. I just love to hear what people's opinions are.

Bill: It's moving pretty fast for sure. So good to keep up with the evolution. Thank you so much for the time. And this was a really great conversation. It went awesome.

Thank you. [00:50:00]

Ad: Capitalize on your edge to generate more value from your data. Learn more at dell. com slash edge.