[00:00:02] BRUNO OLSHAUSEN:
So welcome, everyone. Good afternoon. My name is Bruno Olshausen, and today it is my very great pleasure to introduce Jeff Hawkins for his second lecture as Hitchcock Professor.
You heard yesterday from Mary Comerio about Jeff’s accomplishments as an inventor, engineer, entrepreneur, neuroscientist, and author. He is widely known in Silicon Valley as the inventor of the PalmPilot and the person who pioneered handheld computing. But he is here today to tell us about his other lifelong passion.
That is the quest to understand how the brain works and to build intelligent machines based upon those principles. And by the way, for those of you interested in learning more about Jeff’s background, he had a great interview today with Harry Kreisler for his Conversations with History series. That videotaped interview will be available in a few months on the Graduate Division website, and the URL for that website is on the back of your program.
So, in 2002, Jeff founded the Redwood Neuroscience Institute, which would focus on developing a theoretical framework for neocortical function. I eagerly signed on as one of his first scientists. RNI grew into an intellectually rich and vibrant research institute, and Jeff was the person who brought life to it and made it an exciting place to be.
He grew and shaped the institute into a group of individuals sharing a common vision, to develop a theoretical framework for thalamocortical function, who worked together interactively on a daily basis. We held a weekly seminar series featuring distinguished neuroscientists from across the globe. Jeff was known to interrupt speakers at the beginning of their talks, just as they were describing the problem they were working on, to ask, “Why would you want to do that? What is it going to tell you?” This produced a range of interesting reactions. Some had perhaps never been asked that question so point-blank before.
But almost always it led to an interesting and thought-provoking discussion, and these seminars went on literally for hours. The speakers loved it, we loved it, and everyone learned a lot from these exchanges, which were in large part driven by Jeff’s intellectual curiosity and his intense desire to understand the science. In 2004, Jeff co-authored, together with Sandra Blakeslee, the book On Intelligence, which lays out his ideas about the cortex as a hierarchical memory system that learns from the environment and makes predictions.
This book has inspired countless students to enter the field of neuroscience. I know this because my inbox has been flooded with their emails on a daily basis ever since. I was also on the neuroscience admissions committee and read many of their stories in their application essays.
These stories go something like this: I thought I wanted to go into engineering, or into medicine, or law, or, God forbid, banking. And then I read Jeff’s book, and now I want to study neuroscience and work on models of the neocortex.
Some students ended up interviewing for our program, and they arrived asking questions like, “Do you believe in a common cortical algorithm?” They got that question from Jeff, and it’s a great question for a student to be asking. In 2005, Jeff started Numenta to push forward the development of intelligent machines based on models of brain function.
And today, he will tell us more about the models they have developed and the principles that make them work. In that same year, 2005, Jeff gifted RNI to UC Berkeley, where it now functions as the Redwood Center for Theoretical Neuroscience, part of the Helen Wills Neuroscience Institute. The Redwood Center endowment funds students in the neuroscience graduate program, in addition to seminars, courses, and other programs of the Redwood Center.
The center now provides a rich intellectual environment for students working in computational neuroscience, and ten PhD students have now graduated from Berkeley having done their thesis work in the Redwood Center. So I would like to take this opportunity to thank you, Jeff, for what you have brought to the Berkeley campus: not only your intellectual contributions, but also your help in creating an environment that enables students to learn, to grow, and to contribute to an emerging area of neuroscience. So please help me in welcoming Jeff Hawkins for his second lecture as Hitchcock Professor.
(applause and cheering)
[00:04:06] JEFF HAWKINS:
Thank you, Bruno. That was a very nice introduction. I should tell you a story about Bruno.
You know, when I started the Redwood Neuroscience Institute, my first question was, “Who would be crazy enough to come work here?” I mean, it’s a new institute. How do you open up? Do you put a shingle on the street and hope neuroscientists walk in the door saying, “Hey, I saw your sign”?
And it’s a very odd career move. You’re going to a non-affiliated institute, and it’s not leading to tenure or anything like that. But the people who really were passionate about understanding how the brain works, in the neocortex, were willing to take that gamble, and Bruno was one of those guys.
As he said, he signed right up. He said, “I want to do this. I care about this.”
Bruno became my right-hand man, my doppelganger, whatever you want to call it. And the great thing about Bruno is he has this encyclopedic mind. He knows everything and everybody.
He can remember the dates of everything, and I can’t remember anything. So I just follow him around and say, “Bruno, who wrote that paper about such and such?” And he goes, “Oh, that was so-and-so in 1964,” or whatever.
Oh, great. Thank you. “Bruno, who should we have speak at the seminar coming up? We should have three people.” Great. Let’s do that.
So it’s been a great collaboration, and I’m happy to have been introduced by Bruno. Now, I have two talks, and these are the titles: Intelligence and the Brain was the title of yesterday’s talk, and Intelligence and Machines is the title of today’s talk.
These overlap, and I’ll talk about that in a bit, but first I want to ask something of you. I’d like to know who in the audience was not here yesterday. If you were not here yesterday, please raise your hand.
Okay, that’s a fair number. There is an overlap between these talks, and I have to decide how much to repeat myself; I’ll try to manage that.
Okay. Just to remind you, for those who were here: I can’t imagine doing anything other than studying the brain.
To me, this is the most interesting thing one can do. We are our brains. Everything we do is our brains.
Our life is our brains. Our questions are the product of our brains. Our knowledge is the product of our brains.
Our art, our science, our literature are what we are. And our ability to even know anything is, of course, a product of our brains. So to know what humanity is, you really have to understand the brain.
I just wanted to know: what am I? How do I do this? How did we come about, and where is this all going to go?
So that’s where my interest in this field came from, just deep curiosity about myself and other humans. And then, as I got into it, I realized you could build machines that work on those principles. So I started out... let’s see.
I had a little problem with this yesterday. I hope it won’t be a problem today. Okay, I started out basically trying to answer these two questions.
First, how can we discover the operating principles of the neocortex? And then I realized that once we understood those operating principles, we could build machines that work on them. This is the order in which I care about them, but they go hand in hand.
So there you have it. Just to remind those of you who weren’t here yesterday: the neocortex is about sixty percent of the volume of the human brain. It’s the big wrinkly thing on top, and it’s the locus of all high-level thought, language, science, and high-level motor planning.
Pretty much anything you can tell me about is stored in your neocortex. And so it’s the system we’re interested in when we want to deal with intelligence. Okay.
It’s still not working. Here we go again. So this is the process.
Again, I showed this slide yesterday. This is the process by which I go about my work. I start with anatomy and physiology.
This is very detailed knowledge about how brains are constructed. We use that as a set of constraints on our theories, and then we develop principles by which the cortex works. And finally, we can develop software and hardware solutions that implement these principles to build intelligent machines.
Yesterday’s talk was about the first two of these items. Today I’m going to do a little review of the principles, then talk about the software and where this is going in the future. So here’s my outline for today’s talk. I’m going to do a brief history of machine intelligence.
Very brief. I’m not trying to be complete, and it’s very opinionated, so take it for what it is.
I’m then going to tell you how I define machine intelligence. What are we trying to do? How will we know if we achieve it, in some sense?
Then I’ll do that little bit of review of what we’ve learned about how the neocortex works, and then I’m going to talk about Grok. Grok is a product we’ve built, and it’s an illustrative product because it’s based on these principles and shows what we can do today. Then I’m going to talk briefly about the future of machine intelligence and answer, in the end: why should we do this?
Why do we care? I’m going to start now. A brief history.
This man, Alan Turing, is very famous. He was a British mathematician; most of you have probably heard of him.
He was one of the people who actually helped win World War II: he was key in deciphering the Enigma machine, the German encryption machine. A real hero.
But he was also interested in, and a founder of, computer science, and he wrote a paper in 1936, and continued on from that, which laid out really foundational principles of how computers work. Now, this quote is not from Alan Turing. It is my paraphrase of what that paper was all about.
He basically said, “Computers are actually universal machines. They can model anything. If there’s anything in the world that can be modeled, a computer can do it.
There’s no reason to build anything else, and all computers are essentially the same, so any computer could do this. They’re universal machines.” This was a really powerful idea. He then wrote another paper, in 1950.
Now, he was very, very interested in machine intelligence. He said, “We can build intelligent machines.” And he wrote this paper in 1950.
There’s a picture of it on the right there, the cover of it. It was called Computing Machinery and Intelligence. In it, he basically said, “You know what? We can build intelligent machines.”
But he was worried about something. He was afraid that people were going to get into endless arguments about whether this could be done or not: is there a soul in the machine, and so on.
He just didn’t want to get into it. He wrote in the paper, in effect, “I don’t want to get into arguments with people, so let’s just agree on the following: I’m going to come up with a test for machine intelligence, and if a machine passes this test, we’ll agree that the machine is intelligent.”
And the test he proposed is what we now call the Turing test. The basic idea, shown in this little picture at the bottom here, is what he called the imitation game: someone sits at a teletype or a terminal and types questions to a computer and a human, looks at the answers, and tries to decide which is the computer and which is the human.
If the judge can’t decide, then, Turing said, we have to say the computer is intelligent. This was clever, but a really bad idea, in my opinion. It set things up for a lot of problems going forward.
First of all, as I say here at the top of the slide, it sets the bar so that machine intelligence is about human behavior. It’s like: we’ll know we have machine intelligence when you can’t tell the difference between an intelligent machine and a human. Well, there are a lot of problems with that.
First, you can imagine very intelligent machines that would not pass the Turing test, that maybe speak another language or don’t speak a language at all, but are still very, very intelligent. We have lots of intelligence in other species on this planet: dogs, cats, monkeys, dolphins. None of them would pass the Turing test.
On the other hand, we can imagine machines that might pass the Turing test but really aren’t intelligent at all. So these two principles together, basically that computers can do anything, including model the mind or the brain, plus the idea that the goal is to replicate human behavior, were the foundational principles of what became the artificial intelligence movement. And this went on for many years.
I’m going to summarize the artificial intelligence world in one slide. I don’t want to be rude about it; a lot of great work went into this field.
Notice I call this “no neuroscience,” and essentially that’s my definition of AI: people trying to create intelligent machines who just don’t care about brains. They don’t look at brains.
They don’t think about neuroscience at all. On the left I’ve listed just a partial list of the different initiatives and techniques people have tried over the years. I don’t need to go into them.
They covered everything from robotics on. There’s a picture of Shakey the robot, one of the first robots, and then a recent one, Honda’s ASIMO.
We have the famous IBM computer, Deep Blue, playing chess against Kasparov. We now have Google’s self-driving cars. That’s pretty cool.
There’s a little picture of blocks here. That was something called Blocks World, which was about natural language processing; you could ask questions about the blocks.
These are all things that happened over the years. And recently we had another IBM computer, Watson, playing Jeopardy! That was pretty cool.
There have also been major AI initiatives. The MIT AI Lab existed for decades, and it’s still in existence; it was the center of AI research in the world.
There was something called the Fifth Generation Computer Systems project. You might have heard of it. This was from Japan, in 1982, and Japan announced it was going to build machines as smart as or smarter than humans, indistinguishable from humans in all ways.
This scared the bejesus out of people in the United States. This was the time when Japan was really rising in its technology, and people said, “Japan is going to pass us in technological capability,” and it got a lot of people riled up. The next year, 1983, DARPA, the US military research agency, created the Strategic Computing Initiative, which went on for about ten years.
Billions of dollars were put into these two initiatives, and they failed miserably, though we did learn some things from them.
Then, more recently, you might remember the DARPA Grand Challenge. This is the one to get cars to drive through the desert.
That was very successful. So there’s a whole bunch of things that have been going on here for decades, and if you try to follow it all, it’s very confusing. I need to stand behind a lectern.
Okay. From a sound point of view or from a visual point of view?
[00:13:53] STAFF:
Both.
[00:13:54] JEFF HAWKINS:
Okay. Is my fly open? No, I hope not.
(laughter)
Okay. No problem. We’ll continue from here.
Here’s how I’d summarize these things. These are good solutions, many of them very good solutions, but they’re very task-specific.
They weren’t general-purpose solutions. A Google self-driving car can’t fold your laundry, and it can’t cook dinner. They’re very specific things.
These were primarily programmed solutions. Engineers designed them, with very limited learning. Some of them have learning capabilities, but generally fairly limited ones.
And if you talk to AI researchers, they will universally tell you that one of the biggest problems they had is knowledge representation. They couldn’t figure out, and they still can’t figure out, how to get knowledge about the world into a computer. You might ask, what do you mean by knowledge about the world?
Well: what’s a car? What does a car do? What are all the attributes of a car?
Are there different types of cars, et cetera? It’s this huge body of information that our brains know, but it is very, very difficult to get into computers. This is probably one of the primary problems AI research has had over the years.
Now we’re going to switch gears and talk about two other guys, back in the 1940s: Warren McCulloch and Walter Pitts.
They wrote a very important seminal paper in 1943. Here’s a picture of it. It had a rather highfalutin title: A Logical Calculus of the Ideas Immanent in Nervous Activity.
Not as simple as Alan Turing’s title. But the basic idea was this: they said, “You know what? We can think about the neurons in brains like computers.”
They said, okay, we know about neurons. We know these are cells in the brain, we know they process information, and we know they get inputs on their synapses. And they said, “Let’s model that with a simple sort of model.”
These are the guys who first invented the idea of a model neuron, an artificial neuron. This is the first time it appeared, and the basic model they outlined is one that’s still in use all over the place today.
It essentially says a neuron is like a cell with many inputs; those inputs have weights on them; we sum them together, and if the sum is above a certain threshold, the neuron fires.
This was the best they could do at the time, and it was pretty good. We now know that it is a very insufficient model of a neuron, very much unlike real neurons.
I’ll talk a little more about that in a second. It has some fundamental flaws. But then they went on to do something more amazing, or interesting, or perhaps regrettable.
They said, “Look, if we design these neurons in a very weird way, with just a few inputs and nothing neuron-like at all, we can turn them into logic gates, like ANDs and ORs and NOTs.” That’s the basic language of computers. And then they thought, “Well, guess what?”
You can make these neuron-like things, not very neuron-like, but we can call them artificial neurons, and you can build entire computers out of them. You can do anything, and therefore it’s a universal Turing machine again. And so they said, “Let’s look at that.”
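Their threshold unit is simple enough to sketch in a few lines. This is a modern paraphrase, not code from the 1943 paper; the function names and the particular weights and thresholds are my own choices for illustration:

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of the inputs meets the threshold."""
    total = sum(i * w for i, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With the right weights and thresholds, such units become logic gates,
# which is exactly the observation that made them "universal":
def AND(a, b):
    return mp_neuron([a, b], [1, 1], threshold=2)   # fires only if both inputs fire

def OR(a, b):
    return mp_neuron([a, b], [1, 1], threshold=1)   # fires if either input fires

def NOT(a):
    return mp_neuron([a], [-1], threshold=0)        # one inhibitory input
```

Pick the weights and thresholds right, and "neurons" compute Boolean logic, and hence, in principle, anything a computer can compute.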
And so people got this idea: let’s use these artificial neurons to try to build intelligent machines. That, in some ways, was the genesis of the whole field of artificial neural networks, this other entire genre of machine intelligence that’s been going on for decades. I call this “minimal neuroscience” because it pays homage to the neuron, but it’s not a very realistic neuron, not even close.
And beyond that, most of these models, ninety-some percent of them, completely ignore the anatomy and physiology of the brain and all the details we have about how brains are actually built. They just build these simple networks. I’ve shown here some pictures of the artificial neural networks you’d see in the literature.
They all have lots of similar characteristics. Here are the names of some of the different types you might see; there are plenty of them.
Backpropagation, perceptrons (one of the first), Boltzmann machines, Hopfield networks, Kohonen networks. Two books came out in the mid-eighties called the Parallel Distributed Processing books, which created a whole flurry of activity in this field, and there were PDP societies and so on.
So that was going on. Here’s how I’d summarize them. The good thing about these is that they’re learning systems.
They essentially say, “We need to learn from streams of data being presented to the system.” They were useful. They’re still used in many applications, but they’re very, very limited.
They don’t do much. They’re essentially classifiers of various sorts: you give them a pattern, and they can match it to something else.
And again, they’re not brain-like at all, really. AI and artificial neural networks are not even close to producing something where we would say, “Wow, that’s an intelligent machine.”
There are huge gaps between where these fields are and what most people imagine when they think about intelligent machines. So, lately, there’s been another activity in this area that you might want to know about. I’m only going to give one example.
This is people trying to do whole-brain simulations. They’re saying, “Look, we have all this computing power, we have all this knowledge about neuroscience, let’s model the brain.” The premier example of this is something called the Human Brain Project, which is centered in Europe, spread throughout Europe, and very, very ambitious.
The Human Brain Project is trying to model a brain all the way up from ion channels, synapses, and neurons to the entire brain, the entire structure of everything. That’s it on the left here, from one of their brochures: all the different scales they’re trying to deal with.
That rosette in the center, which you can’t read any of the details on: each one of its leaf nodes is a scientist who is one of the principal investigators on this project. You can see there are hundreds of them.
Hundreds of scientists in all these different fields are coming together to build this monster brain simulator. On the right is a picture of one of the early models they did when they were working with IBM, called Blue Brain. And these are cool.
I went and saw this: you’re surrounded by walls of these neurons being projected, with spikes going around. It’s very, very impressive.
Now, it’s interesting when you think about this, and they won’t disagree with what I’m about to say; this is not a criticism.
First of all, there’s no theory here.
They have no idea what this thing should be doing. They’re just making a software simulation of millions of neurons, turning it on, and asking, “What happens?”
Well, they might learn something that way, but there’s really no theory there. And I just believe that you’re not going to have any success if you have no idea what this thing should do, which pieces are
(clears throat)
important and which are unimportant, what things are essential and what things are not. You’re just not going to get it right. It’s just impossible; that’s my opinion. And they don’t really disagree with that.
The other thing is there’s really no attempt at machine intelligence. They realize it’s not going to happen this way, because they don’t really know what the system should do or how it’s going to work. But it’s interesting: they’re now viewing this as a way of saying, “Okay, we’re going to model the brain in this huge simulator.
We’re going to learn all kinds of things, maybe about diseases. It’s going to be a tool for scientists. And maybe we’ll figure out how intelligence comes out of it a little down the road.”
So that’s pretty cool. Very interesting. A lot of money and effort is being spent on these kinds of things.
This is not the only initiative; there are others similar to it, but this is the one most in the news these days. Now, you might not be surprised: I’m not a big fan of these as a way of getting to machine intelligence. I think we need a different way of thinking about it, a different way of looking at it.
You know what my alternative approach would be. You’ve already seen it. It’s this slide right here.
My alternative approach is that you have to start with the neuroscience. Start with the brain. Use it as a set of constraints.
Derive the principles, and once you have those principles, then you can start implementing them. My idea here is that building intelligent machines shouldn’t be performance-based. Think of a mouse as an intelligent animal, or a cat, a dog, a monkey, a human; these all have different levels of intelligence.
The goal is not to surpass human intelligence, though eventually we want to do that. The goal is to be working on the right set of principles. And if we agree on what those principles are, then we can say a machine is intelligent. It may not be very intelligent, or it may be super intelligent, but it’s the principles that count.
And that’s the approach I want to take. So I’m going to go through that list of principles now. For those of you who were here yesterday, I talked about a few of them.
I’m going to add some more today and review a few of the ones I talked about yesterday. So I’m going to do a little bit of review for a few minutes. For those of you who saw this yesterday, think of it as a refresher; it’ll be much quicker than what I did yesterday.
Okay, so the number one thing is you have to understand what the neocortex is doing. The neocortex is a memory system. It’s not a computer; it’s a memory system.
It stores information in its connections, and it builds a model of the world. When you’re born, it doesn’t know anything. It doesn’t know about all the objects in the world and how they relate to one another.
It doesn’t know about presidential debates and computers and rooms and so on. It has to build that knowledge, and it does so through the senses. We have arrays of sensors: the retina, the cochlea, the somatic senses, those are the body senses.
Those are millions of sensory bits coming into the brain, streaming in real time at high velocity, meaning very rapidly changing. The brain takes these in and has to build a model of the world from them. That’s what it looks like.
And the model does several things. It makes predictions about the future, it detects anomalies, and it generates actions.
To give an example, which I talked about yesterday: the brain is constantly making predictions about what it’s going to see, hear, and feel. You’re not aware of most of these predictions. They occur subconsciously for the most part, but you know when your predictions are violated, when something is different than expected.
The example I used in my book: I was sitting in my office one day, and I realized there are all these objects in my office, and if any of them moved a little bit, or disappeared, or a new object appeared, I would notice it right away. But I’m not sitting there taking inventory of my office. I’m just looking around, and things happen.
Similarly, you always have expectations of what you’re going to hear when I speak, what you’re going to feel when you touch things, what you’re going to see. This is proven. We know it is a fact.
Your brain is constantly making predictions. And this was a cornerstone for me: what kind of memory system makes predictions?
And the answer is a memory system that learns temporal patterns, what normally follows what, in various ways. Okay, so that’s the basic idea.
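As a toy illustration of this idea (far simpler than anything the cortex does; the class and method names here are invented for the example), a memory that just counts what follows what in a stream can already predict the next input and flag violated predictions as anomalies:

```python
from collections import defaultdict

class FirstOrderSequenceMemory:
    """Remembers, for each element, how often each other element followed it."""

    def __init__(self):
        # transitions[a][b] = how many times b followed a in the training stream
        self.transitions = defaultdict(lambda: defaultdict(int))

    def learn(self, stream):
        # Count every adjacent pair: a followed by b.
        for a, b in zip(stream, stream[1:]):
            self.transitions[a][b] += 1

    def predict(self, current):
        """Return the most frequent successor of `current`, or None if unseen."""
        followers = self.transitions[current]
        return max(followers, key=followers.get) if followers else None

    def is_anomalous(self, current, nxt):
        """True if `nxt` was never seen to follow `current` -- a violated prediction."""
        return self.transitions[current][nxt] == 0

m = FirstOrderSequenceMemory()
m.learn("the cat sat on the mat")
```

With that training stream, `m.predict('a')` returns `'t'` (the only letter ever seen after 'a'), and an 'a' followed by 'z' is flagged as anomalous, because 'z' never followed 'a' during learning.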
Now let me list the principles of neocortical function that I think are essential for true machine intelligence, and then I’ll go into detail about a couple of them. So here we have this cortex, with a high-velocity data stream coming from a set of sensors.
What are the attributes that I think are really going to define machine intelligence? The first one might surprise you; you certainly wouldn’t have thought about it much. But when you look at intelligent biological systems, they all have these sensory arrays.
We don’t have a camera for eyes. We have an array of millions of sensors, and they are processed as millions of individual inputs. There’s nobody looking at the entire picture in your brain.
And the same goes for your cochlea and for your skin. We now understand this is an essential property of how the memory system in your brain works, and I don’t think you can get around it. So intelligent machines are going to have to have low-level sensory arrays.
No one just pumps Shakespeare into your head. That doesn’t happen.
You read it in complex temporal streams from your eyes, or you hear it being spoken, or you feel it through braille. The point is, it has to go through this array of sensors, through time, to get in there. The second thing is that the neocortex is a hierarchy, and I talked about this a little bit yesterday.
When we look at it physically, the region, the, the memory in the neocortex, this sheet is arranged in a hierarchical fashion. These regions are connected to each other in a hierarchy, and information flows up and down the hierarchy. This is a physical fact about the brain, and it’s a very interesting observation about the kind of memory system it is.
It is a hierarchical memory system. Um, and, and y-, and you can’t get around that. Intelligent machines are going to have hierarchical structure.
The next thing, and this is part of the theory that we’ve developed, is that what each region in the hierarchy is doing is a form of sequence memory, or different types of sequence memory. When you’re hearing my speech, these complex patterns come in through time, and the way you recognize it is that you have memory of what words sound like in order, what sentences and phrases sound like in order, and what things typically follow what. Recognizing speech, recognizing what you touch, and even vision are highly temporal processes, and so there’s this sequence memory.
And the, and the idea of sequence memory in a hierarchy is that you have, you have sequence memory at the first levels that are recognizing very small patterns in space and time, and then they coalesce into longer patterns in space and time and so on, going up the hierarchy. And you get to the top of the hierarchy, you have representations of higher level objects in the world and how they behave. Um, some of this is, is very well understood, some of it’s not well understood at all.
But this is a fact. And, uh, one of the things about the hierarchy, which, uh, I mentioned yesterday and I haven’t mentioned yet today, is that all the regions of the hierarchy are doing the same thing. So once we understand, um, you know, how one part’s doing it, we understand how all parts are doing it.
So these are attributes that an intelligent system is going to have to have. The next one is sparse distributed representations. This is the language of the brain.
I’m going to go a little bit more into this in a moment, but everywhere we look in the brain, we find a few cells that are active and most cells that are inactive. And there are properties of this which are important, which you’re going to need to understand. And I contend that no intelligent machine, biological or otherwise, can work without sparse distributed representations.
So we’re going to go a little bit more into that. And then here’s something I did not talk about much yesterday. Everywhere you go in the neocortex, there are these cells in sequence memory that are learning the patterns of the world.
But no matter where you look in the neocortex, you also see outputs that are descending to motor parts of your brain. Everywhere, that sensory information is also directing behavior. You can’t separate out behavior from sensory information; you can’t separate out inference from behavior.
And think about it, when you, when you interact with the world, whether you’re just moving your eyes or walking around or touching things with your hand, you’re changing what your senses are going to feel. You know, my sense, my fingers aren’t just feeling this podium because they happen to be feeling the podium. It’s because I have moved my hand there.
There’s a tight coupling between your behavior and what you sense. This is the sensorimotor integration issue. It’s part and parcel of how brains work, and it’s an essential feature that our intelligent machines have to have. Then there’s an attentional mechanism.
I’m giving you the laundry list here, okay? So bear with me. I think this might be the last one.
Maybe I have one more. The attentional mechanism: in the hierarchy, there’s information flowing up and flowing down, but we know that there are actually ways of turning it off, so you can attend to some subset of your sensory stream.
So you might be zoning out on your skin, not really feeling much right now, because you’re listening carefully to my words. Or maybe you’re reading, not listening to my words, and paying attention to the stuff on the screen.
But I could also tell you, “Okay, look at the word ‘principles’ at the top of this slide.” Now, there’s a pattern coming into your brain right now, and you’re seeing the word principles. Now I say, look at the letter I.
The same pattern is coming in on your eye, but now you’re focusing, you’re attending to a subset of that information. Same information coming in the brain, and now you’re attending to a subset. Now I say, look at the dot on top of the I. Focus on that.
Same pattern coming in your eye, the whole, the whole picture, but you’re now attending to a smaller piece of image. You’re doing this all the time. You’re not aware of it.
You’re doing this all the time, tuning in certain parts and tuning out certain parts, and this is pretty important when you have a complex sensory environment. So this is going on in the brain. I show this in this picture here, where I can say, “Look, we can turn off those X’s.
We can turn off some of the input and focus on some other parts of the input.” There’s a pathway through the thalamus which does this. This is a very important part of how we interact with a very complex, rich world, and it’s an important part of intelligence. Now, a few notes about this list.
There’s a lot of things that aren’t on this list. I’ll just mention a couple things right here. One is, you notice I don’t mention emotions.
There are, you know, books about emotional intelligence and so on. I don’t believe you actually have to have emotions to be intelligent. You have to have emotions to be human-like.
You have to have emotions to pass the Turing test. But to be purely intelligent, to build a model of the world that’s very, very sophisticated and make predictions and detect anomalies and discover structure in the world, you do not have to have emotions.
Emotions are sort of like an uber learning system. They say, “You know what? This is really dangerous,” or, “This is really good.
I need to remember that,” or, “We need to avoid that.” But you can do an awful lot without that, and I do not believe this is something that’s essential for intelligence. And finally, you don’t need to have a human-like body.
This isn’t about building robots.
Maybe you want a human-like body, but it’s not about that at all. We can have intelligence embedded in all kinds of systems. It could just be a bunch of computers running someplace with some sensors on the world.
So we don’t need those things. We’re gonna get rid of those things. Um, and we’ll just stick to these principles.
Okay. So now, there are two here that I talked about yesterday that are really, really important. What I’m gonna do is go through the two slides I had yesterday about sparse distributed representations.
And I’m gonna do those completely, because to me that is the most important thing. If you want to walk away with anything, you’re gonna have to understand what sparse distributed representations are. For the second one, I’m just gonna give you a cursory review of the sequence memory.
And I apologize to those who totally got it yesterday and say, “Why are you repeating that?” But I doubt anybody totally got it yesterday. It takes a while.
Okay, so here again, the two slides on sparse distributed representations. It’s easiest to do this when you compare it to a computer. So if you know how computers work, then you can easily see how this works in the brain.
And I’ll do this a little bit quicker than usual. In the computer, we have what we call dense representations. We have 8-bit, 16-bit, 64-bit entities, and we use all combinations of ones and zeros.
So if I have an 8-bit quantity, a byte, all 256 possible combinations of ones and zeros are used to represent things. That’s why it’s called dense. There’s an example of the ASCII code for the letter M.
It’s just some arbitrary assignment of ones and zeros that says, “This is M.” If I ask what those different bits mean in the letter M, they mean nothing. In fact, if I change one of those bits, I get a completely different letter. So I have to look at all of them to get any idea what this is, and there’s actually nothing about those bits that tells you it’s an M.
It’s just someplace elsewhere, in some table, that someone says, “Okay, that’s the letter M.” And then, finally, these representations are assigned. They don’t have any inherent meaning; they’re not learned.
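The fragility of dense codes is easy to see directly in Python; a quick illustrative sketch:

```python
# Dense code: every bit matters, and no single bit carries meaning on its own.
m = ord('M')                  # ASCII code for the letter M
print(format(m, '08b'))       # 01001101 -- an arbitrary assignment of ones and zeros

# Flip just one bit and you get an unrelated letter.
print(chr(m ^ 0b00000001))    # L
```

Nothing in the pattern 01001101 hints at “M”; the meaning lives entirely in the ASCII table.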
Someone just said, “Here’s the ASCII code. We’re gonna use this for the next hundred years.” Um, in the brain, it’s very different.
In the brain, we have cells, and when we look at the cells, very few are active at any point in time, and most of them are inactive or relatively inactive. And there are thousands of them. So in a sparse distributed representation, we can represent those cells with zeros and ones.
And we can say that I have several thousand or tens of thousands of these bits. We typically use things about two thousand bits long in our work at Numenta, and I’ll talk about that in a little bit. And they’re mostly zeros.
We’ll typically have two percent on. So out of two thousand bits, I’ll have forty bits that are one and nineteen hundred and sixty bits that are zero. There’s nothing magic about those numbers, but sparsity is important.
Those bits have meaning of some sort, and they don’t change over time.
A bit is like a cell. The cell represents something, and it doesn’t arbitrarily change from moment to moment. So if this cell represents a line in a certain part of your visual space, it’s always gonna represent that line in that part of your visual space.
If this cell represents part of recognizing a human face, it’s always gonna be that way. And the basic idea here is, when you form a representation, let’s say I have two thousand bits, I have two thousand semantic meanings, and I pick the top forty, the top two percent, that best represent this thing. The example I used yesterday is letting it represent a letter.
We wouldn’t actually do it this way, this is just an example, but I might have bits for: is the letter a consonant or a vowel? Is it an S sound, O sound, or E sound?
Is it hard or soft? Um, where is it in the alphabet? Has it got descenders and ascenders?
And so on, and then I pick the bits that would best represent that thing, and now my representation actually tells me what it is. There is no external place where I have to say, “Oh, this code represents X.” It’s right there in the code. If I know what those bits mean, the encoding tells you what it is.
That is the entire definition. Um, and so we say each bit has semantic meaning, and these bit definitions have to be learned. Now, there’s these properties of SDRs which are very, very important.
Um, one is one of similarity. If I take two sparse distributed representations and I compare them bit by bit, if they share bits, that means they share semantic meaning of some sort, and therefore, they’re similar semantically. The more bits they share, the more semantically similar they are.
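That overlap measure can be sketched in a few lines of Python, representing an SDR as the set of indices of its on-bits. The sizes come from the talk; the code itself is only illustrative:

```python
import random

random.seed(42)
N, W = 2000, 40                # 2000 bits, 40 on: the two percent sparsity from the talk

def random_sdr():
    """An SDR stored as the set of indices of its on-bits."""
    return set(random.sample(range(N), W))

def overlap(a, b):
    """Shared on-bits mean shared semantic meaning."""
    return len(a & b)

a, b = random_sdr(), random_sdr()
print(overlap(a, a))   # 40: identical representations overlap completely
print(overlap(a, b))   # near zero: unrelated random SDRs share almost no bits
```

In a trained system the bits would not be random, so overlap would track genuine semantic similarity rather than chance.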
Very simple property. The next one is if I wanna store a, a, a pattern, a sparse representation, and I wanna recognize it again. So, hey, here’s a pattern.
I’ve seen this. Now I wanna see does it ever occur again. Well, I could save all two thousand bits.
That’d be one way of doing it. That’s how a computer programmer would do it. But we can do something simpler.
We can just store the locations of the ones. I have forty bits that are one, so I’ll just record where those forty bits are.
And if I see ones in those forty locations, I know I’ve got my pattern, because that’s all there is: forty ones. Then we can ask the next question: what if I couldn’t save all forty indices? What if I can only save ten of them?
And so I say, “You can’t store all of them. You just have to randomly sample the forty and save the locations of ten.” So I do that: I’m storing some of them and not storing the others.
Now a pattern comes in, and you ask, “Is it the same pattern or not?” And I say, “Well, look, the ten are the same. What are the chances that the other thirty are the same?”
You say, “Well, that could be wrong. Those other thirty could be someplace else.” The chances of that are very, very unlikely, but if it does occur, you’ve made a mistake, and it’s a mistake with something semantically similar to the thing you did store.
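That subsampling argument can be checked numerically. The 2000/40/10 numbers are from the talk; the code is an illustrative sketch:

```python
import random
from math import comb

random.seed(0)
N, W, K = 2000, 40, 10        # 2000 bits, 40 on; remember only 10 of the 40 locations

pattern = set(random.sample(range(N), W))
stored = set(random.sample(sorted(pattern), K))

def looks_like_pattern(sdr):
    """Declare a match if all ten remembered indices are on."""
    return stored <= sdr

print(looks_like_pattern(pattern))   # True: the original pattern still matches

# Chance that a random 40-of-2000 SDR happens to contain all ten stored
# indices (a false match):
p_false = comb(N - K, W - K) / comb(N, W)
print(p_false)                       # astronomically small
```

The false-match probability works out to roughly 10^-18, which is why storing a quarter of the indices is safe in practice.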
So it’s like, okay, I made a mistake, but it’s similar to the thing I stored before, and that’s often good enough. And then finally, the most difficult property is the one of union. You can take these sparse distributed representations, and what if I take ten of them
and OR them together? So I have these ten patterns, each with two percent of the bits on. I OR them together, and now I have one pattern in which about twenty percent of the bits, a little bit less, are on, and I have this union.
And if I ask, “Well, can I tell you what those first ten were?” No, I can’t. I cannot undo this operation.
But I can do something very interesting: I can take a new sparse distributed representation and ask, is it one of the members? And I can do that. If the ones in my unknown pattern all match ones in the union, I can be very certain that this thing is a member of the original ten.
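That membership test on a union can be sketched the same way (illustrative):

```python
import random

random.seed(1)
N, W = 2000, 40

def random_sdr():
    return set(random.sample(range(N), W))

predictions = [random_sdr() for _ in range(10)]   # ten stored patterns
union = set().union(*predictions)                 # OR them together: a bit under 20% on

def is_member(actual):
    """If every on-bit of the candidate falls inside the union, it almost
    certainly is one of the original ten."""
    return actual <= union

print(is_member(predictions[3]))   # True: a stored pattern is recognized
print(is_member(random_sdr()))     # False: a novel pattern almost never slips through
```

A fresh random SDR would need all forty of its on-bits to land inside the roughly 18% of bits that are on in the union, which essentially never happens.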
And you might again say, “Well, it could make a mistake,” but simple math will tell you it’s not going to happen. Very, very unlikely. And how we use this is, when the brain makes a prediction, it’s making predictions in the activity of cells, and it essentially says, “I can have multiple predictions going on at once.
I can predict many things that might happen next, and I can tell you if what actually happened was one of those things.” So when you make predictions, you often can’t tell me exactly what you’re predicting. You don’t really know.
But when something unexpected occurs, you know it. And that comes from that property I just showed you there. So again, this is the most important thing about machine intelligence and about brains.
If you want to walk away and say, “Hey, I heard this guy Hawkins talk. He’s kind of crazy, but one thing I remember,” this would be it: brains and Numenta machines are gonna be based on sparse distributed representations.
You can bank on that, um, and many people will in the future. Okay. Now I’m gonna talk very, very briefly about sequence memory.
I will not be able to go into all the details about this, but I wanna give you the flavor of it. Yesterday, I gave you many more of the details. Here we’re seeing a sparse distributed representation, but instead of showing you ones and zeros, we’re showing it as little cubes.
We can imagine those cubes being cells in the brain: the red ones are active, and the white ones are inactive. At any point in time, I’ll have two percent of them active like this, and at another point in time, another set is active like that. And the basic idea when we wanna learn sequences is we don’t try to learn the sequence of everything; instead, every cell, every little cube here, tries to predict its own activity, tries to learn when it follows something else.
And if you do that, you end up with a distributed sequence memory. We have these patterns coming into your brain right now as I’m talking. Whoops.
As I’m talking, these patterns are flashing back and forth like this all the time, and when a cell becomes active, it looks for cells nearby. It doesn’t have to look at all of them. It just has to sample a subset, find a few, and say, “Let me remember which ones are active.”
Those are like those indices. Which ones are active? I’m gonna save those, and if I see them happen again, then I can predict my own activity.
And here’s a situation where I have an input coming in and a whole bunch of cells, yellow cells, are predicting they’re going to come next. This would be in a situation where I’m predicting multiple things. So if I had like A followed by B, and then A followed by C and A followed by D, and I show you A, it’s going to predict B, C, and D.
Then I showed yesterday that this is a first-order memory, meaning it can only predict based on the last thing that happened. It has no history. It can’t tell you anything about what happened further in the past.
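A first-order memory of this kind, predicting the union of everything that has ever followed the current input, fits in a few lines (illustrative):

```python
from collections import defaultdict

# First-order sequence memory: each element remembers only what has
# immediately followed it; there is no deeper history.
followers = defaultdict(set)

def learn(sequence):
    for prev, nxt in zip(sequence, sequence[1:]):
        followers[prev].add(nxt)

for seq in ("AB", "AC", "AD"):
    learn(seq)

# Having seen A-then-B, A-then-C, and A-then-D, showing A predicts all three.
print(followers["A"] == {"B", "C", "D"})   # True
```

Note what is missing: because only the immediately preceding element is stored, this memory cannot tell two occurrences of the same element apart by their history.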
Imagine a melody. A melody is a high-order pattern. Think about Beethoven’s Fifth.
It goes, ba, ba, ba, bum, ba, ba, ba, bum, ba, ba, ba, ba, ba, ba, ba, ba. The first four notes, ba, ba, ba, bum, are repeated as the ninth through twelfth notes. Exactly the same notes, but you don’t get confused.
And I don’t get confused as I go through the entire melody. I never get lost and say, “Oh, it’s just the beginning again. I’m starting over.”
In order to do that, you have to have a high-order memory. You don’t wanna get confused: if every repeat sounded like the beginning, you’d be lost. So we need to solve this.
And without going into the details, the solution to this problem, I believe, has to do with using columns of cells, and we see this in the brain. Instead of one cell per bit, we use multiple cells per bit in our sparse distributed representation, and this gives us a very, very high-capacity memory. I won’t walk through it here.
I’m just gonna tell you that it’s a variable-order sequence memory. It does multiple simultaneous predictions. It’s extremely high capacity.
It’s a distributed memory system, meaning it’s fault-tolerant. You can drop out cells and neurons and columns, and just like the real brain, it keeps working. And it does semantic generalization.
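The columns idea can be caricatured as follows: several cells stand for the same input bit, and which cell fires encodes the context the input occurred in. This is only a cartoon with a made-up context rule; the real learning mechanism is described in the white paper.

```python
CELLS_PER_COLUMN = 4   # several cells all represent the same input bit

def active_cell(column, context_id):
    """Toy rule: the preceding context picks which cell in the column fires."""
    return (column, context_id % CELLS_PER_COLUMN)

# The same note at the start of the melody (context 0) and at its later
# repeat (context 1) activates the same column but a different cell, so
# the two occurrences are never confused.
start  = active_cell(column=7, context_id=0)
repeat = active_cell(column=7, context_id=1)
print(start[0] == repeat[0])   # True: same column, same input
print(start == repeat)         # False: distinct cell-level representations
```

Because the column is the same, the input looks identical from outside, yet the cell-level pattern carries the history that makes the sequence memory variable-order.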
If you missed yesterday’s talk, or even if you heard it, you can read the full details of this. There’s a white paper on our website which explains all of this in great detail. So if you’re wondering, “What was he talking about?”
You can read about it. Okay, so that’s it for my, my review, and now we’re gonna go forward. We’re back to this situation right here.
These are my six attributes that I believe need to be part of an intelligent machine. Now, intelligence is a scalar.
There isn’t some threshold to it. As I said, we have lots of intelligent animals with different capabilities. And so maybe you won’t be able to do all these things, but a really intelligent machine would do all of these.
Where are we today? Here’s where we are today. Today, we understand the sensory arrays and streaming data.
We, we are modeling this today. We understand the sequence memory, uh, very, very well. We understand the sparse distributed representations very well.
We partially understand the hierarchy. We have some pieces of it, but not all of it. So we’ve done some simulations with hierarchy and some without.
There’s more work to be done there. And now what I’m going to do is show you what you can do with a very simple version of these capabilities. Can I produce something that, I’m not going to call it an intelligent machine, but is on its way to being one?
We’re taking baby steps there, but it turns out we can do something very, very commercially useful today with just these properties. And that introduces our product. To tell you about our product, I have to take a little side diversion into data.
I’m going to have to talk about the world of data a little bit, so bear with me. It’s not going to be very technical at all. Today, we have the ability to collect huge amounts of data.
Perhaps you’ve heard about big data. It’s a popular term these days, and we’re storing so much data that the Library of Congress has become the metric for it. We’re storing four Libraries of Congress an hour, or something like that, you know?
Uh, huge amounts of storage. And the problem is, what do we do with this data? We put it in these databases, and then people can look at it.
They have tools for visualizing it and so on. They look for patterns, and they build predictive models. They hire machine learning experts to come in and build these models, which takes some months, and then the models get out of date, and they do it again, and so on.
This is an unscalable approach. It is not the solution for the future.
There’s all these problems with it. There’s problems with pr-preparing your data. There’s problems that the models become obsolete.
The patterns in the world change. Someone builds a model, and then the model’s no good anymore. You see this in credit card fraud.
There are the fraudsters, that’s what they call them, and then there are the people who build the fraudster detector.
They build these models that detect the fraudsters and say, “Ah, the model’s working really well.” But the fraudsters figure this out, and a month later they’ve got new ways of cheating. It’s just a race that goes on and on all the time.
And there’s another problem with this, is that it requires people, lots and lots of people, machine learning experts and so on. So this is not a scalable solution. And by the way, this is an important problem.
The world is going to be awash in data. We’re gonna have trillions of sensors. You may have heard of the Internet of Things.
Everything in the world is gonna be connected and streaming data someplace. What are we going to do with this? Not this.
Okay? The answer to this is the following: you’re going to take that data and stream it to what we call online models.
Those are continuously learning models, and you’re going to take actions directly from them. You’re going to take the data, put it through these models, make predictions, detect anomalies, and take actions immediately. Does this sound familiar?
This is what brains do. Streaming data, continuously learning models, making predictions, detecting anomalies, and taking actions. And so there’s an opportunity here to rethink the way data is acted upon in the world.
And this turns out to be a very powerful idea, and we’ve built a product called Grok, which does this. The key criteria here are automated model creation and continuous learning, because as the patterns change in the world, the models have to adapt.
As a human, you do this too. Every day, you learn something new and you adapt; you abandon some things you learned in the past and pick up new patterns. So the models do this continuously, and it’s very, very important to find the temporal and spatial patterns in the data.
Very few people look at the temporal patterns in the data. That’s what brains do. But the temporal patterns in the data are very, very helpful.
They tell you what’s going to happen next. They help tell you when things are unexpected. So this is a great model for what brains do.
So we are building miniature brains, if you will, that take data streams, make predictions, and take actions. Now, building a product is quite an endeavor.
I’ll just walk you through some of what this is. Here’s a diagram of how we do this. On the left, I have these three vertical bars.
They’re representing records of data from some data stream. You can imagine this coming off of some computer or server, or coming off sensors on a building.
And these are records coming in in time. They might be coming in very rapidly, once a minute, once every five seconds. They may be coming in slower, like once an hour.
Um, they have multiple fields. They might be numbers and categories and things like that, but we feed them into our, our system. The first thing we have to do is we have to turn these, these inputs into sparse distributed representations.
That was one of my criteria: the sensors have to produce streaming sparse distributed representations. So we have a way of doing that.
I don’t mind telling you about it, but I’m not going to go through it in this talk. We can take numbers and turn them into sparse distributed representations that have the right properties. We can take category information like, you know, um, male, female, and days of the week and things like that, and we can turn them into sparse distributed representations, and we do that.
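One simple way to give numbers the right properties, so that similar values share bits, is a sliding window of on-bits. This toy encoder is my own illustration of the idea, not Numenta’s actual encoder:

```python
N, W = 100, 11   # toy sizes: 100 bits, 11 contiguous on-bits

def encode_scalar(value, lo=0.0, hi=100.0):
    """Map a number to an SDR so that nearby values overlap."""
    frac = (value - lo) / (hi - lo)
    start = round(frac * (N - W))     # the window slides with the value
    return set(range(start, start + W))

a, b, c = encode_scalar(20), encode_scalar(22), encode_scalar(80)
print(len(a & b))   # 9: nearby values share most of their bits
print(len(a & c))   # 0: distant values share none
```

Categories can be handled in the same spirit by giving each category its own disjoint block of bits.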
Then we feed them into the sequence memory. It’s a cortical model, 2,000 columns, 60,000 cells, 300 million synapses, doing the thing I just talked about before, those columns and all that stuff, and from that, we make predictions and take actions. Here’s what a user would do with this system.
They don’t need to know any of this stuff. The user essentially says, “Here’s my data. I have a stream of data coming from this building,” for example.
“I want to define a problem of making predictions every so often, and here’s what I’m trying to predict.” And then our product, Grok, basically creates these models, figures out how to do them, learns continuously, finds the spatial and temporal patterns in the data, and makes predictions. And it can tell you the probability of these predictions.
Um, and there’s lots of areas for applications here. Um, energy pricing, energy demand, product forecasting, ad network returns, et cetera. We have all kinds of people trying to do machine efficiency.
Can I predict which machines to use and when to use them, and things like that? I’m going to give you just a couple of examples so you get a flavor for what this is like. This is not an important detail, but for the computer people in the room, you might care about this.
Today, this is implemented on a, a cloud server, an Amazon cloud server. It doesn’t have to be. This could be embedded in things, it could be embedded in cars, it could be in chips, whatever.
But basically, today, we’ve implemented our first little brain models as a service on the cloud. So I’ll walk you through a couple of simple slides, an easy-to-understand version of this, which is energy. You may not be aware of this, but large consumers of electricity decide how much energy they’re gonna use, and what they’ll pay for it, sometimes on an hour-by-hour basis.
That is a market for, say, a large factory. The utilities will say, “I’ll sell you this electricity at four o’clock in the afternoon at this price if you take so much, or don’t use so much.” And then the consumer of the energy says, “Okay, I’ll do that now,” or not.
Or some consumers will pre-cool buildings because they know the electricity is gonna be more expensive later in the day. So there’s this market going on for large consumers of power that you’re probably not aware of.
Uh, it’s called the demand response market. And if you can make that more efficient, you can save energy and save money. So here’s a, here’s a typical example.
Here’s the electrical energy profile of a building. It’s just some factory. And you can see there’s a pattern here.
It turns out these peaks correspond to days of the week. You can see five weekdays, and then the weekend comes along, the factory’s shut down, and nothing’s going on. Now, we can feed this kind of energy data into Grok, and it can learn this pattern.
This looks like a fairly simple pattern, but it’s not as simple as you think it is. Why did it stop working again? Excuse me.
In this case, the customer said, “At midnight, I want you to predict the amount of energy that’s gonna be used every hour for the next twenty-four hours.” That’s their problem, and we can do that using these kinds of models. That little red line marks the question: can we predict that?
And in the next slide here, I’m gonna show you: the red is our prediction, and the blue is the actuals.
You feed some streams of data into Grok. Grok doesn’t know anything at all about what this information represents. It doesn’t know if it’s energy or, you know, grams of alcohol.
It doesn’t really care. It’s just a number, and it looks and tries to find these temporal patterns and says, “Oh, I can see these temporal patterns.” And, and it looks like it’s doing a pretty good job.
It actually is in this case, although you can’t tell too well just by looking at these graphs, but trust me, the customer is very happy with this, and it is doing something very significant. And here’s a situation where Grok started to make a mistake. It was three days into the week, and you can see right down here, it starts predicting the next day.
Well, it turns out this was a European holiday, which it hadn’t seen before. And it quickly says, “Oops, that’s not the pattern. Here’s another pattern. It’s supposed to be like this,” and it quickly recovers. There’s another example in demand forecasting. This one looks a little bit harder, so I’m just gonna give you a flavor for what it can start looking like.
There’s a company we work with that’s trying to predict how much demand there will be for their service. They have a service which encodes videos on the web. A customer sends a video, and it has to be encoded in many different formats so you can look at it on your phone and other devices, and they want immediate response. As soon as a customer starts sending the video, they want the encoded one to appear on the web.
And so this company has to leave lots of computers running all the time doing nothing, because they wanna make sure they can catch a spike in demand. And this graph shows you that the demand is actually very spiky. It goes up and down all over the place.
It’s very hard to see what the patterns in there are, and yet we can run something like this through Grok, and Grok will say, “You know, maybe there’s a pattern in the afternoon, maybe when schools get out.” Who knows what? It doesn’t really matter.
Grok says, “If there are patterns in here, I’m gonna try to find them.” It can’t do a perfect job, but in this case, it did a good enough job that they can save about 15% of their cost, which is very significant for them. So this is the kind of thing we’re applying it to.
Now, here I’m gonna show you something a little bit more technical, and I hope you can follow this. Just pay attention to the thing on the right. I told you that our models have these two thousand bits, these two thousand columns of cells, if you will.
There are two thousand little circles on that drawing, and these are representing the internal activations of our cortical model. These are like the two thousand columns in the cortical model that we’re running. We’re just looking down on top of them, if you will.
And a green dot means, and if you count them, there should be forty green dots there. Don’t bother. If you count them, there’d be forty green dots there.
These are the– These are both predicted and what actually happened. So Grok was saying, “Okay, in the next representation,” I’m expecting these forty attributes to be active, and it turned out those forty attributes were active.
It was a perfect prediction. That happens a lot, but not always. Here’s another one where the little blue circles are things that were predicted but didn’t happen, yet everything that did happen was predicted.
So this is, again, our multiple predictions going on. Grok says, “Okay, I could see three or four different things occurring now,” and one of them actually did occur. That’s good.
We like that. So that’s what the little blue circles represent: things that were predicted but didn’t happen.
And then finally, here’s a situation you probably have trouble seeing. There are a bunch of little red circles here; if you can’t see them, trust me, they’re there.
So we have red circles, green circles, and little blue circles. What’s going on here is that Grok made a prediction, and some of those attributes turned out to be true.
Those are the green dots. Some of the things that were predicted didn’t happen, which is not a problem, but some attributes occurred in the input stream which weren’t predicted, and those are the red circles.
And the point of the slide is that when you make an error in prediction, it’s not a binary thing. It’s not all or nothing. It’s a very nuanced thing.
Some things are right and some things are wrong. And if you could actually go and probe into this, you could tell what semantically was correct and what semantically was incorrect. It’s a nuanced thing, what an anomaly or a prediction is.
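The green, blue, and red dots amount to a set comparison between the columns Grok predicted and the columns that actually became active. As a rough sketch (this is not Numenta’s code, and the function names and the anomaly score formula are my own illustrative choices), it might look like:

```python
# Sketch: comparing a predicted set of active columns against what
# actually became active, as in the green/blue/red dots on the slide.

def compare_prediction(predicted, actual):
    """Return (hits, misses, surprises) for two sets of column indices."""
    hits = predicted & actual        # green: predicted and occurred
    misses = predicted - actual      # blue: predicted but didn't occur
    surprises = actual - predicted   # red: occurred but wasn't predicted
    return hits, misses, surprises

def anomaly_score(predicted, actual):
    """One plausible score: fraction of active columns NOT predicted."""
    if not actual:
        return 0.0
    return len(actual - predicted) / len(actual)

predicted = {3, 17, 42, 99}
actual = {3, 17, 42, 7}
hits, misses, surprises = compare_prediction(predicted, actual)
print(sorted(hits))                      # [3, 17, 42]
print(anomaly_score(predicted, actual))  # 0.25
```

This is what makes the error “nuanced” rather than binary: the score is a graded fraction, and the three sets say *which* attributes were semantically right or wrong.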
So this is the kind of stuff we do internally, and you can use it in various ways.
And here is my last example: a windmill. This is one of these huge offshore windmills. Have you seen these offshore wind farms?
It’s amazing. These things are monstrous. In the North Sea they have quite a few, out there in the sea, running twenty-four hours a day, and they’re very, very expensive, and if they fail, it’s very expensive to replace the parts.
Like, you know, the gearbox costs seven hundred and fifty thousand dollars, and it probably costs a hundred thousand dollars to replace it. So if they can detect anomalies before failures occur, it’s worth a lot of energy and money. This blue line is the temperature of the oil in a gearbox in a windmill in the North Sea.
Okay? I forget exactly when this is. Maybe it even says it. Yeah, two thousand and eleven; this is last fall. And you can see that the temperature is going up and down rapidly all the time as the wind changes.
It’s a very complex pattern. And on the bottom, in red, you see an aggregated anomaly score from Grok. Grok is saying, “Look, I’m trying to predict what’s going on here.
I can’t predict all of it, but I’m looking for patterns that I haven’t seen before.” And you can see the peak down here; there are actually two peaks in the anomaly score. And what’s happening here is that there’s nothing wrong.
If you looked at the temperature of the gearbox, there’s nothing wrong with the temperature. It’s in range, but the pattern is wrong. It’s like I’m listening to a melody and the notes are in the wrong order.
And so Grok can say, “You know what? There’s nothing out of range here, but it’s not like anything I’ve seen before.
I think you ought to look at this.” And it turned out that indeed, very shortly thereafter, there was a maintenance event.
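The “aggregated” anomaly score on that bottom trace can be thought of as a smoothing of the raw per-reading scores, so that a sustained run of surprising patterns produces a peak while a single odd reading does not. The talk doesn’t say how Grok actually aggregates, so this moving-average version is purely an illustrative assumption:

```python
# Sketch (assumed aggregation, not Grok's actual method): smooth raw
# per-reading anomaly scores with a trailing moving average.
from collections import deque

def aggregate_anomaly(scores, window=3):
    """Trailing moving average of a stream of anomaly scores."""
    recent = deque(maxlen=window)
    out = []
    for s in scores:
        recent.append(s)
        out.append(sum(recent) / len(recent))
    return out

raw = [0.1, 0.0, 0.1, 0.9, 0.8, 0.9, 0.1, 0.0]
smooth = aggregate_anomaly(raw, window=3)
# The sustained run of high scores (indices 3-5) produces the peak of the
# smoothed trace; isolated high readings get damped.
```

The point of any such aggregation is the same as in the windmill story: flag when the *pattern* has been wrong for a while, not when one value steps out of range.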
So this is an encouraging sign. And this is worth a lot of money, and saves a lot of energy in the world. So that’s the end of my discussion on Grok.
So now I’m gonna move to the end of my talk, which is about the future of machine intelligence and why we should do this. Where is this all gonna go? Can we build really great machines?
Well, it turns out yes, we can. I am absolutely certain we can build amazingly intelligent machines. But the questions are: what are they gonna be like, and what are they gonna do?
Is this good or is this bad? How are they gonna be amazing?
What’s unusual about them, and so on? So let me just walk through some of these things. There are two basic views on this.
One view is that this is bad. You know, I’m not a science fiction fan, but I know what these are now.
Skynet is the bad machine intelligence that takes over the world. There’s The Matrix, where we’re all plugged in and don’t even know we’re being consumed for food or something like that.
There’s the first Terminator, the bad Terminator robot guy. And then there’s the benign view.
Wouldn’t we all wanna have C-3POs around our house helping us do things? Or maybe we’ll all be playing games with Watson-like things in the future. Or maybe we’ll get our entertainment by donning one of these hats, sitting back, and going, “Wow, that’s great.”
Who knows? Then there’s the ambiguous thing in the middle: we thought it was good and it turned bad.
So let me give you my opinion about these things. I’m gonna talk about some things that, in my opinion, are definitely going to happen. Number one, we can make artificial brains that are faster than biological brains.
Biological brains are really slow, actually. Neurons can’t do anything faster than about five milliseconds. That’s the floor.
But we can do things in silicon a million times faster than that. So if we build machines on those principles, we can in principle make brains that are millions of times faster than humans. I’m sure this is gonna happen.
And that introduces all kinds of interesting ideas about how you would take advantage of it. Certainly, you could get to conclusions quicker, but more importantly, can I deal with very high-velocity data streams? Can I have things working non-stop, like a thousand physicists working twenty-four hours a day on some problem that would otherwise take a thousand years, and they figure it out, type of thing?
I think that’s gonna happen. Second, there’s no reason why we can’t make artificial brains and intelligent machines that are bigger. Not physically bigger, but bigger in their memory capacity: bigger hierarchies, more regions, more size, and so on.
Why not? Our brains are constrained by the birth canal. We’re pushing the limit; humans naturally have a high death rate in childbirth because of this problem.
Nature might want to build bigger brains, but they wouldn’t come out. We don’t have that problem. We can make them as big as we want, and they will have more knowledge and deeper insights in ways we don’t know yet, but I’m certain this is going to happen.
Another area which is very, very interesting: we don’t have to stick with the kinds of sensors that humans or other animals have. First of all, there’s a diversity of sensors in the biological world, and we just have one set.
But you can imagine sensors that are huge arrays covering the entire planet. You can imagine sensors that measure things microscopically: nanosensors that are looking at protein folding and things like that.
And then these artificial brains, these machine intelligences, will be able to understand worlds that we have trouble understanding. Everything a human has to understand, we have to put into something that runs at our speed and through our senses. We have to come up with visualization tools.
So if I’m trying to understand protein folding, I gotta have visual graphics and so on. It’s not a very good fit. But if I had sensors that really could live in that world, the machine would think in that world.
We could have 3D sensors. Right now, our sensors are all two-dimensional sheets, but the theory says there’s no reason you can’t have higher-dimensional sensory arrays.
So that’s really cool. I think a lot of stuff’s gonna happen. You know, fluid robotics.
Today, we have no fluid robotics. If you look at the robots of today, they’re so clunky and slow and difficult that we’re not even close, but we will have fluid robotics based on these principles.
Again, I’m not saying they’re gonna look like a human necessarily, but they’ll be able to do things very carefully, and I’m sure we can create intelligent machines that’ll explore the universe for us. Another interesting idea is the whole idea of a distributed hierarchy. Now, I told you we have this hierarchy in our heads, in the neocortex, and that’s how the whole thing works.
But what if I could distribute it? What if I could have parts of the hierarchy all around the world, combining together, so that I can model very, very large systems? These are ideas, and who knows which of them are gonna take place?
We don’t really know where this is going to go. The history of technology is that it goes in ways we just don’t anticipate, and often to our tremendous delight and benefit. Who could have anticipated where computers were gonna go, where they are today, with GPS and cell phones in your pocket?
Fifty years ago, no one could have imagined that. Who could have predicted our communications abilities, from the telegraph to what we can do today? Unbelievable.
And these are gonna go in those directions as well. Now, here are some things that might happen. I don’t really know; I’m kind of on the fence about them.
One is humanoid robots. Are we really gonna create C-3POs?
I’m not so certain. If you really, really wanted to, I suppose you could, but I’m not sure we really want to. This is not about science fiction.
This is about building tools for us as humans, to make our lives better and to discover things. So will this happen? I don’t know.
If you wanted to do that, you’d have to build a lot more things that I didn’t talk about today. You’d probably have to have emotions, you’d have to have bodies, and so on.
So maybe that’ll happen, maybe not. Who knows?
I’m also not so certain about computer-brain interfaces. Now, there’s some really cool work being done in this area, some of it right here at Berkeley, where, for people who have damaged their nervous system, we can put patches on their brains, and they can learn to control things.
We already have artificial cochleas that work very well. So the idea that we might create an interface between our brain and the rest of the world is certainly gonna happen at some level. But I’m asking the question: are we gonna plug ourselves in in the evening and be totally wired?
I don’t think so, but I’m not gonna discount it completely. Who knows? It’s very difficult to tell these things.
Now, here are some things that I don’t think are gonna happen. I’m very, very doubtful about these.
This one is popular these days. Ray Kurzweil talks about it: you’re going to upload your brain. So here I am, Jeff Hawkins, standing here.
I’m gonna take all my brain connections and stick them in some artificial machine over here, and I will transfer myself from here to there, and I will have immortality, or superpowers, or something like this. I believe this is a fantasy. Theoretically it may be possible, but if you know anything about brains and how they really work, I don’t think it is.
I don’t think we’ll ever have the technology to do that. And it would also, I think, be a very unsatisfactory experience. Imagine I was standing here, and we were able to transfer my connections into my little machine doppelganger over here, and they flip the switch, lights flash, and bingo, it happens.
Now, this guy over here says, “Wow, I’m Jeff Hawkins. I’m over here.” But I’m still here.
I say, “No, I’m still here. I didn’t go anywhere.”
“But we can get rid of you, the biological Jeff, because we don’t need you anymore.” And I’m like, “Whoa, don’t do that.”
I think this is kind of weird stuff. So I don’t think it’s gonna happen. I also don’t think we’re gonna have evil robots.
This is another science fiction fantasy. Every time a new technology comes along, people imagine how it’s gonna destroy the universe. It’s not gonna happen.
Someone would have to go really, really far out of their way to make it happen. I’ll tell you, the only thing that could be really, really dangerous is self-replication. That’s the dangerous thing in the world: machines that can self-replicate, or anything that can self-replicate, viruses and so on.
So as long as we don’t make our intelligent machines self-replicating, which is a totally separate field, and I’m not going to go there, then we don’t have to worry about evil robots. They’re not going to be feeling imprisoned or any of that fun stuff. Finally, I’ll be realistic.
This technology is not only going to be used for friendly purposes. My point with this double negative, or whatever it is, is that people, you know, the military, are gonna try to do things with intelligent machines. Yes, that happens. They do the same thing with cell phones and radios and so on.
So it’s not always gonna be for friendly, benign things, but I don’t think it’s a threat to humanity in any sense of the word. So I’m not really worried about this. These bad things, I don’t think they’re gonna happen, and I think history is on the side of that.
So we’ll just get rid of those guys, like that. Okay. Now, finally, this is my last slide: why should we do this?
Why should you care? I actually think it’s essential. I think it’s essential for the survival of our species, and for our mission as a species, if you will, the purpose of life and so on.
The number one thing here is that we can make our world better. Every new technology can make our world better, and having machines that help us be more efficient, make the world safer, produce better communications, and essentially accentuate our lives will do that, just as computers have made our lives better. In general, computers are a big plus over what came before.
Oh, there are some downsides, but mostly it’s a big plus, and we’re glad we have them. The same thing will happen here. We’ll become very reliant upon these artificial machines.
Again, not robots, not evil things. Just machines that work on the principles in your cortex, helping us figure out how to run the world and so on. But more importantly, think about what the purpose in life is.
I don’t know; maybe there is no purpose in life. But there is something I know. I know that we, as a species, have been discovering what the universe is.
Our path has been to discover more and more about the universe. And perhaps, as we discover more and more, we will discover the why or the how of our origins. We might come to closure about that.
We may not. We don’t know. But I don’t know anything else that’s worth doing.
And I think this is what motivates many people of science, and pretty much motivates everyone to some extent. We want to know. We want to know more about things.
And today, our brains are how we figure out more. We have scientists who do this; it’s what they do. They look at patterns in the world.
They look at data in the world and say, “Can I discover the structure and patterns in that data? I build a model of it. If it’s testable, I can test it,” and we build knowledge on top of knowledge.
Well, that’s what brains do, right? That’s what science is. And by having very, very intelligent machines, in ways I can’t even imagine yet, I’m sure that we’ll be able to discover more about the universe. Now, I doubt that humans are ever gonna go explore deep space, spending years and years traveling around the universe.
Maybe it’ll happen, but I doubt it. But can we send intelligent machines in our place, to go out and discover the universe and come back and tell us what it’s about?
Yeah, we can do that. And can we have those thousands of physicist brains working around the clock? Yeah, we can do that.
So I think this is gonna be a way of accelerating the accretion, the assimilation, of knowledge in the world that’s gonna be unprecedented. The human brain has been unprecedented in the long history of biology in its ability to do this, but I think we can accelerate it dramatically. So I find this all very exciting.
You know, this to me is like, “Oh, this is the future. This is so cool.” I won’t live to see most of it, but I think it’s worth working on.
And I think it’s a very exciting future. That’s why I come and do this, and why I come and speak to people like you. So that’s it.
Thank you very much.
(applause and cheering)
So we’re gonna take some Q&A now. If you wanna leave, fine. I know there’s a presidential debate tonight, so maybe some of you wanna go home and watch that.
Go ahead. My wife’s leaving. But I’m happy to take questions from anybody in the audience.
You’re supposed to come up to this microphone and speak into it. Any volunteers? Yes, brave soul.
[01:06:44] AUDIENCE MEMBER:
Thank you. So in some of your slides, you were talking about the action, motor part of the brain, but in the model you presented, there are no actions taken by the model.
[01:06:56] JEFF HAWKINS:
So— Um, I didn’t hear the question completely.
[01:06:58] AUDIENCE MEMBER:
So in the model that you presented, this Grok model, there were no actions that were changing the environment, to be able to sense something different.
So you cannot change what you are perceiving. The prediction is going to be very hard because you are just a very passive system.
[01:07:15] JEFF HAWKINS:
Let me repeat the question in case not everyone heard it. The gentleman pointed out that in the model I presented, there was no sensorimotor integration going on. There was no action in the model itself.
And if you remember the attributes I talked about, sensorimotor integration was one of them, and I didn’t put a green dot next to that one. So that’s an area we’re working on, or I’m working on right now, which is very, very interesting and enticing: to understand how that same model does it. Let me tell you what we know about the brain.
Everywhere you look in the neocortex, you’ll see those layers of cells forming this sort of sequence memory. And everywhere you look, there are cells that project to some motor section of the brain or of the body. So everywhere you go, there’s a motor output.
So we know these are tied together, and we have a tremendous number of interesting clues from the neuroscience literature. How would I understand that? What’s going on there? What is the theoretical principle underlying how an action affects the input, and so on?
I don’t have the answer to that yet, but I’m hot on the trail. I’m excited about the progress I’ve made on it, and maybe in a year or two I’ll come back here and tell you about it. In the meantime, the question we had at Numenta was: can we build something useful without it?
Because if we couldn’t build something useful, Numenta would still be a research company, and we’d be working on it. And it turns out we can. The problems I talked about are simple enough that we can do them without the sensorimotor loop, in some sense.
You know, I had those six attributes, and only three of them were actually in that model. So there’s a lot missing, but the question was, can we get started?
So, good observation. It’s not there, but it looks like we can get started and go forward, and we have a lot more to do. Good question.
[01:09:06] AUDIENCE MEMBER:
Yeah. Um, I have a bunch, so maybe-
[01:09:10] JEFF HAWKINS:
You brought your book of questions.
[01:09:11] AUDIENCE MEMBER:
Yeah, yeah, sorry.
(laughter)
[01:09:12] JEFF HAWKINS:
I, I think, you know, why don’t we start with your most burning question and see-
[01:09:15] AUDIENCE MEMBER:
Yeah, or, uh, maybe I could ask you a couple, you just answer whatever one you, you feel like.
[01:09:19] JEFF HAWKINS:
Well, okay. But in fairness to other people, we wanna make sure we don’t take too many. So start with one or two. Let’s start with one.
[01:09:24] AUDIENCE MEMBER:
Fair enough. Um-
[01:09:25] JEFF HAWKINS:
We’ll, we’ll judge the quality of your question to see whether you get a chance for a second one.
(laughter)
(laughter)
[01:09:31] AUDIENCE MEMBER:
Okay. Um, yeah, the first one was just, uh, I noticed you had, what was it? Two K columns, sixty K cells. Yes. Uh, that’s like thirty cells a column. I know the brain has, like, seven in the cortex. So-
[01:09:43] JEFF HAWKINS:
All right. Uh, you’re saying, does that match up to what the biology says?
[01:09:47] AUDIENCE MEMBER:
Yeah.
[01:09:47] JEFF HAWKINS:
All right. It’s a good question, although your facts are a little bit off there. The neocortex, this is…
I know everyone gets confused by this. The neocortex has essentially five layers of cells. Okay?
Now, when I say five layers of cells, that doesn’t mean five cells. It’s five layers of really dense cellular material, and the columns span across all of those layers.
People say six layers, but one layer is acellular; it doesn’t have cells. So we have five layers of cells, and the columns go across all of them.
And I was modeling just one of those layers. Now, if I looked at an entire column across the full depth of a real neocortex, there are, depending where you look, but in many places, about a hundred to a hundred and ten cells in a minicolumn. We’re not talking about the ice cube column, but a minicolumn in the neocortex.
So if I say I’m modeling one layer of cells, like layer three, which is one of the prominent layers in the neocortex, and I say there are thirty cells in a minicolumn, that’s a realistic number. That’s right in the ballpark. Out of the hundred and ten, maybe thirty might be allocated there.
The model is not particular about this. I can have twenty cells. I can have ten cells.
I could have fifty cells. It all works. It’s just varying capacity.
And then, having two thousand columns, this is a very, very small part of the neocortex. Two thousand minicolumns, and these columns are only thirty microns wide.
So two thousand of them is a little teeny section of cortex, one layer; we’re just modeling the tiniest little part of it. But those numbers are realistic. Yesterday I had a slide where I talked about the other numbers: a hundred and twenty-eight dendritic segments per cell, right in the ballpark of reality.
Each cell has around five thousand synapses, right in the ballpark of reality. In fact, if you know anything about neuroscience, a typical cell in the neocortex has several thousand synapses, and there are no other theories as to what all those synapses are doing.
This is a very concrete theory that explains all that. Anyway, my point is, if you delve into it a little bit, the numbers match up really well, actually.
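The numbers quoted in this exchange hang together as simple arithmetic. A quick back-of-envelope check (illustrative only, not Numenta’s actual configuration file):

```python
# Back-of-envelope check of the model sizes discussed above.
columns = 2000           # minicolumns in the running model ("two K columns")
cells_per_column = 30    # ballpark for one layer of a ~110-cell minicolumn
segments_per_cell = 128  # dendritic segments per cell, per the talk
active_columns = 40      # green dots: columns active per representation

total_cells = columns * cells_per_column
sparsity = active_columns / columns

print(total_cells)                       # 60000, the "sixty K cells"
print(sparsity)                          # 0.02, i.e. 2% of columns active
print(total_cells * segments_per_cell)   # 7680000 segments in the model
```

Even this tiny patch, two thousand thirty-micron minicolumns of one layer, already implies millions of dendritic segments, which is why the ~5,000 synapses per cell matter to the theory.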
[01:11:42] AUDIENCE MEMBER:
Okay. Thanks.
[01:11:42] JEFF HAWKINS:
That was a good enough question. You wanna ask one more?
[01:11:44] AUDIENCE MEMBER:
Yeah, please.
[01:11:45] JEFF HAWKINS:
Okay, one more. You got it.
[01:11:46] AUDIENCE MEMBER:
Um, I remember yesterday you were talking a little bit about how, what’s it called, robust it is-
[01:11:54] JEFF HAWKINS:
to
[01:11:54] AUDIENCE MEMBER:
degradation and stuff. So I was just wondering, it made me think of: if I have multiple Grok instances, if you will, existing in the same environment-
[01:12:01] JEFF HAWKINS:
Yes.
[01:12:01] AUDIENCE MEMBER:
Um, and you were to, uh, like–
[01:12:04] JEFF HAWKINS:
Yeah.
[01:12:04] AUDIENCE MEMBER:
like if they could make predictions for the same problem, or be connected in some way, would certain instances take over certain functions, and others sort of distribute their functionality?
[01:12:16] JEFF HAWKINS:
That’s a tricky question, and it’s probably deeper than most of the audience cares about, but I’ll answer it very quickly. We actually do run multiple instances.
Grok is the name of the whole thing, but we can run multiple instances of these algorithms at the same time. And the reason we do that is because there are some parts of this system which are not really learned. Just like in your brain: your retina is genetically determined, right?
How it works is genetically determined. And other animals have different retinas; they may not be as good. A dog or a cat doesn’t see quite as well as you do in many situations, and has a different type of retina.
There are things like that in our system, like learning rates and how we encode the data, which are sort of a biological, evolutionary kind of thing. We don’t really know the best way of encoding the data, and we don’t really know the best learning rates for the system.
So we actually run multiple of these models at the same time, and they all end up working to some extent, but some are better than others. It’s not quite what you said; they don’t end up taking over different functions, but they kind of view the world slightly differently. It’s like how my wife and I view the world slightly differently.
She sees colors better than I do. I don’t know what the hell she’s talking about sometimes, but she says, “Oh, look at that color on the wall.”
I’m like, “What?” Now, she’s an artist, so she’s sensitive to those things. The versions of Grok are like that too.
So we’re constantly running a population of these guys, and some are better than others, and we kill the ones that don’t do very well and produce new ones. Those are both good questions, but I wanna let someone else ask one.
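The run-a-population, kill-the-weak, spawn-replacements scheme described here can be sketched as a tiny evolutionary loop. Everything concrete below (the parameter names, the mutation ranges, the scoring function) is invented for illustration; the talk only says the unlearned parts include learning rates and data encodings:

```python
# Sketch of maintaining a population of model configurations, keeping the
# best performers and refilling with mutated copies of them.
import random

def mutate(params):
    """Perturb a model's fixed (unlearned) choices; names are hypothetical."""
    return {
        "learning_rate": params["learning_rate"] * random.uniform(0.8, 1.25),
        "encoder_bits": max(8, params["encoder_bits"] + random.choice([-2, 0, 2])),
    }

def evolve(population, score, survivors=2):
    """Rank configs by score, keep the best, refill with mutants of them."""
    ranked = sorted(population, key=score, reverse=True)
    kept = ranked[:survivors]
    children = [mutate(random.choice(kept))
                for _ in range(len(population) - survivors)]
    return kept + children

# Six configs with different learning rates; pretend lower rates predict better.
population = [{"learning_rate": 0.1 * (i + 1), "encoder_bits": 21}
              for i in range(6)]
population = evolve(population, score=lambda p: -p["learning_rate"])
```

Each generation, all instances see the same data stream; the ones whose predictions score worst are retired, which is the “kill the ones that don’t do very well” step.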
[01:13:44] AUDIENCE MEMBER:
Thank you.
[01:13:45] JEFF HAWKINS:
Sure.
[01:13:47] AUDIENCE MEMBER:
Hi. Um, how does the performance of your model compare to other machine learning algorithms?
[01:13:53] JEFF HAWKINS:
So the question is, how does the performance of this model compare to other machine learning algorithms? The machine learning field is very, very large, so let me answer that question in a couple of ways.
First of all, you have to ask yourself: what other machine learning models are really good at time-based data? Very, very few.
I can count them on two fingers. Then the next question: how many of them do continuous, online learning? Not batch learning, but continuous learning. Very, very few.
What we claim we’re doing here is solving problems that other machine learning approaches don’t solve. It’s not like, “Oh, I can compare the performance of yours with the performance of this.”
It’s usually not possible. Now, however, I’ll say that when we go into a customer, they typically have some solution. It may be sophisticated, it may be crude, but they have something, and they’re not very happy with it.
And so we don’t go in claiming, “Oh, our algorithms are better than all the other algorithms.” We say, “Just give us your problem and see if we can improve upon it,” because usually there aren’t very good solutions for these types of problems.
If we can do much better, they’re very, very happy. That’s really the claim here.
You don’t wanna get into some sort of “we do better at this by one or two percent” comparison. No, it’s not about that. It’s about streaming data, it’s about continuous learning, it’s about making predictions, and it’s about automated model creation, and there just aren’t other solutions like that.
[01:15:30] STAFF:
Next.
[01:15:30] JEFF HAWKINS:
Yeah.
[01:15:33] AUDIENCE MEMBER:
Hi.
[01:15:33] JEFF HAWKINS:
Hi. Hi.
[01:15:34] AUDIENCE MEMBER:
I had three questions. Um, for-
[01:15:36] JEFF HAWKINS:
How many?
[01:15:37] AUDIENCE MEMBER:
What was that?
[01:15:38] JEFF HAWKINS:
How many questions?
[01:15:39] AUDIENCE MEMBER:
Three. Um-
[01:15:39] JEFF HAWKINS:
Three? Oh, you’re gonna go for more than the last guy. Okay. Let’s see.
[01:15:42] AUDIENCE MEMBER:
Okay. Uh, well, yeah. The first question I was asking is, um, what’s the equivalent of RAM in the brain?
[01:15:47] JEFF HAWKINS:
What’s the equivalent of RAM? Like random access memory? Well, did I say there was an equivalent of RAM in the brain?
[01:15:55] AUDIENCE MEMBER:
Okay.
[01:15:55] JEFF HAWKINS:
Um, all right. I don’t even know how to answer that question, but I’ll try.
The memory we use in a computer, RAM, is one type of memory, okay? It’s random access, meaning it’s just a big list of things.
You put in an address, you get the result back, right? That’s a type of memory. It is structurally completely different from the type of memory I just talked about.
There’s nothing equivalent between them whatsoever. This is a hierarchical temporal memory system, and RAM is a flat, linear, random access memory.
So there’s no equivalent to RAM in the brain. Now, I could answer this slightly differently. If you think of RAM in a computer, typically what’s in it is temporary memory, the state of the system, right?
The hard drive is where I’m storing data, and the RAM is the current state of the system. So if the RAM dies, you lose the state of the system. If that’s the question you were asking, and it looks from your face like it wasn’t,
I’m gonna answer it anyway.
[01:16:48] AUDIENCE MEMBER:
I’m still learning about this field, so.
[01:16:51] JEFF HAWKINS:
Okay. Well, let’s put it this way. When our models are running, there’s an instantaneous state, the current activation, and that’s kind of like what’s kept in RAM in a computer. But anyway, there’s my answer.
[01:17:03] AUDIENCE MEMBER:
It’s not equiva– I mean, you’re saying there’s no real equivalent?
[01:17:06] JEFF HAWKINS:
There’s no real equivalent. Oh, I could’ve just answered it that way.
[01:17:08] AUDIENCE MEMBER:
Right. Okay.
[01:17:08] JEFF HAWKINS:
No, there’s no equivalent.
[01:17:09] AUDIENCE MEMBER:
Okay. Um, yeah. My second question: I was wondering, is the computing model your company’s developing the only one of its kind? I mean, I’m assuming you’re cutting edge, so what I was wondering is, what models are your competitors using?
Like I said, I’m also kind of studying the companies, the industry.
[01:17:29] JEFF HAWKINS:
I don’t want to get too much into the business side of things, because that’s not what this talk is about. But, you know, everyone likes to say this: I don’t know if we have real competitors. Again, when you go in, the attributes of what we’re offering are not available in other systems.
Just write this down, if you’ve got a piece of paper: continuous learning, temporal data, automatic model creation. Okay?
And go out and find companies that do that. Okay.
That’s two of your questions. Sorry. I’m gonna let the next guy go.
[01:17:58] AUDIENCE MEMBER:
Uh, yeah. Um, two questions-
[01:17:59] JEFF HAWKINS:
No, the next guy. Sorry. All right, I’ll talk to you later. I’ll be here. Okay. I’ll talk to you.
[01:18:06] AUDIENCE MEMBER:
So, uh, could you tell us more about the encoding
[01:18:09] JEFF HAWKINS:
Yeah, yeah.
[01:18:09] AUDIENCE MEMBER:
inputs into Grok and the principles behind it?
[01:18:10] JEFF HAWKINS:
Yeah. Okay. So this is a really good question.
A little techy. I didn’t talk about it in the talk, but I’ll go through it. So the question is: I said, magically, here are these numbers and fields and so on, and I turn them into sparse distributed representations, right?
How do you do that? Let me give you the one example that’s the simplest to understand, and then you can ponder it. So imagine I have a number.
Energy, price, temperature, right? I’ve got a number. It’s a scalar, a floating point number.
It’s on a number line. And I want to turn that into a sparse distributed representation. So imagine I have this number line in front of me now, and I define a whole bunch of bits, two hundred bits.
The first bit represents zero to ten, the next bit represents one to eleven, the next bit represents two to twelve, and so on. Now, when I have a number on the number line, let’s say it’s twenty-two, I can go see which of those bits overlap with that number. And those are the bits that are going to be one.
Now, this idea came from the cochlea, so we didn’t make this up completely. This is a little bit like how the cochlea works in your ear.
So if I give you some number, some number of bits become on, because their bands overlap with that number. And if the number moves up a little bit, one of those bits will turn off and another bit will turn on. Are you following this?
Can you visualize this? So we have a sparse distributed representation. Each bit has semantic meaning.
I have a number of bits on, it’s sparse, and similar values have similar representations. And that’s how we do it. And it works pretty well—
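The scalar encoder described here can be sketched in a few lines of Python. This is a minimal illustration, not Numenta’s actual encoder API: the function name, parameter names, and the half-open band convention are choices made for this example.

```python
# A minimal sketch of the overlapping-bands scalar encoder described
# above. Names and the half-open band convention are illustrative,
# not Numenta's real API.

def encode_scalar(value, n_bits=200, band_width=10):
    """Encode a scalar as a sparse bit vector.

    Bit i covers the half-open band [i, i + band_width); every bit
    whose band contains the value turns on. An interior value thus
    activates exactly band_width bits, and nearby values share most
    of their active bits.
    """
    return [1 if i <= value < i + band_width else 0 for i in range(n_bits)]


# Value 22 turns on bits 13 through 22; moving to 23 shifts the
# window by one bit, so the two codes overlap in 9 of 10 bits.
sdr_22 = encode_scalar(22)
sdr_23 = encode_scalar(23)
```

This gives exactly the properties Hawkins lists: each bit has a fixed semantic meaning (its band), the code is sparse, and similar values produce similar representations.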
[01:19:42] AUDIENCE MEMBER:
when it comes to semantic categories?
[01:19:45] JEFF HAWKINS:
Ah, so the way the semantic categories work, you have to understand that everywhere in the system, we’re constantly…
It’s learning these semantic meanings, right? You don’t… You start…
The way it works in the brain, and the way it works in our systems, is you start with very low-level semantic meanings, like little bands of frequencies, or little bands of numbers, or little patches of the visual field, lines and things like this. And you have to build the more sophisticated representations as you go. So the very first thing that happens in our system is I take all these inputs from a bunch of fields, these sparse representations, and I run them through something I didn’t talk about today.
Basically, I have to form a new sparse representation that’s two thousand bits. My input may not be two thousand bits; it may be more or less. So I run it through this thing, which learns new representations based on the common spatial patterns in the world.
It basically says: if I see coincidences that occur, spatial coincidences, I form representations of them, and then I learn sequences of those. It’s a pretty technical thing, so I’ll just leave it at that. Okay?
Yeah.
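The pooling step described above (in HTM terms, a spatial pooler) can be sketched roughly as follows, with all learning omitted. This is a simplified stand-in, not Numenta’s implementation: each output column samples a random fixed subset of the input, and the columns with the most active sampled inputs become the fixed-size sparse output. The sizes (2000 columns, 40 active) echo the numbers in the talk; everything else is an assumption for illustration.

```python
import random

def spatial_pool(input_bits, n_columns=2000, n_active=40, seed=0):
    """Map an input bit vector of any width to a fixed-size sparse code.

    Each output column connects to a random subset of input positions;
    the n_active columns whose connected inputs are most active win.
    Learning (adapting connections toward recurring spatial patterns)
    is omitted from this sketch.
    """
    rng = random.Random(seed)  # fixed seed: same wiring on every call
    n_in = len(input_bits)
    # Each column samples a random half of the input positions.
    connections = [rng.sample(range(n_in), max(1, n_in // 2))
                   for _ in range(n_columns)]
    overlaps = [sum(input_bits[j] for j in conns) for conns in connections]
    winners = sorted(range(n_columns), key=lambda c: overlaps[c],
                     reverse=True)[:n_active]
    out = [0] * n_columns
    for c in winners:
        out[c] = 1
    return out


# Any input width maps to a 2000-bit code with exactly 40 active bits.
code = spatial_pool([1] * 50 + [0] * 150)
```

The design point is that the output width and sparsity are fixed regardless of input width, which is what lets the later sequence-learning stage operate on a uniform representation.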
[01:20:44] AUDIENCE MEMBER:
Hi. Hi. My question is: you said it is very hard to compare between your algorithm and other algorithms.
[01:20:52] JEFF HAWKINS:
I’m sorry, say that again. Hard to compare what? The algorithms to other algorithms?
[01:20:56] AUDIENCE MEMBER:
Yes. Yeah. Okay.
Yes. But some unsupervised feature learning algorithms, like deep belief networks: that kind of algorithm is less biologically plausible compared to yours, but it can simulate some known biological effects, like a neuron that responds maximally to a specific angle of an object. So my question is: your algorithm is a biologically plausible algorithm, but is it able to simulate some known biological facts, or…
[01:21:36] JEFF HAWKINS:
Yes. Okay. So the gentleman’s asking: there are other algorithms, like deep belief networks, that you might compare to this…
By the way, we call this the cortical learning algorithm, the CLA. So you can compare them to the CLA, and they produce some predictions, about biology or about anatomy, perhaps. Is that what you’re saying?
And then, do we have a similar type of thing here? Is that a good paraphrase of the question? Okay.
First of all, I’ll disagree with something you said. Deep belief networks are not nearly as biologically grounded as what I presented here.
Not even close. They’re orders of magnitude off. But they are hierarchical in some sense, and they can deal with time in some sense.
In that way, I like them. They’re in a good direction. But they’re quite different, and they don’t do what I described here.
They’re quite far apart, so they’re not really easily comparable. None of the customers we’re working with say, “Hey, I can do this with a deep belief network.”
They can’t. Now, the second part of your question is: what are the biological predictions that come out of this? Or can I show behaviors that we see in biology?
The answer is absolutely yes. I didn’t present any of that. We did the original work with computer vision experiments, and we were able to match a lot of physiological properties that you see in the vision system.
So the answer is absolutely yes. I’m not trying to show you that here, but it’s actually very good at that.
So I’ll leave it at that. Okay? Unfortunately, we didn’t publish that anywhere.
You can’t read about it, so sorry. You’ll have to take my word.
[01:23:06] AUDIENCE MEMBER:
Um, hi. Yeah, I had a question. Some of what you
(clears throat)
just said sort of changed my question slightly, but—
[01:23:12] JEFF HAWKINS:
Uh-oh.
[01:23:13] AUDIENCE MEMBER:
Um. New question. But I’m coming from a music background.
[01:23:16] JEFF HAWKINS:
Yes. And
[01:23:16] AUDIENCE MEMBER:
I noticed you used a number of musical metaphors just in your speaking.
[01:23:20] JEFF HAWKINS:
Yes. So
[01:23:21] AUDIENCE MEMBER:
to say that. But basically, I was thinking about… I got nervous when you waved the emotion stuff off of the list.
(laughter)
And I’ll tell you what I was wondering. Here’s the preamble to my question: what I got out of this talk so far is that you’re taking information from the brain, interpreting that data, and somehow using it for certain purposes.
What I’m wondering is: all the pictures had sensors on the brain. What about the entire body? And I’m wondering about emotions.
Like, if there are things happening in the body, could you put sensors in there and use that information? Or do you do that, or…
[01:24:08] JEFF HAWKINS:
Okay. So that’s a, that’s a kind of a confusing question, but I’ll try to parse it.
[01:24:12] AUDIENCE MEMBER:
Okay. Well, I’m, I’m happy to try to clarify.
[01:24:13] JEFF HAWKINS:
No, let me try to get it. Okay. Let me address the thing you said at the end.
What about a body? Do you have sensors in a body? Well, by the way, you do have sensors in your body.
There’s a whole sensory system called the proprioceptive system, which measures your joint angles and things like that, and it’s all fed in… It’s a whole other sense most people don’t know about. But that’s how you model where your body is, because there are all these sensors that tell you.
We don’t have bodies in our system right now. We’re doing inference on data streams. The day that I have a sensory-motor integration loop, if there were an embodiment of that, I probably would have bodies and sensors.
But what kind of body would it be? Is it going to be humanoid, or something else? I have a feeling it’d be something else.
By the way, just to speculate a little here: when I think about sensory-motor integration, most people think about robots. I don’t think about robots.
I think about how I would navigate through data. How would I navigate through a sensory world to know which part of the sensory stream to pay attention to? It may not even have physical movement, but the sensory-motor integration problem is basically dealing with this whole idea of how I interact with my world, and cause and effect, something like that.
So that was the last couple parts of your question. Did I miss an important part?
[01:25:26] AUDIENCE MEMBER:
Yeah, I mean, I guess, I, I guess the, the, I just—
[01:25:28] JEFF HAWKINS:
You want to ask about the emotions? Can I really get away without emotions? Is that it? Yeah.
[01:25:32] AUDIENCE MEMBER:
I mean, basically, could there be intelligence in the emotions? I guess that’s what I’m asking, yeah.
[01:25:36] JEFF HAWKINS:
Yes, I think you can have… You know, but as I define it—
[01:25:37] AUDIENCE MEMBER:
Yeah,
[01:25:38] JEFF HAWKINS:
if you wanna define it as human-like, then no: if you wanna be human-like, you’re gonna have to have emotions. If you wanna pass the Turing test, you’re gonna have to have emotions. But to do the things I talked about, make the world better and explore the universe, I don’t think so.
[01:25:51] AUDIENCE MEMBER:
Not necessary?
[01:25:52] JEFF HAWKINS:
Not necessary. In the end, what emotions do, from the biological point of view:
there’s a sort of switch that tells the neocortex to learn something very rapidly. So when an emotional thing happens, the amygdala will say, “Remember this,” and store it very quickly.
And that’s how it interplays with the neocortex. We can do things like that, but it’s not an essential component of what intelligence is. And by the way, I’ll just…
You mentioned the music thing; you said you’re into music. There’s a guy, Charlie Gillingham, who plays with Counting Crows; he’s one of the musicians there.
And this guy is a rock-star machine learning geek. You wouldn’t know it. But when he’s not touring, this is what he does.
And he’s talked to us and come in to visit us, because he’s actually trying to build, I don’t wanna give too much away, but he’s trying to build a product in the world of machine-generated music. He’s trying to understand how you do this, and he’s been really intrigued by the kind of algorithms we have.
So we’ll see if something comes out of that. All right. One more person with questions; this woman here is up next.
You’re the last one, so make it great. Make it last.
[01:26:58] AUDIENCE MEMBER:
Hi. Well, it’s probably the simplest question of all. It’s really similar to what he asked, but I was going to ask about emotions: do you feel the reason you disregarded them is because it’s just so hard to have emotion in an artificial intelligence, or because you felt they actually hinder intelligence?
[01:27:18] JEFF HAWKINS:
So, it’s the latter. Actually, from a neural mechanism point of view, the emotion systems in your brain are fairly small. I’m not sure we understand them.
I don’t study them exactly. I understand how they interact with the neocortex, which is fairly simple. So I have a feeling about that.
But it’s more a matter of: to what purpose do you want emotions? What’s the purpose? Why do we want to do this?
If there were a need for it, well, we’d do it. And I don’t see any reason why you can’t do it. The only reason I didn’t put it on my list is that I don’t think it’s an essential ingredient.
I don’t see situations now where you absolutely have to have emotions to do the things I talked about: stream data, build models of the world, understand the structure of the world, make predictions about the world. Those do not require emotions.
Emotions are more for things like: what was important and what wasn’t? How do I prioritize between various things? Things like that.
And I don’t think there’s anything mystical about emotions. There’s nothing like, “Oh, you can’t model that.”
You know, I mentioned this in my talk yesterday; I didn’t bring it up today, but the brain is just a bunch of neurons.
There’s nothing else. Everything is just a bunch of neurons. And there’s no other magic going on up there.
So if you can understand how those neurons work and how they play together in the kind of way I talked about, then yeah, you can model them too.
[01:28:36] AUDIENCE MEMBER:
So there, there’s nothing human that can’t be in a machine?
[01:28:39] JEFF HAWKINS:
Is there nothing human that can’t be in a machine? Again, I said I didn’t think that was a goal, and I don’t think that’s going to happen, because I don’t see a reason to do it.
But on a theoretical level, I don’t see anything in a human that couldn’t be in… And again, you say “a machine.” Don’t think of it as some mechanical answer machine.
It’s a very complex memory system, and, yeah, I don’t think there’s anything magic going on there. Absolutely not.
[01:29:07] AUDIENCE MEMBER:
Can I ask one more question?
[01:29:08] JEFF HAWKINS:
You don’t like my answer. That’s why you’re asking another question.
[01:29:12] AUDIENCE MEMBER:
Oh, no, um, it’s different actually. Okay. It’s really on the side, but kind of-
[01:29:16] JEFF HAWKINS:
We’re gonna have to wrap up soon. We’ve only got one more person, so real quick.
[01:29:18] AUDIENCE MEMBER:
Okay. Um, okay. So basically you said we have too much data, and it’s building up. Do you propose a solution for that?
[01:29:25] JEFF HAWKINS:
Yeah, in some sense. This is switching away from brains and neuroscience. Yeah, the solution is that we are not gonna save all the data in the world.
There’s just no point in saving it. Your brain doesn’t save all the sensory information.
There’s no point in saving second-by-second energy information from a billion buildings. The whole way of getting around that problem is to handle the data immediately, feed it into models, billions of models, have them act on it immediately, and get rid of it.
That’s the solution to big data. Okay.
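The ingest-score-discard pattern Hawkins describes can be illustrated with a toy streaming model. This is only a stand-in (a simple exponential moving average, nothing like Numenta’s actual algorithm); the point it shows is that each record updates the model’s state, yields a prediction-error signal, and is then thrown away, so only the model persists, never the raw stream.

```python
class StreamingModel:
    """Toy illustration of acting on streaming data without storing it.

    Each record updates a small persistent state (here, an exponential
    moving average) and produces a surprise score; the record itself is
    discarded. This is a pedagogical stand-in, not Numenta's algorithm.
    """

    def __init__(self, alpha=0.1):
        self.mean = None    # the model's only state; no history is kept
        self.alpha = alpha  # how quickly the model adapts to new data

    def ingest(self, value):
        """Consume one record, return how surprising it was, keep nothing."""
        if self.mean is None:
            self.mean = float(value)  # first record just initializes state
            return 0.0
        surprise = abs(value - self.mean)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * value
        return surprise


# Feed readings one at a time; each is scored and then discarded.
model = StreamingModel()
scores = [model.ingest(v) for v in (10, 10, 20)]
```

After two readings of 10 the model expects 10, so the jump to 20 scores a surprise of 10 and nudges the expectation to 11; the three raw readings are never retained.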
There’s another woman. Do you wanna ask a question?
[01:29:59] AUDIENCE MEMBER:
Yes.
[01:30:00] JEFF HAWKINS:
All right. Can I take one more question? Okay, I lied; I’ll let one more question in.
[01:30:03] AUDIENCE MEMBER:
Uh, I’m just curious that, like, when do you think the fluid robot will exist?
[01:30:09] JEFF HAWKINS:
When will the fluid robot really exist?
[01:30:11] AUDIENCE MEMBER:
Yeah, existing, like, yeah.
[01:30:13] JEFF HAWKINS:
Boy, uh, that’s a tough question.
[01:30:16] AUDIENCE MEMBER:
Can I see that before I die?
[01:30:17] JEFF HAWKINS:
Will you see it before you die? How old are you?
[01:30:20] AUDIENCE MEMBER:
I’m twenty-one.
[01:30:21] JEFF HAWKINS:
Thirty-one?
[01:30:22] AUDIENCE MEMBER:
I think so, yeah.
[01:30:23] JEFF HAWKINS:
Okay. I’m not sure I’ll see it before I die. Thank you very much. Good luck.
(applause and cheering)