[00:00:00] MARY COMERIO:
Good afternoon, everyone.
(background chatter)
My name is Mary Comerio. I’m a professor in the Graduate School in architecture and the chair of the Hitchcock Professorship Committee. We are pleased, along with the Graduate Council, to present Jeff Hawkins, this year’s speaker in the Charles M. and Martha Hitchcock Lecture Series.
As a condition of this bequest, we are obligated, and happy, to tell you how the endowment came to UC Berkeley. It’s a story that exemplifies the many ways this campus is linked to the history of California and the Bay Area. Dr. Charles Hitchcock, a physician for the army, came to San Francisco during the Gold Rush, where he opened a thriving private practice.
In 1885, Charles established a professorship here at Berkeley, an expression of his long-held interest in education. His daughter, Lillie Hitchcock Coit, still treasured in San Francisco for her colorful personality as well as her generosity, greatly expanded her father’s original gift to establish the professorship at UC Berkeley, making it possible for us to present a series of lectures, this very long and wonderful series which we have had for many years. The Hitchcock Fund has become one of the most cherished endowments of the University of California, recognizing the highest distinction of scholarly thought and achievement.
Thank you, Lillie and Charles. And now a few words about Jeff Hawkins. Jeff Hawkins has a multifaceted career as an inventor, engineer, neuroscientist, author, entrepreneur.
In his book ‘On Intelligence,’ he describes his life as animated by two passions: mobile computing and neuroscience. As the founder of Palm and Handspring, Hawkins was at the forefront of mobile computing and developed landmark products like the PalmPilot and Treo smartphone. His lifelong interest in neuroscience led him to UC Berkeley as a graduate student in integrative biology and to found the Redwood Neuroscience Institute, aimed at understanding how the neocortex processes information.
In 2005, Hawkins gifted the Redwood Neuroscience Institute to UC Berkeley, where it now exists as the Redwood Center for Theoretical Neuroscience. His latest company, Numenta, brings his two passions together. At Numenta, Hawkins is developing new computer technologies modeled on the workings of the neocortex.
This approach, hierarchical temporal memory, allows machines to extract patterns from complex data streams and predict what is likely to occur. The company’s latest product is Grok, a cloud-based engine that makes predictions from streaming data. Hawkins hopes that Numenta will play a catalytic role in the emerging fields of machine intelligence.
Hawkins received a BS in electrical engineering from Cornell University in 1979. He came to UC Berkeley as a graduate student in integrative biology in 1986. In 1988, he returned to the computing community, ultimately to found Palm Computing and then continued deep engagement with neuroscience through the Redwood Neuroscience Institute.
The institute was aimed at developing a theoretical framework for the thalamocortical system and fostered a vibrant intellectual atmosphere among researchers, post-docs, and students. Hawkins’ account of the neocortex, On Intelligence, was published in 2005 to critical acclaim. Hawkins was elected to the National Academy of Engineering in 2003.
Please join me in a warm welcome for Jeff Hawkins.
(applause)
[00:04:08] JEFF HAWKINS:
Thank you, Mary. Oh, let’s trade places here. So hopefully this microphone’s okay?
Not too loud? Great. Thank you, Mary, for that lovely introduction.
And it’s a real honor to be here. This is a very distinguished lecture series, and it goes back for many years, and it’s a treasure, it’s a pleasure, and I’ve been looking forward to it all year, to being here. So I’m gonna give two talks today and tomorrow, all about intelligence.
Today is intelligence in the brain, and tomorrow, intelligence in machines. Mary told a little bit of my story already, which I was going to introduce here about how I came to be on this stage. And so I won’t go through that all again, but I’ll tell you a little bit more flavor about the time I was here as a graduate student at Berkeley.
[00:04:54] JEFF HAWKINS:
This goes back to 1979, when I had just gotten my undergraduate degree, and I read the September ’79 issue of Scientific American, which was a single-topic issue on the brain. And there were stories by many, many famous neuroscientists, Eric Kandel, Hubel and Wiesel, et cetera, and the very last article was by Francis Crick of DNA fame. And Francis wrote, he said, “This is all well and good.
We have all this data about the brain, we know all this about the neuroscience and the neurons and diseases of the brain and so on.” He says, “But don’t be mistaken. We have no idea what the hell’s going on up there.”
And he says, “What we’re lacking is a theoretical framework.” And this excited me. It was like, oh my gosh, we have all this data about the brain, but we really don’t understand what’s going on, how it works.
We need a theoretical foundation for it. And I said, “This is something we should be able to do in my lifetime. It’s something I wanna work on.”
And I basically dedicated my career to that at that point in time. And my first attempt was at MIT, where I tried to become a graduate student in the AI lab, and they asked me, “Why are you here?” And I said, “Well, I wanna build intelligent machines.”
And they said, “That’s great. We do, too.” And then I said, “But I wanna study brains first to see how they work.”
And they said, “Oh, that’s stupid. Why would you do that? A brain is just a messy computer, and why would you study a messy computer when you can study good ones?”
So I didn’t get into MIT, and that’s why I snuck into Berkeley because at Berkeley I said, “Oh, I’ll just go in as a biology person. They won’t know the difference.” And so I got in and it was actually the biophysics program, but it didn’t really matter.
There was no theoretical neuroscience back then. And at that time, my first six months here, I took classes. I took Jeffery Winer’s anatomy class, which was great.
I read Kandel’s book twice, end to end, and I wrote a long thesis proposal about what I wanted to do as a graduate student. And Professor Frank Werblin was very kind; he was the chairman of the graduate group at the time.
He read it, and he had some other faculty read it, and he said, “This is really good.” In fact, it’s a proposal of what I’m gonna tell you about in my talk. And he said, “Unfortunately, you can’t do this here.”
And I go, “What do you mean? I just put all this effort into coming here as a student.” He said, “Well, no one’s doing this.
No faculty is working on models of neocortex from a theoretical point of view.” In fact, he says, “I don’t know anyone in the world doing this.” This is in 1986 and he said, “As a graduate student, you gotta work for somebody in somebody’s lab, and there’s nobody doing what you wanna do.”
So I was really kinda bummed out. What I ended up doing, after six months of being a regular student, was this: over the course of the next year, I would come to Berkeley.
[00:07:26] JEFF HAWKINS:
I kept my student status. I’d come to Berkeley once a week because I wanted to read papers, and you couldn’t do that from home. There was no internet back then, so to read papers, you had to go to a university library.
So every week, I would come to Berkeley with a list of papers I wanna read. I had to visit multiple libraries on the campus. I’d go look at these papers.
I photocopied the ones that were interesting to me. I’d come back home, read them over the course of the next week, and come back the next week with another set of papers I wanted to read, and I did this for about a year. And it was a great knowledge-gaining exercise about the history of neuroscience.
I could study whatever I wanted. But eventually I had to make a living, and I couldn’t do that forever. So that’s when I went back to work, which I thought was gonna be for four years, but it turned out to be for about 16 years because I had very fortunate success in my business life, and finally I was able to extricate myself.
And I wanna tell you one more thing. When I started the Redwood Neuroscience Institute, this was a crazy idea. How do you start a new science institute?
Who’s gonna come up? Who’s gonna show up? Who’s gonna be the first one there?
It was not even my idea. Some neuroscientist friends of mine said, “Why don’t you start this institute? That’s what’s required.”
And I said, “Well, I can’t do that.” I said, “Well, if you help me, I’ll do it.” And one of the people who helped me was Christof Koch.
Another one was Bob Knight at Berkeley, and I talked to Bob, who was head of the Helen Wills Neuroscience Institute at the time. I said, “Bob, would you help me do this?” He goes, “Sure, we need a theoretical component here.”
And so, from the very beginning, there was a close association between the Redwood Neuroscience Institute and Berkeley. Now, it’s a little bit ironic, because the Redwood Neuroscience Institute was located right next to Stanford, and they didn’t really wanna do much with us at all, but Berkeley was very welcoming. So we came up and we did some classes and we exchanged students and so on.
So from the very beginning, it was a good collaboration, and when it came time for me to start my own lab and my own business at Numenta, we had to decide what to do with the neuroscience institute. Moving it to Berkeley made a lot of sense, and Bruno Olshausen, who’s here, was the head guy helping me out at RNI, and he’s now faculty here at Berkeley. So, it’s a long story.
Here I am, 30 years into it. I’ve been working on this for a long time. So we’re gonna just jump into it.
Let’s see if this works, hopefully. It’s not working. Why is that not working?
Let’s try again. How about that? Oh, there we go.
That work? Yes. I’m not sure why it didn’t work the first time.
Okay, this is Francis Crick’s words. This is what exactly he wrote, “What is conspicuously lacking is a broad framework of ideas in which to interpret these results.” The results he was talking about were the empirical results of neuroscience.
He said, “Look, lots of data, no theory.” And this is what I’ve been working on, and I’m gonna tell you about the progress we’ve made on this so far. And let’s hope this is gonna work better.
I’ll be able to do a click. Nope, must have to do it this way. Okay, so why is this giving me a hard time?
Okay. I set out two questions I wanted to solve. One was what are the operating principles of the neocortex?
What is the theory behind it? It’s not like an equation or something like that, but there’s a set of principles. I also realized that once we had those principles, we would be able to build systems that work on those principles, machines that exhibited the same principles as the neocortex, and so those two came hand-in-hand, and they complemented each other.
The way I go about this is the following. You start with, oh, it doesn’t look like you can see that. There’s a picture of a brain there.
It’s on my screen, not on your s-, it’s not there. Well, there’s a little picture of a brain there. We start with the brain and we start with anatomy and physiology.
These are constraints on the problem. We know a tremendous amount about how the brain is organized and wired and so on. These are not things you can ignore.
If we wanna understand the principles of intelligence, the theory has to be consistent with those brain principles. And I took this very seriously, getting totally embedded in the anatomy and physiology and histology of the brain. Then once you understand the principles and you can elaborate them, you can model them, and you can first do that in software, which is what we’re doing today, and I’m gonna talk more about that tomorrow, and then ultimately, you can do it in hardware, and we have conversations with people trying to build this stuff today.
So, this is the basic premise. My today’s talk is gonna start with the anatomy and physiology and talk about the theoretical principles. Tomorrow, I’m gonna start with the theoretical principles and talk about how we build this stuff and where is it going.
So let’s start with the brain again. It’s a little bit more visible here. We’re gonna start a little bit of a history about the brain.
You know, if you go back to the beginning, when people were really trying to figure out how the organs of the body worked, people would take tissue and look at it under a microscope, and everywhere you looked, if you looked at a liver or a kidney or a muscle, you would see cells. It’s like looking at peas in a bowl; they’re all lined up under the microscope.
But when you looked at the brain, it didn’t look like that. It looked like a bowl of microscopic spaghetti, and it was not clear in the late 1800s that the brain was made of separate cells. This was a question, and many people thought it wasn’t.
They thought it was some sort of magic tubular mess of jelly or something like that. And it was one of the most famous neuroscientists of all time, Santiago Ramón y Cajal, who settled the question, and every neuroscientist knows this guy. He was like the king of neuroscientists.
He was a Spaniard who basically mapped out the entire nervous system for many species. Cajal came along with a new technique for staining cells, which he didn’t discover but was exploiting, where you could stain a cell completely and see what it looked like.
And when he did that, he started making these wonderful pictures. Here’s one of Cajal’s pictures on the right here of a, of a classic neuron in the neocortex. And what he showed was that actually the brain is made of individual cells.
Even though they branch all over the place and they look like they connect, they don’t. The cytoplasm doesn’t flow between them, and it’s just tissue, like everywhere else in the body. Now, this has become known as the neuron doctrine.
It’s the founding principle of which all neuroscience is based, is essentially that the brain is made of a bunch of cells. A corollary to that, which is not often stated, but we should just get it out of the way, is the brain is not made of anything else. There’s no magic pixie dust in there, it’s just a bunch of cells.
And if we wanna understand intelligence, we need to understand how those cells interact and what they do. There’s no room for a mind that’s separate from the brain. There’s no room for a mind that can do extrasensory communication and things like that.
So until we have proof otherwise, until we’ve eliminated every other possibility, we have to think of everything the mind does, everything we think of as our humanity, as a bunch of cells doing their thing in our head. Okay. Now already, you can see in this picture that these cells are very unusual.
They have the cell body, which is fairly small in that picture, and they have these branching arborizations called dendrites, and the connections to the cell where it receives information is all arrayed on those dendrites. They’re not even on the cell body, at least not the excitatory ones. A typical neuron has several thousand of these connections arranged along the dendrite.
This is a very complex system, with lots of inputs. There’s another part of the neuron, which I’m gonna add to the picture right now, which is the axon. This is the output of the neuron, and it’s a single fiber along which the spikes from the neuron go out to other cells.
And you can see here, the first thing is that the axon spreads locally and makes connections to lots of other cells nearby. Thousands of them typically, and those connections are very interesting. They don’t connect back to the original cell.
They don’t make many connections to any particular cell. A particular cell may get one or two of these connections, and most cells don’t get a connection at all. So it’s very densely packed in there, but we make a lotta connections locally.
Then the axon typically, as shown in this picture, continues on and goes someplace else and makes more connections. So it makes a lotta connections locally, and then it goes elsewhere to make other connections.
And this is the basic idea of what neurons look like in the brain. We have somewhere around 30 billion to 100 billion of these cells in your neocortex, and we wanna understand how they work together. We actually today have a fairly good idea of how these cells work.
I mean, what their properties are, how their ion channels work, how they grow, and how they die, and all kinds of wonderful stuff like that. So it’s fairly well understood, and we have good models of these cells. However, in the real brain, we find the cells on their own don’t do much.
You can eliminate any cell, and it doesn’t really matter. It’s the collection of cells that matter. It’s ensembles of cells that matter, cells working together.
Here’s another picture from Cajal. He made these pictures over 100 years ago, but they’re still quite good. In this picture, you see quite a few cells, yet this is still a very small percentage of the cells that you might find in a region like this.
This is about a two-millimeter-high view of the cells in a slice of the neocortex. And already you see some things popping up here. When we see these cells in groups, we see that they’re sort of grouped in layers.
Cajal labeled these with numbers on the side there. You can see the layers differ by cell type, by dendritic arborization, by where they project, and so on. So there’s this layer effect, and there’s also a vertical, sort of columnar effect you can see in this picture as well.
So our goal is to try to figure out how those cells work and make you smart. Now, I’m gonna, I didn’t say this already, but I’ll say it right now. The brain is a very complex organ.
It has many different components to it. My interest is the neocortex. The neocortex is about 60% of the volume of your brain.
It’s the big wrinkly thing on top, you know, it sort of surrounds the rest of the brain, and it’s like a sheet. It’s a thin sheet of cells, about two millimeters thick, and it’s the locus of what you might call all high-level intelligence. All high-level vision is in the neocortex, and language, whether spoken or written, and mathematics, and physics, and so on.
My neocortex is generating my speech right now, and your neocortex is understanding it. The other parts of the brain are important, and they interact with the neocortex in various ways, but my goal is really just to understand how the neocortex itself works, and we’re gonna focus our comments on that today. Okay.
So what we wanna do is figure out, you know, why do cells have all these synapses on them, which is kind of a mystery, and I’m gonna propose an explanation today, and how do they work in groups like this? What is the information processing going on in here? All right, so let’s go to that.
Here’s the agenda for my talk. I’m gonna talk about the neocortex as a predictive modeling system, and then I’m gonna talk about three attributes of the neocortex. In my work at my company, about three years ago, we had a real breakthrough in understanding these attributes, so I’m gonna focus on them today. One is the way the brain represents information, which is sparse distributed representations.
The second thing is how it learns sequences of those patterns. I’m gonna argue that sequence memory is the primary memory in your neocortex. And then finally, I’m gonna talk about online learning, how we learn continuously, how the memory is formed.
All right, so let’s just jump right into it. The neocortex as a predictive modeling system. So here you see a picture of a neocortex.
Maybe you can’t imagine that little line drawing there. And what it does is receive inputs from your senses. Now, we think of the three primary senses as vision, hearing, and touch.
There’s many more. In fact, vision itself is more than one sense, but we think of these as three primary senses. The first thing you have to know is that they’re not singular senses.
The retina is really like a million senses. There’s a million fibers in your optic nerve going from the retina to the neocortex. It’s not like a picture.
It’s actually a million separate little sensors that are arrayed topologically. The same thing with your body, or somatic senses. There’s an array of senses on your body that’s got another million fibers coming into your brain, and your cochlea produces about 30,000 again, individual nerve fibers coming into the brain.
So, you have this massive multi-million-fiber bundle of nerve activations coming into your brain, and they’re changing very rapidly. That’s what we mean by high velocity. They change on the order of milliseconds, tens of milliseconds, and hundreds of milliseconds.
It’s like a fire hose of millions of fibers flipping on and off constantly, and this is what the brain works with. This is it. Once those patterns enter the brain, they’re no longer light, sound, and touch.
Light, sound, and touch don’t exist inside the brain. It’s just patterns of activity, and the brain has to build a model of the world from those patterns, and you start this at birth. When you’re born, you have the structure of the neocortex, but it doesn’t know anything.
It doesn’t know about Berkeley, or universities, or chandeliers, or lecterns, or computers, or water cups, or toothbrushes, or cars, anything. It has to learn everything in the world. It has to learn what your environment’s like, and it has to build this model of the world.
The amount of things it learns is amazing. You’re not even aware of all the things you know. And it has to build this from this stream of data.
One thing we can say is that once it’s built this model of the world and new patterns come in, it recognizes those patterns, and it does a few things. One is it makes predictions, and I wrote extensively about this in my book ‘On Intelligence’: the brain is constantly making predictions. You’re not aware of most of ’em.
You’re constantly predicting what you’re going to see, what you’re going to feel, what you’re going to hear. If I do a simple gesture, just like putting my hand on this lectern, and I’ve never done this in this exact spot before, as my hand comes down, my brain has expectations. If my hand went too far, if it passed an inch through this lectern, it would say, “Whoa.
Something’s wrong.” If it felt like water, or cold, or metal, or Jell-O, it would all say, “Something’s wrong.” And you’re not even thinking about this, but your brain is constantly making expectations.
You’re predicting what I’m going to say. You may not be able to predict the exact words. Sometimes you can, like what is the word at the end of this sentence?
But other times, you’re just predicting attributes. But if something is wrong, you know it. And so part of prediction is also detecting anomalies in the world.
So, we know that the brain is detecting anomalies as well. And finally, everywhere you look in the neocortex, it’s generating actions. So again, as I said earlier, my neocortex is speaking right now, and it’s controlling the high level actions.
Other parts of the brain are also doing behavior, but the neocortex is doing all sorts of high-level behavior. Okay. So, this is what we wanna do.
We have this high velocity data stream building a model. Now, if I were to tell you my top three attributes explaining the principles of how the neocortex works, these are them, starting with number one: it’s a hierarchy.
Although it looks like a sheet of cells, it’s connected in a way that it’s literally like a hierarchy of regions, and this has been well known for a long period of time here. Information flows into regions at the bottom of the hierarchy. This is a caricature drawing.
This is not a picture of a real hierarchy in a brain. It goes into the bottom of this hierarchy, and it flows up, and it also flows back down. It converges as it goes up and it diverges as it comes down.
We also know that these regions in the hierarchy are very similar. They look the same, and it is believed and we act as if they are the same. They’re actually doing basically the same thing.
There’s variations on a theme here, but the regions at different points in the hierarchy, the ones that are connected to vision, and touch, and hearing, are actually all doing the same thing. They’re all doing the same sort of information processing. There’s tremendous evidence to support this.
This was first proposed in 1979 by Vernon Mountcastle. So, we have a hierarchy of cortical regions that are all doing the same thing, and if they’re all doing the same thing and the whole thing does prediction, model building, prediction, anomaly detection, and motor behavior, then every region is doing those things. It’s building a model of its world.
It’s making predictions, detecting anomalies, and generating behavior. So, that makes our job a little bit easier. Here’s a picture of what a real hierarchy looks like in the brain.
This is a famous picture; all the neuroscientists in the room will know it immediately. This is from Felleman and Van Essen, and it shows, in a macaque monkey, some of the regions in the visual cortex. We don’t need to study this in detail.
There’s little colored boxes of different regions in the neocortex, and you can see there’s quite a few layers of ’em, quite a few of these guys, and they’re all hooked together. Different species have different hierarchies. We don’t really know what the human one looks like, but it’s probably something similar to this. We’re not gonna go into more detail here, but this hierarchy is a fairly complex thing, and it’s real.
Okay. The second principle I argue for here is that the primary memory function going on (and the cortex is a memory system) is sequence memory. Now, I want you to think about this.
It may not be obvious to you, but when you are trying to understand the world, for example when you’re trying to understand my speech, the patterns are coming in over time. It’s like you’re listening to a melody. The order in which these rapidly changing patterns occur is important.
It’s a sequence of patterns. The same is true when I touch things, and the same is true of my eyes. I’m constantly moving my eyes in the world, and there’s a pattern, a sequence coming in.
For you to understand or infer on the world, you have to have a memory of sequences. Similarly, if I’m gonna make predictions, I have to have a memory of what follows next. You know, what follows this?
What is typically after this? And that’s a sequence memory. If I’m gonna generate motor behavior, think about what has to happen for me to create this speech right now.
I am creating a very fast temporal pattern on dozens of muscles in my voice box that are going in this complex pattern. I am not making this up. I’ve said these words before.
I’ve said sentences like this before. I’m doing little variations on a theme here and there. But basically, I’m playing back sequence memory of very complex sequences, and we do this in all our behavior.
Sequence memory is key to understanding inference and prediction, and I believe this is the key to understanding how the hierarchy works. In fact, the basic idea is that in the hierarchy, as you learn sequences of sequences of sequences going up the hierarchy, you build stability. The third component here is the way the brain represents information, sparse distributed representations.
It’s been known for a long time that when you look in the brain, you find that at any point in time, most of the cells are relatively quiet, and a few of the cells are relatively active. It’s sparse. That’s what we mean by sparse.
Very few things are very active. Most things are quiet, and there’s a lot of inhibition making this happen. Bruno Olshausen, whom I mentioned earlier, wrote one of the seminal papers on sparse coding in the visual system, looking at it from an efficiency point of view.
But only recently did I fully understand some of the attributes of sparse distributed representations, and we’re gonna talk about ’em. So everywhere you look in the brain, even coming out of the senses, you see sparse activity, between regions and within regions. Everywhere you go, it’s sparse, and this is not an accident.
There are some properties of sparse representations which are essential for building intelligence, and I’m gonna talk about them. Okay. So those, I’m gonna now go in, in detail on two of these items.
I’m not gonna talk further about hierarchy. I’m gonna talk about sparse distributed representations and about sequence memory, and we’re gonna talk about how groups of cells can learn sequences of sparse distributed representations. So now, I’m gonna switch into a little bit more of a computer-scientist mode of thinking when we talk about sparse distributed representations, and I’m gonna first tell you how representations are formed in a computer, and then contrast that to sparse representations in the brain.
So in a computer, we typically use something called dense representations. You might think of it this way. We have a small number of bits, maybe eight like in a byte, or 64 bits or 128 bits, something like that.
We use all the combinations of ones and zeros. So if I have eight bits, I can use everything from 00000000 to 11111111, and everything in between; it’s all valid. The individual bits don’t mean anything.
Start with an example, the ASCII code. The ASCII code’s an eight-bit code for letters, and so the letter M is represented by 01001101. However, if I say to you, “What do those bits mean in an ASCII code?
What does the third bit in that encoding mean?” It doesn’t mean anything. You have to look at the whole thing.
There’s nothing about that representation which tells me what it is. There’s nothing about those bits that say M. It’s just an arbitrary assignment.
The bits themselves have to be looked at as a group. Some programmer decided at some point that that pattern should be an M, and the computer knows nothing about that. Now, these representations are assigned. They’re not learned.
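As an aside, the dense-representation idea is easy to see in a few lines of Python. This sketch is not from the lecture; the bit chosen to flip is arbitrary, and the point is only that adjacent bit patterns carry no semantic relationship:

```python
# Dense representation: the 8-bit ASCII code for a character.
# Individual bits carry no semantic meaning on their own.
def ascii_bits(ch: str) -> str:
    """Return the 8-bit dense encoding of one ASCII character."""
    return format(ord(ch), "08b")

print(ascii_bits("M"))              # 01001101
# Flipping a single bit yields an unrelated character: nearby bit
# patterns need not be semantically similar in a dense code.
print(chr(ord("M") ^ 0b00010000))   # ]
```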
They’re just made up in some sense, and that’s a great way of working on computers. In the brain, we don’t have that. We have sparse distributed representations.
Let me just tell you what they are. First of all, you have thousands of bits. Now, when I say bits, you can think of cells, okay?
So when I’m talking about bits, it’s the same as cells. When I say we have a few active bits, I mean I have a few active cells. So I might have thousands of cells or thousands of bits.
You need to have a large number. We typically work with 2,000 bits in my work at Numenta, and they’re mostly zeros and a few ones. We will typically use 2% activation, so we’ll have 40 one bits and 1,960 zero bits, and so this is a sparse representation.
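Those numbers (2,000 bits, 2% active) can be sketched directly. Representing an SDR as the set of its active-bit indices is an implementation choice of this sketch, not something stated in the talk:

```python
import random

N_BITS = 2000     # total bits (think: cells)
N_ACTIVE = 40     # 2% of 2000 are ones (active cells)

def random_sdr(rng: random.Random) -> frozenset:
    """A random SDR, stored as the indices of its 40 active bits."""
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))

sdr = random_sdr(random.Random(0))
print(len(sdr))             # 40 one bits
print(N_BITS - len(sdr))    # 1960 zero bits
print(N_ACTIVE / N_BITS)    # 0.02 sparsity
```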
I show you an example there, just showing a bunch of zeros going off into the distance. Now, the difference in the brain is that each bit has some sort of semantic meaning. Each bit has some meaning on its own.
It’s not an arbitrary bit or an arbitrary cell, where one moment it means this and another moment it means something else. They have some sort of semantic meaning.
We may not be able to figure that out very easily, but there’s some sort of semantic meaning. And what we do is we pick the top bits, the top semantic meanings for a representation, and this will become clear, but let me give you an example. This is a made-up example for analogy.
It’s not the way we would do it, and it’s not the way it’s done in the brain. But let’s say I wanted to represent a letter, like the letter M in a sparse distributed representation. I would come up with 2,000 attributes.
I might have attributes for this is a vowel or this is a consonant. I might have attributes for where in the alphabet this letter is. I might have attributes for what it sounds like.
Is it an O sound or an E sound? Is it a hard sound or a soft sound? Is it a fricative sound?
I might have attributes for how it’s drawn. Is it drawn with descenders or ascenders? Is it a closed shape or an open shape?
And so on. And then when I wanted to represent a letter, I would pick the 40 attributes that best represent that letter, and therefore the meaning of the thing is actually incorporated in the encoding itself. And that’s a really powerful place to start.
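By way of analogy only, here is a toy version of that made-up attribute scheme. The attribute names are invented for illustration; a real system would have ~2,000 attributes and pick the top 40:

```python
# One bit position per named attribute; every one bit carries meaning on its own.
ATTRIBUTES = ["is_vowel", "is_consonant", "early_in_alphabet",
              "hard_sound", "soft_sound", "has_closed_shape",
              "has_open_shape", "has_vertical_strokes"]

def encode_letter(active_attributes):
    """Turn on the bits whose attributes best describe the letter."""
    return {ATTRIBUTES.index(a) for a in active_attributes}

m = encode_letter(["is_consonant", "hard_sound",
                   "has_open_shape", "has_vertical_strokes"])
print(sorted(m))  # [1, 3, 6, 7]; bit 1 means "is_consonant", and so on
```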
Now, we’re gonna have to learn these meanings. The brain doesn’t know these bit meanings. It doesn’t know what the cells represent when you’re born.
It has to learn them. But the basic idea is as I said. Now, there’s some great properties that come with sparse distributed representations, and I’m gonna go through those.
This is pretty key to understanding everything else that’s going on. Let’s start with the simplest one. Let’s say if I wanted to take two sparse distributed representations and say if they’re similar.
I could just basically line ’em up and say if they have similar bits, one bits in the same locations, they have semantic similarity. If they don’t, they’re semantically different. And I can see, well, if I have 40 bits, I can be anywhere between no overlap and 40 bits of overlap.
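That comparison is just counting shared one bits. A minimal sketch, with each SDR held as a set of active-bit indices:

```python
def overlap(sdr_a, sdr_b):
    """Semantic similarity = number of one bits the two SDRs share."""
    return len(sdr_a & sdr_b)

a = {3, 17, 254, 900}    # toy SDRs, shown with only 4 active bits each
b = {17, 254, 42, 1999}
print(overlap(a, b))     # 2 shared bits: some semantic similarity
```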
But I can easily compare them, and wherever I have a one bit in common, it’s semantically similar. The second thing is that what if I wanted to store one of these patterns and recognize it later? We’re gonna, certainly gonna wanna do that in the brain.
We wanna say, “Oh, I’ve seen this before. What is it?” How would I go about that?
In a computer, we might save all 2,000 bits, but here, we’re not gonna do that. We’re gonna say, look, let’s just save the locations of the 40 bits that are one, and in computer language, we would say we’d store the indices of the ones. So I’d have a list of 40 indices saying, okay, what locations are the one bits?
And now, if a new pattern comes in, I’ll say, “Hey, do I have ones in those same locations?” If I find those ones in the same locations, I know I have the whole pattern. I don’t have to look at all the bits.
I only have to look at the ones that were one before and know where they are. That is gonna work very well. But what if I came along to you
and I said, ‘Okay, we have a problem. We can’t store the locations of all 40 bits. We can only store 10.’
You can pick 10 randomly, but that’s it. And so, we now have 10 indices. We call this subsampling.
And you say, ‘Well, will that work?’ If I see those 10, I have those 10 indices, a new pattern comes in. I say, ‘Is this the one I saw before?
Is this the one I saved?’ And I’ll go down and I say, ‘Look, the 10 ones are in the right locations.’
I say, that’s good. You might say, well, that’s not good. I could be wrong.
Maybe I got 10 of them right, but the other 30 are wrong. Well, that could happen. It’s very, very unlikely to happen.
But even if it did happen, what does it mean? It means I’ve now made a mistake, but the mistake is for something semantically similar to the thing I stored originally. Even though I didn’t save all the bits correctly and maybe a few of them are wrong, basically I have something semantically very similar to the thing I saw before, and that’s good enough.
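The store-the-indices idea plus subsampling can be sketched like this (Python, with hypothetical helper names):

```python
import random

def store(sdr, sample_size=10):
    """Remember a pattern by subsampling: keep only a few of its one-bit indices."""
    return set(random.sample(sorted(sdr), sample_size))

def matches(stored_indices, new_sdr):
    """The stored pattern matches if every remembered index is a one bit now."""
    return stored_indices <= new_sdr

pattern = set(random.sample(range(2000), 40))  # a 2,000-bit SDR, 40 bits on
memory = store(pattern)                        # only 10 of the 40 locations kept
print(matches(memory, pattern))                # True: all 10 sampled bits present
```

With 40 of 2,000 bits active, the odds that a different random pattern happens to contain all 10 stored indices are vanishingly small, which is why the subsample is good enough.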
And we’re gonna take advantage of this ’cause these are actually gonna be connections in the brain, and we don’t have to connect to everything. We only have to connect to a small number to be pretty certain we got the right pattern. Now, the final property here is one of union.
Imagine I took 10 sparse distributed representations, 10 2,000-bit patterns, and I ORed them together, so wherever any of them has a one bit, the union has a one bit. So I have 10 of these guys, each has 40 bits on, or 2% activity, and I create a new one, which is 2,000 bits, which has about 20% of the bits on. I just ORed them together.
This is a one-way operation. I can’t undo it. I can’t say, ‘Oh, given this union, tell me what the individual patterns were.’
I can’t do that. But I can do something very interesting, almost as good. I can say, ‘Here’s a new sparse distributed representation.
Is it one of the original 10?’ So given the union and a new one, is it one of the original 10? And the answer is, I can do that.
I can do it very reliably. All I have to do is say, ‘Are the one bits in the new pattern among the one bits in the union?’ And if they are, then that’s good.
You might say, ‘Well, that could be an error.’ If you’re following this, you might say, ‘Well, look, I could get maybe matching up one bit from one of those 10 and another bit from another of those 10,’ and so on. This is astronomically unlikely to happen.
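The union property can be sketched the same way. With 10 patterns of 40 bits each, the union has at most 400 of the 2,000 bits on:

```python
import random

patterns = [set(random.sample(range(2000), 40)) for _ in range(10)]

union = set()
for p in patterns:
    union |= p            # OR the 10 SDRs together; roughly 20% of bits end up on

def maybe_member(sdr, union_bits):
    """An SDR could have been in the union if all its one bits are on in it."""
    return sdr <= union_bits

print(all(maybe_member(p, union) for p in patterns))  # True for every original
novel = set(random.sample(range(2000), 40))
# A novel random SDR will almost never have all 40 of its bits inside
# the ~400-bit union, so false positives are astronomically unlikely.
```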
A little bit of math shows you that. And so this is a good thing. Why do we wanna care about this?
I said earlier the brain makes predictions. What the brain does is actually it’s predicting lots of things all the time. It’s not predicting one thing, it’s predicting many things.
And the way it makes the predictions, we’re gonna see, is it makes a sort of union of predictions, and when something unexpected happens, we wanna be able to say, ‘Was what actually happened one of the predictions, or was it not?’ So I can’t always tell you what my predictions are, but I can tell you whether this thing that happened was expected. All right, so those are our properties.
If you’re gonna forget everything else I talk about today and tomorrow and remember just one thing from my talk, remember this. And I’ll state it very clearly, I’m certain of this: all intelligent machines, biological or otherwise, are gonna be based on sparse representations. These properties are critical, and they solve some very fundamental problems that have been bothering people for a long time.
So, sparse distributed representations are the language of the brain, and we wanna use that. Okay, so now we’re gonna do the next step here, we’re gonna talk about sequence memory. I already argued why it’s important, but I’m gonna tell you how we learn sequences of these sparse distributed representations.
So let’s just jump in right into it here. We’re gonna do a little bit more neuroscience. Here’s a picture of our brain again, a very light picture of our brain, we’re gonna zoom in on a little section in the neocortex.
Hopefully, you can see that better. There’s a picture of a little slice of the neocortex. There you can see these layers going left to right, those are layers of cells, and you can also, if you were close to this, you could see there’s sort of a vertical orientation of columns in there as well.
We’re gonna zoom in on one of those layers and look at a little cartoon drawing that I’ve made here. Those little circles represent cells in the brain, just for illustrative purposes. When you look at a little section of a layer of cells in the neocortex, you see two principles.
One principle is that represented by the green arrow, is that there’s a columnar structure. The cells that are vertically aligned in a very skinny column tend to have the same feedforward response properties. They tend to behave the same way.
There are actually very few connections vertically, but there’s a vertical organization: if I give an input to the brain, those cells all have a similar feedforward response property. The orange arrow basically represents where most of the connections occur. 95% of the connections are to cells in columns nearby.
So we have a very vertical orientation in terms of performance, all these cells in the column are doing the same thing roughly, and then we have mostly connections are horizontal. If I now zoom in on one of those cells, this is that little picture of the neuron I showed you earlier, so we don’t have to go through that again, but now I’m gonna zoom in on one of the dendrites. And this is a picture of a real dendrite on a neuron, and if you can, I’ll try to walk you through it.
It’s a little section, it’s only probably in that picture maybe 40 microns wide or something like that, and you can see there’s a branch going on in there, and you can see the actual synapses, the actual spines where the synapses connect. Those are the little appendages going up and down. They’re really packed in about one micron apart.
Now, these spines, these synapses, are all over the dendritic tree. We’ve learned something in recent years which is very important: these dendritic regions act in a very unusual way. Take a little section of a dendrite: if I have one synapse active, it has almost no effect on the cell body, but if I have 10 or 20 of them active at the same time in the same location, it has a very large effect on the cell body.
It’s as if they’re like a coincidence detector. It’s like I have a whole bunch of patterns coming in at once, it says, “Oh, that’s a whole group at once. That’s good.”
If I only have some of them or a few of them, it doesn’t do that. And so this is a very, very important property, and we’re gonna need to exploit it in our model here. Just to give a little bit more flavor for that, since this is the neuroscience talk, there’s a little bit of depth here, and I hope I don’t lose too many of you.
If I look at that neuron carefully, we can actually distinguish between two types of dendrites. There are the ones that are very close to the cell body; these are called proximal dendrites. An input there has a linear effect on the cell, meaning if I have one spike come in, the cell gets depolarized a little bit; if I have two, it’s twice as much; and three, and so on. So it’s a linear summation, and this is where we find most of the feedforward connections to the cell occurring.
Then there are the distal dendrites. These are the ones that are further away from the cell body, and they work like I just said a little moment ago. There are dozens of these regions.
They do a nonlinear summation. They act sort of like coincidence detectors, and this is primarily where we find connections to other cells nearby. And these are important properties for us.
We are gonna model this with a model neuron and this is a fairly sophisticated model neuron. This is a picture of the neurons we use at Numenta, and it basically models exactly what I just said there on the left. You have a cell body, which is that little square.
The green dots are representing the proximal synapses, and then the blue guys arranged in those little dendritic segments are representing the distal synapses, and we align them as a series of dendritic segments, not in a branch like we see in the brain, but in an array like that. And it has the same properties. So we’re gonna build networks of these guys, and we’re gonna learn how to form sparse distributed representations and learn sequences of them.
So let’s come back to this picture. We are now gonna take that model neuron I just told you about and arrange them in a layer of cells. In this picture here in the center bottom, we’re showing a lot of little cubes, and each one of those is one of our model neurons, if you will. I’m showing them in an array like we’d find in the brain, in columns that are only four cells high in this particular case.
And the colors of those cells are their activation states, which I’ll talk about in a moment. Okay, so what we wanna understand is how does this structure, using those neurons, form sparse representations, and how does it learn sequences, and how does it make predictions, and how does it detect anomalies, and so on, which I think is basically what’s going on in neural structure. Okay, so let’s now zoom in on a layer of cells.
You can think of this like the layer of bits in our sparse distributed representation, but this is a two-dimensional layer of cells here. We have some feedforward input coming from a sensor or from another region in the brain going into this area, and that feedforward input is landing on the proximal synapses, if you will. The colorization here, from one of our simulations, is representing the level of input activity each cell is getting. It’s not the firing rate of the cell, it’s just the depolarization of the cell, how much input it’s getting.
Some are getting more and some are getting less. They’re sampling from the input space. And the ones that get the most fire first and they inhibit the other guys.
So, did it stop working here? There we go. And so this little yellow thing is basically an inhibitory field, and basically the guys who get the most activity are the ones that inhibit everybody else, and what we have now is a sparse distributed representation of our input space.
I won’t go through how that is learned and how it exactly works ’cause I wanna get to the sequence memory, but this is how we ended up with this pattern. So here we have our array of cells. This is just like our zeros and ones.
The white cells are zeros and the red cells are ones. Those are the active ones. I’m not showing you 2,000 columns, you know, only about a quarter of that, just to make it visible, but you can see how I have a sparse activation here.
Now over time, the patterns and inputs change. Unfortunately this is not working very well here. This is time one and okay, so here’s time two.
So imagine now as I’m speaking, as you’re doing things, these patterns are changing constantly in these cells. There are some cells becoming active and a moment later other cells are becoming active as the patterns in the world change, and this is a sequence and we wanna learn this sequence. We wanna learn like, well, at this point, how do I predict what’s gonna happen next?
And we do that on a cell-by-cell basis, and here’s how. When a cell becomes active, it says, ‘Who was active recently nearby?’ ’cause if they were active recently nearby and I see them again, I’m gonna predict my own activity.
So a cell will come along and say, “Look, I just became active. Let me look around and sample from the cells nearby who were active just a moment ago.” We’re gonna form connections, and we’re gonna do that on our distal dendrites down here, so this is like a coincidence detector.
This is like our index of 10 patterns. We are gonna basically say, “I’m gonna try to recognize the previous state by forming connections to it, and if I see that, I will go into a predictive state. I’ll predict my own activity.
So if I see this, I’ll predict my own activity.” Every cell is doing this all the time, and so when the new pattern comes in, we have all these yellow states, these cells in a predictive state. They’re all saying, ‘Hey, I might become active next.’
Now in this case, there’s more yellow cells in predictive states than there are red ones, and that’s because, imagine I had some patterns: I had A followed by B, or I had A followed by C, or I had A followed by D. If I show you A, it’s gonna predict B, C, and D.
It’s a union of those things, so that’s what you’re basically seeing going on here, and this is the basics of a transition memory, or the beginning of a sequence memory, because I’m learning how one state goes to another state. As I’ve shown it here, it has a major flaw. It’s what we call a first-order memory, meaning it can only go back one step in time, and that means I can’t use information that occurred a long time ago.
All I can use is what happened immediately before. Well, let’s say my patterns were XAB, YAC, or ZAD. If I could go back two steps and say, oh, it’s X followed by A, then I can predict it’s B.
If it was Y followed by A, I could predict it’s C, and that would be a second-order memory. We want a system that can learn arbitrarily long sequences, very long melodies, if you will, and this system isn’t gonna do that. So the way we’re gonna solve that problem is we’re gonna add cells in a column.
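A tiny sketch makes the first-order flaw concrete: with only one step of history, A predicts the union of everything that ever followed A, no matter what came before it:

```python
from collections import defaultdict

# First-order transition memory: each element predicts only from the
# single previous element, so context further back is lost.
transitions = defaultdict(set)

def learn(sequence):
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev].add(nxt)

for seq in ["XAB", "YAC", "ZAD"]:
    learn(seq)

print(transitions["A"])  # {'B', 'C', 'D'}: all three predicted after A,
                         # even though X, Y, or Z would disambiguate
```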
Essentially, we’re gonna use the columns and multiple cells per column to solve this problem. Let me show you how it’s done: variable order sequence memory. Let’s start with our zeros and ones, our sparse distributed representation, and instead of each of these bits being a single one, I’m gonna make it a column of cells.
So I now have 10 cells above it, so each column is now one of my ones or zeros. Those are supposed to be little circles above the ones and zeros; I don’t know if it looks like that to you. But anyway, I have 10 cells,
and for every one bit, every column that’s active, I’m gonna pick one cell and say, “That’s the one that’s on.” I’ll pick one for another one bit and another one bit, and I’ve made an arbitrary assignment like this. I can now do the same thing again in a different way.
Here’s the same representation, the same ones and zeros, but I’m gonna pick a different set of cells in those columns. Now think about this: I’ve now got the same input, but I’m representing it in two different ways.
On one hand, I can say they’re the same input ’cause they’re the same columns, but on the other hand, I could say they’re different because I used different cells. And even though some of the cells might be the same, nine out of ten will be different, and that’s good enough. Think of it another way: if I have 40 active columns, 40 bits that are one,
And I have 10 cells per column, there are 10 to the 40th ways to represent the same input. I have 10 to the 40th ways of representing an A, or 10 to the 40th ways of representing a pattern or a note and a melody, if you will. And I’ll say a sentence here.
I’m gonna use the sound “two” in multiple ways. I’m gonna say: there are too many twos to count. Right?
Well, I use the same sound coming into your head over and over again, the “two” sound, but in context, your brain has to represent them differently, ’cause it hears them differently, it perceives them differently, and there has to be a way of doing that. And I propose the columnar structure like this is a perfect way of doing this. And it gives us a huge capacity.
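A toy sketch of that idea: keep the same active columns (the input) but choose a different cell per column for each context. The random cell choice here is purely for illustration; in the model it is learned:

```python
import random

CELLS_PER_COLUMN = 10

def contextual_representation(active_columns):
    """Same columns = same input; the chosen cell per column encodes context."""
    return {(col, random.randrange(CELLS_PER_COLUMN)) for col in active_columns}

columns = set(random.sample(range(2000), 40))  # the 40 active columns
ctx1 = contextual_representation(columns)
ctx2 = contextual_representation(columns)

same_input = {c for c, _ in ctx1} == {c for c, _ in ctx2}
print(same_input)  # True: both represent the same input, in different contexts
# Capacity: 10 choices in each of 40 columns = 10**40 ways to represent it.
```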
So now we can go back. I’m not gonna walk you through all the details of this, but if I do the same thing in a columnar layer of cells, and a cell now makes connections to other cells in this larger assemblage here, we basically are gonna build a high-capacity sequence memory. Every cell is gonna be predicting its next state, and when they do that, we can follow these and make predictions and go through time.
I’m gonna give you a little bit more flavor on it, go a little bit deeper; if it gets hard, don’t worry, I’ll come back in a second with something easier. Part of this theory is that when a column is predicted or not predicted, it behaves differently. So if it wasn’t predicted and a new input comes along and that column is selected, it fires all the cells.
This is like, “I don’t know which cell is supposed to be active right now, so I’m gonna burst them all; they’ll all become active.” And this is a way to say, “Ooh, an unexpected thing happened.”
Or it allows you to pick up a melody in the middle of a melody, something like that. If you had good predictions and you knew exactly what you’re tracking and exactly where you’re going, and everything was working just like you expected, then you’d only have activations of one cell per column.
We see this kind of thing when unexpected things happen in the brain, you see a lot more activity going on. And so, this picture here, I’m gonna show you, there’s three columns I’m gonna highlight here. Two of them, the two on the left have one cell predicted, that’s the yellow state, and the one on the right, which is the heart of the C, has no cells predicted.
And if all those columns become active in the next state, then you’ll see, in the one on the right, all the cells in the column become active, and in the ones on the left, only one cell becomes active. So we can detect anomalies on a column-by-column basis. It’s sort of an attribute-by-attribute basis.
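The bursting rule can be sketched per column (a simplification of the model, not Numenta’s code):

```python
def activate_column(predicted_cells, n_cells=10):
    """If some cell in the column was predicted, only it fires; if none was,
    the whole column bursts, signaling an unexpected input."""
    if predicted_cells:
        return set(predicted_cells)   # expected input: sparse activation
    return set(range(n_cells))        # unexpected input: every cell bursts

print(len(activate_column({3})))     # 1: the prediction was verified
print(len(activate_column(set())))   # 10: anomaly, the column bursts
```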
Okay. So you put all this together in ways that you can read about if you want, and you end up with a very powerful sequence memory. It’s a variable order sequence memory.
It can learn very complex, long temporal patterns. It makes multiple simultaneous predictions. It can detect when what actually happens is correct or not correct, and it can tell you why and why not.
It is a high-capacity memory. I’m not going to prove that to you here or demonstrate it today, but trust me, a simple memory of just 2,000 columns can learn millions of transitions. It is a very high-capacity system.
And it’s kind of difficult to max it out. It’s distributed. Think about this.
There’s not a single point of failure in this system. I can eliminate columns, I can eliminate cells, I can eliminate dendrites, I can eliminate multiple ones of those things, and it keeps on working. It degrades gracefully all the way down because of all the properties involved here.
And finally, it does semantic generalization. What do I mean by that? Let’s say I have a series of patterns coming in, and remember, these bits have semantic meaning.
They’re learned, but they have semantic meaning, and I learn some sequence of these patterns. And now I give you a new pattern, and it’s the same sequence, but actually the elements are slightly different. Some of the bits are different, some of the columns are different.
It’ll be able to say, “You know what? These are different patterns, but I recognize the sequence. I recognize there are different spatial patterns coming in, different notes, if you will, but I recognize this.”
It’s the same basic melody. “And I will make predictions based on previous knowledge in a different environment.” And so, it generalizes in a semantic way, which is a very desirable attribute.
Okay. Now, the third attribute I was gonna talk about as part of this theory is the requirement for online learning. You know, think about what online learning means.
It means that when data’s coming into the brain, you don’t get to store it. You know, when we think about computers, we bring data into a database and we look at it. But here, the brain doesn’t do that.
The brain, it comes in and it has to learn right away. It has no chance to store the sensory data coming from your senses. It’s like, “I need to infer, I need to make predictions, and I need to learn, all in one fell swoop.”
So online learning is the concept that you’re continuously learning; that’s the machine learning term for it. It’s pretty simple actually. Essentially, you have to train all the time.
All new inputs have to be trained on. It might be noise, it might be something you’re never gonna see again, but you still have to train, because it might be something you’re gonna see again if it’s a novel thing. And so essentially, if a pattern does not repeat, then we’re gonna forget it, and if a pattern does repeat, then we wanna strengthen it and remember it.
We model this in a way similar to our interpretation of what’s going on in the biology. Let me just walk you through. Here’s our model again of a layer of cells, there’s a model of our neuron, and then over on the right is our picture of the dendrite and the spines on it.
One thing we’ve learned over the years: it used to be believed that learning only occurred in the strengthening and weakening of synapses. In fact, many people still write that. But we’ve learned that synapses can form very rapidly.
If I have an axon and a dendrite that are near each other and there’s no synapse, but they both fire at the same time, a new synapse can grow very rapidly, in a matter of minutes or so. I mean, you can watch movies of this stuff on YouTube; it’s pretty amazing. So these synapses can grow and ungrow. And so instead of just increasing the weight or decreasing the weight, we can grow new synapses.
This is a much more powerful concept, and so we’ve adopted it in our models. Here’s how we do it. We consider two things about a synapse, a connection between two nerve cells.
There’s the growth of it, and then there’s whether it’s connected. Whether it’s connected, the weight if you will, we make binary. I’m not saying that it’s binary in the brain, but it’s good enough for what we need to do.
So it’s either connected or it’s not connected. However, it can grow on a scale, meaning it can go from no growth to a little bit of growth, and at some point, once it gets above a certain threshold, we say it’s connected.
And if I continue increasing this, what we call the permanence, it just gets stronger and stronger. It’s harder and harder for it to forget. So by training over and over again, it’s not like making the synapse stronger as much as it’s making it harder to forget this thing.
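A minimal sketch of that permanence idea; the threshold and increment values here are made up for illustration:

```python
THRESHOLD = 0.5   # permanence above this means "connected"

class Synapse:
    """Binary weight, scalar permanence: learning moves the permanence,
    and the connection flips on or off as it crosses the threshold."""
    def __init__(self, permanence=0.0):
        self.permanence = permanence

    @property
    def connected(self):
        return self.permanence >= THRESHOLD

    def reinforce(self, amount=0.1):
        # Repetition pushes permanence up, making the synapse harder to forget.
        self.permanence = min(1.0, self.permanence + amount)

    def decay(self, amount=0.05):
        # Patterns that never repeat slowly fade away.
        self.permanence = max(0.0, self.permanence - amount)

s = Synapse(0.45)
print(s.connected)   # False: almost grown, but not yet connected
s.reinforce()
print(s.connected)   # True: permanence crossed the threshold
```

Note the asymmetry this captures: once well above threshold, further training doesn’t make the connection “stronger,” it just makes it take longer to decay back below threshold, i.e., harder to forget.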
We didn’t make this up; there’s a lot of evidence for this. I read a paper once where someone suggested this idea, and we’ve adopted it.
Okay, so that’s basically the system we’ve come up with. When we simulate this stuff in our work, we typically as I said, we had like 2,000 columns. We typically use about 30 cells per column.
Each cell has about 128 dendritic segments. Each dendritic segment can have up to 40 synapses on it. These are numbers that are right in the range of realism for real neurons.
And what you end up with then is, you know, 2,000 columns in one of our simulations, 60,000 neurons, about 5,000 synapses per neuron, and about 300 million synapses total. This would be equivalent to a very small section of the neocortex, but it turns out to be a very robust and powerful tool for discovering patterns in complex data streams. And we build hundreds of these models every day in my office, because we’re running a production system.
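Those ballpark figures are easy to check; since 40 synapses per segment is an upper bound, the per-neuron count comes out slightly above the quoted “about 5,000”:

```python
columns = 2000
cells_per_column = 30
segments_per_cell = 128
max_synapses_per_segment = 40

neurons = columns * cells_per_column                            # 60,000
synapses_per_neuron = segments_per_cell * max_synapses_per_segment  # 5,120
total_synapses = neurons * synapses_per_neuron                  # ~307 million
print(neurons, synapses_per_neuron, total_synapses)
```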
So they really work. Okay, so now I’m gonna go back to where we started from and say, well, we’re trying to find a broad framework of ideas in which to interpret the empirical neuroscience data. And the question is, how well are we doing on this?
And, you know, we have a long way to go, but I’m also pretty happy with how far we’ve gone so far. So here’s some of the things that I didn’t know 30 years ago. Maybe someone else knew them, but I didn’t know them.
And I don’t know why my computer stopped doing that. Again. Okay, let’s try one more time here.
Okay, we now know that the neocortex builds a predictive model of the world. It’s a memory system, it’s not a computer, and that system must be trained. In my book, I called this the memory prediction framework.
So for those who follow our work, I use that term to describe this. It’s basically saying, hey, we have a memory system and it makes predictions. The second thing is, we know that there’s a hierarchy of regions, and in that hierarchy of regions, I speculate, is sequence memory, and with that sequence memory, we can do inference or pattern recognition, prediction, anomaly detection, and motor generation.
There’s a lot of things we don’t know about this yet, but I’m arguing that’s the basic principle of what’s going on in each layer in each region of the hierarchy. We call that hierarchical temporal memory because it’s basically describing the hierarchy of temporal memory regions. And then finally, this is the work I described today, which is really recent, happened in the last three years, where I’m arguing that each layer of cells in a region is a sequence memory.
So you’ve got maybe five layers of cells in a typical region of cortex. Each one is learning some type of sequence for various reasons, feed forward, feedback, motor control, and so on. It’s based on sparse distributed representations.
We now have a much better idea of what sparse distributed representations are about. It explains why there are columns and how those columns are representing the feedforward data. There’s a lot of physiological data which matches with this.
It explains how cells in the column can represent the feedforward data in different contexts. For the first time that I’m aware of, we have a model of why cells have nonlinear dendrites, why there are so many synapses on them, why they’re distributed the way they are, and why they grow the way they do. I’m pretty confident these ideas are basically correct.
But if nothing else, they work, and they’re really cool, and they do a lot of things we’ve been trying to do, and they solve a lot of constraints. We call that the cortical learning algorithm because it’s basically saying, “Hey, we think this is the basic learning mechanism that’s going on between groups of cells and distributed memory formations,” and we’re pretty excited about that. Okay, we have a long way to go, but this actually in my mind, it may seem like not much for 30 years, but this is actually pretty good.
I’m really happy with this. And I’m gonna tell you tomorrow how we actually build this stuff, how we are applying it to problems, and how we turn this into really intelligent machines.
But there’s a long way to go there, but this is enough to get started, and that’s what we’re trying to do. I want to end here with just a couple pictures. This is a picture of the Redwood Neuroscience Institute on the last day before it became the Redwood Center for Theoretical Neuroscience.
There’s Bruno and myself and there’s Tony Bell and a few other people, and there’s people at Numenta now. Obviously, all this work is team effort. Nothing is done individually, and these are the people I’ve worked with over the years.
So that’s it, thank you very much.
(applause)
(unintelligible)
For those who wanna leave, go ahead. For those who wanna ask questions, we have a microphone down front. We need to do that for recording purposes, and I’ll probably repeat the questions anyway.
So, do we have anyone who wants to ask a question? This gentleman in the front row, but you gotta stand at the mic.
(unintelligible)
Well, thanks a lot. I thought it was a terrific talk. But I’ll ask the obvious question.
Where do the patterns get their representational capacity? That is, I like the example of the different senses of the word “two”: the infinitive, the numeral. And of course, there must be something corresponding in the brain to that, and what the picture shows is there are a set of patterns of neurons.
[00:54:59] INTERVIEWER:
Now, what fact about the patterns makes them represent different meanings of the word two?
[00:55:06] JEFF HAWKINS:
All right, so the question, I’m gonna repeat it in case everyone didn’t hear it, or at least I get to interpret it the way I want, so I get to answer the question that I wanna answer. Which is, how do we learn? How do those representations, those bits, come about?
Where do those different representations, the semantic meanings of those bits, come from? What makes them represent?
Well, first of all, you have to realize that starting right at the senses, they have semantic meaning. Right as they come off the retina or off the cochlea, they have semantic meaning. It may not be the kind of meaning you usually think of as semantic, but I can say, oh, this ganglion cell coming off the cochlea represents a range of frequencies in the sound pattern. And this one coming off the retina means I have a certain sort of on-off pattern at this particular point in my visual space.
These are small semantic meanings, but they’re there from the beginning. And what happens in the hierarchy, and we understand this mostly, but not a hundred percent, is that you take a whole bunch of converging sparse representations, and you run them through that first activity I told you about, where you project to the proximal synapses.
What that does is form new representations that are combinations of the old ones, and it forms the representations that are most common. So when we did this in the visual system, and I didn’t show this, I would take bit patterns from a retina, and it would end up forming the kind of patterns you see in V1, which are line-orientation segments, the types of patterns you’d see in the world. And we tested this in various ways.
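The competitive step described here, converging sparse inputs projected onto proximal synapses with the most common combinations winning out, can be sketched as a toy winner-take-all learner. This is a minimal illustrative sketch, not Numenta's algorithm: the column count, permanence threshold, and Hebbian-style update values are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_cols, k_winners = 64, 32, 4

# Each column starts with random proximal "permanence" values; a synapse
# counts as connected once its permanence passes a threshold.
perm = rng.random((n_cols, n_inputs))

# A few "common patterns" embedded in otherwise noisy input -- a stand-in
# for recurring bit patterns coming off a retina.
common = [rng.random(n_inputs) < 0.15 for _ in range(4)]

def step(x, learn=True):
    """One competitive step: columns with most overlap to connected bits win."""
    connected = perm > 0.5
    overlap = (connected & x).sum(axis=1)        # per-column overlap score
    winners = np.argsort(overlap)[-k_winners:]   # sparse winner-take-all
    if learn:
        # Hebbian-style update: winning columns strengthen synapses to
        # active bits and weaken synapses to inactive bits.
        perm[winners] += np.where(x, 0.05, -0.02)
        np.clip(perm, 0.0, 1.0, out=perm)
    return frozenset(int(w) for w in winners)

for _ in range(500):
    x = common[rng.integers(4)].copy()
    x |= rng.random(n_inputs) < 0.02             # a little noise
    step(x)

# After learning, each common pattern evokes a stable, sparse set of
# winning columns: a learned representation of that pattern.
codes = [step(p, learn=False) for p in common]
print(codes)
```

In the visual-system experiment Hawkins mentions, the analogous learned representations came to resemble V1-style oriented patterns; this sketch only shows the competitive mechanism at toy scale.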
And so now you have a little bit more sophisticated representation of a bit. Then we learn sequences of those. When we learn sequences of those representations, we have cells that are essentially saying, okay, this is this spatial feature in a sequence.
And it’s a lot like the complex cells in V1, where you say, okay, it’s anything moving in this direction. And you do this over and over again in the hierarchy. And as you go up the hierarchy, you find more and more complex patterns.
A lot of people have been working on this idea for many years in vision. But I think what they’ve really missed in a strong way is how to use time and sequence memory in addition to this combining of bits: you form new spatial patterns out of the common patterns, form sequences of those, and then sequences of those again.
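The "form sequences of those" step can be sketched, at its very simplest, as a first-order transition memory over learned spatial codes. This is an invented toy (the melodies and the counting rule are not from the talk), and a first-order memory deliberately exposes its own weakness: seeing "C" alone can't tell you which sequence you're in, which is the kind of ambiguity the distributed, high-order sequence memory in the cortical model is built to resolve.

```python
from collections import defaultdict

# transitions[prev][nxt] counts how often code `nxt` followed code `prev`.
transitions = defaultdict(lambda: defaultdict(int))

def learn(stream):
    """Record every adjacent pair in a stream of spatial codes."""
    for prev, nxt in zip(stream, stream[1:]):
        transitions[prev][nxt] += 1

def predict(code):
    """Most frequently observed successor of `code`, or None if unseen."""
    followers = transitions.get(code)
    if not followers:
        return None
    return max(followers, key=followers.get)

# Train on repeated presentations of two overlapping "melodies" of codes.
for _ in range(10):
    learn(["A", "B", "C", "D"])
    learn(["X", "B", "C", "Y"])

print(predict("A"))   # successor learned from the first melody
print(predict("C"))   # ambiguous: "D" and "Y" were seen equally often
```

The ambiguity at "C" is the design point: a richer model keeps track of how it got to "C" (which melody it is inside), which first-order counting cannot do.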
[00:57:27] JEFF HAWKINS:
And honestly, I can’t say I understand it all, but in the end, that’s how it has to come out. And we’ve been able to exhibit this up to a certain level, so perhaps not the full answer you’d like, but it’s the best I can give today. Yes. It’s that next gentleman.
[00:57:43] AUDIENCE MEMBER:
Have you gained any insight into the purpose of sleep, and for instance, why prolonged absence of sleep distorts the mechanism of the brain and sensory processing?
[00:57:53] JEFF HAWKINS:
Yeah. The quick answer to that is no. I assume everyone heard that question.
Did anyone not hear? Can everyone hear those questions in the back? If you can’t, raise your hand. So everyone heard it, great.
You know, sleep is another one of those areas where there’s a huge amount of literature. It’s known very clearly that sleep is necessary to live; sleep-deprived people eventually die. It’s also known to consolidate memories. There’s a whole interplay, which I didn’t talk about at all here, between the hippocampus and the cortex.
The hippocampus is the location of episodic memories, things you learn very quickly, and if you lose your hippocampus, you won’t form any new memories of certain types. And it’s believed that sleep has a way of taking some of the things that are stored up in the hippocampus and retraining them into the cortex.
We don’t model that. We don’t try to. Until I find a need for it, I’m not gonna do that.
Now we’re not trying to emulate humans. We actually, in this system, I showed you here, we have no concept of episodic memory. We think we can add that if we wanted to, but there’s no need to at the moment.
And we’re able to learn these things without remembering exact details of stuff. So the short answer is no, we’re not modeling it. We have some ideas of how we would, but we haven’t gained any insights into why humans need to do it.
[00:59:16] AUDIENCE MEMBER:
Hi. You’ve pointed out correctly the importance of temporal sequencing in these operations. In all artificial computers, there’s some sort of clock mechanism operating at so many megahertz.
Is there some sort of clock mechanism in the brain, or is it operating in maybe an asynchronous fashion, where the timing is intrinsic to the bits themselves?
[00:59:44] JEFF HAWKINS:
Yes, a very good question. And depending on who you are in the audience, you might interpret that question differently, so let me try answering it two different ways. Computers have a clock. Everything happens on a clock.
[00:59:57] JEFF HAWKINS:
For this model, I knew that the brain doesn’t have a clock. And we developed this model initially with the idea that I can’t have a clock. If it requires a clock, it’s not a real model.
I could show you separately, but I don’t have that material in this presentation, where we had to get this all to work asynchronously. And we essentially did, or at least I theoretically understand how it would. Now, the way we’re implementing it today, we’re implementing it on a computer, and the kind of data we’re feeding to it I can actually feed step by step, and therefore I made a lot of simplifications in my implementation. But the theory itself does not require that.
Our implementation today has it. There’s another side of this question I wanna just address. The brain has various rhythms, and those rhythms aren’t like a clock in the sense of a computer clock.
They’re different. And just today, I was talking to some neuroscientists here at Berkeley about how to interpret those, and there’s different opinions. I’m under the belief that the rhythms in the brain, these sort of oscillations you see, are plumbing issues.
They’re necessary for a biological brain to work. They’re necessary to make sure that synapses are becoming active at relatively the same time, because dendrites require them to become active within a few milliseconds of each other. And if I let ’em become active any time they want, it wouldn’t work.
So it’s more of a plumbing, mechanistic thing that brains require, but it’s not an information thing. This is my opinion; there’ll be people who disagree with it.
But our models do not require that, because we don’t model the low-level biophysical interactions in a dendrite. So no, no clock is required for this algorithm; rhythms are something biological brains need, and our actual implementation right now does have a clock, but it’s not part of the theory.
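The clockless point can be illustrated with a toy event-driven loop: no fixed-rate tick drives the system; each update happens only when an input event arrives, and timing rides along with the events themselves. The Node class and its update rule are invented for illustration and are not part of the theory.

```python
import heapq

class Node:
    """A toy processing node that updates only when an event reaches it."""
    def __init__(self, name):
        self.name = name
        self.last_seen = None        # previous input, for a tiny sequence check

    def on_event(self, t, value):
        transition = (self.last_seen, value)
        self.last_seen = value
        return transition            # the (previous, current) pair observed

# Events arrive at irregular times; there is no global clock ticking.
events = [(0.0, "A"), (1.3, "B"), (2.1, "A")]
heapq.heapify(events)

node = Node("region-1")
log = []
while events:
    t, v = heapq.heappop(events)     # process strictly in arrival order
    log.append(node.on_event(t, v))

print(log)  # transitions observed without any fixed-rate clock
```

A clocked simulation of the same model would simply feed events at regular steps; nothing in the node itself depends on that regularity, which is the distinction Hawkins is drawing between the theory and its current implementation.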
[01:01:54] AUDIENCE MEMBER:
You talked about how this framework is relatively insensitive to the failure of a single column. I was wondering if this gives any insight into amnesia or brain damage and if you’ve explored any of that stuff, like big patches failing.
[01:02:06] JEFF HAWKINS:
Yeah. It’s a good question. I won’t talk about amnesia, but let’s talk about brain damage.
If you were to lose part of a sensory stream, like part of your retina, the part of the brain that’s representing it is no longer getting an input, and over a period of months, it’ll learn to represent something else. If you were born without eyeballs or without retinas, the primary visual cortex becomes something else. If you damage a small region of the neocortex, the cells nearby all adjust to represent the thing that was damaged.
This is why you can do a fair amount of recovering from damage over a period of about four months. One of the nice things about this algorithm is that it actually models that fairly well. Remember the part where I was talking about how the columns compete with one another: everybody’s hungry.
All the columns and all the cells are trying to fire some percentage of the time. And if they don’t fire some percentage of the time, in this model they basically say, ‘You know, I don’t give a damn, I’m gonna fire anyway.’ They basically lower their threshold, they start firing, and then they inhibit somebody else.
And so essentially, if I was deprived of input because some input was lost, I’m sitting here going, “I’m a neuron. I’m not doing anything. I’m not doing anything.”
And then I say, “Well, okay, I’m gonna go for this right here. I’m gonna fire, and I’m gonna shut you down.” And everybody readjusts.
We did a little bit of modeling of this, and the theory explains it beautifully. We can change the input in various ways. The patterns can change.
The sensory organ can change. The architecture can change, and everybody sort of slowly moves and readjusts and forms new representations the best they can. And it’s a really beautiful part of the theory.
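The "everybody's hungry" rule, starved columns lowering their threshold until they fire and inhibit others, is a homeostatic mechanism that can be sketched as a boost factor tied to each column's running duty cycle. The exponential boost formula, the learning rates, and the target density below are all illustrative guesses, not the model's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

n_cols, k_winners = 16, 3
target_duty = k_winners / n_cols       # desired fraction of time a column wins

duty = np.full(n_cols, target_duty)    # running estimate of each column's wins
boost = np.ones(n_cols)

def compete(overlap):
    """Winner-take-all with homeostatic boosting of under-active columns."""
    winners = np.argsort(overlap * boost)[-k_winners:]
    active = np.zeros(n_cols)
    active[winners] = 1.0
    duty[:] = 0.99 * duty + 0.01 * active          # exponential moving average
    # Columns firing below target get a boost > 1 (a lowered threshold);
    # columns firing above target get a boost < 1.
    boost[:] = np.exp(10.0 * (target_duty - duty))
    return set(int(w) for w in winners)

# Columns 0-7 get strong input; columns 8-15 are "deprived" of input.
winners_over_time = []
for _ in range(2000):
    overlap = np.concatenate([rng.random(8) + 1.0, rng.random(8) * 0.1])
    winners_over_time.append(compete(overlap))

# Late in the run, boosting lets deprived columns win some of the time, so
# they re-enter the competition and can learn to represent something else.
late_winners = set().union(*winners_over_time[-500:])
print(sorted(late_winners))
```

The re-entry of deprived columns into the competition is the toy analogue of cortex near a damaged or deafferented region slowly taking on new representations.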
But it’s not what we’re focusing on. This is not, we’re not trying to understand trauma and how to solve trauma. We’re trying to understand how healthy brains work.
Yes.
[01:03:59] AUDIENCE MEMBER:
So your work is so influenced by neuroanatomy that it seems like you’d be the perfect person to ask: do you need a connectome? Or if you had one, would you even know what to do with it?
[01:04:08] JEFF HAWKINS:
All right. The connectome is an idea that’s become popular recently, and there’s a book written by, oh, what’s his name? It just came out.
Uh, Seung, was it? Yeah. Sebastian Seung.
Thank you. And it’s the idea that if you mapped out all the connections of the brain, that would be the connectome. Well, no, I don’t think you need that.
I think if I wanted to recreate your brain exactly in some piece of silicon, so you were an identical clone over here, I would need your connectome. But if I wanna understand the principles by which your brain works, I don’t need the connectome. You know, the brain develops without a connectome map.
It has a broad map saying, ‘Okay, these things go here, these things go here.’ But the actual final connections are learned. And so, I think it’s a cool thing.
I think the techniques they’re using for building the connectome, all these different techniques for imaging where the connections are, they’re all very cool. I’m not saying it’s not gonna be helpful, I’m just saying, this is not the key to unraveling how brains work. Just in the same way that if I knew every activation of every cell, is that gonna tell me how the brain works?
No. I need some theoretical basis for understanding why they’re doing that. And the connectome doesn’t give you that theoretical basis on its own. It’s a cool tool, and it might be useful as a diagnostic tool as well.
Hi, there.
[01:05:34] AUDIENCE MEMBER:
Hi. It’s a really cool theory, and I was just thinking, did you also try to feed it back to the brain again to test some of it? Two things I have, I mean, it’s difficult to measure, but for instance, one thing that struck me was, first you showed this picture where the different layers are very different, and in your model, the different cells within one layer are almost like just copies of each other–
[01:05:58] JEFF HAWKINS:
Yes.
[01:05:59] AUDIENCE MEMBER:
So, yeah. Could you respond?
[01:06:01] JEFF HAWKINS:
Well, okay. So, I’m not sure I have the question. Let me make sure. The question is: why, in the picture of the real brain, are the layers different, while here in my model they’re not?
[01:06:09] AUDIENCE MEMBER:
Yeah. In general, it’s more like: how can you feed it back? One thing where I see a difference between the brain and your model is, for instance, in these layers, but also more in general. Did anyone do some testing where you see that, I mean, the model of the different–
[01:06:27] JEFF HAWKINS:
Well testing, not testing in the way I think you’re talking about it, but can I just address the first part of your question?
[01:06:32] AUDIENCE MEMBER:
Yes.
[01:06:33] JEFF HAWKINS:
‘Cause I’m comfortable addressing that one. Which is: why are there differences between the layers? We have a very simple model here, and the real cortex has different layers. No one really knows why we have different layers in neocortex.
But the general properties I told you about, the predominantly horizontal connections and the vertical column organization, exist in all the layers, in all regions of the cortex. Now, what makes regions different is the density of cells, certain cell types, where they connect, how far they connect, and so on. What we basically believe is that if you look at a region of neocortex, some patterns have to be passed up the hierarchy, and those come from one layer.
Some connections have to be passed down the hierarchy, and those come from another layer, like layer six. Layer five is a motor output layer, and it projects to subcortical regions. So they’re projecting to different places, and they have different attributes.
Layer five cells have to project further, so they’re physically bigger cells. Also, if I’m gonna do motor behavior, I need very specific timing; the exact timing is important, whereas if I’m doing inference it may or may not be as important.
So there’s a bunch of other complexity I didn’t talk about here: where are these cells connecting to? Some of them go through the thalamus, some of them don’t, and so on. That kind of thing leads you to a theory of why the layers would be different. Again, that’s a set of constraints we need to understand.
We have some ideas about it, but I’m trying to get at least a core principle about what any neurons in any layer, regardless of the extent of their arborization and so on, might be doing. And I’m arguing they’re all doing some flavor of sequence memory. Testing these theories in animals and other systems is a very, very complex thing.
I don’t do that. There isn’t a long history in neuroscience of collaboration between theorists and experimentalists as you see in physics. The experimentalists say, ‘I got my own theories.
Go away. I’ll do my own stuff.’ That’s a bit of a rough characterization, and there’s no animosity there.
I’m just saying that there are a lot of predictions that come out of this theory that can be tested, and if someone wants to test them, I’d be glad to talk to them about it. I know I give long answers to questions, sorry.
[01:08:49] AUDIENCE MEMBER:
My question’s actually about Numenta in the future. The way that I view it, which may be incorrect, is that in the future you’re gonna make it sort of a platform for developers to do what they want with it, such as like Java or just basic web development. So when that comes about, what do you see the individual companies’ role being?
Is it gonna be deciding how the SDR is set up? Is it gonna be testing? Or is it just gonna be you guys kinda do everything and make the solution for them?
[01:09:18] JEFF HAWKINS:
Okay. So this is a very pointed question from a computer science guy. I’ll take most of this offline, but there’s a complex problem here.
Like I have a company that needs to be successful. We have to build a product to make it successful. We have lots of people around the world who are interested in these algorithms.
We publish these. I’ve given talks. There’s lots of interest in this.
So people wanna implement these things, and the question is: how do we progress the science? How do we progress the technology? It’s not really clear yet exactly how you do that.
Today, we’re focused on getting our product out the door. We’re not spending a lot of time with individuals: if someone wants to implement these algorithms, they can read our white paper and do it, great, but we can’t help them. In the future, that might change, and we don’t really know yet how that’s gonna go.
One of my personal goals is to encourage more people to adopt these ideas and these approaches. So you can be certain that I’m going to try to make these things more available and build collaborations, and so on. But even just a year ago, we weren’t certain all this worked that well.
So now we’re pretty certain it works really well. And now we can start asking the questions like how do we open this up to more people? And I don’t know the answer to that question.
Our VP of engineering is down here someplace. You can talk to him about it. He’s right over there in the orange shirt.
All right. I’ll take just a few more questions, and then we’ll disperse down below.
[01:10:40] AUDIENCE MEMBER:
Hi, I was wondering if you have a good measure to tell how intelligent one of these algorithms is, and if you did, how you could make it more intelligent, or if you knew how to make it more intelligent. And then does that–
[01:10:55] JEFF HAWKINS:
Are you gonna be here tomorrow?
[01:10:57] AUDIENCE MEMBER:
No, I can’t make it tomorrow.
[01:10:59] JEFF HAWKINS:
Oh, you want tomorrow’s talk?
[01:11:01] AUDIENCE MEMBER:
No.
[01:11:01] JEFF HAWKINS:
Well, I can’t give you that now. But look, what I showed you is a long way from what most people would think is intelligence. And one of the things I’m gonna argue tomorrow is that intelligence is not some milestone.
It’s not passing some test. It’s a system that builds on a certain set of principles, and those principles can be scaled up to very large systems or down to very small systems. And I showed you today a teeny system, a teeny, teeny system.
No hierarchy, a very small amount of material. But there’s not some plateau you have to reach to say, “You’re now intelligent.” It’s more like a continuum of capabilities, and it’s the way you go about it which is important.
So, to me, intelligence is defined by way of a learning system that builds models of the world in a certain way and, from that, generates behavior and predictions. If you build a system that does that, and it’s really small, it’s a very small intelligent system; it could be a rat or a mouse. Then you build a big one, and it could be a human or a superhuman.
All right. I’ll take one more question. I’ll stick around and answer other people’s questions too, but–
[01:12:03] AUDIENCE MEMBER:
So, the majority of your work was on the neocortex and in particular the structure of cells and how they function
(unintelligible)
There’s kind of a stereotypical representation of the brain in hemispheres, with the left being analytic and the right being creative. Is there any research in specifically the neural work you’ve done that either confirms, refutes, or maybe gives insight into this interpretation of–
[01:12:26] JEFF HAWKINS:
Yeah. No. Our work doesn’t address that specifically or any other way.
This is an area of debate: the left brain, the right brain, how different are they? Are there clear differences? And why is language usually located on the left side of the brain and not the right? And so on.
And, you know, there are theories about this. The best theories I’ve read, the ones I like (I like them because they’re copacetic with my worldview), say that, for example, on the left side of the brain there’s more myelination, which is the fatty sheath on neurons. It makes them faster: information can be sent faster on one side of the brain than on the other, and that makes it more amenable to language, because language is right at the cusp of the speed at which neurons work. So if you don’t have that myelination, it’s not fast enough to do it.
That kind of thing. But I don’t think the basic principles are different on the left side and the right side. It’s not like, oh, there’s process A over here and process B over here. It’s flavors, variations on a theme, and at the moment we’re just dealing with the basic theme and not the details.
So that’s the best I can answer that one. All right. Thank you again.
I’m gonna be around here for a few minutes for those who wanna talk to me.
(applause)