A new paper suggests diminishing returns from larger and larger generative AI models. Dr Mike Pound discusses.
The Paper (No “Zero-Shot” Without Exponential Data): https://arxiv.org/abs/2404.04125
I think it’s incredibly naïve to think that, because we’ve hit a boundary on one particular aspect of LLMs, the technology has peaked as a whole. There are lots of ways to improve LLMs that aren’t just increasing the parameter size; for example, there’s been an uptick in smaller models that are optimized to run on client devices without large GPUs. There is probably a future where small 3-7B models are competitive with today’s best 70B models but can run in real time on any smartphone. We’ll have larger context windows, allowing LLMs to work on larger problems. And we’ll have better techniques for getting high-quality information out of LLMs; there are already adversarial methods, where two LLMs hold a debate on a subject, that have shown more accurate and comprehensive output is possible. They’ll also continue to be embedded in more places in software where they’re more useful, not just as a chatbot that lives in its own world.
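To put the 3-7B vs 70B comparison in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone take. The bytes-per-parameter figures for fp16/int8/int4 are standard, but treating every quantized weight as exactly 0.5 byte and ignoring the KV cache, activations and runtime overhead are simplifying assumptions, so real usage is higher.

```python
# Rough, weights-only memory estimate for running an LLM on a device.
# Assumption: memory = parameter count x bytes per parameter. The KV cache,
# activations and runtime overhead are ignored, so real usage is higher.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


for name, n_params in [("3B", 3e9), ("7B", 7e9), ("70B", 70e9)]:
    for precision in ("fp16", "int4"):
        gb = weight_memory_gb(n_params, precision)
        print(f"{name} @ {precision}: {gb:.1f} GB")
```

Under these assumptions a 4-bit 7B model needs about 3.5 GB for its weights, small enough to plausibly fit on a high-end phone, while a 70B model needs about 140 GB at fp16 (35 GB even at 4-bit).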
There are lots of ways to improve LLMs that aren’t just increasing the parameter size
The paper isn’t about parameter size but about the need for exponentially more training data to get a mere linear increase in output performance.
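The paper reports a roughly log-linear trend between how often a concept appears in the pretraining data and zero-shot performance on it. A minimal sketch of what that implies, assuming a made-up fit score = A + B·log10(N) (A and B are illustrative, not values from the paper): inverting it shows that every equal step in score costs ten times more data than the last.

```python
import math

# Hypothetical log-linear trend: score = A + B * log10(N), where N is how
# often a concept shows up in the training data. A and B are made-up
# illustration values, not numbers reported in the paper.
A, B = 0.10, 0.05


def score(n_examples: float) -> float:
    return A + B * math.log10(n_examples)


def examples_needed(target_score: float) -> float:
    """Invert the trend: N = 10 ** ((target - A) / B)."""
    return 10 ** ((target_score - A) / B)


# Equal steps in score (+B each time) need ten times more data per step.
for target in (0.30, 0.35, 0.40, 0.45):
    print(f"score {target:.2f} needs ~{examples_needed(target):,.0f} examples")
```

The exact numbers depend entirely on the assumed A and B; the point is the shape: linear gains on the left, exponential data on the right.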
Improvements are made all the time. You can’t feed a very large SVM the same data as a transformer network and expect it to perform the same. Transformers are used because they can more easily learn complicated patterns with less data.
I think I’ve read somewhere that a neural network with only one hidden layer can theoretically approximate any continuous function (if the hidden layer is large enough), but it would need an incredible amount of data to do so, so it’s not practical.
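The result being half-remembered here is usually called the universal approximation theorem; it is about approximating continuous functions on a bounded domain and says nothing about how much data or training it takes to find the weights. A minimal 1D illustration, not a proof: a single hidden layer of ReLU units can be hand-wired to reproduce the piecewise-linear interpolation of a function, and the worst-case error shrinks as the layer gets wider. The helper names and the choice of sin on [0, π] are illustrative assumptions.

```python
import math
from typing import Callable, List, Tuple


def relu(z: float) -> float:
    return max(0.0, z)


def build_relu_layer(f: Callable[[float], float], a: float, b: float,
                     n_units: int) -> Tuple[List[float], List[float], float]:
    """Hand-wire one hidden layer of ReLU units whose output equals the
    piecewise-linear interpolation of f on n_units equal segments of [a, b]."""
    knots = [a + (b - a) * i / n_units for i in range(n_units + 1)]
    values = [f(x) for x in knots]
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(n_units)]
    # Hidden unit i computes relu(x - knots[i]); its output weight is the
    # change in slope at that knot, so the sum bends exactly at each knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_units)]
    return knots[:-1], out_weights, values[0]


def forward(x: float, knots: List[float], weights: List[float], bias: float) -> float:
    return bias + sum(w * relu(x - k) for k, w in zip(knots, weights))


def max_error(f: Callable[[float], float], a: float, b: float,
              n_units: int, n_test: int = 1000) -> float:
    knots, weights, bias = build_relu_layer(f, a, b, n_units)
    xs = [a + (b - a) * j / n_test for j in range(n_test + 1)]
    return max(abs(forward(x, knots, weights, bias) - f(x)) for x in xs)


# Wider hidden layer -> smaller worst-case error on [0, pi].
for width in (4, 16, 64):
    print(width, max_error(math.sin, 0.0, math.pi, width))
```

Note that the weights here are computed directly from exact function values; a trained network has to discover them from samples, which is where the data cost comes in.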
Over time other models will be discovered that can make better use of the training data.
What you mentioned is already assumed by the video and the paper in question.
The main argument is that, no matter our computational techniques, the diminishing returns in predictive precision are reached far sooner than we achieve general intelligence.
No, the argument is that current techniques give logarithmic returns in data size, which is bad. But it says nothing about other potential techniques, nor does it suggest that this is a general result.
Well, obviously they can’t rule out techniques no one has thought of, but likewise they obviously accounted for what they deemed to be within the realm of possibility.
no matter our computational techniques, the diminishing returns in predictive precision are reached far sooner than we achieve general intelligence
That’s a very bold presumption. How can they be so sure that no future model can tackle the issue? Have they got proof or something?
No, they just extrapolate from increasing the size of the training set… it’s not that complicated. That’s a fair presumption, as that is how we’ve increased predictive precision so far.
It seems far more bold to presume that general intelligence will be created any time soon when current machine learning is nowhere close.
My personal take is that the current generation of generative models has peaked, for the reasons stated in the video (diminishing returns). This current gen will be useful, but progress-wise it’ll be a dead end.
In the future, however, I believe that models with a different architecture will cause a breakthrough, being able to perform better with less training, and probably with lower energy requirements, too.
Probably.
I already thought that, in terms of major progress, AI peaked as early as 2022, when ChatGPT and various diffusion models were all hyped up. It was kinda obvious, since our silicon tech is already basically maxed out. There are lots of potential optimizations, but they are minor advancements compared to the raw compute growth we had until the recent past. And to make the next revolution in the AI field, those moneybags will have to spend a colossal amount of money to basically reinvent either computers themselves or the ML architecture.
I don’t think that reinventing computers will do any good. The issue that I see is not hardware, but software - the current generative models are basically brute force, you throw enough data and processing power at the problem until it becomes smaller, but at the end of the day you’re still relying too much on statistical patterns behind the wrong entities.
Instead, I think that the ML architecture will change. And this won’t be done by those tech bros full of money burning effigies, who have a nasty/stupid/disgraceful tendency to confuse symbolic representations with the things being represented. Instead it’ll be done by researchers in some random compsci or robotics lab, in some random part of the world. They’ll be doing some weird stuff like emulating the brain of a fruit fly, and someone will point out “hey, you see this feature? It has ML applications”. And that’ll be when they actually add some intelligence to those systems, i.e. the missing piece of the puzzle. It won’t be AGI, but it’ll be better than now, at least.
By “reinventing computers” I mean choosing another information carrier for our processing units. For instance, photonics is a promising field: photons are much smaller, so potentially we could make even smaller logic elements, and they also produce much less heat.
As for ML architecture, of course it won’t be the tech bros, of course it will be scientists, but don’t forget that until someone sponsors them, the research could take literal decades before anything revolutionary is discovered. Scientists are not some kind of gurus who live in the mountains and are fed by the energy of the sun. To make a living, they have jobs besides scientific research. That’s why grants and other research funding methods exist. And as you could’ve guessed, these greatly depend on guys with money and their interest in said research.
Not even another information carrier would solve it. Be it quantum or photonic computers, at the end of the day we’d simply be brute-forcing the problem harder with more processing power. But because of the diminishing returns, we need something other than brute force.
Just to give you an idea: a human needs around 2400 kcal/day to survive, or 100 kcal/h ≈ 116 W. Only about 20% of that goes to the brain, so ~23 W. (I bet most of that is used for motor control, not reasoning.) We clearly suck as computing machines, and yet our output is considerably better than the junk yielded by LLMs and diffusion models, even if you use a really nice computer and let the model take its time producing its [babble | six-fingered “art”]. Those models are clearly doing lots of unnecessary operations while failing hard at what they’re expected to do.
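The power figures above can be checked with a few lines of arithmetic. This sketch uses the commenter’s rough inputs (2400 kcal/day, a 20% brain share) and the standard 1 kcal = 4184 J conversion.

```python
# Back-of-the-envelope check of the body/brain power figures.
# Inputs (2400 kcal/day, 20% brain share) are the rough numbers from the comment.
KCAL_TO_J = 4184.0               # 1 kcal = 4184 J (thermochemical calorie)
SECONDS_PER_DAY = 24 * 3600


def kcal_per_day_to_watts(kcal_per_day: float) -> float:
    return kcal_per_day * KCAL_TO_J / SECONDS_PER_DAY


body_w = kcal_per_day_to_watts(2400)   # about 116 W
brain_w = 0.20 * body_w                # about 23 W
print(f"body ~ {body_w:.0f} W, brain ~ {brain_w:.0f} W")
```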
Regarding research, my point is that what’s going to fix generative models will likely come from outside the field of artificial intelligence. It’ll likely be something small and barely related that happens to have some ML application.
There’s a lot to optimize in LLMs, and I never said otherwise. Still, if the field were researched further, photonic computers could consume as little as an LED lamp, making them even more efficient than our brain. Given the total number of computers in the world, even the slightest power-consumption optimization would save a colossal amount of energy, and in the case of photonics the raw numbers could be unimaginable.
Regarding research…
I bet they’ll simply find a way to greatly simplify the math behind neuron interactions. Matrix multiplication is kinda slow, and there’s lots of it.
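To see why “there’s lots of it”, here is a minimal sketch of a naive matrix product that counts multiply-accumulate (MAC) operations. The tiny 3×2 “layer” is a made-up example; real layers do the same thing with dimensions in the thousands.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul_counting(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Naive matrix product that also counts multiply-accumulate (MAC) operations."""
    n, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    m = len(b[0])
    out = [[0.0] * m for _ in range(n)]
    macs = 0
    for i in range(n):
        for j in range(m):
            total = 0.0
            for p in range(k):
                total += a[i][p] * b[p][j]  # one multiply + one add
                macs += 1
            out[i][j] = total
    return out, macs


# One token with 3 input features through a 3x2 weight matrix (a tiny "layer").
x = [[1.0, 2.0, 3.0]]
w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
y, macs = matmul_counting(x, w)
print(y, macs)  # one MAC per weight: 3 * 2 = 6
```

Every weight is touched once per token, which is why a common rule of thumb puts a transformer’s forward pass at roughly 2 FLOPs per parameter per token; for a 70B-parameter model that is on the order of 1.4 × 10^11 FLOPs for every generated token.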
Sam Altman gave a pretty good indication that your point is correct when he began asking for $7 trillion for new AI chip development.
gotem!
Seriously tho, you don’t think OpenAI is tracking this? Architectural improvements and training strategies are being developed all the time.
…and they aren’t making progress on that front: a linear increase in generalisation still requires a more-than-linear increase in the amount of data.
Also, btw, it’s not as if we don’t know why our current architectures won’t lead to proper intelligence. tl;dr: while current architectures can learn and represent information, they cannot develop learning strategies or decide smartly how to represent a particular bit of information. All the improvements that are happening are in that “how to learn better” area; we have no idea whatsoever how to make the jump to teaching an AI how to learn to learn. AlphaZero is able to learn to play a game, yes, but it can’t learn arbitrary information: once you throw something other than a game at it, it has no idea how to make sense of anything.
“we don’t know how” != “it’s not possible”
I think OpenAI, more than anyone, knows the challenges with scaling data and training. Anyone working on AI knows the line: “a baby can learn to recognize elephants from a single instance”. Reducing training data and time is fundamental to advancement. Don’t get me wrong, it’s great to put numbers to these things. I just don’t think this paper is super groundbreaking or profound. A bit clickbaity and sensational for Computerphile.
…and a baby doesn’t use anything close to the same architecture as generative AIs. Babies are T3 systems: they aren’t simply systems that have rules on how to learn, they are systems that have rules on how to develop learning strategies, which they then use to learn.
I’m not doubting, in the slightest, that AI can get there: it’s definitely possible. It’s just not possible with the current approaches, and the iterative refinements that “oh, OpenAI is constantly coming up with new topologies” refers to are just more of the same. Show me a topology that can come up with topologies, and then we’ll have a chance to break through the need for exponential amounts of data.
I am just waiting until it makes the leap to 3D. With that, you will start seeing 3D assets in video games become cheaper and quicker to make, VR rigging will soon follow, and when the tech reaches its peak: automated design for 3D printing.
3D assets in video games become cheaper and quicker to make
You mean there’s going to be a brand-new way for people to completely fail at optimising game assets.
6M vertex spheres here we come!
I think Autodesk is working on one.
On the other hand, if we move from larger and larger models trained on as much data as they can gather to less generic, more specific, high-quality datasets, I have a feeling there’s still a lot to gain. But quality over quantity takes a lot more effort to maintain.
The video is more about the diminishing returns when it comes to increasing the size of the training set. It follows a logarithmic curve. At some point, just “adding more data” won’t do much because the cost will be too high compared to the gain in accuracy.
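The “cost too high compared to the gain” point can be made concrete with a hypothetical break-even calculation. Assuming accuracy(N) = A + B·ln(N), the extra accuracy from one more example is B/N, so with a fixed cost per example there is a dataset size beyond which more data stops paying for itself. All four constants below are made-up illustration values, and the log curve grows without bound, so treat it only as a local model.

```python
import math

# Hypothetical logarithmic learning curve: accuracy(N) = A + B * ln(N).
# A, B, the cost per example and the value of accuracy are all made-up
# illustration numbers, not fitted to any real model. The curve is only
# meant as a local model; it grows without bound for huge N.
A, B = 0.20, 0.04
COST_PER_EXAMPLE = 0.01      # e.g. dollars to collect and clean one example
VALUE_PER_ACCURACY = 1e6     # dollars we'd pay for +1.0 accuracy (+100 points)


def accuracy(n: float) -> float:
    return A + B * math.log(n)


def marginal_value(n: float) -> float:
    """Value of one more example at dataset size n (d/dN of A + B ln N is B / N)."""
    return VALUE_PER_ACCURACY * B / n


def break_even_size() -> float:
    """Dataset size beyond which one more example is worth less than it costs."""
    return VALUE_PER_ACCURACY * B / COST_PER_EXAMPLE


n_star = break_even_size()
print(f"break-even at N ~ {n_star:,.0f} examples, accuracy ~ {accuracy(n_star):.3f}")
```

Because the marginal value falls as 1/N while the cost per example stays flat, the break-even point always exists under these assumptions; changing the constants only moves it.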